[36C3 preroll music]

Herald: Please put your hands together and give a warm round of applause to Will Scott.

[Applause]

Will Scott: Thank you. All right. Welcome. The basic structure of this talk is twofold. The first part provides an overview of the different mechanisms that exist in this space of secure communication, and tries to tease apart the individual choices and tradeoffs that have to be made and their implications. A lot of the time we talk about security or privacy as very broad terms that cover a bunch of individual things, and breaking that down gives us a better way to understand what we're giving up, and whether and why these decisions actually get made for the systems we end up using.

The arc I'll cover is this: first, a taxonomy or classification of the different systems we see around us; from there, the threats we're often trying to protect against and the mechanisms we have to mitigate them; and then a look into some of those mechanisms and what's happening right now in different systems. By the end, we'll be closer to the research frontier: places where we have new ideas, but where the tradeoff against usability, or other reasons, means they haven't gained mass adoption.

So I'll introduce our actors: Alice and Bob. The basic structure for pretty much all of this is one-to-one messaging. These are primarily systems that enable us to have a conversation that looks a lot like what we would have in person. That's the thing we're modelling: I want somewhat synchronous, real-time communication that I can resume over a span of weeks, months, or years, and in the same way that in real life I know someone and recognize them when I come and talk to them again, I expect the system to give me similar properties.

Initially, we have systems that look very much like real-life communication, where I can, on a local network, use AirDrop or other tools that work directly between my device and a friend's device. On a computer, this might look like using netcat or another command-line tool to push data directly to the other person. This results in a form of communication that looks very similar: it's ephemeral, and it goes away afterwards unless the other person saves it.

But there is already a set of adversaries or threats to think about when securing this sort of communication. One is the network: can someone else see this communication, and how do we hide from that? We have a mechanism against that, namely encryption: I can encrypt my communication so that someone who is not my intended recipient cannot see what's happening. The other is the end devices themselves. There are a couple of things to protect against on an end device. One is that other, malicious software, installed either at the same time or later, may try to steal or learn what was said. We have mechanisms there too. One of them is message expiry.
We can make the messages go away: make sure we delete them from disk at some point. The other is isolating our chats so that other applications can't see what's happening there.

So we have these direct communication patterns, but that's a small minority of what we think of as chat. Instead, most of the systems we use online rely on a centralized server: there's some logically centralized thing in the cloud, I send my messages there, and it forwards them to my intended recipient. Whether it's Facebook or WhatsApp, Slack or IRC, Signal or Wire or Threema, or whatever cloud chat app we're using today, the same model applies.

We can identify additional threats here, and then we can think about why we do this anyway. One threat is the network, and I'll tear that apart a little bit. You've got the local network that we had before: someone near the person sending or receiving messages, so someone else in the coffee shop, your local organization, your school, your workplace. You've got the Internet as a whole that messages pass over: the ISPs, or the countries you're in, may want to look at your messages or prevent you from sending them. And you've got adversaries in the network local to, or near, the server, who can see most of the messages going in and out of it, because these services have to exist somewhere, be that a data center they physically have computers in, or AWS, Google, or one of the other clouds. So now there's a set of actors near the server that can see most of the traffic going in and out of that server.

We also have to think about the server itself as a potential adversary, and there are a few different threats here. The server could get hacked or otherwise compromised, so bugs in the software can potentially expose parts of the communication. There's typically a legal entity running the server, and the jurisdiction it's in can send requests for user data or compel it to provide information; so there's this whole threat of what the server is required to turn over. And then there's the question of how the server, or the company behind it, makes money and sustains itself. Is it going to get acquired by someone you don't trust, even if you trust it now? So there's a future-looking question: how do we ensure that the messages I have now don't get misused later?

We have a set of techniques that mitigate these problems as well. One is traffic obfuscation, or circumvention techniques, to make our traffic look less obvious to the network; that addresses a large portion of the network threats. The other I'm calling server hardening, but it's really a broad set of techniques for trusting the server less, and for making potential compromises of the server, whether a code compromise or compelled disclosure, less damaging.

It's worth saying that there are several reasons why we have primarily used centralized messaging. You've got availability.
It's very easy to go to a single place, and it also makes a bunch of problems easier, like handling multiple devices, and mobile push in particular: both Google and Apple expect a single authorized provider who can send notifications to an app's users on their mobile devices. That effectively requires a centralized place that knows when to send those messages if you want to provide real-time alerts to your application's users. The cons are cost, since some entity is now responsible for all of this and has to have a business model, and the fact that there is a single entity that people can come to, which now faces the legal and regulatory issues.

This is not the only type of system we have, though. The next most common is probably federated, and e-mail is a great example. E-mail is nice in that, as a user, I can choose an e-mail provider I trust out of many, or if I don't trust any of the ones I see, I can even spin up my own with a small group. So we can decentralize cost and make this more approachable. But while I can gain more confidence in my individual provider, I don't have as much insight into the recipient's side: I don't know how secure Bob's connection to his provider is, because we've separated and decentralized that. There are also problems in doing identity and discovery securely, and in mobile push. Still, we have a number of successful examples: beyond e-mail, the Fediverse and Mastodon, Riot (built on Matrix), and even SMS are federated systems where there are many providers rather than a single central place.

As you continue this arc of splitting apart, decentralizing, and reducing trust in any single party, you end up with a set of decentralized messaging systems as well, and it's worth mentioning them as we get out to this fringe. There are roughly two types. One uses gossip protocols, like Secure Scuttlebutt: you connect to the people around you, or people you know, and when you get messages, you gossip them onwards to everyone around you, so messages spread through the network. We are still learning how much metadata such systems leak, but they are nice in their level of decentralization. The other type basically tries to make all of the users participate, with relatively low trust, in the serving infrastructure. You can think of this as evolving out of the distributed hash tables used in BitTorrent. You see something similar in systems like Ricochet or tox.chat, which use either Tor-like relays for sending messages, or an explicit DHT for routing, where all the members provide some amount of lookup to help with discovery and finding other participants.

OK, so let's turn to some of the mechanisms we've uncovered, starting with encryption. When you're sending messages to a server, by default there's no encryption. This is things like IRC, and e-mail used to be primarily unencrypted. You can think of that like a postcard: it shows where the message is coming from, where it's going to, and the contents. In contrast, you can use transport encryption, which is now the standard for most centralized systems.
What that means is you're taking that postcard and putting it in an envelope that the network can't open. That's what TLS and other forms of transport encryption give you: the network link just sees the source and destination. It sees a message passing between Alice and Facebook, or whatever cloud provider, but can't look inside to see that it's really a message for Bob, or what's being said. It just sees individuals communicating with that cloud provider. So SMTPS, secure versions of IRC and e-mail, and most other protocols are using transport security at this point.

The next step is end-to-end encryption, or E2E. The difference here is that the message Alice sends is addressed to Bob and encrypted so that the provider, Facebook, can't open it either and can't see the contents. The network still just sees a message going between Alice and Facebook, but now Facebook can't actually read the message.

End-to-end encryption has gained pretty widespread adoption. We have it in Signal, for the most part in iMessage, and tools like PGP and GPG implement forms of it. For messaging, a few protocols are worth covering. The Signal protocol, initially called Axolotl, is adopted in WhatsApp and in Facebook private messaging, and has generalized into something called the Noise framework, which is gaining a lot of adoption. OMEMO looks a lot like it, but is a specific implementation for XMPP. The other one is Off-The-Record, or OTR, which developed somewhat independently and thinks a lot about deniability.

I'm not going to go too deep into the specific details of what these protocols are doing, but the intuition is that the hard part is not encrypting a message. The hard part is how you send that first message and establish a session, especially if the other person is offline. I want to start a communication; I type the first message I'm sending to someone; I need to somehow get a key, then send a message that only that person can read, and also establish a shared secret. Doing all of that in one message, with the other device not online, ends up being tricky. Additionally, figuring out the mapping between a user and their devices as it changes, and making sure you've appropriately revoked old devices and added new ones without keys breaking or the user getting too many warnings, ends up being a lot of the trick in these systems.
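To make the offline-establishment problem concrete, here is a heavily simplified sketch of the prekey idea behind the Signal protocol's X3DH handshake, using the Python cryptography library. It omits the signature on the prekey and the one-time prekeys, so treat it as an illustration of the shape of the handshake under those assumptions, not the real protocol.

```python
# A simplified sketch (not Signal's real X3DH; signatures and one-time
# prekeys are omitted) of establishing a session while Bob is offline.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_session_key(dh_outputs):
    # Mix several Diffie-Hellman results into one session secret.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"prekey-sketch").derive(b"".join(dh_outputs))

# While online, Bob uploads his public identity key and a prekey.
bob_identity = X25519PrivateKey.generate()
bob_prekey = X25519PrivateKey.generate()
bundle = (bob_identity.public_key(), bob_prekey.public_key())

# Later, with Bob offline, Alice fetches the bundle and derives the secret.
alice_identity = X25519PrivateKey.generate()
alice_ephemeral = X25519PrivateKey.generate()
bob_ik, bob_spk = bundle
alice_key = derive_session_key([
    alice_identity.exchange(bob_spk),   # binds Alice's identity
    alice_ephemeral.exchange(bob_ik),   # binds Bob's identity
    alice_ephemeral.exchange(bob_spk),  # fresh randomness per session
])

# Alice's public keys ride along with her first ciphertext, so Bob can
# compute the same three exchanges whenever he comes back online.
bob_key = derive_session_key([
    bob_prekey.exchange(alice_identity.public_key()),
    bob_identity.exchange(alice_ephemeral.public_key()),
    bob_prekey.exchange(alice_ephemeral.public_key()),
])
assert alice_key == bob_key
```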
There are two problems that come into play once we're using end-to-end encryption. One is connection establishment: the problem of saying who Bob is. I find a contact and know them in some way, by an e-mail address or a phone number. Signal uses phone numbers; a lot of systems use an e-mail address; Threema uses a unique identifier that it generates for you. But somehow I have to go from that identifier to an actual key, some knowledge of a cryptographic secret that identifies the other person, and I have to figure out who I trust to do that mapping. And then there's the second problem: how do we match?

A lot of systems do this by uploading your address book, or matching against existing contacts, to solve the user-interface problem of discovery: if they already know the identifiers and hold this mapping, then when someone new joins, they can suggest them, having "pre-found" their keys. You simply trust the server to hold this address book and to do the mapping between identifiers and the keys you get out. Signal is nice here: it says it's not uploading your contacts, which is true. It uploads hashes of your phone numbers rather than the actual numbers. But it's a similar thing: they've got a directory of known phone numbers, and when people search, you search for a hash of a phone number and get back the key that you hope Signal has correctly given you.

There are a couple of ways to reduce your trust here. Signal has been going down a path of using SGX, oblivious RAM, and a bunch of systems mechanisms to increase the cost of attacks against their discovery mechanism. The other way is to allow people to use pseudonyms or anonymous identifiers: with Wire, you can register with just an anonymous e-mail address, so the cost to you is potentially lower if that gets compromised. It's worth noting that Moxie will be talking tomorrow at 4:00 p.m. about the evolution of the space around Signal, so you can expect a bunch more depth there.

So what if we don't want to trust the server to do matchmaking? One of the early approaches is the web of trust around GPG. This is the notion that if I have, in real life or otherwise, associated an identifier with a key, I can publicly provide a signed statement saying that I trust that mapping. People who don't know someone directly, but have a social link, can find these proofs and use them to trust the mapping: I know an identifier, I trust someone who has said "this is the key associated with that identifier", and I can follow that network until I find a key I'm willing to encrypt to. There's a user-interface tradeoff here; this is a manual process in general. And this year we've had a set of denial-of-service attacks on the web-of-trust infrastructure. The specific attack is that anyone can upload these attestations, so if a bunch of random users, or Sybils, start uploading trust statements, then when you try to download them, you end up overwhelmed by the amount of information. The system does not scale, because it's very hard to filter down to the people you care about without telling the system who you care about, revealing exactly the network you're trying to keep private.

Keybase takes another approach. They made the observation that when I go to talk to someone, what I actually care about is the person I believe owns a specific GitHub or Twitter or other social profile. So I can provide an attestation saying: this is the key associated with the account that controls this Twitter account, or this Reddit account, or this Facebook account. With that set of proofs, I can connect an individual cryptographic identity to the person who holds the passwords to a set of other systems.
Keybase also began to provide a monetary incentive for users this year, and then struggled with the flood of sign-ups. So there's a lot of work in figuring out whether these identities actually correspond to real people, and how you prevent a denial-of-service-style attack like the one the web of trust faced.

On our devices, we generally end up resorting to a concept called TOFU, or trust-on-first-use. What that means is that when I first see a key that identifies someone, I save it, and if I ever need to communicate with that person again, I keep using that same key and expect it to stay the same. That continuity, the ability to pin keys once you've seen them, means that if the first connection you establish with someone really was with the real person, then someone who compromises them later can't take over or silently change the key.

Finally, one of the exciting things that came out, circa 2015, and largely defunct now, was a system by Adam Langley called Pond, which looked at hardening a modern version of e-mail. One of the things Pond had was a password-authenticated key exchange. This is an evolving cryptographic area where two people start with some weak shared secret. I can, perhaps publicly or in plain text, challenge the other person: "Where were we on a specific day?" Now we both know something with at least a few bits of entropy. If we can write the same textual answer, we can run it through a key derivation function to end up with a larger amount of shared entropy, and use that to bootstrap a key exchange and arrive at a strong cryptographic identity for the other person. Pond has a system called Panda for linking individuals based on such a challenge-response, and you'll also find this in Off-The-Record systems around Jabber.
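As a rough illustration of the first step of that idea, here is a deliberately naive Python sketch: both sides normalize the answer they remember and stretch it through a slow key derivation function. A real PAKE, like the one Panda builds on, goes further and prevents an eavesdropper from brute-forcing the low-entropy answer offline; nothing here is Pond's actual code, and the salt value is illustrative.

```python
# A naive sketch of bootstrapping from a weak shared secret. Real PAKEs
# (as used by Pond's Panda) add protection against offline guessing.
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

def key_from_answer(answer: str, salt: bytes) -> bytes:
    # Normalize so small typing differences don't break agreement, then
    # use a memory-hard KDF to stretch the few bits of entropy.
    normalized = " ".join(answer.lower().split())
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(
        normalized.encode())

# The question and a salt can travel in the clear; only the answer is
# (weakly) secret. Matching answers yield matching 256-bit keys.
salt = b"per-pair-public-salt"  # illustrative value
alice_key = key_from_answer("At the kebab  place", salt)
bob_key = key_from_answer("at the kebab place", salt)
assert alice_key == bob_key
```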
The other thing we need to be careful about in end-to-end encrypted systems is deniability. When I'm chatting one-on-one with someone in person, that conversation is ultimately fairly deniable: each person has their recollection of what happened, and there's no proof the other person said something unless you recorded it or otherwise brought technology into play. But with an encrypted system where I've authenticated the other person, I potentially end up with a transcript that I can turn over later and say: look, this person said this. And we've seen recently that things like leaked e-mails are authenticated in exactly this way: DKIM, the system that authenticates e-mail senders, showed up in the WikiLeaks releases of Hillary Clinton's e-mails and made it possible to say that the text hadn't been changed and was signed by the real server we would expect.

So the thing we get from Off-The-Record and the Signal protocol is deniability, or repudiability. This plays into the concept of forward secrecy: we throw away material afterwards, in a way that makes our chat go back to being more ephemeral. There are actually two interlinked properties here. We have keys that we're using to form the shared session for our secret messages. Each time I send a message, I also provide some new key material, ratcheting forward the secret key we're using: I provide a next key, and when Bob replies, he uses my next key as part of his reply and gives me his next key. The other thing I can then do is, when I send a message, reveal the secret part of my previous key: "the last private key I used to send you that previous message was this". By the end of our conversation, we both know all of the private keys, such that either of us could have created the whole conversation on our own computer. At any given time, only the most recent message could only have been sent by the other person; the rest of the transcript is something you could have generated yourself.
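Here is a toy sketch of that revelation step. OTR actually reveals old MAC (authentication) keys rather than encryption private keys, but the effect is the one described above: once the old key is public, anyone could have produced the tags on old messages, so the transcript stops proving authorship.

```python
# Toy sketch: revealing an old authentication key makes past messages
# forgeable, and therefore deniable. Not OTR's actual wire protocol.
import hashlib
import hmac
import os

def tag(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

# Alice authenticates a message with the current MAC key; Bob checks it
# in real time, so at that moment he knows it really came from Alice.
k_old = os.urandom(32)
msg = b"meet at 18:00"
t = tag(k_old, msg)
assert hmac.compare_digest(t, tag(k_old, msg))

# In a later message, Alice reveals k_old. Now Bob, or anyone who obtains
# the transcript, can produce an equally valid tag on any text, so the
# old message no longer proves who wrote it.
forged = tag(k_old, b"meet at 06:00")  # just as valid-looking as t
```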
There is a talk on day three about Off-The-Record v4, the fourth version of that protocol, that will go deeper into this; it's at 9:00 p.m. in the about:freedom assembly, and I encourage you to go if you're interested.

OK. The next mechanism to talk about is expiry. This is a follow-on to the concept of forward secrecy, but there are two attacks to consider here. One is something we should maybe give Snapchat credit for popularizing: the message goes away after some amount of time. This is really protecting against not fully trusting the other person, against them sharing the message later or in a way you didn't intend. Related is the screenshot adversary: a bunch of apps will alert the other participant if you take a screenshot. This is also why some apps blank the screen in the task switcher: if you swap between apps, some of your applications will just show a blank screen, because the mobile operating systems' APIs don't tell them when a screenshot is taken in that mode, and they want to be able to notify you if the other person captures the conversation. It's worth noting that all of this just raises the cost of these attacks and provides a social incentive not to mount them. I can still use another camera to take a picture of my phone and get evidence of what was said. But it discourages it and sets social norms.

The other reason for expiry is compromise of a device after the fact: whether someone gets hold of the device and does forensic analysis to pull off previous messages or the chat database, or someone installs an application that scans through your phone, like Fengcai, an app that's been installed as a surveillance tool in China. This also boils down to a user-interface and user-experience question: how long do you save logs, how much history do you keep, and what norms do you set? There's a tradeoff here. It's useful sometimes to scroll back, and companies that believe they have value-added services built on analytics over your chat history are wary of getting rid of it.

The next thing we have is isolation and OS sandboxing. A lot of this is up one layer: what is the operating system doing to secure your chat application from the other things, the malware or the compromises of the broader device it's running on? We have a bunch of projects around us at Congress innovating on this. There are also chat systems that attempt to do it on their own. One extreme example is Tinfoil Chat, which makes use of three devices and a physical diode: one device only sends messages and another only receives them. The thought is that if you receive a message that somehow compromises the receiving device, the malware or malicious file can never get any communication back out, and so the compromise becomes much less valuable. And they implement this with an actual hardware diode.

The other side of this is recovery and backups. There's a user-experience tradeoff between people losing their devices and wanting their contact list or chat history back, and the fact that you're now keeping an extra copy, an additional place for things to get compromised. Apple has done a lot of work here that we don't look at so much. They gave a Black Hat talk a few years ago where they discuss how they use custom hardware security modules in their data centers, much like the T2 chip in their end devices, to hold the backup keys used for iCloud backups, with similar rate limiting. And they consider a pretty wide set of adversaries, more than we might expect, including questions like: what happens when a government comes and asks us to write new software to compromise this? So they set up their HSMs such that they cannot push software updates to them, which is a step in cloud-side security that we don't think about as much. There's a set of slides from that talk that you can find, and these slides will be online too as a pointer, since their solution considers a large number of adversaries you might not have thought about.

Traffic obfuscation addresses a primarily network-side adversary. The technique people reach for if they feel they need this is called domain fronting. Domain fronting had its heyday around 2014 and has become somewhat less effective, but it's still effective enough for most chat purposes. The basic idea is the separation of layers between the envelope and the message inside it that we get in HTTPS on the Web. When I create a secure connection to a CDN or content provider like Amazon, Google, or Microsoft, I can perform the security layer against a fairly generic name: I just establish a secure connection to, say, Cloudflare. Once I've done that, the message I send inside can be a chat message destined for a specific customer of that CDN or cloud provider. This is an effective way to prevent the network from knowing which specific service you're accessing. It got used for a bunch of circumvention purposes, then for a bunch of malware, and that caused several of the cloud providers to stop allowing it. But it's still in use: it's what happens when you turn on censorship circumvention in Signal, and it's what Telegram is using for the most part.
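Mechanically, fronting is tiny. A minimal sketch with the Python requests library and placeholder domains: the TLS handshake, the part the network sees, names only the front domain, while the real destination travels in the Host header inside the encrypted tunnel. Whether a given CDN still routes such requests varies, as noted above.

```python
# Minimal domain-fronting sketch with placeholder domains. The network
# observer sees TLS to the front; only the CDN sees the Host header.
import requests

FRONT = "https://cdn-front.example.com/"  # what the TLS layer (and SNI) shows
HIDDEN = "chat-backend.example.com"       # the CDN customer we really want

try:
    resp = requests.get(FRONT, headers={"Host": HIDDEN}, timeout=10)
    print(resp.status_code)
except requests.RequestException as exc:
    print("placeholder domains, request failed as expected:", exc)
```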
The same basic technique is getting another revival with DNS over HTTPS and the encrypted SNI extension to TLS, which allow a standardized way to establish a connection to a service without giving the network any specific identifier for which service you want to reach. It's worth mentioning that probably the most active chat service in this obfuscation and circumvention space is Telegram, which has a lot of users in countries that are not fans of having lots of Telegram users. They have systems to bounce between IPs very quickly and change where their servers appear to be, and they've used techniques like sending messages over DNS tunnels to mitigate some of this censorship. From the provider's perspective, this is really about access to their user population. They're not thinking about your local network so much as: there are millions of users who should probably still have access to us.

So we can hide the characteristics of traffic in terms of which specific service we're connecting to. But there are other things about traffic that are revealing to the network, and this is the additional metadata we need to think about. One is padding: the size of messages can be revealing. An immediate example is that a text message is a very different size from an image or voice or video, and you see this on airplanes or in other bandwidth-limited settings: they might let text messages through while images won't load. There's been research showing, for instance, that even if I encrypt my voice, we've gotten so good at compressing human speech that different phonemes take up different sizes; so I can say something, compress it, encrypt it, and an observer can recover what was said from the relative sizes of the sounds. There was a paper at the 2011 IEEE Symposium on Security and Privacy (Oakland) demonstrating this potential for attacks. What this tells us is that there's a tradeoff between how efficiently I send things and how much distinguishing metadata I give up. I can use a less efficient, constant-bit-rate compression that doesn't reveal this information, but it has higher overhead and won't work as well in constrained network environments.

The other place this shows up is simply when people are active. If I can look at when someone tweets or when messages are sent, I can probably figure out pretty quickly what timezone they're in. This leads to a whole set of metadata-based attacks, in particular confirmation attacks and intersection attacks. An intersection attack looks at the relative activity of multiple people: when Alice sent a message, who else was online or active at the same time? And over time, can I narrow down the specific people Alice was likely talking to?

Pond is also a system to look at in this regard. Its approach was that a client would hopefully always be online and would check in with the server on a regular schedule with the same amount of data, regardless of whether there was a real message to send, so that from the network's perspective, every user looked the same. The downside is that you've now got a message being sent by every client every minute or so, which creates a huge amount of overhead: padded data that doesn't mean anything.
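A minimal sketch of that Pond-style schedule, with illustrative names and sizes: every tick sends exactly one fixed-size blob, a padded real message if one is queued, otherwise random filler. In a real system the padded message is also encrypted, so both cases are indistinguishable on the wire.

```python
# Sketch of constant-rate cover traffic: one fixed-size blob per tick,
# whether or not the user said anything. Sizes and names are illustrative.
import os
import queue

BLOB_SIZE = 4096
outbox: queue.Queue = queue.Queue()

def pad(msg: bytes) -> bytes:
    # Length-prefix the message so the receiver can strip the padding.
    assert len(msg) <= BLOB_SIZE - 2
    return len(msg).to_bytes(2, "big") + msg + os.urandom(BLOB_SIZE - 2 - len(msg))

def tick(send) -> None:
    try:
        blob = pad(outbox.get_nowait())  # real message, padded to BLOB_SIZE
    except queue.Empty:
        blob = os.urandom(BLOB_SIZE)     # dummy: same size, same schedule
    send(blob)  # in a real system this blob is also encrypted

# Driven by a timer, e.g. once a minute: the observer sees an identical
# 4 KiB upload each tick regardless of user activity.
```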
So finally, let's look at server hardening, the things we're doing to reduce trust in the server. There are a few examples of why we'd want to do this. One is that messaging servers have, plenty of times, not been as secure as they claim. One example: there was a period when the Skype subsidiary in China was using a keyword blacklist on the server to either prevent or intercept some subset of their users' messages, without telling anyone they were doing that. And then there's the uncertain future: I may trust the service with my data now, but what can we do so that I don't have to worry about what its corporate future entails for my data? One of the elephants in the room is that software development is probably pretty centralized: even if I don't trust the server, some pretty small number of developers is writing the code, and how do I trust that the updates they push, to the server or to my client, aren't reducing my security? Open source is a great start at mitigating that, but it's certainly not solving all of it.

One way to think about reducing trust in the server is to look at what the server still knows after end-to-end encryption: the size of the message, where it's coming from, and where it's going. For size, we've talked about padding as a mitigation. So how do we reduce the information about sources and destinations, the communication graph that the server knows? This is the concept of linkability: being able to link the source and destination of a message. We're starting to see mitigations that reduce linkability entering mainstream systems. Signal has a feature called "sealed sender" that you can enable, where the source of the message goes inside the encrypted envelope so that Signal doesn't see it. The caveat is that Signal still sees your IP address, but the thought is that they throw that out relatively quickly and so keep fewer logs linking source to destination.

There is, though, a bunch of more theoretical work on this. The first set of systems I'll point to are mixnets. A mixnet works by having a set of providers, rather than a single entity, running the servers. A bunch of users send messages to the first provider, which shuffles them all and sends them to the next provider, which shuffles them again and sends them to a final provider, which shuffles them and then delivers them to their destinations. This de-links the traffic: no individual provider knows both the source and destination of a message. It looks a bit like Tor's onion routing, but differs in a couple of technicalities. One is that typically you wait to collect some number of messages, rather than forwarding immediately for bandwidth and low latency. By doing that, you get a guarantee that a batch had at least n messages shuffled together, and you prevent there being some moment when only one user was using the system. So you get a stronger theoretical guarantee.
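To see the shape of that, here is a toy cascade mix in Python, using Fernet from the cryptography library as a stand-in for the real layered encryption; per-hop routing metadata is omitted. The sender wraps each message in one layer per mix; each mix strips its layer, shuffles a whole batch, and forwards it, so no single operator sees both ends.

```python
# Toy cascade mixnet: onion-wrapped messages, batch shuffling at each hop.
# Fernet stands in for the real cryptography; routing info is omitted.
import random
from cryptography.fernet import Fernet

mixes = [Fernet(Fernet.generate_key()) for _ in range(3)]

def wrap(message: bytes) -> bytes:
    # Innermost layer belongs to the last mix, so mixes[0] peels first.
    for f in reversed(mixes):
        message = f.encrypt(message)
    return message

def mix(f: Fernet, batch):
    peeled = [f.decrypt(blob) for blob in batch]
    random.shuffle(peeled)  # the shuffle breaks input/output linkability
    return peeled

batch = [wrap(m) for m in (b"to:bob hi", b"to:carol yo", b"to:dave hey")]
for f in mixes:
    batch = mix(f, batch)
print(batch)  # delivered in an order no single mix can tie to a sender
```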
There's an active project building a messaging system on mixnets called Katzenpost. They gave a talk at Camp this summer, and I'd encourage you to look at their website or go back to that talk to learn more about mixnets.

The project that I was, I guess, tangentially helping with is in a space called private information retrieval (PIR), which is another technique for this de-linking. Private information retrieval frames the question a little differently. It asks: if a server has a database of messages, can a client retrieve one of those messages without the server knowing which message the client got or asked for? This sounds hard, but I can give you a straw man to convince yourself it's doable: I can ask the server for its entire database and then take the message that I want, and the server has learned nothing about which message I cared about. I've just spent an enormous amount of network bandwidth doing it.

There are a couple of constructions that do better; I'm going to focus on information-theoretic private information retrieval. We use a similar setup to the threat model for a mixnet: we've got a set of providers that hold the same database, and I'm going to assume they're not all talking to each other or colluding, so I just need at least one of them to be honest. One of the tools we'll use is the exclusive-or (XOR) operation. To refresh your memory, XOR is a binary bitwise operation with the nice property that anything XORed with itself cancels out: if I have some piece of data and XOR it against itself, it just goes away. So, with several servers holding the database, I ask each one to give me the XOR of some random subset of its items: say, give me items 11, 14, and 20 XORed together. (I'm assuming all the items are the same size so the XORs line up.) Each request, seen on its own, looks like a random subset. But I can structure the subsets so that when I XOR the replies together, everything cancels out except the one item I care about. Unless you saw all of the requests I made, you couldn't tell which item that was. By doing this, I've reduced the network bandwidth: I'm only getting one item's worth of data back from each server.

Now, you might worry that I'm asking each server to do a whole lot of work: it has to go through its entire database and compute this big XOR, and that seems expensive. The thing I find exciting about this space is that this operation, streaming over a large database and coming back with a small amount of data, looks a lot like the workloads we're building hardware for in AI and similar search-like problems. It runs really well on a GPU, where thousands of cores each compute small parts of the XOR and then return a relatively small result. With GPUs, you can have databases of gigabytes, even tens of gigabytes, and compute these XORs across all of it on the order of a millisecond or less.
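Here is a toy two-server version of that construction in Python. Each server sees only a uniformly random subset request; XORing the two equal-size replies cancels everything except the wanted item. Real systems in this space layer much more on top; this is just the core trick.

```python
# Toy information-theoretic PIR with two non-colluding servers holding
# the same database of equal-size items.
import secrets

ITEM_SIZE, N_ITEMS = 8, 16
db = [secrets.token_bytes(ITEM_SIZE) for _ in range(N_ITEMS)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_reply(database, subset):
    # XOR together the requested items; work is linear in the database.
    out = bytes(ITEM_SIZE)
    for idx in subset:
        out = xor(out, database[idx])
    return out

# Client wants item i: a random subset goes to server 1, and the same
# subset with i toggled goes to server 2. Each alone looks random.
i = 11
req1 = {j for j in range(N_ITEMS) if secrets.randbelow(2)}
req2 = req1 ^ {i}  # symmetric difference: differs only at position i

# Shared items cancel when the replies are XORed; only item i survives.
assert xor(server_reply(db, req1), server_reply(db, req2)) == db[i]
```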
"Talek" is the system that I helped with that demonstrates this working. The converse problem is called private information storage. And that one is how do I write an item into a database without the database knowing which item I wrote, the mathematical construction there is not quite as simple to explain. But there's a pretty cool new work in the last month or two out of Dan Boneh and Henry Corrigan- Gibbs at Stanford called Express and Saba as first author that is showing how to fairly practically perform that operation. I'll finish just with a couple minutes on multiparty chat or group chat, so small groups. You've sort of got a choice here in terms of how assisted chat systems are implementing group chat. One is you can not tell the server about the group. And as someone who is part of the group, I just send the same message to everyone in the group. And maybe I can tag it for them so that they know it's part of the group or you do something more efficient where you tell the server about group membership and I send the message once to the server and it sends it to everyone in the group. Even if you don't tell the server about it, though, you've got a bunch of things to worry about leaked correlation, which is: if at a single time someone sends the same sized message to five other people and then later someone else sends the same sized message to five other people, and those basically overlap, someone in the network basically knows who the group membership is. So it's actually quite difficult to conceal group membership. The other thing that breaks down is our concept of deniability once again, which is now if multiple people have this log. Even if both of them individually could have written it, the fact that they have the same cryptographic keys from this other third party probably means that third party made that message. So there continues to be work here. Signal is working on providing again and SGX and centralized construction for grid management to be able to scale better, given I think the pretty realistic fact that the server in these cases is probably going to be able to figure out group membership in some case, you might as well make it scale. On the other side, one of the cool systems that's being prototyped is called "cwtch" out of open privacy. And this is an extension to ricochet that allows for offline messages and small group chats. It works for order of 5 to 20 people, and it works by having a server that obliviously forwards on messages to everyone connected to it. So when I send a message to a group, the server sends the message to everyone it knows about, not just the people in the group, and therefore the server doesn't actually know the subgroups that exist. It just knows who's connected to it. And that's a neat way. It doesn't necessarily scale to large groups, but it allows for some concealing of group membership. They've got an Android prototype as well that's sort of a nice extension to make this usable. Wonderful. I guess the final thought here is: there's a lot of systems, I'm sure I haven't mentioned all of them. But this community is really closely tied to the innovations that are happening in the space of private chat. And this is the infrastructure that supports communities and is some of the most meaningful stuff you can possibly work on. 
Wonderful. I guess the final thought here is: there are a lot of systems, and I'm sure I haven't mentioned all of them. But this community is really closely tied to the innovations happening in the space of private chat. This is the infrastructure that supports communities, and it's some of the most meaningful stuff you can possibly work on. I encourage you to find new systems, look at a bunch of them, think about the tradeoffs, and encourage friends to play with them, because that's how they gain adoption and how people figure out which mechanisms do and don't work. So with that, I will take questions.

[Applause]

Herald: It wasn't necessary to encourage the applause. There are numbered microphones in the room; if you start lining up behind them, we can take your questions. We already have a question from the Internet.

Question: Popularity and independence are a contradiction. How can I be sure that an increasingly popular messenger like Signal stays independent?

Answer: I guess I would question whether independence is a goal in and of itself. It's true that the value of the system increases as it grows. One of the things I think about is using systems that have open protocols, or that are federated or otherwise not centralized; again, this reduces the need to have confidence in the future business model of a single legal entity. But I don't know that independence of the company is the thing you're trying to trade off against popularity.

Herald: And we have questions at the microphones. We'll start at microphone number one.

Question: Thanks for the talk. You talked a lot about content and encryption. What about the more basic problem? History shows that if I'm an individual already under observation in a sensitive area, there may be no need to decrypt my messages at all: the fact that I'm sending at a specific location at a specific time already identifies me. Is there any chance to hide that, or do something against it?

Answer: So, make things hidden again after the fact? That seems very hard. There are a couple of thoughts there. There's a real-world intersection attack: if there's an observable action, like who actually shows up at the protest, that's a pretty good way to figure out who was chatting about the protest beforehand. What we've seen in real-world organizing is things like really decentralizing that activity, where it happens across a lot of platforms and very spontaneously, close to the event, so there's not enough time to respond in advance; or hiding your presence, or staggering your actual actions so they're harder to correlate to a specific group. But it's not something the chat systems themselves are addressing, I don't think.

Herald: We have time for more questions, so please line up at the microphones, and if you're leaving, please leave quietly. We have a question from microphone number 4.

Question: If network address translation is the original sin against the end-to-end principle, and because of it we now have to run servers, then someone has to pay for them. Do you know any solution to that economic problem?

Answer: I mean, we had to pay for things even without network address translation, but we could move more of that cost to end users. We have another opportunity with IPv6 to keep more of the cost with end users, or to develop protocols that are more decentralized, where that cost stays more fairly distributed. Our phones have a huge amount of computational power, and figuring out how to design our protocols so that the work happens there is, I think, an ongoing balance.
I think some of the reasons why network address translation, or centralization, is so common is that distributed systems are pretty hard to build and pretty hard to gain confidence in. More tools for testing and understanding whether a distributed system is actually going to work 99.9% of the time would make people less wary of working with them. So better tooling for distributed systems is maybe the best answer.

Herald: We also have another question from the Internet, which we'll take now.

Question: What do you think about technical novices' acceptance of, and dealings with, OTR keys, for example in Matrix/Riot? Most people I know just click "I verified this key" even if they didn't.

Answer: Absolutely. This goes back to a lot of these problems being a user-experience tradeoff. We saw initial versions of Signal where you would actually try to regularly verify a QR code between contacts, and that has been pushed back to a harder-to-reach part of the user interface because not many people wanted to deal with it. In early Matrix/Riot, you would get a lot of warnings: there's a new device, do you want to verify this new device, do you only want to send to the previously trusted devices? Now you're getting the ability to more automatically accept these changes. You're weakening some amount of the encryption security, but you're getting a smoother user interface, because most users are just going to click "yes"; they want to send their message. So there's this tradeoff: when you've built the protocol such that you're standing in the way of people doing what they want to do, that's not really where you want to put the friction. Figuring out ways to have verification on the side, supporting the communication rather than hindering it, is probably the kind of user interface we should be thinking about, the kind that can be successful.

Herald: We have a couple more questions. We'll start at microphone number 3.

Question: Thank you for your talk. You talked about deniability by revealing the previous private key with the next message. But how do you get the private key for the last message of the whole conversation?

Answer: In the OTR, XMPP, Jabber systems, there would be an explicit action to end the conversation, which would send that final message to close it and make the transcript repudiable. In things like Signal, it's actually happening with every message, as part of the confirmation of the message.

Question: OK. Thank you.

Herald: We probably still have time for more questions, so please line up if you have any; don't hold back. We have a question from microphone number 7.

Question: First of all, a brief comment: the Riot thing still doesn't even do TOFU; they haven't figured that out. But I think there's a much more subtle conversation that needs to happen around deniability, because most of the time, if you have people with a power imbalance, the non-repudiable conversation actually benefits the weaker person. So we actually don't want deniability in most of our chat applications, except that it's still more subtle than that, because when you have people with equal power, maybe you do. It's kind of weird.

Answer: Absolutely. And I guess the other part of that is whether that's something that should be shown to users at all:
is there a way to express that notion so that users can understand it and make good choices, or is it just something your system decides on behalf of all of its users?

Herald: We have one more question at microphone number seven. Please line up if you have any more; we still have a couple of minutes. Microphone number seven, please.

Question: Hi, thanks for the talk. You talked about private information retrieval and how it stops the server from knowing who retrieved a message. But for me the question is: how do I find out in the first place which message is for me? Because if we, for example, always use message slot 14, then over a conversation it would again be possible to deanonymize the users: they're always accessing that one slot in all those queries.

Answer: Absolutely, I didn't explain that part. The trick is that the two people share a secret, their conversation secret, and they use it to seed a pseudorandom number generator. So both can generate the same stream of random numbers, and each next message goes at the location determined by the next item in that stream. The person writing writes to what look like random places, as far as the server can tell, and when they want to write the next message in the conversation, they make sure to write to the next place in that conversation's random number generator. There's a paper that describes a bunch more of that system, but that's the basic sketch.

Question: Thank you.

Herald: We have a question from the Internet.

Question: It seems like identity is the weak point of the new breed of messaging apps. How do we solve this part of Zooko's triangle, the need for identifiers and to find people?

Answer: Identity is hard; I think identity has always been hard and will continue to be hard. Having a variety of ways to be identified remains important, and it's why there isn't a single winner-takes-all system that we all use for chat: instead, you have a lot of different chat protocols that you use in the different social circles you find yourself in. Part of that is our desire not to be confined to a single identity, but to be able to have different facets to our personalities. There are systems where you identify yourself with a unique identifier to each person you talk to, rather than having a single identity within the system. That's something else Pond did: the identifier you gave out to each separate friend was different, so you appeared as a totally separate user to each of them. It turns out that's at the same time very difficult, because if I post an identifier publicly, that identifier is now linked to me for everyone who uses it. So you have to give these out privately, in a one-on-one setting, which limits your discoverability. That concept of how we deal with identities is, I think, inherently messy, and there's not going to be something satisfying that solves it.

Herald: And that was the final question, concluding this talk. Please give a big round of applause for Will Scott.

Will: Thank you.

[Applause]

[Postroll music]

Subtitles created by c3subtitles.de in the year 2019. Join, and help us!