WEBVTT 00:00:00.000 --> 00:00:18.702 36C3 preroll music 00:00:18.702 --> 00:00:23.372 Herald: Please put your hands together and give a warm round of applause to Will Scott. 00:00:23.373 --> 00:00:25.045 Applause 00:00:25.045 --> 00:00:25.635 Will Scott: Thank you. 00:00:25.635 --> 00:00:31.510 Applause 00:00:31.510 --> 00:00:42.960 Will: All right. Welcome. So. The basic structure of this talk is sort of twofold. 00:00:42.960 --> 00:00:50.979 The first thing is to provide an overview of the different mechanisms that exist in 00:00:50.979 --> 00:00:58.500 this space of secure communication and try to tease apart a bunch of the individual 00:00:58.500 --> 00:01:03.380 choices and tradeoffs that have to be made and the implications of them. Because a 00:01:03.380 --> 00:01:07.680 lot of times we talk about security or privacy as very broad terms that cover a 00:01:07.680 --> 00:01:12.939 bunch of individual things. And breaking that down gives us a better way to 00:01:12.939 --> 00:01:18.670 understand what it is we're giving up or whether or why these decisions actually 00:01:18.670 --> 00:01:24.079 get made for the systems that we end up using. And the way that it's going to sort 00:01:24.079 --> 00:01:29.509 of the arc that I'll cover is first trying to provide a sort of taxonomy or 00:01:29.509 --> 00:01:34.800 classification of a bunch of the different systems that we see around us. And from 00:01:34.800 --> 00:01:39.810 there identify the threats that we often are trying to protect against and the 00:01:39.810 --> 00:01:44.189 mechanisms that we have to mitigate those threats and then go into some of these 00:01:44.189 --> 00:01:48.599 mechanisms and look at what's happening right now on different systems. 
And by the 00:01:48.599 --> 00:01:53.420 end, we'll sort of be closer to the research frontier of what is still 00:01:53.420 --> 00:01:59.509 happening, where are places where we have new ideas, but there's still quite a high 00:01:59.509 --> 00:02:06.159 tradeoff to usability or for other reasons where these haven't gained mass adoption. 00:02:06.159 --> 00:02:11.330 So I'll introduce our actors: Alice and Bob. The basic structure for pretty much 00:02:11.330 --> 00:02:17.651 all of this is one to one messaging. So this is primarily systems that are 00:02:17.651 --> 00:02:21.220 enabling us to have a conversation that looks a lot like what we would have in 00:02:21.220 --> 00:02:25.720 person. That's sort of the thing that we're modelling is I want to have a 00:02:25.720 --> 00:02:30.130 somewhat synchronous real time communication over a span of weeks, 00:02:30.130 --> 00:02:35.240 months, years, resume it, and in the same way that in real life I know someone and I 00:02:35.240 --> 00:02:38.500 recognize them when I come and talk to them again I expect the system to give me 00:02:38.500 --> 00:02:41.810 similar sorts of properties. 00:02:41.810 --> 00:02:44.860 So the way we're going to then think about systems is 00:02:44.860 --> 00:02:51.920 initially, we have systems that look very much the same as how we would have a real 00:02:51.920 --> 00:02:59.630 life communication, where I can - on a local network - use AirDrop or use a bunch 00:02:59.630 --> 00:03:04.310 of things that just work directly between my device and a friend's device 00:03:04.310 --> 00:03:06.650 to communicate. 00:03:06.650 --> 00:03:09.870 On a computer, this might look like using Netcat or a command line 00:03:09.870 --> 00:03:14.790 tool to just push data directly to the other person. And this actually results in 00:03:14.790 --> 00:03:18.060 a form of communication that looks very similar. 
Right, it's ephemeral, it goes 00:03:18.060 --> 00:03:24.450 away afterwards unless the other person saves it. But there is already a set of 00:03:24.450 --> 00:03:27.260 adversaries or threats that we can think about how do we secure this sort of 00:03:27.260 --> 00:03:30.240 communication? 00:03:30.240 --> 00:03:34.590 One of those would be the network. So, can someone else see this 00:03:34.590 --> 00:03:39.320 communication and how do we hide from that? And we have mechanisms against that, 00:03:39.320 --> 00:03:43.910 namely encryption. Right, I can disguise my communication and encrypt it 00:03:43.910 --> 00:03:49.600 so that someone who is not my intended recipient cannot see what's happening. 00:03:49.600 --> 00:03:53.630 And then the other one would be the other...these end devices themselves. 00:03:53.630 --> 00:03:57.570 Right, so there's a couple of things that we need to think about when we think about 00:03:57.570 --> 00:04:00.330 what is it that we're trying to protect against on an end device. One is there 00:04:00.330 --> 00:04:05.682 might be other bad software that either, later gets installed and tries 00:04:05.682 --> 00:04:09.980 to steal or learn about what was said. 00:04:09.980 --> 00:04:12.240 Either, either at the same time or afterwards. 00:04:12.240 --> 00:04:15.460 And so we have mechanisms there. One of them would be message 00:04:15.460 --> 00:04:19.690 expiry. So we can make the messages go away, make sure we delete them from disk 00:04:19.690 --> 00:04:25.460 at some point. And the other would be making sure that we've sort of isolated 00:04:25.460 --> 00:04:28.810 our chats so that it doesn't overlap and other applications can't see what's 00:04:28.810 --> 00:04:32.440 happening there. 00:04:32.440 --> 00:04:36.470 So, we have these direct communication patterns but that's a small 00:04:36.470 --> 00:04:42.801 minority of most of what we think of when we chat. 
Instead, most of the systems that 00:04:42.801 --> 00:04:48.110 we're using online use a centralized server. There's some logically centralized 00:04:48.110 --> 00:04:52.940 thing in the cloud and I send my messages there and it then forwards them to my 00:04:52.940 --> 00:04:58.880 intended recipient. And so whether it's Facebook or WhatsApp or Signal or sorry, 00:04:58.880 --> 00:05:05.110 Slack or IRC or Signal or Wire or Threema or whatever, you know, cloud chat app 00:05:05.110 --> 00:05:12.740 we're using today, this same model applies. So we can identify additional 00:05:12.740 --> 00:05:19.810 threats here and then we can think about why we do this. So one threat is the 00:05:19.810 --> 00:05:24.150 network. And I'll tear that apart a little bit. You've got the local network that we 00:05:24.150 --> 00:05:28.850 had before. So someone who's on the network near the person who's sending 00:05:28.850 --> 00:05:33.280 messages or receiving messages, so someone else in the coffee shop, your local 00:05:33.280 --> 00:05:38.910 organization, your school, your work, you've got the Internet as a whole that 00:05:38.910 --> 00:05:44.770 messages are passing over. So the ISPs or the countries that you're in may want to 00:05:44.770 --> 00:05:48.840 look at or prevent you from sending messages. You've also got an adversary in 00:05:48.840 --> 00:05:54.040 the network, sort of local or near the server that can see most of the messages 00:05:54.040 --> 00:05:57.680 going in and out of the server because these services have to exist somewhere be 00:05:57.680 --> 00:06:04.200 that in a data center that they physically have computers in or in AWS or Google or 00:06:04.200 --> 00:06:08.530 one of these other clouds. And now you've got a set of actors that you need to think 00:06:08.530 --> 00:06:11.530 about that are near the server that can see most of the traffic going in and out 00:06:11.530 --> 00:06:13.526 of that server. 
00:06:14.600 --> 00:06:17.610 We also have to think about the server itself as a potential 00:06:17.610 --> 00:06:21.870 adversary. There's a few different threats that we need to think about. The server 00:06:21.870 --> 00:06:27.060 could get threatened... could get hacked or otherwise compromised. So parts of 00:06:27.060 --> 00:06:32.450 the communication or bugs in the software can potentially be a problem. 00:06:32.450 --> 00:06:34.060 You've got a 00:06:34.060 --> 00:06:39.370 legal entity typically that is running this server. And so the jurisdiction 00:06:39.370 --> 00:06:44.180 that it's in can send requests to get data from users or to compel it to provide 00:06:44.180 --> 00:06:49.050 information. So there's this whole threat of what is the server required to turn 00:06:49.050 --> 00:06:55.210 over. And then you've got sort of how is the server actually or this company making 00:06:55.210 --> 00:06:58.690 money and sustaining itself. Is it going to get acquired by someone that you don't 00:06:58.690 --> 00:07:02.750 trust, even if you trust it now? So there's this future view of how do we 00:07:02.750 --> 00:07:08.809 ensure that the messages I have now don't get misused in the future? 00:07:08.809 --> 00:07:10.339 And we have a 00:07:10.339 --> 00:07:14.370 set of techniques that mitigate these problems as well. So one of them would 00:07:14.370 --> 00:07:18.980 be we can use traffic obfuscation or circumvention techniques to make our 00:07:18.980 --> 00:07:25.919 traffic look less obvious to the network. And that prevents a large amount of these. 00:07:25.919 --> 00:07:29.210 And then, I'm calling this server hardening but it's really a sort of a broad set of 00:07:29.210 --> 00:07:34.130 techniques around how do we trust the server less? And how do we make those 00:07:34.130 --> 00:07:38.930 potential compromises of the server, either code based or it having to reveal 00:07:38.930 --> 00:07:42.940 information less damaging? 
00:07:44.080 --> 00:07:46.680 It's worth saying that there are a bunch of reasons 00:07:46.680 --> 00:07:50.760 why we have primarily used centralized messaging. 00:07:50.760 --> 00:07:53.290 You've got availability. It's 00:07:53.290 --> 00:07:58.260 very easy to go to a single place and it also makes a bunch of problems like 00:07:58.260 --> 00:08:04.040 handling multiple devices and mobile push in particular, because both Google and 00:08:04.040 --> 00:08:09.870 Apple expect or allocate sort of a single authorized provider who can send 00:08:09.870 --> 00:08:16.270 notifications to the app user's mobile devices. And so that sort of requires you 00:08:16.270 --> 00:08:20.040 to have a centralized place that knows when to send those messages if you want to 00:08:20.040 --> 00:08:24.440 provide real time alerts to your application users. 00:08:24.440 --> 00:08:25.830 The cons is that it is 00:08:25.830 --> 00:08:32.130 both cost, there's some entity now that is responsible for all of this cost 00:08:32.130 --> 00:08:35.769 and has to have a business model and also that there is a single entity that people 00:08:35.769 --> 00:08:41.170 can come to and that now faces the legal and regulatory issues. 00:08:41.170 --> 00:08:42.020 So this is not the 00:08:42.020 --> 00:08:46.610 only type of system we have, right? The next most common is probably federated. 00:08:46.610 --> 00:08:52.810 E-mail is a great example of this. An email is nice that now as a user I can 00:08:52.810 --> 00:08:58.240 choose an email provider that I trust out of many, or if I don't trust any of the 00:08:58.240 --> 00:09:02.830 ones that I see, I can even spin up my own with a small group so we can decentralize 00:09:02.830 --> 00:09:09.550 cost. We can make this more approachable. 
And so while I can gain more confidence in 00:09:09.550 --> 00:09:16.040 my individual provider, I don't have as much trust in, you know, is the recipient, 00:09:16.040 --> 00:09:21.520 is Bob in this case, I don't know how secure his connection is to his provider. 00:09:21.520 --> 00:09:26.220 Because we've separated and decentralized that. 00:09:26.220 --> 00:09:27.890 There's also a bunch of problems, 00:09:27.890 --> 00:09:35.160 both in figuring out identity and discovery securely and mobile push. But we 00:09:35.160 --> 00:09:38.530 have a number of successful examples of this. So beyond email, the Fediverse and 00:09:38.530 --> 00:09:43.810 Mastodon, Riot chat and even SMS are examples of federated systems where 00:09:43.810 --> 00:09:50.570 there's a bunch of providers and it's not a single central place. 00:09:50.570 --> 00:09:52.990 As you continue 00:09:52.990 --> 00:09:57.870 this sort of metaphor of splitting apart and decentralizing and reducing the trust 00:09:57.870 --> 00:10:01.510 in a single party, you end up with a set of decentralized messaging systems as 00:10:01.510 --> 00:10:07.420 well. And so it's worth mentioning that as we sort of get onto this fringe. There's 00:10:07.420 --> 00:10:11.430 sort of two types: One is using Gossip protocols. So things like Secure 00:10:11.430 --> 00:10:15.740 Scuttlebutt. And in those you connect to either the people around you or people 00:10:15.740 --> 00:10:20.210 that you know. And when you get messages, you gossip, you send them on to all of 00:10:20.210 --> 00:10:26.550 the people around you. And so messages spread through the network. That is still 00:10:26.550 --> 00:10:33.050 an area where we are learning the tradeoff of how much metadata gets leaked and 00:10:33.050 --> 00:10:41.450 things, but is nice in its level of decentralization. 
The others basically 00:10:41.450 --> 00:10:47.610 tried to make all of the users have some relatively low trusted participation in 00:10:47.610 --> 00:10:52.390 the serving infrastructure. And so you can think of this as evolving out of things 00:10:52.390 --> 00:10:57.610 like distributed hash tables that are used in BitTorrent. You see something very 00:10:57.610 --> 00:11:05.760 similar in things like Ricochet or tox.chat, which will use either Tor-like 00:11:05.760 --> 00:11:10.740 relays for sending messages or have an explicit DHT for routing where all of the 00:11:10.740 --> 00:11:15.480 members provide some amount of lookup to help with discovery 00:11:15.480 --> 00:11:18.424 and finding other participants. 00:11:19.810 --> 00:11:24.870 OK, so let's now turn to some of these mechanisms that we've 00:11:24.870 --> 00:11:31.160 uncovered and we can start with encryption. So when you're sending 00:11:31.160 --> 00:11:37.030 messages to a server by default, there's no encryption. This is things like IRC. 00:11:37.030 --> 00:11:43.300 Email used to be primarily unencrypted and you can think of that like a postcard. So 00:11:43.300 --> 00:11:47.560 you've got a letter or a postcard in this case that you're sending. It has where 00:11:47.560 --> 00:11:53.060 that message is coming from, where it's going to and the contents. In contrast, 00:11:53.060 --> 00:11:58.059 when you use transport encryption -- and so this is now a standard for most of the 00:11:58.059 --> 00:12:01.080 centralized things. What that means is you're taking that postcard and you're 00:12:01.080 --> 00:12:06.200 putting it in an envelope that the network can't open. And that's what TLS and other 00:12:06.200 --> 00:12:11.709 forms of transport encryption are going to give you, is the network link just sees 00:12:11.709 --> 00:12:15.790 the source and destination. 
It sees there's a message coming between Alice and 00:12:15.790 --> 00:12:19.779 Facebook or whatever cloud provider, but can't look into that and see that that's 00:12:19.779 --> 00:12:23.950 really a message for Bob or what's being said. It just sees individuals 00:12:23.950 --> 00:12:30.570 communicating with that cloud provider. And so, you know, SMTPS, there are secure 00:12:30.570 --> 00:12:35.570 versions of IRC and e-mail and most other protocols are using transport security at 00:12:35.570 --> 00:12:41.730 this point. The thing that we have now is called end-to-end encryption or E2E, and so 00:12:41.730 --> 00:12:48.880 now the difference here is the message that Alice is sending is addressed to Bob. 00:12:48.880 --> 00:12:53.640 And it's encrypted so that the provider Facebook can't open that either and can't 00:12:53.640 --> 00:13:00.370 look at the contents. OK? So the network just sees a message going between Alice 00:13:00.370 --> 00:13:04.270 and Facebook still, but Facebook can't open that and actually see the contents of 00:13:04.270 --> 00:13:11.690 the message. And so end-to-end encryption has gained pretty widespread adoption. We 00:13:11.690 --> 00:13:16.330 have this in Signal, for the most part in iMessage, we have tools like PGP and GPG 00:13:16.330 --> 00:13:21.040 that are implementing forms of this. For messaging there's a few that are worth 00:13:21.040 --> 00:13:26.350 sort of covering in the space: the Signal protocol, which was initially called 00:13:26.350 --> 00:13:34.420 axolotl, is adopted in WhatsApp, in Facebook private messaging and sort of 00:13:34.420 --> 00:13:43.170 is... I guess it has generalized into something called the noise framework and 00:13:43.170 --> 00:13:50.161 is gaining a lot of adoption. OMEMO looks a lot like that specifically for XMPP, and 00:13:50.161 --> 00:13:56.310 so it is a specific implementation. 
The other one is called Off-The-Record or OTR 00:13:56.310 --> 00:14:03.519 and Off-The-Record developed somewhat independently from this and 00:14:03.519 --> 00:14:10.480 thinks a lot about deniability. I'm not going to go too deep into the specific 00:14:10.480 --> 00:14:14.649 nits of what these protocols are doing, but I guess the intuition is the hard 00:14:14.649 --> 00:14:20.240 part here is not encrypting a message, but rather the hard part is how do you 00:14:20.240 --> 00:14:24.170 send that first message and establish a session, especially if the other person is 00:14:24.170 --> 00:14:28.160 offline. So I want to start a communication. I type in the first message 00:14:28.160 --> 00:14:32.410 I'm sending to someone. I need to somehow get a key and then send a message that 00:14:32.410 --> 00:14:37.531 only that person can read and also establish this sort of shared secret. And 00:14:37.531 --> 00:14:41.529 doing all of that in one message or with the other device not online ends up being 00:14:42.055 --> 00:14:48.374 tricky. Additionally, figuring out the mapping between a user and their devices, 00:14:48.374 --> 00:14:53.437 especially as that changes and making sure you've appropriately revoked devices, 00:14:53.437 --> 00:14:59.210 added new devices without keys falling over or showing 00:14:59.210 --> 00:15:04.609 too many warnings to the user ends up being a lot of the trick in these 00:15:05.207 --> 00:15:15.463 systems. There's two problems that sort of come into play when we start using end-to-end encryption. 00:15:15.463 --> 00:15:20.049 One is we need to think about connection establishment. So this is the problem 00:15:20.049 --> 00:15:27.140 of saying who is Bob? So I find a contact and I know them in some way by an 00:15:27.140 --> 00:15:33.880 email address, by a phone number. Signal uses phone numbers. You know, a lot of 00:15:33.880 --> 00:15:38.470 systems maybe use an email address. 
There's things like Threema that use a 00:15:38.470 --> 00:15:42.280 unique identifier that they generate for you. But somehow I have to go from that 00:15:42.319 --> 00:15:47.727 identifier to some actual key or some knowledge of a cryptographic secret 00:15:47.727 --> 00:15:51.480 that identifies the other person. And I have to figure out who I trust to do that 00:15:51.480 --> 00:15:59.024 mapping, of gaining this thing that I'm now using for encryption. And then also 00:15:59.080 --> 00:16:04.486 there's this "Well, how do we match?" So a lot of systems do this by uploading your 00:16:04.486 --> 00:16:10.420 address book or trying to match with existing contacts to solve the user- 00:16:10.455 --> 00:16:16.150 interface problem of discovery, which is: If they already know the identifiers 00:16:16.275 --> 00:16:20.179 and have this mapping, then when someone new comes in they can suggest and have 00:16:20.256 --> 00:16:24.760 "prefound" these keys and you just sort of trust the server to hold this address book 00:16:24.972 --> 00:16:28.500 and to do this mapping between what they're using as their identifier and 00:16:28.709 --> 00:16:33.860 the keys themselves that you're getting out. Signal is nice here, it says it's not 00:16:34.114 --> 00:16:38.850 uploading your contacts, which is true. They're uploading hashes of the phone 00:16:38.850 --> 00:16:43.430 numbers rather than the actual phone numbers. But it's a similar thing. 00:16:43.430 --> 00:16:48.910 They've got a directory of known phone numbers. And then as people search, you'll 00:16:49.041 --> 00:16:54.680 search for a hash of the phone number and get back, you know, the key that you hope 00:16:54.680 --> 00:17:01.470 Signal has correctly given you. So there's sort of a couple of ways that you reduce 00:17:01.661 --> 00:17:09.850 your trust here. 
Signal has been going down a path using SGX to raise the cost of 00:17:09.951 --> 00:17:16.408 attacks, oblivious RAM and a bunch of sort of systems mechanisms to 00:17:16.408 --> 00:17:22.319 increase the cost of attack against their discovery mechanism. The 00:17:22.361 --> 00:17:27.089 other way that you do this is you allow for people to use pseudonyms or anonymous 00:17:27.089 --> 00:17:32.439 identifiers. So on Wire you can just register with an anonymous email address. And now the 00:17:32.439 --> 00:17:37.710 cost to you is potentially less if that gets compromised. And it's worth noting 00:17:37.790 --> 00:17:42.950 Moxie will be talking tomorrow at 4:00 p.m. about the evolution of the space 00:17:42.950 --> 00:17:50.114 around Signal, so there's probably a bunch more depth there that you can expect. So 00:17:50.180 --> 00:17:54.590 what if we don't want to trust the server to do matchmaking? One of the early things 00:17:54.590 --> 00:18:00.400 that has been around is the web of trust around GPG. And this is the notion that, 00:18:00.400 --> 00:18:09.015 if I have, in real life or otherwise, associated an identifier with a key, I can 00:18:09.015 --> 00:18:15.770 publicly provide a signed statement saying that I trust that mapping and then people 00:18:15.830 --> 00:18:21.910 who don't know someone but have a link socially maybe can find these proofs and 00:18:21.910 --> 00:18:27.460 use that to trust this mapping. So I know an identifier and I know that I trust 00:18:27.460 --> 00:18:32.429 someone who has said, well, this is the key associated with that identifier and I 00:18:32.429 --> 00:18:37.360 can use that network to eventually find an identifier that I'm willing to trust 00:18:37.360 --> 00:18:44.226 or a key that I'm willing to encrypt to. There's some user interface tradeoff here. 00:18:44.226 --> 00:18:49.960 This is a manual process in general. 
And this year we've had a set of denial-of- 00:18:49.960 --> 00:18:56.070 service attacks on the web-of-trust infrastructure. And so the specific 00:18:56.070 --> 00:19:03.830 attack is that anyone can upload these attestations of trust, and so if a bunch 00:19:03.830 --> 00:19:08.140 of random users or Sybils start uploading trusts, when you go to try and download 00:19:08.140 --> 00:19:12.480 this, you end up overwhelmed by the amount of information. And so the system does not 00:19:12.480 --> 00:19:17.330 scale because it's very hard to filter to people you care about without telling the 00:19:17.330 --> 00:19:20.200 system who you care about and revealing your network, which you're trying to 00:19:20.200 --> 00:19:29.180 avoid. Keybase takes another approach. They made the observation that when I go 00:19:29.180 --> 00:19:34.630 to try and talk to someone, what I actually care about is the person that I 00:19:34.630 --> 00:19:40.520 believe owns a specific GitHub or Twitter or other social profile. And so I can 00:19:40.520 --> 00:19:44.870 provide an attestation where I say: "Well, this is a key that's associated with the 00:19:44.870 --> 00:19:50.540 account that controls this Twitter account or this Reddit account or this, you know, 00:19:50.540 --> 00:19:55.890 Facebook account." And so by having that trust of proofs, I can connect an 00:19:55.890 --> 00:20:00.350 individual and a cryptographic identity with the person behind who has the 00:20:00.350 --> 00:20:08.160 passwords to a set of other systems. Keybase also this year began to provide a 00:20:08.160 --> 00:20:13.357 monetary incentive for users and then struggled with the number of sign-ups. 
And 00:20:13.357 --> 00:20:17.150 so there's a lot of work in figuring out: "OK, do these identities actually 00:20:17.150 --> 00:20:21.920 correspond to real people and how do you prevent a denial-of-service-style 00:20:21.977 --> 00:20:30.910 attack similar to the one that the web of trust faced in identifying things here?" On our devices, 00:20:30.910 --> 00:20:37.655 we end up in general resorting to a concept called TOFU or Trust-On-First-Use, 00:20:37.655 --> 00:20:43.010 and what that means is when I first see a key that identifies someone, I'll save 00:20:43.010 --> 00:20:47.760 that. And if I ever need to communicate with that person again, I've 00:20:47.760 --> 00:20:50.850 already got a key and I can keep using that same key and expect that key to stay 00:20:50.850 --> 00:20:56.419 the same. And so that continuation and the ability to pin keys once you've 00:20:56.419 --> 00:21:00.790 seen them means that if, when you first establish a connection with someone, it's 00:21:00.790 --> 00:21:04.750 the real person, then someone who compromises them later can't take over or 00:21:04.750 --> 00:21:14.360 change that. Finally, one of the sort of exciting things that came out - this is 00:21:14.360 --> 00:21:21.049 circa 2015 and is largely defunct now - was a system by Adam Langley called Pond 00:21:21.049 --> 00:21:27.790 that looked at hardening a modern version of email. And one of the things that Pond 00:21:27.790 --> 00:21:33.470 did was it had something called a password authenticated key exchange. And so this is 00:21:33.470 --> 00:21:40.220 an evolving cryptographic area where you're saying if two people can start with 00:21:40.220 --> 00:21:48.140 some weak shared secret - so I can perhaps publicly or in plain text 00:21:48.140 --> 00:21:53.483 challenge the other person: "Where were we at a specific day?" And so now we both 00:21:53.483 --> 00:21:57.450 know something that maybe has a few bits of entropy, at least. 
If we can write the 00:21:57.915 --> 00:22:04.700 same textual answer, we can take that, run a key derivation function to end up with a 00:22:04.700 --> 00:22:09.620 larger amount of shared entropy and use that as a bootstrapping method to do a key 00:22:09.620 --> 00:22:13.120 exchange and end up finding a strong cryptographic identity for the other 00:22:13.120 --> 00:22:22.049 person. So Pond has a system that they call Panda for linking to individuals 00:22:22.049 --> 00:22:25.960 based on a challenge response and this is also something that you'll find in off- 00:22:25.960 --> 00:22:32.309 the-record systems around Jabber. The other thing that we need to be careful 00:22:32.309 --> 00:22:37.750 about in end-to-end-encrypted systems is deniability. When I'm chatting one on one 00:22:37.750 --> 00:22:46.049 with someone, that conversation is essentially fairly deniable. Either 00:22:46.049 --> 00:22:49.929 person can have their recollection of what happened and there's no proof that the 00:22:49.929 --> 00:22:55.220 other person said something unless you've recorded it or otherwise, you know, 00:22:55.220 --> 00:22:58.490 brought some other technology into play. But with an encrypted thing where I've 00:22:58.490 --> 00:23:02.790 authenticated the other person, I end up with a transcript - potentially - that, 00:23:02.790 --> 00:23:09.400 you know, I can turn over later and say, look, this person said this. And, you 00:23:09.400 --> 00:23:13.419 know, we've seen recently that things like emails that come out are authenticated in 00:23:13.419 --> 00:23:20.610 this way. The DKIM system that authenticates email senders showed up in 00:23:20.610 --> 00:23:26.554 the WikiLeaks releases of Hillary Clinton's emails and was able to say: 00:23:26.554 --> 00:23:30.099 "Look the text in these hasn't been changed." And it was signed by the real 00:23:30.099 --> 00:23:36.351 server that we would expect. 
So the thing that we get from Off-The-Record and the 00:23:36.351 --> 00:23:42.076 Signal protocol is something called deniability or repudiability. And this 00:23:42.076 --> 00:23:48.004 plays into a concept of forward secrecy, which is: We're going to sort of throw 00:23:48.004 --> 00:23:54.590 away stuff afterwards in a way that our chat goes back to being more ephemeral. 00:23:54.590 --> 00:23:57.620 And so we can think about this in two ways. There's actually two properties that 00:23:57.620 --> 00:24:03.470 interlink in this: We have keys that we're using to form our shared session that 00:24:03.470 --> 00:24:10.620 we're expecting to use to have our secret message. And each time I send a message, 00:24:10.814 --> 00:24:15.980 I'm going to also provide some new key material and begin changing that secret 00:24:16.152 --> 00:24:21.540 key that we're using. So I provide a next key. And when Bob replies, he's going to 00:24:21.540 --> 00:24:26.645 now use my next key as part of that and give me his next key. And the other thing 00:24:26.645 --> 00:24:31.295 that I can then do is when I send a message, I can provide the secret bit of 00:24:31.295 --> 00:24:35.220 my previous key. So I can say: "My last private key that I used to send you that 00:24:35.220 --> 00:24:41.190 previous message was this." And now at the end of our conversation, we both know all 00:24:41.190 --> 00:24:46.030 of the private keys such that we both could have created that whole conversation 00:24:46.030 --> 00:24:53.610 on our own computer. At any given time, it's only the most recent message 00:24:53.610 --> 00:24:57.190 that could only have been sent by the other person and the rest of the 00:24:57.190 --> 00:25:03.570 transcript that you have is something you could have generated yourself. 
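As a rough illustration of these two properties, here is a minimal Python sketch. It is deliberately simplified and is an assumption-laden stand-in: it uses a symmetric hash ratchet and shared-key HMAC tags rather than the exchange of fresh public keys described above, and it is not the actual OTR or Signal construction. The names `Ratchet`, `kdf` and `tag` are hypothetical.

```python
import hashlib
import hmac


def kdf(key: bytes, label: bytes) -> bytes:
    """One-way derivation step: knowing the output does not reveal the input key."""
    return hmac.new(key, label, hashlib.sha256).digest()


class Ratchet:
    """Hypothetical symmetric ratchet: every message gets a fresh key."""

    def __init__(self, shared_secret: bytes) -> None:
        self.chain_key = shared_secret

    def next_message_key(self) -> bytes:
        message_key = kdf(self.chain_key, b"message")
        # Overwrite the chain key so earlier message keys cannot be re-derived later.
        self.chain_key = kdf(self.chain_key, b"chain")
        return message_key


def tag(message_key: bytes, text: bytes) -> bytes:
    """Authentication tag under a key that BOTH parties hold."""
    return hmac.new(message_key, text, hashlib.sha256).digest()


# Both sides start from the same secret agreed during session setup.
alice = Ratchet(b"secret agreed during session setup")
bob = Ratchet(b"secret agreed during session setup")

k1_alice = alice.next_message_key()
k1_bob = bob.next_message_key()

# Alice authenticates a message and Bob verifies it with his copy of the key.
alice_tag = tag(k1_alice, b"hello Bob")
verified = hmac.compare_digest(alice_tag, tag(k1_bob, b"hello Bob"))

# Deniability: Bob holds the same key, so he can produce a valid tag for any text.
forged = tag(k1_bob, b"something Alice never said")

# Forward secrecy: the next message uses a different key.
k2_alice = alice.next_message_key()
```

Because Bob can compute a valid tag for text Alice never wrote, a saved transcript convinces Bob during the conversation but proves nothing to a third party afterwards.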
There is a 00:25:03.570 --> 00:25:07.980 talk on day three about Off-The-Record v4, the fourth version of that, that will go 00:25:07.980 --> 00:25:13.650 deeper into that, that's at 9:00 p.m. in the about:freedom assembly. So I encourage 00:25:13.650 --> 00:25:20.090 you to do that if you're interested in this. OK. The next one to talk about is 00:25:20.090 --> 00:25:28.150 expiry. This is sort of a follow on to this concept of forward secrecy. But 00:25:28.178 --> 00:25:31.720 there's sort of two attacks here to consider. One is something that we should 00:25:31.720 --> 00:25:37.299 maybe, I guess, give credit to Snapchat for popularizing, which is this concept of 00:25:37.299 --> 00:25:42.200 "the message goes away after some amount of time". And really, this is protecting 00:25:42.200 --> 00:25:46.289 against not fully trusting the other person from like sharing it later or 00:25:46.289 --> 00:25:51.419 sharing in a way you didn't intend. And this is also like a snapshot 00:25:51.419 --> 00:25:57.190 adversary. So a bunch of apps will alert the other participant if you take a 00:25:57.190 --> 00:26:01.610 screenshot. This is why some apps will blank the screen when they go to the task 00:26:01.610 --> 00:26:07.919 switcher. So if you're swapping between apps, you'll see that some of your 00:26:07.919 --> 00:26:12.210 applications will just show a blank screen or will not show contents. And that's 00:26:12.210 --> 00:26:16.030 because the mobile operating systems' APIs don't tell them when you're in that mode 00:26:16.030 --> 00:26:19.090 when you take a screenshot and so they want to just be able to notify you if the 00:26:19.090 --> 00:26:23.500 other person does. It's worth noting that this is all just raising the cost of these 00:26:23.500 --> 00:26:27.720 attacks and providing sort of a social incentive not to, right. 
I can still use 00:26:27.720 --> 00:26:31.740 another camera to take a picture of my phone and get evidence of something that 00:26:31.740 --> 00:26:39.200 has been said. But it's discouraging it and setting social norms. The other reason 00:26:39.200 --> 00:26:44.289 for expiry is: After the fact, a compromise of a device, so whether that's 00:26:44.289 --> 00:26:49.190 - you know, someone gets hold of the device and tries to do forensic analysis 00:26:49.190 --> 00:26:54.539 to pull off previous messages or the chat database or whether someone tries to 00:26:54.539 --> 00:26:59.770 install an application that then scans through your phone... So Fengcai is 00:26:59.770 --> 00:27:05.549 an application that's been installed as a surveillance app in China. And this also 00:27:05.549 --> 00:27:10.006 boils down to a user interface and user experience question, which is how long 00:27:10.006 --> 00:27:13.480 are you going to save logs, how much history are you going to save and what 00:27:13.480 --> 00:27:19.493 norms are you going to have? And there's a tradeoff here. It's useful 00:27:19.493 --> 00:27:24.549 sometimes to scroll back. And especially for companies that believe that they have 00:27:24.549 --> 00:27:31.560 value added services around being able to do data analytics on your chat history. 00:27:31.560 --> 00:27:40.140 They're wary of getting rid of that. The next thing that we have is isolation and 00:27:40.140 --> 00:27:47.650 OS sandboxing. Right. So a lot of this is up one layer, which is what is the 00:27:47.650 --> 00:27:53.049 operating system doing to secure your application, your chat system from the 00:27:53.049 --> 00:27:58.650 other things, the malware or the compromises of the broader device that 00:27:58.650 --> 00:28:06.750 it's running on. We have a bunch of projects around us at Congress that are 00:28:06.750 --> 00:28:11.440 innovating on this. 
There are chat systems that also attempt to do this sort of on 00:28:11.440 --> 00:28:16.270 their own. One sort of extreme example is called Tinfoil Chat, which makes use of 00:28:16.270 --> 00:28:21.240 three devices and a physical diode which is designed to have one device that is 00:28:21.240 --> 00:28:25.600 sending messages and another device that is receiving messages. And the thought is: 00:28:25.600 --> 00:28:30.559 if you receive a message that somehow compromises the device, the malware or the 00:28:30.559 --> 00:28:36.580 malicious file can never get any communication back out and so becomes much 00:28:36.580 --> 00:28:41.850 less valuable to have compromised. And they implement this with like a physical 00:28:41.850 --> 00:28:53.690 hardware diode. The other side of this is recovery and backups. Which is you've got 00:28:53.690 --> 00:29:01.169 a user experience tradeoff between a lot of people losing their devices and wanting 00:29:01.169 --> 00:29:04.960 to get back their contact list or their chat history and the fact that now you're 00:29:04.960 --> 00:29:08.120 keeping this extra copy and have this additional place for things to get 00:29:08.120 --> 00:29:15.290 compromised. Apple has done a lot of work here that we don't look at so much. They 00:29:15.290 --> 00:29:19.640 gave a Black Hat talk a few years ago where they discussed how they use custom hardware 00:29:19.640 --> 00:29:25.380 security modules in their data centers, much like the T2 chip in the end devices, 00:29:25.380 --> 00:29:30.990 that will hold the backup keys that get used for their iCloud backups and do 00:29:30.990 --> 00:29:36.870 similar amounts of rate limiting. And they consider a set of - a pretty wide set of 00:29:36.870 --> 00:29:40.530 adversaries - more than we might expect. So including things like what happens when 00:29:40.530 --> 00:29:46.520 the government comes and asks us to write new software to compromise this?
And so 00:29:46.520 --> 00:29:51.840 they set up their HSMs such that they cannot provide software updates to them, 00:29:51.840 --> 00:29:56.900 which is, you know, a sort of a step of how do you do this cloud security side 00:29:56.900 --> 00:30:03.650 that we don't think about as much. So there's a set of slides that you can find 00:30:03.650 --> 00:30:09.110 from this. And these slides will be online, too, as a pointer to look at 00:30:09.110 --> 00:30:14.380 their solution, which considers a large number of adversaries that you might not 00:30:14.380 --> 00:30:28.220 have thought about. So traffic obfuscation is primarily about a network-side adversary. The 00:30:28.220 --> 00:30:31.799 technique that people are using if they feel they 00:30:31.799 --> 00:30:37.350 need to do this is something called domain fronting. Domain fronting had 00:30:37.350 --> 00:30:42.510 its heyday maybe in 2014-ish and has become somewhat less effective, but it's 00:30:42.510 --> 00:30:50.110 still effective enough for most of the chat things. The basic idea behind domain 00:30:50.110 --> 00:30:55.440 fronting is that there's a separation of layers between that envelope and the 00:30:55.440 --> 00:31:02.240 message inside of it that we get in HTTP in the Web. So when I create a secure 00:31:02.240 --> 00:31:09.059 connection to a CDN, to a content provider like Amazon or Google or Microsoft, I can 00:31:09.059 --> 00:31:14.030 make that connection and perform the security layer and provide a fairly 00:31:14.030 --> 00:31:19.100 generic service that I'm connecting to. I just want to establish a secure connection 00:31:19.100 --> 00:31:23.580 to Cloudflare. And then once I've done that, the message that I can send inside 00:31:23.580 --> 00:31:27.399 can be a chat message to a specific customer of that CDN or that cloud 00:31:27.399 --> 00:31:35.440 provider.
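As a rough sketch of that envelope-versus-message split, the snippet below builds the two names a fronted request carries: the name shown to the network during the TLS handshake (SNI) and the name in the HTTP Host header that only the CDN sees after decryption. The function name and both domains are hypothetical placeholders, and no connection is made; whether a given CDN still honors a mismatched Host header is an assumption that often no longer holds.

```python
def build_fronted_request(front_domain, hidden_host, path="/"):
    """Return (tls_sni, http_request) for a domain-fronted request.

    front_domain: generic CDN name, visible to the network in the TLS SNI field.
    hidden_host:  the real service, carried only inside the encrypted HTTP layer.
    Both names are illustrative; this only builds bytes, it opens no socket.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"   # read by the CDN after TLS is terminated
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return front_domain, request


sni, request = build_fronted_request("cdn.example.com", "chat.example.net")
print("network sees SNI:", sni)
print("CDN sees inside TLS:", request.split(b"\r\n")[1].decode())
```

A network observer only learns the first name; routing to the chat service happens at the CDN based on the second.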
And so this is an effective way to prevent the network from knowing what 00:31:35.440 --> 00:31:41.659 specific service you're accessing. It got used for a bunch of circumvention things. 00:31:41.659 --> 00:31:45.620 It then got used for a bunch of malware things and this caused a bunch of the 00:31:45.620 --> 00:31:52.480 cloud providers to stop allowing you to do this. But it's still getting used. This is 00:31:52.480 --> 00:31:56.330 still what's sort of happening when you turn on certain censorship circumvention in 00:31:56.330 --> 00:32:01.770 Signal. It's what Telegram is using for the most part. And the same basic 00:32:01.770 --> 00:32:08.290 technique is getting another revival with DNS over HTTPS and encrypted SNI 00:32:08.290 --> 00:32:15.300 extensions to TLS which allow for a standardized approach to establish a 00:32:15.300 --> 00:32:19.760 connection to a service without providing any specific identifiers to the network 00:32:19.760 --> 00:32:26.159 for which service you want to connect to. It's worth sort of mentioning that 00:32:26.159 --> 00:32:33.640 probably the most active chat service for this sort of obfuscation or circumvention 00:32:33.640 --> 00:32:39.380 is Telegram, which has a bunch of users in countries that are not fans of having lots 00:32:39.380 --> 00:32:44.929 of users of Telegram. And so they have both systems where they can bounce between 00:32:44.929 --> 00:32:49.343 IPs very quickly and change where their servers appear to be. And they've also 00:32:49.343 --> 00:32:55.299 used techniques like sending messages over DNS tunnels to mitigate some of these 00:32:55.299 --> 00:33:01.510 censorship things. From the provider's perspective, this is really about accessing 00:33:01.510 --> 00:33:05.700 their user population.
They're not really thinking about your local network or 00:33:05.700 --> 00:33:09.220 caring about that as much as they are like, oh, there's millions of 00:33:09.220 --> 00:33:16.570 users that should probably still have access to us. So we can maybe hide the 00:33:16.570 --> 00:33:21.809 characteristics of traffic in terms of what specific service we're connecting to. 00:33:21.809 --> 00:33:25.840 There's some other things about traffic, though, that also are revealing to the 00:33:25.840 --> 00:33:28.990 network. And this is sort of this additional metadata that we need to think 00:33:28.990 --> 00:33:36.039 about. So one of these is padding: the size of messages can be revealing. So one 00:33:36.039 --> 00:33:39.350 sort of immediate thing is the size of a chat or a text message is going to be very 00:33:39.350 --> 00:33:45.700 different from the size of an image or voice or movies. And you see this on 00:33:45.700 --> 00:33:49.059 airplanes or in other bandwidth limited settings: they might allow text messages 00:33:49.059 --> 00:33:56.270 to go through, but images won't. There's been research that shows, for instance, on 00:33:56.270 --> 00:34:02.840 voice, even if I encrypt my voice, we've actually gotten really good at compressing 00:34:02.840 --> 00:34:07.580 audio of human speech. So much so that different phonemes, different sounds that 00:34:07.580 --> 00:34:13.799 we make take up different sizes. And so I can say something, compress it, encrypt it 00:34:13.799 --> 00:34:20.169 and an observer can still recover what was said based on the relative sizes of different sounds. So 00:34:20.169 --> 00:34:25.240 there was a paper in 2011 at Oakland S&P that demonstrated this 00:34:25.240 --> 00:34:33.159 potential for attacks.
And so what this is telling us perhaps is that there's a 00:34:33.159 --> 00:34:39.639 tradeoff between how efficiently I want to send things and how much metadata or 00:34:39.639 --> 00:34:44.760 revealing information for distinguishing them I'm giving up. So I can use a less 00:34:44.760 --> 00:34:49.579 efficient compression that's constant bit rate or that otherwise is not revealing 00:34:49.579 --> 00:34:52.469 this information, but it has higher overhead and won't work as well in 00:34:52.469 --> 00:34:58.539 constrained network environments. The other place this shows up is just when 00:34:58.539 --> 00:35:04.839 people are active. So if I can look at when someone is tweeting or when messages 00:35:04.839 --> 00:35:10.785 are sent, I can probably figure out pretty quickly what timezone they're in. Right. 00:35:10.785 --> 00:35:17.009 And so this leads to a whole set of these metadata based attacks. And in particular, 00:35:17.009 --> 00:35:21.509 there's confirmation attacks and intersection attacks. And so intersection 00:35:21.509 --> 00:35:26.780 attacks is looking at the relative activity of multiple people and trying to 00:35:26.780 --> 00:35:32.519 figure out: OK, when Alice sent a message, who else was online or active at the same 00:35:32.519 --> 00:35:37.190 time? And over time, can I narrow down or filter to specific people that were likely 00:35:37.190 --> 00:35:45.269 who Alice was talking to? Pond also is a service to look at or a system to look at 00:35:45.269 --> 00:35:51.969 in this regard. Their approach was that a client would hopefully be always be online 00:35:51.969 --> 00:35:57.609 and would at a regular pattern check in with the server with the same amount of 00:35:57.609 --> 00:36:01.980 data, regardless of whether there was a real message to send or not. So that from 00:36:01.980 --> 00:36:07.089 the network's perspective, every user looked the same. 
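A minimal sketch of that "same amount of data every time" idea, assuming a fixed frame size and a simple two-byte length header (both choices are illustrative, not Pond's actual wire format). In a real system the whole frame would additionally be encrypted, so the length header is not visible to the network.

```python
import os

FRAME_SIZE = 1024  # assumed constant size of every check-in, in bytes


def make_frame(message=None):
    """Build a fixed-size frame holding a real message or nothing at all."""
    body = message or b""
    if len(body) > FRAME_SIZE - 2:
        raise ValueError("message too long for one frame")
    header = len(body).to_bytes(2, "big")              # real payload length
    padding = os.urandom(FRAME_SIZE - 2 - len(body))   # filler up to FRAME_SIZE
    return header + body + padding


def read_frame(frame):
    """Return the message carried in a frame, or None for a dummy frame."""
    length = int.from_bytes(frame[:2], "big")
    return frame[2:2 + length] or None


real = make_frame(b"see you at 9")
dummy = make_frame()
print(len(real), len(dummy))                 # both equal FRAME_SIZE
print(read_frame(real), read_frame(dummy))
```

The cost the talk mentions shows up directly here: every check-in sends FRAME_SIZE bytes, whether or not anything was said.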
The downside being that 00:36:07.089 --> 00:36:12.579 you've now got this message being sent by every client every minute or so and that 00:36:12.579 --> 00:36:19.300 creates a huge amount of overhead of, you know, just padded data that doesn't have 00:36:19.300 --> 00:36:27.559 any meaning. So finally, I'll take a look at server hardening and the things that 00:36:27.559 --> 00:36:33.259 we're doing to reduce trust in the server. There's a few examples of why we would 00:36:33.259 --> 00:36:37.759 want to do this. So one is that you've had messaging servers, plenty of times, that 00:36:37.759 --> 00:36:46.690 have not been as secure as they claim. One example being that there was a period 00:36:46.690 --> 00:36:52.739 where the Skype subsidiary in China was using a blacklist of keywords on the 00:36:52.739 --> 00:36:57.779 server to either prevent or intercept some subset of their users messages without 00:36:57.779 --> 00:37:03.650 telling anyone that they were doing that. And then also just sort of this uncertain 00:37:03.650 --> 00:37:07.999 future of, OK, I trust the data now, but what can we do so that I don't worry about 00:37:07.999 --> 00:37:14.890 what the corporate future of this service entails for my data. One of the sort of 00:37:14.890 --> 00:37:20.670 elephants in the room is: the software development is probably pretty 00:37:20.670 --> 00:37:25.339 centralized. So even if I don't trust the server, there's some pretty small number 00:37:25.339 --> 00:37:29.180 of developers who are writing the code. And how do I trust that the updates that 00:37:29.180 --> 00:37:33.229 they are making to this, either the server or to my client that they pushed my client 00:37:33.229 --> 00:37:39.339 isn't reducing my security. Open source is a great start to mitigating that, but it's 00:37:39.339 --> 00:37:45.581 certainly not solving all of this. 
So one thing, one way we can think about how we 00:37:45.581 --> 00:37:49.749 reduce trust in the server is by looking at what the server knows after end to end 00:37:49.775 --> 00:37:53.969 encryption. It knows things about the size. It knows where the message is coming 00:37:53.969 --> 00:37:58.385 from. It knows where the message is going to. Size: we've talked about some of these 00:37:58.385 --> 00:38:03.300 padding things that we can use to mitigate. So how do we reduce the amount 00:38:03.300 --> 00:38:06.720 of information about sources and destinations in this network graph that 00:38:06.720 --> 00:38:13.240 the server knows? So this is a concept called linkability, which is being able to 00:38:13.240 --> 00:38:21.690 link the source and destination of a message. We start to see some mitigations 00:38:21.690 --> 00:38:27.640 or approaches to reducing linkability entering mainstream systems. So Signal has 00:38:27.640 --> 00:38:32.260 a system called "Sealed Sender" that you can enable, where the source of the 00:38:32.260 --> 00:38:37.489 message goes within the encrypted envelope. So that Signal doesn't see that. 00:38:37.489 --> 00:38:42.089 The downside being that Signal is still seeing your IP address but the thought is 00:38:42.089 --> 00:38:46.559 that they will throw those out relatively quickly and so they will have less logs 00:38:46.559 --> 00:38:53.099 about this source to destination. Theoretically, though, there is a bunch of 00:38:53.099 --> 00:38:59.160 work in this. The first thing I'll point to is a set of systems that we classify as 00:38:59.160 --> 00:39:07.819 mixnets. A mixnet works by having a set of providers rather than a single entity 00:39:07.819 --> 00:39:12.800 that's running the servers. 
A bunch of users will send messages to the first 00:39:12.800 --> 00:39:16.670 provider, which will shuffle all of them and send them to the next provider, which 00:39:16.670 --> 00:39:20.640 will shuffle them again and send them to a final provider that will shuffle them and 00:39:20.640 --> 00:39:25.599 then be able to send them to destinations. And this de-links them, so that none of the 00:39:25.599 --> 00:39:31.519 individual providers know both the source and destination of these messages. So this 00:39:31.519 --> 00:39:39.750 looks maybe a bit like Tor's onion routing, but differs in sort of a couple of 00:39:39.750 --> 00:39:44.799 technicalities. One is typically, you will wait for some number of messages rather 00:39:44.799 --> 00:39:49.719 than just passing them through with low latency. And so by doing that, you can 00:39:49.719 --> 00:39:53.920 get a theoretical guarantee that this batch had at least n messages that got 00:39:53.920 --> 00:39:58.400 shuffled and therefore you can prevent there being some time where only one user 00:39:58.400 --> 00:40:05.400 was using the system. And so you get a stronger theoretic guarantee. There's an 00:40:05.400 --> 00:40:09.779 active project making a messaging system using mixnets called Katzenpost. They gave 00:40:09.779 --> 00:40:14.150 a talk at Camp this summer and I'd encourage you to look at their website or 00:40:14.150 --> 00:40:22.679 go back to that talk to learn more about mixnets. The project that I was, I guess, 00:40:22.679 --> 00:40:26.319 tangentially helping with is in a space called private information retrieval, 00:40:26.319 --> 00:40:33.410 which is another technique for doing this delinking. Private information retrieval 00:40:33.410 --> 00:40:37.559 frames the question a little bit differently.
And what it asks is: if I 00:40:37.559 --> 00:40:41.669 have a server that has a database of messages, how can a client 00:40:41.669 --> 00:40:45.539 retrieve one of those messages without the server knowing which message the client 00:40:45.539 --> 00:40:55.199 got or asked for? So this sounds maybe hard. I can give you a straw man to 00:40:55.199 --> 00:40:59.150 convince yourself that this is doable and the straw man is: I can ask the server for 00:40:59.150 --> 00:41:04.069 its entire database and then take the message that I want and the server hasn't 00:41:04.069 --> 00:41:08.349 learned anything about which message I cared about. But I spent a lot of network 00:41:08.349 --> 00:41:13.899 bandwidth probably doing that. So there's a couple of constructions for this. I'm 00:41:13.899 --> 00:41:20.029 going to focus on the information theoretic private information retrieval. 00:41:20.029 --> 00:41:24.920 And so we're going to use a similar setup to what we had in our threat model for a 00:41:24.920 --> 00:41:29.680 mixnet, which is we've got a set of providers now that have the same database. 00:41:29.680 --> 00:41:34.869 And I'm going to assume that they're not all talking to each other or colluding. So 00:41:34.869 --> 00:41:40.200 I just need at least one of them to be honest. And one of the things that we'll 00:41:40.200 --> 00:41:44.749 use here is something called the exclusive-or (XOR) operation. To refresh your memory here, 00:41:44.749 --> 00:41:50.711 exclusive or is a binary bitwise operation. And the nice property that we 00:41:50.711 --> 00:41:55.949 get is if I xor something with itself, it cancels out. So if I have some piece of 00:41:55.949 --> 00:42:02.970 data and I xor it against itself, it just goes away.
So if I have my systems that 00:42:02.970 --> 00:42:11.430 have the database, I can ask each one to give me a superposition of some random 00:42:11.430 --> 00:42:17.249 subset of its database. So I can ask the first server, give me items 4, 11, 14 and 00:42:17.249 --> 00:42:23.549 20 XORed together. I'm assuming all of the items are the same size so that you can do 00:42:23.549 --> 00:42:31.069 these xors. And then if I structure that, it can appear to each server independently, 00:42:31.069 --> 00:42:35.379 in the request that it sees, that I just asked for some random subset. But I can 00:42:35.379 --> 00:42:39.019 do that so that when I xor the things I get back, everything just cancels out 00:42:39.019 --> 00:42:44.009 except the item that I care about. Unless you saw all of the requests that I made, 00:42:44.009 --> 00:42:49.140 you wouldn't be able to tell which item I cared about. So by doing this, I've 00:42:49.140 --> 00:42:53.949 reduced the network bandwidth. I'm only getting one item's worth of data back from every 00:42:53.949 --> 00:43:00.050 server. Now, you might have a concern that I'm asking the server to do a 00:43:00.050 --> 00:43:03.720 whole lot of work here. It has to look through its entire database and compute 00:43:03.720 --> 00:43:09.519 this superposition thing. And that seems potentially like a lot of work, right. The 00:43:09.519 --> 00:43:14.660 thing that I think is exciting about this space is it turns out this sort of 00:43:14.660 --> 00:43:19.499 operation of going out to a large database and like searching for all of the things 00:43:19.499 --> 00:43:23.759 and then coming back with a small amount of data is well matched to the hardware that 00:43:23.759 --> 00:43:29.510 we're building for AI and for a bunch of these sorts of search-like things.
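The XOR-based retrieval described above can be sketched in a few lines. This is a toy illustration under the same assumptions the talk states (every server holds an identical database of equal-sized items, and at least one server does not collude); the function names are made up here and this is not the code of any particular system.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(num_items, wanted, num_servers):
    """One index subset per server; each looks uniformly random on its own."""
    queries = [
        {i for i in range(num_items) if secrets.randbits(1)}
        for _ in range(num_servers - 1)
    ]
    # Choose the last subset so every index appears an even number of times
    # across all subsets, except `wanted`, which appears an odd number of times.
    last = {wanted}
    for q in queries:
        last ^= q
    queries.append(last)
    return queries


def server_answer(database, subset):
    """What one server computes: the XOR of the requested items."""
    answer = bytes(len(database[0]))
    for i in subset:
        answer = xor_bytes(answer, database[i])
    return answer


def retrieve(databases, wanted):
    """Client side: query every replica and XOR the answers together."""
    queries = make_queries(len(databases[0]), wanted, len(databases))
    result = bytes(len(databases[0][0]))
    for db, q in zip(databases, queries):
        result = xor_bytes(result, server_answer(db, q))
    return result


database = [f"message {i:02d}".encode() for i in range(32)]
replicas = [database, database, database]   # three non-colluding providers
print(retrieve(replicas, 14))               # b'message 14'
```

Each server returns a single item-sized answer, and each has to touch roughly half of its database to compute it, which is the server-side cost raised in the talk.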
And so 00:43:29.510 --> 00:43:34.479 this runs really quite well on a GPU where I can have all of those thousands of cores 00:43:34.479 --> 00:43:38.719 compute little small parts of the XOR and then pull back this relatively small 00:43:38.719 --> 00:43:43.160 amount of information. And so with GPUs, you can actually have databases of 00:43:43.160 --> 00:43:50.920 gigabytes, tens of gigabytes of data and compute these XORs across all of it on 00:43:50.920 --> 00:43:59.441 the order of a millisecond or less. So a couple of things in this space. "Talek" is 00:43:59.441 --> 00:44:03.960 the system that I helped with that demonstrates this working. The converse 00:44:03.960 --> 00:44:08.900 problem is called private information storage. And that one is how do I write an 00:44:08.900 --> 00:44:14.150 item into a database without the database knowing which item I wrote? The 00:44:14.150 --> 00:44:20.209 mathematical construction there is not quite as simple to explain. But there's some 00:44:20.209 --> 00:44:26.039 pretty cool new work in the last month or two out of Dan Boneh and Henry 00:44:26.039 --> 00:44:34.680 Corrigan-Gibbs at Stanford called Express, with Saba as first author, that is showing how to 00:44:34.680 --> 00:44:44.380 fairly practically perform that operation. I'll finish just with a couple minutes on 00:44:44.380 --> 00:44:53.299 multiparty chat or group chat, so small groups. You've sort of got a choice here 00:44:53.299 --> 00:44:58.029 in terms of how assisted chat systems are implementing group chat. One is you can 00:44:58.029 --> 00:45:01.759 not tell the server about the group. And as someone who is part of the group, I 00:45:01.759 --> 00:45:05.729 just send the same message to everyone in the group.
And maybe I can tag it for them 00:45:05.729 --> 00:45:10.009 so that they know it's part of the group. Or you do something more efficient where 00:45:10.009 --> 00:45:13.984 you tell the server about group membership and I send the message once to the server 00:45:13.984 --> 00:45:22.829 and it sends it to everyone in the group. Even if you don't tell the server about 00:45:22.829 --> 00:45:26.680 it, though, you've got a bunch of things to worry about around leaked correlation, 00:45:26.680 --> 00:45:31.979 which is: if at a single time someone sends the same sized message to five other 00:45:31.979 --> 00:45:35.640 people and then later someone else sends the same sized message to five other 00:45:35.640 --> 00:45:39.360 people, and those basically overlap, someone in the network basically knows what 00:45:39.360 --> 00:45:42.839 the group membership is. So it's actually quite difficult to conceal group 00:45:42.839 --> 00:45:48.609 membership. The other thing that breaks down is our concept of deniability once 00:45:48.609 --> 00:45:52.929 again, which is: now multiple people have this log, and even if both of them 00:45:52.929 --> 00:45:56.799 individually could have written it, the fact that they have the same cryptographic 00:45:56.799 --> 00:46:04.329 keys from this other third party probably means that third party made that message. 00:46:04.329 --> 00:46:13.119 So there continues to be work here. Signal is working on providing, again, an SGX-based, 00:46:13.119 --> 00:46:16.510 centralized construction for group management to be able to scale better, 00:46:16.510 --> 00:46:21.969 given I think the pretty realistic fact that the server in these cases is probably 00:46:21.969 --> 00:46:25.689 going to be able to figure out group membership in some case, you might as well 00:46:25.689 --> 00:46:32.019 make it scale. On the other side, one of the cool systems that's being prototyped 00:46:32.019 --> 00:46:39.969 is called "Cwtch" out of Open Privacy.
And this is an extension to ricochet that 00:46:39.969 --> 00:46:45.849 allows for offline messages and small group chats. It works for order of 5 to 20 00:46:45.849 --> 00:46:50.700 people, and it works by having a server that obliviously forwards on messages to 00:46:50.700 --> 00:46:55.599 everyone connected to it. So when I send a message to a group, the server sends the 00:46:55.599 --> 00:46:59.430 message to everyone it knows about, not just the people in the group, and 00:46:59.430 --> 00:47:03.609 therefore the server doesn't actually know the subgroups that exist. It just knows 00:47:03.609 --> 00:47:10.549 who's connected to it. And that's a neat way. It doesn't necessarily scale to large 00:47:10.549 --> 00:47:16.140 groups, but it allows for some concealing of group membership. They've got an 00:47:16.140 --> 00:47:22.299 Android prototype as well that's sort of a nice extension to make this usable. 00:47:22.299 --> 00:47:33.509 Wonderful. I guess the final thought here is: there's a lot of systems, I'm sure I 00:47:33.509 --> 00:47:40.339 haven't mentioned all of them. But this community is really closely tied to the 00:47:40.339 --> 00:47:46.059 innovations that are happening in the space of private chat. And this is the 00:47:46.059 --> 00:47:49.910 infrastructure that supports communities and is some of the most meaningful stuff 00:47:49.910 --> 00:47:55.959 you can possibly work on. And I encourage you to find new ones and look at a bunch 00:47:55.959 --> 00:48:00.029 of them and think about the tradeoffs and encourage friends to play with new 00:48:00.029 --> 00:48:03.650 systems, because that's how they gain adoption and how people figure out what 00:48:03.650 --> 00:48:09.710 mechanisms do and don't work. So with that, I will take questions. 00:48:09.710 --> 00:48:17.698 Applause 00:48:17.698 --> 00:48:21.379 Herald: Wasn't necessary to encourage you to come with an applause. 
There are 00:48:21.379 --> 00:48:25.130 microphones that are numbered in the room, so if you start lining up behind the 00:48:25.130 --> 00:48:29.709 microphones, then we can take your questions. We already have a question from 00:48:29.709 --> 00:48:36.500 the Internet. Question: Popularity and independence are 00:48:36.500 --> 00:48:42.630 a contradiction. How can I be sure that an increasingly popular messenger like Signal 00:48:42.630 --> 00:48:50.959 stays independent? Answer: I guess I would question whether 00:48:50.959 --> 00:48:57.720 independence is a goal in and of itself. It's true that the value is increasing. 00:48:57.720 --> 00:49:03.449 And so one of the things I think about is using systems that have open protocols 00:49:03.449 --> 00:49:07.289 or that are federated or otherwise not centralized. And again, this is reducing 00:49:07.289 --> 00:49:13.400 that need to have confidence in the future business model of a single legal entity. 00:49:13.400 --> 00:49:20.539 But I don't know if independence of the company is the thing that you're 00:49:20.539 --> 00:49:25.279 trying to trade off with popularity. Herald: Well, and we have questions at the 00:49:25.279 --> 00:49:27.630 microphones. We'll start at microphone number one. 00:49:27.630 --> 00:49:33.839 Question: Thanks for the talk. First of all, you talked a lot about 00:49:33.839 --> 00:49:40.739 content and encryption. What about the initial problem? History shows that if I'm 00:49:40.739 --> 00:49:47.229 an individual already observed in a sensitive area, there might be no need to 00:49:47.229 --> 00:49:52.750 encrypt or decrypt the message on sending. It's already identified. I'm sending at a 00:49:52.750 --> 00:49:58.880 specific location at a specific time. Is there any chance to hide that or do 00:49:58.880 --> 00:50:02.769 something against it? Answer: So make things hidden again after 00:50:02.769 --> 00:50:13.069 the fact? That seems very hard. I mean, so.
So there's a couple thoughts there, 00:50:13.069 --> 00:50:20.769 maybe. There's sort of this real world intersection attack, which is if 00:50:20.769 --> 00:50:25.230 there's a real world observable action of who actually shows up at the protest, 00:50:25.230 --> 00:50:29.239 that's a pretty good way to figure out who is chatting about the protests beforehand, 00:50:29.239 --> 00:50:37.299 potentially. And so, I mean, I think what we've seen in real world organizing is 00:50:37.299 --> 00:50:42.170 things like either really decentralizing that, where it happens across a lot of 00:50:42.170 --> 00:50:46.119 platforms, and happens very spontaneously close to the event, so there's not enough 00:50:46.119 --> 00:50:55.740 time to respond in advance, or hiding your presence or otherwise trying 00:50:55.740 --> 00:51:01.039 to stagger your actual actions so that they are harder to correlate to a specific 00:51:01.039 --> 00:51:06.849 group. But it's not something the chat systems are talking about, I don't think. 00:51:06.849 --> 00:51:10.890 Herald: We have time for more questions. So please line up at the microphones and 00:51:10.890 --> 00:51:15.510 if you're leaving, then leave quietly. We have a question from microphone number 4. 00:51:15.510 --> 00:51:18.690 Question: So if network address 00:51:18.690 --> 00:51:23.509 translation is the original sin to the end to end principle, and due to that, we now 00:51:23.509 --> 00:51:31.309 have to run servers, someone has to pay for it. Do you know any solution to that 00:51:31.309 --> 00:51:38.130 economic problem? Answer: I mean, we had to pay for things 00:51:38.130 --> 00:51:42.609 even without network address translation, but we could move more of that cost to end 00:51:42.609 --> 00:51:49.829 users.
And so we have another opportunity with IPv6 to potentially keep more of 00:51:49.829 --> 00:51:53.539 the cost with end users or develop protocols that are more decentralized 00:51:53.539 --> 00:52:00.440 where that cost stays more fairly distributed. You know, our phones have a 00:52:00.440 --> 00:52:05.279 huge amount of computation power and figuring out how we make our protocols so 00:52:05.279 --> 00:52:13.339 that work happens there is, I think, an ongoing balance. I think one of the 00:52:13.339 --> 00:52:18.349 reasons why network address translation or centralization is so common is because 00:52:18.349 --> 00:52:22.849 distributed systems are pretty hard to build and pretty hard to gain confidence 00:52:22.849 --> 00:52:29.739 in. So more tools around how we can test and feel like we understand that the 00:52:29.739 --> 00:52:35.130 system actually is, you know, going to work 99.9% of the time for distributed 00:52:35.130 --> 00:52:38.709 systems is going to make people less wary of working with them. 00:52:38.709 --> 00:52:42.779 So better tools on distributed systems is maybe the best answer. 00:52:42.779 --> 00:52:48.180 Herald: We also have another question from the internet, which we'll take now. 00:52:48.180 --> 00:52:53.299 Question: What do you think of technical novices' acceptance of and dealing with OTR 00:52:53.299 --> 00:52:58.930 keys, for example, Matrix Riot? Most people I know just click "I verified this 00:52:58.930 --> 00:53:03.419 key" even if they didn't. Answer: Absolutely.
So this, I think 00:53:03.419 --> 00:53:07.550 goes back to a lot of these problems are sort of a user experience tradeoff, which 00:53:07.550 --> 00:53:14.160 is, you know, we saw initial versions of Signal where you would actually try and 00:53:14.160 --> 00:53:19.470 regularly verify some QR code between each and then that sort of has gotten pushed 00:53:19.470 --> 00:53:24.499 back to a harder to access part of the user interface because not many people 00:53:24.499 --> 00:53:29.120 wanted to deal with that. And an early matrix riot you would get a lot of 00:53:29.120 --> 00:53:33.059 warnings about: There's a new device. Do you want to verify this new device? Do you 00:53:33.059 --> 00:53:37.209 only want to send to the previous devices that you trusted. And now you're getting 00:53:37.209 --> 00:53:41.739 the ability to sort of more automatically just sort of accept these changes and 00:53:41.739 --> 00:53:45.429 you're weakening some amount of the encryption security, but you're getting a 00:53:45.429 --> 00:53:49.299 better, smoother user interface because most users are just going to sort of click 00:53:49.299 --> 00:53:52.669 "yes" because they want to send the message. Right. And so there's this 00:53:52.669 --> 00:53:56.129 tradeoff: when you have built the protocols such that you are standing in 00:53:56.129 --> 00:54:00.140 the way of the person doing what they want to do. That's not really where you want to 00:54:00.140 --> 00:54:06.369 put that friction. So figuring out other ways where you can have this on the side 00:54:06.369 --> 00:54:12.959 or supporting the communication rather than hindering it is probably the types of 00:54:12.959 --> 00:54:16.889 user interfaces or systems that we should be thinking about that can be successful. 00:54:16.889 --> 00:54:20.169 Herald: We have a couple of more questions. We'll start at microphone 00:54:20.169 --> 00:54:23.820 number 3. Question: Thank you for your talk. 
You 00:54:23.820 --> 00:54:28.970 talked about deniability by sending the private key with the last message. 00:54:28.970 --> 00:54:34.339 And how do you get the private key for the last message in the whole conversation? 00:54:34.339 --> 00:54:45.119 Answer: In the OTR, XMPP, Jabber systems there would be an explicit action to end 00:54:45.119 --> 00:54:50.410 the conversation that would then make it repudiable, that would send 00:54:50.410 --> 00:54:55.970 that final message to close it. What you have in things like Signal is it's 00:54:55.970 --> 00:54:59.549 actually happening with every message as part of the confirmation of the message. 00:54:59.549 --> 00:55:03.329 Question: OK. Thank you. Herald: We still probably have 00:55:03.329 --> 00:55:07.439 time for more questions. So please line up if you have any. Don't hold back. 00:55:07.439 --> 00:55:09.549 We have a question from microphone number 7. 00:55:09.549 --> 00:55:14.269 Question: So, first of all, a brief comment. The Riot thing still doesn't even 00:55:14.269 --> 00:55:19.880 do TOFU. They haven't figured this out. But I think there's a 00:55:19.880 --> 00:55:24.760 much more subtle conversation that needs to happen around deniability, because most 00:55:24.760 --> 00:55:31.489 of the time, if you have people with a power imbalance, the non-repudiable 00:55:31.489 --> 00:55:36.660 conversation actually benefits the weaker person. So we actually don't want 00:55:36.660 --> 00:55:42.729 deniability in most of our chat applications or whatever, except that's 00:55:42.729 --> 00:55:47.390 still more subtle than that, because when you have people with equal power, maybe 00:55:47.390 --> 00:55:54.609 you do. It's kind of weird. Answer: Absolutely. And I guess the other 00:55:54.609 --> 00:55:58.759 part of that is, is that something that should be shown to users and is that a 00:55:58.759 --> 00:56:03.259 concept?
Is there a way that you express that notion in a way that users can 00:56:03.259 --> 00:56:07.910 understand it and make good choices? Or is it just something that your system makes a 00:56:07.910 --> 00:56:13.270 choice on for all of your users? Herald: We have one more question. 00:56:13.270 --> 00:56:17.229 Microphone number seven, please line up if you have any more. We still have a couple 00:56:17.229 --> 00:56:19.559 more minutes. Microphone number seven, please. 00:56:19.559 --> 00:56:23.309 Question: Hi, thanks for the talk. You talked about the private information 00:56:23.309 --> 00:56:30.979 retrieval and how that would stop the server from knowing who retrieved the 00:56:30.979 --> 00:56:36.469 message. But for me, the question is, how do I find out in the first place which 00:56:36.469 --> 00:56:44.140 message is for me? Because if he, for example, always uses message slot 14, then 00:56:44.140 --> 00:56:53.589 obviously over a conversation, it would again be possible to deanonymize the users 00:56:53.589 --> 00:56:58.819 in like, OK, they're always accessing this one in like all those queries. 00:56:58.819 --> 00:57:06.749 Answer: Absolutely. So I didn't explain that part. The trick is that between the 00:57:06.749 --> 00:57:13.069 two people, we will share some secret, which is our conversation secret. And what 00:57:13.069 --> 00:57:16.569 we will use that conversation secret for is to seed a pseudo random number 00:57:16.569 --> 00:57:20.900 generator. And so we will be able to generate the same stream of random 00:57:20.900 --> 00:57:27.519 numbers. And so each next message will go at the place determined by the next item 00:57:27.519 --> 00:57:32.550 in that random number generator.
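Editor's sketch of the slot-selection idea in this answer: both parties derive the same pseudo-random sequence of slot numbers from the shared conversation secret, so the n-th message always lands at a place only they can predict. This is a minimal illustration, not the scheme from the paper mentioned in the talk. It swaps in HMAC-SHA256 over a message counter as the pseudo-random generator, and the names `slot_for_message`, `NUM_SLOTS`, and the example secret are hypothetical.

```python
import hashlib
import hmac

# Hypothetical size of the server's message table (assumption, not from the talk).
NUM_SLOTS = 1 << 20


def slot_for_message(conversation_secret: bytes, index: int,
                     num_slots: int = NUM_SLOTS) -> int:
    """Return the slot where the index-th message of a conversation goes.

    HMAC-SHA256 keyed with the shared secret stands in for the seeded
    pseudo-random number generator described in the answer.
    """
    counter = index.to_bytes(8, "big")
    digest = hmac.new(conversation_secret, counter, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_slots


# Hypothetical secret that Alice and Bob agreed on out of band.
secret = b"shared conversation secret"

# The writer and the reader compute the same sequence independently.
alice_slots = [slot_for_message(secret, i) for i in range(5)]
bob_slots = [slot_for_message(secret, i) for i in range(5)]
```

Without the secret, the slot numbers look unrelated to each other, which is the property the answer relies on: the server cannot link successive reads or writes to one conversation by their position.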
And so now the person writing can just write out 00:57:32.550 --> 00:57:36.119 random places as far as the server can tell, and when it wants to write the next 00:57:36.119 --> 00:57:40.869 message in this conversation, it'll make sure to write at that next place 00:57:40.869 --> 00:57:46.600 in its random number generator for that conversation. There is a paper that will 00:57:46.600 --> 00:57:50.130 describe a bunch more of that system. But that's the basic sketch. 00:57:50.130 --> 00:57:53.819 Question: Thank you. Herald: We have a question from the Internet. 00:57:53.819 --> 00:57:58.699 Question: It seems like identity is the weak point of the new breed of messaging 00:57:58.699 --> 00:58:02.979 apps. How do we solve this part of Zooko's triangle, the need for 00:58:02.979 --> 00:58:07.680 identifiers and to find people? Answer: Identity is hard, and I think 00:58:07.680 --> 00:58:18.279 identity has always been hard and will continue to be hard. Having a variety of 00:58:18.279 --> 00:58:23.420 ways to be identified, I think remains important and is why there isn't a single 00:58:23.420 --> 00:58:26.950 winner takes all system that we use for chat. But rather you have a lot of 00:58:26.950 --> 00:58:30.720 different chat protocols that you use for different circles and different social 00:58:30.720 --> 00:58:34.779 circles that you find yourself in. And part of that is our desire to not be 00:58:34.779 --> 00:58:38.920 confined to a single identity, but to be able to have different facets to our 00:58:38.920 --> 00:58:44.539 personalities. There are systems where you can identify yourself with a unique 00:58:44.539 --> 00:58:48.449 identifier to each person you talk to rather than having a single identity 00:58:48.449 --> 00:58:53.890 within the system. So that's something else that Pond would use, was that the 00:58:53.890 --> 00:58:57.989 identifier that you gave out to each separate friend was different.
And so 00:58:57.989 --> 00:59:03.710 you would appear as a totally separate user to each of them. It turns out that's 00:59:03.710 --> 00:59:10.239 at the same time very difficult, because if I post an identifier publicly, suddenly 00:59:10.239 --> 00:59:14.780 that identifier is now linked to me for everyone who uses that identifier. So you 00:59:14.780 --> 00:59:18.309 have to give these out privately in a one on one setting, which limits your 00:59:18.309 --> 00:59:22.909 discoverability. So that concept of how we deal with identities I think is 00:59:22.909 --> 00:59:26.679 inherently messy and inherently something that there's not going to be something 00:59:26.679 --> 00:59:31.859 satisfying that solves it. Herald: And that was the final question 00:59:31.859 --> 00:59:35.339 concluding this talk. Please give a big round of applause for Will Scott. 00:59:35.339 --> 00:59:36.089 Will: Thank you. 00:59:36.090 --> 00:59:40.862 Postroll music 00:59:40.862 --> 01:00:04.000 subtitles created by c3subtitles.de in the year 2019. Join, and help us!