WEBVTT

00:00:00.000 --> 00:00:18.702
<i>36C3 preroll music</i>

00:00:18.702 --> 00:00:23.372
Herald: Please put your hands together and
give a warm round of applause to Will Scott.

00:00:23.373 --> 00:00:25.045
<i>Applause</i>

00:00:25.045 --> 00:00:25.635
Will Scott: Thank you.

00:00:25.635 --> 00:00:31.510
<i>Applause</i>

00:00:31.510 --> 00:00:42.960
Will: All right. Welcome. So. The basic
structure of this talk is sort of twofold.

00:00:42.960 --> 00:00:50.979
The first thing is to provide an overview
of the different mechanisms that exist in

00:00:50.979 --> 00:00:58.500
this space of secure communication and try
to tease apart a bunch of the individual

00:00:58.500 --> 00:01:03.380
choices and tradeoffs that have to be made
and the implications of them. Because a

00:01:03.380 --> 00:01:07.680
lot of times we talk about security or
privacy as very broad terms that cover a

00:01:07.680 --> 00:01:12.939
bunch of individual things. And breaking
that down gives us a better way to

00:01:12.939 --> 00:01:18.670
understand what it is we're giving up or
whether or why these decisions actually

00:01:18.670 --> 00:01:24.079
get made for the systems that we end up
using. And the arc that I'll cover

00:01:24.079 --> 00:01:29.509
is first trying
to provide a sort of taxonomy or

00:01:29.509 --> 00:01:34.800
classification of a bunch of the different
systems that we see around us. And from

00:01:34.800 --> 00:01:39.810
there identify the threats that we often
are trying to protect against and the

00:01:39.810 --> 00:01:44.189
mechanisms that we have to mitigate those
threats and then go into some of these

00:01:44.189 --> 00:01:48.599
mechanisms and look at what's happening
right now on different systems. And by the

00:01:48.599 --> 00:01:53.420
end, we'll sort of be closer to the
research frontier of what is still

00:01:53.420 --> 00:01:59.509
happening, where are places where we have
new ideas, but there's still quite a high

00:01:59.509 --> 00:02:06.159
tradeoff to usability or for other reasons
where these haven't gained mass adoption.

00:02:06.159 --> 00:02:11.330
So I'll introduce our actors: Alice and
Bob. The basic structure for pretty much

00:02:11.330 --> 00:02:17.651
all of this is one to one messaging. So
this is primarily systems that are

00:02:17.651 --> 00:02:21.220
enabling us to have a conversation that
looks a lot like what we would have in

00:02:21.220 --> 00:02:25.720
person. That's sort of the thing that
we're modelling is I want to have a

00:02:25.720 --> 00:02:30.130
somewhat synchronous real time
communication over a span of weeks,

00:02:30.130 --> 00:02:35.240
months, years, resume it, and in the same
way that in real life I know someone and I

00:02:35.240 --> 00:02:38.500
recognize them when I come and talk to
them again I expect the system to give me

00:02:38.500 --> 00:02:41.810
similar sorts of properties.

00:02:41.810 --> 00:02:44.860
So the way
we're going to then think about systems is

00:02:44.860 --> 00:02:51.920
initially, we have systems that look very
much the same as how we would have a real

00:02:51.920 --> 00:02:59.630
life communication, where I can - on a
local network - use AirDrop or use a bunch

00:02:59.630 --> 00:03:04.310
of things that just work directly between
my device and a friend's device

00:03:04.310 --> 00:03:06.650
to communicate.

00:03:06.650 --> 00:03:09.870
On a computer, this might
look like using Netcat or a command line

00:03:09.870 --> 00:03:14.790
tool to just push data directly to the
other person. And this actually results in

00:03:14.790 --> 00:03:18.060
a form of communication that looks very
similar. Right, it's ephemeral, it goes

00:03:18.060 --> 00:03:24.450
away afterwards unless the other person
saves it. But there is already a set of

00:03:24.450 --> 00:03:27.260
adversaries or threats that we can think
about how do we secure this sort of

00:03:27.260 --> 00:03:30.240
communication?

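NOTE
A minimal sketch of the direct, Netcat-style push described above, assuming both ends run on one machine over the loopback interface.
The message text and all names are illustrative and not from the talk. Nothing here is encrypted, so anyone who can observe the link sees the same bytes.
```python
import socket
import threading
def receive_once(server, inbox):
    """Accept one connection and store everything the sender pushes."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the sender closed the connection
                break
            chunks.append(data)
        inbox.append(b"".join(chunks))
# "Bob" listens; port 0 asks the OS for any free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
inbox = []
listener = threading.Thread(target=receive_once, args=(server, inbox))
listener.start()
# "Alice" connects directly and pushes plaintext bytes.
with socket.create_connection(("127.0.0.1", port)) as alice:
    alice.sendall(b"hello Bob")
listener.join()
server.close()
```
The exchange is ephemeral in the sense the talk describes: once the script exits, the message survives only if the receiver chose to save it.
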
00:03:30.240 --> 00:03:34.590
One of those would be the
network. So, can someone else see this

00:03:34.590 --> 00:03:39.320
communication and how do we hide from
that? And we have mechanisms against that,

00:03:39.320 --> 00:03:43.910
namely encryption. Right, I can
disguise my communication and encrypt it

00:03:43.910 --> 00:03:49.600
so that someone who is not my intended
recipient cannot see what's happening.

00:03:49.600 --> 00:03:53.630
And then the other one would be the
other...these end devices themselves.

00:03:53.630 --> 00:03:57.570
Right, so there's a couple of things that
we need to think about when we think about

00:03:57.570 --> 00:04:00.330
what is it that we're trying to protect
against on an end device. One is there

00:04:00.330 --> 00:04:05.682
might be other bad software that
later gets installed and tries

00:04:05.682 --> 00:04:09.980
to steal or learn about what was said.

00:04:09.980 --> 00:04:12.240
Either at the same time or
afterwards.

00:04:12.240 --> 00:04:15.460
And so we have mechanisms
there. One of them would be message

00:04:15.460 --> 00:04:19.690
expiry. So we can make the messages go
away, make sure we delete them from disk

00:04:19.690 --> 00:04:25.460
at some point. And the other would be
making sure that we've sort of isolated

00:04:25.460 --> 00:04:28.810
our chats so that it doesn't overlap and
other applications can't see what's

00:04:28.810 --> 00:04:32.440
happening there.

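NOTE
A rough sketch of message expiry as described above, assuming an in-memory log and a fixed time-to-live.
ExpiringStore and its methods are hypothetical names; a real client would also have to erase the data from disk, which this does not attempt.
```python
import time
class ExpiringStore:
    """Hypothetical chat log that forgets messages after ttl_seconds."""
    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.messages = []  # list of (expires_at, text) pairs
    def add(self, text, now=None):
        now = time.time() if now is None else now
        self.messages.append((now + self.ttl_seconds, text))
    def purge(self, now=None):
        """Drop every expired message and return how many were removed."""
        now = time.time() if now is None else now
        before = len(self.messages)
        self.messages = [(t, m) for (t, m) in self.messages if t > now]
        return before - len(self.messages)
store = ExpiringStore(ttl_seconds=60)
store.add("see you at 8", now=0)    # expires at t=60
store.add("running late", now=50)   # expires at t=110
removed = store.purge(now=70)       # only the first message has expired
remaining = [text for (_, text) in store.messages]
```
Passing now explicitly keeps the example deterministic; in normal use purge() would simply run on a timer.
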
00:04:32.440 --> 00:04:36.470
So, we have these direct
communication patterns but that's a small

00:04:36.470 --> 00:04:42.801
minority of most of what we think of when
we chat. Instead, most of the systems that

00:04:42.801 --> 00:04:48.110
we're using online use a centralized
server. There's some logically centralized

00:04:48.110 --> 00:04:52.940
thing in the cloud and I send my messages
there and it then forwards them to my

00:04:52.940 --> 00:04:58.880
intended recipient. And so whether it's
Facebook or WhatsApp or Signal or sorry,

00:04:58.880 --> 00:05:05.110
Slack or IRC or Signal or Wire or Threema
or whatever, you know, cloud chat app

00:05:05.110 --> 00:05:12.740
we're using today, this same model
applies. So we can identify additional

00:05:12.740 --> 00:05:19.810
threats here and then we can think about
why we do this. So one threat is the

00:05:19.810 --> 00:05:24.150
network. And I'll tear that apart a little
bit. You've got the local network that we

00:05:24.150 --> 00:05:28.850
had before. So someone who's on the
network near the person who's sending

00:05:28.850 --> 00:05:33.280
messages or receiving messages, so someone
else in the coffee shop, your local

00:05:33.280 --> 00:05:38.910
organization, your school, your work,
you've got the Internet as a whole that

00:05:38.910 --> 00:05:44.770
messages are passing over. So the ISPs or
the countries that you're in may want to

00:05:44.770 --> 00:05:48.840
look at or prevent you from sending
messages. You've also got an adversary in

00:05:48.840 --> 00:05:54.040
the network, sort of local or near the
server that can see most of the messages

00:05:54.040 --> 00:05:57.680
going in and out of the server because
these services have to exist somewhere be

00:05:57.680 --> 00:06:04.200
that in a data center that they physically
have computers in or in AWS or Google or

00:06:04.200 --> 00:06:08.530
one of these other clouds. And now you've
got a set of actors that you need to think

00:06:08.530 --> 00:06:11.530
about that are near the server that can
see most of the traffic going in and out

00:06:11.530 --> 00:06:13.526
of that server.

00:06:14.600 --> 00:06:17.610
We also have to think
about the server itself as a potential

00:06:17.610 --> 00:06:21.870
adversary. There's a few different threats
that we need to think about. The server

00:06:21.870 --> 00:06:27.060
could get threatened... could get hacked
or otherwise compromised. So parts of

00:06:27.060 --> 00:06:32.450
the communication or bugs in the software
can potentially be a problem.

00:06:32.450 --> 00:06:34.060
You've got a

00:06:34.060 --> 00:06:39.370
legal entity typically that is running
this server. And so the jurisdiction

00:06:39.370 --> 00:06:44.180
that it's in can send requests to get data
from users or to compel it to provide

00:06:44.180 --> 00:06:49.050
information. So there's this whole threat
of what is the server required to turn

00:06:49.050 --> 00:06:55.210
over. And then you've got sort of how is
the server actually or this company making

00:06:55.210 --> 00:06:58.690
money and sustaining itself. Is it going
to get acquired by someone that you don't

00:06:58.690 --> 00:07:02.750
trust, even if you trust it now? So
there's this future view of how do we

00:07:02.750 --> 00:07:08.809
ensure that the messages I have now don't
get misused in the future?

00:07:08.809 --> 00:07:10.339
And we have a

00:07:10.339 --> 00:07:14.370
set of techniques that mitigate these
problems as well. So one of them would

00:07:14.370 --> 00:07:18.980
be we can use traffic obfuscation or
circumvention techniques to make our

00:07:18.980 --> 00:07:25.919
traffic look less obvious to the network.
And that prevents a large amount of these.

00:07:25.919 --> 00:07:29.210
And then, I'm calling this server hardening
but it's really a sort of a broad set of

00:07:29.210 --> 00:07:34.130
techniques around how do we trust the
server less? And how do we make those

00:07:34.130 --> 00:07:38.930
potential compromises of the server,
either code based or it having to reveal

00:07:38.930 --> 00:07:42.940
information less damaging?

00:07:44.080 --> 00:07:46.680
It's worth
saying that there are a bunch of reasons

00:07:46.680 --> 00:07:50.760
why we have primarily used centralized
messaging.

00:07:50.760 --> 00:07:53.290
You've got availability. It's

00:07:53.290 --> 00:07:58.260
very easy to go to a single place and it
also makes a bunch of problems like

00:07:58.260 --> 00:08:04.040
handling multiple devices and mobile push
easier, in particular because both Google and

00:08:04.040 --> 00:08:09.870
Apple expect or allocate sort of a single
authorized provider who can send

00:08:09.870 --> 00:08:16.270
notifications to the app user's mobile
devices. And so that sort of requires you

00:08:16.270 --> 00:08:20.040
to have a centralized place that knows
when to send those messages if you want to

00:08:20.040 --> 00:08:24.440
provide real time alerts to your
application users.

00:08:24.440 --> 00:08:25.830
The cons are

00:08:25.830 --> 00:08:32.130
both cost, in that there's some entity now
that is responsible for all of this cost

00:08:32.130 --> 00:08:35.769
and has to have a business model and also
that there is a single entity that people

00:08:35.769 --> 00:08:41.170
can come to and that now faces the legal
and regulatory issues.

00:08:41.170 --> 00:08:42.020
So this is not the

00:08:42.020 --> 00:08:46.610
only type of system we have, right? The
next most common is probably federated.

00:08:46.610 --> 00:08:52.810
Email is a great example of this. And
email is nice in that now as a user I can

00:08:52.810 --> 00:08:58.240
choose an email provider that I trust out
of many, or if I don't trust any of the

00:08:58.240 --> 00:09:02.830
ones that I see, I can even spin up my own
with a small group so we can decentralize

00:09:02.830 --> 00:09:09.550
cost. We can make this more approachable.
And so while I can gain more confidence in

00:09:09.550 --> 00:09:16.040
my individual provider, I don't have as
much trust in, you know, is the recipient,

00:09:16.040 --> 00:09:21.520
is Bob in this case, I don't know how
secure his connection is to his provider.

00:09:21.520 --> 00:09:26.220
Because we've separated and decentralized
that.

00:09:26.220 --> 00:09:27.890
There's also a bunch of problems,

00:09:27.890 --> 00:09:35.160
both in figuring out identity and
discovery securely and mobile push. But we

00:09:35.160 --> 00:09:38.530
have a number of successful examples of
this. So beyond email, the Fediverse and

00:09:38.530 --> 00:09:43.810
Mastodon, Riot chat and even SMS are
examples of federated systems where

00:09:43.810 --> 00:09:50.570
there's a bunch of providers and it's not
a single central place.

00:09:50.570 --> 00:09:52.990
As you continue

00:09:52.990 --> 00:09:57.870
this sort of metaphor of splitting apart
and decentralizing and reducing the trust

00:09:57.870 --> 00:10:01.510
in a single party, you end up with a set
of decentralized messaging systems as

00:10:01.510 --> 00:10:07.420
well. And so it's worth mentioning that as
we sort of get onto this fringe. There's

00:10:07.420 --> 00:10:11.430
sort of two types: One is using gossip
protocols. So things like Secure

00:10:11.430 --> 00:10:15.740
Scuttlebutt. And in those you connect to
either the people around you or people

00:10:15.740 --> 00:10:20.210
that you know. And when you get messages,
you gossip, you send them on to all of

00:10:20.210 --> 00:10:26.550
the people around you. And so messages
spread through the network. That is still

00:10:26.550 --> 00:10:33.050
an area where we are learning the tradeoff
of how much metadata gets leaked and

00:10:33.050 --> 00:10:41.450
things, but is nice in its level of
decentralization. The others basically

00:10:41.450 --> 00:10:47.610
try to make all of the users have some
relatively low trusted participation in

00:10:47.610 --> 00:10:52.390
the serving infrastructure. And so you can
think of this as evolving out of things

00:10:52.390 --> 00:10:57.610
like distributed hash tables that are
used in BitTorrent. You see something very

00:10:57.610 --> 00:11:05.760
similar in things like Ricochet or
tox.chat, which will use either Tor-like

00:11:05.760 --> 00:11:10.740
relays for sending messages or have an
explicit DHT for routing where all of the

00:11:10.740 --> 00:11:15.480
members provide some amount of lookup to
help with discovery

00:11:15.480 --> 00:11:18.424
and finding other participants.

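NOTE
A small sketch of how a DHT can decide which members answer a lookup, assuming Kademlia-style XOR distance over SHA-256 IDs.
The talk does not say which metric these systems use, so the metric, the node names and the key are all assumptions for illustration.
```python
import hashlib
def dht_id(name):
    """Map a node name or lookup key to a 256-bit integer ID."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")
def closest_nodes(key, nodes, k=2):
    """Return the k nodes whose IDs are nearest to the key under XOR distance."""
    target = dht_id(key)
    return sorted(nodes, key=lambda n: dht_id(n) ^ target)[:k]
nodes = ["node-a", "node-b", "node-c", "node-d"]
# The members that would hold the lookup record for this identifier.
responsible = closest_nodes("bob@example.org", nodes)
```
Every member applies the same rule, so any participant can work out whom to ask without a central directory.
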
00:11:19.810 --> 00:11:24.870
OK, so let's now turn to
some of these mechanisms that we've

00:11:24.870 --> 00:11:31.160
uncovered and we can start with
encryption. So when you're sending

00:11:31.160 --> 00:11:37.030
messages to a server by default, there's
no encryption. This is things like IRC.

00:11:37.030 --> 00:11:43.300
Email used to be primarily unencrypted and
you can think of that like a postcard. So

00:11:43.300 --> 00:11:47.560
you've got a letter or a postcard in this
case that you're sending. It has where

00:11:47.560 --> 00:11:53.060
that message is coming from, where it's
going to and the contents. In contrast,

00:11:53.060 --> 00:11:58.059
when you use transport encryption -- and
so this is now a standard for most of the

00:11:58.059 --> 00:12:01.080
centralized things. What that means is
you're taking that postcard and you're

00:12:01.080 --> 00:12:06.200
putting it in an envelope that the network
can't open. And that's what TLS and other

00:12:06.200 --> 00:12:11.709
forms of transport encryption are going to
give you, is the network link just sees

00:12:11.709 --> 00:12:15.790
the source and destination. It sees there's
a message coming between Alice and

00:12:15.790 --> 00:12:19.779
Facebook or whatever cloud provider, but
can't look into that and see that that's

00:12:19.779 --> 00:12:23.950
really a message for Bob or what's being
said. It just sees individuals

00:12:23.950 --> 00:12:30.570
communicating with that cloud provider.
And so, you know, SMTPS, there are secure

00:12:30.570 --> 00:12:35.570
versions of IRC and e-mail and most other
protocols are using transport security at

00:12:35.570 --> 00:12:41.730
this point. The thing that we have now is
called end-to-end encryption or E2E, and so

00:12:41.730 --> 00:12:48.880
now the difference here is the message
that Alice is sending is addressed to Bob.

00:12:48.880 --> 00:12:53.640
And it's encrypted so that the provider
Facebook can't open that either and can't

00:12:53.640 --> 00:13:00.370
look at the contents. OK? So the network
just sees a message going between Alice

00:13:00.370 --> 00:13:04.270
and Facebook still, but Facebook can't
open that and actually see the contents of

00:13:04.270 --> 00:13:11.690
the message. And so end-to-end encryption
has gained pretty widespread adoption. We

00:13:11.690 --> 00:13:16.330
have this in Signal, for the most part in
iMessage, we have tools like PGP and GPG

00:13:16.330 --> 00:13:21.040
that are implementing forms of this. For
messaging there's a few that are worth

00:13:21.040 --> 00:13:26.350
sort of covering in the space: the Signal
protocol, which was initially called

00:13:26.350 --> 00:13:34.420
Axolotl, is adopted in WhatsApp, in
Facebook private messaging and sort of

00:13:34.420 --> 00:13:43.170
is... I guess it has generalized into
something called the Noise framework and

00:13:43.170 --> 00:13:50.161
is gaining a lot of adoption. OMEMO looks
a lot like that specifically for XMPP, and

00:13:50.161 --> 00:13:56.310
so it is a specific implementation. The
other one is called Off-The-Record or OTR

00:13:56.310 --> 00:14:03.519
and Off-The-Record sort of developed a
little bit ... or independently from this,

00:14:03.519 --> 00:14:10.480
thinks a lot about deniability. I'm not
going to go too deep into the specific

00:14:10.480 --> 00:14:14.649
nits of what these protocols are doing,
but I guess the intuition is the hard

00:14:14.649 --> 00:14:20.240
part here is not encrypting a message,
but rather the hard part is how do you

00:14:20.240 --> 00:14:24.170
send that first message and establish a
session, especially if the other person is

00:14:24.170 --> 00:14:28.160
offline. So I want to start a
communication. I type in the first message

00:14:28.160 --> 00:14:32.410
I'm sending to someone. I need to somehow
get a key and then send a message that

00:14:32.410 --> 00:14:37.531
only that person can read and also
establish this sort of shared secret. And

00:14:37.531 --> 00:14:41.529
doing all of that in one message or with
the other device not online ends up being

00:14:42.055 --> 00:14:48.374
tricky. Additionally, figuring out the
mapping between a user and their devices,

00:14:48.374 --> 00:14:53.437
especially as that changes and making sure
you've appropriately revoked devices,

00:14:53.437 --> 00:14:59.210
added new devices without keys falling
over or getting too many warnings to the

00:14:59.210 --> 00:15:04.609
user
ends up being a lot of the trick in these

00:15:05.207 --> 00:15:15.463
systems. There's two problems that sort of
come into play when we start using end-to-end encryption.

00:15:15.463 --> 00:15:20.049
One is we need to think about connection
establishment. So this is the problem

00:15:20.049 --> 00:15:27.140
of saying who is Bob? So I find a
contact and I know them in some way by an

00:15:27.140 --> 00:15:33.880
email address, by a phone number. Signal
uses phone numbers. You know, a lot of

00:15:33.880 --> 00:15:38.470
systems maybe use an email address.
There's things like Threema that use a

00:15:38.470 --> 00:15:42.280
unique identifier that they generate for
you. But somehow I have to go from that

00:15:42.319 --> 00:15:47.727
identifier to some actual key or some
knowledge of a cryptographic secret

00:15:47.727 --> 00:15:51.480
that identifies the other person. And I
have to figure out who I trust to do that

00:15:51.480 --> 00:15:59.024
mapping of gaining this thing that I'm
now using for encryption. And then also

00:15:59.080 --> 00:16:04.486
there's this "Well, how do we match?" So a
lot of systems do this by uploading your

00:16:04.486 --> 00:16:10.420
address book or trying to match with
existing contacts to solve the user-

00:16:10.455 --> 00:16:16.150
interface problem of discovery, which is:
If they can already know the identifiers

00:16:16.275 --> 00:16:20.179
and have this mapping, then when someone
new comes in they can suggest and have

00:16:20.256 --> 00:16:24.760
"prefound" these keys and you just sort of
trust the server to hold this address book

00:16:24.972 --> 00:16:28.500
and to do this mapping between what
they're using as their identifier and

00:16:28.709 --> 00:16:33.860
the keys themselves that you're getting
out. Signal is nice here, it says it's not

00:16:34.114 --> 00:16:38.850
uploading your contacts, which is true.
They're uploading hashes of your phone

00:16:38.850 --> 00:16:43.430
number rather than the actual phone
numbers. But it's a similar thing.

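NOTE
A rough sketch of hash-based contact discovery as described here, assuming SHA-256 and a plain dictionary as the server's directory.
The phone numbers, the key string and the function names are made up, and this is not Signal's actual implementation.
Phone numbers come from a small space, so a hash like this hides little from a server that chooses to enumerate them.
```python
import hashlib
def hash_number(number):
    """Hash a phone number so the raw digits are not sent to the server."""
    return hashlib.sha256(number.encode()).hexdigest()
# Server side: directory from hashed number to the registered public key.
directory = {hash_number("+15550100"): "bob-public-key"}
def lookup(number):
    """Client side: search by hash and get back whatever key the server returns."""
    return directory.get(hash_number(number))
found = lookup("+15550100")
missing = lookup("+15550199")
```
The client still has to trust that the key it gets back really belongs to the number it asked about.
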
00:16:43.430 --> 00:16:48.910
They've got a directory of known phone
numbers. And then as people search, you'll

00:16:49.041 --> 00:16:54.680
search for a hash of the phone number and
get back, you know, the key that you hope

00:16:54.680 --> 00:17:01.470
Signal has correctly given you. So there's
sort of a couple of ways that you reduce

00:17:01.661 --> 00:17:09.850
your trust here. Signal has been going
down a path using SGX to raise the cost of

00:17:09.951 --> 00:17:16.408
attacks, oblivious RAM and a bunch of sort
of systems mechanisms to reduce the

00:17:16.408 --> 00:17:22.319
costs... or increase the cost of attack
against their discovery mechanism. The

00:17:22.361 --> 00:17:27.089
other way that you do this is you allow
for people to use pseudonyms or anonymous

00:17:27.089 --> 00:17:32.439
identifiers. So with Wire you can just register
with an anonymous email address. And now the

00:17:32.439 --> 00:17:37.710
cost to you is potentially less if that
gets compromised. And it's worth noting

00:17:37.790 --> 00:17:42.950
Moxie will be talking tomorrow at 4:00
p.m. about the evolution of the space

00:17:42.950 --> 00:17:50.114
around Signal, so there's probably a bunch
more depth there that you can expect. So

00:17:50.180 --> 00:17:54.590
what if we don't want to trust the server
to do matchmaking? One of the early things

00:17:54.590 --> 00:18:00.400
that has been around is the web of trust
around GPG. And this is the notion that,

00:18:00.400 --> 00:18:09.015
if I have in real life or otherwise
associated an identifier with a key, I can

00:18:09.015 --> 00:18:15.770
publicly provide a signed statement saying
that I trust that mapping and then people

00:18:15.830 --> 00:18:21.910
who don't know someone but have a link
socially maybe can find these proofs and

00:18:21.910 --> 00:18:27.460
use that to trust this mapping. So I know
an identifier and I know that I trust

00:18:27.460 --> 00:18:32.429
someone who has said, well, this is the
key associated with that identifier and I

00:18:32.429 --> 00:18:37.360
can use that network to eventually find an
identifier that I'm willing to trust

00:18:37.360 --> 00:18:44.226
or a key that I'm willing to encrypt to.
There's some user interface tradeoff here.

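NOTE
A minimal sketch of the web-of-trust idea above: searching for a chain of signed statements from me to a key I want to use.
It only models the graph; real GPG also verifies each signature cryptographically and weighs trust levels. All names are hypothetical.
```python
from collections import deque
def trust_path(signed, me, target):
    """Breadth-first search for a chain of signatures from me to target."""
    queue = deque([[me]])
    seen = {me}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for vouched in signed.get(path[-1], []):
            if vouched not in seen:
                seen.add(vouched)
                queue.append(path + [vouched])
    return None  # nobody I trust, directly or indirectly, vouches for target
# signed[x] lists the people whose identifier-to-key mapping x has signed.
signed = {"alice": ["carol"], "carol": ["bob"], "mallory": ["bob"]}
path = trust_path(signed, "alice", "bob")
no_path = trust_path(signed, "alice", "mallory")
```
Here Alice reaches Bob's key through Carol, while Mallory's signature is ignored because nothing links Alice to Mallory.
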
00:18:44.226 --> 00:18:49.960
This is a manual process in general. And
this year we've had a set of denial-of-

00:18:49.960 --> 00:18:56.070
service attacks on the web-of-trust
infrastructure. And so the specific

00:18:56.070 --> 00:19:03.830
attack is that anyone can upload these
attestations of trust, and so if a bunch

00:19:03.830 --> 00:19:08.140
of random users or sybils start uploading
trusts, when you go to try and download

00:19:08.140 --> 00:19:12.480
this, you end up overwhelmed by the amount
of information. And so the system does not

00:19:12.480 --> 00:19:17.330
scale because it's very hard to filter to
people you care about without telling the

00:19:17.330 --> 00:19:20.200
system who you care about and revealing
your network, which you're trying to

00:19:20.200 --> 00:19:29.180
avoid. Keybase takes another approach.
They made the observation that when I go

00:19:29.180 --> 00:19:34.630
to try and talk to someone, what I
actually care about is the person that I

00:19:34.630 --> 00:19:40.520
believe owns a specific GitHub or Twitter
or other social profile. And so I can

00:19:40.520 --> 00:19:44.870
provide an attestation where I say: "Well,
this is a key that's associated with the

00:19:44.870 --> 00:19:50.540
account that controls this Twitter account
or this Reddit account or this, you know,

00:19:50.540 --> 00:19:55.890
Facebook account." And so by having that
trust of proofs, I can connect an

00:19:55.890 --> 00:20:00.350
individual and a cryptographic identity
with the person behind who has the

00:20:00.350 --> 00:20:08.160
passwords to a set of other systems.
Keybase also this year began to provide a

00:20:08.160 --> 00:20:13.357
monetary incentive for users and then
struggled with the number of sign ups. And

00:20:13.357 --> 00:20:17.150
so there's a lot of work in figuring out:
"OK, do these identities actually

00:20:17.150 --> 00:20:21.920
correspond to real people and how do you
prevent a similar denial-of-service-style

00:20:21.977 --> 00:20:30.910
attack that the web of trust faced in
identifying things here?" On our devices,

00:20:30.910 --> 00:20:37.655
we end up in general resorting to a
concept called TOFU or Trust-On-First-Use,

00:20:37.655 --> 00:20:43.010
and what that means is when I first see a
key that identifies someone, I'll save

00:20:43.010 --> 00:20:47.760
that. And if I ever get another need to
communicate with that person again, I've

00:20:47.760 --> 00:20:50.850
already got a key and I can keep using
that same key and expect that key to stay

00:20:50.850 --> 00:20:56.419
the same. And so that continuation
and the ability to pin keys once you've

00:20:56.419 --> 00:21:00.790
seen them means that if, when you first
establish a connection with someone, it's

00:21:00.790 --> 00:21:04.750
the real person, then someone who
compromises them later can't take over or

00:21:04.750 --> 00:21:14.360
change that. Finally, one of the sort of
exciting things that came out - this is

00:21:14.360 --> 00:21:21.049
circa 2015 and is largely defunct now -
was a system by Adam Langley called Pond

00:21:21.049 --> 00:21:27.790
that looked at hardening a modern version
of email. And one of the things that Pond

00:21:27.790 --> 00:21:33.470
did was it had something called a password
authenticated key exchange. And so this is

00:21:33.470 --> 00:21:40.220
an evolving cryptographic area where
you're saying if two people can start with

00:21:40.220 --> 00:21:48.140
some weak shared secret - So I can perhaps
publicly or in plain text ask the

00:21:48.140 --> 00:21:53.483
other person a challenge: "Where were
we on a specific day?" And so now we both

00:21:53.483 --> 00:21:57.450
know something that maybe has a few bits
of entropy, at least. If we can write the

00:21:57.915 --> 00:22:04.700
same textual answer, we can take that, run
a key derivation function to end up with a

00:22:04.700 --> 00:22:09.620
larger amount of shared entropy and use
that as a bootstrapping method to do a key

00:22:09.620 --> 00:22:13.120
exchange and end up finding a strong
cryptographic identity for the other

00:22:13.120 --> 00:22:22.049
person. So Pond has a system that they
call Panda for linking two individuals

00:22:22.049 --> 00:22:25.960
based on a challenge response and this is
also something that you'll find in off-

00:22:25.960 --> 00:22:32.309
the-record systems around Jabber. The
other thing that we need to be careful

00:22:32.309 --> 00:22:37.750
about in end-to-end-encrypted systems is
deniability. When I'm chatting one on one

00:22:37.750 --> 00:22:46.049
with someone, that conversation is
essentially fairly deniable. Either

00:22:46.049 --> 00:22:49.929
person can have their recollection of what
happened and there's no proof that the

00:22:49.929 --> 00:22:55.220
other person said something unless you've
recorded it or otherwise, you know,

00:22:55.220 --> 00:22:58.490
brought some other technology into play.
But with an encrypted thing where I've

00:22:58.490 --> 00:23:02.790
authenticated the other person, I end up
with a transcript - potentially - that,

00:23:02.790 --> 00:23:09.400
you know, I can turn over later and say,
look, this person said this. And, you

00:23:09.400 --> 00:23:13.419
know, we've seen recently that things like
emails that come out are authenticated in

00:23:13.419 --> 00:23:20.610
this way. The DKIM system that
authenticates email senders showed up in

00:23:20.610 --> 00:23:26.554
the WikiLeaks releases of Hillary
Clinton's emails and was able to say:

00:23:26.554 --> 00:23:30.099
"Look, the text in these hasn't been
changed." And it was signed by the real

00:23:30.099 --> 00:23:36.351
server that we would expect. So the thing
that we get from Off-The-Record and the

00:23:36.351 --> 00:23:42.076
Signal protocol is something called
deniability or repudiability. And this

00:23:42.076 --> 00:23:48.004
plays into a concept of forward secrecy,
which is: We're going to sort of throw

00:23:48.004 --> 00:23:54.590
away stuff afterwards in a way that our
chat goes back to being more ephemeral.

00:23:54.590 --> 00:23:57.620
And so we can think about this in two
ways. There's actually two properties that

00:23:57.620 --> 00:24:03.470
interlink in this: We have keys that we're
using to form our shared session that

00:24:03.470 --> 00:24:10.620
we're expecting to use to have our secret
message. And each time I send a message,

00:24:10.814 --> 00:24:15.980
I'm going to also provide some new key
material and begin changing that secret

00:24:16.152 --> 00:24:21.540
key that we're using. So I provide a next
key. And when Bob replies, he's going to

00:24:21.540 --> 00:24:26.645
now use my next key as part of that and
give me his next key. And the other thing

00:24:26.645 --> 00:24:31.295
that I can then do is when I send a
message, I can provide the secret bit of

00:24:31.295 --> 00:24:35.220
my previous key. So I can say: "My last
private key that I used to send you that

00:24:35.220 --> 00:24:41.190
previous message was this." And now at the
end of our conversation, we both know all

00:24:41.190 --> 00:24:46.030
of the private keys such that we both
could have created that whole conversation

00:24:46.030 --> 00:24:53.610
on our own computer. At any given time,
it's only the most recent message that

00:24:53.610 --> 00:24:57.190
could only have been sent by the
other person and the rest of the

00:24:57.190 --> 00:25:03.570
transcript that you have is something you
could have generated yourself. There is a

00:25:03.570 --> 00:25:07.980
talk on day three about Off-The-Record v4,
the fourth version of that, that will go

00:25:07.980 --> 00:25:13.650
deeper into that, that's at 9:00 p.m. in
the about:freedom assembly. So I encourage

00:25:13.650 --> 00:25:20.090
you to do that if you're interested in
this. OK. The next one to talk about is

00:25:20.090 --> 00:25:28.150
expiry. This is sort of a follow on to
this concept of forward secrecy. But

00:25:28.178 --> 00:25:31.720
there's sort of two attacks here to
consider. One is something that we should

00:25:31.720 --> 00:25:37.299
maybe, I guess, give credit to Snapchat
for popularizing, which is this concept of

00:25:37.299 --> 00:25:42.200
"the message goes away after some amount
of time". And really, this is protecting

00:25:42.200 --> 00:25:46.289
against not fully trusting the other
person from like sharing it later or

00:25:46.289 --> 00:25:51.419
sharing in a way you didn't
intend. And this is also like a snapshot

00:25:51.419 --> 00:25:57.190
adversary. So a bunch of apps will alert
the other participant if you take a

00:25:57.190 --> 00:26:01.610
screenshot. This is why some apps will
blank the screen when they go to the task

00:26:01.610 --> 00:26:07.919
switcher. So if you're swapping between
apps, you'll see that some of your

00:26:07.919 --> 00:26:12.210
applications will just show a blank screen
or will not show contents. And that's

00:26:12.210 --> 00:26:16.030
because the mobile operating system APIs
don't tell them when you're in that mode

00:26:16.030 --> 00:26:19.090
when you take a screenshot and so they
want to just be able to notify you if the

00:26:19.090 --> 00:26:23.500
other person does. It's worth noting that
this is all just raising the cost of these

00:26:23.500 --> 00:26:27.720
attacks and providing sort of a social
incentive not to, right. I can still use

00:26:27.720 --> 00:26:31.740
another camera to take a picture of my
phone and get evidence of something that

00:26:31.740 --> 00:26:39.200
has been said. But it's discouraging it
and setting social norms. The other reason

00:26:39.200 --> 00:26:44.289
for expiry is: After the fact, a
compromise of a device, so whether that's

00:26:44.289 --> 00:26:49.190
- you know, someone gets hold of the
device and tries to do forensic analysis

00:26:49.190 --> 00:26:54.539
to pull off previous messages or the chat
database or whether someone tries to

00:26:54.539 --> 00:26:59.770
install an application that then scans
through your phone... So Fengcai is

00:26:59.770 --> 00:27:05.549
an application that's been installed as a
surveillance app in China. And this also

00:27:05.549 --> 00:27:10.006
boils down to a user interface and user
experience question, which is: how long

00:27:10.006 --> 00:27:13.480
are you going to save logs, how much
history are you going to save and what

00:27:13.480 --> 00:27:19.493
norms are you going to have? And there's
a tradeoff here. It's useful

00:27:19.493 --> 00:27:24.549
sometimes to scroll back. And especially
for companies that believe that they have

00:27:24.549 --> 00:27:31.560
value added services around being able to
do data analytics on your chat history.

00:27:31.560 --> 00:27:40.140
They're wary of getting rid of that. The
next thing that we have is isolation and

00:27:40.140 --> 00:27:47.650
OS sandboxing. Right. So a lot of
this is up one layer, which is what is the

00:27:47.650 --> 00:27:53.049
operating system doing to secure your
application, your chat system from the

00:27:53.049 --> 00:27:58.650
other things, the malware or the
compromises of the broader device that

00:27:58.650 --> 00:28:06.750
it's running on. We have a bunch of
projects around us at Congress that are

00:28:06.750 --> 00:28:11.440
innovating on this. There are chat systems
that also attempt to do this sort of on

00:28:11.440 --> 00:28:16.270
their own. One sort of extreme example is
called Tinfoil Chat, which makes use of

00:28:16.270 --> 00:28:21.240
three devices and a physical diode which
is designed to have one device that is

00:28:21.240 --> 00:28:25.600
sending messages and another device that
is receiving messages. And the thought is:

00:28:25.600 --> 00:28:30.559
if you receive a message that somehow
compromises the device, the malware or the

00:28:30.559 --> 00:28:36.580
malicious file can never get any
communication back out and so becomes much

00:28:36.580 --> 00:28:41.850
less valuable to have compromised. And
they implement this with like a physical

00:28:41.850 --> 00:28:53.690
hardware diode. The other side of this is
recovery and backups. Which is you've got

00:28:53.690 --> 00:29:01.169
a user experience tradeoff between a lot
of people losing their devices and wanting

00:29:01.169 --> 00:29:04.960
to get back their contact list or their
chat history and the fact that now you're

00:29:04.960 --> 00:29:08.120
keeping this extra copy and have this
additional place for things to get

00:29:08.120 --> 00:29:15.290
compromised. Apple has done a lot of work
here that we don't look at so much. They

00:29:15.290 --> 00:29:19.640
gave a Black Hat talk a few years ago where
they discuss how they use custom hardware

00:29:19.640 --> 00:29:25.380
security modules in their data centers,
much like the T2 chip in end devices,

00:29:25.380 --> 00:29:30.990
that will hold the backup keys that get
used for their iCloud backups and do

00:29:30.990 --> 00:29:36.870
similar amounts of rate limiting. And they
consider a set of - a pretty wide set of

00:29:36.870 --> 00:29:40.530
adversaries - more than we might expect.
So including things like what happens when

00:29:40.530 --> 00:29:46.520
the government comes and asks us to write
new software to compromise this? And so

00:29:46.520 --> 00:29:51.840
they set up their HSMs such that they
cannot provide software updates to them,

00:29:51.840 --> 00:29:56.900
which is, you know, a sort of a step of
how do you do this cloud security side

00:29:56.900 --> 00:30:03.650
that we don't think about as much. So
there's a set of slides that you can find

00:30:03.650 --> 00:30:09.110
from this. And these slides will be
online, too, as a pointer to look at

00:30:09.110 --> 00:30:14.380
their solution, which considers a large
number of adversaries that you might not

00:30:14.380 --> 00:30:28.220
have thought about. So traffic obfuscation
is primarily a network side adversary. The

00:30:28.220 --> 00:30:31.799
technique that is getting used as sort of
what people are using if they feel they

00:30:31.799 --> 00:30:37.350
need to do this, is something called
domain fronting. Domain fronting had

00:30:37.350 --> 00:30:42.510
its heyday maybe in 2014-ish and has
become somewhat less effective, but it's

00:30:42.510 --> 00:30:50.110
still effective enough for most of the
chat things. The basic idea behind domain

00:30:50.110 --> 00:30:55.440
fronting is that there's a separation of
layers between that envelope and the

00:30:55.440 --> 00:31:02.240
message inside of it that we get in HTTP
in the Web. So when I create a secure

00:31:02.240 --> 00:31:09.059
connection to a CDN to a content provider
like Amazon or Google or Microsoft, I can

00:31:09.059 --> 00:31:14.030
make that connection and perform the
security layer and provide a fairly

00:31:14.030 --> 00:31:19.100
generic service that I'm connecting to. I
just want to establish a secure connection

00:31:19.100 --> 00:31:23.580
to Cloudflare. And then once I've done
that, the message that I can send inside

00:31:23.580 --> 00:31:27.399
can be a chat message to a specific
customer of that CDN or that cloud

00:31:27.399 --> 00:31:35.440
provider. And so this is an effective way
to prevent the network from knowing what

00:31:35.440 --> 00:31:41.659
specific service you're accessing. It got
used for a bunch of circumvention things.

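NOTE
A minimal sketch of the layering described above, assuming hypothetical
host names (cdn.example.com as the front, chat.example.net as the hidden
service). It opens no connection and only builds the two values involved:
the name the network sees in the TLS handshake, and the Host header that
travels inside the encrypted channel.
```python
from dataclasses import dataclass
@dataclass
class FrontedRequest:
    sni: str              # visible to the network during the TLS handshake
    http_request: bytes   # sent only inside the encrypted channel
def build_fronted_request(front_domain: str, hidden_host: str, path: str = "/") -> FrontedRequest:
    # The TLS layer names only the generic front; the Host header names the real service.
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    return FrontedRequest(sni=front_domain, http_request=request)
req = build_fronted_request("cdn.example.com", "chat.example.net")
print(req.sni)
print(req.http_request.decode("ascii").splitlines()[1])
```
Whether a given CDN still routes on the inner Host header is up to that
provider, which is why the technique has become less reliable.
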
00:31:41.659 --> 00:31:45.620
It then got used for a bunch of malware
things and this caused a bunch of the

00:31:45.620 --> 00:31:52.480
cloud providers to stop allowing you to do
this. But it's still getting used. This is

00:31:52.480 --> 00:31:56.330
still sort of what's happening when you turn
on certain censorship circumvention in

00:31:56.330 --> 00:32:01.770
Signal, it's what Telegram is using for
the most part. And the same basic

00:32:01.770 --> 00:32:08.290
technique is getting another revival with
DNS over HTTPS and encrypted SNI

00:32:08.290 --> 00:32:15.300
extensions to TLS which allow for a
standardized approach to establish a

00:32:15.300 --> 00:32:19.760
connection to a service without providing
any specific identifiers to the network

00:32:19.760 --> 00:32:26.159
for which service you want to connect to.
It's worth sort of mentioning that

00:32:26.159 --> 00:32:33.640
probably the most active chat service for
this sort of obfuscation or circumvention

00:32:33.640 --> 00:32:39.380
is Telegram, which has a bunch of users in
countries that are not fans of having lots

00:32:39.380 --> 00:32:44.929
of users of Telegram. And so they have
both systems where they can bounce between

00:32:44.929 --> 00:32:49.343
IPs very quickly and change where their
servers appear to be. And they've also

00:32:49.343 --> 00:32:55.299
used techniques like sending messages over
DNS tunnels to mitigate some of these

00:32:55.299 --> 00:33:01.510
censorship things. From the provider's
perspective, this is really about accessing

00:33:01.510 --> 00:33:05.700
their user population. They're not really
thinking about your local network or

00:33:05.700 --> 00:33:09.220
caring about that as much as
they are like, oh, there's millions of

00:33:09.220 --> 00:33:16.570
users that should probably still have
access to us. So we can maybe hide the

00:33:16.570 --> 00:33:21.809
characteristics of traffic in terms of
what specific service we're connecting to.

00:33:21.809 --> 00:33:25.840
There's some other things about traffic,
though, that also are revealing to the

00:33:25.840 --> 00:33:28.990
network. And this is sort of this
additional metadata that we need to think

00:33:28.990 --> 00:33:36.039
about. So one of these is padding or the
size of messages can be revealing. So one

00:33:36.039 --> 00:33:39.350
sort of immediate thing is the size of a
chat or a text message is going to be very

00:33:39.350 --> 00:33:45.700
different from the size of an image or
voice or movies. And you see this on

00:33:45.700 --> 00:33:49.059
airplanes or in other bandwidth limited
settings: they might allow text messages

00:33:49.059 --> 00:33:56.270
to go through, but images won't. There's
been research that shows, for instance, on

00:33:56.270 --> 00:34:02.840
voice, even if I encrypt my voice, we've
actually gotten really good at compressing

00:34:02.840 --> 00:34:07.580
audio of human speech. So much so that
different phonemes, different sounds that

00:34:07.580 --> 00:34:13.799
we make take up different sizes. And so I
can say something, compress it, encrypt it

00:34:13.799 --> 00:34:20.169
and then recover what was said based on
the relative sizes of different sounds. So

00:34:20.169 --> 00:34:25.240
there was a paper in 2011 at
Oakland S&amp;P that demonstrated this

00:34:25.240 --> 00:34:33.159
potential for attacks. And so what this is
telling us perhaps is that there's a

00:34:33.159 --> 00:34:39.639
tradeoff between how efficiently I want to
send things and how much metadata or

00:34:39.639 --> 00:34:44.760
revealing information for distinguishing
them I'm giving up. So I can use a less

00:34:44.760 --> 00:34:49.579
efficient compression that's constant bit
rate or that otherwise is not revealing

00:34:49.579 --> 00:34:52.469
this information, but it has higher
overhead and won't work as well in

00:34:52.469 --> 00:34:58.539
constrained network environments. The
other place this shows up is just when

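NOTE
One way to picture the padding tradeoff described above is a sketch that
rounds every message up to a small set of fixed sizes before encryption.
The bucket sizes and the 4-byte length prefix are assumptions made for
illustration, not something a particular messenger is known to use.
```python
BUCKETS = (256, 1024, 4096, 16384)  # hypothetical size classes in bytes
def pad(message: bytes) -> bytes:
    # 4-byte length prefix, then zero fill up to the smallest bucket that fits.
    body = len(message).to_bytes(4, "big") + message
    for size in BUCKETS:
        if len(body) <= size:
            return body + b"\x00" * (size - len(body))
    raise ValueError("message too large for any bucket")
def unpad(padded: bytes) -> bytes:
    # Read the length prefix and drop the fill bytes.
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]
short = pad(b"hi")
longer = pad(b"see you at the usual place at nine")
print(len(short), len(longer))
```
Both messages leave with the same length, at the cost of sending mostly
fill bytes for the short one. Padding has to happen before encryption,
so that the ciphertext length is what gets normalized.
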
00:34:58.539 --> 00:35:04.839
people are active. So if I can look at
when someone is tweeting or when messages

00:35:04.839 --> 00:35:10.785
are sent, I can probably figure out pretty
quickly what timezone they're in. Right.

00:35:10.785 --> 00:35:17.009
And so this leads to a whole set of these
metadata based attacks. And in particular,

00:35:17.009 --> 00:35:21.509
there's confirmation attacks and
intersection attacks. And so intersection

00:35:21.509 --> 00:35:26.780
attacks is looking at the relative
activity of multiple people and trying to

00:35:26.780 --> 00:35:32.519
figure out: OK, when Alice sent a message,
who else was online or active at the same

00:35:32.519 --> 00:35:37.190
time? And over time, can I narrow down or
filter to specific people that were likely

00:35:37.190 --> 00:35:45.269
who Alice was talking to? Pond also is a
service to look at or a system to look at

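NOTE
The intersection attack described above can be sketched as repeated set
intersection. The user names and the three observation rounds are made
up for illustration; a real observer would work from noisy timing data.
```python
def intersect_candidates(observations: list) -> set:
    # Each observation is the set of users seen active right after Alice sent a message.
    if not observations:
        return set()
    candidates = set(observations[0])
    for active in observations[1:]:
        candidates &= active
    return candidates
rounds = [
    {"bob", "carol", "dave", "erin"},
    {"bob", "dave", "frank"},
    {"bob", "erin", "frank"},
]
print(intersect_candidates(rounds))
```
Every extra round can only shrink the candidate set, which is why
long-lived conversations are hard to hide from an observer of activity.
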
00:35:45.269 --> 00:35:51.969
in this regard. Their approach was that a
client would hopefully always be online

00:35:51.969 --> 00:35:57.609
and would at a regular pattern check in
with the server with the same amount of

00:35:57.609 --> 00:36:01.980
data, regardless of whether there was a
real message to send or not. So that from

00:36:01.980 --> 00:36:07.089
the network's perspective, every user
looked the same. The downside being that

00:36:07.089 --> 00:36:12.579
you've now got this message being sent by
every client every minute or so and that

00:36:12.579 --> 00:36:19.300
creates a huge amount of overhead of, you
know, just padded data that doesn't have

00:36:19.300 --> 00:36:27.559
any meaning. So finally, I'll take a look
at server hardening and the things that

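NOTE
A rough sketch of the constant-rate check-in idea described above: every
check-in has the same size whether or not a real message is waiting. The
payload size, the flag byte and the function name are assumptions for
illustration and are not taken from Pond's actual wire format.
```python
from collections import deque
CHECKIN_SIZE = 1024  # hypothetical fixed payload size in bytes
def next_checkin(outbox: deque) -> bytes:
    # Flag byte 1 = real message, 0 = cover traffic; both are padded to the same size.
    if outbox:
        message = outbox.popleft()
        body = b"\x01" + len(message).to_bytes(2, "big") + message
    else:
        body = b"\x00"
    if len(body) > CHECKIN_SIZE:
        raise ValueError("message does not fit in one check-in")
    return body + b"\x00" * (CHECKIN_SIZE - len(body))
outbox = deque([b"hello"])
real = next_checkin(outbox)
cover = next_checkin(outbox)
print(len(real) == len(cover))
```
In a real system the payload would be encrypted before sending, so the
flag byte is not visible on the wire. The overhead mentioned in the talk
is every client sending CHECKIN_SIZE bytes on every tick.
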
00:36:27.559 --> 00:36:33.259
we're doing to reduce trust in the server.
There's a few examples of why we would

00:36:33.259 --> 00:36:37.759
want to do this. So one is that you've had
messaging servers, plenty of times, that

00:36:37.759 --> 00:36:46.690
have not been as secure as they claim. One
example being that there was a period

00:36:46.690 --> 00:36:52.739
where the Skype subsidiary in China was
using a blacklist of keywords on the

00:36:52.739 --> 00:36:57.779
server to either prevent or intercept some
subset of their users messages without

00:36:57.779 --> 00:37:03.650
telling anyone that they were doing that.
And then also just sort of this uncertain

00:37:03.650 --> 00:37:07.999
future of, OK, I trust the data now, but
what can we do so that I don't worry about

00:37:07.999 --> 00:37:14.890
what the corporate future of this service
entails for my data. One of the sort of

00:37:14.890 --> 00:37:20.670
elephants in the room is: the software
development is probably pretty

00:37:20.670 --> 00:37:25.339
centralized. So even if I don't trust the
server, there's some pretty small number

00:37:25.339 --> 00:37:29.180
of developers who are writing the code.
And how do I trust that the updates that

00:37:29.180 --> 00:37:33.229
they are making, either to the server
or to the client that they push to me,

00:37:33.229 --> 00:37:39.339
aren't reducing my security? Open source is
a great start to mitigating that, but it's

00:37:39.339 --> 00:37:45.581
certainly not solving all of this. So one
thing, one way we can think about how we

00:37:45.581 --> 00:37:49.749
reduce trust in the server is by looking
at what the server knows after end to end

00:37:49.775 --> 00:37:53.969
encryption. It knows things about the
size. It knows where the message is coming

00:37:53.969 --> 00:37:58.385
from. It knows where the message is going
to. Size: we've talked about some of these

00:37:58.385 --> 00:38:03.300
padding things that we can use to
mitigate. So how do we reduce the amount

00:38:03.300 --> 00:38:06.720
of information about sources and
destinations in this network graph that

00:38:06.720 --> 00:38:13.240
the server knows? So this is a concept
called linkability, which is being able to

00:38:13.240 --> 00:38:21.690
link the source and destination of a
message. We start to see some mitigations

00:38:21.690 --> 00:38:27.640
or approaches to reducing linkability
entering mainstream systems. So Signal has

00:38:27.640 --> 00:38:32.260
a system called "Sealed Sender" that you
can enable, where the source of the

00:38:32.260 --> 00:38:37.489
message goes within the encrypted
envelope. So that Signal doesn't see that.

00:38:37.489 --> 00:38:42.089
The downside being that Signal is still
seeing your IP address but the thought is

00:38:42.089 --> 00:38:46.559
that they will throw those out relatively
quickly and so they will have less logs

00:38:46.559 --> 00:38:53.099
about this source to destination.
Theoretically, though, there is a bunch of

00:38:53.099 --> 00:38:59.160
work in this. The first thing I'll point
to is a set of systems that we classify as

00:38:59.160 --> 00:39:07.819
mixnets. A mixnet works by having a set of
providers rather than a single entity

00:39:07.819 --> 00:39:12.800
that's running the servers. A bunch of
users will send messages to the first

00:39:12.800 --> 00:39:16.670
provider, which will shuffle all of them
and send them to the next provider, which

00:39:16.670 --> 00:39:20.640
will shuffle them again and send them to a
final provider that will shuffle them and

00:39:20.640 --> 00:39:25.599
then be able to send them to destinations.
And this de-links them, so that none of the

00:39:25.599 --> 00:39:31.519
individual providers know both the source
and destination of these messages. So this

00:39:31.519 --> 00:39:39.750
looks maybe a bit like Tor's onion routing,
but differs in sort of a couple of

00:39:39.750 --> 00:39:44.799
technicalities. One is typically, you will
wait for some number of messages rather

00:39:44.799 --> 00:39:49.719
than just going through with bandwidth and
low latency. And so by doing that, you can

00:39:49.719 --> 00:39:53.920
get a theoretical guarantee that this
batch had at least n messages that got

00:39:53.920 --> 00:39:58.400
shuffled and therefore you can prevent
there being some time where only one user

00:39:58.400 --> 00:40:05.400
was using the system. And so you got a
stronger theoretic guarantee. There's an

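NOTE
The batching step that separates a mix from low-latency onion routing can
be sketched as a threshold mix: hold messages until n have arrived, then
release them in random order. The class name and threshold are
hypothetical, and the per-hop decryption a real mixnet performs is left out.
```python
import secrets
class ThresholdMix:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.pool = []
    def submit(self, message: bytes) -> list:
        # Hold messages until the batch is full, then release them in random order.
        self.pool.append(message)
        if len(self.pool) < self.threshold:
            return []
        batch, self.pool = self.pool, []
        secrets.SystemRandom().shuffle(batch)
        return batch
mix = ThresholdMix(threshold=3)
print(mix.submit(b"a"))
print(mix.submit(b"b"))
print(sorted(mix.submit(b"c")))
```
Nothing leaves the mix until the threshold is reached, which gives the
batch-size guarantee at the price of added latency.
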
00:40:05.400 --> 00:40:09.779
active project making a messaging system
using mixnets called Katzenpost. They gave

00:40:09.779 --> 00:40:14.150
a talk at Camp this summer and I'd
encourage you to look at their website or

00:40:14.150 --> 00:40:22.679
go back to that talk to learn more about
mixnets. The project that I was, I guess,

00:40:22.679 --> 00:40:26.319
tangentially helping with is in a space
called private information retrieval,

00:40:26.319 --> 00:40:33.410
which is another technique for doing this
delinking. Private information retrieval

00:40:33.410 --> 00:40:37.559
frames the question a little bit
differently. And what it asks is: if I

00:40:37.559 --> 00:40:41.669
have a server that has a database of
messages and I want a client to be able to

00:40:41.669 --> 00:40:45.539
retrieve one of those messages without the
server knowing which message the client

00:40:45.539 --> 00:40:55.199
got or asked for. So this sounds maybe
hard. I can give you a straw man to

00:40:55.199 --> 00:40:59.150
convince yourself that this is doable and
the straw man is: I can ask the server for

00:40:59.150 --> 00:41:04.069
its entire database and then take the
message that I want and the server hasn't

00:41:04.069 --> 00:41:08.349
learned anything about which message I
cared about. But I spent a lot of network

00:41:08.349 --> 00:41:13.899
bandwidth probably doing that. So there's
a couple of constructions for this. I'm

00:41:13.899 --> 00:41:20.029
going to focus on the information
theoretic private information retrieval.

00:41:20.029 --> 00:41:24.920
And so we're going to use a similar setup
to what we had in our threat model for a

00:41:24.920 --> 00:41:29.680
mixnet, which is we've got a set of
providers now that have the same database.

00:41:29.680 --> 00:41:34.869
And I'm going to assume that they're not
all talking to each other or colluding. So

00:41:34.869 --> 00:41:40.200
I just need at least one of them to be
honest. And one of the things that we'll

00:41:40.200 --> 00:41:44.749
use here is something called the exclusive
or operation. To refresh your memory here

00:41:44.749 --> 00:41:50.711
exclusive or is a binary bitwise
operation. And the nice property that we

00:41:50.711 --> 00:41:55.949
get is if I xor something with itself, it
cancels out. So if I have some piece of

00:41:55.949 --> 00:42:02.970
data and I xor it against itself, it just
goes away. So if I have my systems that

00:42:02.970 --> 00:42:11.430
have the database, I can ask each one to
give me a superposition of some random

00:42:11.430 --> 00:42:17.249
subset of its database so I can ask the
first server: give me items 4, 11, 14 and

00:42:17.249 --> 00:42:23.549
20 XORed together. I'm assuming all of the
items are the same size so that you can do

00:42:23.549 --> 00:42:31.069
these xors. And then if I structure that,
it can appear to each server independently

00:42:31.069 --> 00:42:35.379
or as in the request that it sees that I
just ask for some random subset. But I can

00:42:35.379 --> 00:42:39.019
do that so that when I xor the things I
get back, everything just cancels out

00:42:39.019 --> 00:42:44.009
except the item that I care about. Unless
you saw all of the requests that I made,

00:42:44.009 --> 00:42:49.140
you wouldn't be able to tell which item I
cared about. So by doing this, I've

00:42:49.140 --> 00:42:53.949
reduced the network bandwidth. I'm only
getting one item-sized response back from every

00:42:53.949 --> 00:43:00.050
server. Now, you might have a
concern that I'm asking the server to do a

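NOTE
The XOR construction described above can be sketched for the simplest
case of two non-colluding servers holding the same database. The function
names and the toy 32-item database are assumptions for illustration. Each
server sees a uniformly random subset, yet the two answers XOR to exactly
the wanted item.
```python
import secrets
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))
def server_answer(database: list, subset: set) -> bytes:
    # The server XORs together every item whose index is in the requested subset.
    result = bytes(len(database[0]))
    for i in subset:
        result = xor_bytes(result, database[i])
    return result
def make_queries(n_items: int, wanted: int) -> tuple:
    # First query: uniformly random subset. Second: the same subset with `wanted` toggled.
    first = {i for i in range(n_items) if secrets.randbits(1)}
    second = first ^ {wanted}
    return first, second
database = [bytes([i]) * 8 for i in range(32)]  # 32 equal-sized items
q1, q2 = make_queries(len(database), wanted=20)
item = xor_bytes(server_answer(database, q1), server_answer(database, q2))
print(item == database[20])
```
Every index other than the wanted one appears in both subsets or in
neither, so it cancels. With more servers, all but one query can be
random and the last one chosen so the subsets still cancel down to the
single wanted index.
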
00:43:00.050 --> 00:43:03.720
whole lot of work here. It has to look
through its entire database and compute

00:43:03.720 --> 00:43:09.519
this superposition thing. And that seems
potentially like a lot of work, right. The

00:43:09.519 --> 00:43:14.660
thing that I think is exciting about this
space is it turns out this sort of

00:43:14.660 --> 00:43:19.499
operation of going out to a large database
and like searching for all of the things

00:43:19.499 --> 00:43:23.759
and then coming back with a small amount
of data looks a lot like the hardware that

00:43:23.759 --> 00:43:29.510
we're building for A.I. and for a bunch of
these sorts of search like things. And so

00:43:29.510 --> 00:43:34.479
this runs really quite well on a GPU where
I can have all of those thousands of cores

00:43:34.479 --> 00:43:38.719
compute little small parts of the XOR and
then pull back this relatively small

00:43:38.719 --> 00:43:43.160
amount of information. And so with GPUs,
you can actually have databases of

00:43:43.160 --> 00:43:50.920
gigabytes, tens of gigabytes of data and
compute these XORs across all of it in

00:43:50.920 --> 00:43:59.441
order of a millisecond or less. So a
couple of things in this space. "Talek" is

00:43:59.441 --> 00:44:03.960
the system that I helped with that
demonstrates this working. The converse

00:44:03.960 --> 00:44:08.900
problem is called private information
storage. And that one is how do I write an

00:44:08.900 --> 00:44:14.150
item into a database without the database
knowing which item I wrote? The

00:44:14.150 --> 00:44:20.209
mathematical construction there is not
quite as simple to explain. But there's a

00:44:20.209 --> 00:44:26.039
pretty cool new work in the last month or
two out of Dan Boneh and Henry Corrigan-Gibbs

00:44:26.039 --> 00:44:34.680
at Stanford called Express, with Saba
as first author, that is showing how to

00:44:34.680 --> 00:44:44.380
fairly practically perform that operation.
I'll finish just with a couple minutes on

00:44:44.380 --> 00:44:53.299
multiparty chat or group chat, so small
groups. You've sort of got a choice here

00:44:53.299 --> 00:44:58.029
in terms of how existing chat systems are
implementing group chat. One is you can

00:44:58.029 --> 00:45:01.759
not tell the server about the group. And
as someone who is part of the group, I

00:45:01.759 --> 00:45:05.729
just send the same message to everyone in
the group. And maybe I can tag it for them

00:45:05.729 --> 00:45:10.009
so that they know it's part of the group
or you do something more efficient where

00:45:10.009 --> 00:45:13.984
you tell the server about group membership
and I send the message once to the server

00:45:13.984 --> 00:45:22.829
and it sends it to everyone in the group.
Even if you don't tell the server about

00:45:22.829 --> 00:45:26.680
it, though, you've got a bunch of things
to worry about around leaked correlation,

00:45:26.680 --> 00:45:31.979
which is: if at a single time someone
sends the same sized message to five other

00:45:31.979 --> 00:45:35.640
people and then later someone else sends
the same sized message to five other

00:45:35.640 --> 00:45:39.360
people, and those basically overlap,
someone in the network basically knows who

00:45:39.360 --> 00:45:42.839
the group membership is. So it's actually
quite difficult to conceal group

00:45:42.839 --> 00:45:48.609
membership. The other thing that breaks
down is our concept of deniability once

00:45:48.609 --> 00:45:52.929
again, which is now if multiple people
have this log, even if each of them

00:45:52.929 --> 00:45:56.799
individually could have written it, the
fact that they have the same cryptographic

00:45:56.799 --> 00:46:04.329
keys from this other third party probably
means that third party made that message.

00:46:04.329 --> 00:46:13.119
So there continues to be work here. Signal
is working on providing, again, an SGX and

00:46:13.119 --> 00:46:16.510
centralized construction for group
management to be able to scale better,

00:46:16.510 --> 00:46:21.969
given I think the pretty realistic fact
that the server in these cases is probably

00:46:21.969 --> 00:46:25.689
going to be able to figure out group
membership in some cases, so you might as well

00:46:25.689 --> 00:46:32.019
make it scale. On the other side, one of
the cool systems that's being prototyped

00:46:32.019 --> 00:46:39.969
is called "Cwtch" out of Open Privacy.
And this is an extension to Ricochet that

00:46:39.969 --> 00:46:45.849
allows for offline messages and small
group chats. It works for order of 5 to 20

00:46:45.849 --> 00:46:50.700
people, and it works by having a server
that obliviously forwards on messages to

00:46:50.700 --> 00:46:55.599
everyone connected to it. So when I send a
message to a group, the server sends the

00:46:55.599 --> 00:46:59.430
message to everyone it knows about, not
just the people in the group, and

00:46:59.430 --> 00:47:03.609
therefore the server doesn't actually know
the subgroups that exist. It just knows

00:47:03.609 --> 00:47:10.549
who's connected to it. And that's a neat
way. It doesn't necessarily scale to large

00:47:10.549 --> 00:47:16.140
groups, but it allows for some concealing
of group membership. They've got an

00:47:16.140 --> 00:47:22.299
Android prototype as well that's sort of a
nice extension to make this usable.

00:47:22.299 --> 00:47:33.509
Wonderful. I guess the final thought here
is: there's a lot of systems, I'm sure I

00:47:33.509 --> 00:47:40.339
haven't mentioned all of them. But this
community is really closely tied to the

00:47:40.339 --> 00:47:46.059
innovations that are happening in the
space of private chat. And this is the

00:47:46.059 --> 00:47:49.910
infrastructure that supports communities
and is some of the most meaningful stuff

00:47:49.910 --> 00:47:55.959
you can possibly work on. And I encourage
you to find new ones and look at a bunch

00:47:55.959 --> 00:48:00.029
of them and think about the tradeoffs and
encourage friends to play with new

00:48:00.029 --> 00:48:03.650
systems, because that's how they gain
adoption and how people figure out what

00:48:03.650 --> 00:48:09.710
mechanisms do and don't work. So with
that, I will take questions.

00:48:09.710 --> 00:48:17.698
<i>Applause</i>

00:48:17.698 --> 00:48:21.379
Herald: Wasn't necessary to encourage you
to come with an applause. There are

00:48:21.379 --> 00:48:25.130
microphones that are numbered in the room,
so if you start lining up behind the

00:48:25.130 --> 00:48:29.709
microphones, then we can take your
questions. We already have a question from

00:48:29.709 --> 00:48:36.500
the Internet.
Question: Popularity and independency are

00:48:36.500 --> 00:48:42.630
a contradiction. How can I be sure that an
increasingly popular messenger like Signal

00:48:42.630 --> 00:48:50.959
stays independent?
Answer: I guess I would question whether

00:48:50.959 --> 00:48:57.720
independence is a goal in and of itself.
It's true that the value is increasing.

00:48:57.720 --> 00:49:03.449
And so one of the things I think about is,
is using systems that have open protocols

00:49:03.449 --> 00:49:07.289
or that are federated or otherwise not
centralized. And again, this is reducing

00:49:07.289 --> 00:49:13.400
that need to have confidence in the future
business model of a single legal entity.

00:49:13.400 --> 00:49:20.539
But I don't know if independence of
the company is the thing that you're

00:49:20.539 --> 00:49:25.279
trying to trade off with popularity.
Herald: Well, and we have questions at the

00:49:25.279 --> 00:49:27.630
microphones. We'll start at microphone
number one.

00:49:27.630 --> 00:49:33.839
Question: Thanks for the talk. First of
all, you talked a lot about

00:49:33.839 --> 00:49:40.739
content and encryption. What about the
initial problem? History shows that if I'm

00:49:40.739 --> 00:49:47.229
an individual already observed in a
sensitive area, there might be no need to

00:49:47.229 --> 00:49:52.750
encrypt or decrypt the message on sending.
It's already identified. I'm sending at a

00:49:52.750 --> 00:49:58.880
specific location at a specific time. Is
there any chance to hide that or do

00:49:58.880 --> 00:50:02.769
something against it?
Answer: So make things hidden again after

00:50:02.769 --> 00:50:13.069
the fact? That seems very hard. I mean,
so. So there's a couple thoughts there,

00:50:13.069 --> 00:50:20.769
maybe. There's sort of this real world
intersection attack, which is if

00:50:20.769 --> 00:50:25.230
there's a real world observable action of
who actually shows up at the protest,

00:50:25.230 --> 00:50:29.239
that's a pretty good way to figure out who
is chatting about the protests beforehand,

00:50:29.239 --> 00:50:37.299
potentially. And so, I mean, I think what
we've seen in real world organizing is

00:50:37.299 --> 00:50:42.170
things like either really decentralizing
that, where it happens across a lot of

00:50:42.170 --> 00:50:46.119
platforms, and happens very spontaneously
close to the event. So there's not enough

00:50:46.119 --> 00:50:55.740
time to respond in advance or using or
hiding your presence or otherwise trying

00:50:55.740 --> 00:51:01.039
to stagger your actual actions so that
they are harder to correlate to a specific

00:51:01.039 --> 00:51:06.849
group. But it's not something the chat
systems are talking about. I don't think.

00:51:06.849 --> 00:51:10.890
Herald: We have time for more questions.
So please line up at the microphones and

00:51:10.890 --> 00:51:15.510
if you're leaving, then leave quietly. We
have a question from microphone number 4.

00:51:15.510 --> 00:51:18.690
Question: So if network address

00:51:18.690 --> 00:51:23.509
translation is the original sin to the end
to end principle, and due to that, we now

00:51:23.509 --> 00:51:31.309
have to run servers, someone has to pay
for it. Do you know any solution to that

00:51:31.309 --> 00:51:38.130
economic problem?
Answer: I mean, we had to pay for things

00:51:38.130 --> 00:51:42.609
even without network address translation,
but we could move more of that cost to end

00:51:42.609 --> 00:51:49.829
users. And so we have another opportunity
with IPv6 to potentially keep more of

00:51:49.829 --> 00:51:53.539
the cost with end users or develop
protocols that are more decentralized

00:51:53.539 --> 00:52:00.440
where that cost stays more fairly
distributed. You know, our phones have a

00:52:00.440 --> 00:52:05.279
huge amount of computation power and
figuring out how we make our protocols so

00:52:05.279 --> 00:52:13.339
that work happens there is, I think, an
ongoing balance. I think some of the

00:52:13.339 --> 00:52:18.349
reasons why network address translation or
centralization is so common is because

00:52:18.349 --> 00:52:22.849
distributed systems are pretty hard to
build and pretty hard to gain confidence

00:52:22.849 --> 00:52:29.739
in. So more tools around how we can test
and feel like we understand and that the

00:52:29.739 --> 00:52:35.130
system actually is, you know, going to
work 99.9% of the time for distributed

00:52:35.130 --> 00:52:38.709
systems is going to make people less wary
of working with them.

00:52:38.709 --> 00:52:42.779
So better tools for distributed systems is
maybe the best answer.

00:52:42.779 --> 00:52:48.180
Herald: We also have another question from
the internet, which we'll take now.

00:52:48.180 --> 00:52:53.299
Question: What do you think of technical
novices' acceptance of and dealing with OTR

00:52:53.299 --> 00:52:58.930
keys, for example in Matrix Riot? Most
people I know just click "I verified this

00:52:58.930 --> 00:53:03.419
key" even if they didn't.
Answer: Absolutely. So this, I think

00:53:03.419 --> 00:53:07.550
goes back to a lot of these problems are
sort of a user experience tradeoff, which

00:53:07.550 --> 00:53:14.160
is, you know, we saw initial versions of
Signal where you would actually try and

00:53:14.160 --> 00:53:19.470
regularly verify some QR code between each other,
and then that sort of has gotten pushed

00:53:19.470 --> 00:53:24.499
back to a harder to access part of the
user interface because not many people

00:53:24.499 --> 00:53:29.120
wanted to deal with that. And in early
Matrix Riot you would get a lot of

00:53:29.120 --> 00:53:33.059
warnings about: There's a new device. Do
you want to verify this new device? Do you

00:53:33.059 --> 00:53:37.209
only want to send to the previous devices
that you trusted? And now you're getting

00:53:37.209 --> 00:53:41.739
the ability to sort of more automatically
just sort of accept these changes and

00:53:41.739 --> 00:53:45.429
you're weakening some amount of the
encryption security, but you're getting a

00:53:45.429 --> 00:53:49.299
better, smoother user interface because
most users are just going to sort of click

00:53:49.299 --> 00:53:52.669
"yes" because they want to send the
message. Right. And so there's this

00:53:52.669 --> 00:53:56.129
tradeoff: when you have built the
protocols such that you are standing in

00:53:56.129 --> 00:54:00.140
the way of the person doing what they want
to do. That's not really where you want to

00:54:00.140 --> 00:54:06.369
put that friction. So figuring out other
ways where you can have this on the side

00:54:06.369 --> 00:54:12.959
or supporting the communication rather
than hindering it is probably the types of

00:54:12.959 --> 00:54:16.889
user interfaces or systems that we should
be thinking about that can be successful.

00:54:16.889 --> 00:54:20.169
Herald: We have a couple more
questions. We'll start at microphone

00:54:20.169 --> 00:54:23.820
number 3.
Question: Thank you for your talk. You

00:54:23.820 --> 00:54:28.970
talked about deniability by sending the
private key with the last message.

00:54:28.970 --> 00:54:34.339
And how do you get the private key for the
last message in the whole conversation?

00:54:34.339 --> 00:54:45.119
Answer: In the OTR, XMPP, Jabber systems
there would be an explicit action to end

00:54:45.119 --> 00:54:50.410
the conversation that would then make it
repudiable; that would send

00:54:50.410 --> 00:54:55.970
that final message to close it. What
you have in things like Signal is it's

00:54:55.970 --> 00:54:59.549
actually happening with every message as part
of the confirmation of the message.

00:54:59.549 --> 00:55:03.329
Question: OK. Thank you.
Herald: We still probably have questions,

00:55:03.329 --> 00:55:07.439
time for more questions. So please
line up if you have any. Don't hold back.

00:55:07.439 --> 00:55:09.549
We have a question from 
microphone number 7.

00:55:09.549 --> 00:55:14.269
Question: So, first of all, a brief
comment. The Riot thing still doesn't even

00:55:14.269 --> 00:55:19.880
do TOFU. They haven't figured this
out. But I think there's a

00:55:19.880 --> 00:55:24.760
much more subtle conversation that needs
to happen around deniability, because most

00:55:24.760 --> 00:55:31.489
of the time, if you have people with
a power imbalance, the non-repudiable

00:55:31.489 --> 00:55:36.660
conversation actually benefits the weaker
person. So we actually don't want

00:55:36.660 --> 00:55:42.729
deniability in most of our chat
applications or whatever, except that's

00:55:42.729 --> 00:55:47.390
still more subtle than that, because when
you have people with equal power, maybe

00:55:47.390 --> 00:55:54.609
you do. It's kind of weird.
Answer: Absolutely. And I guess the other

00:55:54.609 --> 00:55:58.759
part of that is, is that something that
should be shown to users and is that a

00:55:58.759 --> 00:56:03.259
concept? Is there a way that you express
that notion in a way that users can

00:56:03.259 --> 00:56:07.910
understand it and make good choices? Or is
it just something that your system makes a

00:56:07.910 --> 00:56:13.270
choice on for all of your users?
Herald: We have one more question.

00:56:13.270 --> 00:56:17.229
Microphone number seven, please line up if
you have any more. We still have a couple

00:56:17.229 --> 00:56:19.559
more minutes. Microphone number seven,
please.

00:56:19.559 --> 00:56:23.309
Question: Hi, thanks for the talk. You
talked about the private information

00:56:23.309 --> 00:56:30.979
retrieval and how that would stop the
server from knowing who retrieved the

00:56:30.979 --> 00:56:36.469
message. But for me, the question is, how
do I find out in the first place which

00:56:36.469 --> 00:56:44.140
message is for me? Because if he, for
example, always uses message slot 14, then

00:56:44.140 --> 00:56:53.589
obviously over a conversation, it would
again be possible to deanonymize the users

00:56:53.589 --> 00:56:58.819
like, OK, they are always accessing this
one in all those queries.

00:56:58.819 --> 00:57:06.749
Answer: Absolutely. So I didn't explain
that part. The trick is that between the

00:57:06.749 --> 00:57:13.069
two people, we will share some secret,
which is our conversation secret. And what

00:57:13.069 --> 00:57:16.569
we will use that conversation secret for
is to seed a pseudo random number

00:57:16.569 --> 00:57:20.900
generator. And so we will be able to
generate the same stream of random

00:57:20.900 --> 00:57:27.519
numbers. And so each next message will go
at the place determined by the next item

00:57:27.519 --> 00:57:32.550
in that random number generator. And so
now the person writing can just write out

00:57:32.550 --> 00:57:36.119
random places, as far as the server can tell,
and when it wants to write the next

00:57:36.119 --> 00:57:40.869
message in this conversation, it'll
make sure to write at that next place

00:57:40.869 --> 00:57:46.600
in its random number generator for that
conversation. There is a paper that will

00:57:46.600 --> 00:57:50.130
describe a bunch more of that system. 
But that's the basic sketch.
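
NOTE
A minimal sketch of the slot selection described in the answer above, not the
actual system from the talk. It assumes HMAC-SHA256 stands in for the seeded
pseudorandom number generator and that the server exposes a table of NUM_SLOTS
slots. The names NUM_SLOTS and slot_for_message are hypothetical.
```python
import hashlib
import hmac
NUM_SLOTS = 1024  # hypothetical size of the server's message table
def slot_for_message(shared_secret, counter):
    """Return the slot index for the counter-th message of one conversation."""
    # Assumption: HMAC-SHA256 keyed with the conversation secret plays the
    # role of the seeded pseudorandom number generator.
    digest = hmac.new(shared_secret, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    # Reduce the output to a slot index (the small modulo bias is ignored here).
    return int.from_bytes(digest, "big") % NUM_SLOTS
secret = b"conversation secret shared by both people"
# Writer and reader derive the same sequence of slots independently.
writer_slots = [slot_for_message(secret, i) for i in range(5)]
reader_slots = [slot_for_message(secret, i) for i in range(5)]
```
Both sides compute the same sequence without telling the server anything,
and a different conversation secret yields an unrelated sequence.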

00:57:50.130 --> 00:57:53.819
Question: Thank you.
Herald: We have a question from the Internet.

00:57:53.819 --> 00:57:58.699
Question: It seems like identity is the
weak point of the new breed of messaging

00:57:58.699 --> 00:58:02.979
apps. How do we solve this part 
of Zooko's triangle, the need for

00:58:02.979 --> 00:58:07.680
identifiers and to find people?
Answer: Identity is hard, and I think

00:58:07.680 --> 00:58:18.279
identity has always been hard and will
continue to be hard. Having a variety of

00:58:18.279 --> 00:58:23.420
ways to be identified, I think remains
important and is why there isn't a single

00:58:23.420 --> 00:58:26.950
winner-takes-all system that we use for
chat. But rather you have a lot of

00:58:26.950 --> 00:58:30.720
different chat protocols that you use for
different circles and different social

00:58:30.720 --> 00:58:34.779
circles that you find yourself in. And
part of that is our desire to not be

00:58:34.779 --> 00:58:38.920
confined to a single identity, but to be
able to have different facets to our

00:58:38.920 --> 00:58:44.539
personalities. There are systems where you
can identify yourself with a unique

00:58:44.539 --> 00:58:48.449
identifier to each person you talk to
rather than having a single identity

00:58:48.449 --> 00:58:53.890
within the system. So that's something
else that Pond would do, which was that the

00:58:53.890 --> 00:58:57.989
identifier that you gave out to each
separate friend was different. And so

00:58:57.989 --> 00:59:03.710
you would appear as a totally separate
user to each of them. It turns out that's

00:59:03.710 --> 00:59:10.239
at the same time very difficult, because
if I post an identifier publicly, suddenly

00:59:10.239 --> 00:59:14.780
that identifier is now linked to me for
everyone who uses that identifier. So you

00:59:14.780 --> 00:59:18.309
have to give these out privately in a
one-on-one setting, which limits your

00:59:18.309 --> 00:59:22.909
discoverability. So that concept of
how we deal with identities I think is

00:59:22.909 --> 00:59:26.679
inherently messy and inherently something
that there's not going to be something

00:59:26.679 --> 00:59:31.859
satisfying that solves it.
Herald: And that was the final question

00:59:31.859 --> 00:59:35.339
concluding this talk. Please give a big
round of applause for Will Scott.

00:59:35.339 --> 00:59:36.089
Will: Thank you.

00:59:36.090 --> 00:59:40.862
<i>Postroll music</i>

00:59:40.862 --> 01:00:04.000
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!