[36C3 preroll music]

Herald: Please put your hands together and give a warm round of applause to Will Scott.

[Applause]

Will Scott: Thank you. All right. Welcome. The basic structure of this talk is twofold. The first part provides an overview of the different mechanisms that exist in this space of secure communication, and tries to tease apart the individual choices and tradeoffs that have to be made and their implications. A lot of the time we talk about security or privacy as very broad terms that cover a bunch of individual things, and breaking that down gives us a better way to understand what we're giving up, and whether and why these decisions actually get made for the systems we end up using.

The arc I'll cover is this: first, a taxonomy or classification of the different systems we see around us; from there, the threats we're often trying to protect against and the mechanisms we have to mitigate them; and then a look into some of those mechanisms and what's happening right now in different systems. By the end, we'll be closer to the research frontier: places where we have new ideas, but where the tradeoff against usability, or other reasons, means they haven't gained mass adoption.

So I'll introduce our actors: Alice and Bob. The basic structure for pretty much all of this is one-to-one messaging. These are primarily systems that enable us to have a conversation that looks a lot like what we would have in person. That's the thing we're modelling: I want somewhat synchronous, real-time communication that I can resume over a span of weeks, months, or years, and in the same way that in real life I know someone and recognize them when I come and talk to them again, I expect the system to give me similar properties.

Initially, we have systems that look very much like real-life communication, where I can, on a local network, use AirDrop or other tools that work directly between my device and a friend's device. On a computer, this might look like using netcat or another command-line tool to push data directly to the other person. This results in a form of communication that looks very similar: it's ephemeral, and it goes away afterwards unless the other person saves it.

But there is already a set of adversaries or threats to think about when securing this sort of communication. One is the network: can someone else see this communication, and how do we hide from that? We have a mechanism against that, namely encryption: I can encrypt my communication so that someone who is not my intended recipient cannot see what's happening. The other is the end devices themselves. There are a couple of things to protect against on an end device. One is that other, malicious software, installed either at the same time or later, may try to steal or learn what was said. We have mechanisms there too. One of them is message expiry.
We can make the messages go away: make sure we delete them from disk at some point. The other is isolating our chats so that other applications can't see what's happening there.

So we have these direct communication patterns, but that's a small minority of what we think of as chat. Instead, most of the systems we use online rely on a centralized server: there's some logically centralized thing in the cloud, I send my messages there, and it forwards them to my intended recipient. Whether it's Facebook or WhatsApp, Slack or IRC, Signal or Wire or Threema, or whatever cloud chat app we're using today, the same model applies.

We can identify additional threats here, and then we can think about why we do this anyway. One threat is the network, and I'll tear that apart a little bit. You've got the local network that we had before: someone near the person sending or receiving messages, so someone else in the coffee shop, your local organization, your school, your workplace. You've got the Internet as a whole that messages pass over: the ISPs, or the countries you're in, may want to look at your messages or prevent you from sending them. And you've got adversaries in the network local to, or near, the server, who can see most of the messages going in and out of it, because these services have to exist somewhere, be that a data center they physically have computers in, or AWS, Google, or one of the other clouds. So now there's a set of actors near the server that can see most of the traffic going in and out of that server.

We also have to think about the server itself as a potential adversary, and there are a few different threats here. The server could get hacked or otherwise compromised, so bugs in the software can potentially expose parts of the communication. There's typically a legal entity running the server, and the jurisdiction it's in can send requests for user data or compel it to provide information; so there's this whole threat of what the server is required to turn over. And then there's the question of how the server, or the company behind it, makes money and sustains itself. Is it going to get acquired by someone you don't trust, even if you trust it now? So there's a future-looking question: how do we ensure that the messages I have now don't get misused later?

We have a set of techniques that mitigate these problems as well. One is traffic obfuscation, or circumvention techniques, to make our traffic look less obvious to the network; that addresses a large portion of the network threats. The other I'm calling server hardening, but it's really a broad set of techniques for trusting the server less, and for making potential compromises of the server, whether a code compromise or compelled disclosure, less damaging.

It's worth saying that there are several reasons why we have primarily used centralized messaging. You've got availability.
It's very easy to go to a single place, and it also makes a bunch of problems easier, like handling multiple devices, and mobile push in particular: both Google and Apple expect a single authorized provider who can send notifications to an app's users on their mobile devices. That effectively requires a centralized place that knows when to send those messages if you want to provide real-time alerts to your application's users. The cons are cost, since some entity is now responsible for all of this and has to have a business model, and the fact that there is a single entity that people can come to, which now faces the legal and regulatory issues.

This is not the only type of system we have, though. The next most common is probably federated, and e-mail is a great example. E-mail is nice in that, as a user, I can choose an e-mail provider I trust out of many, or if I don't trust any of the ones I see, I can even spin up my own with a small group. So we can decentralize cost and make this more approachable. But while I can gain more confidence in my individual provider, I don't have as much insight into the recipient's side: I don't know how secure Bob's connection to his provider is, because we've separated and decentralized that. There are also problems in doing identity and discovery securely, and in mobile push. Still, we have a number of successful examples: beyond e-mail, the Fediverse and Mastodon, Riot (built on Matrix), and even SMS are federated systems where there are many providers rather than a single central place.

As you continue this arc of splitting apart, decentralizing, and reducing trust in any single party, you end up with a set of decentralized messaging systems as well, and it's worth mentioning them as we get out to this fringe. There are roughly two types. One uses gossip protocols, like Secure Scuttlebutt: you connect to the people around you, or people you know, and when you get messages, you gossip them onwards to everyone around you, so messages spread through the network. We are still learning how much metadata such systems leak, but they are nice in their level of decentralization. The other type basically tries to make all of the users participate, with relatively low trust, in the serving infrastructure. You can think of this as evolving out of the distributed hash tables used in BitTorrent. You see something similar in systems like Ricochet or tox.chat, which use either Tor-like relays for sending messages, or an explicit DHT for routing, where all the members provide some amount of lookup to help with discovery and finding other participants.

OK, so let's turn to some of the mechanisms we've uncovered, starting with encryption. When you're sending messages to a server, by default there's no encryption. This is things like IRC, and e-mail used to be primarily unencrypted. You can think of that like a postcard: it shows where the message is coming from, where it's going to, and the contents. In contrast, you can use transport encryption, which is now the standard for most centralized systems.
What that means is you're taking that postcard and putting it in an envelope that the network can't open. That's what TLS and other forms of transport encryption give you: the network link just sees the source and destination. It sees a message passing between Alice and Facebook, or whatever cloud provider, but can't look inside to see that it's really a message for Bob, or what's being said. It just sees individuals communicating with that cloud provider. So SMTPS, secure versions of IRC and e-mail, and most other protocols are using transport security at this point.

The next step is end-to-end encryption, or E2E. The difference here is that the message Alice sends is addressed to Bob and encrypted so that the provider, Facebook, can't open it either and can't see the contents. The network still just sees a message going between Alice and Facebook, but now Facebook can't actually read the message.

End-to-end encryption has gained pretty widespread adoption. We have it in Signal, for the most part in iMessage, and tools like PGP and GPG implement forms of it. For messaging, a few protocols are worth covering. The Signal protocol, initially called Axolotl, is adopted in WhatsApp and in Facebook private messaging, and has generalized into something called the Noise framework, which is gaining a lot of adoption. OMEMO looks a lot like it, but is a specific implementation for XMPP. The other one is Off-The-Record, or OTR, which developed somewhat independently and thinks a lot about deniability.

I'm not going to go too deep into the specific details of what these protocols are doing, but the intuition is that the hard part is not encrypting a message. The hard part is how you send that first message and establish a session, especially if the other person is offline. I want to start a communication; I type the first message I'm sending to someone; I need to somehow get a key, then send a message that only that person can read, and also establish a shared secret. Doing all of that in one message, with the other device not online, ends up being tricky. Additionally, figuring out the mapping between a user and their devices as it changes, and making sure you've appropriately revoked old devices and added new ones without keys breaking or the user getting too many warnings, ends up being a lot of the trick in these systems.
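To make the offline-establishment problem concrete, here is a heavily simplified sketch of the prekey idea behind the Signal protocol's X3DH handshake, using the Python cryptography library. It omits the signature on the prekey and the one-time prekeys, so treat it as an illustration of the shape of the handshake under those assumptions, not the real protocol.

```python
# A simplified sketch (not Signal's real X3DH; signatures and one-time
# prekeys are omitted) of establishing a session while Bob is offline.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_session_key(dh_outputs):
    # Mix several Diffie-Hellman results into one session secret.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"prekey-sketch").derive(b"".join(dh_outputs))

# While online, Bob uploads his public identity key and a prekey.
bob_identity = X25519PrivateKey.generate()
bob_prekey = X25519PrivateKey.generate()
bundle = (bob_identity.public_key(), bob_prekey.public_key())

# Later, with Bob offline, Alice fetches the bundle and derives the secret.
alice_identity = X25519PrivateKey.generate()
alice_ephemeral = X25519PrivateKey.generate()
bob_ik, bob_spk = bundle
alice_key = derive_session_key([
    alice_identity.exchange(bob_spk),   # binds Alice's identity
    alice_ephemeral.exchange(bob_ik),   # binds Bob's identity
    alice_ephemeral.exchange(bob_spk),  # fresh randomness per session
])

# Alice's public keys ride along with her first ciphertext, so Bob can
# compute the same three exchanges whenever he comes back online.
bob_key = derive_session_key([
    bob_prekey.exchange(alice_identity.public_key()),
    bob_identity.exchange(alice_ephemeral.public_key()),
    bob_prekey.exchange(alice_ephemeral.public_key()),
])
assert alice_key == bob_key
```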
There are two problems that come into play once we're using end-to-end encryption. One is connection establishment: the problem of saying who Bob is. I find a contact and know them in some way, by an e-mail address or a phone number. Signal uses phone numbers; a lot of systems use an e-mail address; Threema uses a unique identifier that it generates for you. But somehow I have to go from that identifier to an actual key, some knowledge of a cryptographic secret that identifies the other person, and I have to figure out who I trust to do that mapping. And then there's the second problem: how do we match?

A lot of systems do this by uploading your address book, or matching against existing contacts, to solve the user-interface problem of discovery: if they already know the identifiers and hold this mapping, then when someone new joins, they can suggest them, having "pre-found" their keys. You simply trust the server to hold this address book and to do the mapping between identifiers and the keys you get out. Signal is nice here: it says it's not uploading your contacts, which is true. It uploads hashes of your phone numbers rather than the actual numbers. But it's a similar thing: they've got a directory of known phone numbers, and when people search, you search for a hash of a phone number and get back the key that you hope Signal has correctly given you.

There are a couple of ways to reduce your trust here. Signal has been going down a path of using SGX, oblivious RAM, and a bunch of systems mechanisms to increase the cost of attacks against their discovery mechanism. The other way is to allow people to use pseudonyms or anonymous identifiers: with Wire, you can register with just an anonymous e-mail address, so the cost to you is potentially lower if that gets compromised. It's worth noting that Moxie will be talking tomorrow at 4:00 p.m. about the evolution of the space around Signal, so you can expect a bunch more depth there.

So what if we don't want to trust the server to do matchmaking? One of the early approaches is the web of trust around GPG. This is the notion that if I have, in real life or otherwise, associated an identifier with a key, I can publicly provide a signed statement saying that I trust that mapping. People who don't know someone directly, but have a social link, can find these proofs and use them to trust the mapping: I know an identifier, I trust someone who has said "this is the key associated with that identifier", and I can follow that network until I find a key I'm willing to encrypt to. There's a user-interface tradeoff here; this is a manual process in general. And this year we've had a set of denial-of-service attacks on the web-of-trust infrastructure. The specific attack is that anyone can upload these attestations, so if a bunch of random users, or Sybils, start uploading trust statements, then when you try to download them, you end up overwhelmed by the amount of information. The system does not scale, because it's very hard to filter down to the people you care about without telling the system who you care about, revealing exactly the network you're trying to keep private.

Keybase takes another approach. They made the observation that when I go to talk to someone, what I actually care about is the person I believe owns a specific GitHub or Twitter or other social profile. So I can provide an attestation saying: this is the key associated with the account that controls this Twitter account, or this Reddit account, or this Facebook account. With that set of proofs, I can connect an individual cryptographic identity to the person who holds the passwords to a set of other systems.
Keybase also began to provide a monetary incentive for users this year, and then struggled with the flood of sign-ups. So there's a lot of work in figuring out whether these identities actually correspond to real people, and how you prevent a denial-of-service-style attack like the one the web of trust faced.

On our devices, we generally end up resorting to a concept called TOFU, or trust-on-first-use. What that means is that when I first see a key that identifies someone, I save it, and if I ever need to communicate with that person again, I keep using that same key and expect it to stay the same. That continuity, the ability to pin keys once you've seen them, means that if the first connection you establish with someone really was with the real person, then someone who compromises them later can't take over or silently change the key.

Finally, one of the exciting things that came out, circa 2015, and largely defunct now, was a system by Adam Langley called Pond, which looked at hardening a modern version of e-mail. One of the things Pond had was a password-authenticated key exchange. This is an evolving cryptographic area where two people start with some weak shared secret. I can, perhaps publicly or in plain text, challenge the other person: "Where were we on a specific day?" Now we both know something with at least a few bits of entropy. If we can write the same textual answer, we can run it through a key derivation function to end up with a larger amount of shared entropy, and use that to bootstrap a key exchange and arrive at a strong cryptographic identity for the other person. Pond has a system called Panda for linking individuals based on such a challenge-response, and you'll also find this in Off-The-Record systems around Jabber.
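As a rough illustration of the first step of that idea, here is a deliberately naive Python sketch: both sides normalize the answer they remember and stretch it through a slow key derivation function. A real PAKE, like the one Panda builds on, goes further and prevents an eavesdropper from brute-forcing the low-entropy answer offline; nothing here is Pond's actual code, and the salt value is illustrative.

```python
# A naive sketch of bootstrapping from a weak shared secret. Real PAKEs
# (as used by Pond's Panda) add protection against offline guessing.
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

def key_from_answer(answer: str, salt: bytes) -> bytes:
    # Normalize so small typing differences don't break agreement, then
    # use a memory-hard KDF to stretch the few bits of entropy.
    normalized = " ".join(answer.lower().split())
    return Scrypt(salt=salt, length=32, n=2**15, r=8, p=1).derive(
        normalized.encode())

# The question and a salt can travel in the clear; only the answer is
# (weakly) secret. Matching answers yield matching 256-bit keys.
salt = b"per-pair-public-salt"  # illustrative value
alice_key = key_from_answer("At the kebab  place", salt)
bob_key = key_from_answer("at the kebab place", salt)
assert alice_key == bob_key
```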
The other thing we need to be careful about in end-to-end encrypted systems is deniability. When I'm chatting one-on-one with someone in person, that conversation is ultimately fairly deniable: each person has their recollection of what happened, and there's no proof the other person said something unless you recorded it or otherwise brought technology into play. But with an encrypted system where I've authenticated the other person, I potentially end up with a transcript that I can turn over later and say: look, this person said this. And we've seen recently that things like leaked e-mails are authenticated in exactly this way: DKIM, the system that authenticates e-mail senders, showed up in the WikiLeaks releases of Hillary Clinton's e-mails and made it possible to say that the text hadn't been changed and was signed by the real server we would expect.

So the thing we get from Off-The-Record and the Signal protocol is deniability, or repudiability. This plays into the concept of forward secrecy: we throw away material afterwards, in a way that makes our chat go back to being more ephemeral. There are actually two interlinked properties here. We have keys that we're using to form the shared session for our secret messages. Each time I send a message, I also provide some new key material, ratcheting forward the secret key we're using: I provide a next key, and when Bob replies, he uses my next key as part of his reply and gives me his next key. The other thing I can then do is, when I send a message, reveal the secret part of my previous key: "the last private key I used to send you that previous message was this". By the end of our conversation, we both know all of the private keys, such that either of us could have created the whole conversation on our own computer. At any given time, only the most recent message could only have been sent by the other person; the rest of the transcript is something you could have generated yourself.
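Here is a toy sketch of that revelation step. OTR actually reveals old MAC (authentication) keys rather than encryption private keys, but the effect is the one described above: once the old key is public, anyone could have produced the tags on old messages, so the transcript stops proving authorship.

```python
# Toy sketch: revealing an old authentication key makes past messages
# forgeable, and therefore deniable. Not OTR's actual wire protocol.
import hashlib
import hmac
import os

def tag(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

# Alice authenticates a message with the current MAC key; Bob checks it
# in real time, so at that moment he knows it really came from Alice.
k_old = os.urandom(32)
msg = b"meet at 18:00"
t = tag(k_old, msg)
assert hmac.compare_digest(t, tag(k_old, msg))

# In a later message, Alice reveals k_old. Now Bob, or anyone who obtains
# the transcript, can produce an equally valid tag on any text, so the
# old message no longer proves who wrote it.
forged = tag(k_old, b"meet at 06:00")  # just as valid-looking as t
```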
There is a talk on day three about Off-The-Record v4, the fourth version of that protocol, that will go deeper into this; it's at 9:00 p.m. in the about:freedom assembly, and I encourage you to go if you're interested.

OK. The next mechanism to talk about is expiry. This is a follow-on to the concept of forward secrecy, but there are two attacks to consider here. One is something we should maybe give Snapchat credit for popularizing: the message goes away after some amount of time. This is really protecting against not fully trusting the other person, against them sharing the message later or in a way you didn't intend. Related is the screenshot adversary: a bunch of apps will alert the other participant if you take a screenshot. This is also why some apps blank the screen in the task switcher: if you swap between apps, some of your applications will just show a blank screen, because the mobile operating systems' APIs don't tell them when a screenshot is taken in that mode, and they want to be able to notify you if the other person captures the conversation. It's worth noting that all of this just raises the cost of these attacks and provides a social incentive not to mount them. I can still use another camera to take a picture of my phone and get evidence of what was said. But it discourages it and sets social norms.

The other reason for expiry is compromise of a device after the fact: whether someone gets hold of the device and does forensic analysis to pull off previous messages or the chat database, or someone installs an application that scans through your phone, like Fengcai, an app that's been installed as a surveillance tool in China. This also boils down to a user-interface and user-experience question: how long do you save logs, how much history do you keep, and what norms do you set? There's a tradeoff here. It's useful sometimes to scroll back, and companies that believe they have value-added services built on analytics over your chat history are wary of getting rid of it.

The next thing we have is isolation and OS sandboxing. A lot of this is up one layer: what is the operating system doing to secure your chat application from the other things, the malware or the compromises of the broader device it's running on? We have a bunch of projects around us at Congress innovating on this. There are also chat systems that attempt to do it on their own. One extreme example is Tinfoil Chat, which makes use of three devices and a physical diode: one device only sends messages and another only receives them. The thought is that if you receive a message that somehow compromises the receiving device, the malware or malicious file can never get any communication back out, and so the compromise becomes much less valuable. And they implement this with an actual hardware diode.

The other side of this is recovery and backups. There's a user-experience tradeoff between people losing their devices and wanting their contact list or chat history back, and the fact that you're now keeping an extra copy, an additional place for things to get compromised. Apple has done a lot of work here that we don't look at so much. They gave a Black Hat talk a few years ago where they discuss how they use custom hardware security modules in their data centers, much like the T2 chip in their end devices, to hold the backup keys used for iCloud backups, with similar rate limiting. And they consider a pretty wide set of adversaries, more than we might expect, including questions like: what happens when a government comes and asks us to write new software to compromise this? So they set up their HSMs such that they cannot push software updates to them, which is a step in cloud-side security that we don't think about as much. There's a set of slides from that talk that you can find, and these slides will be online too as a pointer, since their solution considers a large number of adversaries you might not have thought about.

Traffic obfuscation addresses a primarily network-side adversary. The technique people reach for if they feel they need this is called domain fronting. Domain fronting had its heyday around 2014 and has become somewhat less effective, but it's still effective enough for most chat purposes. The basic idea is the separation of layers between the envelope and the message inside it that we get in HTTPS on the Web. When I create a secure connection to a CDN or content provider like Amazon, Google, or Microsoft, I can perform the security layer against a fairly generic name: I just establish a secure connection to, say, Cloudflare. Once I've done that, the message I send inside can be a chat message destined for a specific customer of that CDN or cloud provider. This is an effective way to prevent the network from knowing which specific service you're accessing. It got used for a bunch of circumvention purposes, then for a bunch of malware, and that caused several of the cloud providers to stop allowing it. But it's still in use: it's what happens when you turn on censorship circumvention in Signal, and it's what Telegram is using for the most part.
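Mechanically, fronting is tiny. A minimal sketch with the Python requests library and placeholder domains: the TLS handshake, the part the network sees, names only the front domain, while the real destination travels in the Host header inside the encrypted tunnel. Whether a given CDN still routes such requests varies, as noted above.

```python
# Minimal domain-fronting sketch with placeholder domains. The network
# observer sees TLS to the front; only the CDN sees the Host header.
import requests

FRONT = "https://cdn-front.example.com/"  # what the TLS layer (and SNI) shows
HIDDEN = "chat-backend.example.com"       # the CDN customer we really want

try:
    resp = requests.get(FRONT, headers={"Host": HIDDEN}, timeout=10)
    print(resp.status_code)
except requests.RequestException as exc:
    print("placeholder domains, request failed as expected:", exc)
```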
The same basic technique is getting another revival with DNS over HTTPS and the encrypted SNI extension to TLS, which allow a standardized way to establish a connection to a service without giving the network any specific identifier for which service you want to reach. It's worth mentioning that probably the most active chat service in this obfuscation and circumvention space is Telegram, which has a lot of users in countries that are not fans of having lots of Telegram users. They have systems to bounce between IPs very quickly and change where their servers appear to be, and they've used techniques like sending messages over DNS tunnels to mitigate some of this censorship. From the provider's perspective, this is really about access to their user population. They're not thinking about your local network so much as: there are millions of users who should probably still have access to us.

So we can hide the characteristics of traffic in terms of which specific service we're connecting to. But there are other things about traffic that are revealing to the network, and this is the additional metadata we need to think about. One is padding: the size of messages can be revealing. An immediate example is that a text message is a very different size from an image or voice or video, and you see this on airplanes or in other bandwidth-limited settings: they might let text messages through while images won't load. There's been research showing, for instance, that even if I encrypt my voice, we've gotten so good at compressing human speech that different phonemes take up different sizes; so I can say something, compress it, encrypt it, and an observer can recover what was said from the relative sizes of the sounds. There was a paper at the 2011 IEEE Symposium on Security and Privacy (Oakland) demonstrating this potential for attacks. What this tells us is that there's a tradeoff between how efficiently I send things and how much distinguishing metadata I give up. I can use a less efficient, constant-bit-rate compression that doesn't reveal this information, but it has higher overhead and won't work as well in constrained network environments.

The other place this shows up is simply when people are active. If I can look at when someone tweets or when messages are sent, I can probably figure out pretty quickly what timezone they're in. This leads to a whole set of metadata-based attacks, in particular confirmation attacks and intersection attacks. An intersection attack looks at the relative activity of multiple people: when Alice sent a message, who else was online or active at the same time? And over time, can I narrow down the specific people Alice was likely talking to?

Pond is also a system to look at in this regard. Its approach was that a client would hopefully always be online and would check in with the server on a regular schedule with the same amount of data, regardless of whether there was a real message to send, so that from the network's perspective, every user looked the same. The downside is that you've now got a message being sent by every client every minute or so, which creates a huge amount of overhead: padded data that doesn't mean anything.
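A minimal sketch of that Pond-style schedule, with illustrative names and sizes: every tick sends exactly one fixed-size blob, a padded real message if one is queued, otherwise random filler. In a real system the padded message is also encrypted, so both cases are indistinguishable on the wire.

```python
# Sketch of constant-rate cover traffic: one fixed-size blob per tick,
# whether or not the user said anything. Sizes and names are illustrative.
import os
import queue

BLOB_SIZE = 4096
outbox: queue.Queue = queue.Queue()

def pad(msg: bytes) -> bytes:
    # Length-prefix the message so the receiver can strip the padding.
    assert len(msg) <= BLOB_SIZE - 2
    return len(msg).to_bytes(2, "big") + msg + os.urandom(BLOB_SIZE - 2 - len(msg))

def tick(send) -> None:
    try:
        blob = pad(outbox.get_nowait())  # real message, padded to BLOB_SIZE
    except queue.Empty:
        blob = os.urandom(BLOB_SIZE)     # dummy: same size, same schedule
    send(blob)  # in a real system this blob is also encrypted

# Driven by a timer, e.g. once a minute: the observer sees an identical
# 4 KiB upload each tick regardless of user activity.
```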
So finally, let's look at server hardening, the things we're doing to reduce trust in the server. There are a few examples of why we'd want to do this. One is that messaging servers have, plenty of times, not been as secure as they claim. One example: there was a period when the Skype subsidiary in China was using a keyword blacklist on the server to either prevent or intercept some subset of their users' messages, without telling anyone they were doing that. And then there's the uncertain future: I may trust the service with my data now, but what can we do so that I don't have to worry about what its corporate future entails for my data? One of the elephants in the room is that software development is probably pretty centralized: even if I don't trust the server, some pretty small number of developers is writing the code, and how do I trust that the updates they push, to the server or to my client, aren't reducing my security? Open source is a great start at mitigating that, but it's certainly not solving all of it.

One way to think about reducing trust in the server is to look at what the server still knows after end-to-end encryption: the size of the message, where it's coming from, and where it's going. For size, we've talked about padding as a mitigation. So how do we reduce the information about sources and destinations, the communication graph that the server knows? This is the concept of linkability: being able to link the source and destination of a message. We're starting to see mitigations that reduce linkability entering mainstream systems. Signal has a feature called "sealed sender" that you can enable, where the source of the message goes inside the encrypted envelope so that Signal doesn't see it. The caveat is that Signal still sees your IP address, but the thought is that they throw that out relatively quickly and so keep fewer logs linking source to destination.

There is, though, a bunch of more theoretical work on this. The first set of systems I'll point to are mixnets. A mixnet works by having a set of providers, rather than a single entity, running the servers. A bunch of users send messages to the first provider, which shuffles them all and sends them to the next provider, which shuffles them again and sends them to a final provider, which shuffles them and then delivers them to their destinations. This de-links the traffic: no individual provider knows both the source and destination of a message. It looks a bit like Tor's onion routing, but differs in a couple of technicalities. One is that typically you wait to collect some number of messages, rather than forwarding immediately for bandwidth and low latency. By doing that, you get a guarantee that a batch had at least n messages shuffled together, and you prevent there being some moment when only one user was using the system. So you get a stronger theoretical guarantee.
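To see the shape of that, here is a toy cascade mix in Python, using Fernet from the cryptography library as a stand-in for the real layered encryption; per-hop routing metadata is omitted. The sender wraps each message in one layer per mix; each mix strips its layer, shuffles a whole batch, and forwards it, so no single operator sees both ends.

```python
# Toy cascade mixnet: onion-wrapped messages, batch shuffling at each hop.
# Fernet stands in for the real cryptography; routing info is omitted.
import random
from cryptography.fernet import Fernet

mixes = [Fernet(Fernet.generate_key()) for _ in range(3)]

def wrap(message: bytes) -> bytes:
    # Innermost layer belongs to the last mix, so mixes[0] peels first.
    for f in reversed(mixes):
        message = f.encrypt(message)
    return message

def mix(f: Fernet, batch):
    peeled = [f.decrypt(blob) for blob in batch]
    random.shuffle(peeled)  # the shuffle breaks input/output linkability
    return peeled

batch = [wrap(m) for m in (b"to:bob hi", b"to:carol yo", b"to:dave hey")]
for f in mixes:
    batch = mix(f, batch)
print(batch)  # delivered in an order no single mix can tie to a sender
```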
There's an active project building a messaging system on mixnets called Katzenpost. They gave a talk at Camp this summer, and I'd encourage you to look at their website or go back to that talk to learn more about mixnets.

The project that I was, I guess, tangentially helping with is in a space called private information retrieval (PIR), which is another technique for this de-linking. Private information retrieval frames the question a little differently. It asks: if a server has a database of messages, can a client retrieve one of those messages without the server knowing which message the client got or asked for? This sounds hard, but I can give you a straw man to convince yourself it's doable: I can ask the server for its entire database and then take the message that I want, and the server has learned nothing about which message I cared about. I've just spent an enormous amount of network bandwidth doing it.

There are a couple of constructions that do better; I'm going to focus on information-theoretic private information retrieval. We use a similar setup to the threat model for a mixnet: we've got a set of providers that hold the same database, and I'm going to assume they're not all talking to each other or colluding, so I just need at least one of them to be honest. One of the tools we'll use is the exclusive-or (XOR) operation. To refresh your memory, XOR is a binary bitwise operation with the nice property that anything XORed with itself cancels out: if I have some piece of data and XOR it against itself, it just goes away. So, with several servers holding the database, I ask each one to give me the XOR of some random subset of its items: say, give me items 11, 14, and 20 XORed together. (I'm assuming all the items are the same size so the XORs line up.) Each request, seen on its own, looks like a random subset. But I can structure the subsets so that when I XOR the replies together, everything cancels out except the one item I care about. Unless you saw all of the requests I made, you couldn't tell which item that was. By doing this, I've reduced the network bandwidth: I'm only getting one item's worth of data back from each server.

Now, you might worry that I'm asking each server to do a whole lot of work: it has to go through its entire database and compute this big XOR, and that seems expensive. The thing I find exciting about this space is that this operation, streaming over a large database and coming back with a small amount of data, looks a lot like the workloads we're building hardware for in AI and similar search-like problems. It runs really well on a GPU, where thousands of cores each compute small parts of the XOR and then return a relatively small result. With GPUs, you can have databases of gigabytes, even tens of gigabytes, and compute these XORs across all of it on the order of a millisecond or less.
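Here is a toy two-server version of that construction in Python. Each server sees only a uniformly random subset request; XORing the two equal-size replies cancels everything except the wanted item. Real systems in this space layer much more on top; this is just the core trick.

```python
# Toy information-theoretic PIR with two non-colluding servers holding
# the same database of equal-size items.
import secrets

ITEM_SIZE, N_ITEMS = 8, 16
db = [secrets.token_bytes(ITEM_SIZE) for _ in range(N_ITEMS)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_reply(database, subset):
    # XOR together the requested items; work is linear in the database.
    out = bytes(ITEM_SIZE)
    for idx in subset:
        out = xor(out, database[idx])
    return out

# Client wants item i: a random subset goes to server 1, and the same
# subset with i toggled goes to server 2. Each alone looks random.
i = 11
req1 = {j for j in range(N_ITEMS) if secrets.randbelow(2)}
req2 = req1 ^ {i}  # symmetric difference: differs only at position i

# Shared items cancel when the replies are XORed; only item i survives.
assert xor(server_reply(db, req1), server_reply(db, req2)) == db[i]
```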
"Talek" is the system that I helped with that demonstrates this working. The converse problem is called private information storage. And that one is how do I write an item into a database without the database knowing which item I wrote, the mathematical construction there is not quite as simple to explain. But there's a pretty cool new work in the last month or two out of Dan Boneh and Henry Corrigan- Gibbs at Stanford called Express and Saba as first author that is showing how to fairly practically perform that operation. I'll finish just with a couple minutes on multiparty chat or group chat, so small groups. You've sort of got a choice here in terms of how assisted chat systems are implementing group chat. One is you can not tell the server about the group. And as someone who is part of the group, I just send the same message to everyone in the group. And maybe I can tag it for them so that they know it's part of the group or you do something more efficient where you tell the server about group membership and I send the message once to the server and it sends it to everyone in the group. Even if you don't tell the server about it, though, you've got a bunch of things to worry about leaked correlation, which is: if at a single time someone sends the same sized message to five other people and then later someone else sends the same sized message to five other people, and those basically overlap, someone in the network basically knows who the group membership is. So it's actually quite difficult to conceal group membership. The other thing that breaks down is our concept of deniability once again, which is now if multiple people have this log. Even if both of them individually could have written it, the fact that they have the same cryptographic keys from this other third party probably means that third party made that message. So there continues to be work here. Signal is working on providing again and SGX and centralized construction for grid management to be able to scale better, given I think the pretty realistic fact that the server in these cases is probably going to be able to figure out group membership in some case, you might as well make it scale. On the other side, one of the cool systems that's being prototyped is called "cwtch" out of open privacy. And this is an extension to ricochet that allows for offline messages and small group chats. It works for order of 5 to 20 people, and it works by having a server that obliviously forwards on messages to everyone connected to it. So when I send a message to a group, the server sends the message to everyone it knows about, not just the people in the group, and therefore the server doesn't actually know the subgroups that exist. It just knows who's connected to it. And that's a neat way. It doesn't necessarily scale to large groups, but it allows for some concealing of group membership. They've got an Android prototype as well that's sort of a nice extension to make this usable. Wonderful. I guess the final thought here is: there's a lot of systems, I'm sure I haven't mentioned all of them. But this community is really closely tied to the innovations that are happening in the space of private chat. And this is the infrastructure that supports communities and is some of the most meaningful stuff you can possibly work on. 
Wonderful. I guess the final thought here is: there are a lot of systems, and I'm sure I haven't mentioned all of them. But this community is really closely tied to the innovations happening in the space of private chat. This is the infrastructure that supports communities, and it's some of the most meaningful stuff you can possibly work on. I encourage you to find new systems, look at a bunch of them, think about the tradeoffs, and encourage friends to play with them, because that's how they gain adoption and how people figure out which mechanisms do and don't work. So with that, I will take questions.

[Applause]

Herald: It wasn't necessary to encourage the applause. There are numbered microphones in the room; if you start lining up behind them, we can take your questions. We already have a question from the Internet.

Question: Popularity and independence are a contradiction. How can I be sure that an increasingly popular messenger like Signal stays independent?

Answer: I guess I would question whether independence is a goal in and of itself. It's true that the value of the system increases as it grows. One of the things I think about is using systems that have open protocols, or that are federated or otherwise not centralized; again, this reduces the need to have confidence in the future business model of a single legal entity. But I don't know that independence of the company is the thing you're trying to trade off against popularity.

Herald: And we have questions at the microphones. We'll start at microphone number one.

Question: Thanks for the talk. You talked a lot about content and encryption. What about the more basic problem? History shows that if I'm an individual already under observation in a sensitive area, there may be no need to decrypt my messages at all: the fact that I'm sending at a specific location at a specific time already identifies me. Is there any chance to hide that, or do something against it?

Answer: So, make things hidden again after the fact? That seems very hard. There are a couple of thoughts there. There's a real-world intersection attack: if there's an observable action, like who actually shows up at the protest, that's a pretty good way to figure out who was chatting about the protest beforehand. What we've seen in real-world organizing is things like really decentralizing that activity, where it happens across a lot of platforms and very spontaneously, close to the event, so there's not enough time to respond in advance; or hiding your presence, or staggering your actual actions so they're harder to correlate to a specific group. But it's not something the chat systems themselves are addressing, I don't think.

Herald: We have time for more questions, so please line up at the microphones, and if you're leaving, please leave quietly. We have a question from microphone number 4.

Question: If network address translation is the original sin against the end-to-end principle, and because of it we now have to run servers, then someone has to pay for them. Do you know any solution to that economic problem?

Answer: I mean, we had to pay for things even without network address translation, but we could move more of that cost to end users. We have another opportunity with IPv6 to keep more of the cost with end users, or to develop protocols that are more decentralized, where that cost stays more fairly distributed. Our phones have a huge amount of computational power, and figuring out how to design our protocols so that the work happens there is, I think, an ongoing balance.
I think some of the reasons why network address translation, or centralization, is so common is that distributed systems are pretty hard to build and pretty hard to gain confidence in. More tools for testing and understanding whether a distributed system is actually going to work 99.9% of the time would make people less wary of working with them. So better tooling for distributed systems is maybe the best answer.

Herald: We also have another question from the Internet, which we'll take now.

Question: What do you think about technical novices' acceptance of, and dealings with, OTR keys, for example in Matrix/Riot? Most people I know just click "I verified this key" even if they didn't.

Answer: Absolutely. This goes back to a lot of these problems being a user-experience tradeoff. We saw initial versions of Signal where you would actually try to regularly verify a QR code between contacts, and that has been pushed back to a harder-to-reach part of the user interface because not many people wanted to deal with it. In early Matrix/Riot, you would get a lot of warnings: there's a new device, do you want to verify this new device, do you only want to send to the previously trusted devices? Now you're getting the ability to more automatically accept these changes. You're weakening some amount of the encryption security, but you're getting a smoother user interface, because most users are just going to click "yes"; they want to send their message. So there's this tradeoff: when you've built the protocol such that you're standing in the way of people doing what they want to do, that's not really where you want to put the friction. Figuring out ways to have verification on the side, supporting the communication rather than hindering it, is probably the kind of user interface we should be thinking about, the kind that can be successful.

Herald: We have a couple more questions. We'll start at microphone number 3.

Question: Thank you for your talk. You talked about deniability by revealing the previous private key with the next message. But how do you get the private key for the last message of the whole conversation?

Answer: In the OTR, XMPP, Jabber systems, there would be an explicit action to end the conversation, which would send that final message to close it and make the transcript repudiable. In things like Signal, it's actually happening with every message, as part of the confirmation of the message.

Question: OK. Thank you.

Herald: We probably still have time for more questions, so please line up if you have any; don't hold back. We have a question from microphone number 7.

Question: First of all, a brief comment: the Riot thing still doesn't even do TOFU; they haven't figured that out. But I think there's a much more subtle conversation that needs to happen around deniability, because most of the time, if you have people with a power imbalance, the non-repudiable conversation actually benefits the weaker person. So we actually don't want deniability in most of our chat applications, except that it's still more subtle than that, because when you have people with equal power, maybe you do. It's kind of weird.

Answer: Absolutely. And I guess the other part of that is whether that's something that should be shown to users at all:
is there a way to express that notion so that users can understand it and make good choices, or is it just something your system decides on behalf of all of its users?

Herald: We have one more question at microphone number seven. Please line up if you have any more; we still have a couple of minutes. Microphone number seven, please.

Question: Hi, thanks for the talk. You talked about private information retrieval and how it stops the server from knowing who retrieved a message. But for me the question is: how do I find out in the first place which message is for me? Because if we, for example, always use message slot 14, then over a conversation it would again be possible to deanonymize the users: they're always accessing that one slot in all those queries.

Answer: Absolutely, I didn't explain that part. The trick is that the two people share a secret, their conversation secret, and they use it to seed a pseudorandom number generator. So both can generate the same stream of random numbers, and each next message goes at the location determined by the next item in that stream. The person writing writes to what look like random places, as far as the server can tell, and when they want to write the next message in the conversation, they make sure to write to the next place in that conversation's random number generator. There's a paper that describes a bunch more of that system, but that's the basic sketch.

Question: Thank you.

Herald: We have a question from the Internet.

Question: It seems like identity is the weak point of the new breed of messaging apps. How do we solve this part of Zooko's triangle, the need for identifiers and to find people?

Answer: Identity is hard; I think identity has always been hard and will continue to be hard. Having a variety of ways to be identified remains important, and it's why there isn't a single winner-takes-all system that we all use for chat: instead, you have a lot of different chat protocols that you use in the different social circles you find yourself in. Part of that is our desire not to be confined to a single identity, but to be able to have different facets to our personalities. There are systems where you identify yourself with a unique identifier to each person you talk to, rather than having a single identity within the system. That's something else Pond did: the identifier you gave out to each separate friend was different, so you appeared as a totally separate user to each of them. It turns out that's at the same time very difficult, because if I post an identifier publicly, that identifier is now linked to me for everyone who uses it. So you have to give these out privately, in a one-on-one setting, which limits your discoverability. That concept of how we deal with identities is, I think, inherently messy, and there's not going to be something satisfying that solves it.

Herald: And that was the final question, concluding this talk. Please give a big round of applause for Will Scott.

Will: Thank you.

[Applause]

[Postroll music]

Subtitles created by c3subtitles.de in the year 2019. Join, and help us!