-
36C3 preroll music
-
Herald: Please put your hands together and
give a warm round of applause to Will Scott.
-
Applause
-
Will Scott: Thank you.
-
Applause
-
Will: All right. Welcome. So. The basic
structure of this talk is sort of twofold.
-
The first thing is to provide an overview
of the different mechanisms that exist in
-
this space of secure communication and try
to tease apart a bunch of the individual
-
choices and tradeoffs that have to be made
and the implications of them. Because a
-
lot of times we talk about security or
privacy as very broad terms that cover a
-
bunch of individual things. And breaking
that down gives us a better way to
-
understand what it is we're giving up or
whether or why these decisions actually
-
get made for the systems that we end up
using. And the arc that I'll cover is: first, trying
to provide a sort of taxonomy or
-
classification of a bunch of the different
systems that we see around us. And from
-
there identify the threats that we often
are trying to protect against and the
-
mechanisms that we have to mitigate those
threats and then go into some of these
-
mechanisms and look at what's happening
right now on different systems. And by the
-
end, we'll sort of be closer to the
research frontier of what is still
-
happening, where are places where we have
new ideas, but there's still quite a high
-
tradeoff to usability or for other reasons
where these haven't gained mass adoption.
-
So I'll introduce our actors: Alice and
Bob. The basic structure for pretty much
-
all of this is one to one messaging. So
this is primarily systems that are
-
enabling us to have a conversation that
looks a lot like what we would have in
-
person. That's sort of the thing that
we're modelling is I want to have a
-
somewhat synchronous real time
communication over a span of weeks,
-
months, years, resume it, and in the same
way that in real life I know someone and I
-
recognize them when I come and talk to
them again I expect the system to give me
-
similar sorts of properties.
-
So the way
we're going to then think about systems is
-
initially, we have systems that look very
much the same as how we would have a real
-
life communication, where I can - on a
local network - use AirDrop or use a bunch
-
of things that just work directly between
my device and a friend's device
-
to communicate.
-
On a computer, this might
look like using Netcat or a command line
-
tool to just push data directly to the
other person. And this actually results in
-
a form of communication that looks very
similar. Right, it's ephemeral, it goes
-
away afterwards unless the other person
saves it. But there is already a set of
-
adversaries or threats that we can think
about how do we secure this sort of
-
communication?
-
One of those would be the
network. So, can someone else see this
-
communication and how do we hide from
that? And we have mechanisms against that,
-
namely encryption. Right, I can
disguise my communication and encrypt it
-
so that someone who is not my intended
recipient cannot see what's happening.
-
And then the other adversary would be
these end devices themselves.
-
Right, so there's a couple of things that
we need to think about when we think about
-
what is it that we're trying to protect
against on an end device. One is there
-
might be other bad software, installed either at the same time or afterwards, that tries to steal or learn about what was said.
-
And so we have mechanisms
there. One of them would be message
-
expiry. So we can make the messages go
away, make sure we delete them from disk
-
at some point. And the other would be
making sure that we've sort of isolated
-
our chats so that it doesn't overlap and
other applications can't see what's
-
happening there.
-
So, we have these direct
communication patterns but that's a small
-
minority of most of what we think of when
we chat. Instead, most of the systems that
-
we're using online use a centralized
server. There's some logically centralized
-
thing in the cloud and I send my messages
there and it then forwards them to my
-
intended recipient. And so whether it's
Facebook or WhatsApp or Slack or IRC or Signal or Wire or Threema
or whatever, you know, cloud chat app
-
we're using today, this same model
applies. So we can identify additional
-
threats here and then we can think about
why we do this. So one threat is the
-
network. And I'll tear that apart a little
bit. You've got the local network that we
-
had before. So someone who's on the
network near the person who's sending
-
messages or receiving messages, so someone
else in the coffee shop, your local
-
organization, your school, your work,
you've got the Internet as a whole that
-
messages are passing over. So the ISPs or
the countries that you're in may want to
-
look at or prevent you from sending
messages. You've also got an adversary in
-
the network, sort of local or near the
server that can see most of the messages
-
going in and out of the server because
these services have to exist somewhere, be
-
that in a data center that they physically
have computers in or in AWS or Google or
-
one of these other clouds. And now you've
got a set of actors that you need to think
-
about that are near the server that can
see most of the traffic going in and out
-
of that server.
-
We also have to think
about the server itself as a potential
-
adversary. There's a few different threats
that we need to think about. The server
-
could get hacked or otherwise compromised, and so bugs in the software can potentially expose parts of the communication.
-
You've got a
-
legal entity typically that is running
this server. And so the jurisdiction
-
that it's in can send requests to get data
from users or to compel it to provide
-
information. So there's this whole threat
of what is the server required to turn
-
over. And then you've got the question of how the company running this server is actually making money and sustaining itself. Is it going
to get acquired by someone that you don't
-
trust, even if you trust it now? So
there's this future view of how do we
-
ensure that the messages I have now don't
get misused in the future?
-
And we have a
-
set of techniques that mitigate these
problems as well. So one of them would
-
be we can use traffic obfuscation or
circumvention techniques to make our
-
traffic look less obvious to the network, and that mitigates a large amount of these threats.
-
And then, I'm calling this server hardening
but it's really a sort of a broad set of
-
techniques around how do we trust the
server less? And how do we make those
-
potential compromises of the server,
either code based or it having to reveal
-
information less damaging?
-
It's worth
saying that there are a bunch of reasons
-
why we have primarily used centralized
messaging.
-
You've got availability. It's
-
very easy to go to a single place, and it also makes a bunch of problems tractable - like handling multiple devices, and mobile push in particular, because both Google and
-
Apple expect or allocate sort of a single
authorized provider who can send
-
notifications to the app user's mobile
devices. And so that sort of requires you
-
to have a centralized place that knows
when to send those messages if you want to
-
provide real time alerts to your
application users.
-
The cons are both cost - there's some entity now that is responsible for all of this cost and has to have a business model - and that there is a single entity that people can come to, which now faces the legal and regulatory issues.
-
So this is not the
-
only type of system we have, right? The
next most common is probably federated.
-
E-mail is a great example of this. Email is nice in that now, as a user, I can choose an email provider that I trust out
of many, or if I don't trust any of the
-
ones that I see, I can even spin up my own
with a small group so we can decentralize
-
cost. We can make this more approachable.
And so while I can gain more confidence in
-
my individual provider, I don't have as much insight into the recipient's side - into Bob, in this case. I don't know how secure his connection is to his provider, because we've separated and decentralized that.
-
There's also a bunch of problems,
-
both in figuring out identity and
discovery securely and mobile push. But we
-
have a number of successful examples of
this. So beyond email, the Fediverse and
-
Mastodon, Riot chat and even SMS are
examples of federated systems where
-
there's a bunch of providers and it's not
a single central place.
-
As you continue
-
this sort of metaphor of splitting apart
and decentralizing and reducing the trust
-
in a single party, you end up with a set
of decentralized messaging systems as
-
well, as we sort of get onto this fringe. It's worth mentioning that there's two types: One is using gossip
protocols. So things like Secure
-
Scuttlebutt. And in those you connect to
either the people around you or people
-
that you know. And when you get messages,
you gossip, you send them on to all of
-
the people around you. And so messages
spread through the network. That is still
-
an area where we are learning the tradeoff
of how much metadata gets leaked and
-
things, but is nice in its level of
decentralization. The other type basically tries to have all of the users take some relatively low-trust participation in
-
the serving infrastructure. And so you can
think of this as evolving out of things
-
like distributed hash tables that are used in BitTorrent. You see something very similar in things like Ricochet or tox.chat, which will use either Tor-like
-
relays for sending messages or have an
explicit DHT for routing where all of the
-
members provide some amount of lookup to
help with discovery
-
and finding other participants.
-
OK, so let's now turn to
some of these mechanisms that we've
-
uncovered and we can start with
encryption. So when you're sending
-
messages to a server by default, there's
no encryption. This is things like IRC.
-
Email used to be primarily unencrypted and
you can think of that like a postcard. So
-
you've got a letter or a postcard in this
case that you're sending. It has where
-
that message is coming from, where it's
going to and the contents. In contrast,
-
when you use transport encryption -- and
so this is now a standard for most of the
-
centralized things. What that means is
you're taking that postcard and you're
-
putting it in an envelope that the network
can't open. And that's what TLS and other
-
forms of transport encryption are going to
give you, is the network link just sees
-
the source and destination. It sees there's
a message coming between Alice and
-
Facebook or whatever cloud provider, but
can't look into that and see that that's
-
really a message for Bob or what's being
said. It just sees individuals
-
communicating with that cloud provider.
And so, you know, SMTPS, there are secure
-
versions of IRC and e-mail and most other
protocols are using transport security at
-
this point. The thing that we have now is
called end-to-end encryption or E2E, and so
-
now the difference here is the message
that Alice is sending is addressed to Bob.
-
And it's encrypted so that the provider
Facebook can't open that either and can't
-
look at the contents. OK? So the network
just sees a message going between Alice
-
and Facebook still, but Facebook can't
open that and actually see the contents of
-
the message. And so end-to-end encryption
has gained pretty widespread adoption. We
-
have this in Signal, for the most part in
iMessage, we have tools like PGP and GPG
-
that are implementing forms of this. For
messaging there's a few that are worth
-
sort of covering in the space: the Signal
protocol, which was initially called
-
Axolotl, is adopted in WhatsApp, in
Facebook private messaging and sort of
-
is... I guess it has generalized into
something called the Noise framework and
-
is gaining a lot of adoption. OMEMO looks
a lot like that specifically for XMPP, and
-
so it is a specific implementation. The
other one is called Off-The-Record or OTR
-
and Off-The-Record developed somewhat independently from this, and thinks a lot about deniability. I'm not
going to go too deep into the specific
-
nits of what these protocols are doing,
but I guess the intuition is: the hard part here is not encrypting a message, but rather how you
-
send that first message and establish a
session, especially if the other person is
-
offline. So I want to start a
communication. I type in the first message
-
I'm sending to someone. I need to somehow
get a key and then send a message that
-
only that person can read and also
establish this sort of shared secret. And
-
doing all of that in one message or with
the other device not online ends up being
-
tricky. Additionally, figuring out the
mapping between a user and their devices,
-
especially as that changes and making sure
you've appropriately revoked devices,
-
added new devices without keys falling over or showing too many warnings to the user, ends up being a lot of the trick in these
-
systems. There's two problems that sort of
come into play when we start using end-to-end encryption.
-
One is we need to think about connection
establishment. So this is the problem of saying: who is Bob? I find a contact and I know them in some way by an
-
email address, by a phone number. Signal
uses phone numbers. You know, a lot of
-
systems maybe use an email address.
There's things like Threema that use a
-
unique identifier that they generate for
you. But somehow I have to go from that
-
identifier to some actual key or some
knowledge of a cryptographic secret that identifies the other person. And I have to figure out who I trust to do that mapping - to gain this thing that I'm now using for encryption. And then also
-
there's this "Well, how do we match?" So a
lot of systems do this by uploading your
-
address book or trying to match with
existing contacts to solve the user-interface problem of discovery, which is:
If they can already know the identifiers
-
and have this mapping, then when someone
new comes in they can suggest and have
-
"prefound" these keys and you just sort of
trust the server to hold this address book
-
and to do this mapping between what
they're using as their identifier and the keys themselves that you're getting out. Signal is nice here: it says it's not
-
uploading your contacts, which is true.
They're uploading hashes of your phone
-
number rather than the actual phone
numbers. But it's a similar thing.
-
They've got a directory of known phone
numbers. And then as people search, you'll
-
search for a hash of the phone number and get back, you know, the key that you hope Signal has correctly given you.
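-
To make concrete why hashing only raises the bar a little, here's a minimal Python sketch (the phone number is made up): the space of phone numbers is small enough that whoever holds the hashes can simply enumerate it.

```python
import hashlib
from itertools import product

def discovery_id(phone: str) -> str:
    # what gets uploaded to the directory instead of the raw number
    return hashlib.sha256(phone.encode()).hexdigest()

# The catch: phone numbers carry so little entropy that anyone holding
# the hashes can brute-force the whole number space.
target = discovery_id("+15551234567")  # hypothetical user
for digits in product("0123456789", repeat=7):
    candidate = "+1555" + "".join(digits)
    if discovery_id(candidate) == target:
        print("recovered:", candidate)
        break
```
-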
So there's a couple of ways that you reduce
-
your trust here. Signal has been going
down a path using SGX to raise the cost of
-
attacks - oblivious RAM and a bunch of systems mechanisms to increase the cost of attacks against their discovery mechanism. The
-
other way that you do this is you allow
for people to use pseudonyms or anonymous
-
identifiers. So with Wire, you can just register with an anonymous email address. And now the
-
cost to you is potentially less if that
gets compromised. And it's worth noting
-
Moxie will be talking tomorrow at 4:00
p.m. about the evolution of the space
-
around Signal, so there's probably a bunch
more depth there that you can expect. So
-
what if we don't want to trust the server
to do matchmaking? One of the early things
-
that has been around is the web of trust
around GPG. And this is the notion that if I have - in real life or otherwise - associated an identifier with a key, I can
-
publicly provide a signed statement saying
that I trust that mapping and then people
-
who don't know someone, but have a social link, maybe can find these proofs and
-
use that to trust this mapping. So I know
an identifier and I know that I trust
-
someone who has said: well, this is the key associated with that identifier. And I can use that network to eventually find an identifier that I'm willing to trust
-
or a key that I'm willing to encrypt to.
There's some user interface tradeoff here.
-
This is a manual process in general. And
this year we've had a set of denial-of-service attacks on the web-of-trust infrastructure. And so the specific
-
attack is that anyone can upload these
attestations of trust, and so if a bunch
-
of random users or sybils start uploading
trusts, when you go to try and download
-
this, you end up overwhelmed by the amount
of information. And so the system does not
-
scale because it's very hard to filter to
people you care about without telling the
-
system who you care about and revealing
your network, which you're trying to
-
avoid. Keybase takes another approach.
They made the observation that when I go
-
to try and talk to someone, what I
actually care about is the person that I
-
believe owns a specific GitHub or Twitter
or other social profile. And so I can
-
provide an attestation where I say: "Well,
this is a key that's associated with the
-
account that controls this Twitter account
or this Reddit account or this, you know,
-
Facebook account." And so by having that set of proofs, I can connect an individual and a cryptographic identity with the person who holds the passwords to a set of other systems.
Keybase also this year began to provide a
-
monetary incentive for users and then
struggled with the number of sign ups. And
-
so there's a lot of work in figuring out:
"OK, do these identities actually
-
correspond to real people and how do you
prevent a similar denial-of-service-style
-
attack that the web of trust faced in
identifying things here?" On our devices,
-
we end up in general resorting to a
concept called TOFU, or Trust-On-First-Use,
-
and what that means is when I first see a
key that identifies someone, I'll save
-
that. And if I ever get another need to
communicate with that person again, I've
-
already got a key and I can keep using
that same key and expect that key to stay
-
the same. And so that continuation, and the ability to pin keys once you've seen them, means that if, when you first establish a connection with someone, it's
-
the real person, then someone who compromises them later can't take over or change that.
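-
As a minimal sketch of what a client does for TOFU - the pin-store format and names here are hypothetical, and real clients also surface a "safety number changed" flow to the user:

```python
import hashlib, json
from pathlib import Path

PINS = Path("known_keys.json")  # hypothetical local pin store

def tofu_check(user_id: str, public_key: bytes) -> bool:
    """Pin a key the first time we see it; afterwards, accept only
    that pinned key and treat any change as a possible compromise."""
    pins = json.loads(PINS.read_text()) if PINS.exists() else {}
    fingerprint = hashlib.sha256(public_key).hexdigest()
    if user_id not in pins:
        pins[user_id] = fingerprint           # trust on first use
        PINS.write_text(json.dumps(pins))
        return True
    return pins[user_id] == fingerprint       # key must stay the same
```
-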
Finally, one of the exciting things that came out - this is
-
circa 2015 and is largely defunct now -
was a system by Adam Langley called Pond
-
that looked at hardening a modern version
of email. And one of the things that Pond
-
did was it had something called a password-authenticated key exchange (PAKE). And so this is
-
an evolving cryptographic area where
you're saying if two people can start with
-
some weak shared secret - So I can perhaps
publicly, or in plain text, challenge the other person: "Where were we on a specific day?" And so now we both
-
know something that maybe has a few bits
of entropy, at least. If we can write the
-
same textual answer, we can take that, run
a key derivation function to end up with a
-
larger amount of shared entropy and use
that as a bootstrapping method to do a key
-
exchange and end up finding a strong cryptographic identity for the other person.
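-
As a rough Python sketch of the bootstrapping intuition - note this alone is not a real PAKE, since schemes like the one Panda builds on also prevent an eavesdropper from grinding through guesses offline:

```python
import hashlib

def bootstrap_key(answer: str, context: bytes) -> bytes:
    # scrypt makes each guess at the low-entropy answer expensive to try
    return hashlib.scrypt(answer.lower().encode(), salt=context,
                          n=2**14, r=8, p=1, dklen=32)

# Both sides answer the same challenge ("where did we first meet?")
# and arrive at the same 32 bytes of shared key material; normalizing
# case keeps slightly different typing from breaking the match.
k_alice = bootstrap_key("C-Base, Berlin", b"panda-demo")
k_bob   = bootstrap_key("c-base, berlin", b"panda-demo")
assert k_alice == k_bob
```
-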
So Pond has a system that they call Panda for linking individuals
-
based on a challenge-response, and this is also something that you'll find in Off-The-Record systems around Jabber. The
other thing that we need to be careful
-
about in end-to-end-encrypted systems is
deniability. When I'm chatting one on one
-
with someone, that conversation is
essentially fairly deniable: each person has their recollection of what happened, and there's no proof that the
-
other person said something unless you've
recorded it or otherwise, you know,
-
brought some other technology into play.
But with an encrypted thing where I've
-
authenticated the other person, I end up
with a transcript - potentially - that,
-
you know, I can turn over later and say,
look, this person said this. And, you
-
know, we've seen recently that things like
emails that come out are authenticated in
-
this way. The DKIM system that
authenticates email senders showed up in
-
the WikiLeaks releases of Hillary
Clinton's emails and was able to say:
-
"Look the text in these hasn't been
changed." And it was signed by the real
-
server that we would expect. So the thing
that we get from Off-The-Record and the
-
Signal protocol is something called
deniability, or repudiability. And this
-
plays into the concept of forward secrecy,
which is: We're going to sort of throw
-
away stuff afterwards in a way that our
chat goes back to being more ephemeral.
-
And so we can think about this in two
ways. There's actually two properties that
-
interlink in this: We have keys that we're
using to form our shared session that
-
we're expecting to use for our secret messages. And each time I send a message,
-
I'm going to also provide some new key
material and begin changing that secret
-
key that we're using. So I provide a next
key. And when Bob replies, he's going to
-
now use my next key as part of that and
give me his next key. And the other thing
-
that I can then do is when I send a
message, I can provide the secret bit of
-
my previous key. So I can say: "My last
private key that I used to send you that
-
previous message was this." And now at the
end of our conversation, we both know all
-
of the private keys such that we both
could have created that whole conversation
-
on our own computer. At any given time,
it's only the most recent message that could only have been sent by the other person; the rest of the transcript that you have is something you could have generated yourself.
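-
As a toy sketch of the key-ratcheting half of this in Python - a pure symmetric hash ratchet, whereas the Signal protocol's double ratchet additionally mixes fresh Diffie-Hellman key material into the chain on each round trip:

```python
import hashlib, hmac

def ratchet(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive a one-time message key plus the next chain key; the
    caller then forgets the old chain key, which is what makes
    earlier messages unrecoverable (forward secrecy)."""
    message_key = hmac.new(chain_key, b"message", hashlib.sha256).digest()
    next_chain = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    return message_key, next_chain

# Both sides start from a shared session secret (hypothetical value).
chain = hashlib.sha256(b"shared session secret").digest()
for plaintext in [b"hi", b"see you at 36C3?"]:
    msg_key, chain = ratchet(chain)  # the old chain key is dropped here
    # ... encrypt plaintext under msg_key, then delete msg_key ...
```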
-
There is a talk on day three about Off-The-Record v4,
the fourth version of that, that will go
-
deeper into that, that's at 9:00 p.m. in
the about:freedom assembly. So I encourage
-
you to do that if you're interested in
this. OK. The next one to talk about is
-
expiry. This is sort of a follow on to
this concept of forward secrecy. But
-
there's sort of two attacks here to
consider. One is something that we should
-
maybe, I guess, give credit to Snapchat
for popularizing, which is this concept of
-
"the message goes away after some amount
of time". And really, this is protecting
-
against not fully trusting the other person - them sharing it later, or sharing it in a way you didn't intend. And this is also about a screenshot
-
adversary. So a bunch of apps will alert
the other participant if you take a
-
screenshot. This is why some apps will
blank the screen when they go to the task
-
switcher. So if you're swapping between
apps, you'll see that some of your
-
applications will just show a blank screen
or will not show contents. And that's
-
because the mobile operating systems' APIs tell them when you're in that mode or when you take a screenshot, and so they want to just be able to notify you if the
-
other person does. It's worth noting that
this is all just raising the cost of these
-
attacks and providing sort of a social
incentive not to, right. I can still use
-
another camera to take a picture of my
phone and get evidence of something that
-
has been said. But it's discouraging it
and setting social norms. The other reason
-
for expiry is: After the fact, a
compromise of a device, so whether that's
-
- you know, someone gets hold of the
device and tries to do forensic analysis
-
to pull off previous messages or the chat
database or whether someone tries to
-
install an application that then scans through your phone... Fengcai is an application that's been installed as a surveillance app in China. And this also
-
boils down to a user interface and user
experience question, which is: how long are you going to save logs, how much
history are you going to save and what
-
norms are you going to have? And there's a tradeoff here. It's useful
-
sometimes to scroll back. And companies that believe they have value-added services around being able to do data analytics on your chat history are especially wary of getting rid of that. The
next thing that we have is isolation and
-
OS sandboxing. Right. So a lot of this is up one layer, which is: what is the
-
operating system doing to secure your
application, your chat system from the
-
other things, the malware or the
compromises of the broader device that
-
it's running on. We have a bunch of
projects around us at Congress that are
-
innovating on this. There are chat systems
that also attempt to do this sort of on
-
their own. One sort of extreme example is
called Tinfoil Chat, which makes use of
-
three devices and a physical diode which
is designed to have one device that is
-
sending messages and another device that
is receiving messages. And the thought is:
-
if you receive a message that somehow
compromises the device, the malware or the
-
malicious file can never get any
communication back out and so becomes much
-
less valuable to have compromised. And
they implement this with like a physical
-
hardware diode. The other side of this is
recovery and backups, which is: you've got
-
a user experience tradeoff between a lot
of people losing their devices and wanting
-
to get back their contact list or their
chat history and the fact that now you're
-
keeping this extra copy and have this
additional place for things to get
-
compromised. Apple has done a lot of work here that we don't look at so much. They gave a Black Hat talk a few years ago where they discuss how they use custom hardware security modules in their data centers - much like the T2 chip in their end devices - that will hold the backup keys that get used for their iCloud backups and do similar amounts of rate limiting. And they
consider a set of - a pretty wide set of
-
adversaries - more than we might expect.
So including things like what happens when
-
the government comes and asks us to write
new software to compromise this? And so
-
they set up their HSMs such that they
cannot provide software updates to them,
-
which is, you know, a sort of a step of
how do you do this cloud security side
-
that we don't think about as much. So
there's a set of slides that you can find
-
from this. And these slides will be online, too, as a pointer to look at
-
their solution, which considers a large
number of adversaries that you might not
-
have thought about. So: traffic obfuscation is primarily about a network-side adversary. The
-
technique that people are using, if they feel they need to do this, is something called domain fronting. Domain fronting had its heyday maybe in 2014-ish and has
become somewhat less effective, but it's
-
still effective enough for most of the
chat things. The basic idea behind domain
-
fronting is that there's a separation of layers between that envelope and the message inside of it that we get with HTTP on the Web. So when I create a secure
-
connection to a CDN or a content provider like Amazon or Google or Microsoft, I can make that connection, perform the security handshake, and name only a fairly generic service that I'm connecting to. I
just want to establish a secure connection
-
to CloudFlare. And then once I've done
that, the message that I can send inside
-
can be a chat message to a specific
customer of that CDN or that cloud
-
provider. And so this is an effective way to prevent the network from knowing what specific service you're accessing.
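-
A minimal sketch of the trick, using the Python requests library and hypothetical domains (and note that many CDNs now reject this kind of front/Host mismatch):

```python
import requests

# The TLS handshake - SNI and certificate - names only the innocuous
# front domain, which is all the network can observe. The Host header
# travels inside the encrypted tunnel and tells the CDN which customer
# the request is really for.
resp = requests.get(
    "https://innocuous-front.example/",
    headers={"Host": "blocked-chat-service.example"},
)
```
-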
It got used for a bunch of circumvention things.
-
It then got used for a bunch of malware
things and this caused a bunch of the
-
cloud providers to stop allowing you to do
this. But it's still getting used. This is still what's sort of happening when you turn on certain censorship circumvention in Signal, and it's what Telegram is using for the most part. And the same basic
-
technique is getting another revival with
DNS over HTTPS and encrypted SNI
-
extensions to TLS which allow for a
standardized approach to establish a
-
connection to a service without providing
any specific identifiers to the network
-
for which service you want to connect to.
It's worth sort of mentioning that
-
probably the most active chat service for
this sort of obfuscation or circumvention
-
is telegram, which has a bunch of users in
countries that are not fans of having lots
-
of users of telegram. And so they have
both systems where they can bounce between
-
IPs very quickly and change where their
servers appear to be. And they've also
-
used techniques like sending messages over
DNS tunnels to mitigate some of these
-
censorship things. From the providers' perspective, this is really about access to
-
their user population. They're not really
thinking about your local network or
-
caring about that so much as they are thinking: oh, there's millions of
-
users that should probably still have
access to us. So we can maybe hide the
-
characteristics of traffic in terms of
what specific service we're connecting to.
-
There's some other things about traffic,
though, that also are revealing to the
-
network. And this is sort of this
additional metadata that we need to think
-
about. So one of these is padding: the size of messages can be revealing. One
-
sort of immediate thing is the size of a
chat or a text message is going to be very
-
different from the size of an image or
voice or movies. And you see this on
-
airplanes or in other bandwidth limited
settings: they might allow text messages
-
to go through, but images won't. There's
been research that shows, for instance, on
-
voice, even if I encrypt my voice, we've
actually gotten really good at compressing
-
audio of human speech. So much so that
different phonemes, different sounds that
-
we make take up different sizes. And so I
can say something, compress it, encrypt it
-
and then recover what was said based on
the relative sizes of different sounds. So
-
there was a paper in 2011 at Oakland (IEEE S&P) that demonstrated this
-
potential for attacks. And so what this is
telling us perhaps is that there's a
-
tradeoff between how efficiently I want to
send things and how much metadata or
-
revealing information for distinguishing
them I'm giving up. So I can use a less
-
efficient compression that's constant bit
rate or that otherwise is not revealing
-
this information, but it has higher
overhead and won't work as well in
-
constrained network environments.
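-
One common compromise, sketched here in Python with arbitrary bucket sizes, is to pad every message up to a small set of fixed size classes before encrypting, so length only leaks a coarse bucket rather than the exact content size:

```python
BUCKETS = [256, 1024, 4096, 16384]  # fixed ciphertext size classes (bytes)

def pad_to_bucket(plaintext: bytes) -> bytes:
    """Prefix the real length, then pad to the next size class so an
    observer only learns which bucket a message falls into."""
    for size in BUCKETS:
        if len(plaintext) + 2 <= size:
            prefix = len(plaintext).to_bytes(2, "big")
            return prefix + plaintext + b"\x00" * (size - 2 - len(plaintext))
    raise ValueError("message too large for any bucket")

def unpad(padded: bytes) -> bytes:
    length = int.from_bytes(padded[:2], "big")
    return padded[2:2 + length]
```
-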
The other place this shows up is just when
-
people are active. So if I can look at
when someone is tweeting or when messages
-
are sent, I can probably figure out pretty
quickly what timezone they're in. Right.
-
And so this leads to a whole set of these
metadata based attacks. And in particular,
-
there's confirmation attacks and
intersection attacks. An intersection attack looks at the relative activity of multiple people, trying to
-
figure out: OK, when Alice sent a message,
who else was online or active at the same
-
time? And over time, can I narrow down or
filter to specific people that were likely
-
who Alice was talking to? Pond is also a system to look at in this regard. Their approach was that a client would hopefully always be online
-
and would check in with the server in a regular pattern, with the same amount of
-
data, regardless of whether there was a
real message to send or not. So that from
-
the network's perspective, every user
looked the same. The downside being that
-
you've now got this message being sent by
every client every minute or so and that
-
creates a huge amount of overhead of, you
know, just padded data that doesn't have
-
any meaning. So finally, I'll take a look
at server hardening and the things that
-
we're doing to reduce trust in the server.
There's a few examples of why we would
-
want to do this. So one is that you've had
messaging servers, plenty of times, that
-
have not been as secure as they claim. One
example being that there was a period
-
where the Skype subsidiary in China was
using a blacklist of keywords on the
-
server to either prevent or intercept some
subset of their users messages without
-
telling anyone that they were doing that.
And then also just sort of this uncertain
-
future of: OK, I trust them with the data now, but what can we do so that I don't worry about what the corporate future of this service entails for my data? One of the sort of
-
elephants in the room is: the software
development is probably pretty
-
centralized. So even if I don't trust the
server, there's some pretty small number
-
of developers who are writing the code.
And how do I trust that the updates that
-
they are making - either to the server, or to the client that they push to my device - aren't reducing my security. Open source is
a great start to mitigating that, but it's
-
certainly not solving all of this. So one way we can think about how we reduce trust in the server is by looking at what the server knows after end-to-end encryption. It knows things about the
size. It knows where the message is coming
-
from. It knows where the message is going
to. Size: we've talked about some of these
-
padding things that we can use to
mitigate. So how do we reduce the amount
-
of information about sources and
destinations in this network graph that
-
the server knows? So this is a concept
called linkability, which is being able to
-
link the source and destination of a
message. We start to see some mitigations
-
or approaches to reducing linkability
entering mainstream systems. So Signal has
-
a system called "Sealed Sender" that you
can enable, where the source of the
-
message goes within the encrypted
envelope. So that Signal doesn't see that.
-
The downside being that Signal is still
seeing your IP address but the thought is
-
that they will throw those out relatively quickly, and so they will have fewer logs about this source-to-destination linkage. There is, though, a bunch of more theoretical work in this space. The first thing I'll point
to is a set of systems that we classify as
-
mixnets. A mixnet works by having a set of
providers rather than a single entity
-
that's running the servers. A bunch of
users will send messages to the first
-
provider, which will shuffle all of them
and send them to the next provider, which
-
will shuffle them again and send them to a
final provider that will shuffle them and
-
then be able to send them to destinations.
And this de-links things, so that none of the
-
individual providers know both the source
and destination of these messages. So this
-
looks maybe a bit like Tor's onion routing, but differs in a couple of technicalities. One is that typically you will wait for some number of messages, rather than just passing everything through at full bandwidth and low latency. And so by doing that, you can
-
get a theoretical guarantee that this
batch had at least n messages that got
-
shuffled and therefore you can prevent
there being some time where only one user
-
was using the system. And so you get a stronger theoretical guarantee.
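-
The core behaviour of a single mix node is small enough to sketch - here in Python, with the batch size and the forwarding function as stand-ins:

```python
import secrets

class MixNode:
    """Hold messages until a batch has arrived, then forward them in a
    shuffled order, so inputs can't be matched to outputs by timing or
    arrival order."""

    def __init__(self, batch_size, forward):
        self.batch_size = batch_size
        self.forward = forward  # e.g. peel one encryption layer, send to next hop
        self.pool = []

    def receive(self, ciphertext: bytes):
        self.pool.append(ciphertext)
        if len(self.pool) >= self.batch_size:
            batch, self.pool = self.pool, []
            # Fisher-Yates shuffle driven by a cryptographic RNG
            for i in range(len(batch) - 1, 0, -1):
                j = secrets.randbelow(i + 1)
                batch[i], batch[j] = batch[j], batch[i]
            for message in batch:
                self.forward(message)
```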
-
There's an active project making a messaging system
using mixnets called Katzenpost. They gave
-
a talk at Camp this summer and I'd
encourage you to look at their website or
-
go back to that talk to learn more about
mixnets. The project that I was, I guess,
-
tangentially helping with is in a space
called private information retrieval,
-
which is another technique for doing this
delinking. Private information retrieval
-
frames the question a little bit
differently. What it asks is: I have a server that has a database of messages, and I want a client to be able to
-
retrieve one of those messages without the
server knowing which message the client
-
got or asked for. So this sounds maybe
hard. I can give you a straw man to
-
convince yourself that this is doable and
the straw man is: I can ask the server for
-
its entire database and then take the
message that I want and the server hasn't
-
learned anything about which message I
cared about. But I spent a lot of network
-
bandwidth probably doing that. So there's
a couple of constructions for this. I'm
-
going to focus on the information
theoretic private information retrieval.
-
And so we're going to use a similar setup
to what we had in our threat model for a
-
mixnet, which is: we've got a set of
providers now that have the same database.
-
And I'm going to assume that they're not
all talking to each other or colluding. So
-
I just need at least one of them to be honest. And one of the things that we'll
-
use here is something called the exclusive-or (XOR) operation. To refresh your memory: exclusive-or is a binary bitwise operation. And the nice property that we
-
get is if I xor something with itself, it
cancels out. So if I have some piece of
-
data and I xor it against itself, it just
goes away. So if I have my systems that
-
have the database, I can ask each one to
give me a superposition of some random
-
subset of its database. So I can ask the first server: give me items 11, 14, and 20 XORed together. I'm assuming all of the
items are the same size so that you can do
-
these XORs. And then, if I structure this right, the request that each server independently sees looks like I just asked for some random subset. But I can do that so that when I XOR the things I get back, everything just cancels out
-
except the item that I care about. Unless you saw all of the requests that I made, you wouldn't be able to tell which item I cared about. So by doing this, I've reduced the network bandwidth: I'm only getting one item's worth of data back from each server.
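-
Here is the two-server version of that trick as a small Python sketch, with arbitrary sizes and contents: each server sees only a random-looking bit vector, but XORing the two answers leaves exactly the wanted item.

```python
import secrets

ITEM_SIZE = 32  # every item padded to the same length so XOR lines up

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def answer(db: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together the items the query bit-vector selects."""
    acc = bytes(ITEM_SIZE)
    for item, bit in zip(db, query):
        if bit:
            acc = xor(acc, item)
    return acc

n, wanted = 8, 5
db = [secrets.token_bytes(ITEM_SIZE) for _ in range(n)]  # same DB on both servers

q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
q2 = q1.copy()
q2[wanted] ^= 1  # differs from q1 only at the index we actually want

# Everything selected by both queries cancels; only item `wanted` survives.
assert xor(answer(db, q1), answer(db, q2)) == db[wanted]
```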
-
Now, you might have a concern that I'm asking the server to do a
-
whole lot of work here. It has to look
through its entire database and compute
-
this superposition thing. And that seems
potentially like a lot of work, right. The
-
thing that I think is exciting about this
space is it turns out this sort of
-
operation of going out to a large database
and like searching for all of the things
-
and then coming back with a small amount
of data looks a lot like the hardware that
-
we're building for A.I. and for a bunch of
these sorts of search-like things. And so
-
this runs really quite well on a GPU where
I can have all of those thousands of cores
-
compute small parts of the XOR and
then pull back this relatively small
-
amount of information. And so with GPUs,
you can actually have databases of
-
gigabytes, tens of gigabytes of data and
compute these XORs across all of it in
-
order of a millisecond or less. So a
couple of things in this space. "Talek" is
-
the system that I helped with that
demonstrates this working. The converse
-
problem is called private information
storage. And that one is: how do I write an item into a database without the database knowing which item I wrote? The mathematical construction there is not
quite as simple to explain. But there's a
-
pretty cool new piece of work in the last month or two, out of Dan Boneh and Henry Corrigan-Gibbs at Stanford, called Express - with Saba as first author - showing how to fairly practically perform that operation.
I'll finish just with a couple minutes on
-
multiparty chat or group chat, so small
groups. You've sort of got a choice here
-
in terms of how existing chat systems are implementing group chat. One is: you can
-
not tell the server about the group. And
as someone who is part of the group, I
-
just send the same message to everyone in
the group. And maybe I can tag it for them
-
so that they know it's part of the group. Or you do something more efficient, where
-
you tell the server about group membership
and I send the message once to the server
-
and it sends it to everyone in the group.
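-
The first option is just client-side fan-out. A sketch, where encrypt_for and send stand in for an existing pairwise end-to-end session and transport:

```python
def send_to_group(members, plaintext, encrypt_for, send, group_id: bytes):
    """Client-side fan-out: the server never learns the group exists.
    The sender delivers one pairwise-encrypted copy per member, tagged
    (inside the encryption) so recipients can group the messages."""
    for member in members:
        ciphertext = encrypt_for(member, group_id + b"|" + plaintext)
        send(member, ciphertext)  # indistinguishable from ordinary 1:1 traffic
```
-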
Even if you don't tell the server about
-
it, though, you've got a bunch of things
to worry about, like leaked correlation,
-
which is: if at a single time someone
sends the same sized message to five other
-
people and then later someone else sends
the same sized message to five other
-
people, and those sets basically overlap, someone in the network knows who
-
the group membership is. So it's actually
quite difficult to conceal group
-
membership. The other thing that breaks
down is our concept of deniability once
-
again, which is: now multiple people
have this log. Even if both of them
-
individually could have written it, the
fact that they have the same cryptographic
-
keys from this other third party probably
means that third party made that message.
-
So there continues to be work here. Signal
is working on providing, again, an SGX-based centralized construction for group management, to be able to scale better. Given, I think, the pretty realistic fact that the server in these cases is probably going to be able to figure out group membership in some cases, you might as well
-
make it scale. On the other side, one of
the cool systems that's being prototyped
-
is called "cwtch" out of open privacy.
And this is an extension to ricochet that
-
allows for offline messages and small
group chats. It works for order of 5 to 20
-
people, and it works by having a server
that obliviously forwards on messages to
-
everyone connected to it. So when I send a
message to a group, the server sends the
-
message to everyone it knows about, not
just the people in the group, and
-
therefore the server doesn't actually know
the subgroups that exist. It just knows
-
who's connected to it. And that's a neat
way. It doesn't necessarily scale to large
-
groups, but it allows for some concealing
of group membership. They've got an
-
Android prototype as well that's sort of a
nice extension to make this usable.
-
Wonderful. I guess the final thought here
is: there's a lot of systems, I'm sure I
-
haven't mentioned all of them. But this
community is really closely tied to the
-
innovations that are happening in the
space of private chat. And this is the
-
infrastructure that supports communities
and is some of the most meaningful stuff
-
you can possibly work on. And I encourage
you to find new ones and look at a bunch
-
of them and think about the tradeoffs and
encourage friends to play with new
-
systems, because that's how they gain
adoption and how people figure out what
-
mechanisms do and don't work. So with
that, I will take questions.
-
Applause
-
Herald: It wasn't necessary to encourage you to come with applause. There are
-
microphones that are numbered in the room,
so if you start lining up behind the
-
microphones, then we can take your
questions. We already have a question from
-
the Internet.
Question: Popularity and independence are
-
a contradiction. How can I be sure that an
increasingly popular messenger like Signal
-
stays independent?
Answer: I guess I would question whether
-
independence is a goal in and of itself.
It's true that the value is increasing.
-
And so one of the things I think about is using systems that have open protocols
-
or that are federated or otherwise not
centralized. And again, this is reducing
-
that need to have confidence in the future
business model of a single legal entity.
-
But I don't know if independence of the company is the thing that you're trying to trade off with popularity.
Herald: Well, and we have questions at the
-
microphones. We'll start a microphone,
number one.
-
Question: Thanks for the talk. First of
all, you talked a lot about
-
content and encryption. What about the
initial problem? History shows that if I'm
-
an individual already observed in a
sensitive area, there might be no need to decrypt the message I'm sending - it's already identified that I'm sending at a
-
specific location at a specific time. Is
there any chance to hide that or do
-
something against it?
Answer: So make things hidden again after
-
the fact? That seems very hard. I mean,
so. So there's a couple thoughts there,
-
maybe. There's sort of this real world
intersection attack, which is if
-
there's a real world observable action of
who actually shows up at the protest,
-
that's a pretty good way to figure out who
is chatting about the protests beforehand,
-
potentially. And so, I mean, I think what
we've seen in real world organizing is
-
things like either really decentralizing
that, where it happens across a lot of
-
platforms, and happens very spontaneously
close to the event. So there's not enough
-
time to respond in advance - or hiding your presence, or otherwise trying
-
to stagger your actual actions so that
they are harder to correlate to a specific
-
group. But it's not something the chat
systems are talking about. I don't think.
-
Herald: We have time for more questions.
So please line up in the microphones and
-
if you're leaving, then leave quietly. We
have a question from microphone number 4.
-
Question: So if network address translation is the original sin against the end-to-end principle, and due to that, we now
-
have to run servers, someone has to pay
for it. Do you know any solution to that
-
economic problem?
Answer: I mean, we had to pay for things
-
even without network address translation,
but we could move more of that cost to end
-
users. And so we have another opportunity
with IPv6 to potentially keep more of
-
the cost with end users or develop
protocols that are more decentralized
-
where that cost stays more fairly
distributed. You know, our phones have a
-
huge amount of computation power and
figuring out how we make our protocols so
-
that work happens there is, I think, an
ongoing balance. I think some of the
-
reasons why network address translation or
centralization is so common is because
-
distributed systems are pretty hard to
build and pretty hard to gain confidence
-
in. So more tools for distributed systems - around how we can test, and feel like we understand, that the system actually is, you know, going to work 99.9% of the time - are going to make people less wary
of working with them.
-
So better tools for distributed systems is maybe the best answer.
-
Herald: We also have another question from
the internet, which we'll take now.
-
Question: What do you think of technical novices' acceptance of and dealing with OTR keys, for example in Matrix/Riot? Most
people I know just click "I verified this
-
key" even if they didn't.
Answer: Absolutely. So this, I think,
-
goes back to a lot of these problems are
sort of a user experience tradeoff, which
-
is, you know, we saw initial versions of
Signal where you would actually try and
-
regularly verify some QR code between each other, and then that sort of has gotten pushed
-
back to a harder to access part of the
user interface because not many people
-
wanted to deal with that. And in early Matrix/Riot, you would get a lot of
-
warnings about: There's a new device. Do
you want to verify this new device? Do you
-
only want to send to the previous devices
that you trusted. And now you're getting
-
the ability to sort of more automatically
just sort of accept these changes and
-
you're weakening some amount of the
encryption security, but you're getting a
-
better, smoother user interface because
most users are just going to sort of click
-
"yes" because they want to send the
message. Right. And so there's this
-
tradeoff: when you have built the
protocols such that you are standing in
-
the way of the person doing what they want
to do. That's not really where you want to
-
put that friction. So figuring out other
ways where you can have this on the side
-
or supporting the communication rather
than hindering it is probably the types of
-
user interfaces or systems that we should
be thinking about that can be successful.
-
Herald: We have a couple of more
questions. We'll start at microphone
-
number 3.
Question: Thank you for your talk. You
-
talked about deniability by sending the
private key with the last message.
-
And how do you get the private key for the last message in the whole conversation?
-
Answer: In the OTR, XMPP, Jabber systems
there would be an explicit action to end
-
the conversation, which would then make it repudiable: that would send that final message to close it. What
you have in things like Signal is it's
-
actually happening every message as part
of the confirmation of the message.
-
Question: OK. Thank you.
Herald: We still probably have questions
-
, time for more questions. So please
line up if you have any. Don't hold back.
-
We have a question from
microphone number 7.
-
Question: So, first of all, a brief
comment. The Riot thing still doesn't even do TOFU. They haven't figured this out. But I think there's a
-
much more subtle conversation that needs
to happen around deniability, because most
-
of the time, if you have people with a power imbalance, the non-repudiable
-
conversation actually benefits the weaker
person. So we actually don't want
-
deniability in most of our chat
applications or whatever, except that's
-
still more subtle than that, because when
you have people with equal power, maybe
-
you do. It's kind of weird.
Answer: Absolutely. And I guess the other
-
part of that is, is that something that
should be shown to users and is that a
-
concept? Is there a way that you express
that notion in a way that users can
-
understand it and make good choices? Or is
it just something that your system makes a
-
choice on for all of your users?
Herald: We have one more question.
-
Microphone number seven, please line up if
you have any more. We still have a couple
-
of more minutes. Microphone number seven,
please.
-
Question: Hi, Thanks for the talk. You
talked about the private information
-
retrieval and how that would stop the
server from knowing who retrieved the
-
message. But for me, the question is, how
do I find out in the first place which
-
message is for me? Because if you, for example, always use message slot 14, then obviously, over a conversation, it would again be possible to deanonymize the users - like, OK, they're always accessing this one in all those queries.
-
Answer: Absolutely. So I didn't explain
that part. The trick is that between the
-
two people, we will share some secret,
which is our conversation secret. And what
-
we will use that conversation secret for
is to seed a pseudorandom number generator. And so we will be able to
generator. And so we will be able to
generate the same stream of random
-
numbers. And so each next message will go
at the place determined by the next item
-
in that random number generator. And so
now the person writing just writes to what look, to the server, like random places; and when they want to write the next message in this conversation, they'll make sure to write at the next place from the random number generator for that
conversation. There is a paper that will
-
describe a bunch more of that system.
But that's the basic sketch.
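-
A sketch of that slot-derivation idea in Python - not Talek's actual construction, which the paper lays out:

```python
import hashlib, hmac

def slot_sequence(conversation_secret: bytes, num_slots: int):
    """Both participants derive the same pseudorandom slot sequence
    from their shared secret; to the server, each write just lands in
    an unpredictable slot."""
    counter = 0
    while True:
        digest = hmac.new(conversation_secret,
                          counter.to_bytes(8, "big"),
                          hashlib.sha256).digest()
        yield int.from_bytes(digest, "big") % num_slots
        counter += 1

slots = slot_sequence(b"shared conversation secret", num_slots=2**20)
first, second = next(slots), next(slots)  # where messages 0 and 1 go
```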
-
Question: Thank you.
Herald: We have a question from the Internet.
-
Question: It seems like identity is the
weak point of the new breed of messaging
-
apps. How do we solve this part
of Zooko's triangle, the need for
-
identifiers and to find people?
Answer: Identity is hard, and I think
-
identity has always been hard and will
continue to be hard. Having a variety of
-
ways to be identified, I think remains
important and is why there isn't a single
-
winner takes all system that we use for
chat. But rather you have a lot of
-
different chat protocols that you use for
different circles and different social
-
circles that you find yourself in. And
part of that is our desire to not be
-
confined to a single identity, but to be
able to have different facets to our
-
personalities. There are systems where you
can identify yourself with a unique
-
identifier to each person you talk to
rather than having a single identity
-
within the system. That's something else that Pond would do: the identifier that you gave out to each separate friend was different. And so
-
you would appear as a totally separate
user to each of them. It turns out that's
-
at the same time very difficult, because
if I post an identifier publicly, suddenly
-
that identifier is now linked to me for
everyone who uses that identifier. So you
-
have to give these out privately in a one
on one setting, which limits your
-
discoverability. So that that concept of
how we deal with identities I think is
-
inherently messy, and inherently something that nothing satisfying is going to fully solve.
Herald: And that was the final question
-
concluding this talk. Please give a big
round of applause for Will Scott.
-
Will: Thank you
-
Postroll music
-
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!