*preroll music*
Herald: I did some research,
and it was not, not easy
that Diffie-Hellman key exchange
is so much above my pay grade
therefore, I'm going to keep it simple.
Please welcome
we have Alex Halderman from
the University of Michigan,
and Nadia Heninger from
the University of Pennsylvania.
*applause*
AH: Thank you.
Thank you all so much.
It's wonderful to be back again in 32C3.
I'm Alex Halderman from
the University of Michigan,
here again this year with,
with many of my students from my group
here in the audience also speaking.
We study security in the real world.
So tonight, we have
a very special story to tell you
that I'm very proud to be telling
along with my colleague Nadia Heninger.
We're going to be talking
about discrete log, Diffie-Hellman
and some of the, um,
the research that we've done
over the past year,
try to understand how the NSA
may be breaking so much of the crypto
that we know they're breaking.
Why do we...? So this work is
joint work with a number of co-authors,
with 12 other co-authors,
3 of them are in this room right now,
and I'd ask to stand up
but they said they didn't want to
so please, a quick round of applause
for my co-authors as well.
*applause*
So, thank you.
So in this very room,
a year ago at 31C3,
Jacob Appelbaum and Laura Poitras
gave a talk, "Reconstructing Narratives",
in which they announced some new results
from the Snowden archives.
They were able to tell us how the NSA,
that the NSA was breaking cryptography
used in widespread online communication.
And, they later published
an article in der Spiegel,
in which the article included documents
that showed indeed the scope of NSA
breaking widely used encryption
was significant.
That NSA is breaking,
maybe not all crypto,
but they're able to break
widely used crypto
from many of the different kinds
of services and protocols we care about.
What the leaks didn't answer
is how NSA is breaking this cryptography
and to a technologist,
well, if we don't know
how they're breaking it,
what can we do to stop it?
So, Nadia and I and our co-authors set out
earlier this year
to try to, through our research,
start answering those questions.
How is NSA likely to be breaking
likely used cryptography,
and what can we and other researchers do
to stop government from being able
to attack the crypto
that all of us depend on?
So, so...
*applause*
Let's tell the story.
Wait until you see how it ends!
So if I were setting out as the attacker
to break widely used crypto,
well, there's a few different ways
that I could do it.
One branch of the decision tree here
is to attacking the implementations
right, either finding vulnerabilities
or introducing backdoors,
we've all been witnessing over the past
week or so with Juniper and their systems
being compromised.
On the other hand,
there's another prong you could take.
You could try to attack the crypto
algorithms themselves,
the underlying crypto.
And the difference is,
if you're attacking implementations,
you have to make a big investment
in every hardware device
and piece of software
that you're trying to compromise.
If you're attacking the underlying crypto,
you have just one, a one-stop shop
to gain access to,
potentially a very wide swath
of all the crypto in use.
So a small number of algorithms
predominate for both
symmetric cryptography,
things like AES and RC4,
but those particular algorithms anyway,
most cryptographers seem to think
that breaking them,
at least in the general case,
is pretty hard right now.
Instead though, we also have
a very small number of
public key crypto algorithms
that are also in use very widely
for most or all of the protocols
and services we care about.
Things like RSA and Diffie-Hellman.
And here be dragons, as they say,
this is the cryptography that we
and many other cryptographers
think is most likely to be targeted.
So, I'll hand it off to Nadia
to talk about breaking public key.
*applause*
NH: So, in order to understand
a little bit about
how cryptanalysis works,
I'm going to go all the way back
to the very beginning of
public key cryptography
from the 70s,
and I'll start by explaining
a little bit about RSA.
This is Rivest, Shamir, and Adleman
up on the screen here,
from the 70s, you can tell.
And this is sort of the simple example,
and then we'll talk a little bit more
about the actual
Diffie-Hellman-based cryptanalysis
that we're actually talking about.
So, this the first public-key
crypto algorithm
that was ever published,
and it is still the most widely used
cryptography, public key cryptography
algorithm out there.
That shows you, I guess something
about the naturalness of ideas,
or maybe the lack of progress
that we've had in the past 40 years.
So, here's sort of the textbook version
of RSA encryption,
what we really care about is that...
So, Alice and Bob, they want
to bootstrap communication over
an untrusted channel,
so there's some eavesdropper
in between them
who's intercepting their messages.
And, in order to get around this,
they need to somehow figure out
how to share a key in order to
actually encrypt their communications.
And the way that RSA accomplishes this,
is, Bob here on the right has a public key
which in RSA is e modulus N
which is the product of
two large prime factors,
and he sends this over to Alice,
and Alice uses Bob's public key
to encrypt a message, like a session key,
and send it back to Bob,
and then Bob can decrypt the message,
get the session key,
and they can communicate using
some other symmetric cipher.
So, this is the big picture
of RSA encryption.
The reason that we think
that RSA is secure is because
the best way that we know how to break
an RSA public key
is to factor the modulus,
and as far as we know,
factoring is not very easy.
So, in particular, factoring is
what we hope is something like
a one-way function,
multiplication is easy,
factoring the number into
two pieces again is hard,
in some sense.
And the best algorithm that we have
in the general case, of, say
an RSA modulus that's well-generated,
is called the number field sieve.
So here is the...
this is as bad as technical
as I'm going to get,
and I'm waving my hands vigorously here,
but here's something along the lines of
what the number field sieve algorithm
actually looks like,
so it's a multi-stage algorithm,
it's rather complex,
some stages parallelise very well,
embarrassingly well,
other stages parallelise somewhat
less well,
and so we've got these multiple stages
that we go through,
and at the end of the algorithm,
we discover, we hope, a prime factor
of the number that we're trying to factor.
So, how long does it take to factor?
Here's one answer:
if you ask a number theorist, this is
the answer that they all give you,
they all go through the optimisation,
and they will tell you the answer is
a sub-exponential function in the size
of the number that we're trying to factor.
That I think is not the answer
that you guys are looking for.
So, here's a slightly more
concrete answer.
So, how long does it take to factor
different sizes of keys?
So, for 512-bit RSA,
the first 512-bit RSA modulus
was factored in 1999
by a group of academics,
that took them 6 months
and a few supercomputers,
now you can rent supercomputers
by the hour.
So what does that do?
Well, some students of mine
actually were able to factor
a 512-bit RSA key
for 4 hours and 75 dollars on Amazon EC2.
If you would like to do this too...
*applause*
So, you can also do this too,
code is online, right here, download it,
send us bug reports, etc.
So, as we go up in key sizes,
768-bit RSA modulus,
estimate, current estimate is
less than 1000 core-years,
and for sort of reasonable-size
academic clusters,
that should take less than
a calendar year to finish, now.
That was,
the first 768-bit RSA factorisation
was done in public in 2009.
So, that gives you some idea
of sort of the progress.
For 1024-bit RSA, nobody has factored
a key of that size in public,
the estimate is probably,
it's about a million core-years,
which is certainly within range
for a large government,
so it is against better recommendations
to use a 1024-bit RSA key size,
at this point.
Current recommendation is to use
a 2048-bit RSA modulus,
with current algorithms,
nobody should ever be able to factor
something at this size,
without some kind of major improvement.
So, there's the big picture for you.
Now move on to Diffie-Hellman.
So, the paper that introduced
Diffie-Hellman
was called "New Directions
in Cryptography",
it's one of the seminal papers
of the 20th century,
how many of you have read this paper?
You should all go read it right now,
I mean not right now, maybe after I talk.
The first sentence of this paper,
written in 1976,
is "We stand today on the brink
of a revolution in cryptography".
This is one of the best opening sentences
of an academic paper I've ever read,
and they were 100% right
about everything they put in the paper.
They laid out the entire foundations
of cryptographic research
for a couple decades,
and to boot they came up with
the first public key exchange,
that is still one of the commonly used
public key methods we
have on the Internet.
So, all that in one paper.
So, the way that
textbook Diffie-Hellman works,
is, you've got a couple of parameters,
you've got a prime p,
and some element g less than p,
you can think,
for concreteness, g is 2.
It's a good number.
And p is some very large prime,
say 1024, 2048-bit prime.
And so Alice and Bob,
same as our previous scenario,
they want to bootstrap communication
in the presence of
an untrusted eavesdropper.
So what they're going to do,
Alice will generate some secret value a,
and she will compute g^a mod p,
and send it over to Bob,
and Bob will compute some secret value b,
and compute g^b mod p,
and send it over to Alice,
the eavesdropper sees the values
g^a and g^b,
can't derive anything useful from those,
but Alice and Bob can individually
take their secrets
and derive the values g^ab mod p,
both the same values.
And that becomes a shared secret,
which they can then use as a session key,
and, you know, switch to AES
and start symmetrically
encrypting their data.
So, that's Diffie-Hellman key exchange.
Used all over the Internet,
one of the commonly used things possible.
So, one of the reasons that people
have been advocating
Diffie-Hellman key exchange recently
over, say, RSA,
is because it can be, it can provide
the property of perfect forward secrecy.
So assuming that Alice and Bob
generate fresh random
secret values a and b
for every single connection,
then if, say, some large government agency
is collecting all of their communications
and later tries to hack into Alice or Bob,
or break one of their keys,
in order to decrypt their communication,
they can't hack into Alice or
Bob's computer later,
and then discover the key
that Alice and Bob used
to generate all the conversations
that they had,
because Alice and Bob have
already forgotten
the keys that they used.
So, as long as Alice and Bob
are generating fresh random
values every time,
those values should reveal nothing
about past or future communications.
So, that's perfect forward secrecy.
And, a lot of people have,
who really know what
they're talking about,
including, unfortunately, me,
on this stage a couple years ago,
have said, "you guys should always use
Diffie-Hellman over RSA key exchange
because of this property of
perfect forward secrecy".
So here's a selection of quotes
that I found on the Internet,
just from googling
"perfect forward secrecy"
and "Diffie-Hellman key exchange",
and you find people saying that
this provides better security,
the NSA can decrypt nothing,
1024-bit Diffie-Hellman is arguably
better than 1024-bit RSA,
and then 1024-bit Diffie-Hellman
is better than any key size for RSA.
This is a selection of friends
and people I respect,
and some also, also some
random people on Stack Overflow,
which is where...
*laughter*
where we know everybody's actually
getting their recommendations from.
So, there's been wide-scale advocacy
of perfect forward secrecy as
like, the reason that you should
use Diffie-Hellman.
It will protect you against the NSA.
I'm sorry. We were wrong.
And, why were we wrong?
I'm going to tell little bit more
about what the cryptanalysis looks like
for Diffie-Hellman.
So, the underlying problem
that we hope is hard
for the security of Diffie-Hellman
is called discrete log,
it is exactly sort of the problem of
given one of the key exchange values g^a mod p
compute, say, Alice's secret a.
This would allow the attacker
to compute the shared secret
in the same way that Alice did.
And, sort of similar to
factoring and multiplication,
discrete log, we think it's much harder
than modular exponentiation,
the computation that actually
gives you the value g^a mod p.
And the best algorithm that we have
is called the number field sieve.
So, there's a lot of parallels going on here.
So what does the number field sieve
for discrete log look like?
Hopefully this diagram is somewhat
familiar to you by now,
it's a multi-stage algorithm,
it has many of the same
stages as factoring,
you can sort of line them up in parallel,
the last bit looks a little bit different,
but we can sort of ignore that
for the moment.
Okay. So, we have some pictures
of what the number field sieve looks like.
How long does it take?
Answer number 1:
The same answer as factoring.
In case you didn't remember,
here it is again.
This is kind of why everybody
has been saying
"Okay, the security of, you know,
1024-bit Diffie-Hellman key exchange
is about the same as the security of
a 1024-bit RSA key."
It's because we have the same
complicated formula that tells us
how hard it is to break.
The sort of more subtle answer for...
or, not more subtle,
but the more practical answer
for, how secure is it,
is, we can say, well, how long do we think
it will take to actually compute
a discrete log for
commonly used key sizes,
and the answer is,
slightly longer than factoring an
RSA key of equivalent size,
but, not so much longer than an RSA key.
And, the minimum recommended key size
today is 2048 bits.
Okay, so, 2048-bit Diffie-Hellman,
we're good. Great! We can all go home.
Okay. However, okay,
what if you want to break many connections
that use the same public parameter p?
Do you have to go through
this whole computation,
every single, for every single connection
that you want to break?
That was kind of the justification
for perfect forward secrecy,
that every single connection
should be as hard as factoring an RSA key
of the equivalent size.
Except that's not quite the case.
Because if you look at where
the actual target that
we're trying to compute
appears in this plot,
it's only at the very last stage.
So all of this only depends
on the prime p.
So we can actually divide up
the algorithm in two pieces:
A few stages that depend only
on the prime p that we're using,
and then the final computation
that takes the output of this
first precomputation stage,
and that's the only stage
that actually matters,
that actually depends on the target
of our discrete log computation.
So, we're in trouble.
In particular, that means that
if many connections are using
this same prime p,
you can do the precomputation once,
spend a huge amount of effort,
and then the actual effort required
to break individual connections
using those primes
is much, much smaller.
So here's our current estimates,
these are actually somewhat new
from our paper,
of how long the individual log stage
takes in practice,
if you push the primer as far as you can
to make this as fast as possible.
And the answer is basically,
if you're worried about a government,
you better start worrying
for reasonable key sizes
that people are using.
See, so I'll let Alex continue
with the next part of our talk.
*applause*
So this fact that Nadia just told us
about Diffie-Hellman
and the number field sieve,
this was something that the
mathematical crypto people knew about,
but most of us who did system security,
people like me,
didn't ever get the memo.
So, it's, I'm here in part to apologise
to everyone I've taught
about Diffie-Hellman and cryptanalysis
who didn't get to hear about this,
as we were talking about
perfect forward secrecy.
Right, this fact about the cryptanalysis
and how well it can apply in exactly
the scenario that we're worried about,
this kind of situation
involving mass surveillance,
was news to many of those.
But now that we have that fact,
how can we exploit it,
to try to break Diffie-Hellman,
in scenarios that we all care about?
And we're going to talk about
two scenarios in the talk today.
The first one applies to HTTPS,
to encrypted web connections,
and it applies not only
to government agencies,
but also just to normal
everyday attackers,
with maybe the same resources
that you or I have.
Right, this is a down-to-earth
kind of attack on HTTPS,
and we call it Logjam.
Logjam allows us to break
the HTTPS connections
to many, many popular websites
in modern browsers,
by fooling those browsers into using
1990s-era backdoored crypto.
So where does this backdoored
crypto come from?
This is from the first crypto wars,
back in the 90s,
when my country, the US,
had restrictions on what kind and strength
of cryptography could be exported,
and used by people abroad.
So US companies were prohibited
from exporting products that contained
cryptography that had greater
than a certain strength.
For RSA, that was that the key size
had to be less than or equal to 512 bits,
and for Diffie-Hellman it was that
basically the prime had to be
512 bits or less.
So back in the 90s,
these were constants,
these were strengths of crypto
that were chosen presumably because
they were easy for NSA to break.
So, if you were an American company
making products, like let's say
Netscape Navigator, the web browser
that initiated the first SSL protocol,
you needed some way to be able
to communicate with,
from servers in the US,
to clients, including your own browser,
that you would ship to people abroad,
say, here in Germany.
And so they came up with a way
to use, to have HTTPS automatically select
the appropriate key strength
depending on whether the browser
was able to support
the full-strength cryptography,
or the weakened version
for deployment abroad.
And the way that they did that
was something called
export-grade cipher suites.
So when your browser,
whenever it starts a TLS connection
for an HTTPS URL,
one of the first thing that it does
is, the browser will send to the server
a list of the kinds of cryptography
that it can speak,
these are called cipher suites,
and then the server selects one of those,
that is compatible with
whatever cryptography
the server has available,
and then that negotiated cipher suite
is what's used for the connection.
Now the way that they supported
the 90s-era backdoor crypto
was by having browsers shipped abroad
from the United States that could only
speak a certain subset
of crypto algorithms,
that were limited in strength
to 512 bits or less,
those were the export-grade cipher suites
with the names you see here.
Now, even though no
browser has been shipped
that is limited to only these suites,
since probably 2000-sometime,
when the US relaxed
its export regulations,
there wasn't just any one day
when all of those old browsers
from before that era went away.
So, servers, even now, many servers
will still accept and speak
these weakened cipher suites,
if that's all the browser has available.
Like if you're running an e-commerce site,
right, I'm sure you still want to be able
to speak to any customers
who have 1998-era
Netspace Navigator involved,
otherwise you'd lose some sales, right?
So there was no reason to turn them off,
because no modern browser any more,
now that those restrictions are lifted,
would choose these weakened suites.
Well, that's what we thought, anyway.
So, in, over this past year,
there were two attacks that exploited
these weakened cipher suites,
that found ways to convince
modern browsers
to speak them instead of
full-strength cryptography.
The first was the FREAK attack,
which was revealed in early 2015
by a separate group of authors,
and the FREAK attack was
an implementation flaw
in many modern browsers.
In order to exploit it,
all you need to do is to be able
to relatively inexpensively
factor a 512-bit RSA key.
And, as Nadia has told you,
this is now a matter of maybe 4 hours,
maybe 75 dollars, something like that,
and if you did that, you'd able to break
all the connections coming into
a weak server for a long period of time,
to a server that still spoke
these cipher suites.
So this affected most modern browsers,
and just shy of 10% of Alexa
top million sites that speak HTTPS.
Now that left the Diffie-Hellman
export-grade cipher suites,
which were not affected by FREAK.
But we came up with a paper
in May of this year,
that showed a separate attack,
the Logjam attack,
which is a protocol flaw in TLS,
and affects all modern browsers,
and, similarly, allows you
to downgrade connections
to export-grade Diffie-Hellman,
and then intercept or modify the contents,
if the server speaks and still supports
these export-grade Diffie-Hellman
cipher suites.
So now let me give you
the hopefully brief technical overview
of how the Logjam attack works.
If you've been curious about this,
this is the chance to see it.
So, Logjam is a problem that happens
during the TLS connection handshake.
And the first thing that happens
in the handshake,
at the top of this diagram,
so this is just your browser connecting
to some website, some server
there on the right,
maybe Alice connecting to
her favourite website here.
So the first stage is this client hello,
where, you know, a normal client
is going to say,
I speak various kinds of cryptography,
including full-strength Diffie-Hellman.
The server at that point is going to
respond by picking some cipher suite,
let's say Diffie-Hellman,
and then sending over
its certificate chain,
as well as its Diffie-Hellman
public parameters,
p and g, the server gets to choose them,
as well as g^a.
And then it's going to use
its long-term RSA key that is the thing
that is signed in its certificate,
in order to make a signature on
those Diffie-Hellman parameters.
Okay, then it's going to do...
complete the negotiation, and so on.
In the Logjam attack,
we have a man-in-the-middle attacker,
who's able to modify some
of these messages
as they're going by.
So the first thing the attacker does,
he modifies the client hello message,
to replace all of the different
kinds of cryptography
the client says it supports,
and just put in export-grade
Diffie-Hellman.
Ah, the 90s are here again.
Alright, so then, you know, the server
will get that, and if the server supports
export-grade Diffie-Hellman,
as about 10% or so of servers
still support export grade Diffie-Hellman,
it's going to respond and say,
okay, that's what you asked for,
I'll take it,
and it's going to send over its side
of the Diffie-Hellman key exchange,
but using a 512-bit prime,
because that's what is required under
these export-grade suites.
Now, at that point, the browser would
normally reject this message,
because it didn't ask for export-grade,
it really asked for full-strength,
so instead, the man in the middle has to
modify the server's hello message,
and say that this is full-strength
Diffie-Hellman,
well, if it's full-strength,
why is it only a 512-bit prime
that's being used?
Well, there's actually no limitation,
and no distinction between the messages
that the server would send
in that space with p and g,
that says normal-grade Diffie-Hellman
has to be more than 512 bits.
In fact we found a handful of real sites
that even for normal-strength
Diffie-Hellman
just happened to use 512-bit
or even weaker cryptography.
So, as long as we modify
this earlier message,
the server's hello message,
and make it say, "normal-strength
Diffie-Hellman",
there's no way for the client to tell
from just the structure of the protocol,
that anything is amiss.
So, at this point, there is one last place
that we could catch the problem,
that the client or the server could see
that something's wrong,
which is that each side sends
the other a finished message,
here at the end,
that says, basically, has, uses
the session secrets
to add an authentication code
to a dialogue of all of the
protocol messages
that match the handshake dialogue,
all the messages going back
in each direction so far
have to be the same from each side of you.
However, in our case, in Logjam,
the attacker is able to spoof
these messages,
to make them look correct to each side.
He's able to modify that dialogue why?
Well, because we're using this
512-bit Diffie-Hellman
that we know from using
the number field sieve,
we are able to efficiently break.
So, if the attacker is able to quickly
perform the discrete log
on the Diffie-Hellman key exchange
that's going by at 512-bit strength,
then he can fix up the client
and server hello messages,
and neither side will notice
that anything went wrong.
So that's Logjam in a nutshell.
I'm sorry, it's a little bit complicated.
So, how widely shared are
these Diffie-Hellman public parameters?
Well, we used Internet-wide
scanning to find out.
So, my group, we also made
something called zmap,
that I talked about here
a couple of years ago,
which lets us quickly probe
everything on the Internet,
and so we did this and tried to make
connections to each HTTPS server
in the public IPv4 address space,
and found out what key exchange methods
were supported and with what primes.
And what we find is that 97% of hosts
that support export-grade Diffie-Hellman
use one of only 3 512-bit primes.
They're that widely-shared.
Why is this? Because those parameters
are very often either hard-coded
or encoded in standards
that different people implement.
The most common of these,
used by 80% of hosts that support
export-grade Diffie-Hellman,
is a public parameter that's
hardcoded into Apache 2.2.
So, it's actually there,
embedded in the source,
you have to recompile Apache to change it.
13% of hosts supported something,
a second prime that has also 512 bits,
that's hardcoded in mod_ssl,
and the next most popular, 4%,
was in the Sun JDK.
Only 10 primes accounted for 99%
of all the hosts we found in
the public address space
that supported export-grade
Diffie-Hellman.
So, if we would like to compromise these,
well, Nadia just told you about
how long it takes to use
the number field sieve
to break 512-bit discrete log,
well, we actually went and did
the precomputation
for all 3 of these most widely used
Diffie-Hellman primes,
and our colleagues who make a tool
called CADO-NFS
where able to implement the code
for that piece of the discrete log version
of the number field sieve
and they ran the algorithm on these primes
on a cluster they just happened
to have lying around,
it took about a week of time
on the cluster
for each of these primes.
After which, using an optimised version
of the last portion of
the number field sieve,
it takes about 70 seconds for us to break
any individual connection
that uses any one of these
3 most popular primes.
So, Logjam and our precomputations
now allow us to break any connection
to about 8% of the top million
HTTPS sites from Alexa
and when we came up with this attack,
it worked in all modern browsers.
So, mitigation!
*applause*
This is bad, everyone, this is the crypto
all of us are using.
So we do have some mitigations.
This is the actual positive part,
is that the browser makers have now
started to increase the minimum strength
of Diffie-Hellman they will accept.
So IE, Chrome, and Firefox will reject
primes less than 1024 bits
and Safari less than 768.
And the new draft of TLS 1.3 is including
an anti-downgrade flag
that will make it even harder
for such attacks to take place
in the future.
Now back to Nadia.
NH: So we promised in our abstract...
*applause*
...that there would be a hands-on
portion of this talk.
So, you have a couple options,
number 1 is, if you're really into this,
you can do and download
CADO-NFS yourselves,
cado-nfs.gforge.inria.fr
and, you know, run
discrete log algorithms yourselves
for any prime you wish
and then you can compute
arbitrary discrete logs.
However, since we have already done
some of the computations,
we figured that we would make
them available for you guys
if you wanted to play with them.
So...
*applause*
We have done so through the Twitter API,
so we have a bot running on Twitter
and if you would like to compute
discrete logs for any of these
widely-used parameters,
this bot will do so for you.
So here is the group generator
and the primes in hexadecimal,
for the 3 groups that we
did the precomputation for.
And if you wanted to test out,
you would do something like this,
so this using Sage,
which is a Python-based open source
mathematics package,
that does a lot of algebra
and number theory,
if you like playing with the stuff,
sage is super cool,
so, I said, say, my prime m
is this last value in hex there,
the mod_ssl prime,
then I take 2 and raise it to
the 0x1337 power mod m,
and then I print it out in hexadecimal,
and I get this value, then I can
copy-paste it into a tweet @DLogBot
then some comp stuff happens
on our back end,
this is running on one of
the machines in my lab,
so please don't break it,
and after a minute or two,
you should get back an answer.
*applause*
So, there's a queue,
only one thing can run at a time,
median time is 70 seconds,
it can vary between
30 seconds and 3 minutes,
so, you know, if it doesn't respond to you
within like, you know, an hour
or something,
then send us a ping and we'll see
if it's still running.
Okay. So, have fun.
Please don't actually use this for malice.
*applause*
We already have some satisfied customers.
*laughter*
AH: Alright, so we promised there were
two exploits that have to do with
weakened Diffie-Hellman,
and the first, Logjam, right, anyone can
use backdoors from the 90s
to pwn modern browsers,
well, the second one is
a little bit more widespread.
So, we're going to talk about
how Diffie-Hellman weaknesses
can be used for mass surveillance.
We believe that governments can probably
already right now, exploit
1024-bit discrete log
to break Diffie-Hellman for
wide-scale passive decryption
of Internet communications.
So, is breaking 1024-bit Diffie-Hellman
within the reach of governments,
let's look back at these numbers quickly.
So we can see that for 512-bit RSA
and Diffie-Hellman,
they're both really in reach of
basically any effort right now,
any one of you can probably,
most of the resources to do this.
For 768-bit RSA or Diffie-Hellman,
well, we think this is now in the reach
of a concerted academic effort.
For 1024, it's a little bit
more complicated,
because the number field sieve algorithm
is complicated enough that even
making estimates of the runtime
at this size and larger
is very, very complicated
and having a high-confidence estimate
is difficult.
But we've tried to do the math
conservatively,
and we believe that
a conservative estimate,
at least for 1024-bit Diffie-Hellman
is to break, to do those precomputations
for a single prime p,
would take about 45 million core-years.
Now 45 million core-years
sounds like a hell of a lot.
But, when you start to think about it,
if you're going to do
an effort that large,
there are some optimisations
you could start doing,
and, for instance, maybe instead of
running this on general-purpose PCs,
like these estimates show,
if you're going to do
an effort on this scale,
maybe you're going to tape out some chips,
maybe you're going to use custom hardware.
And if we do the math and look at
what kind of gains
we can get from custom hardware
in other applications that
are similar to this,
we estimate that we can get
maybe a speedup of 80 times
just by doing it in custom hardware.
Okay, and then we ask what's
that's going to cost,
well, we estimate that for...
to build a machine that could break
one 1024-bit p, precompute for
one 1024-bit p every year,
would cost somewhere in the neighbourhood
of low hundreds of millions of dollars,
in a one-time investment.
As a result of this, you can churn out
precomputations once a year
that will let you break efficiently
every connection that uses that p.
Now, individual logs then are going to be
close to real-time, and in fact you can
re-use much of the same hardware
to do the computations for
individual logs very quickly.
So, um, oh shit.
This is what the estimates look like.
Now is NSA actually doing this?
NH: This is where we get into
the conspiracy theories.
*applause*
So, there have been rumours flying around
for many years, I mean
for decades, really,
but sort of credible rumours
for many years,
of some large cryptanalytic breakthrough
that the NSA made.
So here's an article from James Bamford,
one of the, you know, world experts
in open ???
of what the NSA's activities are
and he wrote an article in 2012
saying very clearly that he had talked
to multiple government officials
who said that the NSA made
some enormous breakthrough
several years ago.
Everybody's a target,
everybody with communication is a target,
and this computing breakthrough
is going to give them the ability
to crack current public encryption.
And it was so secret that no oversight,
anybody had sort of access
to the details of it.
But whatever it was,
it was major and massive.
Of course, you know, after we saw this,
we said, oh my god, you know,
what could it possibly be,
are they breaking RSA?
Bamford actually goes on in this article
to speculate that it's
something about AES,
which at least to my mind
seems less likely
than some kind of major
public key breakthrough.
So clearly we have sort of these rumours
of large breakthroughs by the NSA's
tens of thousands of mathematicians.
Simultaneously, we can say, you know,
we know the NSA is clearly
interested in cryptanalysis,
is cryptanalysis on the scale
of hundreds of millions of dollars
within their reach?
The answer, thanks to Snowden, is yes.
We have some of their budgets
and they spend billions of dollars a year
on computer network operations,
they spend hundred of millions of dollars
on cryptanalytic IT systems,
cybercryptanalysis,
exploitation solutions,
in fact, a couple years ago there was even
an increase of hundreds of millions of
dollars in their budget for cryptanalysis.
Interesting.
So, a hundred million dollars of
special-purpose hardware
is certainly within range
of a government the size of ours.
Additionally, we can ask,
what would the impact of doing one of
these single precomputations
for a discrete log
for a single prime would be,
and the answer is actually
surprisingly large.
So if you did this precomputation
for a single 1024-bit prime,
that would allow passive decryption
of connections to 66% of VPN servers
and 26% of SSH servers.
This is from Internet-wide scanning,
we connected to all of these
and we said "we would like to speak
Diffie-Hellman with you,
what parameters do you prefer?"
and these are the servers that preferred
a single 1024-bit prime over
every other parameter in key size.
A second 1024-bit prime would allow
passive decryption for 18%
of the top million HTTPS domains.
These are domains that prefer
to speak Diffie-Hellman
with this fixed prime.
And, the final piece of evidence
for something like this being within range
and at least being worth worrying about
is actually some of the slides
that were release last year,
by der Spiegel,
and in particular they have
a large amount of detail
about passive decryptions of VPN traffic.
So here's an example,
it is clear from the slides that
whatever the NSA is doing,
they have the ability to passively decrypt
VPN connections on a large scale.
And they're very happy about it.
I think this is my favourite
Snowden slide ever.
*laughter*
I feel this way when I decrypt things too.
*laughter*
So, if we take a look at what,
and these slides are specifically
talking about IPsec VPNs,
if we take a look at what the
key exchange looks like for IPsec VPNs,
what happens is, we have two hosts
who want to make a VPN
connection with each other,
the key exchange actually uses a
fixed set of parameters
from a small list of possibilities,
and so Alice and Bob will negotiate
which parameters they're going
to use from this list,
and then they will do a
Diffie-Hellman key exchange,
from that they will have
a shared secret, g^ab,
and then they, in the most
commonly used mode,
they also have some pre-shared key,
like a password that has been shared
over some other channel.
And that Diffie-Hellman secret
that was negotiated together
with the pre-shared key
or mixed together to generate
the session key.
So, if somebody wanted to
break a connection of this type,
one option would be to, say,
steal the pre-shared key
through some other mechanism
and then break Diffie-Hellman.
That would be a possibility.
So, if we look what the
NSA's requirements are
for their mass-scale decryption efforts,
they require finding out what
the pre-shared key is,
getting both sides of the connection,
getting both the asymmetric key exchange
and the symmetrically encrypted data,
and then having some metadata.
These are the requirements for them
to get decryption.
And we can also take a closer look
at what their decryption flow
actually looks like,
this is somewhat complicated,
but in this diagram,
so they're getting the IK exchange,
and the symmetric data,
they're sending it into
one system that they have,
they're sending the IKE messages through
out to some high-performance
computing resources,
and then they get sent back with
some data from stored
databases of information
that returns the actual decrypted data.
So that's what the decryption
flow looks like.
We don't have any details
of the cryptanalysis,
but we have details from
the sysadmin's perspective
of how the systems
that do the cryptanalysis
are hooked together.
And they're doing something
that requires high-performance computing,
that takes in key exchanges
and hands out decrypted data.
So, we can line up sort of the NSA's
on-demand IKE decryption
with what a discrete log decryption
would actually look like,
and they're very close,
so they would both require
the pre-shared key,
both sides of the handshake,
both the handshake and the symmetric data,
and they would send off the data
to high-performance computing.
So in the same set of slides,
they also discuss targeted implants
against particular implementations,
if you were going to design a
backdoor to make your life easy,
you would have fewer
requirements than this.
Potentially.
There are many kinds of backdoors
that you could design,
but if you were being clever about it,
you might try to make it
a little bit easier on yourself
to decrypt the mess.
So I will let Alex finish with this.
*applause*
So to wrap up,
what we've seen today
through the cryptanalysis
of Diffie-Hellman
is probably a mass surveillance dream.
The algorithms that we've talked about
would let a government with
sufficient resources
to invest in these precomputation attacks
break connections on an almost
unheard-of scale,
across almost every widely-used
crypto protocol on the Internet.
Here are some numbers again,
for HTTPS, the top million sites,
we're looking at a device like
the ones we hypothesised
breaking connections to maybe
56% of them passively.
For IKE, for Internet key
exchange v1 and v2,
we're looking at in the 60%s of servers
are potentially compromisable
using this same hardware.
For SSH, for IMAP with secure encrypted
connections, for SMTP with STARTTLS,
the encrypted mail transports,
all of these protocols are
potentially jeopardised
by the same kind of attack,
because everyone fundamentally,
so many people fundamentally
rely on the same underlying cryptography,
often with the very same public parameters
that are so widely shared.
So what can we do about this?
So first, let's go back to the
Logjam attack again,
using 90s-era backdoored crypto
that lets any of us break connections
to modern browsers.
Luckily, browsers have already started
to mitigate this, as I said,
by increasing the minimum strength
of Diffie-Hellman they support,
although there's still a way to go there,
since they're all still accepting
1024-bit key exchange.
Our biggest recommendation
under here though,
I think the lesson is:
don't backdoor crypto!
Right, because the backdoored crypto
of 20 years ago is now coming back
to bite everyone.
*applause*
And then, we have the second attack,
the 1024-bit case that enables
so much mass surveillance.
Well, to get around this,
we're going to have to do some upgrades.
Probably the easiest thing to do,
and the thing that almost
every cryptographer that we talked to
recommends now,
is to move to elliptic-curve crypto.
Yes, there's been talk
about whether the specific NIST curves
may have been backdoored by NSA,
but by and large, we think that
elliptic curve is the most sound choice
we have for now.
Now if elliptic curve isn't an option,
and there's technical reasons
why it might not be,
at the very least use
a Diffie-Hellman prime
that's 2048 bits or longer.
If even that isn't an option,
you're using legacy systems
for some reason,
well, or Java yes, thanks,
if there's anyone there who works for Sun,
please, please tell them
to fix the crypto in Java!
*applause*
But if that's not an option,
if that's not an option,
the fallback is you can generate,
at least generate your own 1024-bit prime.
Mind you, there various tricks
that you have to make sure you do
when generating a prime,
it must be a safe prime,
but there are implementations
of doing this,
so it's not exactly free to generate
your own 1024-bit prime,
but it's inexpensive,
and if you have no other option,
at least so that this large
government adversary
has to spend a lot of precomputation,
a year perhaps, targeting
you individually,
and they can't just get this for free.
Alright, so, that is our talk for tonight,
we're saving a lot of time for questions,
thank you all very very much.
*applause*
Herald: Nadia. Nadia and Alex,
thank you very much.
We installed some microphones
here in the room,
so please queue up, but first,
signal angel, do we have
some questions from the net?
Signal Angel: Yes, we have a lot of questions.
First question is,
do you think it's possible that the NSA
uses quantum Shor factorisation
for 1024 or bigger keys already?
NH: I would believe it is much more likely
that they're using classical cryptanalysis
for 1024-bit keys than than they have
a quantum computer that
nobody has heard about.
Herald: And another one?
Signal Angel: Another one... Is it thinkable
that the NSA solved the P=NP problem
but keeps quiet?
*laughter*
AH: Probably not, but if they have,
yes, I think they'd want to
keep quiet about it.
NH: I hope they would tell us!
AH: I hope they would tell us too,
but I doubt it.
Herald: Okay, and over to
number 1, please.
Q: Two questions.
First, is it reasonable to think that,
is it possible they are attacking
individual RSA keys,
that they can fetch individual RSA keys
in about a week with custom hardware,
and two, NSA Suite B came out 2005
and they don't use Diffie-Hellman,
so NSA themselves, they told us in 2005,
"we won't use Diffie-Hellman",
so is it reasonable that,
when they changed the requirement
for top secret, we should follow?
AH: Well, to the first part
of your question,
about whether they're factoring RSA,
I think the answer for 1024,
is very likely, yes they are,
for high-value targets.
So if you're a major website at least
and you're using a 1024-bit RSA key,
well, it's long past time to change
to a higher strength.
NH: If the NSA has not factored
a 1024-bit key,
I'm going to be very disappointed,
I'm going to ask where
my tax dollars are going.
*laughter, applause*
And also I think actually,
the point of sort of watching
what the defensive side of the NSA
is advocating in terms of recommendations
is actually a wise thing to do,
because as far as we know,
at least the public recommendations
defensively should... I mean,
making recommendations for people
who are building systems that are
going to be handling classified data,
so they should be solid recommendations
as far as we know.
AH: What the NSA has told me
about those recommendations, by the way,
is that as long as you
follow them exactly,
you're going to be okay,
but if you deviate in any
small way whatsoever,
then they make no guarantees whatsoever.
So, think about what that might mean
in terms of your implementation
the next time you read through
those particular recommendations
that they make.
Herald: Okay. Then we hop over to
microphone 3, please.
Q: So, for the moment, is
elliptic-curve-based
Diffie-Hellman secure?
NH: I hope so.
AH: It doesn't suffer from
the same shape of attack
that we've described here.
As far as we know, there's not a way
to do this same kind of precomputation
for elliptic-curve Diffie-Hellman.
NH: So what we didn't mention in the talk
is, so, one of the reasons that
elliptic curve keys are so much shorter
than, say, finite-field
Diffie-Hellman or RSA
is because we have this
superpowerful index calculus
number field sieve-type algorithms
for factoring and for discrete log
over finite fields,
and those don't seem,
we don't actually have equivalents
of those algorithms for
properly generated elliptic curves.
So, that's why those key sizes are shorter
and that's why we think
they seem to be more secure.
Herald: Then we take another one
from microphone 3, please.
Q: Yes, you said that when doing
the precomputations
for commonly-used primes,
you can reduce the effort you have to put
in a single connection
to about 70 seconds.
How is that usable?
If my TLS handshake is delayed 70 seconds,
I already ran away.
AH: Ah! So we refer you to the paper
for the full answer to that,
but it turns out there's a bunch of tricks
that you can do to keep
a session handshake open
for at least 70 seconds.
So, this may not be what you want to do
to the connection, say, in a web browser
that's loading index.html,
but whichever one is loading, say,
the, I don't know, the 1-pixel
tracking image in the background,
that nobody sees,
which is also getting the same
session cookie,
that one you can hold open
for 70 seconds
without the user noticing.
So what we've been able to do
is show a variety of ways
that we can trick
browsers and other implementations
into holding the connection
open long enough.
Also, 70 seconds is just
what we were able to do
with a few weeks of hacking
around and optimisation,
we think that with
not that much more effort
we could get that number
down quite a bit more.
But 70 seconds we think
already is not so bad,
and there's plenty of ways
that we can exploit it.
NH: Proof of concept.
Herald: Okay. Do we have
something from the net?
Signal Angel: How long do you estimate the security
of RSA-DHE to sustain,
and do you have any idea if and when
there's any quantum encryption algorithms
that will soon be available to be used
by a broad public?
AH: Oh, quantum encryption algorithms.
NH: You should watch Dan
and Tanja's talk from yesterday.
AH: Yeah, last night was the time
to hear about that.
NH: The dangers of quantum cryptography.
I mean, the short answer is
that people who know
what they're talking about
have said we should start worrying now
because we may see quantum computers
within the next 15 years, maybe.
But it's really hard to speculate about
advances in physics that
may be pretty far off.
Herald: Do we have another one?
Signal angel: Sure. What's your
opinion on the NIST curves,
especially with the current rumours
about the curve parameters
having a backdoor?
NH: There are no known ways
that the curves could have been backdoored
with the given parameters.
AH: But if you don't trust them,
you know Dan Bernstein
has a curve you can use too.
NH: So...
*applause*
NH: Do you trust Dan,
or do you trust the NSA?
*laughter*
Herald: Over to 2, please.
Q: Some of the little bit
that you recommend,
you say Diffie-Hellman is worse
than RSA now,
so, does it mean I should switch back
to RSA, preferring it instead
of Diffie-Hellman?
AH: With equivalent key sizes,
equivalent sizes of your primes,
or your RSA modulus,
yes, we are saying that.
That in the 1024-bit case,
there's strong reasons that you should
distrust the very common repeated primes
for Diffie-Hellman.
But that's not the whole story.
Right, so for longer sizes of modulus,
larger strengths of crypto,
RSA is probably still okay.
But I think either way,
switching to elliptic curve
for key exchange
is probably the thing to do right now.
NH: I think the precise statement
that we can make
is, if you're comparing 1024-bit
Diffie-Hellman
to a 1024-bit RSA key,
that if you're using Diffie-Hellman
with the most commonly used parameters,
say, the Oakley group 2
that everybody on the Internet is using,
and you think it is likely that
a large government agency
has already done the
precomputation for that prime,
then breaking an individual
connection using that prime
with Diffie-Hellman key exchange
would be much, much, much less effort
than factoring a freshly generated
1024-bit RSA key that is unique to you.
Even if that 1024-bit RSA factorisation
is within range of the NSA,
it may not be worth their while
to actually factor your key.
Whereas breaking a
Diffie-Hellman key exchange,
they've already done the hard work
to break everybody on the Internet,
so, you're just one more fish.
That's the precise statement
that we can make about the security.
The real answer: use elliptic curves,
or, to use 2048-bit
Diffie-Hellman: probably fine.
Herald: And, over to number 1, please.
Q: How realistic is it to use, or to create
a new prime for every exchange
or at least every few exchanges?
NH: So, unfortunately, the properties
that you need for discrete log to be hard,
you need to have a safe prime
and you would hopefully like it
not to be backdoored,
generating safe primes is
still kind of effortful
on modern hardware,
I mean if you try to do it on your laptop
it will probably take like, I don't know,
a minute or something.
So, it's actually a lot of effort
to generate a new safe prime all the time.
Just use a larger safe prime
and you'll be better.
Herald: So we're running out of time,
but let's... with number 2.
Q: You said that elliptic
curve cryptography
is not susceptible to
this precomputation attack,
is that luck, or is it
engineered to be that way?
*AH laughs*
NH: ...luck?
AH: In part!
NH: I mean, a combination of both, but
so as far as we know, I mean, you can't do
precomputation with elliptic curves,
so, you know, sort of generically,
the best thing that you can say
is you can do a lot of precomputation
but you still have to do a lot of effort
for each individual value,
so you could do, you know, generically
if you want to break an elliptic curve
you could do like,
a square-root-of-n attack
against the key size,
you could do, say, n^2/3 precomputation
and then you would have n^1/3 online work
if that makes sense to you.
But you get less effort as far as we know.
Less benefit.
Herald: Sorry. We're going to finalise
then, with number 4.
Q: What do you think about blacklisting
these common primes,
just in the modern browsers?
Will this get rid of this issue?
AH: Just blacklisting the common primes,
well, if you blacklist the common primes,
if you blacklisted the common primes
when we first came up with this,
you'd immediately break
about 10% of websites
because there's not a good
fallback mechanism
if you don't like the prime you got
during key negotiation.
What the browsers are more likely to do
is to phase out this kind of
finite-field Diffie-Hellman entirely,
over the next larger number of years.
So first they're going to
start rejecting things
that use unusually weak primes,
that's what they're doing already today,
but I think in the long term
they're going to encourage the use
of elliptic curves as an alternative,
if you want forward secrecy,
elliptic curves will be the way to get it.
Herald: Nadia, Alex, once again,
thank you so much.
AH: Thank you.
*applause*
*postroll music*
subtitles created by c3subtitles.de
in the year 2016. Join, and help us!