34c3 preroll music
Herald Angel: Today we have our speaker
Filippo Valsorda. He is a crypto-engineer
and he is specialized in Go. He is some
kind of Go wizard so to say and used to
work at CloudFlare. He actually shows now
today in the talk how he made an attack to
exploit a bug in the implementation of the
elliptic curve P256 in Go, that's why he's
a Go wizard. The reason is actually due to
a misplaced bit on the assembler level,
just due to the curves implementation. The
result is that in certain cases you can
retrieve the private key in the elliptic
curve Diffie-Hellman encryption scheme.
Please welcome Filippo Valsorda to his
talk, give him a round of applause. Thank
you very much.
Filippo Valsorda: Thank you.
applause
Thanks. I love the term crypto-golfer. Tony
came up with it and I just want business
cards with that on it. Anyway, okay, I'm
Filippo. As you heard I worked on Internet
infrastructure, I work on Go, I work on
cryptography. But this is a collaboration
with Sean Devlin, you might know him as
one of the authors of the Cryptopals or
the matasano crypto challenges. A few
months ago earlier this year a bug.. a
CloudFlare engineer was scanning CT logs.
CT logs are big logs of TLS certificates
and he was checking the ECDSA signatures
on these certificates and one of the
signatures was not verifying. Which is
weird, because CT logs check the
certificates before adding them to the
log. It turned out that the bug was in the
code that CloudFlare was using to verify
the certificates. And more specifically it
was in the Go standard library. It was in the
implementation of the NIST P256 curve,
which is a popular, very hard to implement,
elliptic curve that is used for example
for ECDSA TLS certificates. This curve has
an assembly implementation in the Go
standard library, of course, to squeeze
every bit of performance out of it, and
it's specifically optimized for x86-64
architecture. So that's where the bug was.
There was a carry propagation bug. It was
reported upstream and everyone agreed that
this was not obviously exploitable.
But Adam Langley also said that it would
be a cool paper though. And well I mean,
how do you pass on that? So Sean and I met
in Paris and spend a weekend and then some
eating Thai food and staring at this
assembly code to try to understand what is
it doing. And one month later we have a
CVE and two Go security releases. We found
a way to go from this single carry bit bug
to a full key recovery against protocols
that use elliptic curve Diffie-Hellman
within ephemeral static way. If this means
nothing to you, it's okay, I will try to
go over it. One of these protocols for
example is JWT. Jozy, or however you want
to call it - it has 15 different acronyms.
And it's often implemented in Go, so this
was exploitable against real-world
services. So, let's start by looking at
the code with the bug. Let's take it from
the beginning. This is the short assembly
function that does subtraction. When you
do elliptic curve math, you usually work
on a field, you work on math modulo some
prime p. So if it was subtraction you do a
minus b modulo p and this is what this
assembly snippet does. It sets a to a
minus b. Of course these are numbers way
too big to fit into a single register. So
how do you do math, when you can't fit
into a single register? You do multi-
precision math. And the thing is, you all
know how to do multi-precision math, you
learned that in elementary school. When
you would write numbers in columns and you
do math with register size of 10, because
for every digit you would subtract two
digits and carry the minus one if it went
negative and then subtract with the carry
and then carry the minus one. That's
exactly what this is doing, but it's doing
it instead with digits with 64-bit registers,
because this is a 64-bit architecture
. So we look at the first
lines: the first lines is nothing else
than subtract, subtract, subtract with
carry. And then the carry is finally
stored in that mul0 accumulator. But then
what do we do if it goes negative?
We said that this is modulo p so we can't
just let it wrap around modulo 2^256,
because that's how wide, you know, 4
registers are. But since we're doing
arithmetic modulo a number, we can just
add that number and the result is the
same, right? Adding p modulo p is a no-op
in the result. So that's what this code
does: It does a equal a minus b, it takes
a copy of the result and it adds p. And
then in constant time, it uses that final
carry to check, if it went negative or not
to decide in constant time, which one to
use. The one with b , so a minus b plus p
or a minus B - and that's subtraction.
Straightforward enough. Now the problem
with this code is that if you look closely
you can see something that might be weird
if you're not familiar with assembly. It
still trips me over. To use a condition
like these constant time conditionals
there. Which have to be constant time,
because you don't want to leak timings
based on the size of number. You have to
first operate on mul0, so that you set the
flags, the zero flag. So normally what you
do is either add 0 and 1 to your mul0
to check if to set the flags. But that's
not that's not an add, that's an add with
carry. It means that it's adding zero to
mul0 and the carry bit from this addition
here, which has nothing to do with it,
it's not supposed to be there. Like this
is adding p, but it doesn't matter, if it
overflows, because if it does, it wasn't
going to be picked here anyway. So it's
adding another bit into the operation that
wasn't supposed to be there. So it's
flipping the result, but then also the
conditions here are flipped, so
essentially it evens itself out. Except:
when that carry bit happens not to be set,
because the number a minus b is small
enough, that if you add P, you don't wrap
around. That happens once every 2^32
times, which is why it's so rare that
nobody had noticed so far.
So the code went in and nobody noticed for
a long time, until CloudFlare first
started scanning CT logs and had this
weird one signature that wasn't verifying
and they were lucky enough to actually
have it in the logs because you know if it
happens transiently you might just think..
I don't know, the connection broke, right?
So this is what we call a carry
propagation bug. Carry propagation is the
activity of making sure that you carry the
1. And here it's a bit weird, we didn't
forget to carry it, but we carried a bit
that we weren't supposed to carry into a
result. Most of the times that goes well,
sometimes that breaks. When that breaks:
wrong result! Wrong result, wrong point
computation and wrong point computation,
so what? Like how does forgetting to carry
the one lead to a full key recovery? This
is not one of those bugs, where like
buffer overflow you know what you're
trying to do even if you might have to
jump through so many hoops, because there
might be all these protections, you know
what your capabilities are, you can
overwrite some memory. Here instead it's
not clear what your capability is. So
today I'm going to try to explain that.
How we turn this very rare failed
computation into a complete key recovery
attack. But first I apologize that I have
to give a elliptic curve cryptography
crash course here at CCC. So we've seen
the field is nothing else than operations
modulo p, then there are the points: the
points are x and y, the coordinates. They
fit an equation, which we don't care
about, they just fit an equation. They are
integers, so we can work with them, and we
can use them to build a group. A group is
one of the core structures in modern
cryptography. To build a group we need a
zero point, a generator point, and an
addition over the group, over the
points. So we define an addition,
again, we don't care about how addition
works. It's just: you take two points, you
add them together, you get a result.
It has all the properties that you
remember from elementary school addition:
it's commutative, it's associative. And
then you have multiplication: You don't
actually define how to multiply a point,
but if I tell you that you have an
addition operation and I want five times
this point, what do you do? You take the
point and you add the point and you add
the point and you add the point,... so
this is called scalar multiplication:
doing multiplication by adding repeatedly
a certain point.
So now we have a group, which is given by
multiplying the point, a generator point, a
certain number of times and we have a
multiplication operation. So how do we
build cryptography out of it? This is all
awfully abstract so far? So we build
elliptic curve Diffie-Hellman. If you're
familiar with normal Diffie-Hellman, you
will see something snap at some point. The
private key is a giant number, this is
important. The private key in ECDH is just
a random giant 256 bit number. And then we
have a public key. A public key is that
giant number multiplied, the scalar
multiplication we just talked about, by a
generator point. If you know normal
Diffie-Hellman, this is like doing g to
the a. If you don't, it's okay, this is
just multiplying private key by a point.
And then, when I have my private key and
my public key, you send me your public
key. We need to agree on a shared secret,
how we do that, is that I take your public
key, which is a point. I take this and I
multiply it by my private key, here, so
again: it's my giant number private key
multiplied by your point. That comes
together: The two results are the same,
because if we do my private key times your
private key times g it's the same as your
private key times my private key times g,
because that's commutative. And we land on
a shared secret. And that's all we need to
know about elliptic curve cartography to
exploit this. However, there's one thing
that I glossed over: It's easy to multiply
by five, you add five times.
But if I'm asking you to multiply by a 256
bit number, you can't add 2 to the 256
times a point, right? So what do we do
there? Remember that here, what we're
trying to do is the multiplication: The
private key times the public key, the
point. We do something called double and
add: We take our private key and we string
it out like bits. We start from the most
significant bit. This is little endian,
because I have gotten this slide wrong the
first time. But, you know, you just claim
that you meant it to be the opposite
endianness. Anyway, that's the most
significant bit, the one that is two to
the 256, no, two to the 255. If you flip
it, you're adding or removing two to the
255. So you start with 0, that's zed, 0,
nothing, and you check the first bit, the
most important bit: Is that set, yes or
no? Yes, so you add the point Q, which is
the point we're trying to multiply by this
giant e. So we add the point and then we
move down one by doubling. Can you imagine
how we double something? Remember we only
have addition. We add it to itself, of
course. So we use addition to double the
point. And you might see, where we're
going with this. We double every time we
slide down one bit.
By the time we arrive at the end, how many
times did we double that first point that
we added, because the first bit was one?
255 times! That bit was worth 2 to the
255, so at the end that will have the
value it was supposed to have. And we keep
going, we check if this bit is 1: Is it 1?
No. So we do nothing, we double again to
move down one. We check if this bit is
one. This bit is 1: so we add the point.
So we did add the point, double, double,
add the point, double, moving down one and
we keep going like this. We keep going
like this, until we reach the least
significant bit. At the least significant
bit, if it's one, we add the point, if
it's not, we don't add the point. And at
the end we have the correct result and the
result comes from this sequence of
operations which at most are twice 255
operations, which is something that we can
do concretely. Now why did I explain this
very specific algorithm to you. Because to
understand this attack, you have to
recognize that each key, so each string of
bits here, converts into a very specific
sequence of operations. Okay, if you
change one bit, there will be an one more
addition one less addition and each key
has a very specific sequence. In this case
it's add, double, double, add, double,
add, double, double, and it keeps going.
So back to our bug. If you spaced out,
because we took a lot of crypto, I saw
you! But the two things you should take
away are: There's a giant number, it's the
private key, we want to multiply the giant
number by a point and we do that by doing
additions and doubles in an order that is
specified by the bits of the giant number.
That's what you need to have clear, the
only thing. So let's go back to see how we
use that to turn our very small carry bug
into a complete key recovery attack. First
thing we do is we bubble it up. That
function that breaks is called P256
subinternal. That's the function I showed
you earlier. P256 subinternal is used by P256
point add, which is what we spoke about
adding two points, the only important
operation. And adding two points, we've
seen, we use when we're multiplying, when
we're doing that scalar multiplication,
which is d times q, d times the point. And
how is scalar mult used? Here we finally
surfaced to a level that if you work with
cryptographic protocols you might start
recognizing pieces of. Scalar
multiplication is the operation that the
peer does in elliptic curve Diffie-
Hellman. There's a scalar, which is the
secret, which is the private key. There's
a point, which is the public key of the
other person, which might be the attacker.
So the scalar multiplication here,
speaking in InfoSec terms, has an attacker
supplied point and a secret scalar and the
result, the shared secret, is the session
key. For example: TLS, when you open a
connection with TLS and you're using
elliptic curve Diffie-Hellman, then you
will do this dance to agree on a session
key. If the session key is correct, the
connection will open and you will be able
to, I don't know, get send HTTP request.
If the bug is hit and the result is wrong,
so the result bubbles up into a wrong
shared secret, the session key is wrong.
And the session key is wrong you notice,
how do you notice? The connection breaks.
So this is what in cryptography we call an
oracle.
You have an oracle that you can call and
send a point, because that's your public
key, you're the attacker, and you send
that point and it gets multiplied by the
private key. And it gives you one bit of
information, did the bug trigger or did it
not? Most of the times it won't, because
remember, this is an extremely rare bug.
So you have an oracle that tells you: Bug
happen, bug didn't happen, based on the
point you sent. And let's assume that the
key stays the same, we'll talk about that.
Can you imagine how we use that to start
learning things about the key? Well, let's
say, that we can magically conjure a point
that in that sequence of operation, that's
why I told you the sequence of operation
was important, the bug happens very
specifically at that addition and we find
another point where the bug happens very
specifically at that double. If we know
already these bits of the key, and we
aren't sure about this one, what can we do
with these two points? We send them both,
one of them will break the TLS connection,
the other one will succeed. That means
that the one that broke triggered the bug,
the one that didn't break, didn't trigger
the bug. And we know which one corresponds
to which key. Because we magically made
special points that only break very
precisely at that point of the
computation. Okay, so we learned a bit of
the key. In cryptography as soon as you
learn one bit of the key, there's probably
a way all the way down. So we build what's
called an adaptive attack. Let's say we
have these bits, but we want to learn
these, too. We compute two points, one
that breaks, when the addition happens
exactly at that point in the double and
add a procedure, and one that triggers
only when the add doesn't happen at the
specific point in the double and add
sequence. We send them both, only one of
them breaks the TLS connection, well, then
we know a bit! We go back to our magic
point generator and we generate two new
points.
This time we don't look for things that
break here, we look for things that break
here. We make two points, we send them
both. One of them breaks the connection,
the other doesn't break the connection. We
learned one more bit of the key. We go
back, we make two points, we send them
both: One breaks the connection, one
doesn't. We keep going like that, once for
each bit. Every time we go back and we
adapt to what we learned so far. That's
why it's called an adaptive attack, we
can't pre-compute all these points. We
have to come up with them while we learn
the key. And the beautiful thing about
adaptive attacks is that they look exactly
like Hollywood, it's beautiful! Because
you see them flipping and going through
values, getting it right and moving to the
next one, which you all told it was fake,
it was not! Everything else is fake. Okay,
so, this attack we came up with that, we
told we had something extremely novel, we
went to the literature and everyone that
had picked an academic career in the
audience knows exactly what happened: We
found a paper that was doing exactly this.
However, you know, it was a little
different. It was still P256 and it was
still ECDH and, hmm.. okay it's really
similar. But it's an attack that depends a
lot on the implementation details, how you
pull it off. You can't suddenly just
repurpose the code, but the idea so far:
an adaptive attack where you send two
points and you check which one breaks is
the same as a BBB paper from a few years
ago when it worked against open SSL
instead. Instead of against this bug in
the Go standard library. So instead from
now on we're going to talk about how
exactly we implemented this against Go,
because this changes a lot based on the
implementation details of the library
you're working with. So this was the
general idea of how the attack works. All
that: You find points that break at the
right time, you send them both, and that
way you learn a bit of the key,
except I lied to you.
I lied to you, because I lied to you on a
lot of things: The first one is that Go
doesn't do double and add one bit at a
time, it does it five bits at a time! It's
called Booth multiplication, it took us
forever to figure it out. It's an 1980s
paper. Instead of adding one point or zero
points and then doubling, it adds between
minus 16 and plus 16 points and then
doubles five times moving down. It just
does it in blocks of five. So it splits up
the key and then looks at each block of
bits, picks a value from a pre-computed
table, which is just all the values from
one times the point to 16 times the point
and in the loop it doubles five times,
because it moved five bits down. And then
it chooses, which point between zero and
16 to use from the table and it adds that
to the rolling result, instead of adding 1
or 0. There's also a bit of constant time
arithmetic there, because the other thing
I lied to you on is that there's no such
thing as at 0 point. It's an imaginary
point that we add to make the math work,
but when you try to tell the code to
actually add the zero point it's like
asking you to divide by 0. It just won't
do it! But you know how you add 0, right?
You do nothing! So there's some constant
time code here that looks at it and if
it's 0, it does nothing. If it's not 0, it
adds the value.
So the first thing we had to do to adapt
to this format is that we worked in limbs.
When you hear me say limb, it just means
that we will look at each five bit block
on its own as it is zero to sixteen and a
sign value. That's much easier, because
it's actually kind of weird how the five
bits are extracted and I don't want to
bore you with it. So let's just consider
that we look at each group of five bits
converted into a value from zero to
sixteen and a one and a sign or plus minus
sixteen. So when you hear me talk about
limbs you just know that it means the five
bit value from the key. This is still the
giant d key that's the multiplier. So how
does the attack change? Not that much!
Instead of attacking one bit at a time,
you know two points - one that breaks for
zero, one that breaks for one - we attack
one limb at a time, one that breaks for
one, one that breaks for two, one that
breaks for three, sixteen, minus one,
minus two, minus 16. So, to recover five
bits of the key, we'll need on average
half the space: seventeen points, which is
a little less efficient than the bit-by-
bit one, because that would be five points
for five bits. So, however, how the attack
triggers is the same: We're looking for a
bug that happens in the five doubles at
the very right time or that happens at the
addition at the very right time. Now
that's still the high level of how we're
going to do it, but in practice I didn't
tell you how we're going to magically
generate these magic points that break
just at the right time. And I didn't tell
you because it's fuzzing. There is there's
no specific way to generate them
algebraically, so instead we just hooked
the assembly code with something that
would just set a boolean, you know, true,
false, if the conditions for the bug are
met. And then we run through a lot of
points. And if for each point we run it
through the limbs we already know,
remember this is an adaptive attack, so
we want to learn the next limb.
We learned a few limbs, we want to learn
the next one. We run through the ones we
know and then we try all the possible
values for ones that we don't know. If one
of them and only one of them breaks,
that's a useful point. Because if we send
that point and it breaks, we know exactly
what the value of the next limb is. Okay,
so this is going even more low-level. If
you're interested in optimizations, we had
to run through a lot of candidate points
and for each point we needed to know the d
value. Because we can find a magic point,
but if we don't know the private key, we
don't know if the entire protocol works
and our oracle doesn't work anymore. So to
work with that instead of multiplying a
new random private key every time we just
add that one to the private key and added
the point to the public key, this is just
an optimization. We called into the
various assembly of the standard library.
Don't do this, but it's beautiful. You can
go call all the unexported functions in
the standard library. I will never approve
it on code review, but I love it. And then
there's some post-processing to make sure
that the bug is actually there, because
sometimes there are false positives. So we
just run it through the broken code, the
fixed code, is the result the same?
Well, false positive, is it
different, good. Okay, so we have a way to
move through the attack. The only thing we
don't have yet is: How we figure out the
first limb. The first one, the most
important, the most significant one, when
we still don't know anything about the
key. The problem with this one is: It's
like this, so let's pick three, it's an
example okay, let's pretend that the limb
is three. So we do our usual thing, we do
our fuzzing and we find something that
breaks at the fifth doubling and we send
it and it breaks. It means that the first
limb is three, right? Wrong. Sadly, it
might mean that the limb is also 6 or 12.
Because how six is selected for example is
that three is taken, it's doubled and
saved in the precomputation table as six,
then it's taken out of the table, doubled
five times, but what happens after you
double six four times? What's the value?
The exact same as doubling three five
times, and the sequence is the exact same!
So we don't know, which one it is, because
we only know that that's the sequence but
that doesn't tell us anything. And the
same happens for 12, 12 is nothing else
than 3 double double. So if we double it
five times, at the third double it breaks
and we only know if it breaks or not. So
we can't move, so instead what we do is
that we find three points: one that breaks
after doubling three five times, one that
breaks doubling it six times, and one that
breaks after doubling it seven times. We
send them all and we look at which ones
break. Only this one? It's a three! The
first and the second but not the third
point? It's a six! All three? It's a 12!
This took me forever to wrap my head
around, this is like pure shaman insight.
Okay now I can feel that you're getting
bit tired, this is intensive, it's a lot
of math, so let's go for a change of pace
and let's talk about kangaroos instead.
I'm gonna tell you something that I
learned from a cryptographic paper, I
swear. And it's about how kangaroos jump.
Apparently kangaroos jump depending on how
the terrain on which they are jumping from
is. Depending on that terrain, if you put
two kangaroos on the exact same spot, they
will jump the same distance and
approximately the same direction. I don't
know, if this is true, but Pollard said so
in a paper and I am NOT arguing with
Pollard. So now, why is this useful? Well,
this makes for an extremely cool way to
catch kangaroos! I mean, did you expect
some math or crypto? I know, we're
catching kangaroos here! So you take a
kangaroo that you have on hand, because
you all have a kangaroo on hand. And you
put a GPS tracker on it and you let it
loose. This kangaroo jumps and it roams it
enjoys a brief stint of freedom and it
runs and at some point you go pick it up
and because you know where it is and you
place a trap there exactly where you find
it. What happens to the kangaroo that is
just passing by? If it steps at any point
on one of the points, where the other
kangaroo jumped from there on it will
follow the same path. Because each jump
depends on the ground. So this way if the
wild kangaroo lands on any of the prints
of the previous kangaroo you're catching
it because eventually it will end up in
the trap. Okay, this had nothing to do
with the talk, I just wanted to share
this!
Now, okay, so here is how this converts to
crypto. We can make elliptic curve points
jump like kangaroos, we just have to make
the jump deterministic based on the input,
based on the starting point. For example
we can take a hash, any hash, you design
the hash. Apparently that's popular in
cryptocurrencies to design your own hash
..
laughter
And you hash a point to another point and
when you want to do a jump you take the
point and you add it to its own hash, so
Q_N+1 depends only on QN, and this is just
like the kangaroo jump. How do you use
this for what we're doing? We want to
recover d, right? We want to recover the
multiplier, the discrete log it's often
called, of the public key. How we work
with this is that we take a tame kangaroo,
a point that we know the d off, and we
make it jump a lot. It keeps jumping and
we remember what the value is at the end.
We take that value at the end and we save
it. No need to keep all the points in
between, so we don't need some giant
memory construction and then we take the
point that we don't know the d of and we
make it jump a lot. What happens is that
it has much higher probability to
intersect one of the various prints of the
previous point. When it does, it will
eventually end up in our trap, it will end
up having the same value as the one we
calculated earlier. When that happens, we
know how far the wild point traveled,
because we were the ones making it jump.
So we can just walk backwards to the
starting point. It turns out that this is
extremely efficient to find the d value
when you know the interval of the d value.
The intuition there is that if the
kangaroo starts in a small area: You know
when it's been too much time that it
probably travelled past your trap. So you
have to rewind time, which in crypto you
can, and try again, because it went past
your trap. So the smaller the interval,
the more precise you can be, the more
efficient attack. This is called the
Pollard-kangaroo attack. It's described in
original paper from the 1980s, it was
described on Diffie-Hellman first, but it
works on any group, so it works on
elliptic curves. And the elliptic curve
version is then improved by a few papers
later and there is a chapter in the
elliptic and hyperelliptic handbook
that describes it.
So we have it all, we have how to start,
we have how to run through the attack and
we have a how to end. So now let's get
concrete, let's pick a target, as I said
this attack works against JWT. JWT made
... a lot of questionable choices. One of
them is to include as one of the public
key algorithms elliptic curve Diffie-
Hellman but not the version of Diffie-
Hellman you and I are familiar with. The
one where we both generate a new private
key every time, which makes this attack
impossible. Remember that this is an
adaptive attack. We need to query the
orcale for the same private key over and
over and over again. Instead there's a
variant called ephemeral static Diffie-
Hellman, where one of the two sides is
always the same. This is sometimes done as
an optimization, don't do that. OpenSSL
was doing that and it stopped doing that
after a bunch of attacks came from that.
So in TLS that usually doesn't happen and
the Go TLS stack thankfully never did
that, so the attack doesn't work against
TLS. But JWT is encoded it straight into
the standard, because if your public key
is fixed, so is your private key, always
the same. So if we have a service that
accepts things encrypted with a ECDHES
algorithm, we can use this attack. For
example against the popular implementation
Go-Jose, it's not Go-Jose's fault and Go
1.8.1.1, the latest vulnerable version,
and we can just check if the service can
decrypt what we're sending it. It can,
because it throws HTTP error when it
can't, because of different timings. This
changes in any case, but what you need is
the Oracle that tells you: did it work did
it not work? Did the bug trigger, so are
you right about your prediction of the
limb or are you not?
Then of course we need to do a lot of
work. If you don't have the resources of a
big corporation of where to spin up
things, you can just use EC2 spot
instances. How we architected that, is
that there will be a small piece of code
that do nothing intensive on your laptop
or anything. It would accept HTTP requests
from the workers. The beautiful thing
about this infrastructure is that you can
horizontally scale the workers, spin up as
many as you want on node.js platforms,
because the only thing that they need to
be able to do: they don't need to have
ports open, you don't have to worry about
NAT, you can even run it on your botnet,
because the only thing they have to do is
connect back and ask for work and then
after 30 cycles or when they find the
point, connect back and say I found
something, I didn't find anything, give me
some more work and send the result. This
was also useful, because if you get the
worker code wrong or if you want to change
the deployment, you can just redeploy the
workers without losing state on the
dispatcher. Because the dispatcher just
keeps running and it will just wait for
workers to come and ask. Specifically we
built this just with the small script that
you can start an EC2 machine with, because
we didn't even need to make a custom
image. So a few figures, a few numbers.
Each Key has 52 limbs, it will take a bit
less than that, because we have kangaroos.
But let's say approximately 52. Each limb
is 16 points on average. It would be 17,
but two of the points are faster to find,
so let's say 16. Each point takes
approximately 2 to the 26 candidate
points, before we find one that triggers
the bug at the right point. Since we need
16 candidate points and each takes 2 to
the 26 candidate points, that takes
approximately 85 CPU hours. That's like
one CPU running for an hour, 1 core. Turns
out that you can get 85 CPU hours from
spot instances for about a dollar and a
quarter, which makes the total cost of the
attack something like 60 bucks.
Which was a relief, because I had done the
math wrong first and the it came out as at
1000 and I run the demo tonight and I
didn't check the bill, yeah.. Anyway, it's
cheap. Now, I am not brave enough to run
the attack live, because, yes, it's a nice
infrastructure, but no, I don't trust it
that much. But also, because it takes a
while. If you don't want to spin up
infinite number of EC2 machines, you have
to accept that it would take about, I
think, our attack run in 12 hours. So
we're gonna look at a sped-up version of a
one-hour video in the next 45 minutes, you
have time, right?
laughter
No, it's a couple of minutes. So this is
the UI. It shouldn't be too confusing and
if anyone works at Hollywood and wants to
license it, we can talk. What you're
seeing is that the red values are the ones
that we found a point for from the workers
and we submitted. And when we submitted
that it resulted not to be the right limb
and the green ones are the ones that
instead broke, so they're the right limb.
Remember that here the target is a JWT
receiving application. And then you can
see the key slowly flipping from the
bottom and it's exactly like Hollywood, I
love that. So you can see the limbs
filling up as we find them and that
approximately there's 30 bits, so 2 to the
30 rounds, so 30 candidates work for each
round for each limb that we find. It
obviously depends on luck and yes the the
thing will probably keep running for a
little while, but this is already at limb
nine, it has to get to 52 and you don't
have that patience. So this was the attack
the code will be open-source soon. Leave
the limbs you lost, they belong to us now.
And: any questions?
applause
Herald: Valsorda, thank you very much for
a lovely talk and the kangaroos. So, we
have a question from the signal angel, go
ahead, please.
Signal Angel: Actually the internet wants
to know: Did you compare this bug to
implementation in other libraries.
Filippo: So I guess that means, if I
looked for similar bugs in other
implementations. We did not, because each
implementation is a bit different. Hanno
works on a lot of fuzzing of bigint
implementations and at one point he asked
me, like on Twitter just today, if I tried
fuzzing the Go implementation for example.
And sadly this is constant time code that
is specific to P256, so the answer is,
there's a lot of them and the bug can be
small and anywhere. It's not like you will
be looking for another bug in P256
subtraction, it can be anywhere in the
underlying math and we can turn that into
the same attack. So, no, we didn't look
for this specific one, but I think that
four CVEs in 2017 on open SSL have
descriptions that are very similar, but
they're about finite field Diffie-Hellman,
like normal DH. If you look for something
that says about it's barely doable,
because all the computation can be done
offline. That's this kind of attacks on
open SSL. Next?
Herald: Are there any other questions from
the Signal Angel? So please line up at the
microphone. Microphone one please.
Mic1: So why can't you determine the
points algebraically?
Filippo: Laziness? No, so it's entirely
assembly and there's a lot of points where
the value is then thrown out or it might
get corrected by .. it's essentially we
didn't see a clear path to this, and it's
$65 on ec2, so it doesn't really change
the feasibility to just fuzz them, so we
just went for the fastest path to the
entrance.
Herald: Are there any other questions?
Filippo: No one is asking about kangaroos,
people, I mean..
Herald: Yes, ask about kangaroos, lovely.
have you been to Australia?
Filippo: I haven't.
Herald: Okay, you have to.
Filippo: Yep, definitely.
Herald: I think, there aren't any other
questions. So Filippo Valsorda, big round
of applause for you. Thank you!
applause
34c3 outro
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!