<i>34c3 preroll music</i>
Herald Angel: Today we have our speaker

Filippo Valsorda. He is a crypto-engineer
and he is specialized in Go. He is some

kind of Go wizard so to say and used to
work at CloudFlare. He actually shows now

today in the talk how he made an attack to
exploit a bug in the implementation of the

elliptic curve P256 in Go, that's why he's
a Go wizard. The reason is actually due to

a misplaced bit on the assembler level,
just due to the curves implementation. The

result is that in certain cases you can
retrieve the private key in the elliptic

curve Diffie-Hellman encryption scheme.
Please welcome Filippo Valsorda to his

talk, give him a round of applause. Thank
you very much.

Filippo Valsorda: Thank you.
<i>applause</i>

Thanks. I love the term crypto-golfer. Tony
came up with it and I just want business

cards with that on it. Anyway, okay, I'm
Filippo. As you heard I worked on Internet

infrastructure, I work on Go, I work on
cryptography. But this is a collaboration

with Sean Devlin, you might know him as
one of the authors of the Cryptopals or

the matasano crypto challenges. A few
months ago earlier this year a bug.. a

CloudFlare engineer was scanning CT logs.
CT logs are big logs of TLS certificates

and he was checking the ECDSA signatures
on these certificates and one of the

signatures was not verifying. Which is
weird, because CT logs check the

certificates before adding them to the
log. It turned out that the bug was in the

code that CloudFlare was using to verify
the certificates. And more specifically it

was in the Go standard library. It was in the
implementation of the NIST P256 curve,

which is a popular, very hard to implement,
elliptic curve that is used for example

for ECDSA TLS certificates. This curve has
an assembly implementation in the Go

standard library, of course, to squeeze
every bit of performance out of it, and

it's specifically optimized for x86-64
architecture. So that's where the bug was.

There was a carry propagation bug. It was
reported upstream and everyone agreed that

this was not obviously exploitable.
But Adam Langley also said that it would

be a cool paper though. And well I mean,
how do you pass on that? So Sean and I met

in Paris and spend a weekend and then some
eating Thai food and staring at this

assembly code to try to understand what is
it doing. And one month later we have a

CVE and two Go security releases. We found
a way to go from this single carry bit bug

to a full key recovery against protocols
that use elliptic curve Diffie-Hellman

within ephemeral static way. If this means
nothing to you, it's okay, I will try to

go over it. One of these protocols for
example is JWT. Jozy, or however you want

to call it - it has 15 different acronyms.
And it's often implemented in Go, so this

was exploitable against real-world
services. So, let's start by looking at

the code with the bug. Let's take it from
the beginning. This is the short assembly

function that does subtraction. When you
do elliptic curve math, you usually work

on a field, you work on math modulo some
prime p. So if it was subtraction you do a

minus b modulo p and this is what this
assembly snippet does. It sets a to a

minus b. Of course these are numbers way
too big to fit into a single register. So

how do you do math, when you can't fit
into a single register? You do multi-

precision math. And the thing is, you all
know how to do multi-precision math, you

learned that in elementary school. When
you would write numbers in columns and you

do math with register size of 10, because
for every digit you would subtract two

digits and carry the minus one if it went
negative and then subtract with the carry

and then carry the minus one. That's
exactly what this is doing, but it's doing

it instead with digits with 64-bit registers,
because this is a 64-bit architecture

. So we look at the first
lines: the first lines is nothing else

than subtract, subtract, subtract with
carry. And then the carry is finally

stored in that mul0 accumulator. But then
what do we do if it goes negative?

We said that this is modulo p so we can't
just let it wrap around modulo 2^256,

because that's how wide, you know, 4
registers are. But since we're doing

arithmetic modulo a number, we can just
add that number and the result is the

same, right? Adding p modulo p is a no-op
in the result. So that's what this code

does: It does a equal a minus b, it takes
a copy of the result and it adds p. And

then in constant time, it uses that final
carry to check, if it went negative or not

to decide in constant time, which one to
use. The one with b , so a minus b plus p

or a minus B - and that's subtraction.
Straightforward enough. Now the problem

with this code is that if you look closely
you can see something that might be weird

if you're not familiar with assembly. It
still trips me over. To use a condition

like these constant time conditionals
there. Which have to be constant time,

because you don't want to leak timings
based on the size of number. You have to

first operate on mul0, so that you set the
flags, the zero flag. So normally what you

do is either add 0 and 1 to your mul0
to check if to set the flags. But that's

not that's not an add, that's an add with
carry. It means that it's adding zero to

mul0 and the carry bit from this addition
here, which has nothing to do with it,

it's not supposed to be there. Like this
is adding p, but it doesn't matter, if it

overflows, because if it does, it wasn't
going to be picked here anyway. So it's

adding another bit into the operation that
wasn't supposed to be there. So it's

flipping the result, but then also the
conditions here are flipped, so

essentially it evens itself out. Except:
when that carry bit happens not to be set,

because the number a minus b is small
enough, that if you add P, you don't wrap

around. That happens once every 2^32
times, which is why it's so rare that

nobody had noticed so far.
So the code went in and nobody noticed for

a long time, until CloudFlare first
started scanning CT logs and had this

weird one signature that wasn't verifying
and they were lucky enough to actually

have it in the logs because you know if it
happens transiently you might just think..

I don't know, the connection broke, right?
So this is what we call a carry

propagation bug. Carry propagation is the
activity of making sure that you carry the

1. And here it's a bit weird, we didn't
forget to carry it, but we carried a bit

that we weren't supposed to carry into a
result. Most of the times that goes well,

sometimes that breaks. When that breaks:
wrong result! Wrong result, wrong point

computation and wrong point computation,
so what? Like how does forgetting to carry

the one lead to a full key recovery? This
is not one of those bugs, where like

buffer overflow you know what you're
trying to do even if you might have to

jump through so many hoops, because there
might be all these protections, you know

what your capabilities are, you can
overwrite some memory. Here instead it's

not clear what your capability is. So
today I'm going to try to explain that.

How we turn this very rare failed
computation into a complete key recovery

attack. But first I apologize that I have
to give a elliptic curve cryptography

crash course here at CCC. So we've seen
the field is nothing else than operations

modulo p, then there are the points: the
points are x and y, the coordinates. They

fit an equation, which we don't care
about, they just fit an equation. They are

integers, so we can work with them, and we
can use them to build a group. A group is

one of the core structures in modern
cryptography. To build a group we need a

zero point, a generator point, and an
addition over the group, over the

points. So we define an addition,

again, we don't care about how addition
works. It's just: you take two points, you

add them together, you get a result.
It has all the properties that you

remember from elementary school addition:
it's commutative, it's associative. And

then you have multiplication: You don't
actually define how to multiply a point,

but if I tell you that you have an
addition operation and I want five times

this point, what do you do? You take the
point and you add the point and you add

the point and you add the point,... so
this is called scalar multiplication:

doing multiplication by adding repeatedly
a certain point.

So now we have a group, which is given by
multiplying the point, a generator point, a

certain number of times and we have a
multiplication operation. So how do we

build cryptography out of it? This is all
awfully abstract so far? So we build

elliptic curve Diffie-Hellman. If you're
familiar with normal Diffie-Hellman, you

will see something snap at some point. The
private key is a giant number, this is

important. The private key in ECDH is just
a random giant 256 bit number. And then we

have a public key. A public key is that
giant number multiplied, the scalar

multiplication we just talked about, by a
generator point. If you know normal

Diffie-Hellman, this is like doing g to
the a. If you don't, it's okay, this is

just multiplying private key by a point.
And then, when I have my private key and

my public key, you send me your public
key. We need to agree on a shared secret,

how we do that, is that I take your public
key, which is a point. I take this and I

multiply it by my private key, here, so
again: it's my giant number private key

multiplied by your point. That comes
together: The two results are the same,

because if we do my private key times your
private key times g it's the same as your

private key times my private key times g,
because that's commutative. And we land on

a shared secret. And that's all we need to
know about elliptic curve cartography to

exploit this. However, there's one thing
that I glossed over: It's easy to multiply

by five, you add five times.
But if I'm asking you to multiply by a 256

bit number, you can't add 2 to the 256
times a point, right? So what do we do

there? Remember that here, what we're
trying to do is the multiplication: The

private key times the public key, the
point. We do something called double and

add: We take our private key and we string
it out like bits. We start from the most

significant bit. This is little endian,
because I have gotten this slide wrong the

first time. But, you know, you just claim
that you meant it to be the opposite

endianness. Anyway, that's the most
significant bit, the one that is two to

the 256, no, two to the 255. If you flip
it, you're adding or removing two to the

255. So you start with 0, that's zed, 0,
nothing, and you check the first bit, the

most important bit: Is that set, yes or
no? Yes, so you add the point Q, which is

the point we're trying to multiply by this
giant e. So we add the point and then we

move down one by doubling. Can you imagine
how we double something? Remember we only

have addition. We add it to itself, of
course. So we use addition to double the

point. And you might see, where we're
going with this. We double every time we

slide down one bit.
By the time we arrive at the end, how many

times did we double that first point that
we added, because the first bit was one?

255 times! That bit was worth 2 to the
255, so at the end that will have the

value it was supposed to have. And we keep
going, we check if this bit is 1: Is it 1?

No. So we do nothing, we double again to
move down one. We check if this bit is

one. This bit is 1: so we add the point.
So we did add the point, double, double,

add the point, double, moving down one and
we keep going like this. We keep going

like this, until we reach the least
significant bit. At the least significant

bit, if it's one, we add the point, if
it's not, we don't add the point. And at

the end we have the correct result and the
result comes from this sequence of

operations which at most are twice 255
operations, which is something that we can

do concretely. Now why did I explain this
very specific algorithm to you. Because to

understand this attack, you have to
recognize that each key, so each string of

bits here, converts into a very specific
sequence of operations. Okay, if you

change one bit, there will be an one more
addition one less addition and each key

has a very specific sequence. In this case
it's add, double, double, add, double,

add, double, double, and it keeps going.
So back to our bug. If you spaced out,

because we took a lot of crypto, I saw
you! But the two things you should take

away are: There's a giant number, it's the
private key, we want to multiply the giant

number by a point and we do that by doing
additions and doubles in an order that is

specified by the bits of the giant number.
That's what you need to have clear, the

only thing. So let's go back to see how we
use that to turn our very small carry bug

into a complete key recovery attack. First
thing we do is we bubble it up. That

function that breaks is called P256
subinternal. That's the function I showed

you earlier. P256 subinternal is used by P256
point add, which is what we spoke about

adding two points, the only important
operation. And adding two points, we've

seen, we use when we're multiplying, when
we're doing that scalar multiplication,

which is d times q, d times the point. And
how is scalar mult used? Here we finally

surfaced to a level that if you work with
cryptographic protocols you might start

recognizing pieces of. Scalar
multiplication is the operation that the

peer does in elliptic curve Diffie-
Hellman. There's a scalar, which is the

secret, which is the private key. There's
a point, which is the public key of the

other person, which might be the attacker.
So the scalar multiplication here,

speaking in InfoSec terms, has an attacker
supplied point and a secret scalar and the

result, the shared secret, is the session
key. For example: TLS, when you open a

connection with TLS and you're using
elliptic curve Diffie-Hellman, then you

will do this dance to agree on a session
key. If the session key is correct, the

connection will open and you will be able
to, I don't know, get send HTTP request.

If the bug is hit and the result is wrong,
so the result bubbles up into a wrong

shared secret, the session key is wrong.
And the session key is wrong you notice,

how do you notice? The connection breaks.
So this is what in cryptography we call an

oracle.
You have an oracle that you can call and

send a point, because that's your public
key, you're the attacker, and you send

that point and it gets multiplied by the
private key. And it gives you one bit of

information, did the bug trigger or did it
not? Most of the times it won't, because

remember, this is an extremely rare bug.
So you have an oracle that tells you: Bug

happen, bug didn't happen, based on the
point you sent. And let's assume that the

key stays the same, we'll talk about that.
Can you imagine how we use that to start

learning things about the key? Well, let's
say, that we can magically conjure a point

that in that sequence of operation, that's
why I told you the sequence of operation

was important, the bug happens very
specifically at that addition and we find

another point where the bug happens very
specifically at that double. If we know

already these bits of the key, and we
aren't sure about this one, what can we do

with these two points? We send them both,
one of them will break the TLS connection,

the other one will succeed. That means
that the one that broke triggered the bug,

the one that didn't break, didn't trigger
the bug. And we know which one corresponds

to which key. Because we magically made
special points that only break very

precisely at that point of the
computation. Okay, so we learned a bit of

the key. In cryptography as soon as you
learn one bit of the key, there's probably

a way all the way down. So we build what's
called an adaptive attack. Let's say we

have these bits, but we want to learn
these, too. We compute two points, one

that breaks, when the addition happens
exactly at that point in the double and

add a procedure, and one that triggers
only when the add doesn't happen at the

specific point in the double and add
sequence. We send them both, only one of

them breaks the TLS connection, well, then
we know a bit! We go back to our magic

point generator and we generate two new
points.

This time we don't look for things that
break here, we look for things that break

here. We make two points, we send them
both. One of them breaks the connection,

the other doesn't break the connection. We
learned one more bit of the key. We go

back, we make two points, we send them
both: One breaks the connection, one

doesn't. We keep going like that, once for
each bit. Every time we go back and we

adapt to what we learned so far. That's
why it's called an adaptive attack, we

can't pre-compute all these points. We
have to come up with them while we learn

the key. And the beautiful thing about
adaptive attacks is that they look exactly

like Hollywood, it's beautiful! Because
you see them flipping and going through

values, getting it right and moving to the
next one, which you all told it was fake,

it was not! Everything else is fake. Okay,
so, this attack we came up with that, we

told we had something extremely novel, we
went to the literature and everyone that

had picked an academic career in the
audience knows exactly what happened: We

found a paper that was doing exactly this.
However, you know, it was a little

different. It was still P256 and it was
still ECDH and, hmm.. okay it's really

similar. But it's an attack that depends a
lot on the implementation details, how you

pull it off. You can't suddenly just
repurpose the code, but the idea so far:

an adaptive attack where you send two
points and you check which one breaks is

the same as a BBB paper from a few years
ago when it worked against open SSL

instead. Instead of against this bug in
the Go standard library. So instead from

now on we're going to talk about how
exactly we implemented this against Go,

because this changes a lot based on the
implementation details of the library

you're working with. So this was the
general idea of how the attack works. All

that: You find points that break at the
right time, you send them both, and that

way you learn a bit of the key,
except I lied to you.

I lied to you, because I lied to you on a
lot of things: The first one is that Go

doesn't do double and add one bit at a
time, it does it five bits at a time! It's

called Booth multiplication, it took us
forever to figure it out. It's an 1980s

paper. Instead of adding one point or zero
points and then doubling, it adds between

minus 16 and plus 16 points and then
doubles five times moving down. It just

does it in blocks of five. So it splits up
the key and then looks at each block of

bits, picks a value from a pre-computed
table, which is just all the values from

one times the point to 16 times the point
and in the loop it doubles five times,

because it moved five bits down. And then
it chooses, which point between zero and

16 to use from the table and it adds that
to the rolling result, instead of adding 1

or 0. There's also a bit of constant time
arithmetic there, because the other thing

I lied to you on is that there's no such
thing as at 0 point. It's an imaginary

point that we add to make the math work,
but when you try to tell the code to

actually add the zero point it's like
asking you to divide by 0. It just won't

do it! But you know how you add 0, right?
You do nothing! So there's some constant

time code here that looks at it and if
it's 0, it does nothing. If it's not 0, it

adds the value.
So the first thing we had to do to adapt

to this format is that we worked in limbs.
When you hear me say limb, it just means

that we will look at each five bit block
on its own as it is zero to sixteen and a

sign value. That's much easier, because
it's actually kind of weird how the five

bits are extracted and I don't want to
bore you with it. So let's just consider

that we look at each group of five bits
converted into a value from zero to

sixteen and a one and a sign or plus minus
sixteen. So when you hear me talk about

limbs you just know that it means the five
bit value from the key. This is still the

giant d key that's the multiplier. So how
does the attack change? Not that much!

Instead of attacking one bit at a time,
you know two points - one that breaks for

zero, one that breaks for one - we attack
one limb at a time, one that breaks for

one, one that breaks for two, one that
breaks for three, sixteen, minus one,

minus two, minus 16. So, to recover five
bits of the key, we'll need on average

half the space: seventeen points, which is
a little less efficient than the bit-by-

bit one, because that would be five points
for five bits. So, however, how the attack

triggers is the same: We're looking for a
bug that happens in the five doubles at

the very right time or that happens at the
addition at the very right time. Now

that's still the high level of how we're
going to do it, but in practice I didn't

tell you how we're going to magically
generate these magic points that break

just at the right time. And I didn't tell
you because it's fuzzing. There is there's

no specific way to generate them
algebraically, so instead we just hooked

the assembly code with something that
would just set a boolean, you know, true,

false, if the conditions for the bug are
met. And then we run through a lot of

points. And if for each point we run it
through the limbs we already know,

remember this is an adaptive attack, so
we want to learn the next limb.

We learned a few limbs, we want to learn
the next one. We run through the ones we

know and then we try all the possible
values for ones that we don't know. If one

of them and only one of them breaks,
that's a useful point. Because if we send

that point and it breaks, we know exactly
what the value of the next limb is. Okay,

so this is going even more low-level. If
you're interested in optimizations, we had

to run through a lot of candidate points
and for each point we needed to know the d

value. Because we can find a magic point,
but if we don't know the private key, we

don't know if the entire protocol works
and our oracle doesn't work anymore. So to

work with that instead of multiplying a
new random private key every time we just

add that one to the private key and added
the point to the public key, this is just

an optimization. We called into the
various assembly of the standard library.

Don't do this, but it's beautiful. You can
go call all the unexported functions in

the standard library. I will never approve
it on code review, but I love it. And then

there's some post-processing to make sure
that the bug is actually there, because

sometimes there are false positives. So we
just run it through the broken code, the

fixed code, is the result the same?
Well, false positive, is it

different, good. Okay, so we have a way to
move through the attack. The only thing we

don't have yet is: How we figure out the
first limb. The first one, the most

important, the most significant one, when
we still don't know anything about the

key. The problem with this one is: It's
like this, so let's pick three, it's an

example okay, let's pretend that the limb
is three. So we do our usual thing, we do

our fuzzing and we find something that
breaks at the fifth doubling and we send

it and it breaks. It means that the first
limb is three, right? Wrong. Sadly, it

might mean that the limb is also 6 or 12.
Because how six is selected for example is

that three is taken, it's doubled and
saved in the precomputation table as six,

then it's taken out of the table, doubled
five times, but what happens after you

double six four times? What's the value?
The exact same as doubling three five

times, and the sequence is the exact same!
So we don't know, which one it is, because

we only know that that's the sequence but
that doesn't tell us anything. And the

same happens for 12, 12 is nothing else
than 3 double double. So if we double it

five times, at the third double it breaks
and we only know if it breaks or not. So

we can't move, so instead what we do is
that we find three points: one that breaks

after doubling three five times, one that
breaks doubling it six times, and one that

breaks after doubling it seven times. We
send them all and we look at which ones

break. Only this one? It's a three! The
first and the second but not the third

point? It's a six! All three? It's a 12!
This took me forever to wrap my head

around, this is like pure shaman insight.
Okay now I can feel that you're getting

bit tired, this is intensive, it's a lot
of math, so let's go for a change of pace

and let's talk about kangaroos instead.
I'm gonna tell you something that I

learned from a cryptographic paper, I
swear. And it's about how kangaroos jump.

Apparently kangaroos jump depending on how
the terrain on which they are jumping from

is. Depending on that terrain, if you put
two kangaroos on the exact same spot, they

will jump the same distance and
approximately the same direction. I don't

know, if this is true, but Pollard said so
in a paper and I am NOT arguing with

Pollard. So now, why is this useful? Well,
this makes for an extremely cool way to

catch kangaroos! I mean, did you expect
some math or crypto? I know, we're

catching kangaroos here! So you take a
kangaroo that you have on hand, because

you all have a kangaroo on hand. And you
put a GPS tracker on it and you let it

loose. This kangaroo jumps and it roams it
enjoys a brief stint of freedom and it

runs and at some point you go pick it up
and because you know where it is and you

place a trap there exactly where you find
it. What happens to the kangaroo that is

just passing by? If it steps at any point
on one of the points, where the other

kangaroo jumped from there on it will
follow the same path. Because each jump

depends on the ground. So this way if the
wild kangaroo lands on any of the prints

of the previous kangaroo you're catching
it because eventually it will end up in

the trap. Okay, this had nothing to do
with the talk, I just wanted to share

this!
Now, okay, so here is how this converts to

crypto. We can make elliptic curve points
jump like kangaroos, we just have to make

the jump deterministic based on the input,
based on the starting point. For example

we can take a hash, any hash, you design
the hash. Apparently that's popular in

cryptocurrencies to design your own hash
..

<i>laughter</i>
And you hash a point to another point and

when you want to do a jump you take the
point and you add it to its own hash, so

Q_N+1 depends only on QN, and this is just
like the kangaroo jump. How do you use

this for what we're doing? We want to
recover d, right? We want to recover the

multiplier, the discrete log it's often
called, of the public key. How we work

with this is that we take a tame kangaroo,
a point that we know the d off, and we

make it jump a lot. It keeps jumping and
we remember what the value is at the end.

We take that value at the end and we save
it. No need to keep all the points in

between, so we don't need some giant
memory construction and then we take the

point that we don't know the d of and we
make it jump a lot. What happens is that

it has much higher probability to
intersect one of the various prints of the

previous point. When it does, it will
eventually end up in our trap, it will end

up having the same value as the one we
calculated earlier. When that happens, we

know how far the wild point traveled,
because we were the ones making it jump.

So we can just walk backwards to the
starting point. It turns out that this is

extremely efficient to find the d value
when you know the interval of the d value.

The intuition there is that if the
kangaroo starts in a small area: You know

when it's been too much time that it
probably travelled past your trap. So you

have to rewind time, which in crypto you
can, and try again, because it went past

your trap. So the smaller the interval,
the more precise you can be, the more

efficient attack. This is called the
Pollard-kangaroo attack. It's described in

original paper from the 1980s, it was
described on Diffie-Hellman first, but it

works on any group, so it works on
elliptic curves. And the elliptic curve

version is then improved by a few papers
later and there is a chapter in the

elliptic and hyperelliptic handbook
that describes it.

So we have it all, we have how to start,
we have how to run through the attack and

we have a how to end. So now let's get
concrete, let's pick a target, as I said

this attack works against JWT. JWT made
... a lot of questionable choices. One of

them is to include as one of the public
key algorithms elliptic curve Diffie-

Hellman but not the version of Diffie-
Hellman you and I are familiar with. The

one where we both generate a new private
key every time, which makes this attack

impossible. Remember that this is an
adaptive attack. We need to query the

orcale for the same private key over and
over and over again. Instead there's a

variant called ephemeral static Diffie-
Hellman, where one of the two sides is

always the same. This is sometimes done as
an optimization, don't do that. OpenSSL

was doing that and it stopped doing that
after a bunch of attacks came from that.

So in TLS that usually doesn't happen and
the Go TLS stack thankfully never did

that, so the attack doesn't work against
TLS. But JWT is encoded it straight into

the standard, because if your public key
is fixed, so is your private key, always

the same. So if we have a service that
accepts things encrypted with a ECDHES

algorithm, we can use this attack. For
example against the popular implementation

Go-Jose, it's not Go-Jose's fault and Go
1.8.1.1, the latest vulnerable version,

and we can just check if the service can
decrypt what we're sending it. It can,

because it throws HTTP error when it
can't, because of different timings. This

changes in any case, but what you need is
the Oracle that tells you: did it work did

it not work? Did the bug trigger, so are
you right about your prediction of the

limb or are you not?
Then of course we need to do a lot of

work. If you don't have the resources of a
big corporation of where to spin up

things, you can just use EC2 spot
instances. How we architected that, is

that there will be a small piece of code
that do nothing intensive on your laptop

or anything. It would accept HTTP requests
from the workers. The beautiful thing

about this infrastructure is that you can
horizontally scale the workers, spin up as

many as you want on node.js platforms,
because the only thing that they need to

be able to do: they don't need to have
ports open, you don't have to worry about

NAT, you can even run it on your botnet,
because the only thing they have to do is

connect back and ask for work and then
after 30 cycles or when they find the

point, connect back and say I found
something, I didn't find anything, give me

some more work and send the result. This
was also useful, because if you get the

worker code wrong or if you want to change
the deployment, you can just redeploy the

workers without losing state on the
dispatcher. Because the dispatcher just

keeps running and it will just wait for
workers to come and ask. Specifically we

built this just with the small script that
you can start an EC2 machine with, because

we didn't even need to make a custom
image. So a few figures, a few numbers.

Each Key has 52 limbs, it will take a bit
less than that, because we have kangaroos.

But let's say approximately 52. Each limb
is 16 points on average. It would be 17,

but two of the points are faster to find,
so let's say 16. Each point takes

approximately 2 to the 26 candidate
points, before we find one that triggers

the bug at the right point. Since we need
16 candidate points and each takes 2 to

the 26 candidate points, that takes
approximately 85 CPU hours. That's like

one CPU running for an hour, 1 core. Turns
out that you can get 85 CPU hours from

spot instances for about a dollar and a
quarter, which makes the total cost of the

attack something like 60 bucks.
Which was a relief, because I had done the

math wrong first and the it came out as at
1000 and I run the demo tonight and I

didn't check the bill, yeah.. Anyway, it's
cheap. Now, I am not brave enough to run

the attack live, because, yes, it's a nice
infrastructure, but no, I don't trust it

that much. But also, because it takes a
while. If you don't want to spin up

infinite number of EC2 machines, you have
to accept that it would take about, I

think, our attack run in 12 hours. So
we're gonna look at a sped-up version of a

one-hour video in the next 45 minutes, you
have time, right?

<i>laughter</i>
No, it's a couple of minutes. So this is

the UI. It shouldn't be too confusing and
if anyone works at Hollywood and wants to

license it, we can talk. What you're
seeing is that the red values are the ones

that we found a point for from the workers
and we submitted. And when we submitted

that it resulted not to be the right limb
and the green ones are the ones that

instead broke, so they're the right limb.
Remember that here the target is a JWT

receiving application. And then you can
see the key slowly flipping from the

bottom and it's exactly like Hollywood, I
love that. So you can see the limbs

filling up as we find them and that
approximately there's 30 bits, so 2 to the

30 rounds, so 30 candidates work for each
round for each limb that we find. It

obviously depends on luck and yes the the
thing will probably keep running for a

little while, but this is already at limb
nine, it has to get to 52 and you don't

have that patience. So this was the attack
the code will be open-source soon. Leave

the limbs you lost, they belong to us now.
And: any questions?

<i>applause</i>
Herald: Valsorda, thank you very much for

a lovely talk and the kangaroos. So, we
have a question from the signal angel, go

ahead, please.
Signal Angel: Actually the internet wants

to know: Did you compare this bug to
implementation in other libraries.

Filippo: So I guess that means, if I
looked for similar bugs in other

implementations. We did not, because each
implementation is a bit different. Hanno

works on a lot of fuzzing of bigint
implementations and at one point he asked

me, like on Twitter just today, if I tried
fuzzing the Go implementation for example.

And sadly this is constant time code that
is specific to P256, so the answer is,

there's a lot of them and the bug can be
small and anywhere. It's not like you will

be looking for another bug in P256
subtraction, it can be anywhere in the

underlying math and we can turn that into
the same attack. So, no, we didn't look

for this specific one, but I think that
four CVEs in 2017 on open SSL have

descriptions that are very similar, but
they're about finite field Diffie-Hellman,

like normal DH. If you look for something
that says about it's barely doable,

because all the computation can be done
offline. That's this kind of attacks on

open SSL. Next?
Herald: Are there any other questions from

the Signal Angel? So please line up at the
microphone. Microphone one please.

Mic1: So why can't you determine the
points algebraically?

Filippo: Laziness? No, so it's entirely
assembly and there's a lot of points where

the value is then thrown out or it might
get corrected by .. it's essentially we

didn't see a clear path to this, and it's
$65 on ec2, so it doesn't really change

the feasibility to just fuzz them, so we
just went for the fastest path to the

entrance.
Herald: Are there any other questions?

Filippo: No one is asking about kangaroos,
people, I mean..

Herald: Yes, ask about kangaroos, lovely.
have you been to Australia?

Filippo: I haven't.
Herald: Okay, you have to.

Filippo: Yep, definitely.
Herald: I think, there aren't any other

questions. So Filippo Valsorda, big round
of applause for you. Thank you!

<i>applause</i>

<i>34c3 outro</i>

subtitles created by c3subtitles.de
in the year 2019. Join, and help us!