34C3 - Squeezing a key through a carry bit

Edit subtitles

0:00 - 0:16

34c3 preroll music
Herald Angel: Today we have our speaker
0:16 - 0:22

Filippo Valsorda. He is a crypto-engineer
and he is specialized in Go. He is some
0:22 - 0:29

kind of Go wizard so to say and used to
work at CloudFlare. He actually shows now
0:29 - 0:35

today in the talk how he made an attack to
exploit a bug in the implementation of the
0:35 - 0:42

elliptic curve P256 in Go, that's why he's
a Go wizard. The reason is actually due to
0:42 - 0:48

a misplaced bit on the assembler level,
just due to the curves implementation. The
0:48 - 0:54

result is that in certain cases you can
retrieve the private key in the elliptic
0:54 - 0:58

curve Diffie-Hellman encryption scheme.
Please welcome Filippo Valsorda to his
0:58 - 1:03

talk, give him a round of applause. Thank
you very much.
1:03 - 1:06

Filippo Valsorda: Thank you.
applause
1:06 - 1:11

Thanks. I love the term crypto-golfer. Tony
came up with it and I just want business
1:11 - 1:17

cards with that on it. Anyway, okay, I'm
Filippo. As you heard I worked on Internet
1:17 - 1:22

infrastructure, I work on Go, I work on
cryptography. But this is a collaboration
1:22 - 1:28

with Sean Devlin, you might know him as
one of the authors of the Cryptopals or
1:28 - 1:35

the matasano crypto challenges. A few
months ago earlier this year a bug.. a
1:35 - 1:46

CloudFlare engineer was scanning CT logs.
CT logs are big logs of TLS certificates
1:46 - 1:52

and he was checking the ECDSA signatures
on these certificates and one of the
1:52 - 1:59

signatures was not verifying. Which is
weird, because CT logs check the
1:59 - 2:04

certificates before adding them to the
log. It turned out that the bug was in the
2:04 - 2:09

code that CloudFlare was using to verify
the certificates. And more specifically it
2:09 - 2:18

was in the Go standard library. It was in the
implementation of the NIST P256 curve,
2:18 - 2:24

which is a popular, very hard to implement,
elliptic curve that is used for example
2:24 - 2:32

for ECDSA TLS certificates. This curve has
an assembly implementation in the Go
2:32 - 2:41

standard library, of course, to squeeze
every bit of performance out of it, and
2:41 - 2:47

it's specifically optimized for x86-64
architecture. So that's where the bug was.
2:47 - 2:53

There was a carry propagation bug. It was
reported upstream and everyone agreed that
2:53 - 3:00

this was not obviously exploitable.
But Adam Langley also said that it would
3:00 - 3:08

be a cool paper though. And well I mean,
how do you pass on that? So Sean and I met
3:08 - 3:14

in Paris and spend a weekend and then some
eating Thai food and staring at this
3:14 - 3:20

assembly code to try to understand what is
it doing. And one month later we have a
3:20 - 3:31

CVE and two Go security releases. We found
a way to go from this single carry bit bug
3:31 - 3:36

to a full key recovery against protocols
that use elliptic curve Diffie-Hellman
3:36 - 3:43

within ephemeral static way. If this means
nothing to you, it's okay, I will try to
3:43 - 3:51

go over it. One of these protocols for
example is JWT. Jozy, or however you want
3:51 - 3:58

to call it - it has 15 different acronyms.
And it's often implemented in Go, so this
3:58 - 4:06

was exploitable against real-world
services. So, let's start by looking at
4:06 - 4:12

the code with the bug. Let's take it from
the beginning. This is the short assembly
4:12 - 4:20

function that does subtraction. When you
do elliptic curve math, you usually work
4:20 - 4:28

on a field, you work on math modulo some
prime p. So if it was subtraction you do a
4:28 - 4:36

minus b modulo p and this is what this
assembly snippet does. It sets a to a
4:36 - 4:44

minus b. Of course these are numbers way
too big to fit into a single register. So
4:44 - 4:50

how do you do math, when you can't fit
into a single register? You do multi-
4:50 - 4:56

precision math. And the thing is, you all
know how to do multi-precision math, you
4:56 - 5:01

learned that in elementary school. When
you would write numbers in columns and you
5:01 - 5:07

do math with register size of 10, because
for every digit you would subtract two
5:07 - 5:13

digits and carry the minus one if it went
negative and then subtract with the carry
5:13 - 5:18

and then carry the minus one. That's
exactly what this is doing, but it's doing
5:18 - 5:23

it instead with digits with 64-bit registers,
because this is a 64-bit architecture
5:23 - 5:28

. So we look at the first
lines: the first lines is nothing else
5:28 - 5:33

than subtract, subtract, subtract with
carry. And then the carry is finally
5:33 - 5:41

stored in that mul0 accumulator. But then
what do we do if it goes negative?
5:41 - 5:48

We said that this is modulo p so we can't
just let it wrap around modulo 2^256,
5:48 - 5:53

because that's how wide, you know, 4
registers are. But since we're doing
5:53 - 5:58

arithmetic modulo a number, we can just
add that number and the result is the
5:58 - 6:04

same, right? Adding p modulo p is a no-op
in the result. So that's what this code
6:04 - 6:11

does: It does a equal a minus b, it takes
a copy of the result and it adds p. And
6:11 - 6:18

then in constant time, it uses that final
carry to check, if it went negative or not
6:18 - 6:23

to decide in constant time, which one to
use. The one with b , so a minus b plus p
6:23 - 6:30

or a minus B - and that's subtraction.
Straightforward enough. Now the problem
6:30 - 6:36

with this code is that if you look closely
you can see something that might be weird
6:36 - 6:41

if you're not familiar with assembly. It
still trips me over. To use a condition
6:41 - 6:45

like these constant time conditionals
there. Which have to be constant time,
6:45 - 6:50

because you don't want to leak timings
based on the size of number. You have to
6:50 - 6:57

first operate on mul0, so that you set the
flags, the zero flag. So normally what you
6:57 - 7:07

do is either add 0 and 1 to your mul0
to check if to set the flags. But that's
7:07 - 7:13

not that's not an add, that's an add with
carry. It means that it's adding zero to
7:13 - 7:21

mul0 and the carry bit from this addition
here, which has nothing to do with it,
7:21 - 7:26

it's not supposed to be there. Like this
is adding p, but it doesn't matter, if it
7:26 - 7:32

overflows, because if it does, it wasn't
going to be picked here anyway. So it's
7:32 - 7:36

adding another bit into the operation that
wasn't supposed to be there. So it's
7:36 - 7:41

flipping the result, but then also the
conditions here are flipped, so
7:41 - 7:51

essentially it evens itself out. Except:
when that carry bit happens not to be set,
7:51 - 7:55

because the number a minus b is small
enough, that if you add P, you don't wrap
7:55 - 8:01

around. That happens once every 2^32
times, which is why it's so rare that
8:01 - 8:07

nobody had noticed so far.
So the code went in and nobody noticed for
8:07 - 8:10

a long time, until CloudFlare first
started scanning CT logs and had this
8:10 - 8:16

weird one signature that wasn't verifying
and they were lucky enough to actually
8:16 - 8:21

have it in the logs because you know if it
happens transiently you might just think..
8:21 - 8:28

I don't know, the connection broke, right?
So this is what we call a carry
8:28 - 8:33

propagation bug. Carry propagation is the
activity of making sure that you carry the
8:33 - 8:43

1. And here it's a bit weird, we didn't
forget to carry it, but we carried a bit
8:43 - 8:49

that we weren't supposed to carry into a
result. Most of the times that goes well,
8:49 - 8:55

sometimes that breaks. When that breaks:
wrong result! Wrong result, wrong point
8:55 - 9:03

computation and wrong point computation,
so what? Like how does forgetting to carry
9:03 - 9:09

the one lead to a full key recovery? This
is not one of those bugs, where like
9:09 - 9:13

buffer overflow you know what you're
trying to do even if you might have to
9:13 - 9:19

jump through so many hoops, because there
might be all these protections, you know
9:19 - 9:23

what your capabilities are, you can
overwrite some memory. Here instead it's
9:23 - 9:30

not clear what your capability is. So
today I'm going to try to explain that.
9:30 - 9:36

How we turn this very rare failed
computation into a complete key recovery
9:36 - 9:44

attack. But first I apologize that I have
to give a elliptic curve cryptography
9:44 - 9:55

crash course here at CCC. So we've seen
the field is nothing else than operations
9:55 - 10:03

modulo p, then there are the points: the
points are x and y, the coordinates. They
10:03 - 10:08

fit an equation, which we don't care
about, they just fit an equation. They are
10:08 - 10:13

integers, so we can work with them, and we
can use them to build a group. A group is
10:13 - 10:21

one of the core structures in modern
cryptography. To build a group we need a
10:21 - 10:27

zero point, a generator point, and an
addition over the group, over the
10:27 - 10:30

points. So we define an addition,
10:30 - 10:35

again, we don't care about how addition
works. It's just: you take two points, you
10:35 - 10:38

add them together, you get a result.
It has all the properties that you
10:38 - 10:45

remember from elementary school addition:
it's commutative, it's associative. And
10:45 - 10:50

then you have multiplication: You don't
actually define how to multiply a point,
10:50 - 10:55

but if I tell you that you have an
addition operation and I want five times
10:55 - 11:01

this point, what do you do? You take the
point and you add the point and you add
11:01 - 11:07

the point and you add the point,... so
this is called scalar multiplication:
11:07 - 11:12

doing multiplication by adding repeatedly
a certain point.
11:12 - 11:19

So now we have a group, which is given by
multiplying the point, a generator point, a
11:19 - 11:25

certain number of times and we have a
multiplication operation. So how do we
11:25 - 11:31

build cryptography out of it? This is all
awfully abstract so far? So we build
11:31 - 11:35

elliptic curve Diffie-Hellman. If you're
familiar with normal Diffie-Hellman, you
11:35 - 11:41

will see something snap at some point. The
private key is a giant number, this is
11:41 - 11:50

important. The private key in ECDH is just
a random giant 256 bit number. And then we
11:50 - 11:56

have a public key. A public key is that
giant number multiplied, the scalar
11:56 - 12:03

multiplication we just talked about, by a
generator point. If you know normal
12:03 - 12:09

Diffie-Hellman, this is like doing g to
the a. If you don't, it's okay, this is
12:09 - 12:17

just multiplying private key by a point.
And then, when I have my private key and
12:17 - 12:23

my public key, you send me your public
key. We need to agree on a shared secret,
12:23 - 12:30

how we do that, is that I take your public
key, which is a point. I take this and I
12:30 - 12:36

multiply it by my private key, here, so
again: it's my giant number private key
12:36 - 12:42

multiplied by your point. That comes
together: The two results are the same,
12:42 - 12:49

because if we do my private key times your
private key times g it's the same as your
12:49 - 12:55

private key times my private key times g,
because that's commutative. And we land on
12:55 - 13:03

a shared secret. And that's all we need to
know about elliptic curve cartography to
13:03 - 13:11

exploit this. However, there's one thing
that I glossed over: It's easy to multiply
13:11 - 13:18

by five, you add five times.
But if I'm asking you to multiply by a 256
13:18 - 13:27

bit number, you can't add 2 to the 256
times a point, right? So what do we do
13:27 - 13:31

there? Remember that here, what we're
trying to do is the multiplication: The
13:31 - 13:38

private key times the public key, the
point. We do something called double and
13:38 - 13:44

add: We take our private key and we string
it out like bits. We start from the most
13:44 - 13:50

significant bit. This is little endian,
because I have gotten this slide wrong the
13:50 - 13:55

first time. But, you know, you just claim
that you meant it to be the opposite
13:55 - 14:01

endianness. Anyway, that's the most
significant bit, the one that is two to
14:01 - 14:08

the 256, no, two to the 255. If you flip
it, you're adding or removing two to the
14:08 - 14:14

255. So you start with 0, that's zed, 0,
nothing, and you check the first bit, the
14:14 - 14:21

most important bit: Is that set, yes or
no? Yes, so you add the point Q, which is
14:21 - 14:26

the point we're trying to multiply by this
giant e. So we add the point and then we
14:26 - 14:33

move down one by doubling. Can you imagine
how we double something? Remember we only
14:33 - 14:40

have addition. We add it to itself, of
course. So we use addition to double the
14:40 - 14:46

point. And you might see, where we're
going with this. We double every time we
14:46 - 14:50

slide down one bit.
By the time we arrive at the end, how many
14:50 - 14:57

times did we double that first point that
we added, because the first bit was one?
14:57 - 15:04

255 times! That bit was worth 2 to the
255, so at the end that will have the
15:04 - 15:11

value it was supposed to have. And we keep
going, we check if this bit is 1: Is it 1?
15:11 - 15:16

No. So we do nothing, we double again to
move down one. We check if this bit is
15:16 - 15:24

one. This bit is 1: so we add the point.
So we did add the point, double, double,
15:24 - 15:31

add the point, double, moving down one and
we keep going like this. We keep going
15:31 - 15:38

like this, until we reach the least
significant bit. At the least significant
15:38 - 15:43

bit, if it's one, we add the point, if
it's not, we don't add the point. And at
15:43 - 15:49

the end we have the correct result and the
result comes from this sequence of
15:49 - 15:56

operations which at most are twice 255
operations, which is something that we can
15:56 - 16:07

do concretely. Now why did I explain this
very specific algorithm to you. Because to
16:07 - 16:12

understand this attack, you have to
recognize that each key, so each string of
16:12 - 16:20

bits here, converts into a very specific
sequence of operations. Okay, if you
16:20 - 16:27

change one bit, there will be an one more
addition one less addition and each key
16:27 - 16:33

has a very specific sequence. In this case
it's add, double, double, add, double,
16:33 - 16:44

add, double, double, and it keeps going.
So back to our bug. If you spaced out,
16:44 - 16:53

because we took a lot of crypto, I saw
you! But the two things you should take
16:53 - 16:59

away are: There's a giant number, it's the
private key, we want to multiply the giant
16:59 - 17:07

number by a point and we do that by doing
additions and doubles in an order that is
17:07 - 17:13

specified by the bits of the giant number.
That's what you need to have clear, the
17:13 - 17:20

only thing. So let's go back to see how we
use that to turn our very small carry bug
17:20 - 17:27

into a complete key recovery attack. First
thing we do is we bubble it up. That
17:27 - 17:31

function that breaks is called P256
subinternal. That's the function I showed
17:31 - 17:40

you earlier. P256 subinternal is used by P256
point add, which is what we spoke about
17:40 - 17:46

adding two points, the only important
operation. And adding two points, we've
17:46 - 17:52

seen, we use when we're multiplying, when
we're doing that scalar multiplication,
17:52 - 17:59

which is d times q, d times the point. And
how is scalar mult used? Here we finally
17:59 - 18:04

surfaced to a level that if you work with
cryptographic protocols you might start
18:04 - 18:10

recognizing pieces of. Scalar
multiplication is the operation that the
18:10 - 18:15

peer does in elliptic curve Diffie-
Hellman. There's a scalar, which is the
18:15 - 18:20

secret, which is the private key. There's
a point, which is the public key of the
18:20 - 18:26

other person, which might be the attacker.
So the scalar multiplication here,
18:26 - 18:34

speaking in InfoSec terms, has an attacker
supplied point and a secret scalar and the
18:34 - 18:41

result, the shared secret, is the session
key. For example: TLS, when you open a
18:41 - 18:47

connection with TLS and you're using
elliptic curve Diffie-Hellman, then you
18:47 - 18:53

will do this dance to agree on a session
key. If the session key is correct, the
18:53 - 18:59

connection will open and you will be able
to, I don't know, get send HTTP request.
18:59 - 19:06

If the bug is hit and the result is wrong,
so the result bubbles up into a wrong
19:06 - 19:14

shared secret, the session key is wrong.
And the session key is wrong you notice,
19:14 - 19:22

how do you notice? The connection breaks.
So this is what in cryptography we call an
19:22 - 19:25

oracle.
You have an oracle that you can call and
19:25 - 19:33

send a point, because that's your public
key, you're the attacker, and you send
19:33 - 19:40

that point and it gets multiplied by the
private key. And it gives you one bit of
19:40 - 19:46

information, did the bug trigger or did it
not? Most of the times it won't, because
19:46 - 19:54

remember, this is an extremely rare bug.
So you have an oracle that tells you: Bug
19:54 - 19:59

happen, bug didn't happen, based on the
point you sent. And let's assume that the
19:59 - 20:05

key stays the same, we'll talk about that.
Can you imagine how we use that to start
20:05 - 20:13

learning things about the key? Well, let's
say, that we can magically conjure a point
20:13 - 20:17

that in that sequence of operation, that's
why I told you the sequence of operation
20:17 - 20:25

was important, the bug happens very
specifically at that addition and we find
20:25 - 20:33

another point where the bug happens very
specifically at that double. If we know
20:33 - 20:39

already these bits of the key, and we
aren't sure about this one, what can we do
20:39 - 20:49

with these two points? We send them both,
one of them will break the TLS connection,
20:49 - 20:56

the other one will succeed. That means
that the one that broke triggered the bug,
20:56 - 21:02

the one that didn't break, didn't trigger
the bug. And we know which one corresponds
21:02 - 21:08

to which key. Because we magically made
special points that only break very
21:08 - 21:14

precisely at that point of the
computation. Okay, so we learned a bit of
21:14 - 21:19

the key. In cryptography as soon as you
learn one bit of the key, there's probably
21:19 - 21:28

a way all the way down. So we build what's
called an adaptive attack. Let's say we
21:28 - 21:34

have these bits, but we want to learn
these, too. We compute two points, one
21:34 - 21:40

that breaks, when the addition happens
exactly at that point in the double and
21:40 - 21:47

add a procedure, and one that triggers
only when the add doesn't happen at the
21:47 - 21:53

specific point in the double and add
sequence. We send them both, only one of
21:53 - 22:01

them breaks the TLS connection, well, then
we know a bit! We go back to our magic
22:01 - 22:06

point generator and we generate two new
points.
22:06 - 22:11

This time we don't look for things that
break here, we look for things that break
22:11 - 22:17

here. We make two points, we send them
both. One of them breaks the connection,
22:17 - 22:21

the other doesn't break the connection. We
learned one more bit of the key. We go
22:21 - 22:27

back, we make two points, we send them
both: One breaks the connection, one
22:27 - 22:31

doesn't. We keep going like that, once for
each bit. Every time we go back and we
22:31 - 22:36

adapt to what we learned so far. That's
why it's called an adaptive attack, we
22:36 - 22:41

can't pre-compute all these points. We
have to come up with them while we learn
22:41 - 22:48

the key. And the beautiful thing about
adaptive attacks is that they look exactly
22:48 - 22:54

like Hollywood, it's beautiful! Because
you see them flipping and going through
22:54 - 22:59

values, getting it right and moving to the
next one, which you all told it was fake,
22:59 - 23:16

it was not! Everything else is fake. Okay,
so, this attack we came up with that, we
23:16 - 23:22

told we had something extremely novel, we
went to the literature and everyone that
23:22 - 23:28

had picked an academic career in the
audience knows exactly what happened: We
23:28 - 23:34

found a paper that was doing exactly this.
However, you know, it was a little
23:34 - 23:44

different. It was still P256 and it was
still ECDH and, hmm.. okay it's really
23:44 - 23:51

similar. But it's an attack that depends a
lot on the implementation details, how you
23:51 - 23:56

pull it off. You can't suddenly just
repurpose the code, but the idea so far:
23:56 - 24:00

an adaptive attack where you send two
points and you check which one breaks is
24:00 - 24:06

the same as a BBB paper from a few years
ago when it worked against open SSL
24:06 - 24:14

instead. Instead of against this bug in
the Go standard library. So instead from
24:14 - 24:21

now on we're going to talk about how
exactly we implemented this against Go,
24:21 - 24:26

because this changes a lot based on the
implementation details of the library
24:26 - 24:31

you're working with. So this was the
general idea of how the attack works. All
24:31 - 24:36

that: You find points that break at the
right time, you send them both, and that
24:36 - 24:40

way you learn a bit of the key,
except I lied to you.
24:40 - 24:46

I lied to you, because I lied to you on a
lot of things: The first one is that Go
24:46 - 24:52

doesn't do double and add one bit at a
time, it does it five bits at a time! It's
24:52 - 24:58

called Booth multiplication, it took us
forever to figure it out. It's an 1980s
24:58 - 25:14

paper. Instead of adding one point or zero
points and then doubling, it adds between
25:14 - 25:20

minus 16 and plus 16 points and then
doubles five times moving down. It just
25:20 - 25:27

does it in blocks of five. So it splits up
the key and then looks at each block of
25:27 - 25:33

bits, picks a value from a pre-computed
table, which is just all the values from
25:33 - 25:41

one times the point to 16 times the point
and in the loop it doubles five times,
25:41 - 25:48

because it moved five bits down. And then
it chooses, which point between zero and
25:48 - 25:56

16 to use from the table and it adds that
to the rolling result, instead of adding 1
25:56 - 26:02

or 0. There's also a bit of constant time
arithmetic there, because the other thing
26:02 - 26:08

I lied to you on is that there's no such
thing as at 0 point. It's an imaginary
26:08 - 26:13

point that we add to make the math work,
but when you try to tell the code to
26:13 - 26:17

actually add the zero point it's like
asking you to divide by 0. It just won't
26:17 - 26:26

do it! But you know how you add 0, right?
You do nothing! So there's some constant
26:26 - 26:31

time code here that looks at it and if
it's 0, it does nothing. If it's not 0, it
26:31 - 26:40

adds the value.
So the first thing we had to do to adapt
26:40 - 26:47

to this format is that we worked in limbs.
When you hear me say limb, it just means
26:47 - 26:54

that we will look at each five bit block
on its own as it is zero to sixteen and a
26:54 - 27:01

sign value. That's much easier, because
it's actually kind of weird how the five
27:01 - 27:06

bits are extracted and I don't want to
bore you with it. So let's just consider
27:06 - 27:12

that we look at each group of five bits
converted into a value from zero to
27:12 - 27:19

sixteen and a one and a sign or plus minus
sixteen. So when you hear me talk about
27:19 - 27:25

limbs you just know that it means the five
bit value from the key. This is still the
27:25 - 27:35

giant d key that's the multiplier. So how
does the attack change? Not that much!
27:35 - 27:39

Instead of attacking one bit at a time,
you know two points - one that breaks for
27:39 - 27:45

zero, one that breaks for one - we attack
one limb at a time, one that breaks for
27:45 - 27:49

one, one that breaks for two, one that
breaks for three, sixteen, minus one,
27:49 - 27:58

minus two, minus 16. So, to recover five
bits of the key, we'll need on average
27:58 - 28:04

half the space: seventeen points, which is
a little less efficient than the bit-by-
28:04 - 28:16

bit one, because that would be five points
for five bits. So, however, how the attack
28:16 - 28:23

triggers is the same: We're looking for a
bug that happens in the five doubles at
28:23 - 28:34

the very right time or that happens at the
addition at the very right time. Now
28:34 - 28:39

that's still the high level of how we're
going to do it, but in practice I didn't
28:39 - 28:44

tell you how we're going to magically
generate these magic points that break
28:44 - 28:51

just at the right time. And I didn't tell
you because it's fuzzing. There is there's
28:51 - 28:59

no specific way to generate them
algebraically, so instead we just hooked
28:59 - 29:04

the assembly code with something that
would just set a boolean, you know, true,
29:04 - 29:09

false, if the conditions for the bug are
met. And then we run through a lot of
29:09 - 29:15

points. And if for each point we run it
through the limbs we already know,
29:15 - 29:20

remember this is an adaptive attack, so
we want to learn the next limb.
29:20 - 29:24

We learned a few limbs, we want to learn
the next one. We run through the ones we
29:24 - 29:31

know and then we try all the possible
values for ones that we don't know. If one
29:31 - 29:37

of them and only one of them breaks,
that's a useful point. Because if we send
29:37 - 29:46

that point and it breaks, we know exactly
what the value of the next limb is. Okay,
29:46 - 29:54

so this is going even more low-level. If
you're interested in optimizations, we had
29:54 - 29:58

to run through a lot of candidate points
and for each point we needed to know the d
29:58 - 30:03

value. Because we can find a magic point,
but if we don't know the private key, we
30:03 - 30:10

don't know if the entire protocol works
and our oracle doesn't work anymore. So to
30:10 - 30:15

work with that instead of multiplying a
new random private key every time we just
30:15 - 30:24

add that one to the private key and added
the point to the public key, this is just
30:24 - 30:33

an optimization. We called into the
various assembly of the standard library.
30:33 - 30:38

Don't do this, but it's beautiful. You can
go call all the unexported functions in
30:38 - 30:45

the standard library. I will never approve
it on code review, but I love it. And then
30:45 - 30:51

there's some post-processing to make sure
that the bug is actually there, because
30:51 - 30:56

sometimes there are false positives. So we
just run it through the broken code, the
30:56 - 31:00

fixed code, is the result the same?
Well, false positive, is it
31:00 - 31:11

different, good. Okay, so we have a way to
move through the attack. The only thing we
31:11 - 31:15

don't have yet is: How we figure out the
first limb. The first one, the most
31:15 - 31:19

important, the most significant one, when
we still don't know anything about the
31:19 - 31:27

key. The problem with this one is: It's
like this, so let's pick three, it's an
31:27 - 31:33

example okay, let's pretend that the limb
is three. So we do our usual thing, we do
31:33 - 31:40

our fuzzing and we find something that
breaks at the fifth doubling and we send
31:40 - 31:47

it and it breaks. It means that the first
limb is three, right? Wrong. Sadly, it
31:47 - 31:55

might mean that the limb is also 6 or 12.
Because how six is selected for example is
31:55 - 32:02

that three is taken, it's doubled and
saved in the precomputation table as six,
32:02 - 32:06

then it's taken out of the table, doubled
five times, but what happens after you
32:06 - 32:13

double six four times? What's the value?
The exact same as doubling three five
32:13 - 32:19

times, and the sequence is the exact same!
So we don't know, which one it is, because
32:19 - 32:24

we only know that that's the sequence but
that doesn't tell us anything. And the
32:24 - 32:30

same happens for 12, 12 is nothing else
than 3 double double. So if we double it
32:30 - 32:37

five times, at the third double it breaks
and we only know if it breaks or not. So
32:37 - 32:43

we can't move, so instead what we do is
that we find three points: one that breaks
32:43 - 32:48

after doubling three five times, one that
breaks doubling it six times, and one that
32:48 - 32:54

breaks after doubling it seven times. We
send them all and we look at which ones
32:54 - 33:01

break. Only this one? It's a three! The
first and the second but not the third
33:01 - 33:08

point? It's a six! All three? It's a 12!
This took me forever to wrap my head
33:08 - 33:17

around, this is like pure shaman insight.
Okay now I can feel that you're getting
33:17 - 33:26

bit tired, this is intensive, it's a lot
of math, so let's go for a change of pace
33:26 - 33:33

and let's talk about kangaroos instead.
I'm gonna tell you something that I
33:33 - 33:41

learned from a cryptographic paper, I
swear. And it's about how kangaroos jump.
33:41 - 33:49

Apparently kangaroos jump depending on how
the terrain on which they are jumping from
33:49 - 33:54

is. Depending on that terrain, if you put
two kangaroos on the exact same spot, they
33:54 - 34:00

will jump the same distance and
approximately the same direction. I don't
34:00 - 34:05

know, if this is true, but Pollard said so
in a paper and I am NOT arguing with
34:05 - 34:14

Pollard. So now, why is this useful? Well,
this makes for an extremely cool way to
34:14 - 34:22

catch kangaroos! I mean, did you expect
some math or crypto? I know, we're
34:22 - 34:28

catching kangaroos here! So you take a
kangaroo that you have on hand, because
34:28 - 34:34

you all have a kangaroo on hand. And you
put a GPS tracker on it and you let it
34:34 - 34:43

loose. This kangaroo jumps and it roams it
enjoys a brief stint of freedom and it
34:43 - 34:48

runs and at some point you go pick it up
and because you know where it is and you
34:48 - 34:55

place a trap there exactly where you find
it. What happens to the kangaroo that is
34:55 - 35:05

just passing by? If it steps at any point
on one of the points, where the other
35:05 - 35:11

kangaroo jumped from there on it will
follow the same path. Because each jump
35:11 - 35:21

depends on the ground. So this way if the
wild kangaroo lands on any of the prints
35:21 - 35:26

of the previous kangaroo you're catching
it because eventually it will end up in
35:26 - 35:31

the trap. Okay, this had nothing to do
with the talk, I just wanted to share
35:31 - 35:38

this!
Now, okay, so here is how this converts to
35:38 - 35:45

crypto. We can make elliptic curve points
jump like kangaroos, we just have to make
35:45 - 35:54

the jump deterministic based on the input,
based on the starting point. For example
35:54 - 36:00

we can take a hash, any hash, you design
the hash. Apparently that's popular in
36:00 - 36:02

cryptocurrencies to design your own hash
..
36:02 - 36:13

laughter
And you hash a point to another point and
36:13 - 36:20

when you want to do a jump you take the
point and you add it to its own hash, so
36:20 - 36:28

Q_N+1 depends only on QN, and this is just
like the kangaroo jump. How do you use
36:28 - 36:36

this for what we're doing? We want to
recover d, right? We want to recover the
36:36 - 36:44

multiplier, the discrete log it's often
called, of the public key. How we work
36:44 - 36:51

with this is that we take a tame kangaroo,
a point that we know the d off, and we
36:51 - 36:59

make it jump a lot. It keeps jumping and
we remember what the value is at the end.
36:59 - 37:03

We take that value at the end and we save
it. No need to keep all the points in
37:03 - 37:10

between, so we don't need some giant
memory construction and then we take the
37:10 - 37:17

point that we don't know the d of and we
make it jump a lot. What happens is that
37:17 - 37:23

it has much higher probability to
intersect one of the various prints of the
37:23 - 37:29

previous point. When it does, it will
eventually end up in our trap, it will end
37:29 - 37:37

up having the same value as the one we
calculated earlier. When that happens, we
37:37 - 37:44

know how far the wild point traveled,
because we were the ones making it jump.
37:44 - 37:49

So we can just walk backwards to the
starting point. It turns out that this is
37:49 - 37:56

extremely efficient to find the d value
when you know the interval of the d value.
37:56 - 38:02

The intuition there is that if the
kangaroo starts in a small area: You know
38:02 - 38:07

when it's been too much time that it
probably travelled past your trap. So you
38:07 - 38:15

have to rewind time, which in crypto you
can, and try again, because it went past
38:15 - 38:18

your trap. So the smaller the interval,
the more precise you can be, the more
38:18 - 38:26

efficient attack. This is called the
Pollard-kangaroo attack. It's described in
38:26 - 38:31

original paper from the 1980s, it was
described on Diffie-Hellman first, but it
38:31 - 38:35

works on any group, so it works on
elliptic curves. And the elliptic curve
38:35 - 38:44

version is then improved by a few papers
later and there is a chapter in the
38:44 - 38:50

elliptic and hyperelliptic handbook
that describes it.
38:50 - 38:56

So we have it all, we have how to start,
we have how to run through the attack and
38:56 - 39:04

we have a how to end. So now let's get
concrete, let's pick a target, as I said
39:04 - 39:16

this attack works against JWT. JWT made
... a lot of questionable choices. One of
39:16 - 39:21

them is to include as one of the public
key algorithms elliptic curve Diffie-
39:21 - 39:25

Hellman but not the version of Diffie-
Hellman you and I are familiar with. The
39:25 - 39:31

one where we both generate a new private
key every time, which makes this attack
39:31 - 39:35

impossible. Remember that this is an
adaptive attack. We need to query the
39:35 - 39:41

orcale for the same private key over and
over and over again. Instead there's a
39:41 - 39:48

variant called ephemeral static Diffie-
Hellman, where one of the two sides is
39:48 - 39:55

always the same. This is sometimes done as
an optimization, don't do that. OpenSSL
39:55 - 40:02

was doing that and it stopped doing that
after a bunch of attacks came from that.
40:02 - 40:07

So in TLS that usually doesn't happen and
the Go TLS stack thankfully never did
40:07 - 40:14

that, so the attack doesn't work against
TLS. But JWT is encoded it straight into
40:14 - 40:21

the standard, because if your public key
is fixed, so is your private key, always
40:21 - 40:31

the same. So if we have a service that
accepts things encrypted with a ECDHES
40:31 - 40:37

algorithm, we can use this attack. For
example against the popular implementation
40:37 - 40:46

Go-Jose, it's not Go-Jose's fault and Go
1.8.1.1, the latest vulnerable version,
40:46 - 40:53

and we can just check if the service can
decrypt what we're sending it. It can,
40:53 - 40:58

because it throws HTTP error when it
can't, because of different timings. This
40:58 - 41:03

changes in any case, but what you need is
the Oracle that tells you: did it work did
41:03 - 41:08

it not work? Did the bug trigger, so are
you right about your prediction of the
41:08 - 41:13

limb or are you not?
Then of course we need to do a lot of
41:13 - 41:22

work. If you don't have the resources of a
big corporation of where to spin up
41:22 - 41:28

things, you can just use EC2 spot
instances. How we architected that, is
41:28 - 41:34

that there will be a small piece of code
that do nothing intensive on your laptop
41:34 - 41:44

or anything. It would accept HTTP requests
from the workers. The beautiful thing
41:44 - 41:48

about this infrastructure is that you can
horizontally scale the workers, spin up as
41:48 - 41:59

many as you want on node.js platforms,
because the only thing that they need to
41:59 - 42:04

be able to do: they don't need to have
ports open, you don't have to worry about
42:04 - 42:08

NAT, you can even run it on your botnet,
because the only thing they have to do is
42:08 - 42:14

connect back and ask for work and then
after 30 cycles or when they find the
42:14 - 42:18

point, connect back and say I found
something, I didn't find anything, give me
42:18 - 42:26

some more work and send the result. This
was also useful, because if you get the
42:26 - 42:31

worker code wrong or if you want to change
the deployment, you can just redeploy the
42:31 - 42:37

workers without losing state on the
dispatcher. Because the dispatcher just
42:37 - 42:45

keeps running and it will just wait for
workers to come and ask. Specifically we
42:45 - 42:52

built this just with the small script that
you can start an EC2 machine with, because
42:52 - 42:59

we didn't even need to make a custom
image. So a few figures, a few numbers.
42:59 - 43:05

Each Key has 52 limbs, it will take a bit
less than that, because we have kangaroos.
43:05 - 43:14

But let's say approximately 52. Each limb
is 16 points on average. It would be 17,
43:14 - 43:19

but two of the points are faster to find,
so let's say 16. Each point takes
43:19 - 43:31

approximately 2 to the 26 candidate
points, before we find one that triggers
43:31 - 43:41

the bug at the right point. Since we need
16 candidate points and each takes 2 to
43:41 - 43:50

the 26 candidate points, that takes
approximately 85 CPU hours. That's like
43:50 - 43:58

one CPU running for an hour, 1 core. Turns
out that you can get 85 CPU hours from
43:58 - 44:04

spot instances for about a dollar and a
quarter, which makes the total cost of the
44:04 - 44:11

attack something like 60 bucks.
Which was a relief, because I had done the
44:11 - 44:17

math wrong first and the it came out as at
1000 and I run the demo tonight and I
44:17 - 44:26

didn't check the bill, yeah.. Anyway, it's
cheap. Now, I am not brave enough to run
44:26 - 44:32

the attack live, because, yes, it's a nice
infrastructure, but no, I don't trust it
44:32 - 44:38

that much. But also, because it takes a
while. If you don't want to spin up
44:38 - 44:47

infinite number of EC2 machines, you have
to accept that it would take about, I
44:47 - 44:54

think, our attack run in 12 hours. So
we're gonna look at a sped-up version of a
44:54 - 44:58

one-hour video in the next 45 minutes, you
have time, right?
44:58 - 45:03

laughter
No, it's a couple of minutes. So this is
45:03 - 45:10

the UI. It shouldn't be too confusing and
if anyone works at Hollywood and wants to
45:10 - 45:17

license it, we can talk. What you're
seeing is that the red values are the ones
45:17 - 45:23

that we found a point for from the workers
and we submitted. And when we submitted
45:23 - 45:29

that it resulted not to be the right limb
and the green ones are the ones that
45:29 - 45:36

instead broke, so they're the right limb.
Remember that here the target is a JWT
45:36 - 45:43

receiving application. And then you can
see the key slowly flipping from the
45:43 - 45:52

bottom and it's exactly like Hollywood, I
love that. So you can see the limbs
45:52 - 45:57

filling up as we find them and that
approximately there's 30 bits, so 2 to the
45:57 - 46:07

30 rounds, so 30 candidates work for each
round for each limb that we find. It
46:07 - 46:17

obviously depends on luck and yes the the
thing will probably keep running for a
46:17 - 46:24

little while, but this is already at limb
nine, it has to get to 52 and you don't
46:24 - 46:35

have that patience. So this was the attack
the code will be open-source soon. Leave
46:35 - 46:40

the limbs you lost, they belong to us now.
And: any questions?
46:40 - 46:56

applause
Herald: Valsorda, thank you very much for
46:56 - 47:02

a lovely talk and the kangaroos. So, we
have a question from the signal angel, go
47:02 - 47:05

ahead, please.
Signal Angel: Actually the internet wants
47:05 - 47:10

to know: Did you compare this bug to
implementation in other libraries.
47:10 - 47:15

Filippo: So I guess that means, if I
looked for similar bugs in other
47:15 - 47:21

implementations. We did not, because each
implementation is a bit different. Hanno
47:21 - 47:28

works on a lot of fuzzing of bigint
implementations and at one point he asked
47:28 - 47:33

me, like on Twitter just today, if I tried
fuzzing the Go implementation for example.
47:33 - 47:41

And sadly this is constant time code that
is specific to P256, so the answer is,
47:41 - 47:46

there's a lot of them and the bug can be
small and anywhere. It's not like you will
47:46 - 47:53

be looking for another bug in P256
subtraction, it can be anywhere in the
47:53 - 47:57

underlying math and we can turn that into
the same attack. So, no, we didn't look
47:57 - 48:05

for this specific one, but I think that
four CVEs in 2017 on open SSL have
48:05 - 48:12

descriptions that are very similar, but
they're about finite field Diffie-Hellman,
48:12 - 48:22

like normal DH. If you look for something
that says about it's barely doable,
48:22 - 48:28

because all the computation can be done
offline. That's this kind of attacks on
48:28 - 48:33

open SSL. Next?
Herald: Are there any other questions from
48:33 - 48:39

the Signal Angel? So please line up at the
microphone. Microphone one please.
48:39 - 48:45

Mic1: So why can't you determine the
points algebraically?
48:45 - 48:52

Filippo: Laziness? No, so it's entirely
assembly and there's a lot of points where
48:52 - 49:01

the value is then thrown out or it might
get corrected by .. it's essentially we
49:01 - 49:08

didn't see a clear path to this, and it's
$65 on ec2, so it doesn't really change
49:08 - 49:15

the feasibility to just fuzz them, so we
just went for the fastest path to the
49:15 - 49:19

entrance.
Herald: Are there any other questions?
49:19 - 49:22

Filippo: No one is asking about kangaroos,
people, I mean..
49:22 - 49:26

Herald: Yes, ask about kangaroos, lovely.
have you been to Australia?
49:26 - 49:29

Filippo: I haven't.
Herald: Okay, you have to.
49:29 - 49:31

Filippo: Yep, definitely.
Herald: I think, there aren't any other
49:31 - 49:37

questions. So Filippo Valsorda, big round
of applause for you. Thank you!
49:37 - 49:40

applause
49:40 - 49:46

34c3 outro
49:46 - 50:02

subtitles created by c3subtitles.de
in the year 2019. Join, and help us!

Title:: 34C3 - Squeezing a key through a carry bit
Description:: more » « less
Video Language:: English
Duration:: 50:02

	C3Subtitles edited English subtitles for 34C3 - Squeezing a key through a carry bit
	C3Subtitles edited English subtitles for 34C3 - Squeezing a key through a carry bit
	Bar Sch edited English subtitles for 34C3 - Squeezing a key through a carry bit
	Bar Sch edited English subtitles for 34C3 - Squeezing a key through a carry bit
	Bar Sch edited English subtitles for 34C3 - Squeezing a key through a carry bit
	harryfuchs edited English subtitles for 34C3 - Squeezing a key through a carry bit
	apo edited English subtitles for 34C3 - Squeezing a key through a carry bit
	Maximilian Marx edited English subtitles for 34C3 - Squeezing a key through a carry bit

Show all

English subtitles

Revisions

Revision 9 Edited

C3Subtitles

34C3 - Squeezing a key through a carry bit

Revisions

Our website uses cookies

Operating cookies (Required)