< Return to Video

34C3 - Squeezing a key through a carry bit

  • 0:00 - 0:16
    34c3 preroll music
    Herald Angel: Today we have our speaker
  • 0:16 - 0:22
    Filippo Valsorda. He is a crypto-engineer
    and he is specialized in Go. He is some
  • 0:22 - 0:29
    kind of Go wizard so to say and used to
    work at CloudFlare. He actually shows now
  • 0:29 - 0:35
    today in the talk how he made an attack to
    exploit a bug in the implementation of the
  • 0:35 - 0:42
    elliptic curve P256 in Go, that's why he's
    a Go wizard. The reason is actually due to
  • 0:42 - 0:48
    a misplaced bit on the assembler level,
    just due to the curves implementation. The
  • 0:48 - 0:54
    result is that in certain cases you can
    retrieve the private key in the elliptic
  • 0:54 - 0:58
    curve Diffie-Hellman encryption scheme.
    Please welcome Filippo Valsorda to his
  • 0:58 - 1:03
    talk, give him a round of applause. Thank
    you very much.
  • 1:03 - 1:06
    Filippo Valsorda: Thank you.
    applause
  • 1:06 - 1:11
    Thanks. I love the term crypto-golfer. Tony
    came up with it and I just want business
  • 1:11 - 1:17
    cards with that on it. Anyway, okay, I'm
    Filippo. As you heard I worked on Internet
  • 1:17 - 1:22
    infrastructure, I work on Go, I work on
    cryptography. But this is a collaboration
  • 1:22 - 1:28
    with Sean Devlin, you might know him as
    one of the authors of the Cryptopals or
  • 1:28 - 1:35
    the matasano crypto challenges. A few
    months ago earlier this year a bug.. a
  • 1:35 - 1:46
    CloudFlare engineer was scanning CT logs.
    CT logs are big logs of TLS certificates
  • 1:46 - 1:52
    and he was checking the ECDSA signatures
    on these certificates and one of the
  • 1:52 - 1:59
    signatures was not verifying. Which is
    weird, because CT logs check the
  • 1:59 - 2:04
    certificates before adding them to the
    log. It turned out that the bug was in the
  • 2:04 - 2:09
    code that CloudFlare was using to verify
    the certificates. And more specifically it
  • 2:09 - 2:18
    was in the Go standard library. It was in the
    implementation of the NIST P256 curve,
  • 2:18 - 2:24
    which is a popular, very hard to implement,
    elliptic curve that is used for example
  • 2:24 - 2:32
    for ECDSA TLS certificates. This curve has
    an assembly implementation in the Go
  • 2:32 - 2:41
    standard library, of course, to squeeze
    every bit of performance out of it, and
  • 2:41 - 2:47
    it's specifically optimized for x86-64
    architecture. So that's where the bug was.
  • 2:47 - 2:53
    There was a carry propagation bug. It was
    reported upstream and everyone agreed that
  • 2:53 - 3:00
    this was not obviously exploitable.
    But Adam Langley also said that it would
  • 3:00 - 3:08
    be a cool paper though. And well I mean,
    how do you pass on that? So Sean and I met
  • 3:08 - 3:14
    in Paris and spend a weekend and then some
    eating Thai food and staring at this
  • 3:14 - 3:20
    assembly code to try to understand what is
    it doing. And one month later we have a
  • 3:20 - 3:31
    CVE and two Go security releases. We found
    a way to go from this single carry bit bug
  • 3:31 - 3:36
    to a full key recovery against protocols
    that use elliptic curve Diffie-Hellman
  • 3:36 - 3:43
    within ephemeral static way. If this means
    nothing to you, it's okay, I will try to
  • 3:43 - 3:51
    go over it. One of these protocols for
    example is JWT. Jozy, or however you want
  • 3:51 - 3:58
    to call it - it has 15 different acronyms.
    And it's often implemented in Go, so this
  • 3:58 - 4:06
    was exploitable against real-world
    services. So, let's start by looking at
  • 4:06 - 4:12
    the code with the bug. Let's take it from
    the beginning. This is the short assembly
  • 4:12 - 4:20
    function that does subtraction. When you
    do elliptic curve math, you usually work
  • 4:20 - 4:28
    on a field, you work on math modulo some
    prime p. So if it was subtraction you do a
  • 4:28 - 4:36
    minus b modulo p and this is what this
    assembly snippet does. It sets a to a
  • 4:36 - 4:44
    minus b. Of course these are numbers way
    too big to fit into a single register. So
  • 4:44 - 4:50
    how do you do math, when you can't fit
    into a single register? You do multi-
  • 4:50 - 4:56
    precision math. And the thing is, you all
    know how to do multi-precision math, you
  • 4:56 - 5:01
    learned that in elementary school. When
    you would write numbers in columns and you
  • 5:01 - 5:07
    do math with register size of 10, because
    for every digit you would subtract two
  • 5:07 - 5:13
    digits and carry the minus one if it went
    negative and then subtract with the carry
  • 5:13 - 5:18
    and then carry the minus one. That's
    exactly what this is doing, but it's doing
  • 5:18 - 5:23
    it instead with digits with 64-bit registers,
    because this is a 64-bit architecture
  • 5:23 - 5:28
    . So we look at the first
    lines: the first lines is nothing else
  • 5:28 - 5:33
    than subtract, subtract, subtract with
    carry. And then the carry is finally
  • 5:33 - 5:41
    stored in that mul0 accumulator. But then
    what do we do if it goes negative?
  • 5:41 - 5:48
    We said that this is modulo p so we can't
    just let it wrap around modulo 2^256,
  • 5:48 - 5:53
    because that's how wide, you know, 4
    registers are. But since we're doing
  • 5:53 - 5:58
    arithmetic modulo a number, we can just
    add that number and the result is the
  • 5:58 - 6:04
    same, right? Adding p modulo p is a no-op
    in the result. So that's what this code
  • 6:04 - 6:11
    does: It does a equal a minus b, it takes
    a copy of the result and it adds p. And
  • 6:11 - 6:18
    then in constant time, it uses that final
    carry to check, if it went negative or not
  • 6:18 - 6:23
    to decide in constant time, which one to
    use. The one with b , so a minus b plus p
  • 6:23 - 6:30
    or a minus B - and that's subtraction.
    Straightforward enough. Now the problem
  • 6:30 - 6:36
    with this code is that if you look closely
    you can see something that might be weird
  • 6:36 - 6:41
    if you're not familiar with assembly. It
    still trips me over. To use a condition
  • 6:41 - 6:45
    like these constant time conditionals
    there. Which have to be constant time,
  • 6:45 - 6:50
    because you don't want to leak timings
    based on the size of number. You have to
  • 6:50 - 6:57
    first operate on mul0, so that you set the
    flags, the zero flag. So normally what you
  • 6:57 - 7:07
    do is either add 0 and 1 to your mul0
    to check if to set the flags. But that's
  • 7:07 - 7:13
    not that's not an add, that's an add with
    carry. It means that it's adding zero to
  • 7:13 - 7:21
    mul0 and the carry bit from this addition
    here, which has nothing to do with it,
  • 7:21 - 7:26
    it's not supposed to be there. Like this
    is adding p, but it doesn't matter, if it
  • 7:26 - 7:32
    overflows, because if it does, it wasn't
    going to be picked here anyway. So it's
  • 7:32 - 7:36
    adding another bit into the operation that
    wasn't supposed to be there. So it's
  • 7:36 - 7:41
    flipping the result, but then also the
    conditions here are flipped, so
  • 7:41 - 7:51
    essentially it evens itself out. Except:
    when that carry bit happens not to be set,
  • 7:51 - 7:55
    because the number a minus b is small
    enough, that if you add P, you don't wrap
  • 7:55 - 8:01
    around. That happens once every 2^32
    times, which is why it's so rare that
  • 8:01 - 8:07
    nobody had noticed so far.
    So the code went in and nobody noticed for
  • 8:07 - 8:10
    a long time, until CloudFlare first
    started scanning CT logs and had this
  • 8:10 - 8:16
    weird one signature that wasn't verifying
    and they were lucky enough to actually
  • 8:16 - 8:21
    have it in the logs because you know if it
    happens transiently you might just think..
  • 8:21 - 8:28
    I don't know, the connection broke, right?
    So this is what we call a carry
  • 8:28 - 8:33
    propagation bug. Carry propagation is the
    activity of making sure that you carry the
  • 8:33 - 8:43
    1. And here it's a bit weird, we didn't
    forget to carry it, but we carried a bit
  • 8:43 - 8:49
    that we weren't supposed to carry into a
    result. Most of the times that goes well,
  • 8:49 - 8:55
    sometimes that breaks. When that breaks:
    wrong result! Wrong result, wrong point
  • 8:55 - 9:03
    computation and wrong point computation,
    so what? Like how does forgetting to carry
  • 9:03 - 9:09
    the one lead to a full key recovery? This
    is not one of those bugs, where like
  • 9:09 - 9:13
    buffer overflow you know what you're
    trying to do even if you might have to
  • 9:13 - 9:19
    jump through so many hoops, because there
    might be all these protections, you know
  • 9:19 - 9:23
    what your capabilities are, you can
    overwrite some memory. Here instead it's
  • 9:23 - 9:30
    not clear what your capability is. So
    today I'm going to try to explain that.
  • 9:30 - 9:36
    How we turn this very rare failed
    computation into a complete key recovery
  • 9:36 - 9:44
    attack. But first I apologize that I have
    to give a elliptic curve cryptography
  • 9:44 - 9:55
    crash course here at CCC. So we've seen
    the field is nothing else than operations
  • 9:55 - 10:03
    modulo p, then there are the points: the
    points are x and y, the coordinates. They
  • 10:03 - 10:08
    fit an equation, which we don't care
    about, they just fit an equation. They are
  • 10:08 - 10:13
    integers, so we can work with them, and we
    can use them to build a group. A group is
  • 10:13 - 10:21
    one of the core structures in modern
    cryptography. To build a group we need a
  • 10:21 - 10:27
    zero point, a generator point, and an
    addition over the group, over the
  • 10:27 - 10:30
    points. So we define an addition,
  • 10:30 - 10:35
    again, we don't care about how addition
    works. It's just: you take two points, you
  • 10:35 - 10:38
    add them together, you get a result.
    It has all the properties that you
  • 10:38 - 10:45
    remember from elementary school addition:
    it's commutative, it's associative. And
  • 10:45 - 10:50
    then you have multiplication: You don't
    actually define how to multiply a point,
  • 10:50 - 10:55
    but if I tell you that you have an
    addition operation and I want five times
  • 10:55 - 11:01
    this point, what do you do? You take the
    point and you add the point and you add
  • 11:01 - 11:07
    the point and you add the point,... so
    this is called scalar multiplication:
  • 11:07 - 11:12
    doing multiplication by adding repeatedly
    a certain point.
  • 11:12 - 11:19
    So now we have a group, which is given by
    multiplying the point, a generator point, a
  • 11:19 - 11:25
    certain number of times and we have a
    multiplication operation. So how do we
  • 11:25 - 11:31
    build cryptography out of it? This is all
    awfully abstract so far? So we build
  • 11:31 - 11:35
    elliptic curve Diffie-Hellman. If you're
    familiar with normal Diffie-Hellman, you
  • 11:35 - 11:41
    will see something snap at some point. The
    private key is a giant number, this is
  • 11:41 - 11:50
    important. The private key in ECDH is just
    a random giant 256 bit number. And then we
  • 11:50 - 11:56
    have a public key. A public key is that
    giant number multiplied, the scalar
  • 11:56 - 12:03
    multiplication we just talked about, by a
    generator point. If you know normal
  • 12:03 - 12:09
    Diffie-Hellman, this is like doing g to
    the a. If you don't, it's okay, this is
  • 12:09 - 12:17
    just multiplying private key by a point.
    And then, when I have my private key and
  • 12:17 - 12:23
    my public key, you send me your public
    key. We need to agree on a shared secret,
  • 12:23 - 12:30
    how we do that, is that I take your public
    key, which is a point. I take this and I
  • 12:30 - 12:36
    multiply it by my private key, here, so
    again: it's my giant number private key
  • 12:36 - 12:42
    multiplied by your point. That comes
    together: The two results are the same,
  • 12:42 - 12:49
    because if we do my private key times your
    private key times g it's the same as your
  • 12:49 - 12:55
    private key times my private key times g,
    because that's commutative. And we land on
  • 12:55 - 13:03
    a shared secret. And that's all we need to
    know about elliptic curve cartography to
  • 13:03 - 13:11
    exploit this. However, there's one thing
    that I glossed over: It's easy to multiply
  • 13:11 - 13:18
    by five, you add five times.
    But if I'm asking you to multiply by a 256
  • 13:18 - 13:27
    bit number, you can't add 2 to the 256
    times a point, right? So what do we do
  • 13:27 - 13:31
    there? Remember that here, what we're
    trying to do is the multiplication: The
  • 13:31 - 13:38
    private key times the public key, the
    point. We do something called double and
  • 13:38 - 13:44
    add: We take our private key and we string
    it out like bits. We start from the most
  • 13:44 - 13:50
    significant bit. This is little endian,
    because I have gotten this slide wrong the
  • 13:50 - 13:55
    first time. But, you know, you just claim
    that you meant it to be the opposite
  • 13:55 - 14:01
    endianness. Anyway, that's the most
    significant bit, the one that is two to
  • 14:01 - 14:08
    the 256, no, two to the 255. If you flip
    it, you're adding or removing two to the
  • 14:08 - 14:14
    255. So you start with 0, that's zed, 0,
    nothing, and you check the first bit, the
  • 14:14 - 14:21
    most important bit: Is that set, yes or
    no? Yes, so you add the point Q, which is
  • 14:21 - 14:26
    the point we're trying to multiply by this
    giant e. So we add the point and then we
  • 14:26 - 14:33
    move down one by doubling. Can you imagine
    how we double something? Remember we only
  • 14:33 - 14:40
    have addition. We add it to itself, of
    course. So we use addition to double the
  • 14:40 - 14:46
    point. And you might see, where we're
    going with this. We double every time we
  • 14:46 - 14:50
    slide down one bit.
    By the time we arrive at the end, how many
  • 14:50 - 14:57
    times did we double that first point that
    we added, because the first bit was one?
  • 14:57 - 15:04
    255 times! That bit was worth 2 to the
    255, so at the end that will have the
  • 15:04 - 15:11
    value it was supposed to have. And we keep
    going, we check if this bit is 1: Is it 1?
  • 15:11 - 15:16
    No. So we do nothing, we double again to
    move down one. We check if this bit is
  • 15:16 - 15:24
    one. This bit is 1: so we add the point.
    So we did add the point, double, double,
  • 15:24 - 15:31
    add the point, double, moving down one and
    we keep going like this. We keep going
  • 15:31 - 15:38
    like this, until we reach the least
    significant bit. At the least significant
  • 15:38 - 15:43
    bit, if it's one, we add the point, if
    it's not, we don't add the point. And at
  • 15:43 - 15:49
    the end we have the correct result and the
    result comes from this sequence of
  • 15:49 - 15:56
    operations which at most are twice 255
    operations, which is something that we can
  • 15:56 - 16:07
    do concretely. Now why did I explain this
    very specific algorithm to you. Because to
  • 16:07 - 16:12
    understand this attack, you have to
    recognize that each key, so each string of
  • 16:12 - 16:20
    bits here, converts into a very specific
    sequence of operations. Okay, if you
  • 16:20 - 16:27
    change one bit, there will be an one more
    addition one less addition and each key
  • 16:27 - 16:33
    has a very specific sequence. In this case
    it's add, double, double, add, double,
  • 16:33 - 16:44
    add, double, double, and it keeps going.
    So back to our bug. If you spaced out,
  • 16:44 - 16:53
    because we took a lot of crypto, I saw
    you! But the two things you should take
  • 16:53 - 16:59
    away are: There's a giant number, it's the
    private key, we want to multiply the giant
  • 16:59 - 17:07
    number by a point and we do that by doing
    additions and doubles in an order that is
  • 17:07 - 17:13
    specified by the bits of the giant number.
    That's what you need to have clear, the
  • 17:13 - 17:20
    only thing. So let's go back to see how we
    use that to turn our very small carry bug
  • 17:20 - 17:27
    into a complete key recovery attack. First
    thing we do is we bubble it up. That
  • 17:27 - 17:31
    function that breaks is called P256
    subinternal. That's the function I showed
  • 17:31 - 17:40
    you earlier. P256 subinternal is used by P256
    point add, which is what we spoke about
  • 17:40 - 17:46
    adding two points, the only important
    operation. And adding two points, we've
  • 17:46 - 17:52
    seen, we use when we're multiplying, when
    we're doing that scalar multiplication,
  • 17:52 - 17:59
    which is d times q, d times the point. And
    how is scalar mult used? Here we finally
  • 17:59 - 18:04
    surfaced to a level that if you work with
    cryptographic protocols you might start
  • 18:04 - 18:10
    recognizing pieces of. Scalar
    multiplication is the operation that the
  • 18:10 - 18:15
    peer does in elliptic curve Diffie-
    Hellman. There's a scalar, which is the
  • 18:15 - 18:20
    secret, which is the private key. There's
    a point, which is the public key of the
  • 18:20 - 18:26
    other person, which might be the attacker.
    So the scalar multiplication here,
  • 18:26 - 18:34
    speaking in InfoSec terms, has an attacker
    supplied point and a secret scalar and the
  • 18:34 - 18:41
    result, the shared secret, is the session
    key. For example: TLS, when you open a
  • 18:41 - 18:47
    connection with TLS and you're using
    elliptic curve Diffie-Hellman, then you
  • 18:47 - 18:53
    will do this dance to agree on a session
    key. If the session key is correct, the
  • 18:53 - 18:59
    connection will open and you will be able
    to, I don't know, get send HTTP request.
  • 18:59 - 19:06
    If the bug is hit and the result is wrong,
    so the result bubbles up into a wrong
  • 19:06 - 19:14
    shared secret, the session key is wrong.
    And the session key is wrong you notice,
  • 19:14 - 19:22
    how do you notice? The connection breaks.
    So this is what in cryptography we call an
  • 19:22 - 19:25
    oracle.
    You have an oracle that you can call and
  • 19:25 - 19:33
    send a point, because that's your public
    key, you're the attacker, and you send
  • 19:33 - 19:40
    that point and it gets multiplied by the
    private key. And it gives you one bit of
  • 19:40 - 19:46
    information, did the bug trigger or did it
    not? Most of the times it won't, because
  • 19:46 - 19:54
    remember, this is an extremely rare bug.
    So you have an oracle that tells you: Bug
  • 19:54 - 19:59
    happen, bug didn't happen, based on the
    point you sent. And let's assume that the
  • 19:59 - 20:05
    key stays the same, we'll talk about that.
    Can you imagine how we use that to start
  • 20:05 - 20:13
    learning things about the key? Well, let's
    say, that we can magically conjure a point
  • 20:13 - 20:17
    that in that sequence of operation, that's
    why I told you the sequence of operation
  • 20:17 - 20:25
    was important, the bug happens very
    specifically at that addition and we find
  • 20:25 - 20:33
    another point where the bug happens very
    specifically at that double. If we know
  • 20:33 - 20:39
    already these bits of the key, and we
    aren't sure about this one, what can we do
  • 20:39 - 20:49
    with these two points? We send them both,
    one of them will break the TLS connection,
  • 20:49 - 20:56
    the other one will succeed. That means
    that the one that broke triggered the bug,
  • 20:56 - 21:02
    the one that didn't break, didn't trigger
    the bug. And we know which one corresponds
  • 21:02 - 21:08
    to which key. Because we magically made
    special points that only break very
  • 21:08 - 21:14
    precisely at that point of the
    computation. Okay, so we learned a bit of
  • 21:14 - 21:19
    the key. In cryptography as soon as you
    learn one bit of the key, there's probably
  • 21:19 - 21:28
    a way all the way down. So we build what's
    called an adaptive attack. Let's say we
  • 21:28 - 21:34
    have these bits, but we want to learn
    these, too. We compute two points, one
  • 21:34 - 21:40
    that breaks, when the addition happens
    exactly at that point in the double and
  • 21:40 - 21:47
    add a procedure, and one that triggers
    only when the add doesn't happen at the
  • 21:47 - 21:53
    specific point in the double and add
    sequence. We send them both, only one of
  • 21:53 - 22:01
    them breaks the TLS connection, well, then
    we know a bit! We go back to our magic
  • 22:01 - 22:06
    point generator and we generate two new
    points.
  • 22:06 - 22:11
    This time we don't look for things that
    break here, we look for things that break
  • 22:11 - 22:17
    here. We make two points, we send them
    both. One of them breaks the connection,
  • 22:17 - 22:21
    the other doesn't break the connection. We
    learned one more bit of the key. We go
  • 22:21 - 22:27
    back, we make two points, we send them
    both: One breaks the connection, one
  • 22:27 - 22:31
    doesn't. We keep going like that, once for
    each bit. Every time we go back and we
  • 22:31 - 22:36
    adapt to what we learned so far. That's
    why it's called an adaptive attack, we
  • 22:36 - 22:41
    can't pre-compute all these points. We
    have to come up with them while we learn
  • 22:41 - 22:48
    the key. And the beautiful thing about
    adaptive attacks is that they look exactly
  • 22:48 - 22:54
    like Hollywood, it's beautiful! Because
    you see them flipping and going through
  • 22:54 - 22:59
    values, getting it right and moving to the
    next one, which you all told it was fake,
  • 22:59 - 23:16
    it was not! Everything else is fake. Okay,
    so, this attack we came up with that, we
  • 23:16 - 23:22
    told we had something extremely novel, we
    went to the literature and everyone that
  • 23:22 - 23:28
    had picked an academic career in the
    audience knows exactly what happened: We
  • 23:28 - 23:34
    found a paper that was doing exactly this.
    However, you know, it was a little
  • 23:34 - 23:44
    different. It was still P256 and it was
    still ECDH and, hmm.. okay it's really
  • 23:44 - 23:51
    similar. But it's an attack that depends a
    lot on the implementation details, how you
  • 23:51 - 23:56
    pull it off. You can't suddenly just
    repurpose the code, but the idea so far:
  • 23:56 - 24:00
    an adaptive attack where you send two
    points and you check which one breaks is
  • 24:00 - 24:06
    the same as a BBB paper from a few years
    ago when it worked against open SSL
  • 24:06 - 24:14
    instead. Instead of against this bug in
    the Go standard library. So instead from
  • 24:14 - 24:21
    now on we're going to talk about how
    exactly we implemented this against Go,
  • 24:21 - 24:26
    because this changes a lot based on the
    implementation details of the library
  • 24:26 - 24:31
    you're working with. So this was the
    general idea of how the attack works. All
  • 24:31 - 24:36
    that: You find points that break at the
    right time, you send them both, and that
  • 24:36 - 24:40
    way you learn a bit of the key,
    except I lied to you.
  • 24:40 - 24:46
    I lied to you, because I lied to you on a
    lot of things: The first one is that Go
  • 24:46 - 24:52
    doesn't do double and add one bit at a
    time, it does it five bits at a time! It's
  • 24:52 - 24:58
    called Booth multiplication, it took us
    forever to figure it out. It's an 1980s
  • 24:58 - 25:14
    paper. Instead of adding one point or zero
    points and then doubling, it adds between
  • 25:14 - 25:20
    minus 16 and plus 16 points and then
    doubles five times moving down. It just
  • 25:20 - 25:27
    does it in blocks of five. So it splits up
    the key and then looks at each block of
  • 25:27 - 25:33
    bits, picks a value from a pre-computed
    table, which is just all the values from
  • 25:33 - 25:41
    one times the point to 16 times the point
    and in the loop it doubles five times,
  • 25:41 - 25:48
    because it moved five bits down. And then
    it chooses, which point between zero and
  • 25:48 - 25:56
    16 to use from the table and it adds that
    to the rolling result, instead of adding 1
  • 25:56 - 26:02
    or 0. There's also a bit of constant time
    arithmetic there, because the other thing
  • 26:02 - 26:08
    I lied to you on is that there's no such
    thing as at 0 point. It's an imaginary
  • 26:08 - 26:13
    point that we add to make the math work,
    but when you try to tell the code to
  • 26:13 - 26:17
    actually add the zero point it's like
    asking you to divide by 0. It just won't
  • 26:17 - 26:26
    do it! But you know how you add 0, right?
    You do nothing! So there's some constant
  • 26:26 - 26:31
    time code here that looks at it and if
    it's 0, it does nothing. If it's not 0, it
  • 26:31 - 26:40
    adds the value.
    So the first thing we had to do to adapt
  • 26:40 - 26:47
    to this format is that we worked in limbs.
    When you hear me say limb, it just means
  • 26:47 - 26:54
    that we will look at each five bit block
    on its own as it is zero to sixteen and a
  • 26:54 - 27:01
    sign value. That's much easier, because
    it's actually kind of weird how the five
  • 27:01 - 27:06
    bits are extracted and I don't want to
    bore you with it. So let's just consider
  • 27:06 - 27:12
    that we look at each group of five bits
    converted into a value from zero to
  • 27:12 - 27:19
    sixteen and a one and a sign or plus minus
    sixteen. So when you hear me talk about
  • 27:19 - 27:25
    limbs you just know that it means the five
    bit value from the key. This is still the
  • 27:25 - 27:35
    giant d key that's the multiplier. So how
    does the attack change? Not that much!
  • 27:35 - 27:39
    Instead of attacking one bit at a time,
    you know two points - one that breaks for
  • 27:39 - 27:45
    zero, one that breaks for one - we attack
    one limb at a time, one that breaks for
  • 27:45 - 27:49
    one, one that breaks for two, one that
    breaks for three, sixteen, minus one,
  • 27:49 - 27:58
    minus two, minus 16. So, to recover five
    bits of the key, we'll need on average
  • 27:58 - 28:04
    half the space: seventeen points, which is
    a little less efficient than the bit-by-
  • 28:04 - 28:16
    bit one, because that would be five points
    for five bits. So, however, how the attack
  • 28:16 - 28:23
    triggers is the same: We're looking for a
    bug that happens in the five doubles at
  • 28:23 - 28:34
    the very right time or that happens at the
    addition at the very right time. Now
  • 28:34 - 28:39
    that's still the high level of how we're
    going to do it, but in practice I didn't
  • 28:39 - 28:44
    tell you how we're going to magically
    generate these magic points that break
  • 28:44 - 28:51
    just at the right time. And I didn't tell
    you because it's fuzzing. There is there's
  • 28:51 - 28:59
    no specific way to generate them
    algebraically, so instead we just hooked
  • 28:59 - 29:04
    the assembly code with something that
    would just set a boolean, you know, true,
  • 29:04 - 29:09
    false, if the conditions for the bug are
    met. And then we run through a lot of
  • 29:09 - 29:15
    points. And if for each point we run it
    through the limbs we already know,
  • 29:15 - 29:20
    remember this is an adaptive attack, so
    we want to learn the next limb.
  • 29:20 - 29:24
    We learned a few limbs, we want to learn
    the next one. We run through the ones we
  • 29:24 - 29:31
    know and then we try all the possible
    values for ones that we don't know. If one
  • 29:31 - 29:37
    of them and only one of them breaks,
    that's a useful point. Because if we send
  • 29:37 - 29:46
    that point and it breaks, we know exactly
    what the value of the next limb is. Okay,
  • 29:46 - 29:54
    so this is going even more low-level. If
    you're interested in optimizations, we had
  • 29:54 - 29:58
    to run through a lot of candidate points
    and for each point we needed to know the d
  • 29:58 - 30:03
    value. Because we can find a magic point,
    but if we don't know the private key, we
  • 30:03 - 30:10
    don't know if the entire protocol works
    and our oracle doesn't work anymore. So to
  • 30:10 - 30:15
    work with that instead of multiplying a
    new random private key every time we just
  • 30:15 - 30:24
    add that one to the private key and added
    the point to the public key, this is just
  • 30:24 - 30:33
    an optimization. We called into the
    various assembly of the standard library.
  • 30:33 - 30:38
    Don't do this, but it's beautiful. You can
    go call all the unexported functions in
  • 30:38 - 30:45
    the standard library. I will never approve
    it on code review, but I love it. And then
  • 30:45 - 30:51
    there's some post-processing to make sure
    that the bug is actually there, because
  • 30:51 - 30:56
    sometimes there are false positives. So we
    just run it through the broken code, the
  • 30:56 - 31:00
    fixed code, is the result the same?
    Well, false positive, is it
  • 31:00 - 31:11
    different, good. Okay, so we have a way to
    move through the attack. The only thing we
  • 31:11 - 31:15
    don't have yet is: How we figure out the
    first limb. The first one, the most
  • 31:15 - 31:19
    important, the most significant one, when
    we still don't know anything about the
  • 31:19 - 31:27
    key. The problem with this one is: It's
    like this, so let's pick three, it's an
  • 31:27 - 31:33
    example okay, let's pretend that the limb
    is three. So we do our usual thing, we do
  • 31:33 - 31:40
    our fuzzing and we find something that
    breaks at the fifth doubling and we send
  • 31:40 - 31:47
    it and it breaks. It means that the first
    limb is three, right? Wrong. Sadly, it
  • 31:47 - 31:55
    might mean that the limb is also 6 or 12.
    Because how six is selected for example is
  • 31:55 - 32:02
    that three is taken, it's doubled and
    saved in the precomputation table as six,
  • 32:02 - 32:06
    then it's taken out of the table, doubled
    five times, but what happens after you
  • 32:06 - 32:13
    double six four times? What's the value?
    The exact same as doubling three five
  • 32:13 - 32:19
    times, and the sequence is the exact same!
    So we don't know, which one it is, because
  • 32:19 - 32:24
    we only know that that's the sequence but
    that doesn't tell us anything. And the
  • 32:24 - 32:30
    same happens for 12, 12 is nothing else
    than 3 double double. So if we double it
  • 32:30 - 32:37
    five times, at the third double it breaks
    and we only know if it breaks or not. So
  • 32:37 - 32:43
    we can't move, so instead what we do is
    that we find three points: one that breaks
  • 32:43 - 32:48
    after doubling three five times, one that
    breaks doubling it six times, and one that
  • 32:48 - 32:54
    breaks after doubling it seven times. We
    send them all and we look at which ones
  • 32:54 - 33:01
    break. Only this one? It's a three! The
    first and the second but not the third
  • 33:01 - 33:08
    point? It's a six! All three? It's a 12!
    This took me forever to wrap my head
  • 33:08 - 33:17
    around, this is like pure shaman insight.
    Okay now I can feel that you're getting
  • 33:17 - 33:26
    bit tired, this is intensive, it's a lot
    of math, so let's go for a change of pace
  • 33:26 - 33:33
    and let's talk about kangaroos instead.
    I'm gonna tell you something that I
  • 33:33 - 33:41
    learned from a cryptographic paper, I
    swear. And it's about how kangaroos jump.
  • 33:41 - 33:49
    Apparently kangaroos jump depending on how
    the terrain on which they are jumping from
  • 33:49 - 33:54
    is. Depending on that terrain, if you put
    two kangaroos on the exact same spot, they
  • 33:54 - 34:00
    will jump the same distance and
    approximately the same direction. I don't
  • 34:00 - 34:05
    know, if this is true, but Pollard said so
    in a paper and I am NOT arguing with
  • 34:05 - 34:14
    Pollard. So now, why is this useful? Well,
    this makes for an extremely cool way to
  • 34:14 - 34:22
    catch kangaroos! I mean, did you expect
    some math or crypto? I know, we're
  • 34:22 - 34:28
    catching kangaroos here! So you take a
    kangaroo that you have on hand, because
  • 34:28 - 34:34
    you all have a kangaroo on hand. And you
    put a GPS tracker on it and you let it
  • 34:34 - 34:43
    loose. This kangaroo jumps and it roams it
    enjoys a brief stint of freedom and it
  • 34:43 - 34:48
    runs and at some point you go pick it up
    and because you know where it is and you
  • 34:48 - 34:55
    place a trap there exactly where you find
    it. What happens to the kangaroo that is
  • 34:55 - 35:05
    just passing by? If it steps at any point
    on one of the points, where the other
  • 35:05 - 35:11
    kangaroo jumped from there on it will
    follow the same path. Because each jump
  • 35:11 - 35:21
    depends on the ground. So this way if the
    wild kangaroo lands on any of the prints
  • 35:21 - 35:26
    of the previous kangaroo you're catching
    it because eventually it will end up in
  • 35:26 - 35:31
    the trap. Okay, this had nothing to do
    with the talk, I just wanted to share
  • 35:31 - 35:38
    this!
    Now, okay, so here is how this converts to
  • 35:38 - 35:45
    crypto. We can make elliptic curve points
    jump like kangaroos, we just have to make
  • 35:45 - 35:54
    the jump deterministic based on the input,
    based on the starting point. For example
  • 35:54 - 36:00
    we can take a hash, any hash, you design
    the hash. Apparently that's popular in
  • 36:00 - 36:02
    cryptocurrencies to design your own hash
    ..
  • 36:02 - 36:13
    laughter
    And you hash a point to another point and
  • 36:13 - 36:20
    when you want to do a jump you take the
    point and you add it to its own hash, so
  • 36:20 - 36:28
    Q_N+1 depends only on QN, and this is just
    like the kangaroo jump. How do you use
  • 36:28 - 36:36
    this for what we're doing? We want to
    recover d, right? We want to recover the
  • 36:36 - 36:44
    multiplier, the discrete log it's often
    called, of the public key. How we work
  • 36:44 - 36:51
    with this is that we take a tame kangaroo,
    a point that we know the d off, and we
  • 36:51 - 36:59
    make it jump a lot. It keeps jumping and
    we remember what the value is at the end.
  • 36:59 - 37:03
    We take that value at the end and we save
    it. No need to keep all the points in
  • 37:03 - 37:10
    between, so we don't need some giant
    memory construction and then we take the
  • 37:10 - 37:17
    point that we don't know the d of and we
    make it jump a lot. What happens is that
  • 37:17 - 37:23
    it has much higher probability to
    intersect one of the various prints of the
  • 37:23 - 37:29
    previous point. When it does, it will
    eventually end up in our trap, it will end
  • 37:29 - 37:37
    up having the same value as the one we
    calculated earlier. When that happens, we
  • 37:37 - 37:44
    know how far the wild point traveled,
    because we were the ones making it jump.
  • 37:44 - 37:49
    So we can just walk backwards to the
    starting point. It turns out that this is
  • 37:49 - 37:56
    extremely efficient to find the d value
    when you know the interval of the d value.
  • 37:56 - 38:02
    The intuition there is that if the
    kangaroo starts in a small area: You know
  • 38:02 - 38:07
    when it's been too much time that it
    probably travelled past your trap. So you
  • 38:07 - 38:15
    have to rewind time, which in crypto you
    can, and try again, because it went past
  • 38:15 - 38:18
    your trap. So the smaller the interval,
    the more precise you can be, the more
  • 38:18 - 38:26
    efficient attack. This is called the
    Pollard-kangaroo attack. It's described in
  • 38:26 - 38:31
    original paper from the 1980s, it was
    described on Diffie-Hellman first, but it
  • 38:31 - 38:35
    works on any group, so it works on
    elliptic curves. And the elliptic curve
  • 38:35 - 38:44
    version is then improved by a few papers
    later and there is a chapter in the
  • 38:44 - 38:50
    elliptic and hyperelliptic handbook
    that describes it.
  • 38:50 - 38:56
    So we have it all, we have how to start,
    we have how to run through the attack and
  • 38:56 - 39:04
    we have a how to end. So now let's get
    concrete, let's pick a target, as I said
  • 39:04 - 39:16
    this attack works against JWT. JWT made
    ... a lot of questionable choices. One of
  • 39:16 - 39:21
    them is to include as one of the public
    key algorithms elliptic curve Diffie-
  • 39:21 - 39:25
    Hellman but not the version of Diffie-
    Hellman you and I are familiar with. The
  • 39:25 - 39:31
    one where we both generate a new private
    key every time, which makes this attack
  • 39:31 - 39:35
    impossible. Remember that this is an
    adaptive attack. We need to query the
  • 39:35 - 39:41
    orcale for the same private key over and
    over and over again. Instead there's a
  • 39:41 - 39:48
    variant called ephemeral static Diffie-
    Hellman, where one of the two sides is
  • 39:48 - 39:55
    always the same. This is sometimes done as
    an optimization, don't do that. OpenSSL
  • 39:55 - 40:02
    was doing that and it stopped doing that
    after a bunch of attacks came from that.
  • 40:02 - 40:07
    So in TLS that usually doesn't happen and
    the Go TLS stack thankfully never did
  • 40:07 - 40:14
    that, so the attack doesn't work against
    TLS. But JWT is encoded it straight into
  • 40:14 - 40:21
    the standard, because if your public key
    is fixed, so is your private key, always
  • 40:21 - 40:31
    the same. So if we have a service that
    accepts things encrypted with a ECDHES
  • 40:31 - 40:37
    algorithm, we can use this attack. For
    example against the popular implementation
  • 40:37 - 40:46
    Go-Jose, it's not Go-Jose's fault and Go
    1.8.1.1, the latest vulnerable version,
  • 40:46 - 40:53
    and we can just check if the service can
    decrypt what we're sending it. It can,
  • 40:53 - 40:58
    because it throws HTTP error when it
    can't, because of different timings. This
  • 40:58 - 41:03
    changes in any case, but what you need is
    the Oracle that tells you: did it work did
  • 41:03 - 41:08
    it not work? Did the bug trigger, so are
    you right about your prediction of the
  • 41:08 - 41:13
    limb or are you not?
    Then of course we need to do a lot of
  • 41:13 - 41:22
    work. If you don't have the resources of a
    big corporation of where to spin up
  • 41:22 - 41:28
    things, you can just use EC2 spot
    instances. How we architected that, is
  • 41:28 - 41:34
    that there will be a small piece of code
    that do nothing intensive on your laptop
  • 41:34 - 41:44
    or anything. It would accept HTTP requests
    from the workers. The beautiful thing
  • 41:44 - 41:48
    about this infrastructure is that you can
    horizontally scale the workers, spin up as
  • 41:48 - 41:59
    many as you want on node.js platforms,
    because the only thing that they need to
  • 41:59 - 42:04
    be able to do: they don't need to have
    ports open, you don't have to worry about
  • 42:04 - 42:08
    NAT, you can even run it on your botnet,
    because the only thing they have to do is
  • 42:08 - 42:14
    connect back and ask for work and then
    after 30 cycles or when they find the
  • 42:14 - 42:18
    point, connect back and say I found
    something, I didn't find anything, give me
  • 42:18 - 42:26
    some more work and send the result. This
    was also useful, because if you get the
  • 42:26 - 42:31
    worker code wrong or if you want to change
    the deployment, you can just redeploy the
  • 42:31 - 42:37
    workers without losing state on the
    dispatcher. Because the dispatcher just
  • 42:37 - 42:45
    keeps running and it will just wait for
    workers to come and ask. Specifically we
  • 42:45 - 42:52
    built this just with the small script that
    you can start an EC2 machine with, because
  • 42:52 - 42:59
    we didn't even need to make a custom
    image. So a few figures, a few numbers.
  • 42:59 - 43:05
    Each Key has 52 limbs, it will take a bit
    less than that, because we have kangaroos.
  • 43:05 - 43:14
    But let's say approximately 52. Each limb
    is 16 points on average. It would be 17,
  • 43:14 - 43:19
    but two of the points are faster to find,
    so let's say 16. Each point takes
  • 43:19 - 43:31
    approximately 2 to the 26 candidate
    points, before we find one that triggers
  • 43:31 - 43:41
    the bug at the right point. Since we need
    16 candidate points and each takes 2 to
  • 43:41 - 43:50
    the 26 candidate points, that takes
    approximately 85 CPU hours. That's like
  • 43:50 - 43:58
    one CPU running for an hour, 1 core. Turns
    out that you can get 85 CPU hours from
  • 43:58 - 44:04
    spot instances for about a dollar and a
    quarter, which makes the total cost of the
  • 44:04 - 44:11
    attack something like 60 bucks.
    Which was a relief, because I had done the
  • 44:11 - 44:17
    math wrong first and the it came out as at
    1000 and I run the demo tonight and I
  • 44:17 - 44:26
    didn't check the bill, yeah.. Anyway, it's
    cheap. Now, I am not brave enough to run
  • 44:26 - 44:32
    the attack live, because, yes, it's a nice
    infrastructure, but no, I don't trust it
  • 44:32 - 44:38
    that much. But also, because it takes a
    while. If you don't want to spin up
  • 44:38 - 44:47
    infinite number of EC2 machines, you have
    to accept that it would take about, I
  • 44:47 - 44:54
    think, our attack run in 12 hours. So
    we're gonna look at a sped-up version of a
  • 44:54 - 44:58
    one-hour video in the next 45 minutes, you
    have time, right?
  • 44:58 - 45:03
    laughter
    No, it's a couple of minutes. So this is
  • 45:03 - 45:10
    the UI. It shouldn't be too confusing and
    if anyone works at Hollywood and wants to
  • 45:10 - 45:17
    license it, we can talk. What you're
    seeing is that the red values are the ones
  • 45:17 - 45:23
    that we found a point for from the workers
    and we submitted. And when we submitted
  • 45:23 - 45:29
    that it resulted not to be the right limb
    and the green ones are the ones that
  • 45:29 - 45:36
    instead broke, so they're the right limb.
    Remember that here the target is a JWT
  • 45:36 - 45:43
    receiving application. And then you can
    see the key slowly flipping from the
  • 45:43 - 45:52
    bottom and it's exactly like Hollywood, I
    love that. So you can see the limbs
  • 45:52 - 45:57
    filling up as we find them and that
    approximately there's 30 bits, so 2 to the
  • 45:57 - 46:07
    30 rounds, so 30 candidates work for each
    round for each limb that we find. It
  • 46:07 - 46:17
    obviously depends on luck and yes the the
    thing will probably keep running for a
  • 46:17 - 46:24
    little while, but this is already at limb
    nine, it has to get to 52 and you don't
  • 46:24 - 46:35
    have that patience. So this was the attack
    the code will be open-source soon. Leave
  • 46:35 - 46:40
    the limbs you lost, they belong to us now.
    And: any questions?
  • 46:40 - 46:56
    applause
    Herald: Valsorda, thank you very much for
  • 46:56 - 47:02
    a lovely talk and the kangaroos. So, we
    have a question from the signal angel, go
  • 47:02 - 47:05
    ahead, please.
    Signal Angel: Actually the internet wants
  • 47:05 - 47:10
    to know: Did you compare this bug to
    implementation in other libraries.
  • 47:10 - 47:15
    Filippo: So I guess that means, if I
    looked for similar bugs in other
  • 47:15 - 47:21
    implementations. We did not, because each
    implementation is a bit different. Hanno
  • 47:21 - 47:28
    works on a lot of fuzzing of bigint
    implementations and at one point he asked
  • 47:28 - 47:33
    me, like on Twitter just today, if I tried
    fuzzing the Go implementation for example.
  • 47:33 - 47:41
    And sadly this is constant time code that
    is specific to P256, so the answer is,
  • 47:41 - 47:46
    there's a lot of them and the bug can be
    small and anywhere. It's not like you will
  • 47:46 - 47:53
    be looking for another bug in P256
    subtraction, it can be anywhere in the
  • 47:53 - 47:57
    underlying math and we can turn that into
    the same attack. So, no, we didn't look
  • 47:57 - 48:05
    for this specific one, but I think that
    four CVEs in 2017 on open SSL have
  • 48:05 - 48:12
    descriptions that are very similar, but
    they're about finite field Diffie-Hellman,
  • 48:12 - 48:22
    like normal DH. If you look for something
    that says about it's barely doable,
  • 48:22 - 48:28
    because all the computation can be done
    offline. That's this kind of attacks on
  • 48:28 - 48:33
    open SSL. Next?
    Herald: Are there any other questions from
  • 48:33 - 48:39
    the Signal Angel? So please line up at the
    microphone. Microphone one please.
  • 48:39 - 48:45
    Mic1: So why can't you determine the
    points algebraically?
  • 48:45 - 48:52
    Filippo: Laziness? No, so it's entirely
    assembly and there's a lot of points where
  • 48:52 - 49:01
    the value is then thrown out or it might
    get corrected by .. it's essentially we
  • 49:01 - 49:08
    didn't see a clear path to this, and it's
    $65 on ec2, so it doesn't really change
  • 49:08 - 49:15
    the feasibility to just fuzz them, so we
    just went for the fastest path to the
  • 49:15 - 49:19
    entrance.
    Herald: Are there any other questions?
  • 49:19 - 49:22
    Filippo: No one is asking about kangaroos,
    people, I mean..
  • 49:22 - 49:26
    Herald: Yes, ask about kangaroos, lovely.
    have you been to Australia?
  • 49:26 - 49:29
    Filippo: I haven't.
    Herald: Okay, you have to.
  • 49:29 - 49:31
    Filippo: Yep, definitely.
    Herald: I think, there aren't any other
  • 49:31 - 49:37
    questions. So Filippo Valsorda, big round
    of applause for you. Thank you!
  • 49:37 - 49:40
    applause
  • 49:40 - 49:46
    34c3 outro
  • 49:46 - 50:02
    subtitles created by c3subtitles.de
    in the year 2019. Join, and help us!
Title:
34C3 - Squeezing a key through a carry bit
Description:

more » « less
Video Language:
English
Duration:
50:02

English subtitles

Revisions