WEBVTT 00:00:00.099 --> 00:00:16.000 34c3 preroll music Herald Angel: Today we have our speaker 00:00:16.000 --> 00:00:21.920 Filippo Valsorda. He is a crypto-engineer and he is specialized in Go. He is some 00:00:21.920 --> 00:00:29.030 kind of Go wizard so to say and used to work at CloudFlare. He actually shows now 00:00:29.030 --> 00:00:35.469 today in the talk how he made an attack to exploit a bug in the implementation of the 00:00:35.469 --> 00:00:41.640 elliptic curve P256 in Go, that's why he's a Go wizard. The reason is actually due to 00:00:41.640 --> 00:00:48.210 a misplaced bit on the assembler level, just due to the curves implementation. The 00:00:48.210 --> 00:00:53.769 result is that in certain cases you can retrieve the private key in the elliptic 00:00:53.769 --> 00:00:58.440 curve Diffie-Hellman encryption scheme. Please welcome Filippo Valsorda to his 00:00:58.440 --> 00:01:03.440 talk, give him a round of applause. Thank you very much. 00:01:03.440 --> 00:01:05.710 Filippo Valsorda: Thank you. applause 00:01:05.710 --> 00:01:11.390 Thanks. I love the term crypto-golfer. Tony came up with it and I just want business 00:01:11.390 --> 00:01:17.350 cards with that on it. Anyway, okay, I'm Filippo. As you heard I worked on Internet 00:01:17.350 --> 00:01:22.340 infrastructure, I work on Go, I work on cryptography. But this is a collaboration 00:01:22.340 --> 00:01:28.229 with Sean Devlin, you might know him as one of the authors of the Cryptopals or 00:01:28.229 --> 00:01:35.111 the matasano crypto challenges. A few months ago earlier this year a bug.. a 00:01:35.111 --> 00:01:46.359 CloudFlare engineer was scanning CT logs. CT logs are big logs of TLS certificates 00:01:46.359 --> 00:01:52.090 and he was checking the ECDSA signatures on these certificates and one of the 00:01:52.090 --> 00:01:58.509 signatures was not verifying. Which is weird, because CT logs check the 00:01:58.509 --> 00:02:04.130 certificates before adding them to the log. It turned out that the bug was in the 00:02:04.130 --> 00:02:09.250 code that CloudFlare was using to verify the certificates. And more specifically it 00:02:09.250 --> 00:02:17.690 was in the Go standard library. It was in the implementation of the NIST P256 curve, 00:02:17.690 --> 00:02:24.360 which is a popular, very hard to implement, elliptic curve that is used for example 00:02:24.360 --> 00:02:31.800 for ECDSA TLS certificates. This curve has an assembly implementation in the Go 00:02:31.800 --> 00:02:40.670 standard library, of course, to squeeze every bit of performance out of it, and 00:02:40.670 --> 00:02:47.420 it's specifically optimized for x86-64 architecture. So that's where the bug was. 00:02:47.420 --> 00:02:53.430 There was a carry propagation bug. It was reported upstream and everyone agreed that 00:02:53.430 --> 00:03:00.420 this was not obviously exploitable. But Adam Langley also said that it would 00:03:00.420 --> 00:03:08.240 be a cool paper though. And well I mean, how do you pass on that? So Sean and I met 00:03:08.240 --> 00:03:13.620 in Paris and spend a weekend and then some eating Thai food and staring at this 00:03:13.620 --> 00:03:20.440 assembly code to try to understand what is it doing. And one month later we have a 00:03:20.440 --> 00:03:30.630 CVE and two Go security releases. We found a way to go from this single carry bit bug 00:03:30.630 --> 00:03:36.290 to a full key recovery against protocols that use elliptic curve Diffie-Hellman 00:03:36.290 --> 00:03:42.520 within ephemeral static way. If this means nothing to you, it's okay, I will try to 00:03:42.520 --> 00:03:51.170 go over it. One of these protocols for example is JWT. Jozy, or however you want 00:03:51.170 --> 00:03:58.210 to call it - it has 15 different acronyms. And it's often implemented in Go, so this 00:03:58.210 --> 00:04:05.850 was exploitable against real-world services. So, let's start by looking at 00:04:05.850 --> 00:04:11.900 the code with the bug. Let's take it from the beginning. This is the short assembly 00:04:11.900 --> 00:04:19.760 function that does subtraction. When you do elliptic curve math, you usually work 00:04:19.760 --> 00:04:27.980 on a field, you work on math modulo some prime p. So if it was subtraction you do a 00:04:27.980 --> 00:04:35.680 minus b modulo p and this is what this assembly snippet does. It sets a to a 00:04:35.680 --> 00:04:44.031 minus b. Of course these are numbers way too big to fit into a single register. So 00:04:44.031 --> 00:04:49.600 how do you do math, when you can't fit into a single register? You do multi- 00:04:49.600 --> 00:04:55.730 precision math. And the thing is, you all know how to do multi-precision math, you 00:04:55.730 --> 00:05:01.360 learned that in elementary school. When you would write numbers in columns and you 00:05:01.360 --> 00:05:06.620 do math with register size of 10, because for every digit you would subtract two 00:05:06.620 --> 00:05:13.110 digits and carry the minus one if it went negative and then subtract with the carry 00:05:13.110 --> 00:05:17.500 and then carry the minus one. That's exactly what this is doing, but it's doing 00:05:17.500 --> 00:05:23.440 it instead with digits with 64-bit registers, because this is a 64-bit architecture 00:05:23.440 --> 00:05:27.680 . So we look at the first lines: the first lines is nothing else 00:05:27.680 --> 00:05:32.870 than subtract, subtract, subtract with carry. And then the carry is finally 00:05:32.870 --> 00:05:40.830 stored in that mul0 accumulator. But then what do we do if it goes negative? 00:05:40.830 --> 00:05:47.550 We said that this is modulo p so we can't just let it wrap around modulo 2^256, 00:05:47.550 --> 00:05:53.270 because that's how wide, you know, 4 registers are. But since we're doing 00:05:53.270 --> 00:05:57.680 arithmetic modulo a number, we can just add that number and the result is the 00:05:57.680 --> 00:06:03.720 same, right? Adding p modulo p is a no-op in the result. So that's what this code 00:06:03.720 --> 00:06:11.310 does: It does a equal a minus b, it takes a copy of the result and it adds p. And 00:06:11.310 --> 00:06:17.620 then in constant time, it uses that final carry to check, if it went negative or not 00:06:17.620 --> 00:06:23.340 to decide in constant time, which one to use. The one with b , so a minus b plus p 00:06:23.340 --> 00:06:30.169 or a minus B - and that's subtraction. Straightforward enough. Now the problem 00:06:30.169 --> 00:06:35.760 with this code is that if you look closely you can see something that might be weird 00:06:35.760 --> 00:06:40.860 if you're not familiar with assembly. It still trips me over. To use a condition 00:06:40.860 --> 00:06:45.150 like these constant time conditionals there. Which have to be constant time, 00:06:45.150 --> 00:06:50.190 because you don't want to leak timings based on the size of number. You have to 00:06:50.190 --> 00:06:56.540 first operate on mul0, so that you set the flags, the zero flag. So normally what you 00:06:56.540 --> 00:07:06.921 do is either add 0 and 1 to your mul0 to check if to set the flags. But that's 00:07:06.921 --> 00:07:13.240 not that's not an add, that's an add with carry. It means that it's adding zero to 00:07:13.240 --> 00:07:20.920 mul0 and the carry bit from this addition here, which has nothing to do with it, 00:07:20.920 --> 00:07:25.930 it's not supposed to be there. Like this is adding p, but it doesn't matter, if it 00:07:25.930 --> 00:07:31.570 overflows, because if it does, it wasn't going to be picked here anyway. So it's 00:07:31.570 --> 00:07:35.640 adding another bit into the operation that wasn't supposed to be there. So it's 00:07:35.640 --> 00:07:41.401 flipping the result, but then also the conditions here are flipped, so 00:07:41.401 --> 00:07:50.990 essentially it evens itself out. Except: when that carry bit happens not to be set, 00:07:50.990 --> 00:07:54.520 because the number a minus b is small enough, that if you add P, you don't wrap 00:07:54.520 --> 00:08:00.840 around. That happens once every 2^32 times, which is why it's so rare that 00:08:00.840 --> 00:08:06.540 nobody had noticed so far. So the code went in and nobody noticed for 00:08:06.540 --> 00:08:09.949 a long time, until CloudFlare first started scanning CT logs and had this 00:08:09.949 --> 00:08:16.120 weird one signature that wasn't verifying and they were lucky enough to actually 00:08:16.120 --> 00:08:21.140 have it in the logs because you know if it happens transiently you might just think.. 00:08:21.140 --> 00:08:27.970 I don't know, the connection broke, right? So this is what we call a carry 00:08:27.970 --> 00:08:32.828 propagation bug. Carry propagation is the activity of making sure that you carry the 00:08:32.828 --> 00:08:43.458 1. And here it's a bit weird, we didn't forget to carry it, but we carried a bit 00:08:43.458 --> 00:08:49.119 that we weren't supposed to carry into a result. Most of the times that goes well, 00:08:49.119 --> 00:08:55.490 sometimes that breaks. When that breaks: wrong result! Wrong result, wrong point 00:08:55.490 --> 00:09:02.889 computation and wrong point computation, so what? Like how does forgetting to carry 00:09:02.889 --> 00:09:09.209 the one lead to a full key recovery? This is not one of those bugs, where like 00:09:09.209 --> 00:09:13.480 buffer overflow you know what you're trying to do even if you might have to 00:09:13.480 --> 00:09:18.680 jump through so many hoops, because there might be all these protections, you know 00:09:18.680 --> 00:09:23.459 what your capabilities are, you can overwrite some memory. Here instead it's 00:09:23.459 --> 00:09:29.540 not clear what your capability is. So today I'm going to try to explain that. 00:09:29.540 --> 00:09:36.009 How we turn this very rare failed computation into a complete key recovery 00:09:36.009 --> 00:09:44.399 attack. But first I apologize that I have to give a elliptic curve cryptography 00:09:44.399 --> 00:09:55.449 crash course here at CCC. So we've seen the field is nothing else than operations 00:09:55.449 --> 00:10:03.269 modulo p, then there are the points: the points are x and y, the coordinates. They 00:10:03.269 --> 00:10:07.970 fit an equation, which we don't care about, they just fit an equation. They are 00:10:07.970 --> 00:10:13.499 integers, so we can work with them, and we can use them to build a group. A group is 00:10:13.499 --> 00:10:21.389 one of the core structures in modern cryptography. To build a group we need a 00:10:21.389 --> 00:10:26.929 zero point, a generator point, and an addition over the group, over the 00:10:26.929 --> 00:10:29.929 points. So we define an addition, 00:10:29.929 --> 00:10:34.649 again, we don't care about how addition works. It's just: you take two points, you 00:10:34.649 --> 00:10:38.120 add them together, you get a result. It has all the properties that you 00:10:38.120 --> 00:10:45.259 remember from elementary school addition: it's commutative, it's associative. And 00:10:45.259 --> 00:10:50.240 then you have multiplication: You don't actually define how to multiply a point, 00:10:50.240 --> 00:10:55.399 but if I tell you that you have an addition operation and I want five times 00:10:55.399 --> 00:11:00.689 this point, what do you do? You take the point and you add the point and you add 00:11:00.689 --> 00:11:06.790 the point and you add the point,... so this is called scalar multiplication: 00:11:06.790 --> 00:11:11.999 doing multiplication by adding repeatedly a certain point. 00:11:11.999 --> 00:11:19.139 So now we have a group, which is given by multiplying the point, a generator point, a 00:11:19.139 --> 00:11:25.110 certain number of times and we have a multiplication operation. So how do we 00:11:25.110 --> 00:11:31.189 build cryptography out of it? This is all awfully abstract so far? So we build 00:11:31.189 --> 00:11:35.180 elliptic curve Diffie-Hellman. If you're familiar with normal Diffie-Hellman, you 00:11:35.180 --> 00:11:41.149 will see something snap at some point. The private key is a giant number, this is 00:11:41.149 --> 00:11:50.470 important. The private key in ECDH is just a random giant 256 bit number. And then we 00:11:50.470 --> 00:11:56.250 have a public key. A public key is that giant number multiplied, the scalar 00:11:56.250 --> 00:12:03.319 multiplication we just talked about, by a generator point. If you know normal 00:12:03.319 --> 00:12:08.879 Diffie-Hellman, this is like doing g to the a. If you don't, it's okay, this is 00:12:08.879 --> 00:12:17.320 just multiplying private key by a point. And then, when I have my private key and 00:12:17.320 --> 00:12:23.230 my public key, you send me your public key. We need to agree on a shared secret, 00:12:23.230 --> 00:12:29.680 how we do that, is that I take your public key, which is a point. I take this and I 00:12:29.680 --> 00:12:36.300 multiply it by my private key, here, so again: it's my giant number private key 00:12:36.300 --> 00:12:41.749 multiplied by your point. That comes together: The two results are the same, 00:12:41.749 --> 00:12:48.680 because if we do my private key times your private key times g it's the same as your 00:12:48.680 --> 00:12:54.570 private key times my private key times g, because that's commutative. And we land on 00:12:54.570 --> 00:13:03.009 a shared secret. And that's all we need to know about elliptic curve cartography to 00:13:03.009 --> 00:13:10.629 exploit this. However, there's one thing that I glossed over: It's easy to multiply 00:13:10.629 --> 00:13:18.000 by five, you add five times. But if I'm asking you to multiply by a 256 00:13:18.000 --> 00:13:26.750 bit number, you can't add 2 to the 256 times a point, right? So what do we do 00:13:26.750 --> 00:13:31.399 there? Remember that here, what we're trying to do is the multiplication: The 00:13:31.399 --> 00:13:38.079 private key times the public key, the point. We do something called double and 00:13:38.079 --> 00:13:44.489 add: We take our private key and we string it out like bits. We start from the most 00:13:44.489 --> 00:13:49.939 significant bit. This is little endian, because I have gotten this slide wrong the 00:13:49.939 --> 00:13:54.850 first time. But, you know, you just claim that you meant it to be the opposite 00:13:54.850 --> 00:14:00.509 endianness. Anyway, that's the most significant bit, the one that is two to 00:14:00.509 --> 00:14:08.299 the 256, no, two to the 255. If you flip it, you're adding or removing two to the 00:14:08.299 --> 00:14:14.449 255. So you start with 0, that's zed, 0, nothing, and you check the first bit, the 00:14:14.449 --> 00:14:20.970 most important bit: Is that set, yes or no? Yes, so you add the point Q, which is 00:14:20.970 --> 00:14:26.129 the point we're trying to multiply by this giant e. So we add the point and then we 00:14:26.129 --> 00:14:33.029 move down one by doubling. Can you imagine how we double something? Remember we only 00:14:33.029 --> 00:14:39.709 have addition. We add it to itself, of course. So we use addition to double the 00:14:39.709 --> 00:14:45.800 point. And you might see, where we're going with this. We double every time we 00:14:45.800 --> 00:14:50.399 slide down one bit. By the time we arrive at the end, how many 00:14:50.399 --> 00:14:56.559 times did we double that first point that we added, because the first bit was one? 00:14:56.559 --> 00:15:04.319 255 times! That bit was worth 2 to the 255, so at the end that will have the 00:15:04.319 --> 00:15:10.559 value it was supposed to have. And we keep going, we check if this bit is 1: Is it 1? 00:15:10.559 --> 00:15:16.129 No. So we do nothing, we double again to move down one. We check if this bit is 00:15:16.129 --> 00:15:23.829 one. This bit is 1: so we add the point. So we did add the point, double, double, 00:15:23.829 --> 00:15:30.829 add the point, double, moving down one and we keep going like this. We keep going 00:15:30.829 --> 00:15:38.290 like this, until we reach the least significant bit. At the least significant 00:15:38.290 --> 00:15:42.899 bit, if it's one, we add the point, if it's not, we don't add the point. And at 00:15:42.899 --> 00:15:49.079 the end we have the correct result and the result comes from this sequence of 00:15:49.079 --> 00:15:56.320 operations which at most are twice 255 operations, which is something that we can 00:15:56.320 --> 00:16:07.399 do concretely. Now why did I explain this very specific algorithm to you. Because to 00:16:07.399 --> 00:16:12.199 understand this attack, you have to recognize that each key, so each string of 00:16:12.199 --> 00:16:19.840 bits here, converts into a very specific sequence of operations. Okay, if you 00:16:19.840 --> 00:16:26.959 change one bit, there will be an one more addition one less addition and each key 00:16:26.959 --> 00:16:33.260 has a very specific sequence. In this case it's add, double, double, add, double, 00:16:33.260 --> 00:16:44.230 add, double, double, and it keeps going. So back to our bug. If you spaced out, 00:16:44.230 --> 00:16:53.290 because we took a lot of crypto, I saw you! But the two things you should take 00:16:53.290 --> 00:16:59.300 away are: There's a giant number, it's the private key, we want to multiply the giant 00:16:59.300 --> 00:17:06.589 number by a point and we do that by doing additions and doubles in an order that is 00:17:06.589 --> 00:17:12.501 specified by the bits of the giant number. That's what you need to have clear, the 00:17:12.501 --> 00:17:20.299 only thing. So let's go back to see how we use that to turn our very small carry bug 00:17:20.299 --> 00:17:26.520 into a complete key recovery attack. First thing we do is we bubble it up. That 00:17:26.520 --> 00:17:30.820 function that breaks is called P256 subinternal. That's the function I showed 00:17:30.820 --> 00:17:39.769 you earlier. P256 subinternal is used by P256 point add, which is what we spoke about 00:17:39.769 --> 00:17:45.929 adding two points, the only important operation. And adding two points, we've 00:17:45.929 --> 00:17:51.950 seen, we use when we're multiplying, when we're doing that scalar multiplication, 00:17:51.950 --> 00:17:59.269 which is d times q, d times the point. And how is scalar mult used? Here we finally 00:17:59.269 --> 00:18:04.309 surfaced to a level that if you work with cryptographic protocols you might start 00:18:04.309 --> 00:18:09.840 recognizing pieces of. Scalar multiplication is the operation that the 00:18:09.840 --> 00:18:15.450 peer does in elliptic curve Diffie- Hellman. There's a scalar, which is the 00:18:15.450 --> 00:18:20.019 secret, which is the private key. There's a point, which is the public key of the 00:18:20.019 --> 00:18:26.000 other person, which might be the attacker. So the scalar multiplication here, 00:18:26.000 --> 00:18:34.090 speaking in InfoSec terms, has an attacker supplied point and a secret scalar and the 00:18:34.090 --> 00:18:41.350 result, the shared secret, is the session key. For example: TLS, when you open a 00:18:41.350 --> 00:18:47.019 connection with TLS and you're using elliptic curve Diffie-Hellman, then you 00:18:47.019 --> 00:18:53.360 will do this dance to agree on a session key. If the session key is correct, the 00:18:53.360 --> 00:18:59.320 connection will open and you will be able to, I don't know, get send HTTP request. 00:18:59.320 --> 00:19:06.110 If the bug is hit and the result is wrong, so the result bubbles up into a wrong 00:19:06.110 --> 00:19:14.370 shared secret, the session key is wrong. And the session key is wrong you notice, 00:19:14.370 --> 00:19:21.860 how do you notice? The connection breaks. So this is what in cryptography we call an 00:19:21.860 --> 00:19:25.230 oracle. You have an oracle that you can call and 00:19:25.230 --> 00:19:33.059 send a point, because that's your public key, you're the attacker, and you send 00:19:33.059 --> 00:19:39.929 that point and it gets multiplied by the private key. And it gives you one bit of 00:19:39.929 --> 00:19:46.220 information, did the bug trigger or did it not? Most of the times it won't, because 00:19:46.220 --> 00:19:54.159 remember, this is an extremely rare bug. So you have an oracle that tells you: Bug 00:19:54.159 --> 00:19:58.899 happen, bug didn't happen, based on the point you sent. And let's assume that the 00:19:58.899 --> 00:20:04.919 key stays the same, we'll talk about that. Can you imagine how we use that to start 00:20:04.919 --> 00:20:13.470 learning things about the key? Well, let's say, that we can magically conjure a point 00:20:13.470 --> 00:20:17.221 that in that sequence of operation, that's why I told you the sequence of operation 00:20:17.221 --> 00:20:25.380 was important, the bug happens very specifically at that addition and we find 00:20:25.380 --> 00:20:32.950 another point where the bug happens very specifically at that double. If we know 00:20:32.950 --> 00:20:39.210 already these bits of the key, and we aren't sure about this one, what can we do 00:20:39.210 --> 00:20:48.820 with these two points? We send them both, one of them will break the TLS connection, 00:20:48.820 --> 00:20:55.690 the other one will succeed. That means that the one that broke triggered the bug, 00:20:55.690 --> 00:21:01.679 the one that didn't break, didn't trigger the bug. And we know which one corresponds 00:21:01.679 --> 00:21:08.080 to which key. Because we magically made special points that only break very 00:21:08.080 --> 00:21:13.820 precisely at that point of the computation. Okay, so we learned a bit of 00:21:13.820 --> 00:21:19.240 the key. In cryptography as soon as you learn one bit of the key, there's probably 00:21:19.240 --> 00:21:28.140 a way all the way down. So we build what's called an adaptive attack. Let's say we 00:21:28.140 --> 00:21:34.070 have these bits, but we want to learn these, too. We compute two points, one 00:21:34.070 --> 00:21:40.110 that breaks, when the addition happens exactly at that point in the double and 00:21:40.110 --> 00:21:46.990 add a procedure, and one that triggers only when the add doesn't happen at the 00:21:46.990 --> 00:21:52.929 specific point in the double and add sequence. We send them both, only one of 00:21:52.929 --> 00:22:00.950 them breaks the TLS connection, well, then we know a bit! We go back to our magic 00:22:00.950 --> 00:22:05.669 point generator and we generate two new points. 00:22:05.669 --> 00:22:10.519 This time we don't look for things that break here, we look for things that break 00:22:10.519 --> 00:22:17.100 here. We make two points, we send them both. One of them breaks the connection, 00:22:17.100 --> 00:22:21.070 the other doesn't break the connection. We learned one more bit of the key. We go 00:22:21.070 --> 00:22:26.940 back, we make two points, we send them both: One breaks the connection, one 00:22:26.940 --> 00:22:31.410 doesn't. We keep going like that, once for each bit. Every time we go back and we 00:22:31.410 --> 00:22:36.110 adapt to what we learned so far. That's why it's called an adaptive attack, we 00:22:36.110 --> 00:22:41.269 can't pre-compute all these points. We have to come up with them while we learn 00:22:41.269 --> 00:22:47.940 the key. And the beautiful thing about adaptive attacks is that they look exactly 00:22:47.940 --> 00:22:53.919 like Hollywood, it's beautiful! Because you see them flipping and going through 00:22:53.919 --> 00:22:58.540 values, getting it right and moving to the next one, which you all told it was fake, 00:22:58.540 --> 00:23:16.010 it was not! Everything else is fake. Okay, so, this attack we came up with that, we 00:23:16.010 --> 00:23:21.730 told we had something extremely novel, we went to the literature and everyone that 00:23:21.730 --> 00:23:27.860 had picked an academic career in the audience knows exactly what happened: We 00:23:27.860 --> 00:23:33.659 found a paper that was doing exactly this. However, you know, it was a little 00:23:33.659 --> 00:23:44.109 different. It was still P256 and it was still ECDH and, hmm.. okay it's really 00:23:44.109 --> 00:23:50.740 similar. But it's an attack that depends a lot on the implementation details, how you 00:23:50.740 --> 00:23:55.629 pull it off. You can't suddenly just repurpose the code, but the idea so far: 00:23:55.629 --> 00:23:59.679 an adaptive attack where you send two points and you check which one breaks is 00:23:59.679 --> 00:24:05.580 the same as a BBB paper from a few years ago when it worked against open SSL 00:24:05.580 --> 00:24:13.530 instead. Instead of against this bug in the Go standard library. So instead from 00:24:13.530 --> 00:24:20.789 now on we're going to talk about how exactly we implemented this against Go, 00:24:20.789 --> 00:24:26.190 because this changes a lot based on the implementation details of the library 00:24:26.190 --> 00:24:31.450 you're working with. So this was the general idea of how the attack works. All 00:24:31.450 --> 00:24:35.920 that: You find points that break at the right time, you send them both, and that 00:24:35.920 --> 00:24:40.389 way you learn a bit of the key, except I lied to you. 00:24:40.389 --> 00:24:46.050 I lied to you, because I lied to you on a lot of things: The first one is that Go 00:24:46.050 --> 00:24:51.571 doesn't do double and add one bit at a time, it does it five bits at a time! It's 00:24:51.571 --> 00:24:57.730 called Booth multiplication, it took us forever to figure it out. It's an 1980s 00:24:57.730 --> 00:25:13.570 paper. Instead of adding one point or zero points and then doubling, it adds between 00:25:13.570 --> 00:25:20.059 minus 16 and plus 16 points and then doubles five times moving down. It just 00:25:20.059 --> 00:25:27.149 does it in blocks of five. So it splits up the key and then looks at each block of 00:25:27.149 --> 00:25:33.470 bits, picks a value from a pre-computed table, which is just all the values from 00:25:33.470 --> 00:25:40.620 one times the point to 16 times the point and in the loop it doubles five times, 00:25:40.620 --> 00:25:47.899 because it moved five bits down. And then it chooses, which point between zero and 00:25:47.899 --> 00:25:55.590 16 to use from the table and it adds that to the rolling result, instead of adding 1 00:25:55.590 --> 00:26:02.340 or 0. There's also a bit of constant time arithmetic there, because the other thing 00:26:02.340 --> 00:26:08.260 I lied to you on is that there's no such thing as at 0 point. It's an imaginary 00:26:08.260 --> 00:26:12.799 point that we add to make the math work, but when you try to tell the code to 00:26:12.799 --> 00:26:17.279 actually add the zero point it's like asking you to divide by 0. It just won't 00:26:17.279 --> 00:26:25.539 do it! But you know how you add 0, right? You do nothing! So there's some constant 00:26:25.539 --> 00:26:30.820 time code here that looks at it and if it's 0, it does nothing. If it's not 0, it 00:26:30.820 --> 00:26:39.669 adds the value. So the first thing we had to do to adapt 00:26:39.669 --> 00:26:46.690 to this format is that we worked in limbs. When you hear me say limb, it just means 00:26:46.690 --> 00:26:53.850 that we will look at each five bit block on its own as it is zero to sixteen and a 00:26:53.850 --> 00:27:01.019 sign value. That's much easier, because it's actually kind of weird how the five 00:27:01.019 --> 00:27:05.570 bits are extracted and I don't want to bore you with it. So let's just consider 00:27:05.570 --> 00:27:12.120 that we look at each group of five bits converted into a value from zero to 00:27:12.120 --> 00:27:18.850 sixteen and a one and a sign or plus minus sixteen. So when you hear me talk about 00:27:18.850 --> 00:27:24.529 limbs you just know that it means the five bit value from the key. This is still the 00:27:24.529 --> 00:27:34.730 giant d key that's the multiplier. So how does the attack change? Not that much! 00:27:34.730 --> 00:27:39.299 Instead of attacking one bit at a time, you know two points - one that breaks for 00:27:39.299 --> 00:27:45.179 zero, one that breaks for one - we attack one limb at a time, one that breaks for 00:27:45.179 --> 00:27:49.020 one, one that breaks for two, one that breaks for three, sixteen, minus one, 00:27:49.020 --> 00:27:57.720 minus two, minus 16. So, to recover five bits of the key, we'll need on average 00:27:57.720 --> 00:28:04.429 half the space: seventeen points, which is a little less efficient than the bit-by- 00:28:04.429 --> 00:28:15.950 bit one, because that would be five points for five bits. So, however, how the attack 00:28:15.950 --> 00:28:23.059 triggers is the same: We're looking for a bug that happens in the five doubles at 00:28:23.059 --> 00:28:33.700 the very right time or that happens at the addition at the very right time. Now 00:28:33.700 --> 00:28:38.649 that's still the high level of how we're going to do it, but in practice I didn't 00:28:38.649 --> 00:28:43.870 tell you how we're going to magically generate these magic points that break 00:28:43.870 --> 00:28:51.379 just at the right time. And I didn't tell you because it's fuzzing. There is there's 00:28:51.379 --> 00:28:58.830 no specific way to generate them algebraically, so instead we just hooked 00:28:58.830 --> 00:29:03.500 the assembly code with something that would just set a boolean, you know, true, 00:29:03.500 --> 00:29:08.590 false, if the conditions for the bug are met. And then we run through a lot of 00:29:08.590 --> 00:29:14.929 points. And if for each point we run it through the limbs we already know, 00:29:14.929 --> 00:29:19.519 remember this is an adaptive attack, so we want to learn the next limb. 00:29:19.519 --> 00:29:24.240 We learned a few limbs, we want to learn the next one. We run through the ones we 00:29:24.240 --> 00:29:30.631 know and then we try all the possible values for ones that we don't know. If one 00:29:30.631 --> 00:29:36.610 of them and only one of them breaks, that's a useful point. Because if we send 00:29:36.610 --> 00:29:46.390 that point and it breaks, we know exactly what the value of the next limb is. Okay, 00:29:46.390 --> 00:29:53.679 so this is going even more low-level. If you're interested in optimizations, we had 00:29:53.679 --> 00:29:57.860 to run through a lot of candidate points and for each point we needed to know the d 00:29:57.860 --> 00:30:02.850 value. Because we can find a magic point, but if we don't know the private key, we 00:30:02.850 --> 00:30:10.049 don't know if the entire protocol works and our oracle doesn't work anymore. So to 00:30:10.049 --> 00:30:15.080 work with that instead of multiplying a new random private key every time we just 00:30:15.080 --> 00:30:23.789 add that one to the private key and added the point to the public key, this is just 00:30:23.789 --> 00:30:32.769 an optimization. We called into the various assembly of the standard library. 00:30:32.769 --> 00:30:37.840 Don't do this, but it's beautiful. You can go call all the unexported functions in 00:30:37.840 --> 00:30:45.000 the standard library. I will never approve it on code review, but I love it. And then 00:30:45.000 --> 00:30:51.210 there's some post-processing to make sure that the bug is actually there, because 00:30:51.210 --> 00:30:55.970 sometimes there are false positives. So we just run it through the broken code, the 00:30:55.970 --> 00:30:59.740 fixed code, is the result the same? Well, false positive, is it 00:30:59.740 --> 00:31:10.690 different, good. Okay, so we have a way to move through the attack. The only thing we 00:31:10.690 --> 00:31:15.090 don't have yet is: How we figure out the first limb. The first one, the most 00:31:15.090 --> 00:31:19.100 important, the most significant one, when we still don't know anything about the 00:31:19.100 --> 00:31:27.210 key. The problem with this one is: It's like this, so let's pick three, it's an 00:31:27.210 --> 00:31:32.830 example okay, let's pretend that the limb is three. So we do our usual thing, we do 00:31:32.830 --> 00:31:40.330 our fuzzing and we find something that breaks at the fifth doubling and we send 00:31:40.330 --> 00:31:47.020 it and it breaks. It means that the first limb is three, right? Wrong. Sadly, it 00:31:47.020 --> 00:31:55.059 might mean that the limb is also 6 or 12. Because how six is selected for example is 00:31:55.059 --> 00:32:01.769 that three is taken, it's doubled and saved in the precomputation table as six, 00:32:01.769 --> 00:32:06.080 then it's taken out of the table, doubled five times, but what happens after you 00:32:06.080 --> 00:32:13.029 double six four times? What's the value? The exact same as doubling three five 00:32:13.029 --> 00:32:19.390 times, and the sequence is the exact same! So we don't know, which one it is, because 00:32:19.390 --> 00:32:23.850 we only know that that's the sequence but that doesn't tell us anything. And the 00:32:23.850 --> 00:32:29.740 same happens for 12, 12 is nothing else than 3 double double. So if we double it 00:32:29.740 --> 00:32:36.779 five times, at the third double it breaks and we only know if it breaks or not. So 00:32:36.779 --> 00:32:42.679 we can't move, so instead what we do is that we find three points: one that breaks 00:32:42.679 --> 00:32:48.350 after doubling three five times, one that breaks doubling it six times, and one that 00:32:48.350 --> 00:32:54.020 breaks after doubling it seven times. We send them all and we look at which ones 00:32:54.020 --> 00:33:01.110 break. Only this one? It's a three! The first and the second but not the third 00:33:01.110 --> 00:33:07.669 point? It's a six! All three? It's a 12! This took me forever to wrap my head 00:33:07.669 --> 00:33:17.250 around, this is like pure shaman insight. Okay now I can feel that you're getting 00:33:17.250 --> 00:33:25.870 bit tired, this is intensive, it's a lot of math, so let's go for a change of pace 00:33:25.870 --> 00:33:32.659 and let's talk about kangaroos instead. I'm gonna tell you something that I 00:33:32.659 --> 00:33:40.909 learned from a cryptographic paper, I swear. And it's about how kangaroos jump. 00:33:40.909 --> 00:33:48.729 Apparently kangaroos jump depending on how the terrain on which they are jumping from 00:33:48.729 --> 00:33:54.370 is. Depending on that terrain, if you put two kangaroos on the exact same spot, they 00:33:54.370 --> 00:33:59.770 will jump the same distance and approximately the same direction. I don't 00:33:59.770 --> 00:34:05.470 know, if this is true, but Pollard said so in a paper and I am NOT arguing with 00:34:05.470 --> 00:34:13.541 Pollard. So now, why is this useful? Well, this makes for an extremely cool way to 00:34:13.541 --> 00:34:22.311 catch kangaroos! I mean, did you expect some math or crypto? I know, we're 00:34:22.311 --> 00:34:27.920 catching kangaroos here! So you take a kangaroo that you have on hand, because 00:34:27.920 --> 00:34:34.219 you all have a kangaroo on hand. And you put a GPS tracker on it and you let it 00:34:34.219 --> 00:34:43.250 loose. This kangaroo jumps and it roams it enjoys a brief stint of freedom and it 00:34:43.250 --> 00:34:48.440 runs and at some point you go pick it up and because you know where it is and you 00:34:48.440 --> 00:34:54.679 place a trap there exactly where you find it. What happens to the kangaroo that is 00:34:54.679 --> 00:35:04.630 just passing by? If it steps at any point on one of the points, where the other 00:35:04.630 --> 00:35:10.820 kangaroo jumped from there on it will follow the same path. Because each jump 00:35:10.820 --> 00:35:20.750 depends on the ground. So this way if the wild kangaroo lands on any of the prints 00:35:20.750 --> 00:35:25.650 of the previous kangaroo you're catching it because eventually it will end up in 00:35:25.650 --> 00:35:30.940 the trap. Okay, this had nothing to do with the talk, I just wanted to share 00:35:30.940 --> 00:35:38.350 this! Now, okay, so here is how this converts to 00:35:38.350 --> 00:35:44.950 crypto. We can make elliptic curve points jump like kangaroos, we just have to make 00:35:44.950 --> 00:35:53.670 the jump deterministic based on the input, based on the starting point. For example 00:35:53.670 --> 00:35:59.770 we can take a hash, any hash, you design the hash. Apparently that's popular in 00:35:59.770 --> 00:36:01.740 cryptocurrencies to design your own hash .. 00:36:01.740 --> 00:36:13.090 laughter And you hash a point to another point and 00:36:13.090 --> 00:36:19.870 when you want to do a jump you take the point and you add it to its own hash, so 00:36:19.870 --> 00:36:27.630 Q_N+1 depends only on QN, and this is just like the kangaroo jump. How do you use 00:36:27.630 --> 00:36:35.810 this for what we're doing? We want to recover d, right? We want to recover the 00:36:35.810 --> 00:36:43.930 multiplier, the discrete log it's often called, of the public key. How we work 00:36:43.930 --> 00:36:50.810 with this is that we take a tame kangaroo, a point that we know the d off, and we 00:36:50.810 --> 00:36:58.660 make it jump a lot. It keeps jumping and we remember what the value is at the end. 00:36:58.660 --> 00:37:02.900 We take that value at the end and we save it. No need to keep all the points in 00:37:02.900 --> 00:37:09.810 between, so we don't need some giant memory construction and then we take the 00:37:09.810 --> 00:37:16.740 point that we don't know the d of and we make it jump a lot. What happens is that 00:37:16.740 --> 00:37:23.170 it has much higher probability to intersect one of the various prints of the 00:37:23.170 --> 00:37:28.570 previous point. When it does, it will eventually end up in our trap, it will end 00:37:28.570 --> 00:37:36.950 up having the same value as the one we calculated earlier. When that happens, we 00:37:36.950 --> 00:37:43.700 know how far the wild point traveled, because we were the ones making it jump. 00:37:43.700 --> 00:37:49.490 So we can just walk backwards to the starting point. It turns out that this is 00:37:49.490 --> 00:37:55.880 extremely efficient to find the d value when you know the interval of the d value. 00:37:55.880 --> 00:38:01.880 The intuition there is that if the kangaroo starts in a small area: You know 00:38:01.880 --> 00:38:07.190 when it's been too much time that it probably travelled past your trap. So you 00:38:07.190 --> 00:38:14.680 have to rewind time, which in crypto you can, and try again, because it went past 00:38:14.680 --> 00:38:18.360 your trap. So the smaller the interval, the more precise you can be, the more 00:38:18.360 --> 00:38:26.410 efficient attack. This is called the Pollard-kangaroo attack. It's described in 00:38:26.410 --> 00:38:30.791 original paper from the 1980s, it was described on Diffie-Hellman first, but it 00:38:30.791 --> 00:38:34.680 works on any group, so it works on elliptic curves. And the elliptic curve 00:38:34.680 --> 00:38:44.430 version is then improved by a few papers later and there is a chapter in the 00:38:44.430 --> 00:38:50.310 elliptic and hyperelliptic handbook that describes it. 00:38:50.310 --> 00:38:55.610 So we have it all, we have how to start, we have how to run through the attack and 00:38:55.610 --> 00:39:03.940 we have a how to end. So now let's get concrete, let's pick a target, as I said 00:39:03.940 --> 00:39:15.660 this attack works against JWT. JWT made ... a lot of questionable choices. One of 00:39:15.660 --> 00:39:21.021 them is to include as one of the public key algorithms elliptic curve Diffie- 00:39:21.021 --> 00:39:25.031 Hellman but not the version of Diffie- Hellman you and I are familiar with. The 00:39:25.031 --> 00:39:30.900 one where we both generate a new private key every time, which makes this attack 00:39:30.900 --> 00:39:35.160 impossible. Remember that this is an adaptive attack. We need to query the 00:39:35.160 --> 00:39:40.730 orcale for the same private key over and over and over again. Instead there's a 00:39:40.730 --> 00:39:48.450 variant called ephemeral static Diffie- Hellman, where one of the two sides is 00:39:48.450 --> 00:39:54.690 always the same. This is sometimes done as an optimization, don't do that. OpenSSL 00:39:54.690 --> 00:40:02.020 was doing that and it stopped doing that after a bunch of attacks came from that. 00:40:02.020 --> 00:40:06.960 So in TLS that usually doesn't happen and the Go TLS stack thankfully never did 00:40:06.960 --> 00:40:14.000 that, so the attack doesn't work against TLS. But JWT is encoded it straight into 00:40:14.000 --> 00:40:20.800 the standard, because if your public key is fixed, so is your private key, always 00:40:20.800 --> 00:40:30.610 the same. So if we have a service that accepts things encrypted with a ECDHES 00:40:30.610 --> 00:40:36.740 algorithm, we can use this attack. For example against the popular implementation 00:40:36.740 --> 00:40:46.370 Go-Jose, it's not Go-Jose's fault and Go 1.8.1.1, the latest vulnerable version, 00:40:46.370 --> 00:40:52.840 and we can just check if the service can decrypt what we're sending it. It can, 00:40:52.840 --> 00:40:58.430 because it throws HTTP error when it can't, because of different timings. This 00:40:58.430 --> 00:41:03.300 changes in any case, but what you need is the Oracle that tells you: did it work did 00:41:03.300 --> 00:41:08.280 it not work? Did the bug trigger, so are you right about your prediction of the 00:41:08.280 --> 00:41:13.370 limb or are you not? Then of course we need to do a lot of 00:41:13.370 --> 00:41:22.430 work. If you don't have the resources of a big corporation of where to spin up 00:41:22.430 --> 00:41:27.940 things, you can just use EC2 spot instances. How we architected that, is 00:41:27.940 --> 00:41:33.940 that there will be a small piece of code that do nothing intensive on your laptop 00:41:33.940 --> 00:41:43.540 or anything. It would accept HTTP requests from the workers. The beautiful thing 00:41:43.540 --> 00:41:48.360 about this infrastructure is that you can horizontally scale the workers, spin up as 00:41:48.360 --> 00:41:58.840 many as you want on node.js platforms, because the only thing that they need to 00:41:58.840 --> 00:42:04.040 be able to do: they don't need to have ports open, you don't have to worry about 00:42:04.040 --> 00:42:07.990 NAT, you can even run it on your botnet, because the only thing they have to do is 00:42:07.990 --> 00:42:14.230 connect back and ask for work and then after 30 cycles or when they find the 00:42:14.230 --> 00:42:18.180 point, connect back and say I found something, I didn't find anything, give me 00:42:18.180 --> 00:42:26.500 some more work and send the result. This was also useful, because if you get the 00:42:26.500 --> 00:42:31.210 worker code wrong or if you want to change the deployment, you can just redeploy the 00:42:31.210 --> 00:42:36.620 workers without losing state on the dispatcher. Because the dispatcher just 00:42:36.620 --> 00:42:44.930 keeps running and it will just wait for workers to come and ask. Specifically we 00:42:44.930 --> 00:42:51.800 built this just with the small script that you can start an EC2 machine with, because 00:42:51.800 --> 00:42:59.370 we didn't even need to make a custom image. So a few figures, a few numbers. 00:42:59.370 --> 00:43:05.260 Each Key has 52 limbs, it will take a bit less than that, because we have kangaroos. 00:43:05.260 --> 00:43:14.100 But let's say approximately 52. Each limb is 16 points on average. It would be 17, 00:43:14.100 --> 00:43:18.750 but two of the points are faster to find, so let's say 16. Each point takes 00:43:18.750 --> 00:43:30.570 approximately 2 to the 26 candidate points, before we find one that triggers 00:43:30.570 --> 00:43:41.010 the bug at the right point. Since we need 16 candidate points and each takes 2 to 00:43:41.010 --> 00:43:49.740 the 26 candidate points, that takes approximately 85 CPU hours. That's like 00:43:49.740 --> 00:43:57.520 one CPU running for an hour, 1 core. Turns out that you can get 85 CPU hours from 00:43:57.520 --> 00:44:04.440 spot instances for about a dollar and a quarter, which makes the total cost of the 00:44:04.440 --> 00:44:10.580 attack something like 60 bucks. Which was a relief, because I had done the 00:44:10.580 --> 00:44:16.780 math wrong first and the it came out as at 1000 and I run the demo tonight and I 00:44:16.780 --> 00:44:26.420 didn't check the bill, yeah.. Anyway, it's cheap. Now, I am not brave enough to run 00:44:26.420 --> 00:44:31.810 the attack live, because, yes, it's a nice infrastructure, but no, I don't trust it 00:44:31.810 --> 00:44:38.390 that much. But also, because it takes a while. If you don't want to spin up 00:44:38.390 --> 00:44:46.620 infinite number of EC2 machines, you have to accept that it would take about, I 00:44:46.620 --> 00:44:53.520 think, our attack run in 12 hours. So we're gonna look at a sped-up version of a 00:44:53.520 --> 00:44:58.180 one-hour video in the next 45 minutes, you have time, right? 00:44:58.180 --> 00:45:02.650 laughter No, it's a couple of minutes. So this is 00:45:02.650 --> 00:45:09.810 the UI. It shouldn't be too confusing and if anyone works at Hollywood and wants to 00:45:09.810 --> 00:45:17.100 license it, we can talk. What you're seeing is that the red values are the ones 00:45:17.100 --> 00:45:23.060 that we found a point for from the workers and we submitted. And when we submitted 00:45:23.060 --> 00:45:28.990 that it resulted not to be the right limb and the green ones are the ones that 00:45:28.990 --> 00:45:35.520 instead broke, so they're the right limb. Remember that here the target is a JWT 00:45:35.520 --> 00:45:42.860 receiving application. And then you can see the key slowly flipping from the 00:45:42.860 --> 00:45:51.650 bottom and it's exactly like Hollywood, I love that. So you can see the limbs 00:45:51.650 --> 00:45:57.080 filling up as we find them and that approximately there's 30 bits, so 2 to the 00:45:57.080 --> 00:46:07.000 30 rounds, so 30 candidates work for each round for each limb that we find. It 00:46:07.000 --> 00:46:16.930 obviously depends on luck and yes the the thing will probably keep running for a 00:46:16.930 --> 00:46:23.790 little while, but this is already at limb nine, it has to get to 52 and you don't 00:46:23.790 --> 00:46:34.510 have that patience. So this was the attack the code will be open-source soon. Leave 00:46:34.510 --> 00:46:40.120 the limbs you lost, they belong to us now. And: any questions? 00:46:40.120 --> 00:46:56.330 applause Herald: Valsorda, thank you very much for 00:46:56.330 --> 00:47:02.420 a lovely talk and the kangaroos. So, we have a question from the signal angel, go 00:47:02.420 --> 00:47:05.350 ahead, please. Signal Angel: Actually the internet wants 00:47:05.350 --> 00:47:10.060 to know: Did you compare this bug to implementation in other libraries. 00:47:10.060 --> 00:47:14.560 Filippo: So I guess that means, if I looked for similar bugs in other 00:47:14.560 --> 00:47:21.290 implementations. We did not, because each implementation is a bit different. Hanno 00:47:21.290 --> 00:47:27.520 works on a lot of fuzzing of bigint implementations and at one point he asked 00:47:27.520 --> 00:47:33.300 me, like on Twitter just today, if I tried fuzzing the Go implementation for example. 00:47:33.300 --> 00:47:41.190 And sadly this is constant time code that is specific to P256, so the answer is, 00:47:41.190 --> 00:47:45.780 there's a lot of them and the bug can be small and anywhere. It's not like you will 00:47:45.780 --> 00:47:52.590 be looking for another bug in P256 subtraction, it can be anywhere in the 00:47:52.590 --> 00:47:57.190 underlying math and we can turn that into the same attack. So, no, we didn't look 00:47:57.190 --> 00:48:05.431 for this specific one, but I think that four CVEs in 2017 on open SSL have 00:48:05.431 --> 00:48:11.590 descriptions that are very similar, but they're about finite field Diffie-Hellman, 00:48:11.590 --> 00:48:21.920 like normal DH. If you look for something that says about it's barely doable, 00:48:21.920 --> 00:48:27.560 because all the computation can be done offline. That's this kind of attacks on 00:48:27.560 --> 00:48:32.890 open SSL. Next? Herald: Are there any other questions from 00:48:32.890 --> 00:48:39.200 the Signal Angel? So please line up at the microphone. Microphone one please. 00:48:39.200 --> 00:48:44.550 Mic1: So why can't you determine the points algebraically? 00:48:44.550 --> 00:48:52.300 Filippo: Laziness? No, so it's entirely assembly and there's a lot of points where 00:48:52.300 --> 00:49:01.260 the value is then thrown out or it might get corrected by .. it's essentially we 00:49:01.260 --> 00:49:07.500 didn't see a clear path to this, and it's $65 on ec2, so it doesn't really change 00:49:07.500 --> 00:49:14.740 the feasibility to just fuzz them, so we just went for the fastest path to the 00:49:14.740 --> 00:49:19.000 entrance. Herald: Are there any other questions? 00:49:19.000 --> 00:49:21.830 Filippo: No one is asking about kangaroos, people, I mean.. 00:49:21.830 --> 00:49:26.090 Herald: Yes, ask about kangaroos, lovely. have you been to Australia? 00:49:26.090 --> 00:49:28.530 Filippo: I haven't. Herald: Okay, you have to. 00:49:28.530 --> 00:49:31.230 Filippo: Yep, definitely. Herald: I think, there aren't any other 00:49:31.230 --> 00:49:36.560 questions. So Filippo Valsorda, big round of applause for you. Thank you! 00:49:36.560 --> 00:49:40.398 applause 00:49:40.398 --> 00:49:45.509 34c3 outro 00:49:45.509 --> 00:50:02.000 subtitles created by c3subtitles.de in the year 2019. Join, and help us!