0:00:00.000,0:00:04.090 In the previous segment we saw how to[br]build public key encryption from trapdoor 0:00:04.090,0:00:08.390 functions, in this segment we're going to[br]build a classic trapdoor function called 0:00:08.390,0:00:13.295 RSA. But first let's quickly review what a[br]trapdoor function is. So recall that a 0:00:13.295,0:00:17.283 trapdoor function is made up of three[br]algorithms. There is a key generation 0:00:17.283,0:00:21.056 algorithm, the function itself, and the[br]inverse of the function. The key 0:00:21.056,0:00:25.313 generation algorithm outputs a public key[br]and a secret key. And in this case, in 0:00:25.313,0:00:29.786 this lecture the public key is going to[br]define a function that maps the set X onto 0:00:29.786,0:00:33.882 itself. Which is why I call these things[br]trapdoor permutations, as opposed to 0:00:33.882,0:00:37.978 trapdoor functions, simply because the[br]function maps X onto itself, whereas a 0:00:37.978,0:00:43.356 trapdoor function might map the set X to[br]some arbitrary set Y. Now given the public 0:00:43.356,0:00:47.819 key, the function, as we say, basically[br]defines this function from the set X to 0:00:47.819,0:00:52.914 the set X. And then given the secret key,[br]we can invert this function. So this 0:00:52.914,0:00:57.401 function F evaluates the function in the[br]forward direction, and this function F 0:00:57.401,0:01:02.059 inverse, which means the secret key SK,[br]computes F in the reverse direction. Now 0:01:02.059,0:01:06.489 we say that the trapdoor permutation is[br]secure if the function defined by the 0:01:06.489,0:01:11.033 public key is in fact a one-way function,[br]which means that it's easy to evaluate, 0:01:11.033,0:01:15.404 but without the trapdoor, the secret[br]trapdoor, it's difficult to invert. Before 0:01:15.404,0:01:20.076 we look at our first example of a trapdoor[br]permutation, I want to do a quick review 0:01:20.076,0:01:24.467 of some necessary arithmetic facts that[br]we're gonna need. And in particular, 0:01:24.467,0:01:28.632 let's look at some arithmetic facts[br]modulo composites. So here we have our 0:01:28.632,0:01:33.192 modulus N, which happens to be a product[br]of two primes, and you should be thinking 0:01:33.192,0:01:37.864 of P and Q as roughly equal size numbers,[br]since particular P and Q are both roughly 0:01:37.864,0:01:42.407 on the size of square root of N. So both[br]are roughly the same size. Recall that we 0:01:42.407,0:01:46.834 denoted by ZN the set of integers from[br]zero to N minus one, and we said that we 0:01:46.834,0:01:51.318 can do addition and multiplication module N. We denoted by ZN star the set of 0:01:51.318,0:01:55.801 invertible elements in ZN. So these are[br]all the elements, which have a modular 0:01:55.801,0:02:00.925 inverse. And we said that actually X is[br]invertible if and only if it's relatively 0:02:00.925,0:02:05.928 prime to N. Moreover, I also told you that[br]the number of invertible elements in ZN 0:02:05.928,0:02:11.147 star is denoted by this function phi(N). So phi(N)[br]is the number of invertible elements in ZN 0:02:11.147,0:02:15.814 star, And I told you that when N is a[br]product of two distinct primes, then in 0:02:15.814,0:02:20.788 fact phi(N) is equal to (P - 1) times (Q - 1) and if you extend that out, you 0:02:20.788,0:02:26.007 see that this is really equal to (N - P - Q + 1). Now remember that I said 0:02:26.007,0:02:30.858 that P and Q are on the order of square[br]root of N which means that P + Q is 0:02:30.858,0:02:35.675 also on the order of square root of N.[br]Which means that really phi(N) is on the 0:02:35.675,0:02:41.050 order of N minus two square root of N. So,[br]in other words, it's very, very close to 0:02:41.050,0:02:45.158 N. Basically, subtracting the square root[br]of N from a number, this is from, N is 0:02:45.158,0:02:49.425 going to be a huge number in our case,[br]like 600 digits. And so subtracting from a 0:02:49.425,0:02:53.533 600 digit number, the square root of that[br]600 digit number, namely a 300 digit 0:02:53.533,0:02:57.534 number, hardly affects the size of the[br]number. Which means that phi(N) is really, 0:02:57.534,0:03:01.908 really, really close to N, and I want you[br]to remember that, because we will actually 0:03:01.908,0:03:06.122 be using this now and again. And so the[br]fact that phi(N) is really close to N, means 0:03:06.122,0:03:11.094 that if we choose a random element modulo[br]N It's very, very, very likely to be in ZN 0:03:11.094,0:03:15.633 star. So it's very unlikely that by[br]choosing a random element in ZN, we will 0:03:15.633,0:03:20.085 end up choosing an element that is not[br]invertable. Okay. So just remember that, 0:03:20.085,0:03:25.479 that in fact almost all elements in ZN are[br]actually invertible. And the last thing 0:03:25.479,0:03:30.939 that we'll need is Euler's theorem, which[br]we covered last week, which basically says 0:03:30.939,0:03:36.332 that for any element X in ZN star, if I[br]raise X to the power of phi(N), I get one, in 0:03:36.332,0:03:42.527 ZN. So in other words, I get 1 modulo N. I'll say it one more time because this 0:03:42.527,0:03:47.466 is gonna be critical to what's coming.[br]Again X to the phi(N) is equal to 1 modulo 0:03:47.466,0:03:51.997 N. So now that we have the necessary[br]background we can actually describe the 0:03:51.997,0:03:55.927 RSA trapdoor permutation. This is a classic,[br]classic, classic construction in 0:03:55.927,0:04:00.811 cryptography that was first published in[br]Scientific American back in 1977, this is 0:04:00.811,0:04:05.576 a very well known article in cryptography.[br]And ever since then, this was 35 years 0:04:05.576,0:04:10.340 ago, the RSA trapdoor permutation has been used[br]extensively in cryptographic applications. 0:04:10.340,0:04:15.110 For example, SSL and TLS use RSA both for[br]certificates and for key exchange. There 0:04:15.110,0:04:19.452 are many secure email systems and secure[br]file systems that use RSA to encrypt 0:04:19.452,0:04:23.515 emails and encrypt files in the file[br]system. And there are many, many, many 0:04:23.515,0:04:27.690 other applications of this system. So this[br]is a very, very classic, crypto 0:04:27.690,0:04:32.541 construction, and I'll show you how it[br]works. I should say that RSA is named for 0:04:32.541,0:04:37.150 its inventors, Ron Rivest, Adi Shamir and[br]Len Adleman, who were all at MIT at the 0:04:37.150,0:04:41.758 time they made this important discovery.[br]So now we're ready to describe the RSA 0:04:41.758,0:04:46.425 trapdoor permutation. To do that, I have[br]to describe the key generation algorithm, 0:04:46.425,0:04:50.159 the function ƒ and the function ƒ−1.[br]So let's see. So the way the key 0:04:50.159,0:04:54.826 generation algorithm works is as follows:[br]What we do is we generate two primes, P 0:04:54.826,0:04:59.143 and Q, each would be, say on the order of[br]1000 bits, so, you know, roughly 300 0:04:59.143,0:05:03.751 digits, and then the RSA modulus is simply[br]going to be the product of those two 0:05:03.751,0:05:08.801 primes. The next thing we do is we pick[br]two exponents, e and d, such that e times 0:05:08.801,0:05:13.894 d is 1 modulo phi(N). What this means is[br]that e and d first of all are relatively 0:05:13.894,0:05:19.051 prime to phi(N) and second of all they're[br]actually modular inverses of one another, 0:05:19.051,0:05:24.014 modulo phi(N). And then we output the public[br]key as the pair N,e and the 0:05:24.014,0:05:29.172 secret key is the secret key N,d. I should mention that the exponent e, 0:05:29.172,0:05:34.586 that the number e is sometimes called the[br]encryption exponent and the exponent d is 0:05:34.586,0:05:39.135 sometimes called the decryption exponent.[br]And you'll see why I keep referring to 0:05:39.135,0:05:43.189 these as exponents in just a second. Now[br]the way the RSA function itself is 0:05:43.189,0:05:46.902 defined is really simple. For simplicity[br]I'm gonna define it as the function 0:05:46.902,0:05:51.801 from ZN star to ZN star. And the way[br]the function is defined is simply given an 0:05:51.801,0:05:57.001 input X, all we do is we simply take X and[br]raise it to the power of e in ZN. So we 0:05:57.001,0:06:02.137 just compute X to the e, modulo N. That's[br]it. And then to decrypt, what we do is we 0:06:02.137,0:06:07.451 simply, given an input Y, we simply raise[br]Y to the power of d, modulo N. And that's 0:06:07.451,0:06:12.483 it. So now you can see why as I refer to e[br]and d as exponents. They're the things 0:06:12.483,0:06:17.767 that X and Y are being raised to. So let's[br]quickly verify that F inverse really does 0:06:17.767,0:06:22.673 invert the function F. So let's see what[br]happens when we compute Y to the d. So 0:06:22.673,0:06:27.515 suppose Y itself happens to be the RSA[br]function of some value X. In which case, 0:06:27.515,0:06:33.045 what Y to the d is, is really RSA of X to[br]the power of d. While X by itself is 0:06:33.045,0:06:39.006 simply gonna be X to the e modulo N, And[br]therefore, Y to the d is simply X to the e 0:06:39.006,0:06:44.896 times d modulo N. Again, just using rules[br]of exponentiation, the exponents multiply, 0:06:44.896,0:06:50.857 so we get X to the e times d. But what do[br]we know about e times d? e times d we said 0:06:50.857,0:06:57.390 is one modulo phi(N). And what that means is[br]there exists some integer such that e 0:06:57.390,0:07:04.177 times d is equal to K times phi(N) plus one.[br]This is what it means that e times d is 0:07:04.177,0:07:09.820 one modulo phi(N). So, we can simply replace e[br]times d by K times phi(N)+1. So, that's 0:07:09.820,0:07:14.453 what I wrote here. So, we have X to the[br]power of K times phi(N)+1. But now again 0:07:14.453,0:07:19.868 just using rules of exponentiation, we can[br]re-write this as X to the power of phi(N) to 0:07:19.868,0:07:24.827 the power of K times X. This is really the[br]same thing as K times phi(N)+1 as the 0:07:24.827,0:07:29.917 exponent. I just kind of separated the[br]exponent into it's different components. 0:07:29.917,0:07:35.137 Now, X to the phi(N) by Euler's theorem, we know[br]that X to the phi(N) is equal to one. So what 0:07:35.137,0:07:41.394 is this whole product equal to? Well since[br]X to the phi(N) is equal to one, one to 0:07:41.394,0:07:45.961 the K is also equal to one, so this whole[br]thing over here is simply equal to one. 0:07:45.961,0:07:50.757 And what we're left with is X. So what we[br]just proved is that if I take the output of 0:07:50.757,0:07:55.210 the RSA function and raise it to the power[br]of 'd', I get back X. Which means that 0:07:55.210,0:07:59.663 raising to the power of 'd' basically[br]inverts the RSA function, which is what we 0:07:59.663,0:08:04.638 wanted to show. So that's it, that's the[br]whole description of the function, we've 0:08:04.638,0:08:08.972 described the key generation. The function[br]itself, which is simply raising things to 0:08:08.972,0:08:13.410 the power of e modulo N, and the inversion[br]function which is simply raising things to 0:08:13.410,0:08:17.483 the power of d, also modulo N. The next[br]question is, why is this function secure? 0:08:17.483,0:08:21.609 In other words, why is this function one[br]way if all I have is just a public key, 0:08:21.609,0:08:26.409 but I don't have the secret key? And so to[br]argue that this function is one way we 0:08:26.409,0:08:31.454 basically state the RSA assumption which[br]basically says that the RSA function is a 0:08:31.454,0:08:36.626 one way permutation. And formally the way[br]we state that is that, basically for all 0:08:36.626,0:08:41.416 efficient algorithms, A, it so happens[br]that if I generate two primes p and q, 0:08:41.416,0:08:46.397 random primes p and q. I multiply them to[br]get to modulus N and then I choose a 0:08:46.397,0:08:51.103 random 'y' in ZN star. And now I give[br]the modulus, the exponent and this 'y' to 0:08:51.103,0:08:55.893 algorithm A, the probability that I'll get[br]the inverse of RSA at the point Y, mainly 0:08:55.893,0:09:00.336 I'll get Y to the power of one over E.[br]That's really what the inverse is. This 0:09:00.336,0:09:04.606 probability is negligible. So this[br]assumption is called the RSA assumption. 0:09:04.606,0:09:09.338 It basically states that RSA is a one way[br]permutation just given the public [key?]. And 0:09:09.338,0:09:13.954 therefore, it is a trapdoor permutation[br]because it has a trapdoor. And makes this 0:09:13.954,0:09:19.501 easy to invert for anyone who knows the[br]trap door. So now that we have a secure 0:09:19.501,0:09:23.717 trap door permutation, we can simply plug[br]that into our construction for public key 0:09:23.717,0:09:27.826 encryption, and get our first real world[br]public key encryption system. And so what 0:09:27.826,0:09:32.362 we'll do is we'll simply plug the, the RSA[br]trapdoor permutation into the iso standard 0:09:32.362,0:09:36.151 construction that we saw in the previous[br]segment. So, if you recall, that 0:09:36.151,0:09:40.207 construction was based on a symmetric[br]encryption system that had to provide 0:09:40.207,0:09:44.438 authenticated encryption. And it was also[br]based on a hash function. Then mapped 0:09:44.615,0:09:48.866 while transferring it into the world of[br]RSA, it maps elements in 0:09:48.866,0:09:54.202 ZN, into secret keys for the[br]symmetric key system. And now the 0:09:54.202,0:09:58.947 way the encryption scheme works is really[br]easy to describe. Basically algorithm G 0:09:58.947,0:10:03.751 just runs the RSA key generation algorithm[br]and produces a public key and a secret 0:10:03.751,0:10:07.813 key. Just as before. So you notice the[br]public key contains the encryption 0:10:07.813,0:10:11.948 exponent and the, secret key contains the[br]decryption exponent. And the way we 0:10:11.948,0:10:16.298 encrypt is as follows. Well, we're going[br]to choose a random X in ZN. We're going 0:10:16.298,0:10:21.468 to apply the RSA function to this X, we're[br]going to deduce a symmetric key from this 0:10:21.468,0:10:26.453 X by hashing it, so we compute H of X to[br]obtain the key K, and then we output this 0:10:26.453,0:10:31.130 Y along with the encryption of the message[br]under the symmetric key K. And in 0:10:31.130,0:10:35.930 practice, the hash function H would be[br]just implemented just using SHA-256, and 0:10:35.930,0:10:40.977 you would use the output of SHA-256 to[br]generate a symmetric key that could then 0:10:40.977,0:10:45.687 be used for the symmetric encryption[br]assistant. So that's how we would encrypt 0:10:45.687,0:10:50.084 and then the way we would decrypt is[br]pretty much as we saw in the previous 0:10:50.084,0:10:54.951 segment, where the first thing we would do[br]is we would use the secret key to invert 0:10:54.951,0:10:59.758 the header of the ciphertext. So we would[br]compute RSA invert of Y, that would give 0:10:59.758,0:11:04.566 us the value X. Then we would apply the[br]hash function to X so that this would give 0:11:04.566,0:11:09.198 us the key K. And then we would run the[br]decryption algorithm for the symmetric 0:11:09.198,0:11:15.171 system on the ciphertext and that would[br]produce the original message m. And then 0:11:15.171,0:11:19.103 we stated a theorem in the previous[br]segment to say that if the RSA trapdoor 0:11:19.103,0:11:23.087 permutation is secure, Es and Ds, the[br]symmetric encryption scheme [inaudible] 0:11:23.087,0:11:27.175 provides authenticated encryption. And as[br]we said, H is just random oracle. It's, 0:11:27.332,0:11:31.421 you know, kind of a random function from[br]ZN to the key space. Then, in fact, this 0:11:31.421,0:11:35.720 system is chosen cipher text secure, and[br]is a good public key system to use. 0:11:36.240,0:11:41.729 So now that we have our first example of a[br]good public key system to use, I wanna 0:11:41.729,0:11:46.978 quickly warn you about how not to use RSA[br]for encryption. And this again something 0:11:46.978,0:11:51.101 that we said in the previous segment. And[br]I'm going to repeat it here, except I'm 0:11:51.101,0:11:55.534 going to make it specific to RSA. So[br]I'd like to call this, textbook RSA. 0:11:55.534,0:11:59.400 Which basically is the first thing that[br]comes to mind when you think about 0:11:59.400,0:12:03.678 encrypting using RSA, namely, the secret[br]key and the public key will be as before. 0:12:03.678,0:12:08.162 But now instead of running through, a hash[br]function to generate a symmetric key, what 0:12:08.162,0:12:12.337 we would do is we would directly use RSA[br]to encrypt the given message M. And then 0:12:12.337,0:12:16.202 we would directly use the decryption[br]exponent to decrypt the cipher text and 0:12:16.202,0:12:20.773 obtain the plain text M. I'm going to call[br]this textbook RSA, because there are many 0:12:20.773,0:12:25.350 textbooks that describe RSA encryption in[br]this way. And this is completely wrong. 0:12:25.350,0:12:29.495 This is not how RSA encryption works.[br]It's an insecure system. In particular, 0:12:29.495,0:12:33.936 it's deterministic encryption, and so it[br]can't possibly be semantically secure, but 0:12:33.936,0:12:38.542 in fact there are many attacks exist that[br]I'm going to show you an attack in just a 0:12:38.542,0:12:42.709 minute, but the message that I want to[br]make clear here, is that RSA, all it is, 0:12:42.709,0:12:47.151 is a trap door permutation. By itself[br]it's not an encryption system. You have to 0:12:47.151,0:12:51.427 embellish it with this ISO standard for[br]example, to make it into an encryption 0:12:51.427,0:12:55.826 system. By itself, all it is, is just a[br]function. So let's see what goes wrong if 0:12:55.826,0:13:00.225 you try to use textbook RSA, In other[br]words, if you try to encrypt a message 0:13:00.225,0:13:04.975 using RSA directly. And I'll give you an[br]example attack from the world of the web. 0:13:04.975,0:13:09.725 So imagine we have a web server. The web[br]server has an RSA secret key. Here's it's 0:13:09.725,0:13:14.738 denoted by N and D. And here we have a web[br]browser who's trying to establish a secure 0:13:14.738,0:13:19.124 session, a secure SSL session, with the web[br]server. So the way SSL works is that the 0:13:19.124,0:13:23.401 web browser starts off by sending this[br]client hello message saying, hey, I want 0:13:23.401,0:13:27.787 to set up a secure session with you. The[br]web server responds with a server hello 0:13:27.787,0:13:32.430 message that contains the server's public[br]key And then the web browser will go ahead 0:13:32.430,0:13:36.615 and generate a random what's called a[br]premaster secret K, it will encrypt the 0:13:36.615,0:13:40.692 premaster secret using K and send the[br]result in ciphertext over to the web 0:13:40.692,0:13:44.932 server. The web server will decrypt and[br]then the web server will also get K, so 0:13:44.932,0:13:49.336 now the two have a shared key that they[br]can use to then secure a session between 0:13:49.336,0:13:53.630 them. So I want to show you what goes[br]wrong if we directly use the RSA function 0:13:53.630,0:13:57.762 for encrypting K. In other words, if[br]directly K is encrypted as K to the e 0:13:57.762,0:14:02.828 modulo N. Just for the sake of argument[br]let's suppose that K is a 64-bit key. 0:14:02.828,0:14:08.097 We're going to treat K as an integer in[br]the range as zero to 2 to the 64. 0:14:08.097,0:14:13.100 More precisely two to the 64 minus one,[br]and now what we're going to do is the 0:14:13.100,0:14:18.302 following. First of all, suppose it so[br]happens that K factors into a product of 0:14:18.302,0:14:23.705 roughly equal sized numbers. So K we can[br]write as K1 times K2, where K1 and K2 are 0:14:23.705,0:14:29.745 integers. And both are say less than two[br]to the 34. So, it turns out this happens 0:14:29.745,0:14:34.508 with probability roughly twenty percent so[br]one in five times K can actually be 0:14:34.508,0:14:39.740 written in this way. But, now if we plug[br]this K, K=K1 x K2 if we plug that into the 0:14:39.740,0:14:45.241 equation that defines the cipher text you[br]see that we can simply substitute K by K1 x k2 0:14:45.241,0:14:50.875 and then we can move k1 to the other side.[br]So then we end up with this equation here, 0:14:50.875,0:14:55.897 namely C over K1 to the e is equal to K2 to the e. You notice if I multiply both 0:14:55.897,0:15:00.659 sides by K1 to the e, I get that C is[br]equal to K1 times K2 to the e, 0:15:00.659,0:15:06.374 which is precisely this equation here.[br]Okay, so all I did is I just replaced K by 0:15:06.374,0:15:11.955 K1 times K2 and then divided by K1 to the[br]e. So by now this should look familiar to 0:15:11.955,0:15:16.146 you. So now we have two variables in this[br]equation, of course. C is known to the 0:15:16.146,0:15:20.092 attacker, E is known to the attacker, and[br]N is known to the attacker. The two 0:15:20.092,0:15:24.518 variables in this equation are K1 and[br]K2, and we've separated them into a 0:15:24.518,0:15:28.891 different side of the equation, and as a[br]result we can do now a meet in the middle 0:15:28.891,0:15:33.157 attack on this equation. So let's do the[br]meet in the middle attack. What we would 0:15:33.157,0:15:37.524 do is we would build a table of all[br]possible values Of the left-hand side. So 0:15:37.524,0:15:43.392 all possible values of C over K1 to the E,[br]there are 2 to the 34 of them. And then, 0:15:43.584,0:15:48.878 for all possible values on the right-hand[br]side, [inaudible] for all possible values 0:15:48.878,0:15:54.175 of K2 to the e, we're gonna check whether[br]this value here lives in the table that we 0:15:54.175,0:15:58.749 constructed in step one. And if it is then[br]we found a collision, and basically we 0:15:58.749,0:16:03.265 have a solution to this equation. So as[br]soon as we find an element of the form K2 0:16:03.265,0:16:07.962 to the E that lives in the table that[br]we built in step one, we've solved this 0:16:07.962,0:16:12.717 equation and we found K1 and K2. And[br]of course once we found K1 and K2, 0:16:12.717,0:16:16.950 we can easily recover K simply by[br]multiplying them. So then we multiply K1 0:16:16.950,0:16:21.428 and K2 and we get, the secret key[br]K. Okay? So, we've broken, basically, 0:16:21.428,0:16:25.604 this, this encryption system. And how long[br]did it take? Well, by brute force, we 0:16:25.604,0:16:29.890 could have broken it in time 2 to the 64,[br]simply by trying all possible keys. But 0:16:29.890,0:16:33.792 this attack, you notice, it took 2 to[br]the 34 time for step number one. Well, it 0:16:33.792,0:16:38.353 took a little bit more than 2 to the 34,[br]'cause we had to do these exponentiations. 0:16:38.518,0:16:42.969 It took 2 to the 34 time for step two[br]against slightly more than two to the 34 0:16:42.969,0:16:47.530 because of the exponentiations. So let's[br]say that the whole algorithm took time two 0:16:47.530,0:16:52.277 to the 40. The point is that this is much,[br]much less time due to the 64. So here you 0:16:52.277,0:16:56.667 have an example. Where if you encrypt[br]using RSA directly, in other words you 0:16:56.667,0:17:01.296 directly compute, K to the E, mod N,[br]instead of going through the ISO standard, 0:17:01.296,0:17:05.983 for example, then you get an attack that[br]runs much faster than exhaustive search. 0:17:05.983,0:17:10.730 You get an attack that runs in time two to[br]the 40, rather than time two to the 64. 0:17:10.730,0:17:14.985 Okay, so this is a cute example of how[br]things can break if you use the RSA 0:17:14.985,0:17:19.299 trapdoor permutation directly to[br]encrypt a message. So the message to 0:17:19.299,0:17:23.670 remember here, is never, ever, ever use[br]RSA directly to encrypt. You have to use to go 0:17:23.670,0:17:27.868 through an encryption mechanism. For[br]example, like the ISO standard. And in 0:17:27.868,0:17:32.354 fact, in the next segment we are going to[br]see other ways of using RSA to build 0:17:32.354,0:17:33.620 public key encryption.