1 00:00:00,000 --> 00:00:03,574 In the last segment, we saw that the ElGamal public key encryption system is 2 00:00:03,574 --> 00:00:07,525 chosen cipher text secure under a somewhat strange assumption. In this segment, we're 3 00:00:07,525 --> 00:00:11,146 gonna look at variants of ElGamal that have a much better chosen cipher text 4 00:00:11,146 --> 00:00:14,909 security analysis. Now, I should say that over the past decade, there's been a ton 5 00:00:14,909 --> 00:00:18,766 of research on constructing, public key encryptions that are chosen cipher text 6 00:00:18,766 --> 00:00:22,387 secure. I actually debated for some time on how to best present this here. And 7 00:00:22,387 --> 00:00:26,197 finally, I decided to kind of show you a survey of the main results from the last 8 00:00:26,197 --> 00:00:29,960 decades, specifically as they apply to the ElGamal system. And then, at the end of 9 00:00:29,960 --> 00:00:33,770 the module, I suggest a number of papers that you can look at for further reading. 10 00:00:33,770 --> 00:00:38,332 So let me start by reminding you how the ElGamal encryption system works. I'm sure 11 00:00:38,332 --> 00:00:42,783 by now you all remember how ElGamal works, but just to be safe, let me remind you 12 00:00:42,783 --> 00:00:47,623 that key generation in ElGamal picks a random generator, a random exponent from ZN 13 00:00:47,623 --> 00:00:51,963 and then the public key is simply the generator and this element g to the a, 14 00:00:51,963 --> 00:00:56,332 whereas the secret key is simply the discrete log of h base g. This value A. 15 00:00:56,332 --> 00:01:01,255 Now the way we encrypt is we pick a random exponent B from ZN. We compute the hash of 16 00:01:01,255 --> 00:01:05,947 G to the B and H to the B. And I'm gonna remind you that H to the B is the Diffie 17 00:01:05,947 --> 00:01:10,233 Hellman secret, G to the AB. And then we actually encrypt a message using a 18 00:01:10,233 --> 00:01:15,156 symmetric encryption system with the key K that's derived from the hash function. And 19 00:01:15,156 --> 00:01:19,731 finally, the output cipher text is G to the B, and the symmetric cipher text. The 20 00:01:19,731 --> 00:01:24,654 way we decrypt is you know, as we've seen before basically, by hashing U and the 21 00:01:24,654 --> 00:01:28,940 Diffie Hellman secrets, decrypting the symmetric system, and outputting the 22 00:01:28,940 --> 00:01:33,645 message M. Now in the last segment we said that ElGamal is chosen ciphertext secure under 23 00:01:33,645 --> 00:01:37,881 this funny Interactive Diffie-Hellman assumption. I am not gonna remind you what 24 00:01:37,881 --> 00:01:42,447 the assumption is here but I'll also say that this theorem kind of raises two very 25 00:01:42,447 --> 00:01:46,683 natural questions. The first question is can we prove CCA security of 26 00:01:46,683 --> 00:01:50,864 ElGamal just based on the standard Computational Diffie-Hellman assumption, 27 00:01:50,864 --> 00:01:55,011 namely the G to the A, G to the B, you can't compute G to the AB. Can we prove 28 00:01:55,011 --> 00:01:59,259 chosen-ciphertext security just based on that? And the truth's that there are two 29 00:01:59,259 --> 00:02:03,454 ways to do it. The first option is to use a group where the computational Diffie 30 00:02:03,454 --> 00:02:07,306 Hellman assumption is hard. But is provably equivalent to the Interactive 31 00:02:07,306 --> 00:02:11,227 Diffie Hellman assumption. And it turns out it's actually not that hard to 32 00:02:11,227 --> 00:02:15,147 construct such groups. These groups are called bilinear groups. And in such 33 00:02:15,147 --> 00:02:19,544 groups, we know that the ElGamal system is chosen cipher text secure, strictly based 34 00:02:19,544 --> 00:02:23,782 under the Computational Diffie Hellman assumption, at least in the random oracle 35 00:02:23,782 --> 00:02:28,983 model. I'll tell you that these bi-linear groups are actually constructed from very 36 00:02:28,983 --> 00:02:33,715 special elliptic curves. So there's a bit more algebraic structure that enables us 37 00:02:33,715 --> 00:02:38,404 to prove this equivalence of IDH and CDH. But in general, who knows, maybe you don't 38 00:02:38,404 --> 00:02:42,928 want to use elliptic curve groups, maybe you want to use ZP star for some reason. 39 00:02:42,928 --> 00:02:47,817 So it's a pretty natural question to ask. Can we change the ElGamal system such that 40 00:02:47,817 --> 00:02:52,828 chosen ciphertext security of ElGamal now can be proven, directly based on CDH, for any group 41 00:02:52,828 --> 00:02:57,840 where CDH is hard? [Now with that ??] assuming any additional structure about the group, 42 00:02:57,840 --> 00:03:02,132 And it turns out the answer is yes. And there's kind of an elegant construction 43 00:03:02,132 --> 00:03:06,696 called twin ElGamal, so let me show you how twin ElGamal works. It's a very simple 44 00:03:06,696 --> 00:03:10,607 variation of ElGamal that does the following. Again, in key generation, we 45 00:03:10,607 --> 00:03:14,954 choose a random generator. But this time, we're gonna choose two exponents, A1 and 46 00:03:14,954 --> 00:03:19,409 A2 as the secret key. So the secret key is gonna consist of these two exponents, A1 47 00:03:19,409 --> 00:03:23,783 and A2. You know the public key of course is gonna consist of our generator. And 48 00:03:23,783 --> 00:03:28,340 then we're gonna release G to the A1 and G to the A2. So remember that in regular 49 00:03:28,340 --> 00:03:32,840 ElGamal what the public key is simply g to the A and that's it. Here we have two 50 00:03:32,840 --> 00:03:37,340 exponents A1 and A2 and therefore we, we release both G to the A1 and G to the A2. 51 00:03:37,340 --> 00:03:42,297 Now the way we encrypt, you'll notice the public key here is one element longer than 52 00:03:42,297 --> 00:03:47,137 regular ElGamal. The way we encrypt using this public key is actually very similar 53 00:03:47,137 --> 00:03:51,859 to regular ElGamal. We choose a random B, and now what we'll hash is actually not 54 00:03:51,859 --> 00:03:56,522 just G to the B and H1 to the B, but, in fact, G to the B, H1 to the B, and H2 55 00:03:56,522 --> 00:04:01,691 to the B. Okay so we basically hashing three elements instead of two elements and 56 00:04:01,691 --> 00:04:06,642 then we basically encrypt the message using the derived symmetric encryption key 57 00:04:06,642 --> 00:04:11,410 and as usual we output g to the b and c as our final ciphertext. How do we decrypt? 58 00:04:11,410 --> 00:04:16,027 Well, so now the secret key consists of these two exponents, A1 and A2. And the 59 00:04:16,027 --> 00:04:20,584 cipher text consists of U, and the symmetric cipher text C. So let me ask you 60 00:04:20,584 --> 00:04:25,501 a puzzle, and see if you can figure out how to derive the symmetric encryption key 61 00:04:25,501 --> 00:04:31,934 K, just given the secret key and the value U. So it's kind of a cute puzzle and I 62 00:04:31,934 --> 00:04:37,250 hope everybody worked out, the solution which is basically what we can do is we 63 00:04:37,250 --> 00:04:42,566 can take U to the power of A1, and that is basically G to the B A1 And U to the A2 64 00:04:42,566 --> 00:04:48,012 which is G to the B A2. And that basically gives us exactly the same values, as H1 to 65 00:04:48,012 --> 00:04:53,069 the B and H2 to the B. So this way the decryptor arrives at the same symmetric 66 00:04:53,069 --> 00:04:58,515 key that the encryptor did. Then he decrypts the ciphertext using the symmetric system 67 00:04:58,515 --> 00:05:03,389 and finally outputs the message M. So you notice this is a very simple modification 68 00:05:03,389 --> 00:05:07,802 to regular ElGamal. All we do is we stick one more element to the public key. When 69 00:05:07,802 --> 00:05:11,888 we do the hashing, we hash one more element, as opposed to just two elements. 70 00:05:11,888 --> 00:05:16,246 We hash three elements. And similarly, we do doing decryption, and nothing else 71 00:05:16,246 --> 00:05:20,491 changes. The cipher text is the same length as before, and that's it, Now the 72 00:05:20,491 --> 00:05:24,647 amazing thing is that this simple modification allows us to now prove chosen 73 00:05:24,647 --> 00:05:28,640 cipher text security directly based on standard Computational Diffie-Hellman 74 00:05:28,640 --> 00:05:32,795 assumption, okay? Still we're going to need to assume that we have a symmetric 75 00:05:32,795 --> 00:05:36,897 encryption system that provides us authenticated encryption and that the hash 76 00:05:36,897 --> 00:05:41,430 function that we're using is an ideal hash function in what we call a random oracle 77 00:05:41,430 --> 00:05:45,747 But nevertheless given that, we can prove chosen cipher text security strictly 78 00:05:45,747 --> 00:05:49,803 based on a Computational Diffie-Hellman assumption. So now what's the cost of this? 79 00:05:49,803 --> 00:05:53,966 Of course, if you think about this, during encryption and decryption, encryption has 80 00:05:53,966 --> 00:05:57,418 to do one more exponentiation, and the decryption has to do one more 81 00:05:57,418 --> 00:06:01,581 exponentiation. So the encryptor now does three exponentiations instead of two, and 82 00:06:01,581 --> 00:06:05,490 the decryptor does two exponentiations instead of one. So the question is now, 83 00:06:05,490 --> 00:06:09,965 now it's a philosophical question. Is this extra effort worth it? So you do more work 84 00:06:09,965 --> 00:06:14,228 during encryption and decryption. And your public key is a little bit bigger, but 85 00:06:14,228 --> 00:06:18,331 that doesn't really matter. The work during encryption and decryption is more 86 00:06:18,331 --> 00:06:22,434 significant. And as a result you get chosen ciphertext security based on kind 87 00:06:22,434 --> 00:06:26,644 of a more natural assumption, namely Computational Diffie-Hellman as opposed to 88 00:06:26,644 --> 00:06:30,480 these odd-looking Interactive Diffie-Hellman assumption. But is it worth 89 00:06:30,480 --> 00:06:34,743 it? Kind of the question is are there groups where CDH holds but IDH does not 90 00:06:34,743 --> 00:06:38,815 hold? If there were such groups, then it would definitely be worth it, because 91 00:06:38,815 --> 00:06:43,050 those groups, the twin ElGamal would be secure, but the regular ElGamal would not 92 00:06:43,050 --> 00:06:47,508 be CCA secure. But unfortunately we don't know if there is any such group and in 93 00:06:47,508 --> 00:06:51,383 fact as far as we know, it's certainly possible that any group where CDH holds, 94 00:06:51,383 --> 00:06:55,307 IDH also holds. So the answer is, really, we don't know whether the extra cost is 95 00:06:55,307 --> 00:06:59,530 worth it or not but then nevertheless it's a cute result that shows that if you want 96 00:06:59,530 --> 00:07:03,256 to achieve chosen ciphertext security directly from CDH, you could do 97 00:07:03,256 --> 00:07:08,051 it without too many changes to the ElGamal system. The next question is whether we 98 00:07:08,051 --> 00:07:12,230 can get rid of this assumption that the hash function is an ideal hash function 99 00:07:12,230 --> 00:07:16,617 mainly that it's a random oracle. And this is actually a huge topic. There's a lot of 100 00:07:16,617 --> 00:07:20,482 work in this area on how to build efficient public key encryption systems 101 00:07:20,482 --> 00:07:24,922 that are chosen ciphertext secure without random oracles. This is a very active area 102 00:07:24,922 --> 00:07:29,192 of research as I said in the past decade and even longer. And I'll mention that as 103 00:07:29,192 --> 00:07:33,379 it applies to ElGamal, they are basically, again two families of constructions. The 104 00:07:33,379 --> 00:07:37,515 first construction's a construction that uses these special groups called these 105 00:07:37,515 --> 00:07:41,599 bilinear groups that I just mentioned before. It turns out the extra structure 106 00:07:41,599 --> 00:07:45,682 of these bilinear groups allows us to build very efficient chosen ciphertext 107 00:07:45,682 --> 00:07:50,128 secure systems without referring to random oracles and as I said at the end of the 108 00:07:50,128 --> 00:07:54,367 module, I point to a number of papers that explain how to do that. This is quite an 109 00:07:54,367 --> 00:07:58,876 interesting construction. But it will take me several hours to kind of explain how it 110 00:07:58,876 --> 00:08:03,434 works. The other alternative is actually to use groups where a stronger assumption 111 00:08:03,434 --> 00:08:07,830 called the Decision Diffie-Hellman assumption holds. Again, I am not gonna define this 112 00:08:07,830 --> 00:08:11,798 assumption here. I'll just tell you that this assumption actually holds in 113 00:08:11,798 --> 00:08:16,141 subgroups of ZP star, in particular if we look at the prime order of a subgroup of 114 00:08:16,141 --> 00:08:19,812 ZP star. The Decision Diffie-Hellman assumption seems to hold in those groups 115 00:08:19,812 --> 00:08:23,852 and then in those groups we can actually build a variant of ElGamal, called the 116 00:08:23,852 --> 00:08:27,141 Cramer Shoup system that is chosen ciphertext secure and the Decision 117 00:08:27,141 --> 00:08:30,665 Diffie-Hellman assumption without random oracles. Again, this is a beautiful, 118 00:08:30,665 --> 00:08:34,659 beautiful result but again it would take a couple of hours to explain and so I'm not 119 00:08:34,659 --> 00:08:38,324 gonna do that here. This is probably one of these things that I gonna leave to 120 00:08:38,324 --> 00:08:42,083 either the advanced topics or to do an advanced course so that we'll do it at a 121 00:08:42,083 --> 00:08:46,335 later time. But again I points to a paper at the end of the module just in case you 122 00:08:46,335 --> 00:08:50,626 want to read more about this. So here is a list of papers that provides more reading 123 00:08:50,626 --> 00:08:54,208 material. So if you wanna read about Diffie Hellman assumptions, I guess I 124 00:08:54,208 --> 00:08:58,036 wrote a paper about this a long time ago, that talks about various assumptions 125 00:08:58,036 --> 00:09:01,716 related to, to Diffie Hellman. And in particular, studies the Decision Diffie 126 00:09:01,716 --> 00:09:05,685 Hellman assumption. And if you wanna learn about how to build chosen ciphertext 127 00:09:05,685 --> 00:09:10,098 secure system from Decision Diffie-Hellman and assumptions like it. There's a very 128 00:09:10,098 --> 00:09:14,563 cute paper by Kramer and Shoup back from 2002 that shows how to do it. This is 129 00:09:14,563 --> 00:09:18,618 again a very highly recommended paper. If you want to learn how to build chosen 130 00:09:18,618 --> 00:09:22,672 ciphertext security from these bilinear groups, then the paper to read is 131 00:09:22,672 --> 00:09:26,983 the one, cited here, which actually uses a general mechanism called Identity Based 132 00:09:26,983 --> 00:09:30,884 Encryption which very surprisingly it turns out to actually gives us chosen 133 00:09:30,884 --> 00:09:34,638 ciphertext security almost for free. So, once we build identity-based 134 00:09:34,638 --> 00:09:38,486 encryption chosen ciphertext security falls immediately. And that's covered in this 135 00:09:38,486 --> 00:09:42,283 paper that I, that I describe her. The Twin Diffie-Hellman construction and its 136 00:09:42,283 --> 00:09:46,381 proof is described in this paper that I reference here And finally, if you kind of 137 00:09:46,381 --> 00:09:50,135 want to see a very recent paper. That gives a very general framework for 138 00:09:50,135 --> 00:09:54,345 building, chosen ciphertext secure systems, using what's called extractable hash proofs that 139 00:09:54,345 --> 00:09:58,506 is again a nice paper by Hoeteck Wee, that was just recently published. I definitely 140 00:09:58,506 --> 00:10:02,416 recommend reading it all, all have very, very elegant ideas in them. I wish I 141 00:10:02,416 --> 00:10:06,426 could cover all of them in the basic course but I'm gonna have to leave some of 142 00:10:06,426 --> 00:10:10,436 these to the more advanced course or perhaps the more advanced topics at the 143 00:10:10,436 --> 00:10:14,446 end of this course. Okay, so I'll stop here and in the next segment I'm gonna tie 144 00:10:14,446 --> 00:10:18,506 ElGamal and RSA encryption together so that you see how the two kind 145 00:10:18,506 --> 00:10:20,512 of follow from a more general principle.