WEBVTT 00:00:00.000 --> 00:00:03.574 In the last segment, we saw that the ElGamal public key encryption system is 00:00:03.574 --> 00:00:07.525 chosen cipher text secure under a somewhat strange assumption. In this segment, we're 00:00:07.525 --> 00:00:11.146 gonna look at variants of ElGamal that have a much better chosen cipher text 00:00:11.146 --> 00:00:14.909 security analysis. Now, I should say that over the past decade, there's been a ton 00:00:14.909 --> 00:00:18.766 of research on constructing, public key encryptions that are chosen cipher text 00:00:18.766 --> 00:00:22.387 secure. I actually debated for some time on how to best present this here. And 00:00:22.387 --> 00:00:26.197 finally, I decided to kind of show you a survey of the main results from the last 00:00:26.197 --> 00:00:29.960 decades, specifically as they apply to the ElGamal system. And then, at the end of 00:00:29.960 --> 00:00:33.770 the module, I suggest a number of papers that you can look at for further reading. 00:00:33.770 --> 00:00:38.332 So let me start by reminding you how the ElGamal encryption system works. I'm sure 00:00:38.332 --> 00:00:42.783 by now you all remember how ElGamal works, but just to be safe, let me remind you 00:00:42.783 --> 00:00:47.623 that key generation in ElGamal picks a random generator, a random exponent from ZN 00:00:47.623 --> 00:00:51.963 and then the public key is simply the generator and this element g to the a, 00:00:51.963 --> 00:00:56.332 whereas the secret key is simply the discrete log of h base g. This value A. 00:00:56.332 --> 00:01:01.255 Now the way we encrypt is we pick a random exponent B from ZN. We compute the hash of 00:01:01.255 --> 00:01:05.947 G to the B and H to the B. And I'm gonna remind you that H to the B is the Diffie 00:01:05.947 --> 00:01:10.233 Hellman secret, G to the AB. And then we actually encrypt a message using a 00:01:10.233 --> 00:01:15.156 symmetric encryption system with the key K that's derived from the hash function. And 00:01:15.156 --> 00:01:19.731 finally, the output cipher text is G to the B, and the symmetric cipher text. The 00:01:19.731 --> 00:01:24.654 way we decrypt is you know, as we've seen before basically, by hashing U and the 00:01:24.654 --> 00:01:28.940 Diffie Hellman secrets, decrypting the symmetric system, and outputting the 00:01:28.940 --> 00:01:33.645 message M. Now in the last segment we said that ElGamal is chosen ciphertext secure under 00:01:33.645 --> 00:01:37.881 this funny Interactive Diffie-Hellman assumption. I am not gonna remind you what 00:01:37.881 --> 00:01:42.447 the assumption is here but I'll also say that this theorem kind of raises two very 00:01:42.447 --> 00:01:46.683 natural questions. The first question is can we prove CCA security of 00:01:46.683 --> 00:01:50.864 ElGamal just based on the standard Computational Diffie-Hellman assumption, 00:01:50.864 --> 00:01:55.011 namely the G to the A, G to the B, you can't compute G to the AB. Can we prove 00:01:55.011 --> 00:01:59.259 chosen-ciphertext security just based on that? And the truth's that there are two 00:01:59.259 --> 00:02:03.454 ways to do it. The first option is to use a group where the computational Diffie 00:02:03.454 --> 00:02:07.306 Hellman assumption is hard. But is provably equivalent to the Interactive 00:02:07.306 --> 00:02:11.227 Diffie Hellman assumption. And it turns out it's actually not that hard to 00:02:11.227 --> 00:02:15.147 construct such groups. These groups are called bilinear groups. And in such 00:02:15.147 --> 00:02:19.544 groups, we know that the ElGamal system is chosen cipher text secure, strictly based 00:02:19.544 --> 00:02:23.782 under the Computational Diffie Hellman assumption, at least in the random oracle 00:02:23.782 --> 00:02:28.983 model. I'll tell you that these bi-linear groups are actually constructed from very 00:02:28.983 --> 00:02:33.715 special elliptic curves. So there's a bit more algebraic structure that enables us 00:02:33.715 --> 00:02:38.404 to prove this equivalence of IDH and CDH. But in general, who knows, maybe you don't 00:02:38.404 --> 00:02:42.928 want to use elliptic curve groups, maybe you want to use ZP star for some reason. 00:02:42.928 --> 00:02:47.817 So it's a pretty natural question to ask. Can we change the ElGamal system such that 00:02:47.817 --> 00:02:52.828 chosen ciphertext security of ElGamal now can be proven, directly based on CDH, for any group 00:02:52.828 --> 00:02:57.840 where CDH is hard? [Now with that ??] assuming any additional structure about the group, 00:02:57.840 --> 00:03:02.132 And it turns out the answer is yes. And there's kind of an elegant construction 00:03:02.132 --> 00:03:06.696 called twin ElGamal, so let me show you how twin ElGamal works. It's a very simple 00:03:06.696 --> 00:03:10.607 variation of ElGamal that does the following. Again, in key generation, we 00:03:10.607 --> 00:03:14.954 choose a random generator. But this time, we're gonna choose two exponents, A1 and 00:03:14.954 --> 00:03:19.409 A2 as the secret key. So the secret key is gonna consist of these two exponents, A1 00:03:19.409 --> 00:03:23.783 and A2. You know the public key of course is gonna consist of our generator. And 00:03:23.783 --> 00:03:28.340 then we're gonna release G to the A1 and G to the A2. So remember that in regular 00:03:28.340 --> 00:03:32.840 ElGamal what the public key is simply g to the A and that's it. Here we have two 00:03:32.840 --> 00:03:37.340 exponents A1 and A2 and therefore we, we release both G to the A1 and G to the A2. 00:03:37.340 --> 00:03:42.297 Now the way we encrypt, you'll notice the public key here is one element longer than 00:03:42.297 --> 00:03:47.137 regular ElGamal. The way we encrypt using this public key is actually very similar 00:03:47.137 --> 00:03:51.859 to regular ElGamal. We choose a random B, and now what we'll hash is actually not 00:03:51.859 --> 00:03:56.522 just G to the B and H1 to the B, but, in fact, G to the B, H1 to the B, and H2 00:03:56.522 --> 00:04:01.691 to the B. Okay so we basically hashing three elements instead of two elements and 00:04:01.691 --> 00:04:06.642 then we basically encrypt the message using the derived symmetric encryption key 00:04:06.642 --> 00:04:11.410 and as usual we output g to the b and c as our final ciphertext. How do we decrypt? 00:04:11.410 --> 00:04:16.027 Well, so now the secret key consists of these two exponents, A1 and A2. And the 00:04:16.027 --> 00:04:20.584 cipher text consists of U, and the symmetric cipher text C. So let me ask you 00:04:20.584 --> 00:04:25.501 a puzzle, and see if you can figure out how to derive the symmetric encryption key 00:04:25.501 --> 00:04:31.934 K, just given the secret key and the value U. So it's kind of a cute puzzle and I 00:04:31.934 --> 00:04:37.250 hope everybody worked out, the solution which is basically what we can do is we 00:04:37.250 --> 00:04:42.566 can take U to the power of A1, and that is basically G to the B A1 And U to the A2 00:04:42.566 --> 00:04:48.012 which is G to the B A2. And that basically gives us exactly the same values, as H1 to 00:04:48.012 --> 00:04:53.069 the B and H2 to the B. So this way the decryptor arrives at the same symmetric 00:04:53.069 --> 00:04:58.515 key that the encryptor did. Then he decrypts the ciphertext using the symmetric system 00:04:58.515 --> 00:05:03.389 and finally outputs the message M. So you notice this is a very simple modification 00:05:03.389 --> 00:05:07.802 to regular ElGamal. All we do is we stick one more element to the public key. When 00:05:07.802 --> 00:05:11.888 we do the hashing, we hash one more element, as opposed to just two elements. 00:05:11.888 --> 00:05:16.246 We hash three elements. And similarly, we do doing decryption, and nothing else 00:05:16.246 --> 00:05:20.491 changes. The cipher text is the same length as before, and that's it, Now the 00:05:20.491 --> 00:05:24.647 amazing thing is that this simple modification allows us to now prove chosen 00:05:24.647 --> 00:05:28.640 cipher text security directly based on standard Computational Diffie-Hellman 00:05:28.640 --> 00:05:32.795 assumption, okay? Still we're going to need to assume that we have a symmetric 00:05:32.795 --> 00:05:36.897 encryption system that provides us authenticated encryption and that the hash 00:05:36.897 --> 00:05:41.430 function that we're using is an ideal hash function in what we call a random oracle 00:05:41.430 --> 00:05:45.747 But nevertheless given that, we can prove chosen cipher text security strictly 00:05:45.747 --> 00:05:49.803 based on a Computational Diffie-Hellman assumption. So now what's the cost of this? 00:05:49.803 --> 00:05:53.966 Of course, if you think about this, during encryption and decryption, encryption has 00:05:53.966 --> 00:05:57.418 to do one more exponentiation, and the decryption has to do one more 00:05:57.418 --> 00:06:01.581 exponentiation. So the encryptor now does three exponentiations instead of two, and 00:06:01.581 --> 00:06:05.490 the decryptor does two exponentiations instead of one. So the question is now, 00:06:05.490 --> 00:06:09.965 now it's a philosophical question. Is this extra effort worth it? So you do more work 00:06:09.965 --> 00:06:14.228 during encryption and decryption. And your public key is a little bit bigger, but 00:06:14.228 --> 00:06:18.331 that doesn't really matter. The work during encryption and decryption is more 00:06:18.331 --> 00:06:22.434 significant. And as a result you get chosen ciphertext security based on kind 00:06:22.434 --> 00:06:26.644 of a more natural assumption, namely Computational Diffie-Hellman as opposed to 00:06:26.644 --> 00:06:30.480 these odd-looking Interactive Diffie-Hellman assumption. But is it worth 00:06:30.480 --> 00:06:34.743 it? Kind of the question is are there groups where CDH holds but IDH does not 00:06:34.743 --> 00:06:38.815 hold? If there were such groups, then it would definitely be worth it, because 00:06:38.815 --> 00:06:43.050 those groups, the twin ElGamal would be secure, but the regular ElGamal would not 00:06:43.050 --> 00:06:47.508 be CCA secure. But unfortunately we don't know if there is any such group and in 00:06:47.508 --> 00:06:51.383 fact as far as we know, it's certainly possible that any group where CDH holds, 00:06:51.383 --> 00:06:55.307 IDH also holds. So the answer is, really, we don't know whether the extra cost is 00:06:55.307 --> 00:06:59.530 worth it or not but then nevertheless it's a cute result that shows that if you want 00:06:59.530 --> 00:07:03.256 to achieve chosen ciphertext security directly from CDH, you could do 00:07:03.256 --> 00:07:08.051 it without too many changes to the ElGamal system. The next question is whether we 00:07:08.051 --> 00:07:12.230 can get rid of this assumption that the hash function is an ideal hash function 00:07:12.230 --> 00:07:16.617 mainly that it's a random oracle. And this is actually a huge topic. There's a lot of 00:07:16.617 --> 00:07:20.482 work in this area on how to build efficient public key encryption systems 00:07:20.482 --> 00:07:24.922 that are chosen ciphertext secure without random oracles. This is a very active area 00:07:24.922 --> 00:07:29.192 of research as I said in the past decade and even longer. And I'll mention that as 00:07:29.192 --> 00:07:33.379 it applies to ElGamal, they are basically, again two families of constructions. The 00:07:33.379 --> 00:07:37.515 first construction's a construction that uses these special groups called these 00:07:37.515 --> 00:07:41.599 bilinear groups that I just mentioned before. It turns out the extra structure 00:07:41.599 --> 00:07:45.682 of these bilinear groups allows us to build very efficient chosen ciphertext 00:07:45.682 --> 00:07:50.128 secure systems without referring to random oracles and as I said at the end of the 00:07:50.128 --> 00:07:54.367 module, I point to a number of papers that explain how to do that. This is quite an 00:07:54.367 --> 00:07:58.876 interesting construction. But it will take me several hours to kind of explain how it 00:07:58.876 --> 00:08:03.434 works. The other alternative is actually to use groups where a stronger assumption 00:08:03.434 --> 00:08:07.830 called the Decision Diffie-Hellman assumption holds. Again, I am not gonna define this 00:08:07.830 --> 00:08:11.798 assumption here. I'll just tell you that this assumption actually holds in 00:08:11.798 --> 00:08:16.141 subgroups of ZP star, in particular if we look at the prime order of a subgroup of 00:08:16.141 --> 00:08:19.812 ZP star. The Decision Diffie-Hellman assumption seems to hold in those groups 00:08:19.812 --> 00:08:23.852 and then in those groups we can actually build a variant of ElGamal, called the 00:08:23.852 --> 00:08:27.141 Cramer Shoup system that is chosen ciphertext secure and the Decision 00:08:27.141 --> 00:08:30.665 Diffie-Hellman assumption without random oracles. Again, this is a beautiful, 00:08:30.665 --> 00:08:34.659 beautiful result but again it would take a couple of hours to explain and so I'm not 00:08:34.659 --> 00:08:38.324 gonna do that here. This is probably one of these things that I gonna leave to 00:08:38.324 --> 00:08:42.083 either the advanced topics or to do an advanced course so that we'll do it at a 00:08:42.083 --> 00:08:46.335 later time. But again I points to a paper at the end of the module just in case you 00:08:46.335 --> 00:08:50.626 want to read more about this. So here is a list of papers that provides more reading 00:08:50.626 --> 00:08:54.208 material. So if you wanna read about Diffie Hellman assumptions, I guess I 00:08:54.208 --> 00:08:58.036 wrote a paper about this a long time ago, that talks about various assumptions 00:08:58.036 --> 00:09:01.716 related to, to Diffie Hellman. And in particular, studies the Decision Diffie 00:09:01.716 --> 00:09:05.685 Hellman assumption. And if you wanna learn about how to build chosen ciphertext 00:09:05.685 --> 00:09:10.098 secure system from Decision Diffie-Hellman and assumptions like it. There's a very 00:09:10.098 --> 00:09:14.563 cute paper by Kramer and Shoup back from 2002 that shows how to do it. This is 00:09:14.563 --> 00:09:18.618 again a very highly recommended paper. If you want to learn how to build chosen 00:09:18.618 --> 00:09:22.672 ciphertext security from these bilinear groups, then the paper to read is 00:09:22.672 --> 00:09:26.983 the one, cited here, which actually uses a general mechanism called Identity Based 00:09:26.983 --> 00:09:30.884 Encryption which very surprisingly it turns out to actually gives us chosen 00:09:30.884 --> 00:09:34.638 ciphertext security almost for free. So, once we build identity-based 00:09:34.638 --> 00:09:38.486 encryption chosen ciphertext security falls immediately. And that's covered in this 00:09:38.486 --> 00:09:42.283 paper that I, that I describe her. The Twin Diffie-Hellman construction and its 00:09:42.283 --> 00:09:46.381 proof is described in this paper that I reference here And finally, if you kind of 00:09:46.381 --> 00:09:50.135 want to see a very recent paper. That gives a very general framework for 00:09:50.135 --> 00:09:54.345 building, chosen ciphertext secure systems, using what's called extractable hash proofs that 00:09:54.345 --> 00:09:58.506 is again a nice paper by Hoeteck Wee, that was just recently published. I definitely 00:09:58.506 --> 00:10:02.416 recommend reading it all, all have very, very elegant ideas in them. I wish I 00:10:02.416 --> 00:10:06.426 could cover all of them in the basic course but I'm gonna have to leave some of 00:10:06.426 --> 00:10:10.436 these to the more advanced course or perhaps the more advanced topics at the 00:10:10.436 --> 00:10:14.446 end of this course. Okay, so I'll stop here and in the next segment I'm gonna tie 00:10:14.446 --> 00:10:18.506 ElGamal and RSA encryption together so that you see how the two kind 00:10:18.506 --> 00:10:20.512 of follow from a more general principle.