WEBVTT 00:00:00.000 --> 00:00:03.583 Last week, we learned a number theory that's needed for public key encryption. 00:00:03.583 --> 00:00:07.166 This week we're gonna put this knowledge to work, and we're gonna construct a 00:00:07.166 --> 00:00:10.889 number of secure public key encryption schemes. But first, we need to define what 00:00:10.889 --> 00:00:14.565 is public key encryption, and what does it mean for public key encryption to be 00:00:14.565 --> 00:00:18.241 secure? So let me remind you that in a public key encryption scheme, there is an 00:00:18.241 --> 00:00:21.778 encryption algorithm which is usually denoted by E, and there's a decryption 00:00:21.778 --> 00:00:25.361 algorithm which we denote by D. However here, the encryption algorithm takes a 00:00:25.361 --> 00:00:29.477 public key, while the decryption algorithm takes a secret key. This pair is called a 00:00:29.477 --> 00:00:34.356 key pair. And the public key is used for encrypting messages while the secret key 00:00:34.356 --> 00:00:39.002 is used for decrypting messages. So in this case a message m is encrypting using 00:00:39.002 --> 00:00:43.880 the public key and what comes out of that is the ciphertext c. And similarly the 00:00:43.880 --> 00:00:48.643 ciphertext is fed into the decryption algorithm and using the secret key, what 00:00:48.643 --> 00:00:53.577 comes out of the decryption algorithm is the original message m. Now public key 00:00:53.577 --> 00:00:57.989 encryption has many applications. Last week we saw the classic application which 00:00:57.989 --> 00:01:02.455 is session setup, namely, key exchange and for now we're just looking at key exchange 00:01:02.455 --> 00:01:06.867 that is secure against eavesdropping only. And if you remember the way the protocol 00:01:06.867 --> 00:01:11.227 works, basically Alice, what she would do is she would generate a public key secret 00:01:11.227 --> 00:01:15.546 pair. She would send the public key to Bob. Bob will generate a random X, which 00:01:15.546 --> 00:01:20.136 is gonna serve as their shared secret, and then he sends X encrypted to Alice, 00:01:20.136 --> 00:01:24.904 encrypted under her public key. Alice can decrypt, recover X and now both of them 00:01:24.904 --> 00:01:29.554 have this shared secret X which they can use to communicate securely with one 00:01:29.554 --> 00:01:34.143 another. The attacker, of course, all he gets to see is just the public key, the 00:01:34.143 --> 00:01:38.972 encryption of X under the public key, from which he should not be able to get any 00:01:38.972 --> 00:01:43.800 information about X. And we are going to define that more precisely to understand 00:01:43.800 --> 00:01:48.507 what it means to not be able to learn anything about X. Public key encryption 00:01:48.507 --> 00:01:52.522 actually has many other applications. For example, it's very useful in 00:01:52.522 --> 00:01:57.235 non-interactive applications. So think of an email system for example. So here, Bob 00:01:57.235 --> 00:02:01.716 wants to send mail to Alice, and as Bob sends the email, the email passes from 00:02:01.716 --> 00:02:06.603 mail relay to mail relay until finally it reaches Alice, at which point Alice should 00:02:06.603 --> 00:02:10.502 decrypt. The way the email system is set up, is designed for kind of 00:02:10.502 --> 00:02:15.045 non-interactive settings where Bob sends the email. And then Alice is supposed to 00:02:15.045 --> 00:02:19.195 receive it. And Alice should not be to communicate with Bob in order to decrypt 00:02:19.195 --> 00:02:23.502 the email. So in this case, because of the non-interactivity, there's no opportunity 00:02:23.502 --> 00:02:27.705 for setting up a shared secret between Alice and Bob. So in this case, what would 00:02:27.705 --> 00:02:32.169 happen is, Bob basically would, would send the email encrypted, using Alice's, public 00:02:32.169 --> 00:02:36.571 key. So he sends the email. Anyone in the world can send the email encrypted to 00:02:36.571 --> 00:02:41.103 Alice, encrypted using her public key. When Alice receives this email, she uses 00:02:41.103 --> 00:02:45.748 her secret key to decrypt the ciphertext and recover the plain text message. 00:02:45.748 --> 00:02:50.507 Of course the one caveat in a system like this is that in fact Bob needs to somehow 00:02:50.507 --> 00:02:54.804 obtain Alice's public key So for now we are just going to assume Bob already has 00:02:54.804 --> 00:02:58.297 Alice's public key, but later on, actually, when we talk about digital 00:02:58.297 --> 00:03:02.457 signatures we're gonna see how, this can actually be done very efficiently using what's 00:03:02.457 --> 00:03:06.823 called public key management and as I said we'll actually get back to that at a later 00:03:06.823 --> 00:03:10.931 time. But the main thing I want you to remember, is that public key encryption is 00:03:10.931 --> 00:03:14.578 used for session setup. This is very common on the web, where public key 00:03:14.578 --> 00:03:18.840 encryption is used to set up a secure key between a web browser and, and web server. 00:03:18.840 --> 00:03:22.898 And public key encryption is also very useful for non-interactive applications, 00:03:22.898 --> 00:03:26.390 where anyone in the world, non-interactively, needs to send a message 00:03:26.390 --> 00:03:30.653 to Alice, they can encrypt the message using Alice's public key, and Alice can decrypt 00:03:30.653 --> 00:03:36.105 and recover the plain text. So let me remind you in a bit more detail what a 00:03:36.105 --> 00:03:40.347 public key encryption system is. Well, it's made up of three algorithms G, E, and 00:03:40.347 --> 00:03:44.431 D. G is called the key generation algorithm. Basically what it will do is it will 00:03:44.431 --> 00:03:48.672 generate this key pair, the public key and the secret key. As written here, G takes 00:03:48.672 --> 00:03:53.018 no arguments, but in real life, G actually does take an argument called the security 00:03:53.018 --> 00:03:57.260 parameter which specifies the size of the keys that are generated by this key 00:03:57.260 --> 00:04:01.731 generation algorithm. Then there are these encryption algorithms as usual that take a 00:04:01.731 --> 00:04:06.051 public key and a message and produce a ciphertext in a decryption algorithm that 00:04:06.051 --> 00:04:10.530 takes the corresponding secret key and a ciphertext and it produces a corresponding 00:04:10.530 --> 00:04:14.955 message. And as usual for consistency we say that if we encrypt a message under a 00:04:14.955 --> 00:04:19.380 given public key and then decrypt with a corresponding secret key we should get the 00:04:19.380 --> 00:04:23.852 original message back. Now what does it mean for a public key encryption to be 00:04:23.852 --> 00:04:27.913 secure? I'm going to start off by defining, security against eavesdropping. 00:04:27.913 --> 00:04:32.002 And then we're going to define security against active attacks. So the way to 00:04:32.002 --> 00:04:36.237 define security against eavesdropping is very similar to the symmetric case we've 00:04:36.237 --> 00:04:40.626 already this last week so we're gonna go through this quickly just as a review. 00:04:40.626 --> 00:04:44.808 Basically the attack game is defined as follows. We defined these two experiments, 00:04:44.808 --> 00:04:49.249 experiment zero and experiment one. At in either experiment the challenger is gonna 00:04:49.249 --> 00:04:52.965 generate a public and a secret key pair. He's gonna give the public 00:04:52.965 --> 00:04:57.342 key to the adversary. The adversary's gonna output two messages m0 and m1 of 00:04:57.342 --> 00:05:01.663 equal length and then what he gets back is either the encryption of m0 or the 00:05:01.663 --> 00:05:06.039 encryption of m1. In experiment zero he gets the encryption of m0. In experiment 00:05:06.039 --> 00:05:10.748 one he gets the encryption of m1. And then the adversary is supposed to say which one 00:05:10.748 --> 00:05:15.240 did he get. Did he get the encryption of m0 or did he get the encryption of m1? So 00:05:15.240 --> 00:05:19.676 in this game, the attacker only gets one ciphertext. This corresponds to an 00:05:19.676 --> 00:05:24.226 eavesdropping attack where he simply eavesdropped on that ciphertext C. And now 00:05:24.226 --> 00:05:28.719 his goal is to tell whether the ciphertext C i s the encryption of M0 or M1. No 00:05:28.719 --> 00:05:34.221 tampering on the ciphertext C is allowed just yet. And as usual we say that the 00:05:34.221 --> 00:05:38.206 public key encryption scheme is semantically secure if the attacker cannot 00:05:38.206 --> 00:05:42.085 distinguish experiment zero from experiment one. In other words he cannot 00:05:42.085 --> 00:05:47.757 tell whether he got the encryption of M0, or the encryption of M1. Before we move on 00:05:47.757 --> 00:05:52.311 to active attacks, I want to mention a quick relation between the definition we 00:05:52.311 --> 00:05:56.105 just saw, And the definition of, of eavesdropping security for symmetric 00:05:56.105 --> 00:06:00.438 ciphers. If you remember, when we talked about eavesdropping security for symmetric 00:06:00.438 --> 00:06:04.771 ciphers, we distinguished between the case where the key is used once, and the case 00:06:04.771 --> 00:06:08.998 where the key is used multiple times. And, in fact we saw that, there's a clear 00:06:08.998 --> 00:06:13.357 separation. For example, the onetime pad. Is secure if the key is used to encrypt a 00:06:13.357 --> 00:06:17.382 single message, but is completely insecure if the key is used to encrypt multiple 00:06:17.382 --> 00:06:21.358 messages. And in fact we had two different definitions if you remember we had a 00:06:21.358 --> 00:06:25.383 definition for one-time security, and then we had a separate definition, which was 00:06:25.383 --> 00:06:29.700 stronger, when the key was used multiple times. The definition that I showed you on 00:06:29.700 --> 00:06:34.043 the previous slide's very similar to the definition of one time security for 00:06:34.043 --> 00:06:38.499 symmetric ciphers. And in fact, it turns out that for public key encryption, if a 00:06:38.499 --> 00:06:43.124 system is secure under a onetime key, in a sense, it's also secure for a many time 00:06:43.124 --> 00:06:47.929 key. So in other words, we don't have to explicitly give the attacker the ability 00:06:47.929 --> 00:06:53.171 to, request encryptions of messages of his choice. Because he could just create those 00:06:53.171 --> 00:06:57.870 encryptions all by himself. He is given the public key, and therefore he can by 00:06:57.870 --> 00:07:04.672 himself encrypt any message he likes. As a result any public key secret pair in some 00:07:04.672 --> 00:07:09.289 sense inherently is used to encrypt multiple messages because the attacker 00:07:09.289 --> 00:07:13.905 could have just encrypted many, many messages of his choice using the given 00:07:13.905 --> 00:07:18.891 public key that we just gave him in the first step. And so, as a result in fact, 00:07:18.891 --> 00:07:23.692 the definition of one time security is enough to imply many time security and 00:07:23.692 --> 00:07:28.801 that's why we refer to the concept as indistinguishability under a chosen plain 00:07:28.801 --> 00:07:34.012 text attach. So this is just a minor point to explain why the settings of public 00:07:34.012 --> 00:07:37.770 encryption, we don't need a more complicated definition to capture 00:07:37.770 --> 00:07:42.515 eavesdropping security. Now that we understand eavesdropping security, let's 00:07:42.515 --> 00:07:47.343 look at more powerful adversaries that can actually mount active attacks. So, in 00:07:47.343 --> 00:07:51.585 particular, let's look at the email example. So here, we have our friend Bob 00:07:51.585 --> 00:07:56.228 who wants to send mail to his friend Caroline. And Caroline happens to have, an 00:07:56.228 --> 00:08:00.699 account at Gmail. And the way this works is basically, the email is sent to the 00:08:00.699 --> 00:08:05.514 Gmail server, encrypted. The Gmail server decrypts the email, looks at the, intended 00:08:05.514 --> 00:08:09.297 recipients. And then, if it's, the intended recipient is Caroline, it 00:08:09.297 --> 00:08:13.653 forwards the email to Caroline. If the intended recipient is the attacker, it 00:08:13.653 --> 00:08:18.573 forwards the email to the attacker. This is similar to how Gmail actually works 00:08:18.573 --> 00:08:23.441 because the sender would send the email encrypted over SSL to the Gmail server. 00:08:23.441 --> 00:08:28.087 The Gmail server would terminate the SSL and then forward the email to the 00:08:28.087 --> 00:08:33.081 appropriate recipients. Now suppose Bob encrypts the email using a system that 00:08:33.081 --> 00:08:37.764 allows the adversary to tamper with the ciphertext without being detected. For 00:08:37.764 --> 00:08:42.387 example, imagine this email is encrypted using Counter Mode, or something like 00:08:42.387 --> 00:08:47.070 that. Then when the attacker intercepts this email, he can change the recipient, 00:08:47.070 --> 00:08:50.732 so that now the recipient says attacker@gmail.com, and we know that for 00:08:50.732 --> 00:08:55.415 Counter Mode, for example, this is quite easy to do. The attacker knows that the 00:08:55.415 --> 00:09:00.278 email is intended for Caroline, he is just interested in the email body. So he can 00:09:00.278 --> 00:09:04.226 easily change the email recipient to attacker@gmail.com and now when the server 00:09:04.226 --> 00:09:08.129 receives the email, he will decrypt it, see that the recipient is supposed to be 00:09:08.129 --> 00:09:12.033 attacker, and forward the body to the attacker. And now the attacker was able to 00:09:12.033 --> 00:09:16.022 read the body of the email that was intended for Caroline. So this is a 00:09:16.022 --> 00:09:21.198 classic example of an active attack, and you notice what the attacker could do 00:09:21.198 --> 00:09:26.174 here, is it could decrypt any ciphertext where the intended recipient is to: 00:09:26.174 --> 00:09:31.548 attacker. So any ciphertext where the plain text begins with the words "to: attacker". So our goal is 00:09:31.548 --> 00:09:36.657 to design public key systems that are secure, even if the attacker can tamper 00:09:36.657 --> 00:09:42.999 with ciphertext and possibly decrypt certain cyphertexts. And again, I want to 00:09:42.999 --> 00:09:47.612 emphasize that here the attacker's goal was to get the message body. The attacker 00:09:47.612 --> 00:09:52.055 already knew that the email is intended for Caroline. And all he had to do was 00:09:52.055 --> 00:09:56.863 just change the, intended recipient. So this tampering attack motivates the 00:09:56.863 --> 00:10:01.620 definition of chosen ciphertext security. And in fact this is the standard notion of 00:10:01.620 --> 00:10:07.462 security for public key encryption. So let me explain how the attack [here procedes] and as I 00:10:07.462 --> 00:10:11.899 said our goal is to build systems that are secure under this very, very conservative 00:10:11.899 --> 00:10:15.756 notion of encryption. So we have an encryption scheme (G, E, D). And let's say 00:10:15.756 --> 00:10:20.140 that's defined over a message space and a ciphertext (M, C) and as usual we're 00:10:20.140 --> 00:10:24.313 gonna define two experiments, experiment zero, and experiment one. So 'b' here 00:10:24.313 --> 00:10:28.222 says whether the challenger is implementing experiment zero or experiment 00:10:28.222 --> 00:10:32.659 one. The challenger begins by generating a public key and a secret key, and then gives 00:10:32.659 --> 00:10:37.254 the public key to the adversary. Now the adversary can say, "Well, here are a bunch 00:10:37.254 --> 00:10:41.611 of ciphertexts, please decrypt them for me." So here the adversary submits 00:10:41.611 --> 00:10:46.452 ciphertext C1 and he gets the decryption of ciphertext C1, namely M1. And he gets 00:10:46.452 --> 00:10:51.414 to do this again and again, so he submits ciphertext C2, and he gets the decryption, 00:10:51.414 --> 00:10:56.195 which is M2, ciphertext C3, and he gets the decryption M3, and so on and so forth. 00:10:56.195 --> 00:11:00.188 Finally, the adversary says, "This squaring phase is over," and now he 00:11:00.188 --> 00:11:04.485 submits basically two equal length messages, M0 and M1 as normal, and he 00:11:04.485 --> 00:11:08.820 receives in response the challenge ciphertext C, Which is the encryption of M 00:11:08.820 --> 00:11:13.052 zero or the encryption of M one. Depending on whether we're in experiment zero or 00:11:13.052 --> 00:11:17.003 experiment one. Now, the adversary can continue to issue these ciphertext 00:11:17.003 --> 00:11:21.063 queries. So he can continue to issue, decryption requests. So he submits a 00:11:21.063 --> 00:11:25.447 ciphertext, and he gets a decryption of that ciphertext, but of course, now, there 00:11:25.447 --> 00:11:29.994 has to be a caveat. If the attacker could submit arbitrary ciphertext of his choice, 00:11:29.994 --> 00:11:34.270 of course, he could break the challenge. What he would do is he would submit the 00:11:34.270 --> 00:11:38.506 challenge ciphertext C as a decryption query. And then he would be told whether 00:11:38.506 --> 00:11:42.665 in the challenge phase he was given the encryption of M0 or the encryption of M1. 00:11:42.665 --> 00:11:46.825 As a result we put this limitation here, that says that he can in fact submit any 00:11:46.825 --> 00:11:51.031 ciphertext of his choice except. For the challenge ciphertext. So the attacker 00:11:51.031 --> 00:11:55.034 could ask for the decryption of any ciphertext of his choice other than the 00:11:55.034 --> 00:11:59.297 challenge ciphertext. And even though he was given all these decryptions, he still 00:11:59.297 --> 00:12:03.196 shouldn't be able to tell whether he was given the encryption of M0 or the 00:12:03.196 --> 00:12:09.212 encryption of M1. So you notice this is a very conservative definition. It gives the 00:12:09.212 --> 00:12:14.113 attacker more power than what we saw in the previous slide. On the previous slide, 00:12:14.113 --> 00:12:18.710 the attacker could only decrypt messages where the plain text began with the words 00:12:18.710 --> 00:12:23.611 "to: attacker". Here, we're saying the attacker can decrypt any ciphertext of his choice, 00:12:23.611 --> 00:12:29.717 as long as it's different from the challenge ciphertext C. Okay? And then his 00:12:29.717 --> 00:12:34.094 goal is to say whether the challenge ciphertext is the encryption of M0 or the 00:12:34.094 --> 00:12:37.918 encryption of M1. And as usual, if he can't do that, in other words, his 00:12:37.918 --> 00:12:42.351 behavior in experiment zero is basically the same as his behavior in experiment 00:12:42.351 --> 00:12:46.839 one, so he wasn't able to distinguish the encryption of M0 from the encryption of 00:12:46.839 --> 00:12:51.219 M1, even though he had all this power Then we say that the system is chosen 00:12:51.219 --> 00:12:55.877 ciphertext secure, CCA secure. And sometimes there is an acronym, the acronym 00:12:55.877 --> 00:13:00.596 for this is indistinguishability under a chosen ciphertext attack, but I'm just 00:13:00.596 --> 00:13:05.745 gonna say CCA secured. So let's see how this captures, the email example we saw 00:13:05.745 --> 00:13:10.587 before. So suppose the encryption system being used is such that just given the 00:13:10.587 --> 00:13:15.429 encryption of a message the attacker can change the intended recipient from to 00:13:15.429 --> 00:13:20.129 Alice say to, to Charlie. Then here's how we would win the CCA game. Well in the 00:13:20.129 --> 00:13:25.033 first step he's given the public key of course. And then what the attacker will do 00:13:25.033 --> 00:13:29.578 is he would issue two equal length messages, namely in the first message, the 00:13:29.578 --> 00:13:33.943 body is zero. In the second message the body is one. But both messages are 00:13:33.943 --> 00:13:39.890 intended for Alice. And in response, he would be given the challenge ciphertext C. 00:13:39.890 --> 00:13:45.130 Okay, so now here we have our challenge ciphertext C. Now what the attacker is 00:13:45.130 --> 00:13:49.961 gonna do is he's gonna use his, his ability here to modify the intended 00:13:49.961 --> 00:13:55.269 recipient. And he's gonna send back a ciphertext C', where C' is the encryption 00:13:55.269 --> 00:14:01.760 of the message to Charlie with body being the challenge body b. So if you remember is 00:14:01.760 --> 00:14:07.822 either zero or one. Now, because the plain text is different, we know that the 00:14:07.822 --> 00:14:12.486 ciphertext must also be different. So in particular, C prime must be different from 00:14:12.486 --> 00:14:17.206 the challenge ciphertext C, yeah? So the C prime here must be different from C. And 00:14:17.206 --> 00:14:21.758 as a result, the poor challenger now has to decrypt by definition of the CCA game. 00:14:21.758 --> 00:14:26.141 The challenger must decrypt any ciphertext that's not equal to a challenge 00:14:26.141 --> 00:14:30.648 ciphertext. So the challenger decrypts give the adversary M prime. Basically he 00:14:30.648 --> 00:14:35.256 gave the adversary B, and now the adversary can output the challenge B and 00:14:35.256 --> 00:14:40.293 he wins the game with advantage one. So he's advantage with this particular scheme 00:14:40.293 --> 00:14:45.143 is one. So, simply because the attacker was able to change the challenge ciphertext 00:14:45.146 --> 00:14:49.999 from one recipient to another that allows him to, to win the CCA game with 00:14:49.999 --> 00:14:55.003 advantage one. So as I said, chosen ciphertext security turns out actually is 00:14:55.003 --> 00:14:59.327 the correct notion of security for public key encryption systems. And it's a very, 00:14:59.327 --> 00:15:03.651 very interesting concept, right? Basically, somehow even though the attacker has this ability 00:15:03.651 --> 00:15:07.839 to decrypt anything he wants. Other than the challenge ciphertext, still he can't 00:15:07.839 --> 00:15:12.028 learn what the challenge ciphertext is. And so the goal for the remainder of this module 00:15:12.028 --> 00:15:16.275 and actually the next module as well, is to construct CCA secure systems. It's 00:15:16.275 --> 00:15:20.093 actually quite remarkable that this is achievable and I'm going to show you 00:15:20.093 --> 00:15:24.310 exactly how to do it. And in fact those CCA secure systems that we build are the 00:15:24.310 --> 00:15:28.579 ones that are used in the real world. And every time a system has tried to deploy 00:15:28.737 --> 00:15:33.007 a public key encryption mechanism that's not CCA secure someone has come up with an 00:15:33.007 --> 00:15:37.487 attack and was able to break it. And we're going to see some of these example attacks 00:15:37.487 --> 00:15:39.280 actually in the next few segments.