1 00:00:00,000 --> 00:00:03,583 Last week, we learned a number theory that's needed for public key encryption. 2 00:00:03,583 --> 00:00:07,166 This week we're gonna put this knowledge to work, and we're gonna construct a 3 00:00:07,166 --> 00:00:10,889 number of secure public key encryption schemes. But first, we need to define what 4 00:00:10,889 --> 00:00:14,565 is public key encryption, and what does it mean for public key encryption to be 5 00:00:14,565 --> 00:00:18,241 secure? So let me remind you that in a public key encryption scheme, there is an 6 00:00:18,241 --> 00:00:21,778 encryption algorithm which is usually denoted by E, and there's a decryption 7 00:00:21,778 --> 00:00:25,361 algorithm which we denote by D. However here, the encryption algorithm takes a 8 00:00:25,361 --> 00:00:29,477 public key, while the decryption algorithm takes a secret key. This pair is called a 9 00:00:29,477 --> 00:00:34,356 key pair. And the public key is used for encrypting messages while the secret key 10 00:00:34,356 --> 00:00:39,002 is used for decrypting messages. So in this case a message m is encrypting using 11 00:00:39,002 --> 00:00:43,880 the public key and what comes out of that is the ciphertext c. And similarly the 12 00:00:43,880 --> 00:00:48,643 ciphertext is fed into the decryption algorithm and using the secret key, what 13 00:00:48,643 --> 00:00:53,577 comes out of the decryption algorithm is the original message m. Now public key 14 00:00:53,577 --> 00:00:57,989 encryption has many applications. Last week we saw the classic application which 15 00:00:57,989 --> 00:01:02,455 is session setup, namely, key exchange and for now we're just looking at key exchange 16 00:01:02,455 --> 00:01:06,867 that is secure against eavesdropping only. And if you remember the way the protocol 17 00:01:06,867 --> 00:01:11,227 works, basically Alice, what she would do is she would generate a public key secret 18 00:01:11,227 --> 00:01:15,546 pair. She would send the public key to Bob. Bob will generate a random X, which 19 00:01:15,546 --> 00:01:20,136 is gonna serve as their shared secret, and then he sends X encrypted to Alice, 20 00:01:20,136 --> 00:01:24,904 encrypted under her public key. Alice can decrypt, recover X and now both of them 21 00:01:24,904 --> 00:01:29,554 have this shared secret X which they can use to communicate securely with one 22 00:01:29,554 --> 00:01:34,143 another. The attacker, of course, all he gets to see is just the public key, the 23 00:01:34,143 --> 00:01:38,972 encryption of X under the public key, from which he should not be able to get any 24 00:01:38,972 --> 00:01:43,800 information about X. And we are going to define that more precisely to understand 25 00:01:43,800 --> 00:01:48,507 what it means to not be able to learn anything about X. Public key encryption 26 00:01:48,507 --> 00:01:52,522 actually has many other applications. For example, it's very useful in 27 00:01:52,522 --> 00:01:57,235 non-interactive applications. So think of an email system for example. So here, Bob 28 00:01:57,235 --> 00:02:01,716 wants to send mail to Alice, and as Bob sends the email, the email passes from 29 00:02:01,716 --> 00:02:06,603 mail relay to mail relay until finally it reaches Alice, at which point Alice should 30 00:02:06,603 --> 00:02:10,502 decrypt. The way the email system is set up, is designed for kind of 31 00:02:10,502 --> 00:02:15,045 non-interactive settings where Bob sends the email. And then Alice is supposed to 32 00:02:15,045 --> 00:02:19,195 receive it. And Alice should not be to communicate with Bob in order to decrypt 33 00:02:19,195 --> 00:02:23,502 the email. So in this case, because of the non-interactivity, there's no opportunity 34 00:02:23,502 --> 00:02:27,705 for setting up a shared secret between Alice and Bob. So in this case, what would 35 00:02:27,705 --> 00:02:32,169 happen is, Bob basically would, would send the email encrypted, using Alice's, public 36 00:02:32,169 --> 00:02:36,571 key. So he sends the email. Anyone in the world can send the email encrypted to 37 00:02:36,571 --> 00:02:41,103 Alice, encrypted using her public key. When Alice receives this email, she uses 38 00:02:41,103 --> 00:02:45,748 her secret key to decrypt the ciphertext and recover the plain text message. 39 00:02:45,748 --> 00:02:50,507 Of course the one caveat in a system like this is that in fact Bob needs to somehow 40 00:02:50,507 --> 00:02:54,804 obtain Alice's public key So for now we are just going to assume Bob already has 41 00:02:54,804 --> 00:02:58,297 Alice's public key, but later on, actually, when we talk about digital 42 00:02:58,297 --> 00:03:02,457 signatures we're gonna see how, this can actually be done very efficiently using what's 43 00:03:02,457 --> 00:03:06,823 called public key management and as I said we'll actually get back to that at a later 44 00:03:06,823 --> 00:03:10,931 time. But the main thing I want you to remember, is that public key encryption is 45 00:03:10,931 --> 00:03:14,578 used for session setup. This is very common on the web, where public key 46 00:03:14,578 --> 00:03:18,840 encryption is used to set up a secure key between a web browser and, and web server. 47 00:03:18,840 --> 00:03:22,898 And public key encryption is also very useful for non-interactive applications, 48 00:03:22,898 --> 00:03:26,390 where anyone in the world, non-interactively, needs to send a message 49 00:03:26,390 --> 00:03:30,653 to Alice, they can encrypt the message using Alice's public key, and Alice can decrypt 50 00:03:30,653 --> 00:03:36,105 and recover the plain text. So let me remind you in a bit more detail what a 51 00:03:36,105 --> 00:03:40,347 public key encryption system is. Well, it's made up of three algorithms G, E, and 52 00:03:40,347 --> 00:03:44,431 D. G is called the key generation algorithm. Basically what it will do is it will 53 00:03:44,431 --> 00:03:48,672 generate this key pair, the public key and the secret key. As written here, G takes 54 00:03:48,672 --> 00:03:53,018 no arguments, but in real life, G actually does take an argument called the security 55 00:03:53,018 --> 00:03:57,260 parameter which specifies the size of the keys that are generated by this key 56 00:03:57,260 --> 00:04:01,731 generation algorithm. Then there are these encryption algorithms as usual that take a 57 00:04:01,731 --> 00:04:06,051 public key and a message and produce a ciphertext in a decryption algorithm that 58 00:04:06,051 --> 00:04:10,530 takes the corresponding secret key and a ciphertext and it produces a corresponding 59 00:04:10,530 --> 00:04:14,955 message. And as usual for consistency we say that if we encrypt a message under a 60 00:04:14,955 --> 00:04:19,380 given public key and then decrypt with a corresponding secret key we should get the 61 00:04:19,380 --> 00:04:23,852 original message back. Now what does it mean for a public key encryption to be 62 00:04:23,852 --> 00:04:27,913 secure? I'm going to start off by defining, security against eavesdropping. 63 00:04:27,913 --> 00:04:32,002 And then we're going to define security against active attacks. So the way to 64 00:04:32,002 --> 00:04:36,237 define security against eavesdropping is very similar to the symmetric case we've 65 00:04:36,237 --> 00:04:40,626 already this last week so we're gonna go through this quickly just as a review. 66 00:04:40,626 --> 00:04:44,808 Basically the attack game is defined as follows. We defined these two experiments, 67 00:04:44,808 --> 00:04:49,249 experiment zero and experiment one. At in either experiment the challenger is gonna 68 00:04:49,249 --> 00:04:52,965 generate a public and a secret key pair. He's gonna give the public 69 00:04:52,965 --> 00:04:57,342 key to the adversary. The adversary's gonna output two messages m0 and m1 of 70 00:04:57,342 --> 00:05:01,663 equal length and then what he gets back is either the encryption of m0 or the 71 00:05:01,663 --> 00:05:06,039 encryption of m1. In experiment zero he gets the encryption of m0. In experiment 72 00:05:06,039 --> 00:05:10,748 one he gets the encryption of m1. And then the adversary is supposed to say which one 73 00:05:10,748 --> 00:05:15,240 did he get. Did he get the encryption of m0 or did he get the encryption of m1? So 74 00:05:15,240 --> 00:05:19,676 in this game, the attacker only gets one ciphertext. This corresponds to an 75 00:05:19,676 --> 00:05:24,226 eavesdropping attack where he simply eavesdropped on that ciphertext C. And now 76 00:05:24,226 --> 00:05:28,719 his goal is to tell whether the ciphertext C i s the encryption of M0 or M1. No 77 00:05:28,719 --> 00:05:34,221 tampering on the ciphertext C is allowed just yet. And as usual we say that the 78 00:05:34,221 --> 00:05:38,206 public key encryption scheme is semantically secure if the attacker cannot 79 00:05:38,206 --> 00:05:42,085 distinguish experiment zero from experiment one. In other words he cannot 80 00:05:42,085 --> 00:05:47,757 tell whether he got the encryption of M0, or the encryption of M1. Before we move on 81 00:05:47,757 --> 00:05:52,311 to active attacks, I want to mention a quick relation between the definition we 82 00:05:52,311 --> 00:05:56,105 just saw, And the definition of, of eavesdropping security for symmetric 83 00:05:56,105 --> 00:06:00,438 ciphers. If you remember, when we talked about eavesdropping security for symmetric 84 00:06:00,438 --> 00:06:04,771 ciphers, we distinguished between the case where the key is used once, and the case 85 00:06:04,771 --> 00:06:08,998 where the key is used multiple times. And, in fact we saw that, there's a clear 86 00:06:08,998 --> 00:06:13,357 separation. For example, the onetime pad. Is secure if the key is used to encrypt a 87 00:06:13,357 --> 00:06:17,382 single message, but is completely insecure if the key is used to encrypt multiple 88 00:06:17,382 --> 00:06:21,358 messages. And in fact we had two different definitions if you remember we had a 89 00:06:21,358 --> 00:06:25,383 definition for one-time security, and then we had a separate definition, which was 90 00:06:25,383 --> 00:06:29,700 stronger, when the key was used multiple times. The definition that I showed you on 91 00:06:29,700 --> 00:06:34,043 the previous slide's very similar to the definition of one time security for 92 00:06:34,043 --> 00:06:38,499 symmetric ciphers. And in fact, it turns out that for public key encryption, if a 93 00:06:38,499 --> 00:06:43,124 system is secure under a onetime key, in a sense, it's also secure for a many time 94 00:06:43,124 --> 00:06:47,929 key. So in other words, we don't have to explicitly give the attacker the ability 95 00:06:47,929 --> 00:06:53,171 to, request encryptions of messages of his choice. Because he could just create those 96 00:06:53,171 --> 00:06:57,870 encryptions all by himself. He is given the public key, and therefore he can by 97 00:06:57,870 --> 00:07:04,672 himself encrypt any message he likes. As a result any public key secret pair in some 98 00:07:04,672 --> 00:07:09,289 sense inherently is used to encrypt multiple messages because the attacker 99 00:07:09,289 --> 00:07:13,905 could have just encrypted many, many messages of his choice using the given 100 00:07:13,905 --> 00:07:18,891 public key that we just gave him in the first step. And so, as a result in fact, 101 00:07:18,891 --> 00:07:23,692 the definition of one time security is enough to imply many time security and 102 00:07:23,692 --> 00:07:28,801 that's why we refer to the concept as indistinguishability under a chosen plain 103 00:07:28,801 --> 00:07:34,012 text attach. So this is just a minor point to explain why the settings of public 104 00:07:34,012 --> 00:07:37,770 encryption, we don't need a more complicated definition to capture 105 00:07:37,770 --> 00:07:42,515 eavesdropping security. Now that we understand eavesdropping security, let's 106 00:07:42,515 --> 00:07:47,343 look at more powerful adversaries that can actually mount active attacks. So, in 107 00:07:47,343 --> 00:07:51,585 particular, let's look at the email example. So here, we have our friend Bob 108 00:07:51,585 --> 00:07:56,228 who wants to send mail to his friend Caroline. And Caroline happens to have, an 109 00:07:56,228 --> 00:08:00,699 account at Gmail. And the way this works is basically, the email is sent to the 110 00:08:00,699 --> 00:08:05,514 Gmail server, encrypted. The Gmail server decrypts the email, looks at the, intended 111 00:08:05,514 --> 00:08:09,297 recipients. And then, if it's, the intended recipient is Caroline, it 112 00:08:09,297 --> 00:08:13,653 forwards the email to Caroline. If the intended recipient is the attacker, it 113 00:08:13,653 --> 00:08:18,573 forwards the email to the attacker. This is similar to how Gmail actually works 114 00:08:18,573 --> 00:08:23,441 because the sender would send the email encrypted over SSL to the Gmail server. 115 00:08:23,441 --> 00:08:28,087 The Gmail server would terminate the SSL and then forward the email to the 116 00:08:28,087 --> 00:08:33,081 appropriate recipients. Now suppose Bob encrypts the email using a system that 117 00:08:33,081 --> 00:08:37,764 allows the adversary to tamper with the ciphertext without being detected. For 118 00:08:37,764 --> 00:08:42,387 example, imagine this email is encrypted using Counter Mode, or something like 119 00:08:42,387 --> 00:08:47,070 that. Then when the attacker intercepts this email, he can change the recipient, 120 00:08:47,070 --> 00:08:50,732 so that now the recipient says attacker@gmail.com, and we know that for 121 00:08:50,732 --> 00:08:55,415 Counter Mode, for example, this is quite easy to do. The attacker knows that the 122 00:08:55,415 --> 00:09:00,278 email is intended for Caroline, he is just interested in the email body. So he can 123 00:09:00,278 --> 00:09:04,226 easily change the email recipient to attacker@gmail.com and now when the server 124 00:09:04,226 --> 00:09:08,129 receives the email, he will decrypt it, see that the recipient is supposed to be 125 00:09:08,129 --> 00:09:12,033 attacker, and forward the body to the attacker. And now the attacker was able to 126 00:09:12,033 --> 00:09:16,022 read the body of the email that was intended for Caroline. So this is a 127 00:09:16,022 --> 00:09:21,198 classic example of an active attack, and you notice what the attacker could do 128 00:09:21,198 --> 00:09:26,174 here, is it could decrypt any ciphertext where the intended recipient is to: 129 00:09:26,174 --> 00:09:31,548 attacker. So any ciphertext where the plain text begins with the words "to: attacker". So our goal is 130 00:09:31,548 --> 00:09:36,657 to design public key systems that are secure, even if the attacker can tamper 131 00:09:36,657 --> 00:09:42,999 with ciphertext and possibly decrypt certain cyphertexts. And again, I want to 132 00:09:42,999 --> 00:09:47,612 emphasize that here the attacker's goal was to get the message body. The attacker 133 00:09:47,612 --> 00:09:52,055 already knew that the email is intended for Caroline. And all he had to do was 134 00:09:52,055 --> 00:09:56,863 just change the, intended recipient. So this tampering attack motivates the 135 00:09:56,863 --> 00:10:01,620 definition of chosen ciphertext security. And in fact this is the standard notion of 136 00:10:01,620 --> 00:10:07,462 security for public key encryption. So let me explain how the attack [here procedes] and as I 137 00:10:07,462 --> 00:10:11,899 said our goal is to build systems that are secure under this very, very conservative 138 00:10:11,899 --> 00:10:15,756 notion of encryption. So we have an encryption scheme (G, E, D). And let's say 139 00:10:15,756 --> 00:10:20,140 that's defined over a message space and a ciphertext (M, C) and as usual we're 140 00:10:20,140 --> 00:10:24,313 gonna define two experiments, experiment zero, and experiment one. So 'b' here 141 00:10:24,313 --> 00:10:28,222 says whether the challenger is implementing experiment zero or experiment 142 00:10:28,222 --> 00:10:32,659 one. The challenger begins by generating a public key and a secret key, and then gives 143 00:10:32,659 --> 00:10:37,254 the public key to the adversary. Now the adversary can say, "Well, here are a bunch 144 00:10:37,254 --> 00:10:41,611 of ciphertexts, please decrypt them for me." So here the adversary submits 145 00:10:41,611 --> 00:10:46,452 ciphertext C1 and he gets the decryption of ciphertext C1, namely M1. And he gets 146 00:10:46,452 --> 00:10:51,414 to do this again and again, so he submits ciphertext C2, and he gets the decryption, 147 00:10:51,414 --> 00:10:56,195 which is M2, ciphertext C3, and he gets the decryption M3, and so on and so forth. 148 00:10:56,195 --> 00:11:00,188 Finally, the adversary says, "This squaring phase is over," and now he 149 00:11:00,188 --> 00:11:04,485 submits basically two equal length messages, M0 and M1 as normal, and he 150 00:11:04,485 --> 00:11:08,820 receives in response the challenge ciphertext C, Which is the encryption of M 151 00:11:08,820 --> 00:11:13,052 zero or the encryption of M one. Depending on whether we're in experiment zero or 152 00:11:13,052 --> 00:11:17,003 experiment one. Now, the adversary can continue to issue these ciphertext 153 00:11:17,003 --> 00:11:21,063 queries. So he can continue to issue, decryption requests. So he submits a 154 00:11:21,063 --> 00:11:25,447 ciphertext, and he gets a decryption of that ciphertext, but of course, now, there 155 00:11:25,447 --> 00:11:29,994 has to be a caveat. If the attacker could submit arbitrary ciphertext of his choice, 156 00:11:29,994 --> 00:11:34,270 of course, he could break the challenge. What he would do is he would submit the 157 00:11:34,270 --> 00:11:38,506 challenge ciphertext C as a decryption query. And then he would be told whether 158 00:11:38,506 --> 00:11:42,665 in the challenge phase he was given the encryption of M0 or the encryption of M1. 159 00:11:42,665 --> 00:11:46,825 As a result we put this limitation here, that says that he can in fact submit any 160 00:11:46,825 --> 00:11:51,031 ciphertext of his choice except. For the challenge ciphertext. So the attacker 161 00:11:51,031 --> 00:11:55,034 could ask for the decryption of any ciphertext of his choice other than the 162 00:11:55,034 --> 00:11:59,297 challenge ciphertext. And even though he was given all these decryptions, he still 163 00:11:59,297 --> 00:12:03,196 shouldn't be able to tell whether he was given the encryption of M0 or the 164 00:12:03,196 --> 00:12:09,212 encryption of M1. So you notice this is a very conservative definition. It gives the 165 00:12:09,212 --> 00:12:14,113 attacker more power than what we saw in the previous slide. On the previous slide, 166 00:12:14,113 --> 00:12:18,710 the attacker could only decrypt messages where the plain text began with the words 167 00:12:18,710 --> 00:12:23,611 "to: attacker". Here, we're saying the attacker can decrypt any ciphertext of his choice, 168 00:12:23,611 --> 00:12:29,717 as long as it's different from the challenge ciphertext C. Okay? And then his 169 00:12:29,717 --> 00:12:34,094 goal is to say whether the challenge ciphertext is the encryption of M0 or the 170 00:12:34,094 --> 00:12:37,918 encryption of M1. And as usual, if he can't do that, in other words, his 171 00:12:37,918 --> 00:12:42,351 behavior in experiment zero is basically the same as his behavior in experiment 172 00:12:42,351 --> 00:12:46,839 one, so he wasn't able to distinguish the encryption of M0 from the encryption of 173 00:12:46,839 --> 00:12:51,219 M1, even though he had all this power Then we say that the system is chosen 174 00:12:51,219 --> 00:12:55,877 ciphertext secure, CCA secure. And sometimes there is an acronym, the acronym 175 00:12:55,877 --> 00:13:00,596 for this is indistinguishability under a chosen ciphertext attack, but I'm just 176 00:13:00,596 --> 00:13:05,745 gonna say CCA secured. So let's see how this captures, the email example we saw 177 00:13:05,745 --> 00:13:10,587 before. So suppose the encryption system being used is such that just given the 178 00:13:10,587 --> 00:13:15,429 encryption of a message the attacker can change the intended recipient from to 179 00:13:15,429 --> 00:13:20,129 Alice say to, to Charlie. Then here's how we would win the CCA game. Well in the 180 00:13:20,129 --> 00:13:25,033 first step he's given the public key of course. And then what the attacker will do 181 00:13:25,033 --> 00:13:29,578 is he would issue two equal length messages, namely in the first message, the 182 00:13:29,578 --> 00:13:33,943 body is zero. In the second message the body is one. But both messages are 183 00:13:33,943 --> 00:13:39,890 intended for Alice. And in response, he would be given the challenge ciphertext C. 184 00:13:39,890 --> 00:13:45,130 Okay, so now here we have our challenge ciphertext C. Now what the attacker is 185 00:13:45,130 --> 00:13:49,961 gonna do is he's gonna use his, his ability here to modify the intended 186 00:13:49,961 --> 00:13:55,269 recipient. And he's gonna send back a ciphertext C', where C' is the encryption 187 00:13:55,269 --> 00:14:01,760 of the message to Charlie with body being the challenge body b. So if you remember is 188 00:14:01,760 --> 00:14:07,822 either zero or one. Now, because the plain text is different, we know that the 189 00:14:07,822 --> 00:14:12,486 ciphertext must also be different. So in particular, C prime must be different from 190 00:14:12,486 --> 00:14:17,206 the challenge ciphertext C, yeah? So the C prime here must be different from C. And 191 00:14:17,206 --> 00:14:21,758 as a result, the poor challenger now has to decrypt by definition of the CCA game. 192 00:14:21,758 --> 00:14:26,141 The challenger must decrypt any ciphertext that's not equal to a challenge 193 00:14:26,141 --> 00:14:30,648 ciphertext. So the challenger decrypts give the adversary M prime. Basically he 194 00:14:30,648 --> 00:14:35,256 gave the adversary B, and now the adversary can output the challenge B and 195 00:14:35,256 --> 00:14:40,293 he wins the game with advantage one. So he's advantage with this particular scheme 196 00:14:40,293 --> 00:14:45,143 is one. So, simply because the attacker was able to change the challenge ciphertext 197 00:14:45,146 --> 00:14:49,999 from one recipient to another that allows him to, to win the CCA game with 198 00:14:49,999 --> 00:14:55,003 advantage one. So as I said, chosen ciphertext security turns out actually is 199 00:14:55,003 --> 00:14:59,327 the correct notion of security for public key encryption systems. And it's a very, 200 00:14:59,327 --> 00:15:03,651 very interesting concept, right? Basically, somehow even though the attacker has this ability 201 00:15:03,651 --> 00:15:07,839 to decrypt anything he wants. Other than the challenge ciphertext, still he can't 202 00:15:07,839 --> 00:15:12,028 learn what the challenge ciphertext is. And so the goal for the remainder of this module 203 00:15:12,028 --> 00:15:16,275 and actually the next module as well, is to construct CCA secure systems. It's 204 00:15:16,275 --> 00:15:20,093 actually quite remarkable that this is achievable and I'm going to show you 205 00:15:20,093 --> 00:15:24,310 exactly how to do it. And in fact those CCA secure systems that we build are the 206 00:15:24,310 --> 00:15:28,579 ones that are used in the real world. And every time a system has tried to deploy 207 00:15:28,737 --> 00:15:33,007 a public key encryption mechanism that's not CCA secure someone has come up with an 208 00:15:33,007 --> 00:15:37,487 attack and was able to break it. And we're going to see some of these example attacks 209 00:15:37,487 --> 00:15:39,280 actually in the next few segments.