1 00:00:00,000 --> 00:00:03,911 My goal for the next few segments is to show you that if we use a secure PRG We'll 2 00:00:03,911 --> 00:00:07,892 get a secure stream safer. The first thing we have to do is define is, what does it 3 00:00:07,892 --> 00:00:11,679 mean for a stream safer to be secure? So whenever we define security we always 4 00:00:11,679 --> 00:00:15,174 define it in terms of what can the attacker do? And what is the attacker 5 00:00:15,174 --> 00:00:18,670 trying to do? In the context of stream ciphers remember these are only used 6 00:00:18,670 --> 00:00:22,737 with a onetime key, and as a result the most the attacker is ever going to see is 7 00:00:22,737 --> 00:00:26,754 just one cypher text encrypted using the key that we're using. And so we're gonna 8 00:00:26,754 --> 00:00:30,772 limit the attackers? ability to basically obtain just one cypher text. And in 9 00:00:30,772 --> 00:00:34,641 fact later on we're going to allow the attacker to do much, much, much more, but 10 00:00:34,641 --> 00:00:38,460 for now we're just gonna give him one cypher text. And we wanted to find, 11 00:00:38,460 --> 00:00:42,560 what does it mean for the cypher to be secure? So the first definition that 12 00:00:42,560 --> 00:00:46,892 comes to mind is basically to say, well maybe we wanna require that the attacker 13 00:00:46,892 --> 00:00:50,718 can't actually recover the secret key. Okay so given ciphertext you 14 00:00:50,718 --> 00:00:54,609 shouldn't be able to recover the secret key. But that's a terrible definition 15 00:00:54,609 --> 00:00:58,717 because think about the falling brilliant cipher but the way we encrypt the 16 00:00:58,717 --> 00:01:02,855 message using a key K is basically we just output the message. Okay this 17 00:01:02,855 --> 00:01:07,427 is a brilliant cipher yeah of course it doesn't do anything given a message just 18 00:01:07,427 --> 00:01:12,000 output a message as the cipher text. This is not a particularly good encryption 19 00:01:12,000 --> 00:01:16,029 scheme; however, given the cipher text, mainly the message the poor attacker 20 00:01:16,029 --> 00:01:20,493 cannot recover the key because he doesn't know what the key is. And so as a result 21 00:01:20,493 --> 00:01:24,630 this cipher which clearly is insecure, would be considered secure under this 22 00:01:24,793 --> 00:01:28,636 requirement for security. So this definition will be no good. Okay so it's 23 00:01:28,636 --> 00:01:32,317 recovering the secret key is the wrong way to define security. So the next thing we 24 00:01:32,317 --> 00:01:35,999 can kinda attempt is basically just say, well maybe the attacker doesn't care about 25 00:01:35,999 --> 00:01:39,680 the secret key, what he really cares about are the plain text. So maybe it should be 26 00:01:39,680 --> 00:01:43,518 hard for the attacker to recover the entire plain text. But even that doesn't 27 00:01:43,518 --> 00:01:48,223 work because let's think about the following encryption scheme. So suppose 28 00:01:48,223 --> 00:01:53,436 what this encryption scheme does is it takes two messages, so I'm gonna use two 29 00:01:53,436 --> 00:01:58,014 lines to denote concatenation of two messages M0 line, line M1 means 30 00:01:58,014 --> 00:02:03,100 concatenate M0 and M1. And imagine what the scheme does is really it outputs 31 00:02:03,100 --> 00:02:08,060 M0 in the clear and concatenate to that the encryption of M1. Perhaps even 32 00:02:08,060 --> 00:02:13,337 using the One Time Pad okay? And so here the attacker is gonna be given one 33 00:02:13,337 --> 00:02:17,478 cipher text. And his goal would be to recover the entire plain texts. But the 34 00:02:17,478 --> 00:02:21,702 poor attacker can't do that because here maybe we've encrypted M1 using the One 35 00:02:21,702 --> 00:02:25,872 Time Pad so the attacker can't actually recover M1 because we know the 36 00:02:25,872 --> 00:02:30,043 One Time Pad is secure given just one cipher text. So this construction 37 00:02:30,043 --> 00:02:34,055 would satisfy the definition but unfortunately clearly this is not a secure 38 00:02:34,055 --> 00:02:37,962 encryption scheme because we just leaked half of the plain text. M0 is 39 00:02:37,962 --> 00:02:42,185 completely available to the attacker so even though he can't recover all of the 40 00:02:42,185 --> 00:02:46,462 plain text he might be able to recover most of the plain text, and that's clearly 41 00:02:46,462 --> 00:02:50,658 unsecured. So of course we already know the solution to this and we talked about 42 00:02:50,658 --> 00:02:54,747 Shanon definition of security perfect secrecy where Shannon's idea was that in 43 00:02:54,747 --> 00:02:58,835 fact when the attacker intercepts a cipher text he should learn absolutely no 44 00:02:58,835 --> 00:03:02,818 information about the plain texts. He shouldn't even learn one bit about the 45 00:03:02,818 --> 00:03:07,221 plain text or even he shouldn't learn, he shouldn't even be able to predict a little 46 00:03:07,221 --> 00:03:11,205 bit about a bid of the plain text. Absolutely no information about the plain text. 47 00:03:11,205 --> 00:03:14,926 So let's recall very briefly Shannon's concept of perfect secrecy 48 00:03:14,926 --> 00:03:19,442 basically we said that you know given a cipher We said the cipher has perfect 49 00:03:19,442 --> 00:03:25,069 secrecy if given two messages of the same length it so happens that the distribution 50 00:03:25,069 --> 00:03:30,167 of cipher texts. Yet if we pick a random key and we look into distribution of 51 00:03:30,167 --> 00:03:34,838 cipher texts we encrypt M0 we get exactly the same distribution as when we 52 00:03:34,838 --> 00:03:39,257 encrypt M1. The intuition here was that if the advisory observes the cipher texts 53 00:03:39,257 --> 00:03:43,839 then he doesn't know whether it came from the distribution the result of encrypting 54 00:03:43,839 --> 00:03:48,203 M0 or it came from the distribution as the result of encrypting M1. And as a 55 00:03:48,203 --> 00:03:52,513 result he can't tell whether we encrypted M0 or M1. And that's true for all 56 00:03:52,513 --> 00:03:56,877 messages of the same length and as a result a poor attacker doesn't really know 57 00:03:56,877 --> 00:04:01,212 what message was encrypted. Of course we already said that this definition is too 58 00:04:01,212 --> 00:04:05,400 strong in the sense that it requires really long keys if cipher has short 59 00:04:05,400 --> 00:04:09,535 keys can't possibly satisfy this definition in a particular stream ciphers 60 00:04:09,535 --> 00:04:14,328 can satisfy this definition. Okay so let's try to weaken the definition a little bit 61 00:04:14,328 --> 00:04:19,114 and let's think to the previous segment, and we can say that instead of requiring 62 00:04:19,114 --> 00:04:23,841 the two distributions to be absolutely identical what we can require is, is that 63 00:04:23,841 --> 00:04:28,686 two distributions just be computationally indistinguishable. In other words a poor, 64 00:04:28,863 --> 00:04:33,353 efficient attacker cannot distinguish the two distributions even though the 65 00:04:33,353 --> 00:04:37,815 distributions might be very, very, very different. That just given a sample for 66 00:04:37,815 --> 00:04:42,580 one distribution and a sample for another distribution the attacker can't tell which 67 00:04:42,580 --> 00:04:47,120 distribution he was given a sample from. It turns out this definition is actually 68 00:04:47,120 --> 00:04:51,716 almost right, but it's still a little too strong, that still cannot be satisfied, so 69 00:04:51,716 --> 00:04:56,200 we have to add one more constraint, and that is that instead of saying that this 70 00:04:56,200 --> 00:05:00,797 definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1 71 00:05:00,797 --> 00:05:05,208 that the attackers could actually exhibit. Okay so this actually 72 00:05:05,208 --> 00:05:10,038 leads us to the definition of semantics security. And so, again this is semantics 73 00:05:10,038 --> 00:05:15,050 security for one time key in other words when the attacker is only given one cipher 74 00:05:15,050 --> 00:05:19,819 text. And so the way we define semantic security is by defining two experiments, 75 00:05:19,819 --> 00:05:24,562 okay we'll define experiment 0 and experiment 1. And more generally we will 76 00:05:24,562 --> 00:05:29,230 think of these as experiments parentheses B, where B can be zero or one. Okay so the 77 00:05:29,230 --> 00:05:32,890 way the experiment is defined is as follows, we have an adversary that's 78 00:05:32,890 --> 00:05:37,161 trying to break the system. An adversary A, that's kinda the analog of statistical 79 00:05:37,161 --> 00:05:41,279 tests in the world of pseudo random generators. And then the challenger does 80 00:05:41,279 --> 00:05:45,093 the following, so really we have two challengers, but the challengers are so 81 00:05:45,093 --> 00:05:49,414 similar that we can just describe them as a single challenger that in one case takes 82 00:05:49,414 --> 00:05:53,634 his inputs bits set to zero, and the other case takes his inputs bits set to 83 00:05:53,634 --> 00:05:57,193 one. And let me show you what these challengers do. The first thing the 84 00:05:57,193 --> 00:06:01,349 challenger is gonna do is it's gonna pick a random key and then the adversary 85 00:06:01,349 --> 00:06:06,076 basically is going to output two messages M0 and M1. Okay so this is an explicit 86 00:06:06,076 --> 00:06:11,039 pair of messages that the attacker wants to be challenged on and as usual we're not 87 00:06:11,039 --> 00:06:15,766 trying to hide the length of the messages, we require that the messages be equal 88 00:06:15,766 --> 00:06:21,643 length. And then the challenger basically will output either the encryption of M0 89 00:06:21,643 --> 00:06:25,890 or the encryption of M1, okay so in experiment 0 the challenger will output 90 00:06:25,890 --> 00:06:30,301 the encryption of M0. In experiment 1 the challenger will output the encryption 91 00:06:30,301 --> 00:06:34,385 of M1. Okay so that the difference between the two experiments. And then the 92 00:06:34,385 --> 00:06:38,796 adversary is trying to guess basically whether he was given the encryption of M0 93 00:06:38,796 --> 00:06:44,051 or given the encryption of M1. Okay so here's a little bit of notation let's 94 00:06:44,051 --> 00:06:50,260 define the events Wb to be the events that an experiment B the adversary output one. 95 00:06:50,260 --> 00:06:55,084 Okay so that is just an event that basically in experiment zero W0 means that 96 00:06:55,084 --> 00:07:00,342 the adversary output one and in experiment one W1 means the adversary output one. And 97 00:07:00,342 --> 00:07:05,291 now we can define the advantage of this adversary, basically to say that this is 98 00:07:05,291 --> 00:07:10,425 called the semantics security advantage of the adversary A against the scheme E, 99 00:07:10,425 --> 00:07:15,497 to be the difference of the probability of these two events. In other words we are 100 00:07:15,497 --> 00:07:20,136 looking at whether the adversary behaves differently when he was given the 101 00:07:20,136 --> 00:07:24,818 encryption of M0 from when he was given the encryption of M1. And I wanna make 102 00:07:24,818 --> 00:07:29,201 sure this is clear so I'm gonna say it one more time. So in experiment zero he was 103 00:07:29,201 --> 00:07:33,530 given the encryption of M0 and in experiment one he was given the encryption 104 00:07:33,530 --> 00:07:37,700 of M1. Now we're just interested in whether the adversary output 1 or not. 105 00:07:37,700 --> 00:07:42,356 ... In these experiments. If in both experiments the adversary output 1 with 106 00:07:42,356 --> 00:07:47,013 the same probability that means the adversary wasn't able to distinguish the 107 00:07:47,013 --> 00:07:51,549 two experiments. Experiments zero basically looks to the adversary the same 108 00:07:51,549 --> 00:07:56,206 as experiment one because in both cases upload one with the same probability. 109 00:07:56,206 --> 00:08:01,286 However if the adversary is able to output 1 in one experiment with significantly 110 00:08:01,286 --> 00:08:05,761 different probability than in the other experiment, then the adversary was 111 00:08:05,761 --> 00:08:10,266 actually able to distinguish the two experiments. Okay so... To say this more 112 00:08:10,266 --> 00:08:14,455 formally, essentially the advantage again because it's the difference of two 113 00:08:14,455 --> 00:08:18,918 probabilities the advantage is a number between zero and one. If the advantage is 114 00:08:18,918 --> 00:08:22,886 close to zero that means that the adversary was not able to distinguish 115 00:08:22,886 --> 00:08:27,129 experiment zero from experiment one. However if the advantage is close to one, 116 00:08:27,129 --> 00:08:31,538 that means the adversary was very well able to distinguish experiment zero from 117 00:08:31,538 --> 00:08:36,112 experiment one and that really means that he was able to distinguish an encryption 118 00:08:36,112 --> 00:08:40,299 of M0 from an encryption of M1, okay?So that's out definition. Actually 119 00:08:40,299 --> 00:08:44,055 that is just the definition of the advantage and the definition is just what 120 00:08:44,055 --> 00:08:47,714 you would expect basically we'll say that a symmetric encryption scheme is 121 00:08:47,714 --> 00:08:52,346 semantically secure if for all efficient adversaries here I'll put these in quotes 122 00:08:52,346 --> 00:08:56,932 again, "For all efficient adversaries the advantage is negligible." In other words, 123 00:08:56,982 --> 00:09:01,808 no efficient adversary can distinguish the encryption of M0 from the encryption 124 00:09:01,808 --> 00:09:06,103 of M1 and basically this is what it says repeatedly that for these two 125 00:09:06,103 --> 00:09:10,759 messages that the adversary was able to exhibit he wasn't able to distinguish 126 00:09:10,759 --> 00:09:15,064 these two distributions. Now I want to show you this, this is actually a very 127 00:09:15,064 --> 00:09:19,595 elegant definition. It might not seem so right away, but I want to show you some 128 00:09:19,595 --> 00:09:24,410 implications of this definition and you'll see exactly why the definition is the way 129 00:09:24,410 --> 00:09:28,601 it is. Okay so let's look at some examples. So the first example is suppose 130 00:09:28,601 --> 00:09:33,190 we have a broken encryption scheme, in other words suppose we have an adversary A 131 00:09:33,190 --> 00:09:38,285 that somehow given the cipher text he is always able to deduce the least 132 00:09:38,285 --> 00:09:44,149 significant bit of the plain text. Okay so given the encryption of M0, this adversary 133 00:09:44,149 --> 00:09:48,799 is able to deduce the least significant bit of M0. So that is a terrible 134 00:09:48,799 --> 00:09:52,911 encryption scheme because it basically leaks the least significant bit of the 135 00:09:52,911 --> 00:09:57,128 plain text just given the cipher text. So I want to show you that this scheme is 136 00:09:57,128 --> 00:10:01,609 therefore semantically secure so that kind of shows that if a system is semantically 137 00:10:01,609 --> 00:10:05,931 secure than there is no attacker of this type. Okay so let's see why is the system 138 00:10:05,931 --> 00:10:10,254 not semantically secure well so what we are gonna do is we're gonna basically use 139 00:10:10,254 --> 00:10:14,366 our adversary who is able to learn our most insignificant bits, we're going to 140 00:10:14,366 --> 00:10:18,372 use him to break semantic security so we're going to use him to distinguish 141 00:10:18,372 --> 00:10:22,755 experiment zero from experiment one Okay so here is what we are going to do. We are 142 00:10:22,755 --> 00:10:26,987 algorithm B, we are going to be algorithm B and this algorithm B is going to use 143 00:10:26,987 --> 00:10:31,165 algorithm A in its belly. Okay so the first thing that is going to happen is of 144 00:10:31,165 --> 00:10:35,608 course the challenger is going to choose a random key. The first thing that is going 145 00:10:35,608 --> 00:10:39,762 to happen is that we need to output two messages. The messages that we are going 146 00:10:39,762 --> 00:10:43,493 to output basically are going to have differently significant bits. So one 147 00:10:43,493 --> 00:10:47,727 message is going to end with zero and one message is going to end with one. Now what 148 00:10:47,727 --> 00:10:51,205 is the challenger going to do? The challenger is going to give us the 149 00:10:51,205 --> 00:10:55,238 encryption of either M0 or M1, depending on whether in experiment 0 or 150 00:10:55,238 --> 00:10:59,120 in experiment 1. And then we just forward this cipher text to the adversary 151 00:10:59,120 --> 00:11:03,871 okay so the adversary A. Now what is the property of adversary A? Given the cipher 152 00:11:03,871 --> 00:11:08,505 text, adversary A can tell us what the least significant bits of the plain text is. 153 00:11:08,505 --> 00:11:13,374 In other words the adversary is going to output the least significant bits of M0 or M1 154 00:11:13,374 --> 00:11:17,892 but low and behold that's basically the bit B. And then we're just 155 00:11:17,892 --> 00:11:23,050 going to output that as our guest so let?s call this thing B prime Okay so now this 156 00:11:23,050 --> 00:11:28,376 describes the semantic security adversary. And now you tell me what is the semantic 157 00:11:28,376 --> 00:11:33,593 security advantage of this adversary? Well so let's see. In experiment zero, what is 158 00:11:33,593 --> 00:11:38,775 the probability that adversary B outputs one? Well in experiment zero it is always 159 00:11:38,775 --> 00:11:43,704 given the encryption of M zero and therefore adversary A is always output the 160 00:11:43,704 --> 00:11:48,633 least significant bit of M zero which always happens to be zero. In experiment 161 00:11:48,633 --> 00:11:53,120 zero, B always outputs zero. So the probability that outputs one is zero. 162 00:11:53,120 --> 00:11:57,827 However in experiment one, we're given the encryption of M1. So how likely is 163 00:11:57,827 --> 00:12:02,783 adversary B to output one in experiment one well it always outputs one, again by 164 00:12:02,783 --> 00:12:07,428 the properties of algorithm A and therefore the advantage basically is one. 165 00:12:07,428 --> 00:12:12,384 So this is a huge advantage, it's as big as it's gonna get. Which means that this 166 00:12:12,384 --> 00:12:17,091 adversary completely broke the system. Okay so we consider, so under semantic 167 00:12:17,091 --> 00:12:22,295 security basically just deducing the least significant bits is enough to completely 168 00:12:22,295 --> 00:12:27,187 break the system under a semantic security definition. Okay, now the interesting 169 00:12:27,187 --> 00:12:32,388 thing here of course is that this is not just about the least significant bit, in 170 00:12:32,388 --> 00:12:37,117 fact of the message for example the most significant bit, you know 171 00:12:37,117 --> 00:12:42,040 bit number seven Maybe the XOR of all the bits in the message and so on 172 00:12:42,040 --> 00:12:46,552 and so forth any kind of information, any bit about the plain text they can be 173 00:12:46,552 --> 00:12:50,814 learned basically would mean that the system is not semantically secure. So 174 00:12:50,814 --> 00:12:55,532 basically all the adversary would have to do would be to come up with two messages 175 00:12:55,532 --> 00:13:00,249 M0 and M1 such that under one thing that they learned it's the value zero and then 176 00:13:00,249 --> 00:13:04,626 the other thing, the value, is one. So for example if A was able to learn the XOR 177 00:13:04,626 --> 00:13:08,775 bits of the message then M0 and M1 would just have different 178 00:13:08,775 --> 00:13:13,265 XOR for all the bits of their messages and then this adversary A would 179 00:13:13,265 --> 00:13:18,174 also be sufficient to break semantic security. Okay so, basically for cipher 180 00:13:18,174 --> 00:13:23,203 is semantically secure then no bit of information is revealed to an 181 00:13:23,203 --> 00:13:27,392 efficient adversary. Okay so this is exactly a concept of perfect secrecy only 182 00:13:27,392 --> 00:13:31,318 applied just efficient adversaries rather than all adversaries. So the next 183 00:13:31,318 --> 00:13:35,045 thing I wanna show you is that in fact the one time pad in fact is 184 00:13:35,045 --> 00:13:38,821 semantically secure, they better be semantically secure because it's in fact, 185 00:13:38,821 --> 00:13:42,773 it's more than that it's perfectly secure. So let's see why that's true. Well so 186 00:13:42,773 --> 00:13:47,010 again it's one of these experiments, so suppose we have an adversary that claims 187 00:13:47,010 --> 00:13:51,449 to break semantic security of the one time pad. The first the adversary is gonna do 188 00:13:51,449 --> 00:13:55,874 is he's gonna output two messages M0 and M1 of the same length. 189 00:13:55,874 --> 00:13:59,667 Now what does he get back as a challenge? As a challenge, he gets either the encryption 190 00:13:59,667 --> 00:14:03,988 of M0 or the encryption of M1 under the one time pad. 191 00:14:03,988 --> 00:14:07,886 And he's trying to distinguish between those two possible cipher texts that he gets, right? 192 00:14:07,886 --> 00:14:12,259 In experiment zero, he gets the encryption of M0 and in experiment one, he gets the 193 00:14:12,259 --> 00:14:16,579 encryption of M1. Well so let me ask you, so what is the advantage of adversary 194 00:14:16,579 --> 00:14:21,297 A against the one time patent? So I remember that the property of the ones I 195 00:14:21,297 --> 00:14:26,208 had is that, that K, XOR M0 is distributed identically to K, XOR M1. 196 00:14:26,208 --> 00:14:31,187 In other words, these distributions are absolutely identical distribution, 197 00:14:31,187 --> 00:14:36,026 distributions, identical. This is a property of XOR. If we XOR the random pad 198 00:14:36,026 --> 00:14:40,674 K with anything, either M0 or M1, we get uniform distribution. So in both 199 00:14:40,674 --> 00:14:45,382 cases, algorithm A is given as in input exactly the same distribution. Maybe the 200 00:14:45,382 --> 00:14:50,209 uniform distribution on cipher texts. And therefore it's gonna behave exactly the 201 00:14:50,209 --> 00:14:55,036 same in both cases because it was given the exact distribution as input. And as a 202 00:14:55,036 --> 00:14:59,699 result, its advantage is zero, which means that the onetime pad is semantically 203 00:14:59,723 --> 00:15:04,148 secure. Now the interesting thing here is not only is it semantically secure, it's 204 00:15:04,148 --> 00:15:08,244 semantically secure for all adversaries. We don't even have to restrict the 205 00:15:08,244 --> 00:15:12,450 adversaries to be efficient. No adversary, doesn't matter how smart you are, no 206 00:15:12,450 --> 00:15:16,875 adversary will be able to distinguish K XOR M0 from K XOR M1 because the two 207 00:15:16,875 --> 00:15:21,299 distributions are identical. Okay, so the one time pad is semantically secure. Okay, 208 00:15:21,299 --> 00:15:25,559 so that completes our definition of semantic security so the next thing we're 209 00:15:25,559 --> 00:15:30,093 going to do is prove to the secure PRG, in fact implies that the stream cipher is 210 00:15:30,093 --> 00:15:31,186 semantically secure.