0:00:00.000,0:00:03.911 My goal for the next few segments is to[br]show you that if we use a secure PRG We'll 0:00:03.911,0:00:07.892 get a secure stream safer. The first thing[br]we have to do is define is, what does it 0:00:07.892,0:00:11.679 mean for a stream safer to be secure? So[br]whenever we define security we always 0:00:11.679,0:00:15.174 define it in terms of what can the[br]attacker do? And what is the attacker 0:00:15.174,0:00:18.670 trying to do? In the context of[br]stream ciphers remember these are only used 0:00:18.670,0:00:22.737 with a onetime key, and as a result the[br]most the attacker is ever going to see is 0:00:22.737,0:00:26.754 just one cypher text encrypted using the[br]key that we're using. And so we're gonna 0:00:26.754,0:00:30.772 limit the attackers? ability to basically[br]obtain just one cypher text. And in 0:00:30.772,0:00:34.641 fact later on we're going to allow the[br]attacker to do much, much, much more, but 0:00:34.641,0:00:38.460 for now we're just gonna give him one[br]cypher text. And we wanted to find, 0:00:38.460,0:00:42.560 what does it mean for the cypher to[br]be secure? So the first definition that 0:00:42.560,0:00:46.892 comes to mind is basically to say, well[br]maybe we wanna require that the attacker 0:00:46.892,0:00:50.718 can't actually recover the secret key.[br]Okay so given ciphertext you 0:00:50.718,0:00:54.609 shouldn't be able to recover the secret[br]key. But that's a terrible definition 0:00:54.609,0:00:58.717 because think about the falling brilliant[br]cipher but the way we encrypt the 0:00:58.717,0:01:02.855 message using a key K is basically[br]we just output the message. Okay this 0:01:02.855,0:01:07.427 is a brilliant cipher yeah of course it[br]doesn't do anything given a message just 0:01:07.427,0:01:12.000 output a message as the cipher text.[br]This is not a particularly good encryption 0:01:12.000,0:01:16.029 scheme; however, given the cipher text,[br]mainly the message the poor attacker 0:01:16.029,0:01:20.493 cannot recover the key because he doesn't[br]know what the key is. And so as a result 0:01:20.493,0:01:24.630 this cipher which clearly is insecure,[br]would be considered secure under this 0:01:24.793,0:01:28.636 requirement for security. So this[br]definition will be no good. Okay so it's 0:01:28.636,0:01:32.317 recovering the secret key is the wrong way[br]to define security. So the next thing we 0:01:32.317,0:01:35.999 can kinda attempt is basically just say,[br]well maybe the attacker doesn't care about 0:01:35.999,0:01:39.680 the secret key, what he really cares about[br]are the plain text. So maybe it should be 0:01:39.680,0:01:43.518 hard for the attacker to recover the[br]entire plain text. But even that doesn't 0:01:43.518,0:01:48.223 work because let's think about the[br]following encryption scheme. So suppose 0:01:48.223,0:01:53.436 what this encryption scheme does is it[br]takes two messages, so I'm gonna use two 0:01:53.436,0:01:58.014 lines to denote concatenation of two[br]messages M0 line, line M1 means 0:01:58.014,0:02:03.100 concatenate M0 and M1. And imagine[br]what the scheme does is really it outputs 0:02:03.100,0:02:08.060 M0 in the clear and concatenate to[br]that the encryption of M1. Perhaps even 0:02:08.060,0:02:13.337 using the One Time Pad okay? And[br]so here the attacker is gonna be given one 0:02:13.337,0:02:17.478 cipher text. And his goal would be to[br]recover the entire plain texts. But the 0:02:17.478,0:02:21.702 poor attacker can't do that because here[br]maybe we've encrypted M1 using the One 0:02:21.702,0:02:25.872 Time Pad so the attacker can't[br]actually recover M1 because we know the 0:02:25.872,0:02:30.043 One Time Pad is secure given just[br]one cipher text. So this construction 0:02:30.043,0:02:34.055 would satisfy the definition but[br]unfortunately clearly this is not a secure 0:02:34.055,0:02:37.962 encryption scheme because we just leaked[br]half of the plain text. M0 is 0:02:37.962,0:02:42.185 completely available to the attacker so[br]even though he can't recover all of the 0:02:42.185,0:02:46.462 plain text he might be able to recover[br]most of the plain text, and that's clearly 0:02:46.462,0:02:50.658 unsecured. So of course we already know[br]the solution to this and we talked about 0:02:50.658,0:02:54.747 Shanon definition of security perfect[br]secrecy where Shannon's idea was that in 0:02:54.747,0:02:58.835 fact when the attacker intercepts a cipher[br]text he should learn absolutely no 0:02:58.835,0:03:02.818 information about the plain texts. He[br]shouldn't even learn one bit about the 0:03:02.818,0:03:07.221 plain text or even he shouldn't learn, he[br]shouldn't even be able to predict a little 0:03:07.221,0:03:11.205 bit about a bid of the plain text.[br]Absolutely no information about the plain text. 0:03:11.205,0:03:14.926 So let's recall very briefly[br]Shannon's concept of perfect secrecy 0:03:14.926,0:03:19.442 basically we said that you know given a[br]cipher We said the cipher has perfect 0:03:19.442,0:03:25.069 secrecy if given two messages of the same[br]length it so happens that the distribution 0:03:25.069,0:03:30.167 of cipher texts. Yet if we pick a random[br]key and we look into distribution of 0:03:30.167,0:03:34.838 cipher texts we encrypt M0 we get[br]exactly the same distribution as when we 0:03:34.838,0:03:39.257 encrypt M1. The intuition here was that if[br]the advisory observes the cipher texts 0:03:39.257,0:03:43.839 then he doesn't know whether it came from[br]the distribution the result of encrypting 0:03:43.839,0:03:48.203 M0 or it came from the distribution as[br]the result of encrypting M1. And as a 0:03:48.203,0:03:52.513 result he can't tell whether we encrypted[br]M0 or M1. And that's true for all 0:03:52.513,0:03:56.877 messages of the same length and as a[br]result a poor attacker doesn't really know 0:03:56.877,0:04:01.212 what message was encrypted. Of course we[br]already said that this definition is too 0:04:01.212,0:04:05.400 strong in the sense that it requires[br]really long keys if cipher has short 0:04:05.400,0:04:09.535 keys can't possibly satisfy this[br]definition in a particular stream ciphers 0:04:09.535,0:04:14.328 can satisfy this definition. Okay so let's[br]try to weaken the definition a little bit 0:04:14.328,0:04:19.114 and let's think to the previous segment,[br]and we can say that instead of requiring 0:04:19.114,0:04:23.841 the two distributions to be absolutely[br]identical what we can require is, is that 0:04:23.841,0:04:28.686 two distributions just be computationally[br]indistinguishable. In other words a poor, 0:04:28.863,0:04:33.353 efficient attacker cannot distinguish the[br]two distributions even though the 0:04:33.353,0:04:37.815 distributions might be very, very, very[br]different. That just given a sample for 0:04:37.815,0:04:42.580 one distribution and a sample for another[br]distribution the attacker can't tell which 0:04:42.580,0:04:47.120 distribution he was given a sample from.[br]It turns out this definition is actually 0:04:47.120,0:04:51.716 almost right, but it's still a little too[br]strong, that still cannot be satisfied, so 0:04:51.716,0:04:56.200 we have to add one more constraint, and[br]that is that instead of saying that this 0:04:56.200,0:05:00.797 definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1 0:05:00.797,0:05:05.208 that the attackers could[br]actually exhibit. Okay so this actually 0:05:05.208,0:05:10.038 leads us to the definition of semantics[br]security. And so, again this is semantics 0:05:10.038,0:05:15.050 security for one time key in other words[br]when the attacker is only given one cipher 0:05:15.050,0:05:19.819 text. And so the way we define semantic[br]security is by defining two experiments, 0:05:19.819,0:05:24.562 okay we'll define experiment 0 and[br]experiment 1. And more generally we will 0:05:24.562,0:05:29.230 think of these as experiments parentheses[br]B, where B can be zero or one. Okay so the 0:05:29.230,0:05:32.890 way the experiment is defined is as[br]follows, we have an adversary that's 0:05:32.890,0:05:37.161 trying to break the system. An adversary[br]A, that's kinda the analog of statistical 0:05:37.161,0:05:41.279 tests in the world of pseudo random[br]generators. And then the challenger does 0:05:41.279,0:05:45.093 the following, so really we have two[br]challengers, but the challengers are so 0:05:45.093,0:05:49.414 similar that we can just describe them as[br]a single challenger that in one case takes 0:05:49.414,0:05:53.634 his inputs bits set to zero, and the other[br]case takes his inputs bits set to 0:05:53.634,0:05:57.193 one. And let me show you what these[br]challengers do. The first thing the 0:05:57.193,0:06:01.349 challenger is gonna do is it's gonna pick[br]a random key and then the adversary 0:06:01.349,0:06:06.076 basically is going to output two messages[br]M0 and M1. Okay so this is an explicit 0:06:06.076,0:06:11.039 pair of messages that the attacker wants[br]to be challenged on and as usual we're not 0:06:11.039,0:06:15.766 trying to hide the length of the messages,[br]we require that the messages be equal 0:06:15.766,0:06:21.643 length. And then the challenger basically[br]will output either the encryption of M0 0:06:21.643,0:06:25.890 or the encryption of M1, okay so in[br]experiment 0 the challenger will output 0:06:25.890,0:06:30.301 the encryption of M0. In experiment 1 the challenger will output the encryption 0:06:30.301,0:06:34.385 of M1. Okay so that the difference between[br]the two experiments. And then the 0:06:34.385,0:06:38.796 adversary is trying to guess basically[br]whether he was given the encryption of M0 0:06:38.796,0:06:44.051 or given the encryption of M1. Okay so[br]here's a little bit of notation let's 0:06:44.051,0:06:50.260 define the events Wb to be the events that[br]an experiment B the adversary output one. 0:06:50.260,0:06:55.084 Okay so that is just an event that[br]basically in experiment zero W0 means that 0:06:55.084,0:07:00.342 the adversary output one and in experiment[br]one W1 means the adversary output one. And 0:07:00.342,0:07:05.291 now we can define the advantage of this[br]adversary, basically to say that this is 0:07:05.291,0:07:10.425 called the semantics security advantage of[br]the adversary A against the scheme E, 0:07:10.425,0:07:15.497 to be the difference of the probability of[br]these two events. In other words we are 0:07:15.497,0:07:20.136 looking at whether the adversary behaves[br]differently when he was given the 0:07:20.136,0:07:24.818 encryption of M0 from when he was given[br]the encryption of M1. And I wanna make 0:07:24.818,0:07:29.201 sure this is clear so I'm gonna say it one[br]more time. So in experiment zero he was 0:07:29.201,0:07:33.530 given the encryption of M0 and in[br]experiment one he was given the encryption 0:07:33.530,0:07:37.700 of M1. Now we're just interested in[br]whether the adversary output 1 or not. 0:07:37.700,0:07:42.356 ... In these experiments. If in both[br]experiments the adversary output 1 with 0:07:42.356,0:07:47.013 the same probability that means the[br]adversary wasn't able to distinguish the 0:07:47.013,0:07:51.549 two experiments. Experiments zero[br]basically looks to the adversary the same 0:07:51.549,0:07:56.206 as experiment one because in both cases[br]upload one with the same probability. 0:07:56.206,0:08:01.286 However if the adversary is able to output[br]1 in one experiment with significantly 0:08:01.286,0:08:05.761 different probability than in the other[br]experiment, then the adversary was 0:08:05.761,0:08:10.266 actually able to distinguish the two[br]experiments. Okay so... To say this more 0:08:10.266,0:08:14.455 formally, essentially the advantage again[br]because it's the difference of two 0:08:14.455,0:08:18.918 probabilities the advantage is a number[br]between zero and one. If the advantage is 0:08:18.918,0:08:22.886 close to zero that means that the[br]adversary was not able to distinguish 0:08:22.886,0:08:27.129 experiment zero from experiment one.[br]However if the advantage is close to one, 0:08:27.129,0:08:31.538 that means the adversary was very well[br]able to distinguish experiment zero from 0:08:31.538,0:08:36.112 experiment one and that really means that[br]he was able to distinguish an encryption 0:08:36.112,0:08:40.299 of M0 from an encryption of M1, okay?So that's out definition. Actually 0:08:40.299,0:08:44.055 that is just the definition of the[br]advantage and the definition is just what 0:08:44.055,0:08:47.714 you would expect basically we'll say that[br]a symmetric encryption scheme is 0:08:47.714,0:08:52.346 semantically secure if for all efficient[br]adversaries here I'll put these in quotes 0:08:52.346,0:08:56.932 again, "For all efficient adversaries the[br]advantage is negligible." In other words, 0:08:56.982,0:09:01.808 no efficient adversary can distinguish the[br]encryption of M0 from the encryption 0:09:01.808,0:09:06.103 of M1 and basically this is what it[br]says repeatedly that for these two 0:09:06.103,0:09:10.759 messages that the adversary was able to[br]exhibit he wasn't able to distinguish 0:09:10.759,0:09:15.064 these two distributions. Now I want to[br]show you this, this is actually a very 0:09:15.064,0:09:19.595 elegant definition. It might not seem so[br]right away, but I want to show you some 0:09:19.595,0:09:24.410 implications of this definition and you'll[br]see exactly why the definition is the way 0:09:24.410,0:09:28.601 it is. Okay so let's look at some[br]examples. So the first example is suppose 0:09:28.601,0:09:33.190 we have a broken encryption scheme, in[br]other words suppose we have an adversary A 0:09:33.190,0:09:38.285 that somehow given the cipher text he is[br]always able to deduce the least 0:09:38.285,0:09:44.149 significant bit of the plain text. Okay so[br]given the encryption of M0, this adversary 0:09:44.149,0:09:48.799 is able to deduce the least significant[br]bit of M0. So that is a terrible 0:09:48.799,0:09:52.911 encryption scheme because it basically[br]leaks the least significant bit of the 0:09:52.911,0:09:57.128 plain text just given the cipher text. So[br]I want to show you that this scheme is 0:09:57.128,0:10:01.609 therefore semantically secure so that kind[br]of shows that if a system is semantically 0:10:01.609,0:10:05.931 secure than there is no attacker of this[br]type. Okay so let's see why is the system 0:10:05.931,0:10:10.254 not semantically secure well so what we[br]are gonna do is we're gonna basically use 0:10:10.254,0:10:14.366 our adversary who is able to learn our[br]most insignificant bits, we're going to 0:10:14.366,0:10:18.372 use him to break semantic security so[br]we're going to use him to distinguish 0:10:18.372,0:10:22.755 experiment zero from experiment one Okay[br]so here is what we are going to do. We are 0:10:22.755,0:10:26.987 algorithm B, we are going to be algorithm[br]B and this algorithm B is going to use 0:10:26.987,0:10:31.165 algorithm A in its belly. Okay so the[br]first thing that is going to happen is of 0:10:31.165,0:10:35.608 course the challenger is going to choose a[br]random key. The first thing that is going 0:10:35.608,0:10:39.762 to happen is that we need to output two[br]messages. The messages that we are going 0:10:39.762,0:10:43.493 to output basically are going to have[br]differently significant bits. So one 0:10:43.493,0:10:47.727 message is going to end with zero and one[br]message is going to end with one. Now what 0:10:47.727,0:10:51.205 is the challenger going to do? The[br]challenger is going to give us the 0:10:51.205,0:10:55.238 encryption of either M0 or M1,[br]depending on whether in experiment 0 or 0:10:55.238,0:10:59.120 in experiment 1. And then we just[br]forward this cipher text to the adversary 0:10:59.120,0:11:03.871 okay so the adversary A. Now what is the[br]property of adversary A? Given the cipher 0:11:03.871,0:11:08.505 text, adversary A can tell us what the[br]least significant bits of the plain text is. 0:11:08.505,0:11:13.374 In other words the adversary is going[br]to output the least significant bits of M0 or M1 0:11:13.374,0:11:17.892 but low and behold that's[br]basically the bit B. And then we're just 0:11:17.892,0:11:23.050 going to output that as our guest so let?s[br]call this thing B prime Okay so now this 0:11:23.050,0:11:28.376 describes the semantic security adversary.[br]And now you tell me what is the semantic 0:11:28.376,0:11:33.593 security advantage of this adversary? Well[br]so let's see. In experiment zero, what is 0:11:33.593,0:11:38.775 the probability that adversary B outputs[br]one? Well in experiment zero it is always 0:11:38.775,0:11:43.704 given the encryption of M zero and[br]therefore adversary A is always output the 0:11:43.704,0:11:48.633 least significant bit of M zero which[br]always happens to be zero. In experiment 0:11:48.633,0:11:53.120 zero, B always outputs zero. So the[br]probability that outputs one is zero. 0:11:53.120,0:11:57.827 However in experiment one, we're given the[br]encryption of M1. So how likely is 0:11:57.827,0:12:02.783 adversary B to output one in experiment[br]one well it always outputs one, again by 0:12:02.783,0:12:07.428 the properties of algorithm A and[br]therefore the advantage basically is one. 0:12:07.428,0:12:12.384 So this is a huge advantage, it's as big[br]as it's gonna get. Which means that this 0:12:12.384,0:12:17.091 adversary completely broke the system.[br]Okay so we consider, so under semantic 0:12:17.091,0:12:22.295 security basically just deducing the least[br]significant bits is enough to completely 0:12:22.295,0:12:27.187 break the system under a semantic security[br]definition. Okay, now the interesting 0:12:27.187,0:12:32.388 thing here of course is that this is not[br]just about the least significant bit, in 0:12:32.388,0:12:37.117 fact of the message for[br]example the most significant bit, you know 0:12:37.117,0:12:42.040 bit number seven Maybe the XOR of all the bits in the message and so on 0:12:42.040,0:12:46.552 and so forth any kind of information, any[br]bit about the plain text they can be 0:12:46.552,0:12:50.814 learned basically would mean that the[br]system is not semantically secure. So 0:12:50.814,0:12:55.532 basically all the adversary would have to[br]do would be to come up with two messages 0:12:55.532,0:13:00.249 M0 and M1 such that under one thing that[br]they learned it's the value zero and then 0:13:00.249,0:13:04.626 the other thing, the value, is one. So for[br]example if A was able to learn the XOR 0:13:04.626,0:13:08.775 bits of the message then M0[br]and M1 would just have different 0:13:08.775,0:13:13.265 XOR for all the bits of their[br]messages and then this adversary A would 0:13:13.265,0:13:18.174 also be sufficient to break semantic[br]security. Okay so, basically for cipher 0:13:18.174,0:13:23.203 is semantically secure then no[br]bit of information is revealed to an 0:13:23.203,0:13:27.392 efficient adversary. Okay so this is[br]exactly a concept of perfect secrecy only 0:13:27.392,0:13:31.318 applied just efficient adversaries[br]rather than all adversaries. So the next 0:13:31.318,0:13:35.045 thing I wanna show you is that in fact the[br]one time pad in fact is 0:13:35.045,0:13:38.821 semantically secure, they better be[br]semantically secure because it's in fact, 0:13:38.821,0:13:42.773 it's more than that it's perfectly secure.[br]So let's see why that's true. Well so 0:13:42.773,0:13:47.010 again it's one of these experiments, so[br]suppose we have an adversary that claims 0:13:47.010,0:13:51.449 to break semantic security of the one time[br]pad. The first the adversary is gonna do 0:13:51.449,0:13:55.874 is he's gonna output two messages M0[br]and M1 of the same length. 0:13:55.874,0:13:59.667 Now what does he get back as a challenge? As a[br]challenge, he gets either the encryption 0:13:59.667,0:14:03.988 of M0 or the encryption of M1 under[br]the one time pad. 0:14:03.988,0:14:07.886 And he's trying to distinguish between those two possible[br]cipher texts that he gets, right? 0:14:07.886,0:14:12.259 In experiment zero, he gets the encryption of[br]M0 and in experiment one, he gets the 0:14:12.259,0:14:16.579 encryption of M1. Well so let me ask[br]you, so what is the advantage of adversary 0:14:16.579,0:14:21.297 A against the one time patent? So I[br]remember that the property of the ones I 0:14:21.297,0:14:26.208 had is that, that K, XOR M0 is[br]distributed identically to K, XOR M1. 0:14:26.208,0:14:31.187 In other words, these distributions are[br]absolutely identical distribution, 0:14:31.187,0:14:36.026 distributions, identical. This is a[br]property of XOR. If we XOR the random pad 0:14:36.026,0:14:40.674 K with anything, either M0 or M1,[br]we get uniform distribution. So in both 0:14:40.674,0:14:45.382 cases, algorithm A is given as in input[br]exactly the same distribution. Maybe the 0:14:45.382,0:14:50.209 uniform distribution on cipher texts. And[br]therefore it's gonna behave exactly the 0:14:50.209,0:14:55.036 same in both cases because it was given[br]the exact distribution as input. And as a 0:14:55.036,0:14:59.699 result, its advantage is zero, which means[br]that the onetime pad is semantically 0:14:59.723,0:15:04.148 secure. Now the interesting thing here is[br]not only is it semantically secure, it's 0:15:04.148,0:15:08.244 semantically secure for all adversaries.[br]We don't even have to restrict the 0:15:08.244,0:15:12.450 adversaries to be efficient. No adversary,[br]doesn't matter how smart you are, no 0:15:12.450,0:15:16.875 adversary will be able to distinguish K XOR M0 from K XOR M1 because the two 0:15:16.875,0:15:21.299 distributions are identical. Okay, so the[br]one time pad is semantically secure. Okay, 0:15:21.299,0:15:25.559 so that completes our definition of[br]semantic security so the next thing we're 0:15:25.559,0:15:30.093 going to do is prove to the secure PRG,[br]in fact implies that the stream cipher is 0:15:30.093,0:15:31.186 semantically secure.