WEBVTT 00:00:00.000 --> 00:00:03.911 My goal for the next few segments is to show you that if we use a secure PRG We'll 00:00:03.911 --> 00:00:07.892 get a secure stream safer. The first thing we have to do is define is, what does it 00:00:07.892 --> 00:00:11.679 mean for a stream safer to be secure? So whenever we define security we always 00:00:11.679 --> 00:00:15.174 define it in terms of what can the attacker do? And what is the attacker 00:00:15.174 --> 00:00:18.670 trying to do? In the context of stream ciphers remember these are only used 00:00:18.670 --> 00:00:22.737 with a onetime key, and as a result the most the attacker is ever going to see is 00:00:22.737 --> 00:00:26.754 just one cypher text encrypted using the key that we're using. And so we're gonna 00:00:26.754 --> 00:00:30.772 limit the attackers? ability to basically obtain just one cypher text. And in 00:00:30.772 --> 00:00:34.641 fact later on we're going to allow the attacker to do much, much, much more, but 00:00:34.641 --> 00:00:38.460 for now we're just gonna give him one cypher text. And we wanted to find, 00:00:38.460 --> 00:00:42.560 what does it mean for the cypher to be secure? So the first definition that 00:00:42.560 --> 00:00:46.892 comes to mind is basically to say, well maybe we wanna require that the attacker 00:00:46.892 --> 00:00:50.718 can't actually recover the secret key. Okay so given ciphertext you 00:00:50.718 --> 00:00:54.609 shouldn't be able to recover the secret key. But that's a terrible definition 00:00:54.609 --> 00:00:58.717 because think about the falling brilliant cipher but the way we encrypt the 00:00:58.717 --> 00:01:02.855 message using a key K is basically we just output the message. Okay this 00:01:02.855 --> 00:01:07.427 is a brilliant cipher yeah of course it doesn't do anything given a message just 00:01:07.427 --> 00:01:12.000 output a message as the cipher text. This is not a particularly good encryption 00:01:12.000 --> 00:01:16.029 scheme; however, given the cipher text, mainly the message the poor attacker 00:01:16.029 --> 00:01:20.493 cannot recover the key because he doesn't know what the key is. And so as a result 00:01:20.493 --> 00:01:24.630 this cipher which clearly is insecure, would be considered secure under this 00:01:24.793 --> 00:01:28.636 requirement for security. So this definition will be no good. Okay so it's 00:01:28.636 --> 00:01:32.317 recovering the secret key is the wrong way to define security. So the next thing we 00:01:32.317 --> 00:01:35.999 can kinda attempt is basically just say, well maybe the attacker doesn't care about 00:01:35.999 --> 00:01:39.680 the secret key, what he really cares about are the plain text. So maybe it should be 00:01:39.680 --> 00:01:43.518 hard for the attacker to recover the entire plain text. But even that doesn't 00:01:43.518 --> 00:01:48.223 work because let's think about the following encryption scheme. So suppose 00:01:48.223 --> 00:01:53.436 what this encryption scheme does is it takes two messages, so I'm gonna use two 00:01:53.436 --> 00:01:58.014 lines to denote concatenation of two messages M0 line, line M1 means 00:01:58.014 --> 00:02:03.100 concatenate M0 and M1. And imagine what the scheme does is really it outputs 00:02:03.100 --> 00:02:08.060 M0 in the clear and concatenate to that the encryption of M1. Perhaps even 00:02:08.060 --> 00:02:13.337 using the One Time Pad okay? And so here the attacker is gonna be given one 00:02:13.337 --> 00:02:17.478 cipher text. And his goal would be to recover the entire plain texts. But the 00:02:17.478 --> 00:02:21.702 poor attacker can't do that because here maybe we've encrypted M1 using the One 00:02:21.702 --> 00:02:25.872 Time Pad so the attacker can't actually recover M1 because we know the 00:02:25.872 --> 00:02:30.043 One Time Pad is secure given just one cipher text. So this construction 00:02:30.043 --> 00:02:34.055 would satisfy the definition but unfortunately clearly this is not a secure 00:02:34.055 --> 00:02:37.962 encryption scheme because we just leaked half of the plain text. M0 is 00:02:37.962 --> 00:02:42.185 completely available to the attacker so even though he can't recover all of the 00:02:42.185 --> 00:02:46.462 plain text he might be able to recover most of the plain text, and that's clearly 00:02:46.462 --> 00:02:50.658 unsecured. So of course we already know the solution to this and we talked about 00:02:50.658 --> 00:02:54.747 Shanon definition of security perfect secrecy where Shannon's idea was that in 00:02:54.747 --> 00:02:58.835 fact when the attacker intercepts a cipher text he should learn absolutely no 00:02:58.835 --> 00:03:02.818 information about the plain texts. He shouldn't even learn one bit about the 00:03:02.818 --> 00:03:07.221 plain text or even he shouldn't learn, he shouldn't even be able to predict a little 00:03:07.221 --> 00:03:11.205 bit about a bid of the plain text. Absolutely no information about the plain text. 00:03:11.205 --> 00:03:14.926 So let's recall very briefly Shannon's concept of perfect secrecy 00:03:14.926 --> 00:03:19.442 basically we said that you know given a cipher We said the cipher has perfect 00:03:19.442 --> 00:03:25.069 secrecy if given two messages of the same length it so happens that the distribution 00:03:25.069 --> 00:03:30.167 of cipher texts. Yet if we pick a random key and we look into distribution of 00:03:30.167 --> 00:03:34.838 cipher texts we encrypt M0 we get exactly the same distribution as when we 00:03:34.838 --> 00:03:39.257 encrypt M1. The intuition here was that if the advisory observes the cipher texts 00:03:39.257 --> 00:03:43.839 then he doesn't know whether it came from the distribution the result of encrypting 00:03:43.839 --> 00:03:48.203 M0 or it came from the distribution as the result of encrypting M1. And as a 00:03:48.203 --> 00:03:52.513 result he can't tell whether we encrypted M0 or M1. And that's true for all 00:03:52.513 --> 00:03:56.877 messages of the same length and as a result a poor attacker doesn't really know 00:03:56.877 --> 00:04:01.212 what message was encrypted. Of course we already said that this definition is too 00:04:01.212 --> 00:04:05.400 strong in the sense that it requires really long keys if cipher has short 00:04:05.400 --> 00:04:09.535 keys can't possibly satisfy this definition in a particular stream ciphers 00:04:09.535 --> 00:04:14.328 can satisfy this definition. Okay so let's try to weaken the definition a little bit 00:04:14.328 --> 00:04:19.114 and let's think to the previous segment, and we can say that instead of requiring 00:04:19.114 --> 00:04:23.841 the two distributions to be absolutely identical what we can require is, is that 00:04:23.841 --> 00:04:28.686 two distributions just be computationally indistinguishable. In other words a poor, 00:04:28.863 --> 00:04:33.353 efficient attacker cannot distinguish the two distributions even though the 00:04:33.353 --> 00:04:37.815 distributions might be very, very, very different. That just given a sample for 00:04:37.815 --> 00:04:42.580 one distribution and a sample for another distribution the attacker can't tell which 00:04:42.580 --> 00:04:47.120 distribution he was given a sample from. It turns out this definition is actually 00:04:47.120 --> 00:04:51.716 almost right, but it's still a little too strong, that still cannot be satisfied, so 00:04:51.716 --> 00:04:56.200 we have to add one more constraint, and that is that instead of saying that this 00:04:56.200 --> 00:05:00.797 definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1 00:05:00.797 --> 00:05:05.208 that the attackers could actually exhibit. Okay so this actually 00:05:05.208 --> 00:05:10.038 leads us to the definition of semantics security. And so, again this is semantics 00:05:10.038 --> 00:05:15.050 security for one time key in other words when the attacker is only given one cipher 00:05:15.050 --> 00:05:19.819 text. And so the way we define semantic security is by defining two experiments, 00:05:19.819 --> 00:05:24.562 okay we'll define experiment 0 and experiment 1. And more generally we will 00:05:24.562 --> 00:05:29.230 think of these as experiments parentheses B, where B can be zero or one. Okay so the 00:05:29.230 --> 00:05:32.890 way the experiment is defined is as follows, we have an adversary that's 00:05:32.890 --> 00:05:37.161 trying to break the system. An adversary A, that's kinda the analog of statistical 00:05:37.161 --> 00:05:41.279 tests in the world of pseudo random generators. And then the challenger does 00:05:41.279 --> 00:05:45.093 the following, so really we have two challengers, but the challengers are so 00:05:45.093 --> 00:05:49.414 similar that we can just describe them as a single challenger that in one case takes 00:05:49.414 --> 00:05:53.634 his inputs bits set to zero, and the other case takes his inputs bits set to 00:05:53.634 --> 00:05:57.193 one. And let me show you what these challengers do. The first thing the 00:05:57.193 --> 00:06:01.349 challenger is gonna do is it's gonna pick a random key and then the adversary 00:06:01.349 --> 00:06:06.076 basically is going to output two messages M0 and M1. Okay so this is an explicit 00:06:06.076 --> 00:06:11.039 pair of messages that the attacker wants to be challenged on and as usual we're not 00:06:11.039 --> 00:06:15.766 trying to hide the length of the messages, we require that the messages be equal 00:06:15.766 --> 00:06:21.643 length. And then the challenger basically will output either the encryption of M0 00:06:21.643 --> 00:06:25.890 or the encryption of M1, okay so in experiment 0 the challenger will output 00:06:25.890 --> 00:06:30.301 the encryption of M0. In experiment 1 the challenger will output the encryption 00:06:30.301 --> 00:06:34.385 of M1. Okay so that the difference between the two experiments. And then the 00:06:34.385 --> 00:06:38.796 adversary is trying to guess basically whether he was given the encryption of M0 00:06:38.796 --> 00:06:44.051 or given the encryption of M1. Okay so here's a little bit of notation let's 00:06:44.051 --> 00:06:50.260 define the events Wb to be the events that an experiment B the adversary output one. 00:06:50.260 --> 00:06:55.084 Okay so that is just an event that basically in experiment zero W0 means that 00:06:55.084 --> 00:07:00.342 the adversary output one and in experiment one W1 means the adversary output one. And 00:07:00.342 --> 00:07:05.291 now we can define the advantage of this adversary, basically to say that this is 00:07:05.291 --> 00:07:10.425 called the semantics security advantage of the adversary A against the scheme E, 00:07:10.425 --> 00:07:15.497 to be the difference of the probability of these two events. In other words we are 00:07:15.497 --> 00:07:20.136 looking at whether the adversary behaves differently when he was given the 00:07:20.136 --> 00:07:24.818 encryption of M0 from when he was given the encryption of M1. And I wanna make 00:07:24.818 --> 00:07:29.201 sure this is clear so I'm gonna say it one more time. So in experiment zero he was 00:07:29.201 --> 00:07:33.530 given the encryption of M0 and in experiment one he was given the encryption 00:07:33.530 --> 00:07:37.700 of M1. Now we're just interested in whether the adversary output 1 or not. 00:07:37.700 --> 00:07:42.356 ... In these experiments. If in both experiments the adversary output 1 with 00:07:42.356 --> 00:07:47.013 the same probability that means the adversary wasn't able to distinguish the 00:07:47.013 --> 00:07:51.549 two experiments. Experiments zero basically looks to the adversary the same 00:07:51.549 --> 00:07:56.206 as experiment one because in both cases upload one with the same probability. 00:07:56.206 --> 00:08:01.286 However if the adversary is able to output 1 in one experiment with significantly 00:08:01.286 --> 00:08:05.761 different probability than in the other experiment, then the adversary was 00:08:05.761 --> 00:08:10.266 actually able to distinguish the two experiments. Okay so... To say this more 00:08:10.266 --> 00:08:14.455 formally, essentially the advantage again because it's the difference of two 00:08:14.455 --> 00:08:18.918 probabilities the advantage is a number between zero and one. If the advantage is 00:08:18.918 --> 00:08:22.886 close to zero that means that the adversary was not able to distinguish 00:08:22.886 --> 00:08:27.129 experiment zero from experiment one. However if the advantage is close to one, 00:08:27.129 --> 00:08:31.538 that means the adversary was very well able to distinguish experiment zero from 00:08:31.538 --> 00:08:36.112 experiment one and that really means that he was able to distinguish an encryption 00:08:36.112 --> 00:08:40.299 of M0 from an encryption of M1, okay?So that's out definition. Actually 00:08:40.299 --> 00:08:44.055 that is just the definition of the advantage and the definition is just what 00:08:44.055 --> 00:08:47.714 you would expect basically we'll say that a symmetric encryption scheme is 00:08:47.714 --> 00:08:52.346 semantically secure if for all efficient adversaries here I'll put these in quotes 00:08:52.346 --> 00:08:56.932 again, "For all efficient adversaries the advantage is negligible." In other words, 00:08:56.982 --> 00:09:01.808 no efficient adversary can distinguish the encryption of M0 from the encryption 00:09:01.808 --> 00:09:06.103 of M1 and basically this is what it says repeatedly that for these two 00:09:06.103 --> 00:09:10.759 messages that the adversary was able to exhibit he wasn't able to distinguish 00:09:10.759 --> 00:09:15.064 these two distributions. Now I want to show you this, this is actually a very 00:09:15.064 --> 00:09:19.595 elegant definition. It might not seem so right away, but I want to show you some 00:09:19.595 --> 00:09:24.410 implications of this definition and you'll see exactly why the definition is the way 00:09:24.410 --> 00:09:28.601 it is. Okay so let's look at some examples. So the first example is suppose 00:09:28.601 --> 00:09:33.190 we have a broken encryption scheme, in other words suppose we have an adversary A 00:09:33.190 --> 00:09:38.285 that somehow given the cipher text he is always able to deduce the least 00:09:38.285 --> 00:09:44.149 significant bit of the plain text. Okay so given the encryption of M0, this adversary 00:09:44.149 --> 00:09:48.799 is able to deduce the least significant bit of M0. So that is a terrible 00:09:48.799 --> 00:09:52.911 encryption scheme because it basically leaks the least significant bit of the 00:09:52.911 --> 00:09:57.128 plain text just given the cipher text. So I want to show you that this scheme is 00:09:57.128 --> 00:10:01.609 therefore semantically secure so that kind of shows that if a system is semantically 00:10:01.609 --> 00:10:05.931 secure than there is no attacker of this type. Okay so let's see why is the system 00:10:05.931 --> 00:10:10.254 not semantically secure well so what we are gonna do is we're gonna basically use 00:10:10.254 --> 00:10:14.366 our adversary who is able to learn our most insignificant bits, we're going to 00:10:14.366 --> 00:10:18.372 use him to break semantic security so we're going to use him to distinguish 00:10:18.372 --> 00:10:22.755 experiment zero from experiment one Okay so here is what we are going to do. We are 00:10:22.755 --> 00:10:26.987 algorithm B, we are going to be algorithm B and this algorithm B is going to use 00:10:26.987 --> 00:10:31.165 algorithm A in its belly. Okay so the first thing that is going to happen is of 00:10:31.165 --> 00:10:35.608 course the challenger is going to choose a random key. The first thing that is going 00:10:35.608 --> 00:10:39.762 to happen is that we need to output two messages. The messages that we are going 00:10:39.762 --> 00:10:43.493 to output basically are going to have differently significant bits. So one 00:10:43.493 --> 00:10:47.727 message is going to end with zero and one message is going to end with one. Now what 00:10:47.727 --> 00:10:51.205 is the challenger going to do? The challenger is going to give us the 00:10:51.205 --> 00:10:55.238 encryption of either M0 or M1, depending on whether in experiment 0 or 00:10:55.238 --> 00:10:59.120 in experiment 1. And then we just forward this cipher text to the adversary 00:10:59.120 --> 00:11:03.871 okay so the adversary A. Now what is the property of adversary A? Given the cipher 00:11:03.871 --> 00:11:08.505 text, adversary A can tell us what the least significant bits of the plain text is. 00:11:08.505 --> 00:11:13.374 In other words the adversary is going to output the least significant bits of M0 or M1 00:11:13.374 --> 00:11:17.892 but low and behold that's basically the bit B. And then we're just 00:11:17.892 --> 00:11:23.050 going to output that as our guest so let?s call this thing B prime Okay so now this 00:11:23.050 --> 00:11:28.376 describes the semantic security adversary. And now you tell me what is the semantic 00:11:28.376 --> 00:11:33.593 security advantage of this adversary? Well so let's see. In experiment zero, what is 00:11:33.593 --> 00:11:38.775 the probability that adversary B outputs one? Well in experiment zero it is always 00:11:38.775 --> 00:11:43.704 given the encryption of M zero and therefore adversary A is always output the 00:11:43.704 --> 00:11:48.633 least significant bit of M zero which always happens to be zero. In experiment 00:11:48.633 --> 00:11:53.120 zero, B always outputs zero. So the probability that outputs one is zero. 00:11:53.120 --> 00:11:57.827 However in experiment one, we're given the encryption of M1. So how likely is 00:11:57.827 --> 00:12:02.783 adversary B to output one in experiment one well it always outputs one, again by 00:12:02.783 --> 00:12:07.428 the properties of algorithm A and therefore the advantage basically is one. 00:12:07.428 --> 00:12:12.384 So this is a huge advantage, it's as big as it's gonna get. Which means that this 00:12:12.384 --> 00:12:17.091 adversary completely broke the system. Okay so we consider, so under semantic 00:12:17.091 --> 00:12:22.295 security basically just deducing the least significant bits is enough to completely 00:12:22.295 --> 00:12:27.187 break the system under a semantic security definition. Okay, now the interesting 00:12:27.187 --> 00:12:32.388 thing here of course is that this is not just about the least significant bit, in 00:12:32.388 --> 00:12:37.117 fact of the message for example the most significant bit, you know 00:12:37.117 --> 00:12:42.040 bit number seven Maybe the XOR of all the bits in the message and so on 00:12:42.040 --> 00:12:46.552 and so forth any kind of information, any bit about the plain text they can be 00:12:46.552 --> 00:12:50.814 learned basically would mean that the system is not semantically secure. So 00:12:50.814 --> 00:12:55.532 basically all the adversary would have to do would be to come up with two messages 00:12:55.532 --> 00:13:00.249 M0 and M1 such that under one thing that they learned it's the value zero and then 00:13:00.249 --> 00:13:04.626 the other thing, the value, is one. So for example if A was able to learn the XOR 00:13:04.626 --> 00:13:08.775 bits of the message then M0 and M1 would just have different 00:13:08.775 --> 00:13:13.265 XOR for all the bits of their messages and then this adversary A would 00:13:13.265 --> 00:13:18.174 also be sufficient to break semantic security. Okay so, basically for cipher 00:13:18.174 --> 00:13:23.203 is semantically secure then no bit of information is revealed to an 00:13:23.203 --> 00:13:27.392 efficient adversary. Okay so this is exactly a concept of perfect secrecy only 00:13:27.392 --> 00:13:31.318 applied just efficient adversaries rather than all adversaries. So the next 00:13:31.318 --> 00:13:35.045 thing I wanna show you is that in fact the one time pad in fact is 00:13:35.045 --> 00:13:38.821 semantically secure, they better be semantically secure because it's in fact, 00:13:38.821 --> 00:13:42.773 it's more than that it's perfectly secure. So let's see why that's true. Well so 00:13:42.773 --> 00:13:47.010 again it's one of these experiments, so suppose we have an adversary that claims 00:13:47.010 --> 00:13:51.449 to break semantic security of the one time pad. The first the adversary is gonna do 00:13:51.449 --> 00:13:55.874 is he's gonna output two messages M0 and M1 of the same length. 00:13:55.874 --> 00:13:59.667 Now what does he get back as a challenge? As a challenge, he gets either the encryption 00:13:59.667 --> 00:14:03.988 of M0 or the encryption of M1 under the one time pad. 00:14:03.988 --> 00:14:07.886 And he's trying to distinguish between those two possible cipher texts that he gets, right? 00:14:07.886 --> 00:14:12.259 In experiment zero, he gets the encryption of M0 and in experiment one, he gets the 00:14:12.259 --> 00:14:16.579 encryption of M1. Well so let me ask you, so what is the advantage of adversary 00:14:16.579 --> 00:14:21.297 A against the one time patent? So I remember that the property of the ones I 00:14:21.297 --> 00:14:26.208 had is that, that K, XOR M0 is distributed identically to K, XOR M1. 00:14:26.208 --> 00:14:31.187 In other words, these distributions are absolutely identical distribution, 00:14:31.187 --> 00:14:36.026 distributions, identical. This is a property of XOR. If we XOR the random pad 00:14:36.026 --> 00:14:40.674 K with anything, either M0 or M1, we get uniform distribution. So in both 00:14:40.674 --> 00:14:45.382 cases, algorithm A is given as in input exactly the same distribution. Maybe the 00:14:45.382 --> 00:14:50.209 uniform distribution on cipher texts. And therefore it's gonna behave exactly the 00:14:50.209 --> 00:14:55.036 same in both cases because it was given the exact distribution as input. And as a 00:14:55.036 --> 00:14:59.699 result, its advantage is zero, which means that the onetime pad is semantically 00:14:59.723 --> 00:15:04.148 secure. Now the interesting thing here is not only is it semantically secure, it's 00:15:04.148 --> 00:15:08.244 semantically secure for all adversaries. We don't even have to restrict the 00:15:08.244 --> 00:15:12.450 adversaries to be efficient. No adversary, doesn't matter how smart you are, no 00:15:12.450 --> 00:15:16.875 adversary will be able to distinguish K XOR M0 from K XOR M1 because the two 00:15:16.875 --> 00:15:21.299 distributions are identical. Okay, so the one time pad is semantically secure. Okay, 00:15:21.299 --> 00:15:25.559 so that completes our definition of semantic security so the next thing we're 00:15:25.559 --> 00:15:30.093 going to do is prove to the secure PRG, in fact implies that the stream cipher is 00:15:30.093 --> 00:15:31.186 semantically secure.