0:00:00.000,0:00:03.911
My goal for the next few segments is to[br]show you that if we use a secure PRG We'll

0:00:03.911,0:00:07.892
get a secure stream safer. The first thing[br]we have to do is define is, what does it

0:00:07.892,0:00:11.679
mean for a stream safer to be secure? So[br]whenever we define security we always

0:00:11.679,0:00:15.174
define it in terms of what can the[br]attacker do? And what is the attacker

0:00:15.174,0:00:18.670
trying to do? In the context of[br]stream ciphers remember these are only used

0:00:18.670,0:00:22.737
with a onetime key, and as a result the[br]most the attacker is ever going to see is

0:00:22.737,0:00:26.754
just one cypher text encrypted using the[br]key that we're using. And so we're gonna

0:00:26.754,0:00:30.772
limit the attackers? ability to basically[br]obtain just one cypher text. And in

0:00:30.772,0:00:34.641
fact later on we're going to allow the[br]attacker to do much, much, much more, but

0:00:34.641,0:00:38.460
for now we're just gonna give him one[br]cypher text. And we wanted to find,

0:00:38.460,0:00:42.560
what does it mean for the cypher to[br]be secure? So the first definition that

0:00:42.560,0:00:46.892
comes to mind is basically to say, well[br]maybe we wanna require that the attacker

0:00:46.892,0:00:50.718
can't actually recover the secret key.[br]Okay so given ciphertext you

0:00:50.718,0:00:54.609
shouldn't be able to recover the secret[br]key. But that's a terrible definition

0:00:54.609,0:00:58.717
because think about the falling brilliant[br]cipher but the way we encrypt the

0:00:58.717,0:01:02.855
message using a key K is basically[br]we just output the message. Okay this

0:01:02.855,0:01:07.427
is a brilliant cipher yeah of course it[br]doesn't do anything given a message just

0:01:07.427,0:01:12.000
output a message as the cipher text.[br]This is not a particularly good encryption

0:01:12.000,0:01:16.029
scheme; however, given the cipher text,[br]mainly the message the poor attacker

0:01:16.029,0:01:20.493
cannot recover the key because he doesn't[br]know what the key is. And so as a result

0:01:20.493,0:01:24.630
this cipher which clearly is insecure,[br]would be considered secure under this

0:01:24.793,0:01:28.636
requirement for security. So this[br]definition will be no good. Okay so it's

0:01:28.636,0:01:32.317
recovering the secret key is the wrong way[br]to define security. So the next thing we

0:01:32.317,0:01:35.999
can kinda attempt is basically just say,[br]well maybe the attacker doesn't care about

0:01:35.999,0:01:39.680
the secret key, what he really cares about[br]are the plain text. So maybe it should be

0:01:39.680,0:01:43.518
hard for the attacker to recover the[br]entire plain text. But even that doesn't

0:01:43.518,0:01:48.223
work because let's think about the[br]following encryption scheme. So suppose

0:01:48.223,0:01:53.436
what this encryption scheme does is it[br]takes two messages, so I'm gonna use two

0:01:53.436,0:01:58.014
lines to denote concatenation of two[br]messages M0 line, line M1 means

0:01:58.014,0:02:03.100
concatenate M0 and M1. And imagine[br]what the scheme does is really it outputs

0:02:03.100,0:02:08.060
M0 in the clear and concatenate to[br]that the encryption of M1. Perhaps even

0:02:08.060,0:02:13.337
using the One Time Pad okay? And[br]so here the attacker is gonna be given one

0:02:13.337,0:02:17.478
cipher text. And his goal would be to[br]recover the entire plain texts. But the

0:02:17.478,0:02:21.702
poor attacker can't do that because here[br]maybe we've encrypted M1 using the One

0:02:21.702,0:02:25.872
Time Pad so the attacker can't[br]actually recover M1 because we know the

0:02:25.872,0:02:30.043
One Time Pad is secure given just[br]one cipher text. So this construction

0:02:30.043,0:02:34.055
would satisfy the definition but[br]unfortunately clearly this is not a secure

0:02:34.055,0:02:37.962
encryption scheme because we just leaked[br]half of the plain text. M0 is

0:02:37.962,0:02:42.185
completely available to the attacker so[br]even though he can't recover all of the

0:02:42.185,0:02:46.462
plain text he might be able to recover[br]most of the plain text, and that's clearly

0:02:46.462,0:02:50.658
unsecured. So of course we already know[br]the solution to this and we talked about

0:02:50.658,0:02:54.747
Shanon definition of security perfect[br]secrecy where Shannon's idea was that in

0:02:54.747,0:02:58.835
fact when the attacker intercepts a cipher[br]text he should learn absolutely no

0:02:58.835,0:03:02.818
information about the plain texts. He[br]shouldn't even learn one bit about the

0:03:02.818,0:03:07.221
plain text or even he shouldn't learn, he[br]shouldn't even be able to predict a little

0:03:07.221,0:03:11.205
bit about a bid of the plain text.[br]Absolutely no information about the plain text.

0:03:11.205,0:03:14.926
So let's recall very briefly[br]Shannon's concept of perfect secrecy

0:03:14.926,0:03:19.442
basically we said that you know given a[br]cipher We said the cipher has perfect

0:03:19.442,0:03:25.069
secrecy if given two messages of the same[br]length it so happens that the distribution

0:03:25.069,0:03:30.167
of cipher texts. Yet if we pick a random[br]key and we look into distribution of

0:03:30.167,0:03:34.838
cipher texts we encrypt M0 we get[br]exactly the same distribution as when we

0:03:34.838,0:03:39.257
encrypt M1. The intuition here was that if[br]the advisory observes the cipher texts

0:03:39.257,0:03:43.839
then he doesn't know whether it came from[br]the distribution the result of encrypting

0:03:43.839,0:03:48.203
M0 or it came from the distribution as[br]the result of encrypting M1. And as a

0:03:48.203,0:03:52.513
result he can't tell whether we encrypted[br]M0 or M1. And that's true for all

0:03:52.513,0:03:56.877
messages of the same length and as a[br]result a poor attacker doesn't really know

0:03:56.877,0:04:01.212
what message was encrypted. Of course we[br]already said that this definition is too

0:04:01.212,0:04:05.400
strong in the sense that it requires[br]really long keys if cipher has short

0:04:05.400,0:04:09.535
keys can't possibly satisfy this[br]definition in a particular stream ciphers

0:04:09.535,0:04:14.328
can satisfy this definition. Okay so let's[br]try to weaken the definition a little bit

0:04:14.328,0:04:19.114
and let's think to the previous segment,[br]and we can say that instead of requiring

0:04:19.114,0:04:23.841
the two distributions to be absolutely[br]identical what we can require is, is that

0:04:23.841,0:04:28.686
two distributions just be computationally[br]indistinguishable. In other words a poor,

0:04:28.863,0:04:33.353
efficient attacker cannot distinguish the[br]two distributions even though the

0:04:33.353,0:04:37.815
distributions might be very, very, very[br]different. That just given a sample for

0:04:37.815,0:04:42.580
one distribution and a sample for another[br]distribution the attacker can't tell which

0:04:42.580,0:04:47.120
distribution he was given a sample from.[br]It turns out this definition is actually

0:04:47.120,0:04:51.716
almost right, but it's still a little too[br]strong, that still cannot be satisfied, so

0:04:51.716,0:04:56.200
we have to add one more constraint, and[br]that is that instead of saying that this

0:04:56.200,0:05:00.797
definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1

0:05:00.797,0:05:05.208
that the attackers could[br]actually exhibit. Okay so this actually

0:05:05.208,0:05:10.038
leads us to the definition of semantics[br]security. And so, again this is semantics

0:05:10.038,0:05:15.050
security for one time key in other words[br]when the attacker is only given one cipher

0:05:15.050,0:05:19.819
text. And so the way we define semantic[br]security is by defining two experiments,

0:05:19.819,0:05:24.562
okay we'll define experiment 0 and[br]experiment 1. And more generally we will

0:05:24.562,0:05:29.230
think of these as experiments parentheses[br]B, where B can be zero or one. Okay so the

0:05:29.230,0:05:32.890
way the experiment is defined is as[br]follows, we have an adversary that's

0:05:32.890,0:05:37.161
trying to break the system. An adversary[br]A, that's kinda the analog of statistical

0:05:37.161,0:05:41.279
tests in the world of pseudo random[br]generators. And then the challenger does

0:05:41.279,0:05:45.093
the following, so really we have two[br]challengers, but the challengers are so

0:05:45.093,0:05:49.414
similar that we can just describe them as[br]a single challenger that in one case takes

0:05:49.414,0:05:53.634
his inputs bits set to zero, and the other[br]case takes his inputs bits set to

0:05:53.634,0:05:57.193
one. And let me show you what these[br]challengers do. The first thing the

0:05:57.193,0:06:01.349
challenger is gonna do is it's gonna pick[br]a random key and then the adversary

0:06:01.349,0:06:06.076
basically is going to output two messages[br]M0 and M1. Okay so this is an explicit

0:06:06.076,0:06:11.039
pair of messages that the attacker wants[br]to be challenged on and as usual we're not

0:06:11.039,0:06:15.766
trying to hide the length of the messages,[br]we require that the messages be equal

0:06:15.766,0:06:21.643
length. And then the challenger basically[br]will output either the encryption of M0

0:06:21.643,0:06:25.890
or the encryption of M1, okay so in[br]experiment 0 the challenger will output

0:06:25.890,0:06:30.301
the encryption of M0. In experiment 1 the challenger will output the encryption

0:06:30.301,0:06:34.385
of M1. Okay so that the difference between[br]the two experiments. And then the

0:06:34.385,0:06:38.796
adversary is trying to guess basically[br]whether he was given the encryption of M0

0:06:38.796,0:06:44.051
or given the encryption of M1. Okay so[br]here's a little bit of notation let's

0:06:44.051,0:06:50.260
define the events Wb to be the events that[br]an experiment B the adversary output one.

0:06:50.260,0:06:55.084
Okay so that is just an event that[br]basically in experiment zero W0 means that

0:06:55.084,0:07:00.342
the adversary output one and in experiment[br]one W1 means the adversary output one. And

0:07:00.342,0:07:05.291
now we can define the advantage of this[br]adversary, basically to say that this is

0:07:05.291,0:07:10.425
called the semantics security advantage of[br]the adversary A against the scheme E,

0:07:10.425,0:07:15.497
to be the difference of the probability of[br]these two events. In other words we are

0:07:15.497,0:07:20.136
looking at whether the adversary behaves[br]differently when he was given the

0:07:20.136,0:07:24.818
encryption of M0 from when he was given[br]the encryption of M1. And I wanna make

0:07:24.818,0:07:29.201
sure this is clear so I'm gonna say it one[br]more time. So in experiment zero he was

0:07:29.201,0:07:33.530
given the encryption of M0 and in[br]experiment one he was given the encryption

0:07:33.530,0:07:37.700
of M1. Now we're just interested in[br]whether the adversary output 1 or not.

0:07:37.700,0:07:42.356
... In these experiments. If in both[br]experiments the adversary output 1 with

0:07:42.356,0:07:47.013
the same probability that means the[br]adversary wasn't able to distinguish the

0:07:47.013,0:07:51.549
two experiments. Experiments zero[br]basically looks to the adversary the same

0:07:51.549,0:07:56.206
as experiment one because in both cases[br]upload one with the same probability.

0:07:56.206,0:08:01.286
However if the adversary is able to output[br]1 in one experiment with significantly

0:08:01.286,0:08:05.761
different probability than in the other[br]experiment, then the adversary was

0:08:05.761,0:08:10.266
actually able to distinguish the two[br]experiments. Okay so... To say this more

0:08:10.266,0:08:14.455
formally, essentially the advantage again[br]because it's the difference of two

0:08:14.455,0:08:18.918
probabilities the advantage is a number[br]between zero and one. If the advantage is

0:08:18.918,0:08:22.886
close to zero that means that the[br]adversary was not able to distinguish

0:08:22.886,0:08:27.129
experiment zero from experiment one.[br]However if the advantage is close to one,

0:08:27.129,0:08:31.538
that means the adversary was very well[br]able to distinguish experiment zero from

0:08:31.538,0:08:36.112
experiment one and that really means that[br]he was able to distinguish an encryption

0:08:36.112,0:08:40.299
of M0 from an encryption of M1, okay?So that's out definition. Actually

0:08:40.299,0:08:44.055
that is just the definition of the[br]advantage and the definition is just what

0:08:44.055,0:08:47.714
you would expect basically we'll say that[br]a symmetric encryption scheme is

0:08:47.714,0:08:52.346
semantically secure if for all efficient[br]adversaries here I'll put these in quotes

0:08:52.346,0:08:56.932
again, "For all efficient adversaries the[br]advantage is negligible." In other words,

0:08:56.982,0:09:01.808
no efficient adversary can distinguish the[br]encryption of M0 from the encryption

0:09:01.808,0:09:06.103
of M1 and basically this is what it[br]says repeatedly that for these two

0:09:06.103,0:09:10.759
messages that the adversary was able to[br]exhibit he wasn't able to distinguish

0:09:10.759,0:09:15.064
these two distributions. Now I want to[br]show you this, this is actually a very

0:09:15.064,0:09:19.595
elegant definition. It might not seem so[br]right away, but I want to show you some

0:09:19.595,0:09:24.410
implications of this definition and you'll[br]see exactly why the definition is the way

0:09:24.410,0:09:28.601
it is. Okay so let's look at some[br]examples. So the first example is suppose

0:09:28.601,0:09:33.190
we have a broken encryption scheme, in[br]other words suppose we have an adversary A

0:09:33.190,0:09:38.285
that somehow given the cipher text he is[br]always able to deduce the least

0:09:38.285,0:09:44.149
significant bit of the plain text. Okay so[br]given the encryption of M0, this adversary

0:09:44.149,0:09:48.799
is able to deduce the least significant[br]bit of M0. So that is a terrible

0:09:48.799,0:09:52.911
encryption scheme because it basically[br]leaks the least significant bit of the

0:09:52.911,0:09:57.128
plain text just given the cipher text. So[br]I want to show you that this scheme is

0:09:57.128,0:10:01.609
therefore semantically secure so that kind[br]of shows that if a system is semantically

0:10:01.609,0:10:05.931
secure than there is no attacker of this[br]type. Okay so let's see why is the system

0:10:05.931,0:10:10.254
not semantically secure well so what we[br]are gonna do is we're gonna basically use

0:10:10.254,0:10:14.366
our adversary who is able to learn our[br]most insignificant bits, we're going to

0:10:14.366,0:10:18.372
use him to break semantic security so[br]we're going to use him to distinguish

0:10:18.372,0:10:22.755
experiment zero from experiment one Okay[br]so here is what we are going to do. We are

0:10:22.755,0:10:26.987
algorithm B, we are going to be algorithm[br]B and this algorithm B is going to use

0:10:26.987,0:10:31.165
algorithm A in its belly. Okay so the[br]first thing that is going to happen is of

0:10:31.165,0:10:35.608
course the challenger is going to choose a[br]random key. The first thing that is going

0:10:35.608,0:10:39.762
to happen is that we need to output two[br]messages. The messages that we are going

0:10:39.762,0:10:43.493
to output basically are going to have[br]differently significant bits. So one

0:10:43.493,0:10:47.727
message is going to end with zero and one[br]message is going to end with one. Now what

0:10:47.727,0:10:51.205
is the challenger going to do? The[br]challenger is going to give us the

0:10:51.205,0:10:55.238
encryption of either M0 or M1,[br]depending on whether in experiment 0 or

0:10:55.238,0:10:59.120
in experiment 1. And then we just[br]forward this cipher text to the adversary

0:10:59.120,0:11:03.871
okay so the adversary A. Now what is the[br]property of adversary A? Given the cipher

0:11:03.871,0:11:08.505
text, adversary A can tell us what the[br]least significant bits of the plain text is.

0:11:08.505,0:11:13.374
In other words the adversary is going[br]to output the least significant bits of M0 or M1

0:11:13.374,0:11:17.892
but low and behold that's[br]basically the bit B. And then we're just

0:11:17.892,0:11:23.050
going to output that as our guest so let?s[br]call this thing B prime Okay so now this

0:11:23.050,0:11:28.376
describes the semantic security adversary.[br]And now you tell me what is the semantic

0:11:28.376,0:11:33.593
security advantage of this adversary? Well[br]so let's see. In experiment zero, what is

0:11:33.593,0:11:38.775
the probability that adversary B outputs[br]one? Well in experiment zero it is always

0:11:38.775,0:11:43.704
given the encryption of M zero and[br]therefore adversary A is always output the

0:11:43.704,0:11:48.633
least significant bit of M zero which[br]always happens to be zero. In experiment

0:11:48.633,0:11:53.120
zero, B always outputs zero. So the[br]probability that outputs one is zero.

0:11:53.120,0:11:57.827
However in experiment one, we're given the[br]encryption of M1. So how likely is

0:11:57.827,0:12:02.783
adversary B to output one in experiment[br]one well it always outputs one, again by

0:12:02.783,0:12:07.428
the properties of algorithm A and[br]therefore the advantage basically is one.

0:12:07.428,0:12:12.384
So this is a huge advantage, it's as big[br]as it's gonna get. Which means that this

0:12:12.384,0:12:17.091
adversary completely broke the system.[br]Okay so we consider, so under semantic

0:12:17.091,0:12:22.295
security basically just deducing the least[br]significant bits is enough to completely

0:12:22.295,0:12:27.187
break the system under a semantic security[br]definition. Okay, now the interesting

0:12:27.187,0:12:32.388
thing here of course is that this is not[br]just about the least significant bit, in

0:12:32.388,0:12:37.117
fact of the message for[br]example the most significant bit, you know

0:12:37.117,0:12:42.040
bit number seven Maybe the XOR of all the bits in the message and so on

0:12:42.040,0:12:46.552
and so forth any kind of information, any[br]bit about the plain text they can be

0:12:46.552,0:12:50.814
learned basically would mean that the[br]system is not semantically secure. So

0:12:50.814,0:12:55.532
basically all the adversary would have to[br]do would be to come up with two messages

0:12:55.532,0:13:00.249
M0 and M1 such that under one thing that[br]they learned it's the value zero and then

0:13:00.249,0:13:04.626
the other thing, the value, is one. So for[br]example if A was able to learn the XOR

0:13:04.626,0:13:08.775
bits of the message then M0[br]and M1 would just have different

0:13:08.775,0:13:13.265
XOR for all the bits of their[br]messages and then this adversary A would

0:13:13.265,0:13:18.174
also be sufficient to break semantic[br]security. Okay so, basically for cipher

0:13:18.174,0:13:23.203
is semantically secure then no[br]bit of information is revealed to an

0:13:23.203,0:13:27.392
efficient adversary. Okay so this is[br]exactly a concept of perfect secrecy only

0:13:27.392,0:13:31.318
applied just efficient adversaries[br]rather than all adversaries. So the next

0:13:31.318,0:13:35.045
thing I wanna show you is that in fact the[br]one time pad in fact is

0:13:35.045,0:13:38.821
semantically secure, they better be[br]semantically secure because it's in fact,

0:13:38.821,0:13:42.773
it's more than that it's perfectly secure.[br]So let's see why that's true. Well so

0:13:42.773,0:13:47.010
again it's one of these experiments, so[br]suppose we have an adversary that claims

0:13:47.010,0:13:51.449
to break semantic security of the one time[br]pad. The first the adversary is gonna do

0:13:51.449,0:13:55.874
is he's gonna output two messages M0[br]and M1 of the same length.

0:13:55.874,0:13:59.667
Now what does he get back as a challenge? As a[br]challenge, he gets either the encryption

0:13:59.667,0:14:03.988
of M0 or the encryption of M1 under[br]the one time pad.

0:14:03.988,0:14:07.886
And he's trying to distinguish between those two possible[br]cipher texts that he gets, right?

0:14:07.886,0:14:12.259
In experiment zero, he gets the encryption of[br]M0 and in experiment one, he gets the

0:14:12.259,0:14:16.579
encryption of M1. Well so let me ask[br]you, so what is the advantage of adversary

0:14:16.579,0:14:21.297
A against the one time patent? So I[br]remember that the property of the ones I

0:14:21.297,0:14:26.208
had is that, that K, XOR M0 is[br]distributed identically to K, XOR M1.

0:14:26.208,0:14:31.187
In other words, these distributions are[br]absolutely identical distribution,

0:14:31.187,0:14:36.026
distributions, identical. This is a[br]property of XOR. If we XOR the random pad

0:14:36.026,0:14:40.674
K with anything, either M0 or M1,[br]we get uniform distribution. So in both

0:14:40.674,0:14:45.382
cases, algorithm A is given as in input[br]exactly the same distribution. Maybe the

0:14:45.382,0:14:50.209
uniform distribution on cipher texts. And[br]therefore it's gonna behave exactly the

0:14:50.209,0:14:55.036
same in both cases because it was given[br]the exact distribution as input. And as a

0:14:55.036,0:14:59.699
result, its advantage is zero, which means[br]that the onetime pad is semantically

0:14:59.723,0:15:04.148
secure. Now the interesting thing here is[br]not only is it semantically secure, it's

0:15:04.148,0:15:08.244
semantically secure for all adversaries.[br]We don't even have to restrict the

0:15:08.244,0:15:12.450
adversaries to be efficient. No adversary,[br]doesn't matter how smart you are, no

0:15:12.450,0:15:16.875
adversary will be able to distinguish K XOR M0 from K XOR M1 because the two

0:15:16.875,0:15:21.299
distributions are identical. Okay, so the[br]one time pad is semantically secure. Okay,

0:15:21.299,0:15:25.559
so that completes our definition of[br]semantic security so the next thing we're

0:15:25.559,0:15:30.093
going to do is prove to the secure PRG,[br]in fact implies that the stream cipher is

0:15:30.093,0:15:31.186
semantically secure.