1
00:00:00,000 --> 00:00:03,911
My goal for the next few segments is to
show you that if we use a secure PRG We'll

2
00:00:03,911 --> 00:00:07,892
get a secure stream safer. The first thing
we have to do is define is, what does it

3
00:00:07,892 --> 00:00:11,679
mean for a stream safer to be secure? So
whenever we define security we always

4
00:00:11,679 --> 00:00:15,174
define it in terms of what can the
attacker do? And what is the attacker

5
00:00:15,174 --> 00:00:18,670
trying to do? In the context of
stream ciphers remember these are only used

6
00:00:18,670 --> 00:00:22,737
with a onetime key, and as a result the
most the attacker is ever going to see is

7
00:00:22,737 --> 00:00:26,754
just one cypher text encrypted using the
key that we're using. And so we're gonna

8
00:00:26,754 --> 00:00:30,772
limit the attackers? ability to basically
obtain just one cypher text. And in

9
00:00:30,772 --> 00:00:34,641
fact later on we're going to allow the
attacker to do much, much, much more, but

10
00:00:34,641 --> 00:00:38,460
for now we're just gonna give him one
cypher text. And we wanted to find,

11
00:00:38,460 --> 00:00:42,560
what does it mean for the cypher to
be secure? So the first definition that

12
00:00:42,560 --> 00:00:46,892
comes to mind is basically to say, well
maybe we wanna require that the attacker

13
00:00:46,892 --> 00:00:50,718
can't actually recover the secret key.
Okay so given ciphertext you

14
00:00:50,718 --> 00:00:54,609
shouldn't be able to recover the secret
key. But that's a terrible definition

15
00:00:54,609 --> 00:00:58,717
because think about the falling brilliant
cipher but the way we encrypt the

16
00:00:58,717 --> 00:01:02,855
message using a key K is basically
we just output the message. Okay this

17
00:01:02,855 --> 00:01:07,427
is a brilliant cipher yeah of course it
doesn't do anything given a message just

18
00:01:07,427 --> 00:01:12,000
output a message as the cipher text.
This is not a particularly good encryption

19
00:01:12,000 --> 00:01:16,029
scheme; however, given the cipher text,
mainly the message the poor attacker

20
00:01:16,029 --> 00:01:20,493
cannot recover the key because he doesn't
know what the key is. And so as a result

21
00:01:20,493 --> 00:01:24,630
this cipher which clearly is insecure,
would be considered secure under this

22
00:01:24,793 --> 00:01:28,636
requirement for security. So this
definition will be no good. Okay so it's

23
00:01:28,636 --> 00:01:32,317
recovering the secret key is the wrong way
to define security. So the next thing we

24
00:01:32,317 --> 00:01:35,999
can kinda attempt is basically just say,
well maybe the attacker doesn't care about

25
00:01:35,999 --> 00:01:39,680
the secret key, what he really cares about
are the plain text. So maybe it should be

26
00:01:39,680 --> 00:01:43,518
hard for the attacker to recover the
entire plain text. But even that doesn't

27
00:01:43,518 --> 00:01:48,223
work because let's think about the
following encryption scheme. So suppose

28
00:01:48,223 --> 00:01:53,436
what this encryption scheme does is it
takes two messages, so I'm gonna use two

29
00:01:53,436 --> 00:01:58,014
lines to denote concatenation of two
messages M0 line, line M1 means

30
00:01:58,014 --> 00:02:03,100
concatenate M0 and M1. And imagine
what the scheme does is really it outputs

31
00:02:03,100 --> 00:02:08,060
M0 in the clear and concatenate to
that the encryption of M1. Perhaps even

32
00:02:08,060 --> 00:02:13,337
using the One Time Pad okay? And
so here the attacker is gonna be given one

33
00:02:13,337 --> 00:02:17,478
cipher text. And his goal would be to
recover the entire plain texts. But the

34
00:02:17,478 --> 00:02:21,702
poor attacker can't do that because here
maybe we've encrypted M1 using the One

35
00:02:21,702 --> 00:02:25,872
Time Pad so the attacker can't
actually recover M1 because we know the

36
00:02:25,872 --> 00:02:30,043
One Time Pad is secure given just
one cipher text. So this construction

37
00:02:30,043 --> 00:02:34,055
would satisfy the definition but
unfortunately clearly this is not a secure

38
00:02:34,055 --> 00:02:37,962
encryption scheme because we just leaked
half of the plain text. M0 is

39
00:02:37,962 --> 00:02:42,185
completely available to the attacker so
even though he can't recover all of the

40
00:02:42,185 --> 00:02:46,462
plain text he might be able to recover
most of the plain text, and that's clearly

41
00:02:46,462 --> 00:02:50,658
unsecured. So of course we already know
the solution to this and we talked about

42
00:02:50,658 --> 00:02:54,747
Shanon definition of security perfect
secrecy where Shannon's idea was that in

43
00:02:54,747 --> 00:02:58,835
fact when the attacker intercepts a cipher
text he should learn absolutely no

44
00:02:58,835 --> 00:03:02,818
information about the plain texts. He
shouldn't even learn one bit about the

45
00:03:02,818 --> 00:03:07,221
plain text or even he shouldn't learn, he
shouldn't even be able to predict a little

46
00:03:07,221 --> 00:03:11,205
bit about a bid of the plain text.
Absolutely no information about the plain text.

47
00:03:11,205 --> 00:03:14,926
So let's recall very briefly
Shannon's concept of perfect secrecy

48
00:03:14,926 --> 00:03:19,442
basically we said that you know given a
cipher We said the cipher has perfect

49
00:03:19,442 --> 00:03:25,069
secrecy if given two messages of the same
length it so happens that the distribution

50
00:03:25,069 --> 00:03:30,167
of cipher texts. Yet if we pick a random
key and we look into distribution of

51
00:03:30,167 --> 00:03:34,838
cipher texts we encrypt M0 we get
exactly the same distribution as when we

52
00:03:34,838 --> 00:03:39,257
encrypt M1. The intuition here was that if
the advisory observes the cipher texts

53
00:03:39,257 --> 00:03:43,839
then he doesn't know whether it came from
the distribution the result of encrypting

54
00:03:43,839 --> 00:03:48,203
M0 or it came from the distribution as
the result of encrypting M1. And as a

55
00:03:48,203 --> 00:03:52,513
result he can't tell whether we encrypted
M0 or M1. And that's true for all

56
00:03:52,513 --> 00:03:56,877
messages of the same length and as a
result a poor attacker doesn't really know

57
00:03:56,877 --> 00:04:01,212
what message was encrypted. Of course we
already said that this definition is too

58
00:04:01,212 --> 00:04:05,400
strong in the sense that it requires
really long keys if cipher has short

59
00:04:05,400 --> 00:04:09,535
keys can't possibly satisfy this
definition in a particular stream ciphers

60
00:04:09,535 --> 00:04:14,328
can satisfy this definition. Okay so let's
try to weaken the definition a little bit

61
00:04:14,328 --> 00:04:19,114
and let's think to the previous segment,
and we can say that instead of requiring

62
00:04:19,114 --> 00:04:23,841
the two distributions to be absolutely
identical what we can require is, is that

63
00:04:23,841 --> 00:04:28,686
two distributions just be computationally
indistinguishable. In other words a poor,

64
00:04:28,863 --> 00:04:33,353
efficient attacker cannot distinguish the
two distributions even though the

65
00:04:33,353 --> 00:04:37,815
distributions might be very, very, very
different. That just given a sample for

66
00:04:37,815 --> 00:04:42,580
one distribution and a sample for another
distribution the attacker can't tell which

67
00:04:42,580 --> 00:04:47,120
distribution he was given a sample from.
It turns out this definition is actually

68
00:04:47,120 --> 00:04:51,716
almost right, but it's still a little too
strong, that still cannot be satisfied, so

69
00:04:51,716 --> 00:04:56,200
we have to add one more constraint, and
that is that instead of saying that this

70
00:04:56,200 --> 00:05:00,797
definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1

71
00:05:00,797 --> 00:05:05,208
that the attackers could
actually exhibit. Okay so this actually

72
00:05:05,208 --> 00:05:10,038
leads us to the definition of semantics
security. And so, again this is semantics

73
00:05:10,038 --> 00:05:15,050
security for one time key in other words
when the attacker is only given one cipher

74
00:05:15,050 --> 00:05:19,819
text. And so the way we define semantic
security is by defining two experiments,

75
00:05:19,819 --> 00:05:24,562
okay we'll define experiment 0 and
experiment 1. And more generally we will

76
00:05:24,562 --> 00:05:29,230
think of these as experiments parentheses
B, where B can be zero or one. Okay so the

77
00:05:29,230 --> 00:05:32,890
way the experiment is defined is as
follows, we have an adversary that's

78
00:05:32,890 --> 00:05:37,161
trying to break the system. An adversary
A, that's kinda the analog of statistical

79
00:05:37,161 --> 00:05:41,279
tests in the world of pseudo random
generators. And then the challenger does

80
00:05:41,279 --> 00:05:45,093
the following, so really we have two
challengers, but the challengers are so

81
00:05:45,093 --> 00:05:49,414
similar that we can just describe them as
a single challenger that in one case takes

82
00:05:49,414 --> 00:05:53,634
his inputs bits set to zero, and the other
case takes his inputs bits set to

83
00:05:53,634 --> 00:05:57,193
one. And let me show you what these
challengers do. The first thing the

84
00:05:57,193 --> 00:06:01,349
challenger is gonna do is it's gonna pick
a random key and then the adversary

85
00:06:01,349 --> 00:06:06,076
basically is going to output two messages
M0 and M1. Okay so this is an explicit

86
00:06:06,076 --> 00:06:11,039
pair of messages that the attacker wants
to be challenged on and as usual we're not

87
00:06:11,039 --> 00:06:15,766
trying to hide the length of the messages,
we require that the messages be equal

88
00:06:15,766 --> 00:06:21,643
length. And then the challenger basically
will output either the encryption of M0

89
00:06:21,643 --> 00:06:25,890
or the encryption of M1, okay so in
experiment 0 the challenger will output

90
00:06:25,890 --> 00:06:30,301
the encryption of M0. In experiment 1 the challenger will output the encryption

91
00:06:30,301 --> 00:06:34,385
of M1. Okay so that the difference between
the two experiments. And then the

92
00:06:34,385 --> 00:06:38,796
adversary is trying to guess basically
whether he was given the encryption of M0

93
00:06:38,796 --> 00:06:44,051
or given the encryption of M1. Okay so
here's a little bit of notation let's

94
00:06:44,051 --> 00:06:50,260
define the events Wb to be the events that
an experiment B the adversary output one.

95
00:06:50,260 --> 00:06:55,084
Okay so that is just an event that
basically in experiment zero W0 means that

96
00:06:55,084 --> 00:07:00,342
the adversary output one and in experiment
one W1 means the adversary output one. And

97
00:07:00,342 --> 00:07:05,291
now we can define the advantage of this
adversary, basically to say that this is

98
00:07:05,291 --> 00:07:10,425
called the semantics security advantage of
the adversary A against the scheme E,

99
00:07:10,425 --> 00:07:15,497
to be the difference of the probability of
these two events. In other words we are

100
00:07:15,497 --> 00:07:20,136
looking at whether the adversary behaves
differently when he was given the

101
00:07:20,136 --> 00:07:24,818
encryption of M0 from when he was given
the encryption of M1. And I wanna make

102
00:07:24,818 --> 00:07:29,201
sure this is clear so I'm gonna say it one
more time. So in experiment zero he was

103
00:07:29,201 --> 00:07:33,530
given the encryption of M0 and in
experiment one he was given the encryption

104
00:07:33,530 --> 00:07:37,700
of M1. Now we're just interested in
whether the adversary output 1 or not.

105
00:07:37,700 --> 00:07:42,356
... In these experiments. If in both
experiments the adversary output 1 with

106
00:07:42,356 --> 00:07:47,013
the same probability that means the
adversary wasn't able to distinguish the

107
00:07:47,013 --> 00:07:51,549
two experiments. Experiments zero
basically looks to the adversary the same

108
00:07:51,549 --> 00:07:56,206
as experiment one because in both cases
upload one with the same probability.

109
00:07:56,206 --> 00:08:01,286
However if the adversary is able to output
1 in one experiment with significantly

110
00:08:01,286 --> 00:08:05,761
different probability than in the other
experiment, then the adversary was

111
00:08:05,761 --> 00:08:10,266
actually able to distinguish the two
experiments. Okay so... To say this more

112
00:08:10,266 --> 00:08:14,455
formally, essentially the advantage again
because it's the difference of two

113
00:08:14,455 --> 00:08:18,918
probabilities the advantage is a number
between zero and one. If the advantage is

114
00:08:18,918 --> 00:08:22,886
close to zero that means that the
adversary was not able to distinguish

115
00:08:22,886 --> 00:08:27,129
experiment zero from experiment one.
However if the advantage is close to one,

116
00:08:27,129 --> 00:08:31,538
that means the adversary was very well
able to distinguish experiment zero from

117
00:08:31,538 --> 00:08:36,112
experiment one and that really means that
he was able to distinguish an encryption

118
00:08:36,112 --> 00:08:40,299
of M0 from an encryption of M1, okay?So that's out definition. Actually

119
00:08:40,299 --> 00:08:44,055
that is just the definition of the
advantage and the definition is just what

120
00:08:44,055 --> 00:08:47,714
you would expect basically we'll say that
a symmetric encryption scheme is

121
00:08:47,714 --> 00:08:52,346
semantically secure if for all efficient
adversaries here I'll put these in quotes

122
00:08:52,346 --> 00:08:56,932
again, "For all efficient adversaries the
advantage is negligible." In other words,

123
00:08:56,982 --> 00:09:01,808
no efficient adversary can distinguish the
encryption of M0 from the encryption

124
00:09:01,808 --> 00:09:06,103
of M1 and basically this is what it
says repeatedly that for these two

125
00:09:06,103 --> 00:09:10,759
messages that the adversary was able to
exhibit he wasn't able to distinguish

126
00:09:10,759 --> 00:09:15,064
these two distributions. Now I want to
show you this, this is actually a very

127
00:09:15,064 --> 00:09:19,595
elegant definition. It might not seem so
right away, but I want to show you some

128
00:09:19,595 --> 00:09:24,410
implications of this definition and you'll
see exactly why the definition is the way

129
00:09:24,410 --> 00:09:28,601
it is. Okay so let's look at some
examples. So the first example is suppose

130
00:09:28,601 --> 00:09:33,190
we have a broken encryption scheme, in
other words suppose we have an adversary A

131
00:09:33,190 --> 00:09:38,285
that somehow given the cipher text he is
always able to deduce the least

132
00:09:38,285 --> 00:09:44,149
significant bit of the plain text. Okay so
given the encryption of M0, this adversary

133
00:09:44,149 --> 00:09:48,799
is able to deduce the least significant
bit of M0. So that is a terrible

134
00:09:48,799 --> 00:09:52,911
encryption scheme because it basically
leaks the least significant bit of the

135
00:09:52,911 --> 00:09:57,128
plain text just given the cipher text. So
I want to show you that this scheme is

136
00:09:57,128 --> 00:10:01,609
therefore semantically secure so that kind
of shows that if a system is semantically

137
00:10:01,609 --> 00:10:05,931
secure than there is no attacker of this
type. Okay so let's see why is the system

138
00:10:05,931 --> 00:10:10,254
not semantically secure well so what we
are gonna do is we're gonna basically use

139
00:10:10,254 --> 00:10:14,366
our adversary who is able to learn our
most insignificant bits, we're going to

140
00:10:14,366 --> 00:10:18,372
use him to break semantic security so
we're going to use him to distinguish

141
00:10:18,372 --> 00:10:22,755
experiment zero from experiment one Okay
so here is what we are going to do. We are

142
00:10:22,755 --> 00:10:26,987
algorithm B, we are going to be algorithm
B and this algorithm B is going to use

143
00:10:26,987 --> 00:10:31,165
algorithm A in its belly. Okay so the
first thing that is going to happen is of

144
00:10:31,165 --> 00:10:35,608
course the challenger is going to choose a
random key. The first thing that is going

145
00:10:35,608 --> 00:10:39,762
to happen is that we need to output two
messages. The messages that we are going

146
00:10:39,762 --> 00:10:43,493
to output basically are going to have
differently significant bits. So one

147
00:10:43,493 --> 00:10:47,727
message is going to end with zero and one
message is going to end with one. Now what

148
00:10:47,727 --> 00:10:51,205
is the challenger going to do? The
challenger is going to give us the

149
00:10:51,205 --> 00:10:55,238
encryption of either M0 or M1,
depending on whether in experiment 0 or

150
00:10:55,238 --> 00:10:59,120
in experiment 1. And then we just
forward this cipher text to the adversary

151
00:10:59,120 --> 00:11:03,871
okay so the adversary A. Now what is the
property of adversary A? Given the cipher

152
00:11:03,871 --> 00:11:08,505
text, adversary A can tell us what the
least significant bits of the plain text is.

153
00:11:08,505 --> 00:11:13,374
In other words the adversary is going
to output the least significant bits of M0 or M1

154
00:11:13,374 --> 00:11:17,892
but low and behold that's
basically the bit B. And then we're just

155
00:11:17,892 --> 00:11:23,050
going to output that as our guest so let?s
call this thing B prime Okay so now this

156
00:11:23,050 --> 00:11:28,376
describes the semantic security adversary.
And now you tell me what is the semantic

157
00:11:28,376 --> 00:11:33,593
security advantage of this adversary? Well
so let's see. In experiment zero, what is

158
00:11:33,593 --> 00:11:38,775
the probability that adversary B outputs
one? Well in experiment zero it is always

159
00:11:38,775 --> 00:11:43,704
given the encryption of M zero and
therefore adversary A is always output the

160
00:11:43,704 --> 00:11:48,633
least significant bit of M zero which
always happens to be zero. In experiment

161
00:11:48,633 --> 00:11:53,120
zero, B always outputs zero. So the
probability that outputs one is zero.

162
00:11:53,120 --> 00:11:57,827
However in experiment one, we're given the
encryption of M1. So how likely is

163
00:11:57,827 --> 00:12:02,783
adversary B to output one in experiment
one well it always outputs one, again by

164
00:12:02,783 --> 00:12:07,428
the properties of algorithm A and
therefore the advantage basically is one.

165
00:12:07,428 --> 00:12:12,384
So this is a huge advantage, it's as big
as it's gonna get. Which means that this

166
00:12:12,384 --> 00:12:17,091
adversary completely broke the system.
Okay so we consider, so under semantic

167
00:12:17,091 --> 00:12:22,295
security basically just deducing the least
significant bits is enough to completely

168
00:12:22,295 --> 00:12:27,187
break the system under a semantic security
definition. Okay, now the interesting

169
00:12:27,187 --> 00:12:32,388
thing here of course is that this is not
just about the least significant bit, in

170
00:12:32,388 --> 00:12:37,117
fact of the message for
example the most significant bit, you know

171
00:12:37,117 --> 00:12:42,040
bit number seven Maybe the XOR of all the bits in the message and so on

172
00:12:42,040 --> 00:12:46,552
and so forth any kind of information, any
bit about the plain text they can be

173
00:12:46,552 --> 00:12:50,814
learned basically would mean that the
system is not semantically secure. So

174
00:12:50,814 --> 00:12:55,532
basically all the adversary would have to
do would be to come up with two messages

175
00:12:55,532 --> 00:13:00,249
M0 and M1 such that under one thing that
they learned it's the value zero and then

176
00:13:00,249 --> 00:13:04,626
the other thing, the value, is one. So for
example if A was able to learn the XOR

177
00:13:04,626 --> 00:13:08,775
bits of the message then M0
and M1 would just have different

178
00:13:08,775 --> 00:13:13,265
XOR for all the bits of their
messages and then this adversary A would

179
00:13:13,265 --> 00:13:18,174
also be sufficient to break semantic
security. Okay so, basically for cipher

180
00:13:18,174 --> 00:13:23,203
is semantically secure then no
bit of information is revealed to an

181
00:13:23,203 --> 00:13:27,392
efficient adversary. Okay so this is
exactly a concept of perfect secrecy only

182
00:13:27,392 --> 00:13:31,318
applied just efficient adversaries
rather than all adversaries. So the next

183
00:13:31,318 --> 00:13:35,045
thing I wanna show you is that in fact the
one time pad in fact is

184
00:13:35,045 --> 00:13:38,821
semantically secure, they better be
semantically secure because it's in fact,

185
00:13:38,821 --> 00:13:42,773
it's more than that it's perfectly secure.
So let's see why that's true. Well so

186
00:13:42,773 --> 00:13:47,010
again it's one of these experiments, so
suppose we have an adversary that claims

187
00:13:47,010 --> 00:13:51,449
to break semantic security of the one time
pad. The first the adversary is gonna do

188
00:13:51,449 --> 00:13:55,874
is he's gonna output two messages M0
and M1 of the same length.

189
00:13:55,874 --> 00:13:59,667
Now what does he get back as a challenge? As a
challenge, he gets either the encryption

190
00:13:59,667 --> 00:14:03,988
of M0 or the encryption of M1 under
the one time pad.

191
00:14:03,988 --> 00:14:07,886
And he's trying to distinguish between those two possible
cipher texts that he gets, right?

192
00:14:07,886 --> 00:14:12,259
In experiment zero, he gets the encryption of
M0 and in experiment one, he gets the

193
00:14:12,259 --> 00:14:16,579
encryption of M1. Well so let me ask
you, so what is the advantage of adversary

194
00:14:16,579 --> 00:14:21,297
A against the one time patent? So I
remember that the property of the ones I

195
00:14:21,297 --> 00:14:26,208
had is that, that K, XOR M0 is
distributed identically to K, XOR M1.

196
00:14:26,208 --> 00:14:31,187
In other words, these distributions are
absolutely identical distribution,

197
00:14:31,187 --> 00:14:36,026
distributions, identical. This is a
property of XOR. If we XOR the random pad

198
00:14:36,026 --> 00:14:40,674
K with anything, either M0 or M1,
we get uniform distribution. So in both

199
00:14:40,674 --> 00:14:45,382
cases, algorithm A is given as in input
exactly the same distribution. Maybe the

200
00:14:45,382 --> 00:14:50,209
uniform distribution on cipher texts. And
therefore it's gonna behave exactly the

201
00:14:50,209 --> 00:14:55,036
same in both cases because it was given
the exact distribution as input. And as a

202
00:14:55,036 --> 00:14:59,699
result, its advantage is zero, which means
that the onetime pad is semantically

203
00:14:59,723 --> 00:15:04,148
secure. Now the interesting thing here is
not only is it semantically secure, it's

204
00:15:04,148 --> 00:15:08,244
semantically secure for all adversaries.
We don't even have to restrict the

205
00:15:08,244 --> 00:15:12,450
adversaries to be efficient. No adversary,
doesn't matter how smart you are, no

206
00:15:12,450 --> 00:15:16,875
adversary will be able to distinguish K XOR M0 from K XOR M1 because the two

207
00:15:16,875 --> 00:15:21,299
distributions are identical. Okay, so the
one time pad is semantically secure. Okay,

208
00:15:21,299 --> 00:15:25,559
so that completes our definition of
semantic security so the next thing we're

209
00:15:25,559 --> 00:15:30,093
going to do is prove to the secure PRG,
in fact implies that the stream cipher is

210
00:15:30,093 --> 00:15:31,186
semantically secure.