1
00:00:00,000 --> 00:00:03,583
Last week, we learned a number theory
that's needed for public key encryption.

2
00:00:03,583 --> 00:00:07,166
This week we're gonna put this knowledge
to work, and we're gonna construct a

3
00:00:07,166 --> 00:00:10,889
number of secure public key encryption
schemes. But first, we need to define what

4
00:00:10,889 --> 00:00:14,565
is public key encryption, and what does it
mean for public key encryption to be

5
00:00:14,565 --> 00:00:18,241
secure? So let me remind you that in a
public key encryption scheme, there is an

6
00:00:18,241 --> 00:00:21,778
encryption algorithm which is usually
denoted by E, and there's a decryption

7
00:00:21,778 --> 00:00:25,361
algorithm which we denote by D. However
here, the encryption algorithm takes a

8
00:00:25,361 --> 00:00:29,477
public key, while the decryption algorithm
takes a secret key. This pair is called a

9
00:00:29,477 --> 00:00:34,356
key pair. And the public key is used for
encrypting messages while the secret key

10
00:00:34,356 --> 00:00:39,002
is used for decrypting messages. So in
this case a message m is encrypting using

11
00:00:39,002 --> 00:00:43,880
the public key and what comes out of that
is the ciphertext c. And similarly the

12
00:00:43,880 --> 00:00:48,643
ciphertext is fed into the decryption
algorithm and using the secret key, what

13
00:00:48,643 --> 00:00:53,577
comes out of the decryption algorithm is
the original message m. Now public key

14
00:00:53,577 --> 00:00:57,989
encryption has many applications. Last
week we saw the classic application which

15
00:00:57,989 --> 00:01:02,455
is session setup, namely, key exchange and
for now we're just looking at key exchange

16
00:01:02,455 --> 00:01:06,867
that is secure against eavesdropping only.
And if you remember the way the protocol

17
00:01:06,867 --> 00:01:11,227
works, basically Alice, what she would do
is she would generate a public key secret

18
00:01:11,227 --> 00:01:15,546
pair. She would send the public key to
Bob. Bob will generate a random X, which

19
00:01:15,546 --> 00:01:20,136
is gonna serve as their shared secret, and
then he sends X encrypted to Alice,

20
00:01:20,136 --> 00:01:24,904
encrypted under her public key. Alice can
decrypt, recover X and now both of them

21
00:01:24,904 --> 00:01:29,554
have this shared secret X which they can
use to communicate securely with one

22
00:01:29,554 --> 00:01:34,143
another. The attacker, of course, all he
gets to see is just the public key, the

23
00:01:34,143 --> 00:01:38,972
encryption of X under the public key, from
which he should not be able to get any

24
00:01:38,972 --> 00:01:43,800
information about X. And we are going to
define that more precisely to understand

25
00:01:43,800 --> 00:01:48,507
what it means to not be able to learn
anything about X. Public key encryption

26
00:01:48,507 --> 00:01:52,522
actually has many other applications. For
example, it's very useful in

27
00:01:52,522 --> 00:01:57,235
non-interactive applications. So think of
an email system for example. So here, Bob

28
00:01:57,235 --> 00:02:01,716
wants to send mail to Alice, and as Bob
sends the email, the email passes from

29
00:02:01,716 --> 00:02:06,603
mail relay to mail relay until finally it
reaches Alice, at which point Alice should

30
00:02:06,603 --> 00:02:10,502
decrypt. The way the email system is set
up, is designed for kind of

31
00:02:10,502 --> 00:02:15,045
non-interactive settings where Bob sends
the email. And then Alice is supposed to

32
00:02:15,045 --> 00:02:19,195
receive it. And Alice should not be to
communicate with Bob in order to decrypt

33
00:02:19,195 --> 00:02:23,502
the email. So in this case, because of the
non-interactivity, there's no opportunity

34
00:02:23,502 --> 00:02:27,705
for setting up a shared secret between
Alice and Bob. So in this case, what would

35
00:02:27,705 --> 00:02:32,169
happen is, Bob basically would, would send
the email encrypted, using Alice's, public

36
00:02:32,169 --> 00:02:36,571
key. So he sends the email. Anyone in the
world can send the email encrypted to

37
00:02:36,571 --> 00:02:41,103
Alice, encrypted using her public key.
When Alice receives this email, she uses

38
00:02:41,103 --> 00:02:45,748
her secret key to decrypt the ciphertext and recover the plain text message.

39
00:02:45,748 --> 00:02:50,507
Of course the one caveat in a system like
this is that in fact Bob needs to somehow

40
00:02:50,507 --> 00:02:54,804
obtain Alice's public key So for now we
are just going to assume Bob already has

41
00:02:54,804 --> 00:02:58,297
Alice's public key, but later on,
actually, when we talk about digital

42
00:02:58,297 --> 00:03:02,457
signatures we're gonna see how, this can
actually be done very efficiently using what's

43
00:03:02,457 --> 00:03:06,823
called public key management and as I said
we'll actually get back to that at a later

44
00:03:06,823 --> 00:03:10,931
time. But the main thing I want you to
remember, is that public key encryption is

45
00:03:10,931 --> 00:03:14,578
used for session setup. This is very
common on the web, where public key

46
00:03:14,578 --> 00:03:18,840
encryption is used to set up a secure key
between a web browser and, and web server.

47
00:03:18,840 --> 00:03:22,898
And public key encryption is also very
useful for non-interactive applications,

48
00:03:22,898 --> 00:03:26,390
where anyone in the world,
non-interactively, needs to send a message

49
00:03:26,390 --> 00:03:30,653
to Alice, they can encrypt the message using
Alice's public key, and Alice can decrypt

50
00:03:30,653 --> 00:03:36,105
and recover the plain text. So let me
remind you in a bit more detail what a

51
00:03:36,105 --> 00:03:40,347
public key encryption system is. Well,
it's made up of three algorithms G, E, and

52
00:03:40,347 --> 00:03:44,431
D. G is called the key generation algorithm.
Basically what it will do is it will

53
00:03:44,431 --> 00:03:48,672
generate this key pair, the public key and
the secret key. As written here, G takes

54
00:03:48,672 --> 00:03:53,018
no arguments, but in real life, G actually
does take an argument called the security

55
00:03:53,018 --> 00:03:57,260
parameter which specifies the size of the
keys that are generated by this key

56
00:03:57,260 --> 00:04:01,731
generation algorithm. Then there are these
encryption algorithms as usual that take a

57
00:04:01,731 --> 00:04:06,051
public key and a message and produce a
ciphertext in a decryption algorithm that

58
00:04:06,051 --> 00:04:10,530
takes the corresponding secret key and a
ciphertext and it produces a corresponding

59
00:04:10,530 --> 00:04:14,955
message. And as usual for consistency we
say that if we encrypt a message under a

60
00:04:14,955 --> 00:04:19,380
given public key and then decrypt with a
corresponding secret key we should get the

61
00:04:19,380 --> 00:04:23,852
original message back. Now what does it
mean for a public key encryption to be

62
00:04:23,852 --> 00:04:27,913
secure? I'm going to start off by
defining, security against eavesdropping.

63
00:04:27,913 --> 00:04:32,002
And then we're going to define security
against active attacks. So the way to

64
00:04:32,002 --> 00:04:36,237
define security against eavesdropping is
very similar to the symmetric case we've

65
00:04:36,237 --> 00:04:40,626
already this last week so we're gonna go
through this quickly just as a review.

66
00:04:40,626 --> 00:04:44,808
Basically the attack game is defined as
follows. We defined these two experiments,

67
00:04:44,808 --> 00:04:49,249
experiment zero and experiment one. At in
either experiment the challenger is gonna

68
00:04:49,249 --> 00:04:52,965
generate a public and a secret key pair. He's gonna give the public

69
00:04:52,965 --> 00:04:57,342
key to the adversary. The adversary's
gonna output two messages m0 and m1 of

70
00:04:57,342 --> 00:05:01,663
equal length and then what he gets back is
either the encryption of m0 or the

71
00:05:01,663 --> 00:05:06,039
encryption of m1. In experiment zero he
gets the encryption of m0. In experiment

72
00:05:06,039 --> 00:05:10,748
one he gets the encryption of m1. And then
the adversary is supposed to say which one

73
00:05:10,748 --> 00:05:15,240
did he get. Did he get the encryption of
m0 or did he get the encryption of m1? So

74
00:05:15,240 --> 00:05:19,676
in this game, the attacker only gets one
ciphertext. This corresponds to an

75
00:05:19,676 --> 00:05:24,226
eavesdropping attack where he simply
eavesdropped on that ciphertext C. And now

76
00:05:24,226 --> 00:05:28,719
his goal is to tell whether the ciphertext
C i s the encryption of M0 or M1. No

77
00:05:28,719 --> 00:05:34,221
tampering on the ciphertext C is allowed
just yet. And as usual we say that the

78
00:05:34,221 --> 00:05:38,206
public key encryption scheme is
semantically secure if the attacker cannot

79
00:05:38,206 --> 00:05:42,085
distinguish experiment zero from
experiment one. In other words he cannot

80
00:05:42,085 --> 00:05:47,757
tell whether he got the encryption of M0,
or the encryption of M1. Before we move on

81
00:05:47,757 --> 00:05:52,311
to active attacks, I want to mention a
quick relation between the definition we

82
00:05:52,311 --> 00:05:56,105
just saw, And the definition of, of
eavesdropping security for symmetric

83
00:05:56,105 --> 00:06:00,438
ciphers. If you remember, when we talked
about eavesdropping security for symmetric

84
00:06:00,438 --> 00:06:04,771
ciphers, we distinguished between the case
where the key is used once, and the case

85
00:06:04,771 --> 00:06:08,998
where the key is used multiple times. And,
in fact we saw that, there's a clear

86
00:06:08,998 --> 00:06:13,357
separation. For example, the onetime pad.
Is secure if the key is used to encrypt a

87
00:06:13,357 --> 00:06:17,382
single message, but is completely insecure
if the key is used to encrypt multiple

88
00:06:17,382 --> 00:06:21,358
messages. And in fact we had two different
definitions if you remember we had a

89
00:06:21,358 --> 00:06:25,383
definition for one-time security, and then
we had a separate definition, which was

90
00:06:25,383 --> 00:06:29,700
stronger, when the key was used multiple
times. The definition that I showed you on

91
00:06:29,700 --> 00:06:34,043
the previous slide's very similar to the
definition of one time security for

92
00:06:34,043 --> 00:06:38,499
symmetric ciphers. And in fact, it turns
out that for public key encryption, if a

93
00:06:38,499 --> 00:06:43,124
system is secure under a onetime key, in a
sense, it's also secure for a many time

94
00:06:43,124 --> 00:06:47,929
key. So in other words, we don't have to
explicitly give the attacker the ability

95
00:06:47,929 --> 00:06:53,171
to, request encryptions of messages of his
choice. Because he could just create those

96
00:06:53,171 --> 00:06:57,870
encryptions all by himself. He is given
the public key, and therefore he can by

97
00:06:57,870 --> 00:07:04,672
himself encrypt any message he likes. As a
result any public key secret pair in some

98
00:07:04,672 --> 00:07:09,289
sense inherently is used to encrypt
multiple messages because the attacker

99
00:07:09,289 --> 00:07:13,905
could have just encrypted many, many
messages of his choice using the given

100
00:07:13,905 --> 00:07:18,891
public key that we just gave him in the
first step. And so, as a result in fact,

101
00:07:18,891 --> 00:07:23,692
the definition of one time security is
enough to imply many time security and

102
00:07:23,692 --> 00:07:28,801
that's why we refer to the concept as
indistinguishability under a chosen plain

103
00:07:28,801 --> 00:07:34,012
text attach. So this is just a minor point
to explain why the settings of public

104
00:07:34,012 --> 00:07:37,770
encryption, we don't need a more
complicated definition to capture

105
00:07:37,770 --> 00:07:42,515
eavesdropping security. Now that we
understand eavesdropping security, let's

106
00:07:42,515 --> 00:07:47,343
look at more powerful adversaries that can
actually mount active attacks. So, in

107
00:07:47,343 --> 00:07:51,585
particular, let's look at the email
example. So here, we have our friend Bob

108
00:07:51,585 --> 00:07:56,228
who wants to send mail to his friend
Caroline. And Caroline happens to have, an

109
00:07:56,228 --> 00:08:00,699
account at Gmail. And the way this works
is basically, the email is sent to the

110
00:08:00,699 --> 00:08:05,514
Gmail server, encrypted. The Gmail server
decrypts the email, looks at the, intended

111
00:08:05,514 --> 00:08:09,297
recipients. And then, if it's, the
intended recipient is Caroline, it

112
00:08:09,297 --> 00:08:13,653
forwards the email to Caroline. If the
intended recipient is the attacker, it

113
00:08:13,653 --> 00:08:18,573
forwards the email to the attacker. This
is similar to how Gmail actually works

114
00:08:18,573 --> 00:08:23,441
because the sender would send the email
encrypted over SSL to the Gmail server.

115
00:08:23,441 --> 00:08:28,087
The Gmail server would terminate the SSL
and then forward the email to the

116
00:08:28,087 --> 00:08:33,081
appropriate recipients. Now suppose Bob
encrypts the email using a system that

117
00:08:33,081 --> 00:08:37,764
allows the adversary to tamper with the
ciphertext without being detected. For

118
00:08:37,764 --> 00:08:42,387
example, imagine this email is encrypted
using Counter Mode, or something like

119
00:08:42,387 --> 00:08:47,070
that. Then when the attacker intercepts
this email, he can change the recipient,

120
00:08:47,070 --> 00:08:50,732
so that now the recipient says
attacker@gmail.com, and we know that for

121
00:08:50,732 --> 00:08:55,415
Counter Mode, for example, this is quite
easy to do. The attacker knows that the

122
00:08:55,415 --> 00:09:00,278
email is intended for Caroline, he is just
interested in the email body. So he can

123
00:09:00,278 --> 00:09:04,226
easily change the email recipient to
attacker@gmail.com and now when the server

124
00:09:04,226 --> 00:09:08,129
receives the email, he will decrypt it,
see that the recipient is supposed to be

125
00:09:08,129 --> 00:09:12,033
attacker, and forward the body to the
attacker. And now the attacker was able to

126
00:09:12,033 --> 00:09:16,022
read the body of the email that was
intended for Caroline. So this is a

127
00:09:16,022 --> 00:09:21,198
classic example of an active attack, and
you notice what the attacker could do

128
00:09:21,198 --> 00:09:26,174
here, is it could decrypt any ciphertext
where the intended recipient is to:

129
00:09:26,174 --> 00:09:31,548
attacker. So any ciphertext where the plain
text begins with the words "to: attacker". So our goal is

130
00:09:31,548 --> 00:09:36,657
to design public key systems that are
secure, even if the attacker can tamper

131
00:09:36,657 --> 00:09:42,999
with ciphertext and possibly decrypt
certain cyphertexts. And again, I want to

132
00:09:42,999 --> 00:09:47,612
emphasize that here the attacker's goal
was to get the message body. The attacker

133
00:09:47,612 --> 00:09:52,055
already knew that the email is intended
for Caroline. And all he had to do was

134
00:09:52,055 --> 00:09:56,863
just change the, intended recipient. So
this tampering attack motivates the

135
00:09:56,863 --> 00:10:01,620
definition of chosen ciphertext security.
And in fact this is the standard notion of

136
00:10:01,620 --> 00:10:07,462
security for public key encryption. So let
me explain how the attack [here procedes] and as I

137
00:10:07,462 --> 00:10:11,899
said our goal is to build systems that are
secure under this very, very conservative

138
00:10:11,899 --> 00:10:15,756
notion of encryption. So we have an
encryption scheme (G, E, D). And let's say

139
00:10:15,756 --> 00:10:20,140
that's defined over a message space and
a ciphertext (M, C) and as usual we're

140
00:10:20,140 --> 00:10:24,313
gonna define two experiments, experiment
zero, and experiment one. So 'b' here

141
00:10:24,313 --> 00:10:28,222
says whether the challenger is
implementing experiment zero or experiment

142
00:10:28,222 --> 00:10:32,659
one. The challenger begins by generating a
public key and a secret key, and then gives

143
00:10:32,659 --> 00:10:37,254
the public key to the adversary. Now the
adversary can say, "Well, here are a bunch

144
00:10:37,254 --> 00:10:41,611
of ciphertexts, please decrypt them for
me." So here the adversary submits

145
00:10:41,611 --> 00:10:46,452
ciphertext C1 and he gets the decryption
of ciphertext C1, namely M1. And he gets

146
00:10:46,452 --> 00:10:51,414
to do this again and again, so he submits
ciphertext C2, and he gets the decryption,

147
00:10:51,414 --> 00:10:56,195
which is M2, ciphertext C3, and he gets
the decryption M3, and so on and so forth.

148
00:10:56,195 --> 00:11:00,188
Finally, the adversary says, "This
squaring phase is over," and now he

149
00:11:00,188 --> 00:11:04,485
submits basically two equal length
messages, M0 and M1 as normal, and he

150
00:11:04,485 --> 00:11:08,820
receives in response the challenge
ciphertext C, Which is the encryption of M

151
00:11:08,820 --> 00:11:13,052
zero or the encryption of M one. Depending
on whether we're in experiment zero or

152
00:11:13,052 --> 00:11:17,003
experiment one. Now, the adversary can
continue to issue these ciphertext

153
00:11:17,003 --> 00:11:21,063
queries. So he can continue to issue,
decryption requests. So he submits a

154
00:11:21,063 --> 00:11:25,447
ciphertext, and he gets a decryption of
that ciphertext, but of course, now, there

155
00:11:25,447 --> 00:11:29,994
has to be a caveat. If the attacker could
submit arbitrary ciphertext of his choice,

156
00:11:29,994 --> 00:11:34,270
of course, he could break the challenge.
What he would do is he would submit the

157
00:11:34,270 --> 00:11:38,506
challenge ciphertext C as a decryption
query. And then he would be told whether

158
00:11:38,506 --> 00:11:42,665
in the challenge phase he was given the
encryption of M0 or the encryption of M1.

159
00:11:42,665 --> 00:11:46,825
As a result we put this limitation here,
that says that he can in fact submit any

160
00:11:46,825 --> 00:11:51,031
ciphertext of his choice except. For the
challenge ciphertext. So the attacker

161
00:11:51,031 --> 00:11:55,034
could ask for the decryption of any
ciphertext of his choice other than the

162
00:11:55,034 --> 00:11:59,297
challenge ciphertext. And even though he
was given all these decryptions, he still

163
00:11:59,297 --> 00:12:03,196
shouldn't be able to tell whether he was
given the encryption of M0 or the

164
00:12:03,196 --> 00:12:09,212
encryption of M1. So you notice this is a
very conservative definition. It gives the

165
00:12:09,212 --> 00:12:14,113
attacker more power than what we saw in
the previous slide. On the previous slide,

166
00:12:14,113 --> 00:12:18,710
the attacker could only decrypt messages
where the plain text began with the words

167
00:12:18,710 --> 00:12:23,611
"to: attacker". Here, we're saying the attacker
can decrypt any ciphertext of his choice,

168
00:12:23,611 --> 00:12:29,717
as long as it's different from the
challenge ciphertext C. Okay? And then his

169
00:12:29,717 --> 00:12:34,094
goal is to say whether the challenge
ciphertext is the encryption of M0 or the

170
00:12:34,094 --> 00:12:37,918
encryption of M1. And as usual, if he
can't do that, in other words, his

171
00:12:37,918 --> 00:12:42,351
behavior in experiment zero is basically
the same as his behavior in experiment

172
00:12:42,351 --> 00:12:46,839
one, so he wasn't able to distinguish the
encryption of M0 from the encryption of

173
00:12:46,839 --> 00:12:51,219
M1, even though he had all this power Then
we say that the system is chosen

174
00:12:51,219 --> 00:12:55,877
ciphertext secure, CCA secure. And
sometimes there is an acronym, the acronym

175
00:12:55,877 --> 00:13:00,596
for this is indistinguishability under a
chosen ciphertext attack, but I'm just

176
00:13:00,596 --> 00:13:05,745
gonna say CCA secured. So let's see how
this captures, the email example we saw

177
00:13:05,745 --> 00:13:10,587
before. So suppose the encryption system
being used is such that just given the

178
00:13:10,587 --> 00:13:15,429
encryption of a message the attacker can
change the intended recipient from to

179
00:13:15,429 --> 00:13:20,129
Alice say to, to Charlie. Then here's how
we would win the CCA game. Well in the

180
00:13:20,129 --> 00:13:25,033
first step he's given the public key of
course. And then what the attacker will do

181
00:13:25,033 --> 00:13:29,578
is he would issue two equal length
messages, namely in the first message, the

182
00:13:29,578 --> 00:13:33,943
body is zero. In the second message the
body is one. But both messages are

183
00:13:33,943 --> 00:13:39,890
intended for Alice. And in response, he
would be given the challenge ciphertext C.

184
00:13:39,890 --> 00:13:45,130
Okay, so now here we have our challenge
ciphertext C. Now what the attacker is

185
00:13:45,130 --> 00:13:49,961
gonna do is he's gonna use his, his
ability here to modify the intended

186
00:13:49,961 --> 00:13:55,269
recipient. And he's gonna send back a
ciphertext C', where C' is the encryption

187
00:13:55,269 --> 00:14:01,760
of the message to Charlie with body being
the challenge body b. So if you remember is

188
00:14:01,760 --> 00:14:07,822
either zero or one. Now, because the plain
text is different, we know that the

189
00:14:07,822 --> 00:14:12,486
ciphertext must also be different. So in
particular, C prime must be different from

190
00:14:12,486 --> 00:14:17,206
the challenge ciphertext C, yeah? So the
C prime here must be different from C. And

191
00:14:17,206 --> 00:14:21,758
as a result, the poor challenger now has
to decrypt by definition of the CCA game.

192
00:14:21,758 --> 00:14:26,141
The challenger must decrypt any ciphertext
that's not equal to a challenge

193
00:14:26,141 --> 00:14:30,648
ciphertext. So the challenger decrypts
give the adversary M prime. Basically he

194
00:14:30,648 --> 00:14:35,256
gave the adversary B, and now the
adversary can output the challenge B and

195
00:14:35,256 --> 00:14:40,293
he wins the game with advantage one. So
he's advantage with this particular scheme

196
00:14:40,293 --> 00:14:45,143
is one. So, simply because the attacker
was able to change the challenge ciphertext

197
00:14:45,146 --> 00:14:49,999
from one recipient to another that
allows him to, to win the CCA game with

198
00:14:49,999 --> 00:14:55,003
advantage one. So as I said, chosen
ciphertext security turns out actually is

199
00:14:55,003 --> 00:14:59,327
the correct notion of security for public
key encryption systems. And it's a very,

200
00:14:59,327 --> 00:15:03,651
very interesting concept, right? Basically, somehow
even though the attacker has this ability

201
00:15:03,651 --> 00:15:07,839
to decrypt anything he wants. Other than
the challenge ciphertext, still he can't

202
00:15:07,839 --> 00:15:12,028
learn what the challenge ciphertext is.
And so the goal for the remainder of this module

203
00:15:12,028 --> 00:15:16,275
and actually the next module as well, is
to construct CCA secure systems. It's

204
00:15:16,275 --> 00:15:20,093
actually quite remarkable that this is
achievable and I'm going to show you

205
00:15:20,093 --> 00:15:24,310
exactly how to do it. And in fact those
CCA secure systems that we build are the

206
00:15:24,310 --> 00:15:28,579
ones that are used in the real world. And
every time a system has tried to deploy

207
00:15:28,737 --> 00:15:33,007
a public key encryption mechanism that's not
CCA secure someone has come up with an

208
00:15:33,007 --> 00:15:37,487
attack and was able to break it. And we're
going to see some of these example attacks

209
00:15:37,487 --> 00:15:39,280
actually in the next few segments.