WEBVTT

00:00:00.000 --> 00:00:03.583
Last week, we learned a number theory
that's needed for public key encryption.

00:00:03.583 --> 00:00:07.166
This week we're gonna put this knowledge
to work, and we're gonna construct a

00:00:07.166 --> 00:00:10.889
number of secure public key encryption
schemes. But first, we need to define what

00:00:10.889 --> 00:00:14.565
is public key encryption, and what does it
mean for public key encryption to be

00:00:14.565 --> 00:00:18.241
secure? So let me remind you that in a
public key encryption scheme, there is an

00:00:18.241 --> 00:00:21.778
encryption algorithm which is usually
denoted by E, and there's a decryption

00:00:21.778 --> 00:00:25.361
algorithm which we denote by D. However
here, the encryption algorithm takes a

00:00:25.361 --> 00:00:29.477
public key, while the decryption algorithm
takes a secret key. This pair is called a

00:00:29.477 --> 00:00:34.356
key pair. And the public key is used for
encrypting messages while the secret key

00:00:34.356 --> 00:00:39.002
is used for decrypting messages. So in
this case a message m is encrypting using

00:00:39.002 --> 00:00:43.880
the public key and what comes out of that
is the ciphertext c. And similarly the

00:00:43.880 --> 00:00:48.643
ciphertext is fed into the decryption
algorithm and using the secret key, what

00:00:48.643 --> 00:00:53.577
comes out of the decryption algorithm is
the original message m. Now public key

00:00:53.577 --> 00:00:57.989
encryption has many applications. Last
week we saw the classic application which

00:00:57.989 --> 00:01:02.455
is session setup, namely, key exchange and
for now we're just looking at key exchange

00:01:02.455 --> 00:01:06.867
that is secure against eavesdropping only.
And if you remember the way the protocol

00:01:06.867 --> 00:01:11.227
works, basically Alice, what she would do
is she would generate a public key secret

00:01:11.227 --> 00:01:15.546
pair. She would send the public key to
Bob. Bob will generate a random X, which

00:01:15.546 --> 00:01:20.136
is gonna serve as their shared secret, and
then he sends X encrypted to Alice,

00:01:20.136 --> 00:01:24.904
encrypted under her public key. Alice can
decrypt, recover X and now both of them

00:01:24.904 --> 00:01:29.554
have this shared secret X which they can
use to communicate securely with one

00:01:29.554 --> 00:01:34.143
another. The attacker, of course, all he
gets to see is just the public key, the

00:01:34.143 --> 00:01:38.972
encryption of X under the public key, from
which he should not be able to get any

00:01:38.972 --> 00:01:43.800
information about X. And we are going to
define that more precisely to understand

00:01:43.800 --> 00:01:48.507
what it means to not be able to learn
anything about X. Public key encryption

00:01:48.507 --> 00:01:52.522
actually has many other applications. For
example, it's very useful in

00:01:52.522 --> 00:01:57.235
non-interactive applications. So think of
an email system for example. So here, Bob

00:01:57.235 --> 00:02:01.716
wants to send mail to Alice, and as Bob
sends the email, the email passes from

00:02:01.716 --> 00:02:06.603
mail relay to mail relay until finally it
reaches Alice, at which point Alice should

00:02:06.603 --> 00:02:10.502
decrypt. The way the email system is set
up, is designed for kind of

00:02:10.502 --> 00:02:15.045
non-interactive settings where Bob sends
the email. And then Alice is supposed to

00:02:15.045 --> 00:02:19.195
receive it. And Alice should not be to
communicate with Bob in order to decrypt

00:02:19.195 --> 00:02:23.502
the email. So in this case, because of the
non-interactivity, there's no opportunity

00:02:23.502 --> 00:02:27.705
for setting up a shared secret between
Alice and Bob. So in this case, what would

00:02:27.705 --> 00:02:32.169
happen is, Bob basically would, would send
the email encrypted, using Alice's, public

00:02:32.169 --> 00:02:36.571
key. So he sends the email. Anyone in the
world can send the email encrypted to

00:02:36.571 --> 00:02:41.103
Alice, encrypted using her public key.
When Alice receives this email, she uses

00:02:41.103 --> 00:02:45.748
her secret key to decrypt the ciphertext and recover the plain text message.

00:02:45.748 --> 00:02:50.507
Of course the one caveat in a system like
this is that in fact Bob needs to somehow

00:02:50.507 --> 00:02:54.804
obtain Alice's public key So for now we
are just going to assume Bob already has

00:02:54.804 --> 00:02:58.297
Alice's public key, but later on,
actually, when we talk about digital

00:02:58.297 --> 00:03:02.457
signatures we're gonna see how, this can
actually be done very efficiently using what's

00:03:02.457 --> 00:03:06.823
called public key management and as I said
we'll actually get back to that at a later

00:03:06.823 --> 00:03:10.931
time. But the main thing I want you to
remember, is that public key encryption is

00:03:10.931 --> 00:03:14.578
used for session setup. This is very
common on the web, where public key

00:03:14.578 --> 00:03:18.840
encryption is used to set up a secure key
between a web browser and, and web server.

00:03:18.840 --> 00:03:22.898
And public key encryption is also very
useful for non-interactive applications,

00:03:22.898 --> 00:03:26.390
where anyone in the world,
non-interactively, needs to send a message

00:03:26.390 --> 00:03:30.653
to Alice, they can encrypt the message using
Alice's public key, and Alice can decrypt

00:03:30.653 --> 00:03:36.105
and recover the plain text. So let me
remind you in a bit more detail what a

00:03:36.105 --> 00:03:40.347
public key encryption system is. Well,
it's made up of three algorithms G, E, and

00:03:40.347 --> 00:03:44.431
D. G is called the key generation algorithm.
Basically what it will do is it will

00:03:44.431 --> 00:03:48.672
generate this key pair, the public key and
the secret key. As written here, G takes

00:03:48.672 --> 00:03:53.018
no arguments, but in real life, G actually
does take an argument called the security

00:03:53.018 --> 00:03:57.260
parameter which specifies the size of the
keys that are generated by this key

00:03:57.260 --> 00:04:01.731
generation algorithm. Then there are these
encryption algorithms as usual that take a

00:04:01.731 --> 00:04:06.051
public key and a message and produce a
ciphertext in a decryption algorithm that

00:04:06.051 --> 00:04:10.530
takes the corresponding secret key and a
ciphertext and it produces a corresponding

00:04:10.530 --> 00:04:14.955
message. And as usual for consistency we
say that if we encrypt a message under a

00:04:14.955 --> 00:04:19.380
given public key and then decrypt with a
corresponding secret key we should get the

00:04:19.380 --> 00:04:23.852
original message back. Now what does it
mean for a public key encryption to be

00:04:23.852 --> 00:04:27.913
secure? I'm going to start off by
defining, security against eavesdropping.

00:04:27.913 --> 00:04:32.002
And then we're going to define security
against active attacks. So the way to

00:04:32.002 --> 00:04:36.237
define security against eavesdropping is
very similar to the symmetric case we've

00:04:36.237 --> 00:04:40.626
already this last week so we're gonna go
through this quickly just as a review.

00:04:40.626 --> 00:04:44.808
Basically the attack game is defined as
follows. We defined these two experiments,

00:04:44.808 --> 00:04:49.249
experiment zero and experiment one. At in
either experiment the challenger is gonna

00:04:49.249 --> 00:04:52.965
generate a public and a secret key pair. He's gonna give the public

00:04:52.965 --> 00:04:57.342
key to the adversary. The adversary's
gonna output two messages m0 and m1 of

00:04:57.342 --> 00:05:01.663
equal length and then what he gets back is
either the encryption of m0 or the

00:05:01.663 --> 00:05:06.039
encryption of m1. In experiment zero he
gets the encryption of m0. In experiment

00:05:06.039 --> 00:05:10.748
one he gets the encryption of m1. And then
the adversary is supposed to say which one

00:05:10.748 --> 00:05:15.240
did he get. Did he get the encryption of
m0 or did he get the encryption of m1? So

00:05:15.240 --> 00:05:19.676
in this game, the attacker only gets one
ciphertext. This corresponds to an

00:05:19.676 --> 00:05:24.226
eavesdropping attack where he simply
eavesdropped on that ciphertext C. And now

00:05:24.226 --> 00:05:28.719
his goal is to tell whether the ciphertext
C i s the encryption of M0 or M1. No

00:05:28.719 --> 00:05:34.221
tampering on the ciphertext C is allowed
just yet. And as usual we say that the

00:05:34.221 --> 00:05:38.206
public key encryption scheme is
semantically secure if the attacker cannot

00:05:38.206 --> 00:05:42.085
distinguish experiment zero from
experiment one. In other words he cannot

00:05:42.085 --> 00:05:47.757
tell whether he got the encryption of M0,
or the encryption of M1. Before we move on

00:05:47.757 --> 00:05:52.311
to active attacks, I want to mention a
quick relation between the definition we

00:05:52.311 --> 00:05:56.105
just saw, And the definition of, of
eavesdropping security for symmetric

00:05:56.105 --> 00:06:00.438
ciphers. If you remember, when we talked
about eavesdropping security for symmetric

00:06:00.438 --> 00:06:04.771
ciphers, we distinguished between the case
where the key is used once, and the case

00:06:04.771 --> 00:06:08.998
where the key is used multiple times. And,
in fact we saw that, there's a clear

00:06:08.998 --> 00:06:13.357
separation. For example, the onetime pad.
Is secure if the key is used to encrypt a

00:06:13.357 --> 00:06:17.382
single message, but is completely insecure
if the key is used to encrypt multiple

00:06:17.382 --> 00:06:21.358
messages. And in fact we had two different
definitions if you remember we had a

00:06:21.358 --> 00:06:25.383
definition for one-time security, and then
we had a separate definition, which was

00:06:25.383 --> 00:06:29.700
stronger, when the key was used multiple
times. The definition that I showed you on

00:06:29.700 --> 00:06:34.043
the previous slide's very similar to the
definition of one time security for

00:06:34.043 --> 00:06:38.499
symmetric ciphers. And in fact, it turns
out that for public key encryption, if a

00:06:38.499 --> 00:06:43.124
system is secure under a onetime key, in a
sense, it's also secure for a many time

00:06:43.124 --> 00:06:47.929
key. So in other words, we don't have to
explicitly give the attacker the ability

00:06:47.929 --> 00:06:53.171
to, request encryptions of messages of his
choice. Because he could just create those

00:06:53.171 --> 00:06:57.870
encryptions all by himself. He is given
the public key, and therefore he can by

00:06:57.870 --> 00:07:04.672
himself encrypt any message he likes. As a
result any public key secret pair in some

00:07:04.672 --> 00:07:09.289
sense inherently is used to encrypt
multiple messages because the attacker

00:07:09.289 --> 00:07:13.905
could have just encrypted many, many
messages of his choice using the given

00:07:13.905 --> 00:07:18.891
public key that we just gave him in the
first step. And so, as a result in fact,

00:07:18.891 --> 00:07:23.692
the definition of one time security is
enough to imply many time security and

00:07:23.692 --> 00:07:28.801
that's why we refer to the concept as
indistinguishability under a chosen plain

00:07:28.801 --> 00:07:34.012
text attach. So this is just a minor point
to explain why the settings of public

00:07:34.012 --> 00:07:37.770
encryption, we don't need a more
complicated definition to capture

00:07:37.770 --> 00:07:42.515
eavesdropping security. Now that we
understand eavesdropping security, let's

00:07:42.515 --> 00:07:47.343
look at more powerful adversaries that can
actually mount active attacks. So, in

00:07:47.343 --> 00:07:51.585
particular, let's look at the email
example. So here, we have our friend Bob

00:07:51.585 --> 00:07:56.228
who wants to send mail to his friend
Caroline. And Caroline happens to have, an

00:07:56.228 --> 00:08:00.699
account at Gmail. And the way this works
is basically, the email is sent to the

00:08:00.699 --> 00:08:05.514
Gmail server, encrypted. The Gmail server
decrypts the email, looks at the, intended

00:08:05.514 --> 00:08:09.297
recipients. And then, if it's, the
intended recipient is Caroline, it

00:08:09.297 --> 00:08:13.653
forwards the email to Caroline. If the
intended recipient is the attacker, it

00:08:13.653 --> 00:08:18.573
forwards the email to the attacker. This
is similar to how Gmail actually works

00:08:18.573 --> 00:08:23.441
because the sender would send the email
encrypted over SSL to the Gmail server.

00:08:23.441 --> 00:08:28.087
The Gmail server would terminate the SSL
and then forward the email to the

00:08:28.087 --> 00:08:33.081
appropriate recipients. Now suppose Bob
encrypts the email using a system that

00:08:33.081 --> 00:08:37.764
allows the adversary to tamper with the
ciphertext without being detected. For

00:08:37.764 --> 00:08:42.387
example, imagine this email is encrypted
using Counter Mode, or something like

00:08:42.387 --> 00:08:47.070
that. Then when the attacker intercepts
this email, he can change the recipient,

00:08:47.070 --> 00:08:50.732
so that now the recipient says
attacker@gmail.com, and we know that for

00:08:50.732 --> 00:08:55.415
Counter Mode, for example, this is quite
easy to do. The attacker knows that the

00:08:55.415 --> 00:09:00.278
email is intended for Caroline, he is just
interested in the email body. So he can

00:09:00.278 --> 00:09:04.226
easily change the email recipient to
attacker@gmail.com and now when the server

00:09:04.226 --> 00:09:08.129
receives the email, he will decrypt it,
see that the recipient is supposed to be

00:09:08.129 --> 00:09:12.033
attacker, and forward the body to the
attacker. And now the attacker was able to

00:09:12.033 --> 00:09:16.022
read the body of the email that was
intended for Caroline. So this is a

00:09:16.022 --> 00:09:21.198
classic example of an active attack, and
you notice what the attacker could do

00:09:21.198 --> 00:09:26.174
here, is it could decrypt any ciphertext
where the intended recipient is to:

00:09:26.174 --> 00:09:31.548
attacker. So any ciphertext where the plain
text begins with the words "to: attacker". So our goal is

00:09:31.548 --> 00:09:36.657
to design public key systems that are
secure, even if the attacker can tamper

00:09:36.657 --> 00:09:42.999
with ciphertext and possibly decrypt
certain cyphertexts. And again, I want to

00:09:42.999 --> 00:09:47.612
emphasize that here the attacker's goal
was to get the message body. The attacker

00:09:47.612 --> 00:09:52.055
already knew that the email is intended
for Caroline. And all he had to do was

00:09:52.055 --> 00:09:56.863
just change the, intended recipient. So
this tampering attack motivates the

00:09:56.863 --> 00:10:01.620
definition of chosen ciphertext security.
And in fact this is the standard notion of

00:10:01.620 --> 00:10:07.462
security for public key encryption. So let
me explain how the attack [here procedes] and as I

00:10:07.462 --> 00:10:11.899
said our goal is to build systems that are
secure under this very, very conservative

00:10:11.899 --> 00:10:15.756
notion of encryption. So we have an
encryption scheme (G, E, D). And let's say

00:10:15.756 --> 00:10:20.140
that's defined over a message space and
a ciphertext (M, C) and as usual we're

00:10:20.140 --> 00:10:24.313
gonna define two experiments, experiment
zero, and experiment one. So 'b' here

00:10:24.313 --> 00:10:28.222
says whether the challenger is
implementing experiment zero or experiment

00:10:28.222 --> 00:10:32.659
one. The challenger begins by generating a
public key and a secret key, and then gives

00:10:32.659 --> 00:10:37.254
the public key to the adversary. Now the
adversary can say, "Well, here are a bunch

00:10:37.254 --> 00:10:41.611
of ciphertexts, please decrypt them for
me." So here the adversary submits

00:10:41.611 --> 00:10:46.452
ciphertext C1 and he gets the decryption
of ciphertext C1, namely M1. And he gets

00:10:46.452 --> 00:10:51.414
to do this again and again, so he submits
ciphertext C2, and he gets the decryption,

00:10:51.414 --> 00:10:56.195
which is M2, ciphertext C3, and he gets
the decryption M3, and so on and so forth.

00:10:56.195 --> 00:11:00.188
Finally, the adversary says, "This
squaring phase is over," and now he

00:11:00.188 --> 00:11:04.485
submits basically two equal length
messages, M0 and M1 as normal, and he

00:11:04.485 --> 00:11:08.820
receives in response the challenge
ciphertext C, Which is the encryption of M

00:11:08.820 --> 00:11:13.052
zero or the encryption of M one. Depending
on whether we're in experiment zero or

00:11:13.052 --> 00:11:17.003
experiment one. Now, the adversary can
continue to issue these ciphertext

00:11:17.003 --> 00:11:21.063
queries. So he can continue to issue,
decryption requests. So he submits a

00:11:21.063 --> 00:11:25.447
ciphertext, and he gets a decryption of
that ciphertext, but of course, now, there

00:11:25.447 --> 00:11:29.994
has to be a caveat. If the attacker could
submit arbitrary ciphertext of his choice,

00:11:29.994 --> 00:11:34.270
of course, he could break the challenge.
What he would do is he would submit the

00:11:34.270 --> 00:11:38.506
challenge ciphertext C as a decryption
query. And then he would be told whether

00:11:38.506 --> 00:11:42.665
in the challenge phase he was given the
encryption of M0 or the encryption of M1.

00:11:42.665 --> 00:11:46.825
As a result we put this limitation here,
that says that he can in fact submit any

00:11:46.825 --> 00:11:51.031
ciphertext of his choice except. For the
challenge ciphertext. So the attacker

00:11:51.031 --> 00:11:55.034
could ask for the decryption of any
ciphertext of his choice other than the

00:11:55.034 --> 00:11:59.297
challenge ciphertext. And even though he
was given all these decryptions, he still

00:11:59.297 --> 00:12:03.196
shouldn't be able to tell whether he was
given the encryption of M0 or the

00:12:03.196 --> 00:12:09.212
encryption of M1. So you notice this is a
very conservative definition. It gives the

00:12:09.212 --> 00:12:14.113
attacker more power than what we saw in
the previous slide. On the previous slide,

00:12:14.113 --> 00:12:18.710
the attacker could only decrypt messages
where the plain text began with the words

00:12:18.710 --> 00:12:23.611
"to: attacker". Here, we're saying the attacker
can decrypt any ciphertext of his choice,

00:12:23.611 --> 00:12:29.717
as long as it's different from the
challenge ciphertext C. Okay? And then his

00:12:29.717 --> 00:12:34.094
goal is to say whether the challenge
ciphertext is the encryption of M0 or the

00:12:34.094 --> 00:12:37.918
encryption of M1. And as usual, if he
can't do that, in other words, his

00:12:37.918 --> 00:12:42.351
behavior in experiment zero is basically
the same as his behavior in experiment

00:12:42.351 --> 00:12:46.839
one, so he wasn't able to distinguish the
encryption of M0 from the encryption of

00:12:46.839 --> 00:12:51.219
M1, even though he had all this power Then
we say that the system is chosen

00:12:51.219 --> 00:12:55.877
ciphertext secure, CCA secure. And
sometimes there is an acronym, the acronym

00:12:55.877 --> 00:13:00.596
for this is indistinguishability under a
chosen ciphertext attack, but I'm just

00:13:00.596 --> 00:13:05.745
gonna say CCA secured. So let's see how
this captures, the email example we saw

00:13:05.745 --> 00:13:10.587
before. So suppose the encryption system
being used is such that just given the

00:13:10.587 --> 00:13:15.429
encryption of a message the attacker can
change the intended recipient from to

00:13:15.429 --> 00:13:20.129
Alice say to, to Charlie. Then here's how
we would win the CCA game. Well in the

00:13:20.129 --> 00:13:25.033
first step he's given the public key of
course. And then what the attacker will do

00:13:25.033 --> 00:13:29.578
is he would issue two equal length
messages, namely in the first message, the

00:13:29.578 --> 00:13:33.943
body is zero. In the second message the
body is one. But both messages are

00:13:33.943 --> 00:13:39.890
intended for Alice. And in response, he
would be given the challenge ciphertext C.

00:13:39.890 --> 00:13:45.130
Okay, so now here we have our challenge
ciphertext C. Now what the attacker is

00:13:45.130 --> 00:13:49.961
gonna do is he's gonna use his, his
ability here to modify the intended

00:13:49.961 --> 00:13:55.269
recipient. And he's gonna send back a
ciphertext C', where C' is the encryption

00:13:55.269 --> 00:14:01.760
of the message to Charlie with body being
the challenge body b. So if you remember is

00:14:01.760 --> 00:14:07.822
either zero or one. Now, because the plain
text is different, we know that the

00:14:07.822 --> 00:14:12.486
ciphertext must also be different. So in
particular, C prime must be different from

00:14:12.486 --> 00:14:17.206
the challenge ciphertext C, yeah? So the
C prime here must be different from C. And

00:14:17.206 --> 00:14:21.758
as a result, the poor challenger now has
to decrypt by definition of the CCA game.

00:14:21.758 --> 00:14:26.141
The challenger must decrypt any ciphertext
that's not equal to a challenge

00:14:26.141 --> 00:14:30.648
ciphertext. So the challenger decrypts
give the adversary M prime. Basically he

00:14:30.648 --> 00:14:35.256
gave the adversary B, and now the
adversary can output the challenge B and

00:14:35.256 --> 00:14:40.293
he wins the game with advantage one. So
he's advantage with this particular scheme

00:14:40.293 --> 00:14:45.143
is one. So, simply because the attacker
was able to change the challenge ciphertext

00:14:45.146 --> 00:14:49.999
from one recipient to another that
allows him to, to win the CCA game with

00:14:49.999 --> 00:14:55.003
advantage one. So as I said, chosen
ciphertext security turns out actually is

00:14:55.003 --> 00:14:59.327
the correct notion of security for public
key encryption systems. And it's a very,

00:14:59.327 --> 00:15:03.651
very interesting concept, right? Basically, somehow
even though the attacker has this ability

00:15:03.651 --> 00:15:07.839
to decrypt anything he wants. Other than
the challenge ciphertext, still he can't

00:15:07.839 --> 00:15:12.028
learn what the challenge ciphertext is.
And so the goal for the remainder of this module

00:15:12.028 --> 00:15:16.275
and actually the next module as well, is
to construct CCA secure systems. It's

00:15:16.275 --> 00:15:20.093
actually quite remarkable that this is
achievable and I'm going to show you

00:15:20.093 --> 00:15:24.310
exactly how to do it. And in fact those
CCA secure systems that we build are the

00:15:24.310 --> 00:15:28.579
ones that are used in the real world. And
every time a system has tried to deploy

00:15:28.737 --> 00:15:33.007
a public key encryption mechanism that's not
CCA secure someone has come up with an

00:15:33.007 --> 00:15:37.487
attack and was able to break it. And we're
going to see some of these example attacks

00:15:37.487 --> 00:15:39.280
actually in the next few segments.