Information theoretic security and the one time pad (19 min)

Edit subtitles

0:00 - 0:04

Now that we've seen a few examples of
historic ciphers, all of which are badly
0:04 - 0:07

broken, we're going to switch gears and
talk about ciphers that are much better
0:10 - 0:13

designed. But before we do that, I want to,
first of all, define more precisely what a
0:13 - 0:17

cipher is. So first of all, a cipher is
actually, remember a cipher is made up of
0:17 - 0:22

two algorithms. There's an encryption
algorithm and a decryption algorithm. But
0:22 - 0:26

in fact, a cipher is defined over a
triple. So the set of all possible keys,
0:26 - 0:31

which I'm going to denote by script K, and
sometimes I'll call this the key space,
0:31 - 0:36

it's the set of all possible keys. There's
this set of all possible messages and this
0:36 - 0:40

set of all possible ciphertexts. Okay, so
this triple in some sense defines the
0:40 - 0:45

environment over which the cipher is
defined. And then the cipher itself is a
0:45 - 0:49

pair of efficient algorithms E and D. E is
the encryption algorithm; D is the
0:49 - 0:58

decryption algorithm. Of course, E takes
keys and messages. And outputs ciphertexts.
0:58 - 1:07

And the decryption algorithm takes
keys and ciphertexts. Then outputs messages.
1:07 - 1:12

And the only requirements is that these
algorithms are consistent. They satisfy
1:12 - 1:18

what's called the correctness property. So
for every message in the message space.
1:18 - 1:24

And every key. In the key space, it had
better be the case that if I encrypt the
1:24 - 1:29

message with the key K and then I decrypt
using the same key K I had better get back
1:29 - 1:35

the original message that I started with.
So this equation here is what's called the
1:35 - 1:40

consistency equation and every cipher has
to satisfy it in order to be a cipher
1:40 - 1:45

otherwise it's not possible to decrypt.
One thing I wanted to point out is that I
1:45 - 1:50

put the word efficient here in quotes. And
the reason I do that is because efficient
1:50 - 1:54

means different things to different
people. If you're more inclined towards
1:54 - 1:59

theory, efficient means runs in polynomial
time. So algorithms E and D have to run in
1:59 - 2:03

polynomial time in the size of their
inputs. If you're more practically
2:03 - 2:07

inclined, efficient means runs within a
certain time period. So for example,
2:07 - 2:11

algorithm E might be required to take
under a minute to encrypt a gigabyte of
2:11 - 2:16

data. Now either way, the word efficient
kind of captures both notions and you can
2:16 - 2:20

interpret it in your head whichever way
you'd like. I'm just going to keep
2:20 - 2:24

referring to it as efficient and put
quotes in it as I said if you're theory
2:24 - 2:28

inclined think of it as polynomial time,
otherwise think of it as
2:28 - 2:32

concrete time constraints. Another comment
I want to make is in fact algorithm E.
2:32 - 2:36

It's often a randomized algorithm. What
that means is that as your encrypting
2:36 - 2:41

messages, algorithm E is gonna generate
random bits for itself, and it's going to
2:41 - 2:46

use those random bits to actually encrypt
the messages that are given to it. On the
2:46 - 2:50

other hand the decrypting algorithm is
always deterministic. In other words given
2:50 - 2:55

the key and the ciphertext output is
always the same. Doesn't depend on any
2:55 - 2:59

randomness that's used by the algorithm.
Okay, so now that we understand what a
2:59 - 3:04

cipher is better, I want to kind of show
you the first example of a secure cipher.
3:04 - 3:08

It's called a one time pad It was designed
by Vernam back at the beginning of the
3:08 - 3:13

twentieth century. Before I actually
explain what the cipher is, let's just
3:13 - 3:17

state it in the terminology that we've
just seen. So the message space for the
3:17 - 3:22

Vernam cipher for the one-time pad is the
same as the ciphertext space which is
3:22 - 3:28

just the set of all ended binary strings.
This, this just means all sequences of
3:28 - 3:34

bits, of zero one characters. The key
space is basically the same as the message
3:34 - 3:40

space which is again simply the embed of
all binary strings. So a key in the one
3:40 - 3:46

time pad is simply a random big string,
it's a random sequence of bits. That's as
3:46 - 3:52

long as the message to be encrypted, as
long as the message. Okay, now that we've
3:52 - 3:57

specified kind of what's the cipher's
defined over we can actually specify how
3:57 - 4:02

the cipher works and it's actually really
simple. So essentially the ciphertexts.
4:02 - 4:08

Which is the result of encrypting a
message with a particular key, is simply
4:08 - 4:14

the XOR of the two. Simply K XOR M. [inaudible] see a quick example of
4:14 - 4:20

this. Remember that XOR, this plus
with a circle. XOR means addition
4:20 - 4:27

modulo two. So if I take a particular
message, say, 0110111. And it take a
4:27 - 4:34

particular key, say 1011001. When I
compute the encryption of the message
4:34 - 4:39

using this key, all I do is I
compute the XOR of these two
4:39 - 4:44

strings. In other words, I do addition
module or two bit by bit. So I get one,
4:44 - 4:49

one, zero, one, one, one, zero. That's a
ciphertext. And then how do I decrypt? I
4:49 - 4:53

guess they could do kind of the same
thing. So they decrypt a cipher text using
4:53 - 4:57

a particular key. What I do is I XOR the
key and the ciphertext again. And so all
4:57 - 5:02

we have to verify is that it satisfies the
consistency requirements. And I'm going to
5:02 - 5:06

do this slowly once and then from now on
I'm going to assume this is all, simple to
5:06 - 5:11

you. So we're gonna make, we're gonna have
to make sure that if I decrypt a cipher
5:11 - 5:15

text, that was encrypted using a
particular key, I had better get. Back the
5:15 - 5:20

message M. So what happens here? Well,
let's see. So if I look at the encryption
5:20 - 5:26

of k and m, this is just k XOR m by
definition. What's the decryption of k XOR
5:26 - 5:32

m using k? That's just k XOR (k XOR
m). And so since I said that XOR is
5:32 - 5:37

addition modulo two, addition is
associative, so this is the same as (k XOR k)
5:37 - 5:43

XOR m, which is simply as you know k XOR k is a zero, and zero XOR anything
5:43 - 5:49

is simply m. Okay, so this actually shows
that the one-time pad is in fact a cipher,
5:49 - 5:54

but it doesn't say anything about the
security of the cipher. And we'll talk
5:54 - 5:58

about security of the cipher in just a
minute. First of all, let me quickly ask
5:58 - 6:02

you a question, just to make sure we're
all in sync. Suppose you are given a
6:02 - 6:06

message m and the encryption of that
message using the one time pad. So all
6:06 - 6:11

you're given is the message and the cipher
text. My question to you is, given this
6:11 - 6:15

pair m and c, can you actually figure out
the one-time pad key that was used in the
6:15 - 6:21

creation of c from m?
6:21 - 6:23

So I hope all of you
realize that in fact, given the message in
6:23 - 6:25

the cipher text it's very easy to recover
what the key is. In particular the key is
6:25 - 6:30

simply M XOR C. Then we'll see that if
it's not immediately obvious to you we'll
6:30 - 6:35

see why that's, the case, a little later
in the talk, in the lecture. Okay alright
6:35 - 6:40

so the 1-time pad is a really cool from a
performance point of view all you're doing
6:40 - 6:45

is you exo-ring the key in the message so
it's a super, super fast. Cypher for
6:45 - 6:48

encrypting and decrypting very long
messages. Unfortunately it's very
6:48 - 6:53

difficult to use in practice. The reason
it's difficult to use is the keys are
6:53 - 6:57

essentially as long as the message. So if
Alice and Bob want to communicate
6:57 - 7:01

securely, so you know Alice wants to send
a message end to Bob, before she begins
7:01 - 7:06

even sending the first bit of the message,
she has to transmit a key to Bob that's as
7:06 - 7:11

long as that message. Well, if she has a
way to transmit a secure key to Bob that's
7:11 - 7:15

as long as the message, she might as well
use that same mechanism to also transmit
7:15 - 7:19

the message itself. So the fact that the
key is as long as the message is quite
7:19 - 7:23

problematic and makes the one-time pad
very difficult to use in practice.
7:23 - 7:28

Although we'll see that the idea behind
the one-time pad is actually quite useful
7:28 - 7:33

and we'll see that a little bit later. But
for now I want to focus a little bit on
7:33 - 7:37

security. So the obvious questions are,
you know, why is the one-time pad secure?
7:37 - 7:41

Why is it a good cypher? Then to answer
that question, the first thing we have to
7:41 - 7:45

answer is, what is a secure cipher to
begin with? What is a, what makes cipher
7:45 - 7:50

secure? Okay, so the study, security of
ciphers, we have to talk a little bit
7:50 - 7:55

about information theory. And in fact the
first person, to study security of ciphers
7:55 - 8:00

rigorously. Is very famous, you know, the
father of information theory, Claude
8:00 - 8:05

Shannon, and he published a famous paper
back in 1949 where he analyzes the
8:05 - 8:11

security of the one-time pad. So the idea
behind Shannon's definition of security is
8:11 - 8:15

the following. Basically, if all you get
to see is the cypher text, then you should
8:15 - 8:19

learn absolutely nothing about the plain
text. In other words, the cypher text
8:19 - 8:23

should reveal no information about the
plain text. And you see why it took
8:23 - 8:28

someone who invented information theory to
come up with this notion because you have
8:28 - 8:33

to formulize, formally explain what does
information about the plain text actually
8:33 - 8:38

mean. Okay so that's what Shannon did and
so lemme show you Shannon's definition,
8:38 - 8:43

I'll, I'll write it out slowly first. So
what Shannon said is you know suppose we
8:43 - 8:48

have a cypher E D that's defined over
triple K M and C just as before. So KM and
8:48 - 8:53

C define the key space, the message space
and the cypher text space. And when we say
8:53 - 8:58

that the cypher text sorry we say that the
cypher has perfect secrecy if the
8:58 - 9:04

following condition holds. It so happens
for every two messages M zero and M1 in
9:04 - 9:09

the message space. For every two messages
the only requirement I'm gonna put on
9:09 - 9:14

these messages is they have the same
length. Yeah so we're only, we'll see why
9:14 - 9:19

this requirement is necessary in just a
minute. And for every cyphertext, in the
9:19 - 9:25

cyphertext space. Okay? So for every pair
of method messages and for every cipher
9:25 - 9:31

text, it had better be the case that, if I
ask, what is the probability that,
9:31 - 9:37

encrypting N zero with K, woops.
Encrypting N zero with K gives C, okay? So
9:37 - 9:44

how likely is it, if we pick a random key?
How likely is it that when we encrypt N
9:44 - 9:50

zero, we get C. That probability should be
the same as when we encrypt N1. Okay, so
9:50 - 9:55

the probability of encrypting n one and
getting c is exactly the same as the
9:55 - 10:00

probability of encrypting n zero and
getting c. And just as I said where the
10:00 - 10:05

key, the distribution, is over the
distribution on the key. So, the key is
10:05 - 10:10

uniform in the key space. So k is uniform
in k. And I'm often going to write k arrow
10:10 - 10:15

with a little r above it to denote the
fact that k is a random variable that's
10:15 - 10:20

uniformly sampled in the key space k.
Okay, this is the main part of Shannon's
10:20 - 10:26

definition. And let's think a little bit
about what this definition actually says.
10:26 - 10:31

So what does it mean that these two
probabilities are the same? Well, what it
10:31 - 10:36

says is that if I'm an attacker and I
intercept a particular cypher text c, then
10:36 - 10:42

in reality, the probability that the
cypher text is the encryption of n zero is
10:42 - 10:47

exactly the same as the probability that
it's the incryption of n one. Because
10:47 - 10:52

those probabilities are equal. So if I
have, all I have the cypher text C that's
10:52 - 10:58

all I have intercepted I have no idea
whether the cypher text came from M zero
10:58 - 11:03

or the cypher text came from M one because
again the probability of getting C is
11:03 - 11:09

equally likely whether M zero is being
encrypted or M one is being encrypted. So
11:09 - 11:13

here, we have the definition stated again.
And I just wanna write these properties
11:13 - 11:18

again more precisely. So let's write this
again. So what [inaudible] definition
11:18 - 11:22

means is that if I am given a particular
cipher text, I can't tell where it came
11:22 - 11:27

from. I can't tell if it's, if the message
that was encrypted. Is either N zero or N
11:27 - 11:32

one and in fact, this property is true for
all messages. For all these N zero, for
11:32 - 11:37

all N zero and N ones. So not only can I
not tell if'c' came from N zero or N one,
11:37 - 11:42

I can't tell if it came from N two or N
three or N four or N five because all of
11:42 - 11:47

them are equally likely to produce the
cypher text'c'. So what this means really
11:47 - 11:52

is that if you're encrypting messages with
a one time pad then in fact the most
11:52 - 11:57

powerful adversary, I don't really care
how smart you are, the most powerful
11:57 - 12:03

adversary. Can learn nothing about the
plain text, learns nothing about the plain
12:03 - 12:10

text. From the cypher text. So to say it
in one more way, basically what this
12:10 - 12:16

proves is that there's no, there's no
cypher text-only attack on a cypher that
12:16 - 12:23

has perfect secrecy. Now, cypher attacks
actually aren't the only attacks possible.
12:23 - 12:29

And in fact, other attacks may be
possible, other attacks may be possible.
12:32 - 12:37

Okay. Now that we understand what perfect
secrecy, means, the question is, can we
12:37 - 12:41

build ciphers that actually have perfect
secrecy? And it turns out that we don't
12:41 - 12:46

have to look very far, the one time
pattern fact has perfect secrecy. So I
12:46 - 12:51

want to prove to you this is Shannon's first
results and I wanna prove this fact to
12:51 - 12:56

you, it's very simple proof so let's go
ahead and look at it and just do it. So we
12:56 - 13:01

need to kind of interpret what does that
mean, what is this probability that E K M
13:01 - 13:06

Z is equal to C. So it's actually not that
hard to see that for every message and
13:06 - 13:11

every cyphertext the probability that the
encryption of N under a key K the
13:11 - 13:16

probability that, that's equal to C, the
probability that our random choice of key
13:16 - 13:24

by definition. All that is, is basically
the number of keys. Kay, instruct Kay.
13:25 - 13:32

Such that, well. If I encrypt. And with k
I get c. So I literally count the number
13:32 - 13:37

of keys and I divide by the total number
of keys. Right? That's what it means, that
13:37 - 13:43

if I choose a random key, that key maps m
to c. Right. So it's basically the number
13:43 - 13:48

of key that map n to c divided by the
total number of keys. This is its
13:48 - 13:53

probability. So now suppose that we had a
cypher such that for all messages and all
13:53 - 13:59

cypher texts, it so happens that if I look
at this number, the number of k, k, and k,
13:59 - 14:04

such that e, k, m is equal to c. In other
words, I'm looking at the number of keys
14:04 - 14:09

that map m to c. Suppose this number
happens to be a constant. So say it
14:09 - 14:14

happens to be two, three, or ten or
fifteen. It just hap, happens to be an
14:14 - 14:19

absolute constance. If that's the case,
then by definition, for all n0 and n1 and
14:19 - 14:25

for all c, this probability has to be the
same because the denominator is the same,
14:25 - 14:30

the numerator is the same, it's just as
constant, and therefore the probability is
14:30 - 14:36

always the same for all n and c. And so if
this property is true, then the cypher has
14:36 - 14:44

to have, the cypher has perfect secrecy.
Okay, so lets see what can we say about
14:44 - 14:49

this quantity for the one time pad. So the
sec-, so, the question to you is, if I
14:49 - 14:55

have a message in a cipher-text, how many
one time pad keys are there [inaudible]
14:55 - 15:00

map, this message ends, so the [inaudible]
C? So, in other words, how many keys are
15:00 - 15:06

there, such that M XOR K is equal to C?
So I hope you've all answered one. And
15:06 - 15:13

let's see why that's the case. For the one
time pad, if we have that, the encryption
15:13 - 15:18

of K of M under K is equal to C. But
basically, well, by definition, that
15:18 - 15:25

implies that K XOR M is equal to C. But
that also simply says that K has to equal
15:25 - 15:32

to M XOR C. Yes, I just X over both
sides by M and I get that K must equal the
15:32 - 15:38

M XOR C. Okay? So what that says is
that, for the one time pad, in fact, the
15:38 - 15:44

number of keys, in K, shows the EKM, is
equal to C. That simply is one, and this
15:44 - 15:50

holds for all messages in cipher text. And
so, again, by what we said before, it just
15:50 - 15:55

says that the one time pad has, perfect
secrecy. Perfect secrecy and that
15:55 - 15:59

completes the proof of this [inaudible]
very, very simple. Very, very simple
15:59 - 16:04

lemma. Now the funny thing is that
even though this lemma is so simple to
16:04 - 16:08

prove in fact it proves a pretty powerful
statement again. This basically says for
16:08 - 16:12

the one time [inaudible] there is no
cypher text only attack. So, unlike the
16:12 - 16:16

substitution cipher, or the vigenere
cipher, or the roller machines, all those
16:16 - 16:21

could be broken by ciphertext-only attack.
We've just proved that for the one-time
16:21 - 16:25

pad, that's simply impossible. Given the
cyphertext, you simply learn nothing about
16:25 - 16:29

the plaintext. However, as we can see,
this is simply not the end of the story. I
16:29 - 16:33

mean, are we done? I mean, basically,
we're done with the course now, cuz we
16:33 - 16:37

have a way. To encrypt, so that an
attacker can't recover anything about our
16:37 - 16:41

method. So maybe we're done with the
course. But in fact, as we'll see, there
16:41 - 16:45

are other attacks. And, in fact, the one
time pad is actually not such a
16:45 - 16:49

secure cipher. And in fact, there are
other attacks that are possible, and we'll
16:49 - 16:54

see that shortly. Okay? I emphasize again
the fact that it has perfect secrecy does
16:54 - 16:59

not mean that the one time pad is the
secure cypher to use. Okay. But as we said
16:59 - 17:04

the problem with the one time pad is that
the secret key is really long. If you had
17:04 - 17:08

a way of. Communicating the secret key
over to the other side. You might as well
17:08 - 17:12

use that exact same method to also
communicate the message to the other side,
17:12 - 17:17

in which case you wouldn't need a cypher
to begin with. Okay? So the problem again
17:17 - 17:21

is the one time pad had really long keys
and so the obvious question is are there
17:21 - 17:25

other cyphers that has perfect secrecy and
possibly have much, much shorter keys?
17:25 - 17:30

Well, so the bad news is the Shannon,
after proving that the one-time pad has
17:30 - 17:35

perfect secrecy, proved another theorem
that says that in fact if a cypher has
17:35 - 17:40

perfect secrecy, the number of keys in the
cypher must be at least the number of
17:40 - 17:45

messages that the cypher can handle. Okay,
so in particular, what this means is if I
17:45 - 17:51

have perfect secrecy. Then necessarily the
number of keys, or rather the length of my
17:51 - 17:56

key, must be greater than the length of
the message. So, in fact, since the one
17:56 - 18:01

time pad satisfies us with equality, the
one time pad is an optimal, cipher that
18:01 - 18:05

has perfect secrecy, okay? So basically,
what this shows is that this is an
18:05 - 18:09

interesting notion. The one time pad is an
interesting cipher. But, in fact, in
18:09 - 18:13

reality, it's actually quite hard to use.
It's hard to use in practice, again,
18:13 - 18:18

because of these long keys. And so this
notion of perfect secrecy, even though
18:18 - 18:22

it's quite interesting, basically says
that it doesn't really tell us the
18:22 - 18:26

practical cyphers are going to be secure.
And we're gonna see, but, as we said, the
18:26 - 18:31

idea behind the one time pad is quite good.
And we're gonna see, in the next lecture,
18:31 - 18:34

how to make that into a practical
system.

Title:: Information theoretic security and the one time pad (19 min)
Video Language:: English

	khvan91 edited English subtitles for Information theoretic security and the one time pad (19 min)
	khvan91 edited English subtitles for Information theoretic security and the one time pad (19 min)
	khvan91 edited English subtitles for Information theoretic security and the one time pad (19 min)
	Jesús Quiroga edited English subtitles for Information theoretic security and the one time pad (19 min)
	Jesús Quiroga edited English subtitles for Information theoretic security and the one time pad (19 min)
	Jesús Quiroga edited English subtitles for Information theoretic security and the one time pad (19 min)
	Jesús Quiroga edited English subtitles for Information theoretic security and the one time pad (19 min)
	Jesús Quiroga edited English subtitles for Information theoretic security and the one time pad (19 min)

Show all

English subtitles

Revisions

Revision 10

khvan91

Information theoretic security and the one time pad (19 min)

Revisions

Our website uses cookies

Operating cookies (Required)