My goal for the next few segments is to
show you that if we use a secure PRG We'll
get a secure stream safer. The first thing
we have to do is define is, what does it
mean for a stream safer to be secure? So
whenever we define security we always
define it in terms of what can the
attacker do? And what is the attacker
trying to do? In the context of
stream ciphers remember these are only used
with a onetime key, and as a result the
most the attacker is ever going to see is
just one cypher text encrypted using the
key that we're using. And so we're gonna
limit the attackers? ability to basically
obtain just one cypher text. And in
fact later on we're going to allow the
attacker to do much, much, much more, but
for now we're just gonna give him one
cypher text. And we wanted to find,
what does it mean for the cypher to
be secure? So the first definition that
comes to mind is basically to say, well
maybe we wanna require that the attacker
can't actually recover the secret key.
Okay so given ciphertext you
shouldn't be able to recover the secret
key. But that's a terrible definition
because think about the falling brilliant
cipher but the way we encrypt the
message using a key K is basically
we just output the message. Okay this
is a brilliant cipher yeah of course it
doesn't do anything given a message just
output a message as the cipher text.
This is not a particularly good encryption
scheme; however, given the cipher text,
mainly the message the poor attacker
cannot recover the key because he doesn't
know what the key is. And so as a result
this cipher which clearly is insecure,
would be considered secure under this
requirement for security. So this
definition will be no good. Okay so it's
recovering the secret key is the wrong way
to define security. So the next thing we
can kinda attempt is basically just say,
well maybe the attacker doesn't care about
the secret key, what he really cares about
are the plain text. So maybe it should be
hard for the attacker to recover the
entire plain text. But even that doesn't
work because let's think about the
following encryption scheme. So suppose
what this encryption scheme does is it
takes two messages, so I'm gonna use two
lines to denote concatenation of two
messages M0 line, line M1 means
concatenate M0 and M1. And imagine
what the scheme does is really it outputs
M0 in the clear and concatenate to
that the encryption of M1. Perhaps even
using the One Time Pad okay? And
so here the attacker is gonna be given one
cipher text. And his goal would be to
recover the entire plain texts. But the
poor attacker can't do that because here
maybe we've encrypted M1 using the One
Time Pad so the attacker can't
actually recover M1 because we know the
One Time Pad is secure given just
one cipher text. So this construction
would satisfy the definition but
unfortunately clearly this is not a secure
encryption scheme because we just leaked
half of the plain text. M0 is
completely available to the attacker so
even though he can't recover all of the
plain text he might be able to recover
most of the plain text, and that's clearly
unsecured. So of course we already know
the solution to this and we talked about
Shanon definition of security perfect
secrecy where Shannon's idea was that in
fact when the attacker intercepts a cipher
text he should learn absolutely no
information about the plain texts. He
shouldn't even learn one bit about the
plain text or even he shouldn't learn, he
shouldn't even be able to predict a little
bit about a bid of the plain text.
Absolutely no information about the plain text.
So let's recall very briefly
Shannon's concept of perfect secrecy
basically we said that you know given a
cipher We said the cipher has perfect
secrecy if given two messages of the same
length it so happens that the distribution
of cipher texts. Yet if we pick a random
key and we look into distribution of
cipher texts we encrypt M0 we get
exactly the same distribution as when we
encrypt M1. The intuition here was that if
the advisory observes the cipher texts
then he doesn't know whether it came from
the distribution the result of encrypting
M0 or it came from the distribution as
the result of encrypting M1. And as a
result he can't tell whether we encrypted
M0 or M1. And that's true for all
messages of the same length and as a
result a poor attacker doesn't really know
what message was encrypted. Of course we
already said that this definition is too
strong in the sense that it requires
really long keys if cipher has short
keys can't possibly satisfy this
definition in a particular stream ciphers
can satisfy this definition. Okay so let's
try to weaken the definition a little bit
and let's think to the previous segment,
and we can say that instead of requiring
the two distributions to be absolutely
identical what we can require is, is that
two distributions just be computationally
indistinguishable. In other words a poor,
efficient attacker cannot distinguish the
two distributions even though the
distributions might be very, very, very
different. That just given a sample for
one distribution and a sample for another
distribution the attacker can't tell which
distribution he was given a sample from.
It turns out this definition is actually
almost right, but it's still a little too
strong, that still cannot be satisfied, so
we have to add one more constraint, and
that is that instead of saying that this
definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1
that the attackers could
actually exhibit. Okay so this actually
leads us to the definition of semantics
security. And so, again this is semantics
security for one time key in other words
when the attacker is only given one cipher
text. And so the way we define semantic
security is by defining two experiments,
okay we'll define experiment 0 and
experiment 1. And more generally we will
think of these as experiments parentheses
B, where B can be zero or one. Okay so the
way the experiment is defined is as
follows, we have an adversary that's
trying to break the system. An adversary
A, that's kinda the analog of statistical
tests in the world of pseudo random
generators. And then the challenger does
the following, so really we have two
challengers, but the challengers are so
similar that we can just describe them as
a single challenger that in one case takes
his inputs bits set to zero, and the other
case takes his inputs bits set to
one. And let me show you what these
challengers do. The first thing the
challenger is gonna do is it's gonna pick
a random key and then the adversary
basically is going to output two messages
M0 and M1. Okay so this is an explicit
pair of messages that the attacker wants
to be challenged on and as usual we're not
trying to hide the length of the messages,
we require that the messages be equal
length. And then the challenger basically
will output either the encryption of M0
or the encryption of M1, okay so in
experiment 0 the challenger will output
the encryption of M0. In experiment 1 the challenger will output the encryption
of M1. Okay so that the difference between
the two experiments. And then the
adversary is trying to guess basically
whether he was given the encryption of M0
or given the encryption of M1. Okay so
here's a little bit of notation let's
define the events Wb to be the events that
an experiment B the adversary output one.
Okay so that is just an event that
basically in experiment zero W0 means that
the adversary output one and in experiment
one W1 means the adversary output one. And
now we can define the advantage of this
adversary, basically to say that this is
called the semantics security advantage of
the adversary A against the scheme E,
to be the difference of the probability of
these two events. In other words we are
looking at whether the adversary behaves
differently when he was given the
encryption of M0 from when he was given
the encryption of M1. And I wanna make
sure this is clear so I'm gonna say it one
more time. So in experiment zero he was
given the encryption of M0 and in
experiment one he was given the encryption
of M1. Now we're just interested in
whether the adversary output 1 or not.
... In these experiments. If in both
experiments the adversary output 1 with
the same probability that means the
adversary wasn't able to distinguish the
two experiments. Experiments zero
basically looks to the adversary the same
as experiment one because in both cases
upload one with the same probability.
However if the adversary is able to output
1 in one experiment with significantly
different probability than in the other
experiment, then the adversary was
actually able to distinguish the two
experiments. Okay so... To say this more
formally, essentially the advantage again
because it's the difference of two
probabilities the advantage is a number
between zero and one. If the advantage is
close to zero that means that the
adversary was not able to distinguish
experiment zero from experiment one.
However if the advantage is close to one,
that means the adversary was very well
able to distinguish experiment zero from
experiment one and that really means that
he was able to distinguish an encryption
of M0 from an encryption of M1, okay?So that's out definition. Actually
that is just the definition of the
advantage and the definition is just what
you would expect basically we'll say that
a symmetric encryption scheme is
semantically secure if for all efficient
adversaries here I'll put these in quotes
again, "For all efficient adversaries the
advantage is negligible." In other words,
no efficient adversary can distinguish the
encryption of M0 from the encryption
of M1 and basically this is what it
says repeatedly that for these two
messages that the adversary was able to
exhibit he wasn't able to distinguish
these two distributions. Now I want to
show you this, this is actually a very
elegant definition. It might not seem so
right away, but I want to show you some
implications of this definition and you'll
see exactly why the definition is the way
it is. Okay so let's look at some
examples. So the first example is suppose
we have a broken encryption scheme, in
other words suppose we have an adversary A
that somehow given the cipher text he is
always able to deduce the least
significant bit of the plain text. Okay so
given the encryption of M0, this adversary
is able to deduce the least significant
bit of M0. So that is a terrible
encryption scheme because it basically
leaks the least significant bit of the
plain text just given the cipher text. So
I want to show you that this scheme is
therefore semantically secure so that kind
of shows that if a system is semantically
secure than there is no attacker of this
type. Okay so let's see why is the system
not semantically secure well so what we
are gonna do is we're gonna basically use
our adversary who is able to learn our
most insignificant bits, we're going to
use him to break semantic security so
we're going to use him to distinguish
experiment zero from experiment one Okay
so here is what we are going to do. We are
algorithm B, we are going to be algorithm
B and this algorithm B is going to use
algorithm A in its belly. Okay so the
first thing that is going to happen is of
course the challenger is going to choose a
random key. The first thing that is going
to happen is that we need to output two
messages. The messages that we are going
to output basically are going to have
differently significant bits. So one
message is going to end with zero and one
message is going to end with one. Now what
is the challenger going to do? The
challenger is going to give us the
encryption of either M0 or M1,
depending on whether in experiment 0 or
in experiment 1. And then we just
forward this cipher text to the adversary
okay so the adversary A. Now what is the
property of adversary A? Given the cipher
text, adversary A can tell us what the
least significant bits of the plain text is.
In other words the adversary is going
to output the least significant bits of M0 or M1
but low and behold that's
basically the bit B. And then we're just
going to output that as our guest so let?s
call this thing B prime Okay so now this
describes the semantic security adversary.
And now you tell me what is the semantic
security advantage of this adversary? Well
so let's see. In experiment zero, what is
the probability that adversary B outputs
one? Well in experiment zero it is always
given the encryption of M zero and
therefore adversary A is always output the
least significant bit of M zero which
always happens to be zero. In experiment
zero, B always outputs zero. So the
probability that outputs one is zero.
However in experiment one, we're given the
encryption of M1. So how likely is
adversary B to output one in experiment
one well it always outputs one, again by
the properties of algorithm A and
therefore the advantage basically is one.
So this is a huge advantage, it's as big
as it's gonna get. Which means that this
adversary completely broke the system.
Okay so we consider, so under semantic
security basically just deducing the least
significant bits is enough to completely
break the system under a semantic security
definition. Okay, now the interesting
thing here of course is that this is not
just about the least significant bit, in
fact of the message for
example the most significant bit, you know
bit number seven Maybe the XOR of all the bits in the message and so on
and so forth any kind of information, any
bit about the plain text they can be
learned basically would mean that the
system is not semantically secure. So
basically all the adversary would have to
do would be to come up with two messages
M0 and M1 such that under one thing that
they learned it's the value zero and then
the other thing, the value, is one. So for
example if A was able to learn the XOR
bits of the message then M0
and M1 would just have different
XOR for all the bits of their
messages and then this adversary A would
also be sufficient to break semantic
security. Okay so, basically for cipher
is semantically secure then no
bit of information is revealed to an
efficient adversary. Okay so this is
exactly a concept of perfect secrecy only
applied just efficient adversaries
rather than all adversaries. So the next
thing I wanna show you is that in fact the
one time pad in fact is
semantically secure, they better be
semantically secure because it's in fact,
it's more than that it's perfectly secure.
So let's see why that's true. Well so
again it's one of these experiments, so
suppose we have an adversary that claims
to break semantic security of the one time
pad. The first the adversary is gonna do
is he's gonna output two messages M0
and M1 of the same length.
Now what does he get back as a challenge? As a
challenge, he gets either the encryption
of M0 or the encryption of M1 under
the one time pad.
And he's trying to distinguish between those two possible
cipher texts that he gets, right?
In experiment zero, he gets the encryption of
M0 and in experiment one, he gets the
encryption of M1. Well so let me ask
you, so what is the advantage of adversary
A against the one time patent? So I
remember that the property of the ones I
had is that, that K, XOR M0 is
distributed identically to K, XOR M1.
In other words, these distributions are
absolutely identical distribution,
distributions, identical. This is a
property of XOR. If we XOR the random pad
K with anything, either M0 or M1,
we get uniform distribution. So in both
cases, algorithm A is given as in input
exactly the same distribution. Maybe the
uniform distribution on cipher texts. And
therefore it's gonna behave exactly the
same in both cases because it was given
the exact distribution as input. And as a
result, its advantage is zero, which means
that the onetime pad is semantically
secure. Now the interesting thing here is
not only is it semantically secure, it's
semantically secure for all adversaries.
We don't even have to restrict the
adversaries to be efficient. No adversary,
doesn't matter how smart you are, no
adversary will be able to distinguish K XOR M0 from K XOR M1 because the two
distributions are identical. Okay, so the
one time pad is semantically secure. Okay,
so that completes our definition of
semantic security so the next thing we're
going to do is prove to the secure PRG,
in fact implies that the stream cipher is
semantically secure.