In this segment we will look at how to use
block ciphers to encrypt multiple messages
using the same key. This comes up in
practice for example in file systems where
the same key's used to encrypt multiple
files. It comes up in networking protocols
where the same key is used to encrypt
multiple packets. So let's see how to do
it. The first thing we need to do is to
define what is mean for a cipher to be
secure when the same key is used to
encrypt multiple messages. When we use the
key more than once the result of that is
that the adversary gets to see many cyber
text encrypted using the same key. As a
result, when we define security, we're
gonna allow the adversary to mount what's
called a chosen plain text attack. In
other words, the adversary can obtain the
encryption of arbitrary messages of his
choice. So, for example, if the
adversary's interacting with Alice. The
adversary can ask Alice to encrypt
arbitrary messages of the adversary's
choosing. And Alice will go ahead and
encrypt those messages and give the
adversary the resulting cipher texts. You
might wonder why would Alice ever do this.
How could this possibly happen in real
life? But it turns out this is actually
very common in real life. And in fact,
this modeling is quite a conservative
modeling of real life. For example, the
adversary might send Alice an email. When
Alice receives the email, the writes it to
her encrypted disk, thereby encrypting the
adversary's email using her secret key. If
later the adversary steals this disc, then
he obtains the encryption of an email that
he sent Alice under Alice's secret key. So
that's an example of a chosen plain text
attack, where the adversary provided Alice
with a message and she encrypted that
message using her own key. And then later
the attacker was able to obtain the
resulting cipher text. So that's the
adversary's power. And then the
adversary's goal is basically to break
semantic security. So let's define this
more precisely. As usual, we're gonna
define semantic security under a chosen
plain text attack using two experiments,
experiment zero and experiment one, that
are modeled as a game between a challenger
and an adversary. When the game begins,
the challenger is gonna choose a random
key K. And now the adversary basically
gets to query the challenger. So the
adversary now begins by submitting a
semantic security query, namely, he
submits two messages, M zero and M one. I
added another index, but let me ignore
that extra index for a while. So the
adversary submits two messages, M zero and
M one, that happen to be of the same
length. And then the adversary receives
the encryption of one of those messages,
either of M zero or of M one. In
experiment zero, he receives the
encryption of M zero. In experiment one,
he receives the encryption of M one. So,
so far this would look familiar this looks
exactly like a standard semantic security
[inaudible]. However, plain text attack
the adversary can now repeat this query
again. So now you can issue a query with
two other plain texts, again of the same
length, and again you would receive the
encryption of one of them. In experiment
zero you would receive the encryption of M
zero. In experiment one you would receive
the encryption of M one. And the attacker
can continue issuing queries like this. In
fact we'll say that he can issue up to Q
queries of this type. And then, remember,
every time he issues a pair of messages.
That happen to be of the same length and
every time he either gets the encryption
of the left side or the right side again
in experiment zero he will always get the
encryption of the left message in
experiment one he will always get the
encryption of the left message. And, then
adversary's goal is, basically, to figure
out whether he's in experimental zero or
in experiment one. In other words, whether
he was constantly receiving the encryption
of the left message or the encryption of
the right message. So, in some sense, this
is a standard semantic security game just
iterated over many queries that the
attacker can issue to adaptively one after
the other. Now the chosen plain text
attack is captured by the fact that if the
attacker wants the encryption of a
particular message m. What he could do is,
for example, use query J for sum J, where
in this query J he'll set both the zero
message and the one message to be the
exactly same message M. In other words,
both the left message and the right
message are the same, and both are set to
the message M. In this case, what he will
receive, since both messages are the same,
he knows that he's gonna receive the
encryption of this message M that he was
interested in. So this is exactly what we
meant by a chosen [inaudible] attack.
Where the advisory can submit a message m
and receive the encryption of that
particular message m of his choice. So
some of his queries might be of this chose
plain text flavor where the message on the
left is equal to the message on the right,
but some of the queries might be standard
semantic security queries where the two
messages are distinct and that actually
gives him information on whether he's in
experiment zero or in experiment one. Now
by now you should be used to this
definition where we say that the system is
semantically secure under a chosen plain
text attack. If, for all efficient
adversaries, they cannot distinguish
experiment zero from experiment one. In
other words, the probability that, at the
end, the output, B Prime, which we're
gonna denote by the output of experiment
B. This output will be the same whether
[inaudible] experiment zero or experiment
one. So the attacker couldn't distinguish
between always receiving encryptions of
the left messages, versus always receiving
encryptions of the right messages. So in
your mind, I'd like you to be thinking of
an adversary that is able to mount a
chosen plaintext attack, namely, be given
the encryption of arbitrary messages of
his choice, and his goal is to break
semantic security for some other challenge
cipher texts. And as I said in this
[inaudible] model of the real world the
attacker is able to fool Alice into
encrypting for him messages of his choice
and then the attacker's goal is to somehow
break some challenge cypher text. So I
claim that all the ciphers that we've seen
up until now, namely deterministic counter
mode or the one time pad, are insecure
under a chosen plain text attack. More
generally, suppose we have an encryption
scheme that always outputs the same cipher
text for a particular message M. In other
words, if I ask the encryption scheme to
encrypt the message M once. And then I ask
the encryption scheme to encrypt the
message m again. If in both cases the
encryption scheme outputs the same cypher
text, then that system cannot possibly be
secure under a chosen plain text attack.
And both deterministic counter mode and
the one time pad were of that flavor. They
always output the same cipher text, given
the same message. And so let's see why
that cannot be chosen plain text secure.
And the attack is fairly simple, what the
attacker is gonna do, is he's gonna output
the same message twice. This just says.
That he really wants the encryption of M0.
So here the attacker is given C0 which is
the encryption of M0. So this was his
chosen plain text query where he actually
received the encryption of the message M0
of his choice. And now he's going to break
semantic security. So what he does is he
outputs two messages, M0 and M1 of the
same length, and he's going to be given
the encryption of MB. But low and behold,
we said that the encryption system. Always
outputs the same cipher text when its
encrypting the message, M0. Therefore, if
B is=to zero, we know that C, this
challenged cipher text, is simply=to CO,
because it's the encryption of M0.
However, if B is=to one. Then we know that
this challenge cypher text is the
encryption of M1 which is something other
than C zero so all the attacker does is he
just checks his C is = to C0 the output's
zero in other words he outputs one. So, in
this case, the attacker is able to
perfectly guess this bit B, so he knows
exactly [inaudible] given the encryption
of M0, or the encryption of M1. And as a
result, his advantage in winning this game
is one. Meaning that the system cannot
possibly be CPA secure. One is not a
negligible number. So this shows that the
deterministic encryption schemes cannot
possibly be CPA-secure, but you might
wonder well, what does this mean in
practice? Well in practice this means
again that every message is always
encrypted to the same cipher text. What
this means is if you're encrypting files
on disk, and you happen to be encrypting
two files that happen to be the same, they
will result in the same cipher text and
then the attacker by looking at the
encrypted disk, will learn that these two
files actually contain the same content.
The attacker might not learn what the
content is, but he will learn that these
two encrypted files are an encryption of
the same content and he shouldn't be able
to learn that. Similarly, if you send two
encrypted packets on the network that
happen to be the same, the attacker will
not learn the content of those packets,
but he will learn that those two packets
actually contain the same information.
Think for example of an encrypted voice
conversation. Every time there's quiet on
the line, the system will be sending
encryptions of zero. But since encryption
of zero are always mapped to the same
cypher text. An attacker looking at the
network will be able to identify exactly
the points in the conversation where
there's quiet because he will always see
those exact same cypher text every time.
So these are examples where deterministic
encryption cannot possibly be secure. And
as I say formerly we say that the
terministic encryption can not be
semantically secure under a chosen plain
text attack. So what do we do, well the
lesson here is if the secret keys gonna be
used to encrypt multiple messages, it had
better be the case that given the same
plain text to encrypt twice. The
encryption algorithm must produce
different cipher texts. And so there are
two ways to do that. The first method is
what's called randomized encryption. Here,
the encryption algorithm itself is going
to choose some random string during the
encryption process and it is going to
encrypt the message M using that random
string. So what this means is that a
particular message, M0 for example, isn't
just going to be mapped to one cipher text
but it's going to be mapped to a whole
ball of cipher texts. Whereon every
encryption, basically, we output one point
in this ball. So every time we encrypt, the
encryption algorithm chooses a random
string, and that random string leads to
one point in this ball. Of course, the
decryption algorithm, when it takes any
point in this ball, will always map the
result to M zero. Similarly cipher text M
one will be mapped to a ball, and every
time we encrypt M one, we basically output
one point in this ball. And these balls
have to be disjoint, so that the
encryption algorithm, when it obtains a
point in the ball corresponding to M one,
will always output the message M one. In
this way, since the encryption algorithm
uses randomness, if we encrypt the same
message twice, with high probability we'll
get different cipher texts. Unfortunately
this means that the cipher text
necessarily has to be longer than the
plain text because somehow the randomness
that was used to generate the cipher text
is now encoded somehow in the cipher text.
So the cipher text takes more space. And
roughly speaking, the cipher text size is
going to be larger than the plain text. By
basically the number of random bits that
were used during encryption. So if the
plain texts are very big, if the plain
texts are gigabytes long, the number of
random bits is going to be on the order of
128. So maybe this extra space doesn't
really matter. But if the plain texts are
very short, maybe they themselves are 128
bits, then adding an extra 128 bits to
every cipher text is going to double the
total cipher text size. And that could be
quite expensive. So as I say randomized
encryption is a fine solution but in some
cases it actually introduces quite a bit
of costs. So let's look at a simple example.
So imagine we have a pseudorandom
function that takes inputs in a certain
space r which is gonna be called a nonce
space. And outputs, outputs in the message
space. And, now, let's define the
following randomize encryption scheme
where we want to encrypt the message m
with the encryption of whatever it's gonna
do is first it's gonna generate a random r
in this nonce space R. And then it's going
to open a cypher text that consist of two
components, the first component is going
to be this value R and the second
component is going to be an evaluation of
the pseudo-random function at the point R
XOR with the message M. And my question to
you is, is this encryption system
semantically secure under a chosen plain
text attack. So the correct answer is yes.
But only if the nonce space R is large
enough so that little R never repeats with
very, very high probability. And let's
quickly argue why that's true. So first of
all, because F is a secure pseudo-random
function, we might as well replace it with
a truly random function. In other words,
this is indistinguishable from the case
where we encrypt the message M, using the
truly random function little F, evaluated
to point R, and then XOR with M.
But since this little r never repeats every
cypher text uses a different little r what
this means is that the values of F(r)
are random uniform independent strings
every time. So every time we encrypt a
message, we encrypt it essentially using a
new uniform random one time pad. And since
XORing a uniform string with any string
simply generates a new uniform string, the
resulting cipher text is distributed as
simply two random uniform strings. I'll
call them r and r prime. And so both in
experiment zero and in experiment one, all
the attacker gets to see are truly uniform
random strings r, r', and since in both
experiments the attacker is seeing the same
distribution, he cannot distinguish the
two distributions. And so since security
holds completely when we're using a truly
random function it's also gonna hold when
we're using a pseudorandom function. Okay,
so this is a nice example of how we use
the fact that the pseudo random function
behaves like a random function to argue
security of this particular encryption
scheme. Okay, so now we have a nice
example of randomized encryption. The
other approach to building chosen plain
text secure encryption schemes is what's
called a nonce based encryption. Now, in
a non-spaced encryption system, the
encryption algorithm actually takes three
inputs rather than two. As usual it takes
the key and the message. But it also takes
an additional input called a nonce. And
similarly, the decryption algorithm also
takes the nonce as input, and then produces
the resulting decrypted plain text. And
what is this nonce value n. This nonce is
a public value. It does not need to be
hidden from the adversary but the only
requirement is that the pair (k,n)
is only used to encrypt a single
message. In other words, this pair (k,n)
must change from message to message. And
there are two ways to change it. One way
to change it is by choosing a new random
key for every message. And the other way
is to keep using the same key all the time
but then we must choose a new nonce for
every message. And, and as I said, I wanna
emphasize again, this nonce need not
be secret, and it need not be random. The
only requirement is the nonce is unique.
And in fact, we're gonna use this
term throughout the course. A nonce
for us, means a unique value that doesn't
repeat. It does not have to be random. So
let's look at some examples of choosing an
nonce, well the simplest option is
simply to make the nonce of the
accounter so for example the networking
protocol you can imagine the nonce
being a packet counter that's incremented
every time a packet is sent by a sender or
received by the receiver this means that
the encrypter has to keep state from
message to message mainly that he has to
keep this counter around and increment it
after every message is transmitted.
Interestingly, if the decrypter actually
has the same state then there is no need
to include the nuance in the cipher text
since the nuance is implicit. Let's look
at an example. The https protocol is run
over a reliable transport mechanism which
means that packets sent by the sender are
assumed to be received in order at a
recipient. So if the sender sends packet #5 and then packet #6, the recipient
will receive packet #5 and then
packet #6 in that order. This
means that if the sender maintains a
packet counter, the recipient can also
maintain a packet counter and two counters
basically increment in sync. In this case
there is no reason to include the
nonce in the packets because the
nonce is implicit between the two
sides. However, in other protocols, for
example, in IPsec, IPsec has a protocol
designed to encrypt the IP layer. The IP
layer does not guarantee in order
delivery. And so the sender might send
packet #5 and then packet #6, but
those will be received in reverse order at
the recipient. In this case it's still
fine to use a packet counter as a nonce
but now the nonce has to be included in
the packet so that the recipient knows
which nonce to use to decrypt the
received packet. So as I say, nonce based
encryption is a very efficient way to
achieve CPA security. In particular if the
nonce is implicit, it doesn't even
increase the cipher text length. Of course
another method to generate a unique nonce
is simply to pick the nonce at random
assuming the nonce space is sufficiently
large so that with high probability the
nonce will never repeat for the life of
the key. Now in this case, nonce
based encryption simply reduces to
randomized encryption. However, the
benefit here is that the sender does not
need to maintain any state from message to
message. So this is very useful for
example if encryption happens to take
place on multiple devices. For example, I
might have both a laptop and a smart
phone. They might both use the same key.
But in this case if I require state full
encryption, then my laptop and the
smartphone would have to coordinate to
make sure that they never reuse the same
nonces. Whereas if both of them simply take
nonces at random, they don't need to
coordinate because it was very high
probability they'll simply never choose
the same nonce. Again assuming the nonce
space is big enough. So there are some
cases where stateless encryption is quite
important, in particular where the same
key is used by multiple machines. So I
wanted to find, more precisely, what
security means for nonce based
encryption. And in particular, I want to
emphasize that the system must remain
secure when the nonce are chosen by
the adversary. The reason it's important
to allow the adversary to choose the
nonces is because the adversary can
choose which cipher text it wants to
attack. So imagine the nonce happens to
be a counter and it so happens that when
the couter hits the value fifteen, maybe
at that point it's easy for the adversary
to break semantic security. So the
adversary will wait until the fifteenth
packet is sent and only then he will ask
to break semantic security. So when we
talk about nonce based encryption, we
generally allow the adversary to choose
the nonce and the system should remain
secure even under those settings. So let's
define the CPA game in this case and it's
actually very similar to the game before.
Basically the attacker gets to submit
pairs of messages MI, MI0, and MI1.
Obviously they both have to be of the same
length. And he gets to supply the nonce.
And in response, the adversary is given
the encryption of either MI0, or MI1. But
using the nonce that the adversary
chose. And of course, as usual, the
adversary's goal is to tell whether he was
given the encryption of the left plain
text or the right plain text. And as
before the adversary gets to iterate these
queries and he can issue as, as many
queries as he wants, we usually let q
denote the number of queries that the
adversary issues. Now the only restriction
of course, which is crucial, is that
although the adversary gets to choose the
nonces, he's restricted to choosing
distinct nonces. The reason we force him
to choose distinct nonces is because
that's the requirement in practice. Even
if the adversary fools Alice into
encrypting multiple messages for him,
Alice will never use the same nonce
again. As a result, the adversary will
never see messages encrypted using the
same nonce and therefore, even in the
game, we require that all nonce be
distinct. And then as usual we say that
the system is a nonce based encryption
system that's, semantically secure under a
chosen plain text attack if the adversary
cannot distinguish experiment zero where
he's given encryptions of the left
messages from experiment one where he's
given encryptions of the right messages.
So let's look at an example of a nonce
based encryption system. As before, we
have a secure PRF that takes inputs in the
nonce space R and outputs strings in the
message space M. Now when a new key is
chosen, we're going to reset our counter R
to be zero. And now we encrypt the
particular message M, what we will do is
we will increment our counter R, and then
encrypt the message M using the
pseudorandom function applied to this
value R. And as before, the cipher text is
going to contain two components, our
current value of the counter and then the
one time pad encryption of the message M.
And so my question to you is whether this
is a secure, non-spaced encryption system.
So the answer as before is yes, but only
if the nuance space is large enough. So as
we increment the counter R, it will never
cycle back to zero so that the nuances
will always, always be unique. We argue
security the same way as before. Because
the PRF is secure, we know that this
encryption system is indistinguishable
from using a truly random function. In
other words, if we apply a truly random
function to the counter and XOR the
results with, the plain text M. But now
since the nuance R never repeats, every
time we compute this F of R, we get a
truly random uniform and independent
string so that we're actually encrypting
every message using the one time pad. And
as a result, all the adversary gets to see
in both experiments are basically just a
pair of random strings. So both the
experiment zero and experiment one the
adversary get's to see exactly the same
distribution, namely, the responses to all
this chosen plain text queries are just
pairs of strings that are just uniformly
distributed and this is basically the same
in experiment zero and experiment one and,
therefore, the attacker cannot distinguish
the two experiments. And since he cannot
win the semantic security game with a
truly random function he, also, cannot win
the semantics security game with the
secure PRF, and, therefore, the scheme is
secure. So now we understand what it means
for a symmetric system to be secure when
the keys used to encrypt multiple messages
the requirement is that it be secure under
a chosen plan of attack. And we said that
basically, the only way to be secure under
a chosen plain text attack is either to
use randomized encryption, or to use, use
nonce spaced encryption where the
nonce never repeats. And then in the
next two segments, we're gonna build two
classic encryption systems that are secure
when the key is used multiple times.