In this segment, we're gonna construct
authenticated encryption systems. Since we
already have CPA secured encryption, and
we have secure MACs, the natural question
is whether we can combine the two somehow,
in order to get authenticated encryption.
And if, that's exactly what we're gonna do
in this segment. Authenticated encryption
was introduced in the year 2000, in two
independent papers that I point to at the
end of this module. But before then, many
crytpolibraries provided an API that
separately supported CPA secure
encryption, and MAC-ing. So there was one
function for implementing CPA secure
encryption. For example, CBC with a random
IV. And another function for implementing
a MAC. And then every developer that
wanted to implement encryption, had to,
himself, call separately the CPA secure
encryption scheme and the MAC scheme. In
particular, every developer had to invent
his own way of combining encryption and
MAC-ing to provide some sort of
authenticated encryption. But since the
goals of combining encryption and MAC-ing
wasn't well understood since authenticated
encryption hasn't yet been defined, it
wasn't really clear which combinations of
encryption and MAC-ing are correct and
which aren't. And so, every project as I
said had to invent its own combination.
And in fact, not all combinations were
correct. And I can tell you that the most
common mistake in software projects were
basically incorrectly combining the
encryption and integrity mechanisms. So
hopefully, by the end of this module, you
will know how to combine them correctly,
and you won't be making these mistakes
yourself. So let's look at some
combinations of CPA secure encryption and
MAC, that were introduced by different
projects. So here are three examples. So,
first of all, in all three examples,
there's a separate key for encryption, and
a separate key for MACing. These two
keys are independent of one another, and
both are generated at session setup time.
And we're gonna see how to generate these
two keys later on in the course. So the
first example is the SSL protocol. So the
way SSL combines encryption and MAC in the
hope of achieving authenticated encryption
is the following. Basically you take the
plain text, m, and then you compute a MAC
on the plain text, m. So you use your MAC
key, kI, to compute tag for this message
m. And then you can concatenate the tag to
the message and then you encrypt the
concatenation of the message and the tag
and what comes out is the actual final cipher text.
So that's option number one. The
second option is what IPsec does. So
here, you take the message. The first
thing you do is you encrypt the message.
And then, you compute a tag on the
resulting cipher text. So you notice the
tag itself is computed on the resulting
cipher text. A third option is what the
SSH protocol does. So here, the SSH takes
the message, and encrypts it using a CPA
secure encryption scheme. And then, to it,
it concatenates a tag of the message. The
difference between IPsec and SSH, is that
in IPsec, the tag is computed over the
cipher text, whereas, in SSH, the tag is
computed over the message. And so these
are three completely different ways of
combining encryption and MAC. And the
question is, which one of these is secure?
So, I will let you think about this for a
second, and then when we continue we'll do
the analysis together.
Okay. So let's start with the SSH method. So
in the SSH method you notice that the tag
is computed on the message and then
concatenated in the clear to the cipher text.
Now this is actually quite a problem
because MACs themselves are not designed
to provide confidentiality. MACs are only
designed for integrity. And in fact, there's
nothing wrong with a MAC that as part of
the tag outputs a few bits of the plain
text. Outputs a few bits of the message M.
That would be a perfectly fine tag. And yet if
we did that, that would completely break
CPA security here, because some bits of
the message are leaked in the cipher text.
And so the SSH approach, even though the
specifics of SSH are fine and the
protocol itself is not compromised by
this specific combination, generally it's
advisable not to use this approach. Simply
because the output of the MAC signing algorithm might leak bits of the message. So
now let's look at SSL and IPsec. As it
turns out, the recommended method actually
is the IPsec method. Because it turns out
no matter what CPA secure system and MAC
key you use the combination is always
gonna provide authenticated encryption.
Now let me very, very briefly explain why.
Basically what happens is once we encrypt
the message well the message contents now
is hidden inside the cipher text and now
when we compute a tag of the cipher text
basically we're locking, this tag locks
the cipher text and makes sure no one can
produce a different cipher text that would
look valid. And as a result this approach
ensures that any modifications to the
cipher text will be detected by the
decrypter simply because the MAC isn't
gonna verify. As it turns out, for the SSL
approach, there actually are kind of
pathological examples, where you combine
CPA secure encryption system with a secure
MAC. And the result is vulnerable to a
chosen cipher text attack, so that it does
not actually provide authenticated
encryption. And basically, the reason that
could happen, is that there's some sort of
a bad interaction between the encryption
scheme and the MAC algorithm. Such that,
in fact, there will be a chosen cipher
text attack. So if you're designing a new
project the recommendation now is to
always use encrypt then MAC because that
is secure no matter which CPA secure
encryption and secure MAC algorithm you're
combining. Now, just to set the
terminology, the SSL method is sometimes
called MAC-then-encrypt. And the
IPsec method is called encrypt-then-MAC.
The SSH method even though you're
not supposed to use it, is called encrypt-and-MAC. Okay, so I'll often refer to
encrypt-then-MAC, and MAC-then-encrypt to
differentiate SSL and IPsec. Okay, so
just to repeat what I've just said. The IPsec
method encrypt-then-MAC always
provides authenticated encryption. If you start
from a CPA secure cipher and a secure MAC
you will always get authenticated
encryption. As I said, MAC-then-encrypt in
fact, there are pathological cases where
the result is vulnerable to CCA attacks and
therefore does not provide authenticated
encryption. However, the story's a little
bit more interesting than that, in that,
it turns out, if you're actually using
randomized counter mode or randomized CBC,
then it turns out, for those particular
CPA secure encryption schemes, MAC-then-encrypt
actually does provide authenticated
encryption and therefore it is secure. In
fact, there's even a more interesting
twist here in that if you're using
randomized counter mode. Then, it's enough
that your MAC algorithm just be one time
secure. It doesn't have to be a fully
secure MAC. It just has to be secure when
a key is used to encrypt a single message,
okay? And when we talked about message
integrity, we saw that there are actually
much faster MACs that are one time secure
than MACs that are fully secure. As a
result, if you're using randomized counter
mode MAC-then-encrypt could actually
result in a more efficient encryption
mechanism. However, I'm going to repeat
this again. The recommendation is to use
encrypt-then-MAC and we're going to see a
number of attacks on systems that didn't
use encrypt-then-MAC. And so just to make
sure things are secure without you having
to think too hard about this. Again, I am
going to recommend that you always use
encrypt-then-MAC. Now, once the concept of
authenticated encryption became more
popular, a number of standardized
approaches for combining encryption and
MAC turned up. And those were even
standardized by the National Institute of
Standards. So I'm just gonna mention three
of these standards. Two of these were
standardized by NIST. And these are
called Galois counter mode and CBC counter
mode. And so let me explain what they do.
Galois counter mode basically uses counter
mode encryption, so a randomized counter
mode with a Carter-Wegman MAC, so a very
fact Carter-Wegman MAC. And the way the
Carter-Wegman MAC works in GCM is it's
basically a hash function of the message
that's being MACed. And then the result is
encrypted using a PRF. Now this hash
function in GCM is already quite fast to
the point where the bulk of the running
time of GCM is dominated by the counter
mode encryption and it's even made more so
in that Intel introduces a special
instruction PCLMULQDQ specifically
designed for the purpose of making the
hash function in GCM run as fast as possible.
Now CCM counter mode is another
NIST standard. It uses a CBC MAC and
then counter mode encryption. So this
mechanism, you know, this uses MAC, then
encrypt, like SSL does. So this is
actually not the recommended way of doing
things, but because counter mode
encryption is used. This is actually a
perfectly fine encryption mechanism. One
thing that I'd like to point out about
CCM, is that everything is based on AES.
You notice, it's using AES for the CBC
MAC, and it's using AES for the counter
mode encryption. And as a result, CCM can
be implemented with relatively little
code. Cause all you need is an AES engine
and nothing else. And because of this, CCM
actually was adopted by the Wi-Fi
alliance, and in fact, you're probably
using CCM on a daily basis if you're using
encrypted Wi-Fi 802.11i then you're
basically using CCM to encrypt traffic
between your laptop and the access point.
There's another mode called a EAX that
uses counter mode encryption, and then
CMAC. So, again you notice encrypt-then-MAC
and that's another fine mode to
use. We'll do a comparison of all these
modes in just a minute. Now I wanted to
mention that first of all, all these modes are
nonce-based. In other words, they don't
use any randomness but they do take as
input a nonce and the nonce has to be
unique per key. In other words, as you
remember, the pair (key, nonce)
should never ever, ever repeat. But the
nonce itself need not be random, so
it's perfectly fine to use a counter, for
example, as a nonce. And the other
important point is that, in fact, all
these modes are what's called
authenticated encryption with associated
data. This is an extension of
authenticated encryption, that comes
up very often in networking protocols. So
the idea between AEAD is that, in fact,
the message that's provided to the encryption
mode is not intended to be fully
encrypted. Only part of the message is
intended to be encrypted, but all of the
message is intended to be authenticated. A
good example of this is a network packet.
Think of like a IP packet where there's a
header and then there's a payload. And
typically the header is not gonna be
encrypted. For example, the header might
contain the destination of the packet, but
then the header had better not be
encrypted otherwise routers along the way
wouldn't know where to route the packet.
And so, typically the header is sent in
the clear, but the payload, of course, is
always encrypted, but what you'd like to
do is have the header be authenticated.
Not encrypted but authenticated. So this is
exactly what these AEAD modes do. They
will authenticate the header and then
encrypt the payload. But the header and
the payload are bound together in the
authentication so they can't
actually be separated. So this is not
difficult to do. What happens is in these
three modes GCM, CCM, and EAX, basically
the MAC is applied to the entire data. But
the encryption is only applied to the part
of the data that needs to be encrypted.
So I wanted to show you what an API
to these authenticated encryption with
associated data encryption schemes look
like. So here's what it looks like in OpenSSL.
For example, this is, an API
for GCM. So what you do is you call the
init function to initialize the encryption
mode, and you notice you give it a key and
the nonce. The nonce again,
doesn't have to be random, but it has to
be unique. And after initialization, you
would call this encrypt function, where
you see that you give it the associated
data that's gonna be authenticated, but
not encrypted. You give it the data, and
it's gonna be both authenticated and
encrypted. And it gives you back the full
cipher text, which is an encryption of the
data, but of course does not include the
AEAD, because the AEAD is gonna be sent in
the clear. So now that we understand
this mode of encrypt-then-MAC, we can go
back to the definition of MAC security and
I can explain to you something that might
have been a little obscure when we looked
at that definition. So if you remember,
one of the requirements that followed from
our definition of secure MACs meant that
given a message-MAC pair on a message M,
the attacker cannot produce another tag on
the same message M. In other words, even
though the attacker already has a tag for
the message M, he shouldn't be able to
produce a new tag for the same message M.
And it's really not clear, why does that
matter? Who cares, if the adversary already
has a tag on the message M, who cares if
he can produce another tag? Well, it turns
out if the MAC didn't have this property.
In other words, given a message-MAC pair
you can produce another MAC on
the same message, then that MAC would
result in an insecure encrypt-then-MAC mode.
And so if we want our encrypt-then-MAC to
have cipher text integrity, it's crucial
that our MAC security would imply this strong
notion of security, which, of course, it
does because we defined it correctly.
So let's see what would go wrong, if, in
fact, it was easy to produce this type of
forgery. So what I'll do is I'll show you
a chosen cipher text attack on the
resulting encrypt-then-MAC system. And
since the system has a chosen cipher text
attack on it, it necessarily means that it
doesn't provide authenticated
encryption. So let's see. So the
adversary's gonnna start by sending two
messages, M0 and M1. And he's gonna
receive, as usual, the encryption of one
of them, either the encryption of M0 or
the encryption of M1. And since we're
using encrypt-then-MAC, the adversary
receives the cipher text we'll call it C0
and a MAC on the cipher text C0.
Well now we said that given the MAC on
a message the adversary can produce
another MAC on the same message. So what
he's gonna do is he's gonna produce
another MAC on the message C0. Now he has
a new cipher text (C0,T'), which is a
perfectly valid cipher text. T' is a
valid MAC of C0. Therefore, the adversary
now can submit a chosen cipher text query
on C' and this is a valid chosen
cipher text query because it's different
from C. It's a new cipher text. The poor
challenger now is forced to decrypt this
cipher text C' so he's going to send
back the decryption of C'. It's a
valid cipher text therefore the decryption
of C prime is the message Mb but now the
attacker just learned the value of B
because he can test whether Mb is equal to
M0 or MB is equal to M1. As a result he
can just output B and he gets advantage
one in defeating the scheme. And so
again if our MAC security did not imply
this property here. Then, there would be a
chosen cipher text attack on encrypt-then-MAC.
And therefore, it would not be secure. So the
fact that we define MAC security correctly
means that encrypt-then-MAC really does
provide authenticated encryption. And
throughout all the MACs that we discussed
actually do satisfy this strong notion of
unforgeability. So, interestingly, this is
not the end of the story. So, as we said
before the concept of authenticated
encryption was introduced everyone was
just combining MACs and encryption in
various ways in the hope of achieving
some authenticated encryption. After
the notion of authenticated encryption
became formalized and rigorous, people
kind of started scratching their heads and said,
hey, wait a minute. Maybe we can achieve
authenticated encryption more efficiently
than by combining a MAC and an encryption
scheme. In fact, if you think about how
this combination of MAC and encryption
works, let's say we combine counter mode
with CMAC, then for every block of
plaintext, you first of all have to use
your block cipher for counter mode, and
then you have to use to your block cipher
again, for the CBC-MAC. This means that if
you're combining CPA secure encryption with a
MAC, for every block of plaintext, you
have to evaluate your block cipher twice,
once for the MAC and once for the
encryption scheme. So the natural question
was, can we construct an authenticated
encryption scheme directly from a PRP,
such that we would have to only evaluate
the PRP once per block? And it turns out
the answer is yes, and there's this
beautiful construction called OCB, that
pretty much does everything you want, and
is much faster than constructions that are
separately built from an encryption and a
MAC. So I wrote down, kind of a schematic
of OCB. I don't want to explain it in
detail. I'll just kind of explain it at a
high level. So here we have our input
plain text, here at the top. And you
notice that, first of all, OCB is
parallelizable, completely parallelizable.
So every block can be encrypted separately of
every other block. The other thing to
notice is that as I promised, you only
evaluate your block cipher once per plain
text block. And then you evaluate it one
more time at the end to build your
authentication tag and then the overhead
of OCB beyond just a block cipher is
minimal. All you have to do is evaluate a
certain very simple function P. The
nonce goes into the P you notice, the
key goes into this P and then there is a
block counter that goes into this P. So
you just evaluate this function P, twice
for every block and you XOR the result
before and after encryption using the
block cipher and that's it. That's all you
have to do and then you get a very fast
and efficient authenticated encryption
scheme built from a block cipher. So OCB
actually has a nice security theorem
associated with it and I am going to point
to a paper on OCB when we get to end of
this module where I list some further
reading papers that you can take a look
at. So you might be wondering if OCB is so
much better than everything you've seen so
far, all these three standards CCM, GCM and
EAX why isn't OCB being used or why isn't
OCB the standard? And the answer is a
little sad. The primary answer that
OCB is not being used is actually because
of various patents. And I'll just leave it
at that. So to conclude this section I
wanted to show you some performance
numbers. So here on the right I listed
performance numbers for modes that you
shouldn't be using. So this is for
randomized counter mode, and this is for
randomized CBC. And you can see also the
performance of CBC MAC is basically the
same as the performance of CBC encryption.
Okay. Now here are the authenticated
encryption modes, so these are the ones
that you're supposed to using, these
you're not supposed to be using on their
own, right. These two, you should never
ever use these two because they only
provide CPA security, they don't
actually provide security against active
attacks. You're only supposed to use
authenticated encryption for encryption.
And so I listed performance numbers
for the three standards. And let me remind
you that GCM basically uses a very fast
hash. And then it uses counter mode for
actual encryption. And you can see that
the overhead of GCM over counter mode is
relatively small. CCM and EAX both use a
block cipher based encryption and a
block cipher based MAC. And as a result
they're about twice as slow as counter
modes. You see that OCB is actually the
fastest of these, primarily because it
only use the block cipher once per message
block. So based on these performance
numbers, you would think that GCM is
exactly the right mode to always use. But
it turns out if you're on the space
constrained hardware, GCM is not ideal.
Primarily because its implementation
requires larger code than the other two
modes. However, as I said, Intel
specifically added instructions to speed
up GCM mode. And as a result, implementing
GCM on an Intel architecture takes
very little code. But on other hardware
platforms, say in smart cards or other
constrained environments, the code size
for implementing GCM would be considerably
larger than for the other two modes. But
if code size is not a constraint then GCM
is the right mode to use. So to summarize
this segment I want to say it one more
time that when you want to encrypt
messages you have to use an authenticated
encryption mode and the recommended way to
do it is to use one of the standards,
namely one of these three modes for
providing authenticated encryption.
Don't implement the encryption scheme yourself.
In other words don't implement
encrypt-then-MAC yourself. Just use one of these
three standards. Many crypto libraries
now provide standard API's for these three
modes and these are the one's you should
be using and nothing else. In the next
segment we're going to see what else can
go wrong when you try to implement
authenticated encryption by yourself.