WEBVTT

00:00:00.000 --> 00:00:04.052
Now that we know what MACs are, let's go
ahead and build our first secure MACs.

00:00:04.052 --> 00:00:08.469
First I want to remind you that a MAC is a
pair of algorithms. The first is a signing

00:00:08.469 --> 00:00:12.922
algorithm that's given a message and a key
will generate a corresponding tag And the

00:00:12.922 --> 00:00:17.103
second is verification algorithms are
given a key message and a tag while

00:00:17.103 --> 00:00:21.736
outputs zero or one depending on whether
the tag is valid or not. And we said that

00:00:21.736 --> 00:00:26.313
a MAC is secure if it is existentially
unforgeable under a chosen message attack.

00:00:26.313 --> 00:00:30.890
In other words, the attackers allow to
mount a chosen message attack where he can

00:00:30.890 --> 00:00:35.298
submit arbitrary messages of his choice
and obtain the corresponding tags for

00:00:35.298 --> 00:00:39.520
those messages, and then despite the
ability to generate arbitrary tags The

00:00:39.520 --> 00:00:43.616
attacker cannot create a new message-tag pair that was not given to him

00:00:43.616 --> 00:00:47.976
during the chosen message attack. Okay so
we've already seen this definition in the

00:00:47.976 --> 00:00:52.179
last segment and now the question is how
do we build secure MACs? So the first

00:00:52.179 --> 00:00:57.217
example I want to give you is basically
showing that any secure PRF directly gives

00:00:57.217 --> 00:01:01.952
us a secure MAC as well. So let's see how
we do it. So suppose we have a pseudo

00:01:01.952 --> 00:01:06.808
random function so the pseudo random
function takes and inputs X and outputs in

00:01:06.808 --> 00:01:12.173
Y. And let's define the following MAC. So
the way we sign a message 'M' is basically

00:01:12.173 --> 00:01:17.182
by simply evaluating the function at the
point 'M' So the tag for the message M is

00:01:17.182 --> 00:01:21.350
simply the value of the function at the
point M and then the way we verify a

00:01:21.350 --> 00:01:26.006
message to that pair is by recomputing the
value of the function at the message M and

00:01:26.006 --> 00:01:30.283
checking whether that is equal to the tag
given to us. We say yes if so and we

00:01:30.283 --> 00:01:34.681
reject otherwise. So here you have in
pictures basically that when Alice wants

00:01:34.681 --> 00:01:39.023
to send a message to Bob she computes a
tag by the value of PRF and then she

00:01:39.023 --> 00:01:43.252
appends this tag to the message, Bob
receives the corresponding tab pair, he

00:01:43.252 --> 00:01:47.820
recomputes the value of the function and
tests whether the given tag is actually

00:01:47.820 --> 00:01:52.730
equal to the value of the function at the
point M. So let's look at a bad example of

00:01:52.730 --> 00:01:57.832
this instruction. And so suppose that we
have a pseudo-random function that happens

00:01:57.832 --> 00:02:02.873
to output only ten bits. Okay, so this is
a fine pseudo-random function and it just

00:02:02.873 --> 00:02:07.668
so happens that for any message 'M' it
only outputs ten bits value. My question

00:02:07.668 --> 00:02:12.463
to you is if we use this function 'F' to
construct a MAC, is that going to be a

00:02:12.463 --> 00:02:17.184
secure MAC? So the answer is no, this MAC
is insecure. In particular because the

00:02:17.184 --> 00:02:21.471
tags are too short. So consider the
following simple adversary. What the

00:02:21.471 --> 00:02:26.119
adversary will do is simply choose an
arbitrary message M and just guess the

00:02:26.119 --> 00:02:30.768
value of the MAC for that particular
message. Now, because the tag is only ten

00:02:30.768 --> 00:02:35.175
bits long, the adversary has a chance of
one in two to the ten in guessing the MAC

00:02:35.175 --> 00:02:40.004
correctly. In other words, the advantage
of the guessing adversary, one distinctly

00:02:40.004 --> 00:02:44.351
guesses a random tag for a given message.
That adversary is going to have an

00:02:44.351 --> 00:02:48.898
advantage against the mac that's basically
one over two to the ten which is one over

00:02:48.898 --> 00:02:52.962
a thousand twenty four and that's
definitely non negligible. So the adversary

00:02:52.962 --> 00:02:57.348
basically will successfully forge the mac
on a given message with probability one

00:02:57.348 --> 00:03:01.841
on a thousand which is insecure. However
it turns out this is the only example that

00:03:01.841 --> 00:03:06.280
where things can go wrong, only when the
output of the function is small can things

00:03:06.280 --> 00:03:10.536
go wrong. If the output of the PRF is
large Then we get a secure MAC out of this

00:03:10.536 --> 00:03:14.344
function. And let's state the security
theorem here. So suppose we have a

00:03:14.344 --> 00:03:18.257
function 'F' that takes messages in 'X'
and outputs tags in 'Y', then the MAC

00:03:18.257 --> 00:03:22.588
that's derived from this PRF in fact is a
secure MAC. In particular, if you look at

00:03:22.588 --> 00:03:26.804
the security theorem here, you'll see very
clearly the era bounds, in other words

00:03:26.804 --> 00:03:31.179
since the PRF is secure we know that this
quantity here is negligible. And so if we

00:03:31.179 --> 00:03:35.395
want this quantity to be negligible, this
is what we want, we want to say that no

00:03:35.395 --> 00:03:39.664
adversary can defeat the MAC 'I sub F',
that implies that we want this quantity to

00:03:39.664 --> 00:03:43.722
be negligible, in other words we want the
output space to be large. And so for

00:03:43.722 --> 00:03:48.096
example, taking a PRF that outputs eighty
bits is perfectly fine. That will generate

00:03:48.096 --> 00:03:52.102
an eighty bit MAC and therefore the
advantage of any adversary will be at most

00:03:52.102 --> 00:03:56.521
one over two to the eighty. So now the proof
of this theorem is really simple, so lets

00:03:56.521 --> 00:04:00.906
just go ahead and do it. So in fact lets
start by saying that suppose instead of a

00:04:00.906 --> 00:04:05.446
PRF we have a truly random function from
the message space to the tag space so this

00:04:05.446 --> 00:04:10.087
is just a random function from X to Y
that's chosen at random from the set of

00:04:10.087 --> 00:04:14.966
all such functions. Now lets see if that
function can give us a secure mac. So what

00:04:14.966 --> 00:04:19.551
the adversary says is, 'I want a tag on
the message M1'. What he gets back is the

00:04:19.551 --> 00:04:24.157
tag which just happens to be the function
evaluated at the point M1. Notice there is

00:04:24.157 --> 00:04:28.489
no key here because F is just a truly
random function from X to Y. And then the

00:04:28.489 --> 00:04:33.096
adversary gets to choose a message from M2
and he obtains the tag from M2, he choose

00:04:33.096 --> 00:04:37.264
M3, M4 up to MQ and he obtains all the
corresponding tags. Now his goal is to

00:04:37.264 --> 00:04:41.432
produce a message tag pair and we say that
he wins, remember that this is an

00:04:41.432 --> 00:04:45.891
existential forgery, in other words first
of all T has to be equal to F of M This

00:04:45.891 --> 00:04:49.968
means that 'T' is a valid tag for the
message 'M'. And second of all, the

00:04:49.968 --> 00:04:54.685
message 'M' has to be new. So the message
'M' had better not be one of M-one to M-Q.

00:04:54.685 --> 00:04:58.879
But let's just think about this for a
minute, what does this mean? So the

00:04:58.879 --> 00:05:03.830
adversary got to see the value of a truly
random function at the points M-one to M-Q

00:05:03.830 --> 00:05:08.800
and now he?s suppose to predict the value
of this function as some new point, M

00:05:08.800 --> 00:05:13.411
However, for a truly random function, the
value of the function at the point M is

00:05:13.411 --> 00:05:18.195
independent of its value at the points M-1
to M-Q. So the best the adversary can do

00:05:18.195 --> 00:05:22.749
at predicting the value of the function at
the point M is just guess the value,

00:05:22.749 --> 00:05:27.302
because he has no information about F of
M. And as a result his advantage if he

00:05:27.302 --> 00:05:31.625
guesses the value of the function at the
point M he'll guess it right with

00:05:31.625 --> 00:05:36.294
probability exactly one over Y. And then
the tag that he produced will be correct

00:05:36.294 --> 00:05:40.582
with probability exactly one over Y. Okay,
again he had no information about the

00:05:40.582 --> 00:05:44.801
value of the function of M and so the best
he could do is guess. And if he guesses,

00:05:44.801 --> 00:05:49.347
he'll get it right with probability one
over Y. And now of course, because the

00:05:49.347 --> 00:05:54.420
function capital F is a pseudo random
function The adversary is gonna behave the

00:05:54.420 --> 00:05:58.565
same whether we give him the truly random
function or the pseudo random function.

00:05:58.565 --> 00:06:02.659
The adversary can't tell the difference
and as a result even if we use a pseudo

00:06:02.659 --> 00:06:06.600
random function, the adversary is gonna
have advantages at most one over Y in

00:06:06.600 --> 00:06:10.774
winning the game. Okay, so you can see
exactly the security theorem, let's go

00:06:10.774 --> 00:06:15.561
back there for just a second. Essentially
this is basically why we got an error term

00:06:15.561 --> 00:06:20.005
of one over Y because of the guessing
attack and that's the only way that the

00:06:20.005 --> 00:06:24.734
attacker can win the game. So now that we
know that any secure PRF is also a secure

00:06:24.734 --> 00:06:29.122
MAC, we already know that we have our
first example MAC. In particular, we know

00:06:29.122 --> 00:06:33.680
that AES, or at least we believe that AES
is a secure PRF, therefore, AES since it

00:06:33.680 --> 00:06:38.011
takes sixteen byte inputs, right the
message space for AES is 128 bits, which

00:06:38.011 --> 00:06:43.212
is sixteen bytes. Therefore the AES cipher
essentially gives us a MAC that can match

00:06:43.212 --> 00:06:48.140
messages that are exactly sixteen bytes.
Okay, so that's our first example of a

00:06:48.140 --> 00:06:53.257
MAC. But now the question is if we have a
PRF for small inputs like AES that only

00:06:53.257 --> 00:06:58.564
acts on sixteen bytes, can we build a MAC
for big messages that can act on gigabytes

00:06:58.564 --> 00:07:02.066
of data? Sometimes I call this the
McDonald's problem. Basically given a

00:07:02.066 --> 00:07:05.871
small MAC and we build a big MAC out of
it. In other words, given a MAC for small

00:07:05.871 --> 00:07:10.234
messages and we build a MAC for large
messages. So we're gonna look at two

00:07:10.234 --> 00:07:14.835
constructions for doing so. The first
example is called a CBC MAC that again

00:07:14.835 --> 00:07:19.315
takes PRF for small messages as input
and produces a PRF for very large

00:07:19.315 --> 00:07:23.686
messages. As outputs. The second one we'll
see is HMAC which does the same thing

00:07:23.686 --> 00:07:28.278
again takes a PRF for small inputs and
generates a PRF for very large inputs. Now

00:07:28.278 --> 00:07:32.050
the two are used in very different
contexts. Cbc-mac is actually very

00:07:32.050 --> 00:07:36.315
commonly used in the banking industry. For
example, there's a system called the

00:07:36.315 --> 00:07:40.797
Automatic Clearing House, ACH, which banks
use to clear checks with one another and

00:07:40.797 --> 00:07:44.788
that system, CBC-MAC, is used to ensure
integrity of the checks as they're

00:07:44.788 --> 00:07:49.107
transferred from bank to bank. On the
Internet, protocols like SSL and IPsec and

00:07:49.107 --> 00:07:53.173
SSH, those all use HMAC for integrity. Two
different MACs and were gonna discuss them

00:07:53.173 --> 00:07:57.086
in the next couple of segments. And as I
said were also gonna start from a PRF for

00:07:57.086 --> 00:08:01.274
small messages and produce PRF for
messages that are gigabytes long and in

00:08:01.274 --> 00:08:06.043
particular they can both be substantiated
with AES as the underlying cipher. So the

00:08:06.043 --> 00:08:10.597
last comment I want to make about these
PRF based MACs is that in fact their

00:08:10.597 --> 00:08:15.269
output can be truncated. So suppose we
have a PRF that outputs N bit outputs. So

00:08:15.269 --> 00:08:19.941
again for AES this would be a PRF that
outputs 128 bits as outputs. Its an easy

00:08:19.941 --> 00:08:24.909
lemma to show that in fact if you have an
N bit PRF if you truncated, in other words

00:08:24.909 --> 00:08:31.062
if you only output first key bits The
result is also a secure PRF and the

00:08:31.062 --> 00:08:36.529
intuition here is if the big PRF outputs N
bits of randomness for any inputs that you

00:08:36.529 --> 00:08:42.059
give to the PRF then certainly chopping it
off and truncating it to T bits is still

00:08:42.059 --> 00:08:46.572
gonna look random. The attacker now gets
less information so his job of

00:08:46.572 --> 00:08:51.657
distinguishing the outputs from random
just became harder. In other words, if the

00:08:51.657 --> 00:08:56.742
N bit PRF is secure, then the T less than
N bit PRF, the truncated PRF, would also

00:08:56.742 --> 00:09:01.177
be secure. So this is an easy lemma and
since any secure PRF also gives us a

00:09:01.177 --> 00:09:05.993
secure MAC, what this means is if you give
me a MAC that's based on a PRF and what I

00:09:05.993 --> 00:09:10.686
can do is I can truncate it to W bits,
however, because of the error term in the

00:09:10.864 --> 00:09:15.379
MAC based PRF theorem we know that
truncating to W bits will only be secure

00:09:15.379 --> 00:09:19.954
as long as one over two to the W is
negligible. So if you truncate the PRF to

00:09:19.954 --> 00:09:24.410
only three bits, the resulting MAC is not
going to be secure. However, if you

00:09:24.410 --> 00:09:29.222
truncate it to say 80 bits or maybe even
64 bits, then the resulting MAC is still

00:09:29.222 --> 00:09:33.974
gonna be a secure MAC. Okay, so the thing
to remember here is that even though we

00:09:33.974 --> 00:09:39.235
use AES to construct larger PRFs and the
output of these PRFs is gonna be 128 bits,

00:09:39.235 --> 00:09:44.115
it doesn't mean that the MAC itself has to
produce 128 bit tags We can always

00:09:44.115 --> 00:09:48.550
truncate the outputs to 90 bits or 80
bits, and as a result, we would get still

00:09:48.550 --> 00:09:53.097
secure MACs but now the output tag is
gonna be more reasonable size and doesn't

00:09:53.097 --> 00:09:57.702
have to be the full 128 bits. Okay, so in
the next segment we're gonna look at how

00:09:57.702 --> 00:09:58.726
the CBC-MAC works.