-
Well, we're almost done with our
discussion of symmetric encryption. There
-
are just a couple of odds and ends that
I'd like to discuss before we move on to
-
the next topic. So the first thing I'd
like to mention is how we derive many keys
-
from one key. And it, actually, this comes
up all the time in practice, so I'd like
-
to make sure you know how to do this
correctly. So what's the setting that
-
we're looking at? Well, imagine we have a
certain source key that's generated by one
-
of, a number of methods. Imagine the
source key is generated by a hardware
-
random number generator or perhaps is
generated by a key exchange protocol
-
which we're going to discuss later. But
anyhow, there are a number of ways in
-
which a source key might be generated
between Alice and Bob, such that the
-
attacker doesn't know what the source key
is. But now, as we said, in many cases, we
-
actually need many keys to secure a
session, not just one single source key.
-
For example, if you remember, in TLS there
were unidirectional keys and we
-
needed keys in each direction. And in
fact, in each direction, we needed
-
multiple keys. We needed a MAC key, we
needed an encryption key. We need an IV,
-
and so on. Similarly nonce based
encryption, you remember, there were
-
multiple keys that were being used, and so
on. And so, the question is, how do we use
-
the one source key that we just derived,
either from a hardware process or
-
by key exchange, and generate a bunch of
keys from it that we could then use to
-
secure our session. The way that's done,
is using a mechanism called a key
-
derivation function, KDF. And I want to
talk a little bit about how KDF's are
-
constructed. So first of all, suppose we
have a secure PRF, that happens to have
-
key space K. And now, suppose that it so
happens that our source key SK is uniform
-
in the key K. In this case, the source key
is, in fact, a uniform random key for the
-
secure PRF F. And we can use it directly to
generate keys, all the keys that we need
-
to secure the session. So in this case,
the KDF is really simple. The key
-
derivation function would just work as
follows. It would take as input the
-
source key. It would take an input, a
parameter context, which I'm gonna
-
describe in just a minute. And then it's
gonna take a length input as input as
-
well. And then what it will do is it will
basically evaluate the PRF on zero. Then
-
it will evaluate the PRF on one. Then it
will evaluate the PRF on two, up until L.
-
And I will talk about what this context is
in just a second. And then, basically, you
-
would use as many bits of the output as
you would need to generate all the keys
-
for the session. So, if you need unidirectional keys you would generate, you
-
know, one key in each direction where each key
might include an encryption key and a MAC
-
key. And so, you would basically generate
as many bits as you need and then finally
-
cut off the output at the time when you've
generated enough keys to secure your
-
session. Okay so this is a fairly straight
forward mechanism it's basically using the
-
secure PRF as a pseudo random generator.
And the only question is what is its
-
context string. Well, I'll tell you that
the context string is basically a unique
-
string that identifies the application. So
in fact you might have multiple
-
applications on the same system that's
trying to establish multiple secure keys.
-
Maybe you have SSH running as one process,
you have a web server running as another process,
-
IPsec as a third process and all three need
to have secret keys generated. And this
-
context variable basically tries to separate
the three of them. So, let me ask you,
-
more precisely, what do you think
the purpose of this context variable is?
-
So I guess I've given
it away and this context variable is
-
supposed to basically separate
applications, so that even if, for
-
example, the three services that we just
talked about, SSH, web server, and IPsec,
-
if they all happen to obtain the same
source key from the hardware random number
-
generator then the context since it's
different for the three apps will make
-
sure that they still get three independent
strings that they can then use to secure
-
the sessions. I just want you to remember
that, even though this is actually fairly
-
straightforward, and we discussed this
before, the context string is actually
-
important, and it does need to be specific
to the application, so that each
-
application gets its own session keys,
even if multiple applications happen to
-
sample the same SK. The next
question is, what do we do if the source
-
key actually isn't uniform? Well, now we
got a problem. If the source key is not a
-
uniform key for the pseudo random function
then we can no longer assume that the
-
output of the pseudo random function is
indistinguishable from random. In fact, if
-
we just use the KDF that we just described
then the output might not look random to
-
the adversary and then he might be able to
anticipate some of the session keys that
-
we'll be using and thereby break the
session. So then we have a problem. Now
-
why would this source key not be uniform?
Well there are many reasons why this
-
happened. For example if you use a key
exchange protocol, it so happens typically
-
that key exchange protocols will generate a
high entropy key. But the
-
high entropy key is gonna be
distributed in some subspace of the key
-
space. So it's not going to be a uniform
string. It will be uniform in some
-
subset of a larger set, And we'll see
examples of that as soon as we talk about
-
key exchange protocols. And so KDFs have
to kind of accommodate for the fact that
-
key exchange protocols actually don't
generate uniform bit strings. The other
-
problem is, that, in fact, the hardware
random number generator you're using might
-
actually produce biased outputs. We don't
wanna rely on the non bias of the hardware
-
random number generator. And so all we
want to assume is that it generates a high
-
entropy string, but one that might be
biased. In which case, we have to somehow
-
clean this bias. And so this introduces
this, this paradigm for building KDFs.
-
This is called the extract-then-expand
paradigm, where the first step
-
of the KDF is to extract a pseudo random
key from the actual source key. So in a
-
picture you can think about it like this.
In some sense these are the different
-
possible values of the source key. This is
the horizontal line and the vertical axis
-
is basically the probability of each one
of these values, and you can see that this
-
is a kind of a bumpy function which would
say that the source key is not uniformly
-
distributed in the key space. What we do
in this case is we use what's called an
-
extractor. So an extractor is something
that takes a bumpy distribution and makes
-
it into a uniform distribution over the
key space. In our case we're actually just
-
gonna be using what are called
computational extractors, namely
-
extractors that don't necessarily produce
uniform distribution at the end but
-
they generated distribution that's
indistinguishable from uniform.
-
Now extractors typically take as input
something called a salt, and a salt just
-
like in a salad, it kind of adds flavor to
things, what it does is basically kind of
-
jumbles things around, so that no matter
what the input distribution is, the output
-
distribution is still going to be
indistinguishable from random. So a salt
-
basically, what is it? It's a non-secret
string, so it's publicly known. It doesn't
-
matter if the adversary knows what the
salt is, and it's fixed forever. The only
-
point is that when you chose it, you chose
one at random. And then the hope is that
-
the funny distribution that you're trying
to extract from kinda doesn't inherently
-
depends on the salt that you chose and
hence as a result using your salt, you
-
will actually get a distribution that
looks indistinguishable from random. So
-
essentially the salt, you know, you can
just bang it the keyboard a couple of
-
times when you generate it but it just
needs to be something that's random
-
initially but then it's fixed forever, and
it's fine if the adversary knows what
-
it is and nevertheless the extractor is
able to extract the entropy and output a
-
uniformly random string K. In some sense the
salt is only there to defend against
-
adversarially bad distributions that might
mess up our extractor. Okay, so now that
-
we have extracted a pseudo random key.
Now, we might as well just use it in a KDF
-
that we just saw using a secure
pseudo random function to expand the key
-
into as many bits as we need to actually
secure the session. Okay, so there are
-
these two steps. The first one is we
extract a pseudo-random key, and then once
-
we have a pseudo-random key we already
know how to extend it into as many keys as
-
we need using a pseudo-random function. So
the standardized way of doing this is
-
called HKDF. This is a KDF, a key derivation function that's built from HMAC.
-
And here HMAC is used both as the PRF for
expanding and an extractor for extracting
-
the initial pseudo-random key. So let me
explain how this works. So in the extract
-
step, we're gonna use our salt which you
remember is a public value just happened to
-
be generated at random at the beginning of
time. And we use this salt as the HMAC
-
key. And then the source key we're gonna
use as the HMAC data. So we're kind of
-
using a public value as a key. And
nevertheless, one can argue that HMAC has
-
extraction properties, such that, when we
apply HMAC, the resulting key is going to
-
look indistinguishable from random,
assuming that the source key actually has
-
enough entropy to it. And now that we have
the pseudo random key we're simply going
-
to use HMAC as a PRF to generate a
session key of you know as many bits as we
-
need for the session keys. Okay. So that
basically concludes our discussion of
-
HKDF. And I just want you to remember
that, once you obtain a source key, either
-
from hardware or from a key exchange
protocol, the way you convert it into
-
session keys is not by using that sample
directly. You would never use a source key
-
directly as a session key in a protocol.
What you would do is you will run the
-
source key through a KDF. And the KDF
would give you all the keys and output
-
that you need, for, the randomness, for
the random keys to be used in your
-
protocol. And a typical KDF to use is
HKDF, which is actually getting quite a
-
bit of traction out there. Okay. The last
topic I wanna talk about in this segment
-
is, how do you extract keys from
passwords. These are called password based
-
KDFs or PBKDFs. The problem here is
that passwords have relatively low
-
entropy. In fact, we're gonna talk about
passwords later on in the course when we
-
talk about user authentication. And so,
I'm not gonna say too much here. I'll just
-
say passwords generally have very little
entropy estimated on the order of twenty
-
bits of entropy, say. And as a result,
there is simply not enough entropy to
-
generate session keys out of a password.
And yet we still need to do it very
-
frequently. We still need to derive
encryption keys and MACs and so on out of
-
passwords, so the question is how to do
it. The first thing is, you know, for this
-
kind of purpose, don't use HKDF. That's
not what it's designed for. What will
-
happen is that the derived keys will
actually be vulnerable to something called
-
a dictionary attack, which we're gonna
talk about much later in the course when
-
we talk about user authentication. So, the
way PBKDFs defend against this low entropy
-
problem that results in a dictionary
attack is by two means. First of all, as
-
before they use a salt, a public,
random value that's fixed forever. But in
-
addition, they also use what's called a
slow hash function. And let me describe
-
kind of the standard approach to deriving
keys from passwords. This is called PKCS#5,
-
and in particular, the version I'll describe
is what's called PBKDF1. And I
-
should say that this mechanism is
implemented in most crypto libraries so
-
you shouldn't have to implement this
yourself. All you would do, you know, you
-
would call a function, you know, derived
key from password. You would give the
-
password in as input, and you would get a
key as output. But you should be aware of
-
course that this key is not going to have
high entropy so in fact it will be
-
guessable. What these PBKDFs try to do is
make the guessing problem as hard as
-
possible. Okay. So the way they work,
first of all, is, as we said, they
-
basically hash, the concatenation of the
password and the salt. And then the hash
-
itself is designed to be a very slow hash
function. And the way we build a slow hash
-
function is by taking one particular hash
function, say, SHA-256, and we
-
iterate it many, many times, C times. So
you can imagine 1000 times, perhaps even a
-
million times. And what do I mean by
iterating it? So, well, we take the
-
password and the salt. And we put them
inside of one input to the hash function.
-
And then we apply the hash function, oops,
let me write it like this. And then we
-
apply the hash function and we get an
output, and then we apply the hash
-
function again and we get another output.
And we do this again and again and again
-
maybe a thousand times or a million times
depending on how fast your processors are
-
and then finally we get the final output
that we actually output as, as the key
-
output of this key derivation function.
Now what is the point here? Iterating a
-
function 10,000 times or even a million
times is going to take very little time on
-
a modern CPU, and as a result, it doesn't
really affect the user's experience. The
-
user types in his password, it gets hashed
a million times and then it gets output.
-
And maybe that could even take, you know a
tenth of a second and the user wouldn't
-
even notice it. However the attacker, all
he can do is he can try all the passwords
-
in the dictionary, because we know people
tend to pick passwords in dictionaries,
-
and so he could just try them one by one,
remember the SALT is public, so he knows
-
what the SALT is. And so he can just try this
hash one by one. However because the hash
-
function is slow, each attempt is gonna
take him a tenth of second. So if he needs
-
to run through a dictionary, you know, with,
with a 200 billion passwords in it,
-
because the hash function is slow, this is
gonna take quite awhile. And by doing
-
that, we slow down the dictionary attack,
and we make it harder for the attacker to
-
get our session keys. Not impossible, just
harder. That's all this is trying to do.
-
Okay, so this is basically what I wanted
to say about password based KDFs. As I
-
said, this is not something you would
build yourself. All crypto libraries have
-
an implementation of a PKCS#5
mechanism. And you would just call the
-
appropriate function to convert a password
into a key, and then use the resulting
-
key. Okay, in the next segment, we're
gonna see how to use symmetric encryption
-
in a way that allows us to search on the
cipher texts.