In this segment, we're gonna look at the
Diffie-Hellman protocol, which is our
first practical key exchange mechanism. So
let me remind you of the settings. Our
friends here, Alice and Bob, have never
met and yet they wanna exchange a shared
secret key that they can then use to
communicate securely with one another.
In this segment and the next, we're only
gonna be looking at eavesdropping
security. In other words, we only care
about eavesdroppers. The attackers are
actually not allowed to tamper with data
in the network. They're not allowed to
inject packets. They're not allowed to
change packets in any way. All they do is
to just eavesdrop on the traffic. And at
the end of the protocol, the key exchange
protocol our friends Alice and Bob should
have a shared secret key but the attacker
namely the eavesdropper has no idea what
that key's gonna be. In the previous
segment we looked at a key segment called
Merkle puzzles. That's just using block
ciphers or hash functions. And I showed
you that there that basically the attacker
has a quadratic gap compared to the
participants. In other words if the
participants spent time proportional to N
the attacker can break the protocol in
time proportional to N squared. And as a
result that protocol is to insecure to be
considered practical. In this segment we
are going to ask whether we can do the
same thing but now we'd like to achieve an
exponential gap between the attacker's
work Ended up in the participant's work.
In other words, Alice and Bob might do
work proportional to N, but to break the
protocol the attacker is gonna have to do
work proportional to some exponential in
N. So now there's an exponential gap
between the participants work and the
attacker's work. So as I said in the
previous segment we can't achieve this
exponential gap from blog ciphers like AES
or SHA-256. We have to use hard problems
that have more structure than just those
symmetric primitives. And so instead what
we're gonna do is use a little bit of algebra.
In this segment I'm gonna
describe the Diffie-Hellman protocol very
concretely and somewhat informally. When
we're gonna come back to Diffie-Hellman next week
and we're gonna describe the protocol more
abstractly and we're gonna do a much more
rigorous security analysis of this
protocol. Okay, so how does Diffie-Hellman
work? What we're gonna do is, first of
all, we're gonna fix some large prime
which I'm gonna call p. Actually p I'll
usually use to denote primes. And you
should be thinking of really gigantic
primes. Like, primes that are made up of
600 digits are so. And just for those of
you who like to think in binary, a 600
digit prime roughly corresponds to about
2000 bits. So just writing down the prime
takes about two kilo bits of data. And
then we're also gonna fix an integer g
that happens to live in the range one to
p. So these values p and g are parameters
of the Diffie-Hellman protocol. They are
chosen once and they're fixed forever. Now
the protocol works as follow. So here's
our friends Alice and Bob. So what Alice's
going to do is she's gonna starts off by
choosing some random number a in the range 1 to p-1
And then she is gonna compute
the number g to the power of a modulo p.
Okay, so she computes this exponention,
and reduces the result modular the prime p.
And we're actually going to see how to
compute this efficiently the next module,
so for now just believe me that this
computation can be done efficiently.
Now, let's call capital A the result of this
value. So, what Alice will do is she will
send capital A over to Bob. Now Bob is
going to do the same thing. He's going to
choose a random number b in the range 1
to p-1. He's going to compute g to
the b module of p and let's call this
number B and he's going to send this
number B over to Alice. So Alice sends A
to Bob. Bob sends B. To Alice. And now
they claim that they can generate a shared
secret key. So what's a shared secret key?
Well, it's defined as. Let's call it
K_AB. And it's basically defined as the
value g to the power of a times b. Now the
amazing observation of Diffie-Hellman had
back in 1976 is that in fact both parties
can compute this value g to the ab.
So let's see, Alice can compute this value
because she can take her value B that she
received from Bob. She can take this value
B, raise it to the power of a and, let's
see, what she'll get is g to the b to the
power of b. And by the rules of
exponentiation, g to the power of b to the
power of a is equal to g to the ab. Bob
turns out, can do something similar, so
his goal is to compute g to the a to the b,
which again, is g to the ab, so both
Alice and Bob will have computed K_AB, and
let me ask you, how does Bob actually
compute this quantity g to the ab?
Well so all he does is he takes the value A that
he received from Alice and he raises it to
his own secret b and that gives him the
shared secret g to the ab. So you can see
now that both parties even though they
seemingly computed different values. In
fact they both end up with the same value
namely g ab module p. I should mention by
the way that Martie Hellman and Wig
Diffiie came up with this protocol back in
1976. Martie Hellman was a professor here
at Stanford and Wig Diffie was his
graduate student. They came up with this
protocol and this protocol really changed
the world. In that it introduced this new
age in cryptography. Where now it's not just
about developing block ciphers but it's
actually about designing algebraic
protocols that have properties like the
one we see here. So you notice that what
makes this protocol works is basically
properties of exponentiation. Namely that,
g to the b to the power of a is the same
as g to the a to the power of b, okay?
These are just properties of
exponentiations. Now when I describe a
protocol like the one I just showed you.
The real question that should be in your
mind is not why the protocol works. In
other words, it's very easy to verify
that, in fact, both Alice and Bob end up
with the same secret key. The bigger
question is why is this protocol secure?
In other words, why is it that an
eavesdropper cannot figure out the same
shared key due to the AB that both Alice
and Bob computed by themselves? So let's
analyze this a little bit, then, like I
said, we're gonna do a much more in-depth
analysis next week. But, let's, for now,
just see intuitively why this protocol is
gonna be secure. Well, what does the
eavesdropper see? Well, he sees the prime
and the generator. These are just fixed
forever and everybody knows who they are.
He also sees the value that Alice sent to
Bob, namely this capital A. And he sees
the value that Bob sent to Alice, namely
this capital B. And the question is, can
the, can the eavesdropper compute g to the
ab just given these four values? So more
generally, what we can do is we can define
this Diffie-Hellman function. So we'll say
that the Diffie-Hellman function is defined
based on some value g. And the question is
given g to the a, and g to the b. Can you
compute g to the ab? And what we're
asking is how hard is it to compute this
function module over very large prime p.
Remember p was something like 600 digits.
So the real question we need to answer is
how hard is this, is this Diffie-Hellman
function? So let me show you what's known
about this. So suppose the prime happens
to be n bits long. Okay, so in our case
say n is 2,000 bits. It turns out the best
known algorithm for computing the
Diffie–Hellman function. Is actually a
more general algorithm that computes the
discrete log function, which we're gonna
talk about next week. But for now, let's
just say that this algorithm computes a
Diffie-Hellman function. The algorithm is
called a general number field sieve. I'm
not gonna describe how it works, although
if you'd want to hear how it works, let me
know, and that could be one of the special
topics that we cover at the end of the
course. But the interesting thing is that
it's running time is exponential in
roughly the cube root of n. In other
words, the running time is roughly e to
the power of o of cube root of n. So in
fact the exact expression for the running
time, of this algorithm is much more
complicated than this. I'm deliberately
simplifying it here just to kind of get
the point across. The point that I want to
emphasize is that in fact, this is not
quite an exponential time algorithm.
Exponential would be e to the power of n.
This is sometimes called a sub-exponential
algorithm because the exponent is really
just proportional to the cube root of n,
as opposed to linear n. What this says is
that even though this problem is quite
difficult, it's not really an exponential
time problem. The running time in the
exponent is gonna be just proportional to
the cube root of n. So let's look at some
examples. Suppose I happen to be looking
at a modulus that happens to be about a
thousand bits long. What this algorithm
says is that we can solve the
Diffie-Hellman problem in time that's
approximately e to the qube root of 1,024.
Now this is not really correct, in fact
there are more constants in the exponent.
But again, just to get, the point across,
we can say that the running time would be
roughly e to the qube root of 1,024; which
is actually very small, roughly e to the 10.
So the simple example shows you that
if you have a sub-exponential algorithm,
you see that even if the problem is quite
large, like 1,000 bits. Because of the
cube root effect the exponent actually is not
that big overall. Now to be honest I'm
actually lying here. General number field
sieve actually have many other
constants in the exponents so in fact this
is not going to be ten at all. It's
actually going to be something like 80.
Because of various constants in the
exponents. But nevertheless but you see
the problem is much harder than say 2 to
the 1000. Even though describing it takes
1,000 bits, solving it does not take time
to the 1,000. So here I roll down the
table that kind of shows you the rough
difficulty of breaking down the
Diffie-Hellman protocol compared to the
difficulty of breaking down a cipher with
a appropriate number of bits. For example,
if your cipher has 80 bit keys. That would
be roughly comparable to using 1,000 bit
modules. In other words breaking a cipher
with 80 bit keys takes time roughly 2 to the 80
which means that
breaking if you have Diffie-Hellman function with a 1,000
bit module will take time 2 to the 80.
If your cipher uses 128 bit keys like AES, you should be
really using a 3,000 bit modulus,
even though nobody really does this. In reality
people would use 2,000 bit modulus. And
then if your key is very large, like if
we're using a 256 bit AES key, then in
fact your modulus needs to be very, very
large. Again, you, you can really see the
cube root effect here. We doubled the size
of our key and because of the cube root
effect, what that means is we have to
increase the size of, of our modulus by a
factor of two cubed, namely a factor of
eight, which is where this 15,000 comes from.
from. And again I should mention there's
not exactly a factor of eight because of
the extra constants in the exponent. So
this is a nice table that shows you that
if you're gonna be using Diffie-Hellman to
exchange, a session key. And that session
key is gonna be used for a block cipher
with a certain key size, this table shows
you what modular size you need to use so
that the security of the key exchange
protocol is comparable to the security of
the block cipher you're gonna be using after.
Now this last row should
really be disturbing to you. I should tell
you that working with such a large modulus
is very problematic. This is actually
gonna be quite slow, and so the question
is whether there is anything better that
we can do? And it turns out there is. In
fact the way I describe the Diffie-Hellman
function is I describe it at the way
Diffie and Hellman described it in 1976
using arithmetic module of primes. The
problem with arithmetic modular primes is
that the Diffie-Hellman function is hard
to compute, but it's not as hard as you'd
like. There's this cube root effect that
makes it a little easier than what you'd
really like. And so the question is can we
run the Diffie-Hellman protocol in other
settings. And it turns out the answer is
yes. In fact we can literally translate
the Diffie-Hellman protocol from an
arithmetic model of primes to a different
type of algebraic object and solving the
Diffie-Hellman problem on that other
algebraic object is much, much harder.
This other algebraic object is what's
called an elliptic curve. And as I said,
computing the Diffie-Hellman function on
these elliptic curves is much harder than
computing the Diffie-Hellman modulo primes.
Because the problem is so much harder, now
we can use much smaller objects in
particular, you know we'd be using primes
that are only a 160 bits or 80 bit keys or
only 512 bits for 256 bit keys. So because
these module don't grow as
fast on elliptic curves, there's generally
a slow transition away from using module
arithmetic, to using elliptic curves. I'm
not going to describe elliptic curves
right now for you, but if this is
something you'd like to learn about I can
do that at the very last week of the
course, when we discuss more advanced
topics. But in fact when we come back to
Diffie-Hellman next week I'm gonna describe it
more abstractly so that it really doesn't
matter which algebraic structure you use
whether you use arithmetic model of primes
or whether you use a elliptic curve we
can kinda abstract that whole issue away
and then use the Diffie-Hellman concept a for
key exchange and for other things as well.
And as I said we'll see that more
abstractly next week. So as usual I wanna
show that this beautiful protocol that I
just showed you, the Diffie-Hellman protocol,
is as is, is actually completely insecure
against an active attack. Namely, it's
completely insecure against what's called
the man in the middle attack. We need to
do something more than this protocol to
make is secure against man in the middle.
And again we're gonna come back to Diffie
Hellman and make it secure against man in
the middle next week. Okay. So let's see
why the protocol that I just showed you is
insecure against active attacks. Well
suppose we have this man in the middle
that's trying to eavesdrop on the
conversation between Alice and Bob. Well
so, the protocol starts with Alice sending
g to the a over to Bob. Well, so the man
in the middle is gonna intercept that and
he's gonna take the message that Alice
sent and he's gonna replace it with his
own message. So it's called A', And
let's write that is g to the a'.
Okay? So the man in the middle chooses his
own a' and what he sends to Bob is
actually g to the a'. Now poor Bob
doesn't know that the man in the middle
actually did anything to this traffic, all
he sees is he got the value A'. As
far as he knows, that value came from
Alice. So what is he gonna do in response?
Well, he's gonna send back his value B out
which is g to the b back to Alice. Well
again the man in the middle is gonna
intercept this B. He's gonna generate his
own b' and what he actually sends
back to Alice is B' which is g to the b'.
Okay, now what happens? Well
Alice is gonna compute her part of the
secret key and she's gonna get g to the ab'. Bob is gonna compute his part of
the secret key and he's gonna get g to the
b times a'. Okay, these now you
notice these are not the same keys. In the
man in the middle because he knows both A'
and B' he can compute both g to
the ab' and g to ba'. Yeah it's
not difficult to see the man in the middle
knows both values. And as a result, now he
can basically play this man in the middle
and when Alice sends an encrypted message
to Bob the man in the middle can simply
decrypt this message because he knows the
secret key that Alice encrypted message
with, re-encrypt it using Bob's key. And
then send the message on over to Bob. And
this way Alice sent the message, Bob
received the message. Bob thinks the
message is secure. But in fact that
message went through the man in the
middle. The man in the middle decrypted
it, re-encrypted it for Bob. At the same
time he was able to completely read it,
modify it if he wants to, and so on. So,
the protocol becomes completely insecure
give n a man in the middle. And so as I
said we're gonna have to enhance the
protocol somehow to defend against men in
the middle but it turns out that it's
actually not that difficult to enhance and
prevent man in the middle attacks.
And we're gonna come back to that and see that
in a week or two. The last think I want to
do is show you an interesting property of
the Diffie-Hellman protocol. In fact, I
want to show you that this protocol can be
viewed as a non-interactive protocol. So,
what do I mean by that? So Imagine we have
a whole bunch of users, you know, millions
of users. Let's call them Alice, Bob,
Charlie, David and so on and so forth.
Each one of them is going to choose a
random, secret value, and then on their
Facebook profiles, they're gonna write
down, their contribution to the
Diffie-Hellman protocol. Alright so
everybody just writes down you know g to
the a, g to the b, g to the c and so on.
Now the interesting thing about this is,
if say Alice and Charlie wanna set up a
shared key they don't need to communicate
at all. Basically Alice would go and read
Charlie's public profile. Charlie would go
and read Alice's public profile. And now,
boom, they immediately have a secret key.
Namely, now, Alice knows, g to the c and
a. Charlie knows g to the a and с. And as
a result, both of them can compute п to
the ac. So, in some sense, once they've
posted their contributions to the
Diffie-Hellman protocol to their public
profiles, they don't need to communicate
with each other at all to set up a shared key.
They immediately have a shared key
and now they can start communicating
securely through Facebook or not with one
another. And notice that this is true for
any Facebook user. So as soon as I read
somebody's public profile, I immediately
have a shared key with them without ever
communicating with them. This property is
sometimes called a non-interactive
property of the Diffie-Hellman. So now, let
me show you an open problem. And this is
an open problem that's been open for ages
and ages and ages. So it'd be really cool
if one of you can actually solve it. The
question is, can we do this for more than
two parties? In other words, say we have
four parties. All of them post their
values to their Facebook profiles. And now
we'd like to make it that just by reading
Facebook profiles, all of them can set up
a shared secret key. In other words, Alice
is, what she'll, she'll do is she'll only
read the public profiles of, the three
friends, Bob, Charlie and David. And
already she can compute a shared key with
them. And similarly David is just gonna
read the public profile of Charlie. Bob
and Alice. And already he has a shared key
with all four of them. Okay, so the
question is whether we can kind of setup
non-interactively these, these shared keys
for groups that are larger than just two
people. So as I said, for n equals two,
this is just a Diffie-Hellman protocol. In
other words, if just two parties want to
set up a shared key without communicating
with one another, that's just
Diffie-Hellman. It turns out, for N equals
three, we also know how to do it, there's
a known protocol, it's called protocol due
to Joux. It already uses very, very fancy
mathematics, much more complicated
mathematics than what I just showed you.
And for N equals four, or five, or
anything above this, above four, this
problem is completely open. Nearly the
case where four people post something to
the public profiles and then everybody
else reads the public profile and then
they have a joint shared key, this is
something we don't know how to do even for
four people. And this is a problem that's
been open for ages and ages, it's kind of
a fun problem to think about and so see if
you can solve it, if you will, it's
instant fame in the crypto world. Okay, so
I'll stop here, and we'll continue with
another key exchange mechanism in the next segment.