In this segment, we're gonna look at the Diffie-Hellman protocol, which is our first practical key exchange mechanism. So let me remind you of the settings. Our friends here, Alice and Bob, have never met and yet they wanna exchange a shared secret key that they can then use to communicate securely with one another. In this segment and the next, we're only gonna be looking at eavesdropping security. In other words, we only care about eavesdroppers. The attackers are actually not allowed to tamper with data in the network. They're not allowed to inject packets. They're not allowed to change packets in any way. All they do is to just eavesdrop on the traffic. And at the end of the protocol, the key exchange protocol our friends Alice and Bob should have a shared secret key but the attacker namely the eavesdropper has no idea what that key's gonna be. In the previous segment we looked at a key segment called Merkle puzzles. That's just using block ciphers or hash functions. And I showed you that there that basically the attacker has a quadratic gap compared to the participants. In other words if the participants spent time proportional to N the attacker can break the protocol in time proportional to N squared. And as a result that protocol is to insecure to be considered practical. In this segment we are going to ask whether we can do the same thing but now we'd like to achieve an exponential gap between the attacker's work Ended up in the participant's work. In other words, Alice and Bob might do work proportional to N, but to break the protocol the attacker is gonna have to do work proportional to some exponential in N. So now there's an exponential gap between the participants work and the attacker's work. So as I said in the previous segment we can't achieve this exponential gap from blog ciphers like AES or SHA-256. We have to use hard problems that have more structure than just those symmetric primitives. And so instead what we're gonna do is use a little bit of algebra. In this segment I'm gonna describe the Diffie-Hellman protocol very concretely and somewhat informally. When we're gonna come back to Diffie-Hellman next week and we're gonna describe the protocol more abstractly and we're gonna do a much more rigorous security analysis of this protocol. Okay, so how does Diffie-Hellman work? What we're gonna do is, first of all, we're gonna fix some large prime which I'm gonna call p. Actually p I'll usually use to denote primes. And you should be thinking of really gigantic primes. Like, primes that are made up of 600 digits are so. And just for those of you who like to think in binary, a 600 digit prime roughly corresponds to about 2000 bits. So just writing down the prime takes about two kilo bits of data. And then we're also gonna fix an integer g that happens to live in the range one to p. So these values p and g are parameters of the Diffie-Hellman protocol. They are chosen once and they're fixed forever. Now the protocol works as follow. So here's our friends Alice and Bob. So what Alice's going to do is she's gonna starts off by choosing some random number a in the range 1 to p-1 And then she is gonna compute the number g to the power of a modulo p. Okay, so she computes this exponention, and reduces the result modular the prime p. And we're actually going to see how to compute this efficiently the next module, so for now just believe me that this computation can be done efficiently. Now, let's call capital A the result of this value. So, what Alice will do is she will send capital A over to Bob. Now Bob is going to do the same thing. He's going to choose a random number b in the range 1 to p-1. He's going to compute g to the b module of p and let's call this number B and he's going to send this number B over to Alice. So Alice sends A to Bob. Bob sends B. To Alice. And now they claim that they can generate a shared secret key. So what's a shared secret key? Well, it's defined as. Let's call it K_AB. And it's basically defined as the value g to the power of a times b. Now the amazing observation of Diffie-Hellman had back in 1976 is that in fact both parties can compute this value g to the ab. So let's see, Alice can compute this value because she can take her value B that she received from Bob. She can take this value B, raise it to the power of a and, let's see, what she'll get is g to the b to the power of b. And by the rules of exponentiation, g to the power of b to the power of a is equal to g to the ab. Bob turns out, can do something similar, so his goal is to compute g to the a to the b, which again, is g to the ab, so both Alice and Bob will have computed K_AB, and let me ask you, how does Bob actually compute this quantity g to the ab? Well so all he does is he takes the value A that he received from Alice and he raises it to his own secret b and that gives him the shared secret g to the ab. So you can see now that both parties even though they seemingly computed different values. In fact they both end up with the same value namely g ab module p. I should mention by the way that Martie Hellman and Wig Diffiie came up with this protocol back in 1976. Martie Hellman was a professor here at Stanford and Wig Diffie was his graduate student. They came up with this protocol and this protocol really changed the world. In that it introduced this new age in cryptography. Where now it's not just about developing block ciphers but it's actually about designing algebraic protocols that have properties like the one we see here. So you notice that what makes this protocol works is basically properties of exponentiation. Namely that, g to the b to the power of a is the same as g to the a to the power of b, okay? These are just properties of exponentiations. Now when I describe a protocol like the one I just showed you. The real question that should be in your mind is not why the protocol works. In other words, it's very easy to verify that, in fact, both Alice and Bob end up with the same secret key. The bigger question is why is this protocol secure? In other words, why is it that an eavesdropper cannot figure out the same shared key due to the AB that both Alice and Bob computed by themselves? So let's analyze this a little bit, then, like I said, we're gonna do a much more in-depth analysis next week. But, let's, for now, just see intuitively why this protocol is gonna be secure. Well, what does the eavesdropper see? Well, he sees the prime and the generator. These are just fixed forever and everybody knows who they are. He also sees the value that Alice sent to Bob, namely this capital A. And he sees the value that Bob sent to Alice, namely this capital B. And the question is, can the, can the eavesdropper compute g to the ab just given these four values? So more generally, what we can do is we can define this Diffie-Hellman function. So we'll say that the Diffie-Hellman function is defined based on some value g. And the question is given g to the a, and g to the b. Can you compute g to the ab? And what we're asking is how hard is it to compute this function module over very large prime p. Remember p was something like 600 digits. So the real question we need to answer is how hard is this, is this Diffie-Hellman function? So let me show you what's known about this. So suppose the prime happens to be n bits long. Okay, so in our case say n is 2,000 bits. It turns out the best known algorithm for computing the Diffie–Hellman function. Is actually a more general algorithm that computes the discrete log function, which we're gonna talk about next week. But for now, let's just say that this algorithm computes a Diffie-Hellman function. The algorithm is called a general number field sieve. I'm not gonna describe how it works, although if you'd want to hear how it works, let me know, and that could be one of the special topics that we cover at the end of the course. But the interesting thing is that it's running time is exponential in roughly the cube root of n. In other words, the running time is roughly e to the power of o of cube root of n. So in fact the exact expression for the running time, of this algorithm is much more complicated than this. I'm deliberately simplifying it here just to kind of get the point across. The point that I want to emphasize is that in fact, this is not quite an exponential time algorithm. Exponential would be e to the power of n. This is sometimes called a sub-exponential algorithm because the exponent is really just proportional to the cube root of n, as opposed to linear n. What this says is that even though this problem is quite difficult, it's not really an exponential time problem. The running time in the exponent is gonna be just proportional to the cube root of n. So let's look at some examples. Suppose I happen to be looking at a modulus that happens to be about a thousand bits long. What this algorithm says is that we can solve the Diffie-Hellman problem in time that's approximately e to the qube root of 1,024. Now this is not really correct, in fact there are more constants in the exponent. But again, just to get, the point across, we can say that the running time would be roughly e to the qube root of 1,024; which is actually very small, roughly e to the 10. So the simple example shows you that if you have a sub-exponential algorithm, you see that even if the problem is quite large, like 1,000 bits. Because of the cube root effect the exponent actually is not that big overall. Now to be honest I'm actually lying here. General number field sieve actually have many other constants in the exponents so in fact this is not going to be ten at all. It's actually going to be something like 80. Because of various constants in the exponents. But nevertheless but you see the problem is much harder than say 2 to the 1000. Even though describing it takes 1,000 bits, solving it does not take time to the 1,000. So here I roll down the table that kind of shows you the rough difficulty of breaking down the Diffie-Hellman protocol compared to the difficulty of breaking down a cipher with a appropriate number of bits. For example, if your cipher has 80 bit keys. That would be roughly comparable to using 1,000 bit modules. In other words breaking a cipher with 80 bit keys takes time roughly 2 to the 80 which means that breaking if you have Diffie-Hellman function with a 1,000 bit module will take time 2 to the 80. If your cipher uses 128 bit keys like AES, you should be really using a 3,000 bit modulus, even though nobody really does this. In reality people would use 2,000 bit modulus. And then if your key is very large, like if we're using a 256 bit AES key, then in fact your modulus needs to be very, very large. Again, you, you can really see the cube root effect here. We doubled the size of our key and because of the cube root effect, what that means is we have to increase the size of, of our modulus by a factor of two cubed, namely a factor of eight, which is where this 15,000 comes from. from. And again I should mention there's not exactly a factor of eight because of the extra constants in the exponent. So this is a nice table that shows you that if you're gonna be using Diffie-Hellman to exchange, a session key. And that session key is gonna be used for a block cipher with a certain key size, this table shows you what modular size you need to use so that the security of the key exchange protocol is comparable to the security of the block cipher you're gonna be using after. Now this last row should really be disturbing to you. I should tell you that working with such a large modulus is very problematic. This is actually gonna be quite slow, and so the question is whether there is anything better that we can do? And it turns out there is. In fact the way I describe the Diffie-Hellman function is I describe it at the way Diffie and Hellman described it in 1976 using arithmetic module of primes. The problem with arithmetic modular primes is that the Diffie-Hellman function is hard to compute, but it's not as hard as you'd like. There's this cube root effect that makes it a little easier than what you'd really like. And so the question is can we run the Diffie-Hellman protocol in other settings. And it turns out the answer is yes. In fact we can literally translate the Diffie-Hellman protocol from an arithmetic model of primes to a different type of algebraic object and solving the Diffie-Hellman problem on that other algebraic object is much, much harder. This other algebraic object is what's called an elliptic curve. And as I said, computing the Diffie-Hellman function on these elliptic curves is much harder than computing the Diffie-Hellman modulo primes. Because the problem is so much harder, now we can use much smaller objects in particular, you know we'd be using primes that are only a 160 bits or 80 bit keys or only 512 bits for 256 bit keys. So because these module don't grow as fast on elliptic curves, there's generally a slow transition away from using module arithmetic, to using elliptic curves. I'm not going to describe elliptic curves right now for you, but if this is something you'd like to learn about I can do that at the very last week of the course, when we discuss more advanced topics. But in fact when we come back to Diffie-Hellman next week I'm gonna describe it more abstractly so that it really doesn't matter which algebraic structure you use whether you use arithmetic model of primes or whether you use a elliptic curve we can kinda abstract that whole issue away and then use the Diffie-Hellman concept a for key exchange and for other things as well. And as I said we'll see that more abstractly next week. So as usual I wanna show that this beautiful protocol that I just showed you, the Diffie-Hellman protocol, is as is, is actually completely insecure against an active attack. Namely, it's completely insecure against what's called the man in the middle attack. We need to do something more than this protocol to make is secure against man in the middle. And again we're gonna come back to Diffie Hellman and make it secure against man in the middle next week. Okay. So let's see why the protocol that I just showed you is insecure against active attacks. Well suppose we have this man in the middle that's trying to eavesdrop on the conversation between Alice and Bob. Well so, the protocol starts with Alice sending g to the a over to Bob. Well, so the man in the middle is gonna intercept that and he's gonna take the message that Alice sent and he's gonna replace it with his own message. So it's called A', And let's write that is g to the a'. Okay? So the man in the middle chooses his own a' and what he sends to Bob is actually g to the a'. Now poor Bob doesn't know that the man in the middle actually did anything to this traffic, all he sees is he got the value A'. As far as he knows, that value came from Alice. So what is he gonna do in response? Well, he's gonna send back his value B out which is g to the b back to Alice. Well again the man in the middle is gonna intercept this B. He's gonna generate his own b' and what he actually sends back to Alice is B' which is g to the b'. Okay, now what happens? Well Alice is gonna compute her part of the secret key and she's gonna get g to the ab'. Bob is gonna compute his part of the secret key and he's gonna get g to the b times a'. Okay, these now you notice these are not the same keys. In the man in the middle because he knows both A' and B' he can compute both g to the ab' and g to ba'. Yeah it's not difficult to see the man in the middle knows both values. And as a result, now he can basically play this man in the middle and when Alice sends an encrypted message to Bob the man in the middle can simply decrypt this message because he knows the secret key that Alice encrypted message with, re-encrypt it using Bob's key. And then send the message on over to Bob. And this way Alice sent the message, Bob received the message. Bob thinks the message is secure. But in fact that message went through the man in the middle. The man in the middle decrypted it, re-encrypted it for Bob. At the same time he was able to completely read it, modify it if he wants to, and so on. So, the protocol becomes completely insecure give n a man in the middle. And so as I said we're gonna have to enhance the protocol somehow to defend against men in the middle but it turns out that it's actually not that difficult to enhance and prevent man in the middle attacks. And we're gonna come back to that and see that in a week or two. The last think I want to do is show you an interesting property of the Diffie-Hellman protocol. In fact, I want to show you that this protocol can be viewed as a non-interactive protocol. So, what do I mean by that? So Imagine we have a whole bunch of users, you know, millions of users. Let's call them Alice, Bob, Charlie, David and so on and so forth. Each one of them is going to choose a random, secret value, and then on their Facebook profiles, they're gonna write down, their contribution to the Diffie-Hellman protocol. Alright so everybody just writes down you know g to the a, g to the b, g to the c and so on. Now the interesting thing about this is, if say Alice and Charlie wanna set up a shared key they don't need to communicate at all. Basically Alice would go and read Charlie's public profile. Charlie would go and read Alice's public profile. And now, boom, they immediately have a secret key. Namely, now, Alice knows, g to the c and a. Charlie knows g to the a and с. And as a result, both of them can compute п to the ac. So, in some sense, once they've posted their contributions to the Diffie-Hellman protocol to their public profiles, they don't need to communicate with each other at all to set up a shared key. They immediately have a shared key and now they can start communicating securely through Facebook or not with one another. And notice that this is true for any Facebook user. So as soon as I read somebody's public profile, I immediately have a shared key with them without ever communicating with them. This property is sometimes called a non-interactive property of the Diffie-Hellman. So now, let me show you an open problem. And this is an open problem that's been open for ages and ages and ages. So it'd be really cool if one of you can actually solve it. The question is, can we do this for more than two parties? In other words, say we have four parties. All of them post their values to their Facebook profiles. And now we'd like to make it that just by reading Facebook profiles, all of them can set up a shared secret key. In other words, Alice is, what she'll, she'll do is she'll only read the public profiles of, the three friends, Bob, Charlie and David. And already she can compute a shared key with them. And similarly David is just gonna read the public profile of Charlie. Bob and Alice. And already he has a shared key with all four of them. Okay, so the question is whether we can kind of setup non-interactively these, these shared keys for groups that are larger than just two people. So as I said, for n equals two, this is just a Diffie-Hellman protocol. In other words, if just two parties want to set up a shared key without communicating with one another, that's just Diffie-Hellman. It turns out, for N equals three, we also know how to do it, there's a known protocol, it's called protocol due to Joux. It already uses very, very fancy mathematics, much more complicated mathematics than what I just showed you. And for N equals four, or five, or anything above this, above four, this problem is completely open. Nearly the case where four people post something to the public profiles and then everybody else reads the public profile and then they have a joint shared key, this is something we don't know how to do even for four people. And this is a problem that's been open for ages and ages, it's kind of a fun problem to think about and so see if you can solve it, if you will, it's instant fame in the crypto world. Okay, so I'll stop here, and we'll continue with another key exchange mechanism in the next segment.