0:00:00.000,0:00:03.515 Well, we're almost done with our[br]discussion of symmetric encryption. There 0:00:03.515,0:00:06.471 are just a couple of odds and ends that[br]I'd like to discuss before we move on to 0:00:06.471,0:00:10.431 the next topic. So the first thing I'd[br]like to mention is how we derive many keys 0:00:10.431,0:00:14.497 from one key. And it, actually, this comes[br]up all the time in practice, so I'd like 0:00:14.497,0:00:18.287 to make sure you know how to do this[br]correctly. So what's the setting that 0:00:18.287,0:00:22.775 we're looking at? Well, imagine we have a[br]certain source key that's generated by one 0:00:22.775,0:00:26.435 of, a number of methods. Imagine the[br]source key is generated by a hardware 0:00:26.435,0:00:30.094 random number generator or perhaps is[br]generated by a key exchange protocol 0:00:30.094,0:00:34.036 which we're going to discuss later. But[br]anyhow, there are a number of ways in 0:00:34.036,0:00:38.110 which a source key might be generated[br]between Alice and Bob, such that the 0:00:38.110,0:00:42.569 attacker doesn't know what the source key[br]is. But now, as we said, in many cases, we 0:00:42.569,0:00:46.863 actually need many keys to secure a[br]session, not just one single source key. 0:00:46.863,0:00:51.267 For example, if you remember, in TLS there[br]were unidirectional keys and we 0:00:51.267,0:00:55.285 needed keys in each direction. And in[br]fact, in each direction, we needed 0:00:55.285,0:00:59.469 multiple keys. We needed a MAC key, we[br]needed an encryption key. We need an IV, 0:00:59.469,0:01:03.093 and so on. Similarly nonce based[br]encryption, you remember, there were 0:01:03.093,0:01:07.594 multiple keys that were being used, and so[br]on. And so, the question is, how do we use 0:01:07.594,0:01:12.031 the one source key that we just derived,[br]either from a hardware process or 0:01:12.031,0:01:16.351 by key exchange, and generate a bunch of[br]keys from it that we could then use to 0:01:16.351,0:01:20.531 secure our session. The way that's done,[br]is using a mechanism called a key 0:01:20.531,0:01:24.951 derivation function, KDF. And I want to[br]talk a little bit about how KDF's are 0:01:24.951,0:01:29.846 constructed. So first of all, suppose we[br]have a secure PRF, that happens to have 0:01:29.846,0:01:34.993 key space K. And now, suppose that it so[br]happens that our source key SK is uniform 0:01:34.993,0:01:41.207 in the key K. In this case, the source key[br]is, in fact, a uniform random key for the 0:01:41.207,0:01:46.453 secure PRF F. And we can use it directly to[br]generate keys, all the keys that we need 0:01:46.453,0:01:50.444 to secure the session. So in this case,[br]the KDF is really simple. The key 0:01:50.444,0:01:53.771 derivation function would just work as[br]follows. It would take as input the 0:01:53.771,0:01:58.025 source key. It would take an input, a[br]parameter context, which I'm gonna 0:01:58.025,0:02:02.766 describe in just a minute. And then it's[br]gonna take a length input as input as 0:02:02.766,0:02:07.615 well. And then what it will do is it will[br]basically evaluate the PRF on zero. Then 0:02:07.615,0:02:12.400 it will evaluate the PRF on one. Then it[br]will evaluate the PRF on two, up until L. 0:02:12.400,0:02:16.449 And I will talk about what this context is[br]in just a second. And then, basically, you 0:02:16.449,0:02:20.353 would use as many bits of the output as[br]you would need to generate all the keys 0:02:20.353,0:02:24.256 for the session. So, if you need unidirectional keys you would generate, you 0:02:24.256,0:02:28.355 know, one key in each direction where each key[br]might include an encryption key and a MAC 0:02:28.355,0:02:32.356 key. And so, you would basically generate[br]as many bits as you need and then finally 0:02:32.356,0:02:36.259 cut off the output at the time when you've[br]generated enough keys to secure your 0:02:36.259,0:02:41.177 session. Okay so this is a fairly straight[br]forward mechanism it's basically using the 0:02:41.177,0:02:45.656 secure PRF as a pseudo random generator.[br]And the only question is what is its 0:02:45.656,0:02:49.451 context string. Well, I'll tell you that[br]the context string is basically a unique 0:02:49.451,0:02:53.545 string that identifies the application. So[br]in fact you might have multiple 0:02:53.545,0:02:58.304 applications on the same system that's[br]trying to establish multiple secure keys. 0:02:58.304,0:03:03.169 Maybe you have SSH running as one process,[br]you have a web server running as another process, 0:03:03.169,0:03:09.145 IPsec as a third process and all three need [br]to have secret keys generated. And this 0:03:09.145,0:03:13.533 context variable basically tries to separate [br]the three of them. So, let me ask you, 0:03:13.533,0:03:16.589 more precisely, what do you think [br]the purpose of this context variable is? 0:03:19.204,0:03:22.312 So I guess I've given[br]it away and this context variable is 0:03:22.312,0:03:26.166 supposed to basically separate[br]applications, so that even if, for 0:03:26.166,0:03:30.863 example, the three services that we just[br]talked about, SSH, web server, and IPsec, 0:03:30.863,0:03:35.741 if they all happen to obtain the same[br]source key from the hardware random number 0:03:35.741,0:03:40.378 generator then the context since it's[br]different for the three apps will make 0:03:40.378,0:03:45.617 sure that they still get three independent[br]strings that they can then use to secure 0:03:45.617,0:03:49.873 the sessions. I just want you to remember[br]that, even though this is actually fairly 0:03:49.873,0:03:53.648 straightforward, and we discussed this[br]before, the context string is actually 0:03:53.648,0:03:57.374 important, and it does need to be specific[br]to the application, so that each 0:03:57.374,0:04:01.300 application gets its own session keys,[br]even if multiple applications happen to 0:04:01.300,0:04:05.139 sample the same SK. The next[br]question is, what do we do if the source 0:04:05.139,0:04:09.714 key actually isn't uniform? Well, now we[br]got a problem. If the source key is not a 0:04:09.714,0:04:14.113 uniform key for the pseudo random function[br]then we can no longer assume that the 0:04:14.113,0:04:18.511 output of the pseudo random function is[br]indistinguishable from random. In fact, if 0:04:18.511,0:04:23.841 we just use the KDF that we just described[br]then the output might not look random to 0:04:23.841,0:04:27.416 the adversary and then he might be able to[br]anticipate some of the session keys that 0:04:27.416,0:04:31.562 we'll be using and thereby break the[br]session. So then we have a problem. Now 0:04:31.562,0:04:35.510 why would this source key not be uniform?[br]Well there are many reasons why this 0:04:35.510,0:04:39.560 happened. For example if you use a key[br]exchange protocol, it so happens typically 0:04:39.560,0:04:42.826 that key exchange protocols will generate a[br]high entropy key. But the 0:04:42.826,0:04:46.774 high entropy key is gonna be[br]distributed in some subspace of the key 0:04:46.774,0:04:51.455 space. So it's not going to be a uniform[br]string. It will be uniform in some 0:04:51.455,0:04:55.763 subset of a larger set, And we'll see[br]examples of that as soon as we talk about 0:04:55.926,0:05:00.317 key exchange protocols. And so KDFs have[br]to kind of accommodate for the fact that 0:05:00.317,0:05:04.492 key exchange protocols actually don't[br]generate uniform bit strings. The other 0:05:04.492,0:05:08.830 problem is, that, in fact, the hardware[br]random number generator you're using might 0:05:08.830,0:05:13.384 actually produce biased outputs. We don't[br]wanna rely on the non bias of the hardware 0:05:13.384,0:05:17.252 random number generator. And so all we[br]want to assume is that it generates a high 0:05:17.252,0:05:21.842 entropy string, but one that might be[br]biased. In which case, we have to somehow 0:05:21.842,0:05:26.735 clean this bias. And so this introduces [br]this, this paradigm for building KDFs. 0:05:26.735,0:05:31.962 This is called the extract-then-expand [br]paradigm, where the first step 0:05:31.962,0:05:37.247 of the KDF is to extract a pseudo random[br]key from the actual source key. So in a 0:05:37.247,0:05:40.829 picture you can think about it like this.[br]In some sense these are the different 0:05:40.829,0:05:45.343 possible values of the source key. This is[br]the horizontal line and the vertical axis 0:05:45.343,0:05:49.541 is basically the probability of each one[br]of these values, and you can see that this 0:05:49.541,0:05:53.464 is a kind of a bumpy function which would[br]say that the source key is not uniformly 0:05:53.464,0:05:58.561 distributed in the key space. What we do[br]in this case is we use what's called an 0:05:58.561,0:06:02.913 extractor. So an extractor is something[br]that takes a bumpy distribution and makes 0:06:02.913,0:06:07.462 it into a uniform distribution over the[br]key space. In our case we're actually just 0:06:07.462,0:06:10.027 gonna be using what are called[br]computational extractors, namely 0:06:10.027,0:06:14.894 extractors that don't necessarily produce[br]uniform distribution at the end but 0:06:14.894,0:06:20.257 they generated distribution that's[br]indistinguishable from uniform. 0:06:22.910,0:06:27.323 Now extractors typically take as input[br]something called a salt, and a salt just 0:06:27.323,0:06:31.318 like in a salad, it kind of adds flavor to[br]things, what it does is basically kind of 0:06:31.318,0:06:36.022 jumbles things around, so that no matter[br]what the input distribution is, the output 0:06:36.022,0:06:39.738 distribution is still going to be[br]indistinguishable from random. So a salt 0:06:39.738,0:06:43.973 basically, what is it? It's a non-secret[br]string, so it's publicly known. It doesn't 0:06:43.973,0:06:48.565 matter if the adversary knows what the[br]salt is, and it's fixed forever. The only 0:06:48.565,0:06:53.096 point is that when you chose it, you chose[br]one at random. And then the hope is that 0:06:53.096,0:06:57.173 the funny distribution that you're trying[br]to extract from kinda doesn't inherently 0:06:57.173,0:07:00.274 depends on the salt that you chose and[br]hence as a result using your salt, you 0:07:00.274,0:07:03.729 will actually get a distribution that[br]looks indistinguishable from random. So 0:07:03.729,0:07:07.020 essentially the salt, you know, you can[br]just bang it the keyboard a couple of 0:07:07.020,0:07:10.220 times when you generate it but it just[br]needs to be something that's random 0:07:10.220,0:07:14.249 initially but then it's fixed forever, and[br]it's fine if the adversary knows what 0:07:14.249,0:07:20.304 it is and nevertheless the extractor is[br]able to extract the entropy and output a 0:07:20.304,0:07:24.713 uniformly random string K. In some sense the[br]salt is only there to defend against 0:07:24.713,0:07:29.667 adversarially bad distributions that might[br]mess up our extractor. Okay, so now that 0:07:29.667,0:07:34.581 we have extracted a pseudo random key.[br]Now, we might as well just use it in a KDF 0:07:34.581,0:07:38.911 that we just saw using a secure[br]pseudo random function to expand the key 0:07:38.911,0:07:43.481 into as many bits as we need to actually[br]secure the session. Okay, so there are 0:07:43.481,0:07:47.431 these two steps. The first one is we[br]extract a pseudo-random key, and then once 0:07:47.431,0:07:51.584 we have a pseudo-random key we already[br]know how to extend it into as many keys as 0:07:51.584,0:07:56.033 we need using a pseudo-random function. So[br]the standardized way of doing this is 0:07:56.033,0:08:01.170 called HKDF. This is a KDF, a key derivation function that's built from HMAC. 0:08:01.170,0:08:06.561 And here HMAC is used both as the PRF for[br]expanding and an extractor for extracting 0:08:06.561,0:08:11.699 the initial pseudo-random key. So let me[br]explain how this works. So in the extract 0:08:11.699,0:08:16.900 step, we're gonna use our salt which you[br]remember is a public value just happened to 0:08:16.900,0:08:21.101 be generated at random at the beginning of[br]time. And we use this salt as the HMAC 0:08:21.101,0:08:27.526 key. And then the source key we're gonna[br]use as the HMAC data. So we're kind of 0:08:27.526,0:08:32.292 using a public value as a key. And[br]nevertheless, one can argue that HMAC has 0:08:32.292,0:08:37.623 extraction properties, such that, when we[br]apply HMAC, the resulting key is going to 0:08:37.623,0:08:42.452 look indistinguishable from random,[br]assuming that the source key actually has 0:08:42.452,0:08:47.329 enough entropy to it. And now that we have[br]the pseudo random key we're simply going 0:08:47.329,0:08:52.037 to use HMAC as a PRF to generate a[br]session key of you know as many bits as we 0:08:52.037,0:08:56.389 need for the session keys. Okay. So that[br]basically concludes our discussion of 0:08:56.389,0:09:00.763 HKDF. And I just want you to remember[br]that, once you obtain a source key, either 0:09:00.763,0:09:04.912 from hardware or from a key exchange[br]protocol, the way you convert it into 0:09:04.912,0:09:09.566 session keys is not by using that sample[br]directly. You would never use a source key 0:09:09.566,0:09:14.108 directly as a session key in a protocol.[br]What you would do is you will run the 0:09:14.108,0:09:18.369 source key through a KDF. And the KDF[br]would give you all the keys and output 0:09:18.369,0:09:22.575 that you need, for, the randomness, for[br]the random keys to be used in your 0:09:22.575,0:09:27.042 protocol. And a typical KDF to use is[br]HKDF, which is actually getting quite a 0:09:27.042,0:09:31.952 bit of traction out there. Okay. The last[br]topic I wanna talk about in this segment 0:09:31.952,0:09:36.430 is, how do you extract keys from[br]passwords. These are called password based 0:09:36.430,0:09:41.921 KDFs or PBKDFs. The problem here is[br]that passwords have relatively low 0:09:41.921,0:09:45.764 entropy. In fact, we're gonna talk about[br]passwords later on in the course when we 0:09:45.764,0:09:50.154 talk about user authentication. And so,[br]I'm not gonna say too much here. I'll just 0:09:50.154,0:09:54.291 say passwords generally have very little[br]entropy estimated on the order of twenty 0:09:54.291,0:09:59.010 bits of entropy, say. And as a result,[br]there is simply not enough entropy to 0:09:59.010,0:10:02.882 generate session keys out of a password.[br]And yet we still need to do it very 0:10:02.882,0:10:06.804 frequently. We still need to derive[br]encryption keys and MACs and so on out of 0:10:06.804,0:10:10.828 passwords, so the question is how to do[br]it. The first thing is, you know, for this 0:10:10.828,0:10:14.744 kind of purpose, don't use HKDF. That's[br]not what it's designed for. What will 0:10:14.744,0:10:18.657 happen is that the derived keys will[br]actually be vulnerable to something called 0:10:18.657,0:10:22.105 a dictionary attack, which we're gonna[br]talk about much later in the course when 0:10:22.105,0:10:27.787 we talk about user authentication. So, the[br]way PBKDFs defend against this low entropy 0:10:27.787,0:10:33.002 problem that results in a dictionary[br]attack is by two means. First of all, as 0:10:33.002,0:10:38.878 before they use a salt, a public,[br]random value that's fixed forever. But in 0:10:38.878,0:10:43.097 addition, they also use what's called a[br]slow hash function. And let me describe 0:10:43.097,0:10:49.355 kind of the standard approach to deriving[br]keys from passwords. This is called PKCS#5, 0:10:49.355,0:10:53.911 and in particular, the version I'll describe [br]is what's called PBKDF1. And I 0:10:53.911,0:10:57.398 should say that this mechanism is[br]implemented in most crypto libraries so 0:10:57.398,0:11:00.663 you shouldn't have to implement this[br]yourself. All you would do, you know, you 0:11:00.663,0:11:03.788 would call a function, you know, derived[br]key from password. You would give the 0:11:03.788,0:11:08.353 password in as input, and you would get a[br]key as output. But you should be aware of 0:11:08.353,0:11:11.741 course that this key is not going to have[br]high entropy so in fact it will be 0:11:11.741,0:11:17.155 guessable. What these PBKDFs try to do is[br]make the guessing problem as hard as 0:11:17.155,0:11:20.964 possible. Okay. So the way they work,[br]first of all, is, as we said, they 0:11:20.964,0:11:25.693 basically hash, the concatenation of the[br]password and the salt. And then the hash 0:11:25.693,0:11:29.170 itself is designed to be a very slow hash[br]function. And the way we build a slow hash 0:11:29.170,0:11:34.063 function is by taking one particular hash[br]function, say, SHA-256, and we 0:11:34.063,0:11:39.425 iterate it many, many times, C times. So[br]you can imagine 1000 times, perhaps even a 0:11:39.425,0:11:43.356 million times. And what do I mean by[br]iterating it? So, well, we take the 0:11:43.356,0:11:48.494 password and the salt. And we put them[br]inside of one input to the hash function. 0:11:48.494,0:11:54.025 And then we apply the hash function, oops,[br]let me write it like this. And then we 0:11:54.025,0:11:57.779 apply the hash function and we get an[br]output, and then we apply the hash 0:11:57.779,0:12:00.779 function again and we get another output.[br]And we do this again and again and again 0:12:00.779,0:12:05.830 maybe a thousand times or a million times[br]depending on how fast your processors are 0:12:05.830,0:12:10.710 and then finally we get the final output[br]that we actually output as, as the key 0:12:10.710,0:12:15.057 output of this key derivation function.[br]Now what is the point here? Iterating a 0:12:15.057,0:12:19.371 function 10,000 times or even a million[br]times is going to take very little time on 0:12:19.371,0:12:22.922 a modern CPU, and as a result, it doesn't[br]really affect the user's experience. The 0:12:22.922,0:12:27.792 user types in his password, it gets hashed[br]a million times and then it gets output. 0:12:27.792,0:12:31.654 And maybe that could even take, you know a[br]tenth of a second and the user wouldn't 0:12:31.654,0:12:35.489 even notice it. However the attacker, all[br]he can do is he can try all the passwords 0:12:35.489,0:12:39.689 in the dictionary, because we know people[br]tend to pick passwords in dictionaries, 0:12:39.689,0:12:43.780 and so he could just try them one by one,[br]remember the SALT is public, so he knows 0:12:43.780,0:12:49.232 what the SALT is. And so he can just try this[br]hash one by one. However because the hash 0:12:49.232,0:12:53.796 function is slow, each attempt is gonna[br]take him a tenth of second. So if he needs 0:12:53.796,0:12:57.965 to run through a dictionary, you know, with,[br]with a 200 billion passwords in it, 0:12:57.965,0:13:02.416 because the hash function is slow, this is[br]gonna take quite awhile. And by doing 0:13:02.416,0:13:06.584 that, we slow down the dictionary attack,[br]and we make it harder for the attacker to 0:13:06.584,0:13:11.608 get our session keys. Not impossible, just[br]harder. That's all this is trying to do. 0:13:11.608,0:13:15.748 Okay, so this is basically what I wanted[br]to say about password based KDFs. As I 0:13:15.748,0:13:19.835 said, this is not something you would[br]build yourself. All crypto libraries have 0:13:19.835,0:13:23.976 an implementation of a PKCS#5[br]mechanism. And you would just call the 0:13:23.976,0:13:28.275 appropriate function to convert a password[br]into a key, and then use the resulting 0:13:28.275,0:13:32.362 key. Okay, in the next segment, we're[br]gonna see how to use symmetric encryption 0:13:32.362,0:13:35.229 in a way that allows us to search on the[br]cipher texts.