WEBVTT 00:00:00.000 --> 00:00:03.515 Well, we're almost done with our discussion of symmetric encryption. There 00:00:03.515 --> 00:00:06.471 are just a couple of odds and ends that I'd like to discuss before we move on to 00:00:06.471 --> 00:00:10.431 the next topic. So the first thing I'd like to mention is how we derive many keys 00:00:10.431 --> 00:00:14.497 from one key. And it, actually, this comes up all the time in practice, so I'd like 00:00:14.497 --> 00:00:18.287 to make sure you know how to do this correctly. So what's the setting that 00:00:18.287 --> 00:00:22.775 we're looking at? Well, imagine we have a certain source key that's generated by one 00:00:22.775 --> 00:00:26.435 of, a number of methods. Imagine the source key is generated by a hardware 00:00:26.435 --> 00:00:30.094 random number generator or perhaps is generated by a key exchange protocol 00:00:30.094 --> 00:00:34.036 which we're going to discuss later. But anyhow, there are a number of ways in 00:00:34.036 --> 00:00:38.110 which a source key might be generated between Alice and Bob, such that the 00:00:38.110 --> 00:00:42.569 attacker doesn't know what the source key is. But now, as we said, in many cases, we 00:00:42.569 --> 00:00:46.863 actually need many keys to secure a session, not just one single source key. 00:00:46.863 --> 00:00:51.267 For example, if you remember, in TLS there were unidirectional keys and we 00:00:51.267 --> 00:00:55.285 needed keys in each direction. And in fact, in each direction, we needed 00:00:55.285 --> 00:00:59.469 multiple keys. We needed a MAC key, we needed an encryption key. We need an IV, 00:00:59.469 --> 00:01:03.093 and so on. Similarly nonce based encryption, you remember, there were 00:01:03.093 --> 00:01:07.594 multiple keys that were being used, and so on. And so, the question is, how do we use 00:01:07.594 --> 00:01:12.031 the one source key that we just derived, either from a hardware process or 00:01:12.031 --> 00:01:16.351 by key exchange, and generate a bunch of keys from it that we could then use to 00:01:16.351 --> 00:01:20.531 secure our session. The way that's done, is using a mechanism called a key 00:01:20.531 --> 00:01:24.951 derivation function, KDF. And I want to talk a little bit about how KDF's are 00:01:24.951 --> 00:01:29.846 constructed. So first of all, suppose we have a secure PRF, that happens to have 00:01:29.846 --> 00:01:34.993 key space K. And now, suppose that it so happens that our source key SK is uniform 00:01:34.993 --> 00:01:41.207 in the key K. In this case, the source key is, in fact, a uniform random key for the 00:01:41.207 --> 00:01:46.453 secure PRF F. And we can use it directly to generate keys, all the keys that we need 00:01:46.453 --> 00:01:50.444 to secure the session. So in this case, the KDF is really simple. The key 00:01:50.444 --> 00:01:53.771 derivation function would just work as follows. It would take as input the 00:01:53.771 --> 00:01:58.025 source key. It would take an input, a parameter context, which I'm gonna 00:01:58.025 --> 00:02:02.766 describe in just a minute. And then it's gonna take a length input as input as 00:02:02.766 --> 00:02:07.615 well. And then what it will do is it will basically evaluate the PRF on zero. Then 00:02:07.615 --> 00:02:12.400 it will evaluate the PRF on one. Then it will evaluate the PRF on two, up until L. 00:02:12.400 --> 00:02:16.449 And I will talk about what this context is in just a second. And then, basically, you 00:02:16.449 --> 00:02:20.353 would use as many bits of the output as you would need to generate all the keys 00:02:20.353 --> 00:02:24.256 for the session. So, if you need unidirectional keys you would generate, you 00:02:24.256 --> 00:02:28.355 know, one key in each direction where each key might include an encryption key and a MAC 00:02:28.355 --> 00:02:32.356 key. And so, you would basically generate as many bits as you need and then finally 00:02:32.356 --> 00:02:36.259 cut off the output at the time when you've generated enough keys to secure your 00:02:36.259 --> 00:02:41.177 session. Okay so this is a fairly straight forward mechanism it's basically using the 00:02:41.177 --> 00:02:45.656 secure PRF as a pseudo random generator. And the only question is what is its 00:02:45.656 --> 00:02:49.451 context string. Well, I'll tell you that the context string is basically a unique 00:02:49.451 --> 00:02:53.545 string that identifies the application. So in fact you might have multiple 00:02:53.545 --> 00:02:58.304 applications on the same system that's trying to establish multiple secure keys. 00:02:58.304 --> 00:03:03.169 Maybe you have SSH running as one process, you have a web server running as another process, 00:03:03.169 --> 00:03:09.145 IPsec as a third process and all three need to have secret keys generated. And this 00:03:09.145 --> 00:03:13.533 context variable basically tries to separate the three of them. So, let me ask you, 00:03:13.533 --> 00:03:16.589 more precisely, what do you think the purpose of this context variable is? 00:03:19.204 --> 00:03:22.312 So I guess I've given it away and this context variable is 00:03:22.312 --> 00:03:26.166 supposed to basically separate applications, so that even if, for 00:03:26.166 --> 00:03:30.863 example, the three services that we just talked about, SSH, web server, and IPsec, 00:03:30.863 --> 00:03:35.741 if they all happen to obtain the same source key from the hardware random number 00:03:35.741 --> 00:03:40.378 generator then the context since it's different for the three apps will make 00:03:40.378 --> 00:03:45.617 sure that they still get three independent strings that they can then use to secure 00:03:45.617 --> 00:03:49.873 the sessions. I just want you to remember that, even though this is actually fairly 00:03:49.873 --> 00:03:53.648 straightforward, and we discussed this before, the context string is actually 00:03:53.648 --> 00:03:57.374 important, and it does need to be specific to the application, so that each 00:03:57.374 --> 00:04:01.300 application gets its own session keys, even if multiple applications happen to 00:04:01.300 --> 00:04:05.139 sample the same SK. The next question is, what do we do if the source 00:04:05.139 --> 00:04:09.714 key actually isn't uniform? Well, now we got a problem. If the source key is not a 00:04:09.714 --> 00:04:14.113 uniform key for the pseudo random function then we can no longer assume that the 00:04:14.113 --> 00:04:18.511 output of the pseudo random function is indistinguishable from random. In fact, if 00:04:18.511 --> 00:04:23.841 we just use the KDF that we just described then the output might not look random to 00:04:23.841 --> 00:04:27.416 the adversary and then he might be able to anticipate some of the session keys that 00:04:27.416 --> 00:04:31.562 we'll be using and thereby break the session. So then we have a problem. Now 00:04:31.562 --> 00:04:35.510 why would this source key not be uniform? Well there are many reasons why this 00:04:35.510 --> 00:04:39.560 happened. For example if you use a key exchange protocol, it so happens typically 00:04:39.560 --> 00:04:42.826 that key exchange protocols will generate a high entropy key. But the 00:04:42.826 --> 00:04:46.774 high entropy key is gonna be distributed in some subspace of the key 00:04:46.774 --> 00:04:51.455 space. So it's not going to be a uniform string. It will be uniform in some 00:04:51.455 --> 00:04:55.763 subset of a larger set, And we'll see examples of that as soon as we talk about 00:04:55.926 --> 00:05:00.317 key exchange protocols. And so KDFs have to kind of accommodate for the fact that 00:05:00.317 --> 00:05:04.492 key exchange protocols actually don't generate uniform bit strings. The other 00:05:04.492 --> 00:05:08.830 problem is, that, in fact, the hardware random number generator you're using might 00:05:08.830 --> 00:05:13.384 actually produce biased outputs. We don't wanna rely on the non bias of the hardware 00:05:13.384 --> 00:05:17.252 random number generator. And so all we want to assume is that it generates a high 00:05:17.252 --> 00:05:21.842 entropy string, but one that might be biased. In which case, we have to somehow 00:05:21.842 --> 00:05:26.735 clean this bias. And so this introduces this, this paradigm for building KDFs. 00:05:26.735 --> 00:05:31.962 This is called the extract-then-expand paradigm, where the first step 00:05:31.962 --> 00:05:37.247 of the KDF is to extract a pseudo random key from the actual source key. So in a 00:05:37.247 --> 00:05:40.829 picture you can think about it like this. In some sense these are the different 00:05:40.829 --> 00:05:45.343 possible values of the source key. This is the horizontal line and the vertical axis 00:05:45.343 --> 00:05:49.541 is basically the probability of each one of these values, and you can see that this 00:05:49.541 --> 00:05:53.464 is a kind of a bumpy function which would say that the source key is not uniformly 00:05:53.464 --> 00:05:58.561 distributed in the key space. What we do in this case is we use what's called an 00:05:58.561 --> 00:06:02.913 extractor. So an extractor is something that takes a bumpy distribution and makes 00:06:02.913 --> 00:06:07.462 it into a uniform distribution over the key space. In our case we're actually just 00:06:07.462 --> 00:06:10.027 gonna be using what are called computational extractors, namely 00:06:10.027 --> 00:06:14.894 extractors that don't necessarily produce uniform distribution at the end but 00:06:14.894 --> 00:06:20.257 they generated distribution that's indistinguishable from uniform. 00:06:22.910 --> 00:06:27.323 Now extractors typically take as input something called a salt, and a salt just 00:06:27.323 --> 00:06:31.318 like in a salad, it kind of adds flavor to things, what it does is basically kind of 00:06:31.318 --> 00:06:36.022 jumbles things around, so that no matter what the input distribution is, the output 00:06:36.022 --> 00:06:39.738 distribution is still going to be indistinguishable from random. So a salt 00:06:39.738 --> 00:06:43.973 basically, what is it? It's a non-secret string, so it's publicly known. It doesn't 00:06:43.973 --> 00:06:48.565 matter if the adversary knows what the salt is, and it's fixed forever. The only 00:06:48.565 --> 00:06:53.096 point is that when you chose it, you chose one at random. And then the hope is that 00:06:53.096 --> 00:06:57.173 the funny distribution that you're trying to extract from kinda doesn't inherently 00:06:57.173 --> 00:07:00.274 depends on the salt that you chose and hence as a result using your salt, you 00:07:00.274 --> 00:07:03.729 will actually get a distribution that looks indistinguishable from random. So 00:07:03.729 --> 00:07:07.020 essentially the salt, you know, you can just bang it the keyboard a couple of 00:07:07.020 --> 00:07:10.220 times when you generate it but it just needs to be something that's random 00:07:10.220 --> 00:07:14.249 initially but then it's fixed forever, and it's fine if the adversary knows what 00:07:14.249 --> 00:07:20.304 it is and nevertheless the extractor is able to extract the entropy and output a 00:07:20.304 --> 00:07:24.713 uniformly random string K. In some sense the salt is only there to defend against 00:07:24.713 --> 00:07:29.667 adversarially bad distributions that might mess up our extractor. Okay, so now that 00:07:29.667 --> 00:07:34.581 we have extracted a pseudo random key. Now, we might as well just use it in a KDF 00:07:34.581 --> 00:07:38.911 that we just saw using a secure pseudo random function to expand the key 00:07:38.911 --> 00:07:43.481 into as many bits as we need to actually secure the session. Okay, so there are 00:07:43.481 --> 00:07:47.431 these two steps. The first one is we extract a pseudo-random key, and then once 00:07:47.431 --> 00:07:51.584 we have a pseudo-random key we already know how to extend it into as many keys as 00:07:51.584 --> 00:07:56.033 we need using a pseudo-random function. So the standardized way of doing this is 00:07:56.033 --> 00:08:01.170 called HKDF. This is a KDF, a key derivation function that's built from HMAC. 00:08:01.170 --> 00:08:06.561 And here HMAC is used both as the PRF for expanding and an extractor for extracting 00:08:06.561 --> 00:08:11.699 the initial pseudo-random key. So let me explain how this works. So in the extract 00:08:11.699 --> 00:08:16.900 step, we're gonna use our salt which you remember is a public value just happened to 00:08:16.900 --> 00:08:21.101 be generated at random at the beginning of time. And we use this salt as the HMAC 00:08:21.101 --> 00:08:27.526 key. And then the source key we're gonna use as the HMAC data. So we're kind of 00:08:27.526 --> 00:08:32.292 using a public value as a key. And nevertheless, one can argue that HMAC has 00:08:32.292 --> 00:08:37.623 extraction properties, such that, when we apply HMAC, the resulting key is going to 00:08:37.623 --> 00:08:42.452 look indistinguishable from random, assuming that the source key actually has 00:08:42.452 --> 00:08:47.329 enough entropy to it. And now that we have the pseudo random key we're simply going 00:08:47.329 --> 00:08:52.037 to use HMAC as a PRF to generate a session key of you know as many bits as we 00:08:52.037 --> 00:08:56.389 need for the session keys. Okay. So that basically concludes our discussion of 00:08:56.389 --> 00:09:00.763 HKDF. And I just want you to remember that, once you obtain a source key, either 00:09:00.763 --> 00:09:04.912 from hardware or from a key exchange protocol, the way you convert it into 00:09:04.912 --> 00:09:09.566 session keys is not by using that sample directly. You would never use a source key 00:09:09.566 --> 00:09:14.108 directly as a session key in a protocol. What you would do is you will run the 00:09:14.108 --> 00:09:18.369 source key through a KDF. And the KDF would give you all the keys and output 00:09:18.369 --> 00:09:22.575 that you need, for, the randomness, for the random keys to be used in your 00:09:22.575 --> 00:09:27.042 protocol. And a typical KDF to use is HKDF, which is actually getting quite a 00:09:27.042 --> 00:09:31.952 bit of traction out there. Okay. The last topic I wanna talk about in this segment 00:09:31.952 --> 00:09:36.430 is, how do you extract keys from passwords. These are called password based 00:09:36.430 --> 00:09:41.921 KDFs or PBKDFs. The problem here is that passwords have relatively low 00:09:41.921 --> 00:09:45.764 entropy. In fact, we're gonna talk about passwords later on in the course when we 00:09:45.764 --> 00:09:50.154 talk about user authentication. And so, I'm not gonna say too much here. I'll just 00:09:50.154 --> 00:09:54.291 say passwords generally have very little entropy estimated on the order of twenty 00:09:54.291 --> 00:09:59.010 bits of entropy, say. And as a result, there is simply not enough entropy to 00:09:59.010 --> 00:10:02.882 generate session keys out of a password. And yet we still need to do it very 00:10:02.882 --> 00:10:06.804 frequently. We still need to derive encryption keys and MACs and so on out of 00:10:06.804 --> 00:10:10.828 passwords, so the question is how to do it. The first thing is, you know, for this 00:10:10.828 --> 00:10:14.744 kind of purpose, don't use HKDF. That's not what it's designed for. What will 00:10:14.744 --> 00:10:18.657 happen is that the derived keys will actually be vulnerable to something called 00:10:18.657 --> 00:10:22.105 a dictionary attack, which we're gonna talk about much later in the course when 00:10:22.105 --> 00:10:27.787 we talk about user authentication. So, the way PBKDFs defend against this low entropy 00:10:27.787 --> 00:10:33.002 problem that results in a dictionary attack is by two means. First of all, as 00:10:33.002 --> 00:10:38.878 before they use a salt, a public, random value that's fixed forever. But in 00:10:38.878 --> 00:10:43.097 addition, they also use what's called a slow hash function. And let me describe 00:10:43.097 --> 00:10:49.355 kind of the standard approach to deriving keys from passwords. This is called PKCS#5, 00:10:49.355 --> 00:10:53.911 and in particular, the version I'll describe is what's called PBKDF1. And I 00:10:53.911 --> 00:10:57.398 should say that this mechanism is implemented in most crypto libraries so 00:10:57.398 --> 00:11:00.663 you shouldn't have to implement this yourself. All you would do, you know, you 00:11:00.663 --> 00:11:03.788 would call a function, you know, derived key from password. You would give the 00:11:03.788 --> 00:11:08.353 password in as input, and you would get a key as output. But you should be aware of 00:11:08.353 --> 00:11:11.741 course that this key is not going to have high entropy so in fact it will be 00:11:11.741 --> 00:11:17.155 guessable. What these PBKDFs try to do is make the guessing problem as hard as 00:11:17.155 --> 00:11:20.964 possible. Okay. So the way they work, first of all, is, as we said, they 00:11:20.964 --> 00:11:25.693 basically hash, the concatenation of the password and the salt. And then the hash 00:11:25.693 --> 00:11:29.170 itself is designed to be a very slow hash function. And the way we build a slow hash 00:11:29.170 --> 00:11:34.063 function is by taking one particular hash function, say, SHA-256, and we 00:11:34.063 --> 00:11:39.425 iterate it many, many times, C times. So you can imagine 1000 times, perhaps even a 00:11:39.425 --> 00:11:43.356 million times. And what do I mean by iterating it? So, well, we take the 00:11:43.356 --> 00:11:48.494 password and the salt. And we put them inside of one input to the hash function. 00:11:48.494 --> 00:11:54.025 And then we apply the hash function, oops, let me write it like this. And then we 00:11:54.025 --> 00:11:57.779 apply the hash function and we get an output, and then we apply the hash 00:11:57.779 --> 00:12:00.779 function again and we get another output. And we do this again and again and again 00:12:00.779 --> 00:12:05.830 maybe a thousand times or a million times depending on how fast your processors are 00:12:05.830 --> 00:12:10.710 and then finally we get the final output that we actually output as, as the key 00:12:10.710 --> 00:12:15.057 output of this key derivation function. Now what is the point here? Iterating a 00:12:15.057 --> 00:12:19.371 function 10,000 times or even a million times is going to take very little time on 00:12:19.371 --> 00:12:22.922 a modern CPU, and as a result, it doesn't really affect the user's experience. The 00:12:22.922 --> 00:12:27.792 user types in his password, it gets hashed a million times and then it gets output. 00:12:27.792 --> 00:12:31.654 And maybe that could even take, you know a tenth of a second and the user wouldn't 00:12:31.654 --> 00:12:35.489 even notice it. However the attacker, all he can do is he can try all the passwords 00:12:35.489 --> 00:12:39.689 in the dictionary, because we know people tend to pick passwords in dictionaries, 00:12:39.689 --> 00:12:43.780 and so he could just try them one by one, remember the SALT is public, so he knows 00:12:43.780 --> 00:12:49.232 what the SALT is. And so he can just try this hash one by one. However because the hash 00:12:49.232 --> 00:12:53.796 function is slow, each attempt is gonna take him a tenth of second. So if he needs 00:12:53.796 --> 00:12:57.965 to run through a dictionary, you know, with, with a 200 billion passwords in it, 00:12:57.965 --> 00:13:02.416 because the hash function is slow, this is gonna take quite awhile. And by doing 00:13:02.416 --> 00:13:06.584 that, we slow down the dictionary attack, and we make it harder for the attacker to 00:13:06.584 --> 00:13:11.608 get our session keys. Not impossible, just harder. That's all this is trying to do. 00:13:11.608 --> 00:13:15.748 Okay, so this is basically what I wanted to say about password based KDFs. As I 00:13:15.748 --> 00:13:19.835 said, this is not something you would build yourself. All crypto libraries have 00:13:19.835 --> 00:13:23.976 an implementation of a PKCS#5 mechanism. And you would just call the 00:13:23.976 --> 00:13:28.275 appropriate function to convert a password into a key, and then use the resulting 00:13:28.275 --> 00:13:32.362 key. Okay, in the next segment, we're gonna see how to use symmetric encryption 00:13:32.362 --> 00:13:35.229 in a way that allows us to search on the cipher texts.