1 00:00:00,000 --> 00:00:03,515 Well, we're almost done with our discussion of symmetric encryption. There 2 00:00:03,515 --> 00:00:06,471 are just a couple of odds and ends that I'd like to discuss before we move on to 3 00:00:06,471 --> 00:00:10,431 the next topic. So the first thing I'd like to mention is how we derive many keys 4 00:00:10,431 --> 00:00:14,497 from one key. And it, actually, this comes up all the time in practice, so I'd like 5 00:00:14,497 --> 00:00:18,287 to make sure you know how to do this correctly. So what's the setting that 6 00:00:18,287 --> 00:00:22,775 we're looking at? Well, imagine we have a certain source key that's generated by one 7 00:00:22,775 --> 00:00:26,435 of, a number of methods. Imagine the source key is generated by a hardware 8 00:00:26,435 --> 00:00:30,094 random number generator or perhaps is generated by a key exchange protocol 9 00:00:30,094 --> 00:00:34,036 which we're going to discuss later. But anyhow, there are a number of ways in 10 00:00:34,036 --> 00:00:38,110 which a source key might be generated between Alice and Bob, such that the 11 00:00:38,110 --> 00:00:42,569 attacker doesn't know what the source key is. But now, as we said, in many cases, we 12 00:00:42,569 --> 00:00:46,863 actually need many keys to secure a session, not just one single source key. 13 00:00:46,863 --> 00:00:51,267 For example, if you remember, in TLS there were unidirectional keys and we 14 00:00:51,267 --> 00:00:55,285 needed keys in each direction. And in fact, in each direction, we needed 15 00:00:55,285 --> 00:00:59,469 multiple keys. We needed a MAC key, we needed an encryption key. We need an IV, 16 00:00:59,469 --> 00:01:03,093 and so on. Similarly nonce based encryption, you remember, there were 17 00:01:03,093 --> 00:01:07,594 multiple keys that were being used, and so on. And so, the question is, how do we use 18 00:01:07,594 --> 00:01:12,031 the one source key that we just derived, either from a hardware process or 19 00:01:12,031 --> 00:01:16,351 by key exchange, and generate a bunch of keys from it that we could then use to 20 00:01:16,351 --> 00:01:20,531 secure our session. The way that's done, is using a mechanism called a key 21 00:01:20,531 --> 00:01:24,951 derivation function, KDF. And I want to talk a little bit about how KDF's are 22 00:01:24,951 --> 00:01:29,846 constructed. So first of all, suppose we have a secure PRF, that happens to have 23 00:01:29,846 --> 00:01:34,993 key space K. And now, suppose that it so happens that our source key SK is uniform 24 00:01:34,993 --> 00:01:41,207 in the key K. In this case, the source key is, in fact, a uniform random key for the 25 00:01:41,207 --> 00:01:46,453 secure PRF F. And we can use it directly to generate keys, all the keys that we need 26 00:01:46,453 --> 00:01:50,444 to secure the session. So in this case, the KDF is really simple. The key 27 00:01:50,444 --> 00:01:53,771 derivation function would just work as follows. It would take as input the 28 00:01:53,771 --> 00:01:58,025 source key. It would take an input, a parameter context, which I'm gonna 29 00:01:58,025 --> 00:02:02,766 describe in just a minute. And then it's gonna take a length input as input as 30 00:02:02,766 --> 00:02:07,615 well. And then what it will do is it will basically evaluate the PRF on zero. Then 31 00:02:07,615 --> 00:02:12,400 it will evaluate the PRF on one. Then it will evaluate the PRF on two, up until L. 32 00:02:12,400 --> 00:02:16,449 And I will talk about what this context is in just a second. And then, basically, you 33 00:02:16,449 --> 00:02:20,353 would use as many bits of the output as you would need to generate all the keys 34 00:02:20,353 --> 00:02:24,256 for the session. So, if you need unidirectional keys you would generate, you 35 00:02:24,256 --> 00:02:28,355 know, one key in each direction where each key might include an encryption key and a MAC 36 00:02:28,355 --> 00:02:32,356 key. And so, you would basically generate as many bits as you need and then finally 37 00:02:32,356 --> 00:02:36,259 cut off the output at the time when you've generated enough keys to secure your 38 00:02:36,259 --> 00:02:41,177 session. Okay so this is a fairly straight forward mechanism it's basically using the 39 00:02:41,177 --> 00:02:45,656 secure PRF as a pseudo random generator. And the only question is what is its 40 00:02:45,656 --> 00:02:49,451 context string. Well, I'll tell you that the context string is basically a unique 41 00:02:49,451 --> 00:02:53,545 string that identifies the application. So in fact you might have multiple 42 00:02:53,545 --> 00:02:58,304 applications on the same system that's trying to establish multiple secure keys. 43 00:02:58,304 --> 00:03:03,169 Maybe you have SSH running as one process, you have a web server running as another process, 44 00:03:03,169 --> 00:03:09,145 IPsec as a third process and all three need to have secret keys generated. And this 45 00:03:09,145 --> 00:03:13,533 context variable basically tries to separate the three of them. So, let me ask you, 46 00:03:13,533 --> 00:03:16,589 more precisely, what do you think the purpose of this context variable is? 47 00:03:19,204 --> 00:03:22,312 So I guess I've given it away and this context variable is 48 00:03:22,312 --> 00:03:26,166 supposed to basically separate applications, so that even if, for 49 00:03:26,166 --> 00:03:30,863 example, the three services that we just talked about, SSH, web server, and IPsec, 50 00:03:30,863 --> 00:03:35,741 if they all happen to obtain the same source key from the hardware random number 51 00:03:35,741 --> 00:03:40,378 generator then the context since it's different for the three apps will make 52 00:03:40,378 --> 00:03:45,617 sure that they still get three independent strings that they can then use to secure 53 00:03:45,617 --> 00:03:49,873 the sessions. I just want you to remember that, even though this is actually fairly 54 00:03:49,873 --> 00:03:53,648 straightforward, and we discussed this before, the context string is actually 55 00:03:53,648 --> 00:03:57,374 important, and it does need to be specific to the application, so that each 56 00:03:57,374 --> 00:04:01,300 application gets its own session keys, even if multiple applications happen to 57 00:04:01,300 --> 00:04:05,139 sample the same SK. The next question is, what do we do if the source 58 00:04:05,139 --> 00:04:09,714 key actually isn't uniform? Well, now we got a problem. If the source key is not a 59 00:04:09,714 --> 00:04:14,113 uniform key for the pseudo random function then we can no longer assume that the 60 00:04:14,113 --> 00:04:18,511 output of the pseudo random function is indistinguishable from random. In fact, if 61 00:04:18,511 --> 00:04:23,841 we just use the KDF that we just described then the output might not look random to 62 00:04:23,841 --> 00:04:27,416 the adversary and then he might be able to anticipate some of the session keys that 63 00:04:27,416 --> 00:04:31,562 we'll be using and thereby break the session. So then we have a problem. Now 64 00:04:31,562 --> 00:04:35,510 why would this source key not be uniform? Well there are many reasons why this 65 00:04:35,510 --> 00:04:39,560 happened. For example if you use a key exchange protocol, it so happens typically 66 00:04:39,560 --> 00:04:42,826 that key exchange protocols will generate a high entropy key. But the 67 00:04:42,826 --> 00:04:46,774 high entropy key is gonna be distributed in some subspace of the key 68 00:04:46,774 --> 00:04:51,455 space. So it's not going to be a uniform string. It will be uniform in some 69 00:04:51,455 --> 00:04:55,763 subset of a larger set, And we'll see examples of that as soon as we talk about 70 00:04:55,926 --> 00:05:00,317 key exchange protocols. And so KDFs have to kind of accommodate for the fact that 71 00:05:00,317 --> 00:05:04,492 key exchange protocols actually don't generate uniform bit strings. The other 72 00:05:04,492 --> 00:05:08,830 problem is, that, in fact, the hardware random number generator you're using might 73 00:05:08,830 --> 00:05:13,384 actually produce biased outputs. We don't wanna rely on the non bias of the hardware 74 00:05:13,384 --> 00:05:17,252 random number generator. And so all we want to assume is that it generates a high 75 00:05:17,252 --> 00:05:21,842 entropy string, but one that might be biased. In which case, we have to somehow 76 00:05:21,842 --> 00:05:26,735 clean this bias. And so this introduces this, this paradigm for building KDFs. 77 00:05:26,735 --> 00:05:31,962 This is called the extract-then-expand paradigm, where the first step 78 00:05:31,962 --> 00:05:37,247 of the KDF is to extract a pseudo random key from the actual source key. So in a 79 00:05:37,247 --> 00:05:40,829 picture you can think about it like this. In some sense these are the different 80 00:05:40,829 --> 00:05:45,343 possible values of the source key. This is the horizontal line and the vertical axis 81 00:05:45,343 --> 00:05:49,541 is basically the probability of each one of these values, and you can see that this 82 00:05:49,541 --> 00:05:53,464 is a kind of a bumpy function which would say that the source key is not uniformly 83 00:05:53,464 --> 00:05:58,561 distributed in the key space. What we do in this case is we use what's called an 84 00:05:58,561 --> 00:06:02,913 extractor. So an extractor is something that takes a bumpy distribution and makes 85 00:06:02,913 --> 00:06:07,462 it into a uniform distribution over the key space. In our case we're actually just 86 00:06:07,462 --> 00:06:10,027 gonna be using what are called computational extractors, namely 87 00:06:10,027 --> 00:06:14,894 extractors that don't necessarily produce uniform distribution at the end but 88 00:06:14,894 --> 00:06:20,257 they generated distribution that's indistinguishable from uniform. 89 00:06:22,910 --> 00:06:27,323 Now extractors typically take as input something called a salt, and a salt just 90 00:06:27,323 --> 00:06:31,318 like in a salad, it kind of adds flavor to things, what it does is basically kind of 91 00:06:31,318 --> 00:06:36,022 jumbles things around, so that no matter what the input distribution is, the output 92 00:06:36,022 --> 00:06:39,738 distribution is still going to be indistinguishable from random. So a salt 93 00:06:39,738 --> 00:06:43,973 basically, what is it? It's a non-secret string, so it's publicly known. It doesn't 94 00:06:43,973 --> 00:06:48,565 matter if the adversary knows what the salt is, and it's fixed forever. The only 95 00:06:48,565 --> 00:06:53,096 point is that when you chose it, you chose one at random. And then the hope is that 96 00:06:53,096 --> 00:06:57,173 the funny distribution that you're trying to extract from kinda doesn't inherently 97 00:06:57,173 --> 00:07:00,274 depends on the salt that you chose and hence as a result using your salt, you 98 00:07:00,274 --> 00:07:03,729 will actually get a distribution that looks indistinguishable from random. So 99 00:07:03,729 --> 00:07:07,020 essentially the salt, you know, you can just bang it the keyboard a couple of 100 00:07:07,020 --> 00:07:10,220 times when you generate it but it just needs to be something that's random 101 00:07:10,220 --> 00:07:14,249 initially but then it's fixed forever, and it's fine if the adversary knows what 102 00:07:14,249 --> 00:07:20,304 it is and nevertheless the extractor is able to extract the entropy and output a 103 00:07:20,304 --> 00:07:24,713 uniformly random string K. In some sense the salt is only there to defend against 104 00:07:24,713 --> 00:07:29,667 adversarially bad distributions that might mess up our extractor. Okay, so now that 105 00:07:29,667 --> 00:07:34,581 we have extracted a pseudo random key. Now, we might as well just use it in a KDF 106 00:07:34,581 --> 00:07:38,911 that we just saw using a secure pseudo random function to expand the key 107 00:07:38,911 --> 00:07:43,481 into as many bits as we need to actually secure the session. Okay, so there are 108 00:07:43,481 --> 00:07:47,431 these two steps. The first one is we extract a pseudo-random key, and then once 109 00:07:47,431 --> 00:07:51,584 we have a pseudo-random key we already know how to extend it into as many keys as 110 00:07:51,584 --> 00:07:56,033 we need using a pseudo-random function. So the standardized way of doing this is 111 00:07:56,033 --> 00:08:01,170 called HKDF. This is a KDF, a key derivation function that's built from HMAC. 112 00:08:01,170 --> 00:08:06,561 And here HMAC is used both as the PRF for expanding and an extractor for extracting 113 00:08:06,561 --> 00:08:11,699 the initial pseudo-random key. So let me explain how this works. So in the extract 114 00:08:11,699 --> 00:08:16,900 step, we're gonna use our salt which you remember is a public value just happened to 115 00:08:16,900 --> 00:08:21,101 be generated at random at the beginning of time. And we use this salt as the HMAC 116 00:08:21,101 --> 00:08:27,526 key. And then the source key we're gonna use as the HMAC data. So we're kind of 117 00:08:27,526 --> 00:08:32,292 using a public value as a key. And nevertheless, one can argue that HMAC has 118 00:08:32,292 --> 00:08:37,623 extraction properties, such that, when we apply HMAC, the resulting key is going to 119 00:08:37,623 --> 00:08:42,452 look indistinguishable from random, assuming that the source key actually has 120 00:08:42,452 --> 00:08:47,329 enough entropy to it. And now that we have the pseudo random key we're simply going 121 00:08:47,329 --> 00:08:52,037 to use HMAC as a PRF to generate a session key of you know as many bits as we 122 00:08:52,037 --> 00:08:56,389 need for the session keys. Okay. So that basically concludes our discussion of 123 00:08:56,389 --> 00:09:00,763 HKDF. And I just want you to remember that, once you obtain a source key, either 124 00:09:00,763 --> 00:09:04,912 from hardware or from a key exchange protocol, the way you convert it into 125 00:09:04,912 --> 00:09:09,566 session keys is not by using that sample directly. You would never use a source key 126 00:09:09,566 --> 00:09:14,108 directly as a session key in a protocol. What you would do is you will run the 127 00:09:14,108 --> 00:09:18,369 source key through a KDF. And the KDF would give you all the keys and output 128 00:09:18,369 --> 00:09:22,575 that you need, for, the randomness, for the random keys to be used in your 129 00:09:22,575 --> 00:09:27,042 protocol. And a typical KDF to use is HKDF, which is actually getting quite a 130 00:09:27,042 --> 00:09:31,952 bit of traction out there. Okay. The last topic I wanna talk about in this segment 131 00:09:31,952 --> 00:09:36,430 is, how do you extract keys from passwords. These are called password based 132 00:09:36,430 --> 00:09:41,921 KDFs or PBKDFs. The problem here is that passwords have relatively low 133 00:09:41,921 --> 00:09:45,764 entropy. In fact, we're gonna talk about passwords later on in the course when we 134 00:09:45,764 --> 00:09:50,154 talk about user authentication. And so, I'm not gonna say too much here. I'll just 135 00:09:50,154 --> 00:09:54,291 say passwords generally have very little entropy estimated on the order of twenty 136 00:09:54,291 --> 00:09:59,010 bits of entropy, say. And as a result, there is simply not enough entropy to 137 00:09:59,010 --> 00:10:02,882 generate session keys out of a password. And yet we still need to do it very 138 00:10:02,882 --> 00:10:06,804 frequently. We still need to derive encryption keys and MACs and so on out of 139 00:10:06,804 --> 00:10:10,828 passwords, so the question is how to do it. The first thing is, you know, for this 140 00:10:10,828 --> 00:10:14,744 kind of purpose, don't use HKDF. That's not what it's designed for. What will 141 00:10:14,744 --> 00:10:18,657 happen is that the derived keys will actually be vulnerable to something called 142 00:10:18,657 --> 00:10:22,105 a dictionary attack, which we're gonna talk about much later in the course when 143 00:10:22,105 --> 00:10:27,787 we talk about user authentication. So, the way PBKDFs defend against this low entropy 144 00:10:27,787 --> 00:10:33,002 problem that results in a dictionary attack is by two means. First of all, as 145 00:10:33,002 --> 00:10:38,878 before they use a salt, a public, random value that's fixed forever. But in 146 00:10:38,878 --> 00:10:43,097 addition, they also use what's called a slow hash function. And let me describe 147 00:10:43,097 --> 00:10:49,355 kind of the standard approach to deriving keys from passwords. This is called PKCS#5, 148 00:10:49,355 --> 00:10:53,911 and in particular, the version I'll describe is what's called PBKDF1. And I 149 00:10:53,911 --> 00:10:57,398 should say that this mechanism is implemented in most crypto libraries so 150 00:10:57,398 --> 00:11:00,663 you shouldn't have to implement this yourself. All you would do, you know, you 151 00:11:00,663 --> 00:11:03,788 would call a function, you know, derived key from password. You would give the 152 00:11:03,788 --> 00:11:08,353 password in as input, and you would get a key as output. But you should be aware of 153 00:11:08,353 --> 00:11:11,741 course that this key is not going to have high entropy so in fact it will be 154 00:11:11,741 --> 00:11:17,155 guessable. What these PBKDFs try to do is make the guessing problem as hard as 155 00:11:17,155 --> 00:11:20,964 possible. Okay. So the way they work, first of all, is, as we said, they 156 00:11:20,964 --> 00:11:25,693 basically hash, the concatenation of the password and the salt. And then the hash 157 00:11:25,693 --> 00:11:29,170 itself is designed to be a very slow hash function. And the way we build a slow hash 158 00:11:29,170 --> 00:11:34,063 function is by taking one particular hash function, say, SHA-256, and we 159 00:11:34,063 --> 00:11:39,425 iterate it many, many times, C times. So you can imagine 1000 times, perhaps even a 160 00:11:39,425 --> 00:11:43,356 million times. And what do I mean by iterating it? So, well, we take the 161 00:11:43,356 --> 00:11:48,494 password and the salt. And we put them inside of one input to the hash function. 162 00:11:48,494 --> 00:11:54,025 And then we apply the hash function, oops, let me write it like this. And then we 163 00:11:54,025 --> 00:11:57,779 apply the hash function and we get an output, and then we apply the hash 164 00:11:57,779 --> 00:12:00,779 function again and we get another output. And we do this again and again and again 165 00:12:00,779 --> 00:12:05,830 maybe a thousand times or a million times depending on how fast your processors are 166 00:12:05,830 --> 00:12:10,710 and then finally we get the final output that we actually output as, as the key 167 00:12:10,710 --> 00:12:15,057 output of this key derivation function. Now what is the point here? Iterating a 168 00:12:15,057 --> 00:12:19,371 function 10,000 times or even a million times is going to take very little time on 169 00:12:19,371 --> 00:12:22,922 a modern CPU, and as a result, it doesn't really affect the user's experience. The 170 00:12:22,922 --> 00:12:27,792 user types in his password, it gets hashed a million times and then it gets output. 171 00:12:27,792 --> 00:12:31,654 And maybe that could even take, you know a tenth of a second and the user wouldn't 172 00:12:31,654 --> 00:12:35,489 even notice it. However the attacker, all he can do is he can try all the passwords 173 00:12:35,489 --> 00:12:39,689 in the dictionary, because we know people tend to pick passwords in dictionaries, 174 00:12:39,689 --> 00:12:43,780 and so he could just try them one by one, remember the SALT is public, so he knows 175 00:12:43,780 --> 00:12:49,232 what the SALT is. And so he can just try this hash one by one. However because the hash 176 00:12:49,232 --> 00:12:53,796 function is slow, each attempt is gonna take him a tenth of second. So if he needs 177 00:12:53,796 --> 00:12:57,965 to run through a dictionary, you know, with, with a 200 billion passwords in it, 178 00:12:57,965 --> 00:13:02,416 because the hash function is slow, this is gonna take quite awhile. And by doing 179 00:13:02,416 --> 00:13:06,584 that, we slow down the dictionary attack, and we make it harder for the attacker to 180 00:13:06,584 --> 00:13:11,608 get our session keys. Not impossible, just harder. That's all this is trying to do. 181 00:13:11,608 --> 00:13:15,748 Okay, so this is basically what I wanted to say about password based KDFs. As I 182 00:13:15,748 --> 00:13:19,835 said, this is not something you would build yourself. All crypto libraries have 183 00:13:19,835 --> 00:13:23,976 an implementation of a PKCS#5 mechanism. And you would just call the 184 00:13:23,976 --> 00:13:28,275 appropriate function to convert a password into a key, and then use the resulting 185 00:13:28,275 --> 00:13:32,362 key. Okay, in the next segment, we're gonna see how to use symmetric encryption 186 00:13:32,362 --> 00:13:35,229 in a way that allows us to search on the cipher texts.