1 00:00:01,100 --> 00:00:05,310 okay so again let's get started with 2 00:00:03,330 --> 00:00:06,750 today's lecture so today we're going to 3 00:00:05,310 --> 00:00:09,240 be talking about security and 4 00:00:06,750 --> 00:00:10,410 cryptography and today's lecture is 5 00:00:09,240 --> 00:00:12,780 gonna be a little bit different than our 6 00:00:10,410 --> 00:00:14,849 treatment of this topic in last year's 7 00:00:12,780 --> 00:00:16,619 class so last year we focused a little 8 00:00:14,849 --> 00:00:19,350 bit more on security and privacy and 9 00:00:16,619 --> 00:00:21,720 from the perspective of a user of a 10 00:00:19,350 --> 00:00:22,949 computer but today we're going to focus 11 00:00:21,720 --> 00:00:24,689 a little bit more on security and 12 00:00:22,949 --> 00:00:26,730 cryptography concepts that are relevant 13 00:00:24,689 --> 00:00:29,400 in understanding some of the tools that 14 00:00:26,730 --> 00:00:30,810 we talked about earlier in this class so 15 00:00:29,400 --> 00:00:32,250 for example we talked about hash 16 00:00:30,810 --> 00:00:34,800 functions or cryptographic hash 17 00:00:32,250 --> 00:00:37,829 functions like sha-1 in the get lecture 18 00:00:34,800 --> 00:00:39,239 or we talked about public keys when we 19 00:00:37,829 --> 00:00:41,010 talked about SSH in the command line 20 00:00:39,239 --> 00:00:43,260 environment in a command line 21 00:00:41,010 --> 00:00:44,610 environment lecture and so today we'll 22 00:00:43,260 --> 00:00:46,320 talk about there's different 23 00:00:44,610 --> 00:00:47,879 cryptographic primitives in more detail 24 00:00:46,320 --> 00:00:49,350 and get an understanding of how they 25 00:00:47,879 --> 00:00:50,579 work and how they're used in these 26 00:00:49,350 --> 00:00:53,460 different tools that we're teaching in 27 00:00:50,579 --> 00:00:55,890 this class this lecture is not a 28 00:00:53,460 --> 00:00:58,199 substitute for a more rigorous class in 29 00:00:55,890 --> 00:00:59,640 security so they're they're a bunch of 30 00:00:58,199 --> 00:01:01,050 really good classes at MIT like six 31 00:00:59,640 --> 00:01:03,660 eight five eight which is on computer 32 00:01:01,050 --> 00:01:06,420 system security or six eight five seven 33 00:01:03,660 --> 00:01:08,850 and six eight seven five which are more 34 00:01:06,420 --> 00:01:10,950 focused on cryptography so don't do 35 00:01:08,850 --> 00:01:13,380 security work without formal training in 36 00:01:10,950 --> 00:01:16,380 security from these classes or elsewhere 37 00:01:13,380 --> 00:01:17,580 and unless you're an expert don't roll 38 00:01:16,380 --> 00:01:19,860 your own crypto don't build your own 39 00:01:17,580 --> 00:01:21,960 crypto implementations or protocols and 40 00:01:19,860 --> 00:01:23,909 the same principle applies to computer 41 00:01:21,960 --> 00:01:25,560 system security this lecture is not 42 00:01:23,909 --> 00:01:27,869 about building your own stuff it's about 43 00:01:25,560 --> 00:01:29,369 understanding what's already out there 44 00:01:27,869 --> 00:01:31,290 and so this lecture will have a very 45 00:01:29,369 --> 00:01:32,820 informal but we think practical 46 00:01:31,290 --> 00:01:35,400 treatment of these basic cryptography 47 00:01:32,820 --> 00:01:36,659 concepts and yeah hopefully it'll help 48 00:01:35,400 --> 00:01:39,540 you understand some of the tools we 49 00:01:36,659 --> 00:01:40,590 talked about earlier in this class any 50 00:01:39,540 --> 00:01:45,000 questions about the plan for today's 51 00:01:40,590 --> 00:01:46,560 lecture great so the first topic for 52 00:01:45,000 --> 00:01:48,780 today is something called entropy 53 00:01:46,560 --> 00:01:51,119 entropy is a measure of randomness and 54 00:01:48,780 --> 00:01:52,380 this is useful for example when trying 55 00:01:51,119 --> 00:01:55,229 to determine the strength of a password 56 00:01:52,380 --> 00:01:58,290 so let's take a look at this comic from 57 00:01:55,229 --> 00:02:00,299 xkcd we're a big fan of xkcd comics so 58 00:01:58,290 --> 00:02:02,490 this comic raise your hand if you've 59 00:02:00,299 --> 00:02:05,159 seen this before know a good number of 60 00:02:02,490 --> 00:02:07,530 you so this comic is complaining about 61 00:02:05,159 --> 00:02:09,270 this common pattern that's been taught 62 00:02:07,530 --> 00:02:11,879 to users of computers that when you 63 00:02:09,270 --> 00:02:12,940 design passwords they should be things 64 00:02:11,879 --> 00:02:16,750 like that T 65 00:02:12,940 --> 00:02:19,420 our zero ub40 orm purse and three string 66 00:02:16,750 --> 00:02:21,040 in the top-left like we should design 67 00:02:19,420 --> 00:02:22,450 passwords that are full of funny 68 00:02:21,040 --> 00:02:25,210 characters and things like that to make 69 00:02:22,450 --> 00:02:26,620 it hard for attackers to guess and yet 70 00:02:25,210 --> 00:02:28,690 turns out that passwords like that are 71 00:02:26,620 --> 00:02:30,370 actually pretty weak and guessable by 72 00:02:28,690 --> 00:02:32,530 computers that can guess postures really 73 00:02:30,370 --> 00:02:34,660 fast and brute-force attacks and on the 74 00:02:32,530 --> 00:02:37,510 other hand passwords which maybe 75 00:02:34,660 --> 00:02:39,130 intuitively don't look as secure like 76 00:02:37,510 --> 00:02:41,620 the one on the bottom left correct horse 77 00:02:39,130 --> 00:02:45,100 battery staple that one turns out to be 78 00:02:41,620 --> 00:02:47,650 way more secure so how do I actually 79 00:02:45,100 --> 00:02:49,810 quantify the security of these different 80 00:02:47,650 --> 00:02:51,940 passwords it's by measuring the amount 81 00:02:49,810 --> 00:02:55,230 of randomness in the password how many 82 00:02:51,940 --> 00:02:57,700 bits of randomness are in there and so 83 00:02:55,230 --> 00:02:59,170 entropy is measured in bits this is like 84 00:02:57,700 --> 00:03:07,330 the same bits from information theory 85 00:02:59,170 --> 00:03:09,310 and we're only going to talk about the 86 00:03:07,330 --> 00:03:11,110 simple case where you're trying to 87 00:03:09,310 --> 00:03:12,610 measure the amount of randomness when 88 00:03:11,110 --> 00:03:15,550 you're choosing from a set of things 89 00:03:12,610 --> 00:03:17,320 uniformly at random so for example when 90 00:03:15,550 --> 00:03:19,780 you're constructing a password that's in 91 00:03:17,320 --> 00:03:22,120 the format of four random words you're 92 00:03:19,780 --> 00:03:24,310 kind of considering all possible 93 00:03:22,120 --> 00:03:25,720 sequences of four random words made from 94 00:03:24,310 --> 00:03:26,739 some dictionary you have you might have 95 00:03:25,720 --> 00:03:28,390 a dictionary would say a hundred 96 00:03:26,739 --> 00:03:31,209 thousand words and you're selecting each 97 00:03:28,390 --> 00:03:32,380 word uniformly at random how many 98 00:03:31,209 --> 00:03:33,730 possibilities are there 99 00:03:32,380 --> 00:03:35,830 well you can go and figure that out in 100 00:03:33,730 --> 00:03:37,870 that example once you know how many 101 00:03:35,830 --> 00:03:40,690 possibilities there the measure of 102 00:03:37,870 --> 00:03:49,390 entropy is log base 2 of the number of 103 00:03:40,690 --> 00:03:51,459 possibilities and as that comic suggests 104 00:03:49,390 --> 00:03:52,900 this is related to how long it'll take 105 00:03:51,459 --> 00:03:54,970 an attacker to try to brute-force 106 00:03:52,900 --> 00:03:56,590 through your different passwords like if 107 00:03:54,970 --> 00:03:57,820 you have a thousand possibilities you're 108 00:03:56,590 --> 00:03:59,650 guessing passwords at a thousand 109 00:03:57,820 --> 00:04:04,300 passwords a second that's not a very 110 00:03:59,650 --> 00:04:07,840 good password so this is a couple of 111 00:04:04,300 --> 00:04:09,340 quick examples a coin flip has two 112 00:04:07,840 --> 00:04:15,280 possibilities and let's assume we have a 113 00:04:09,340 --> 00:04:19,650 fair coin and so a coin flip has log 114 00:04:15,280 --> 00:04:21,489 base 2 of 2 is one bit of entropy and 115 00:04:19,650 --> 00:04:24,880 another thing we might look at is 116 00:04:21,489 --> 00:04:26,540 something like a dice roll so there's 117 00:04:24,880 --> 00:04:32,510 six possibilities and log 118 00:04:26,540 --> 00:04:33,640 two of six is something like 2.6 bits of 119 00:04:32,510 --> 00:04:36,380 entropy 120 00:04:33,640 --> 00:04:40,100 so that's how we quantify the amount of 121 00:04:36,380 --> 00:04:41,930 randomness in something so now going 122 00:04:40,100 --> 00:04:43,400 back to that example in the xkcd comic 123 00:04:41,930 --> 00:04:44,690 when we want to figure out how much 124 00:04:43,400 --> 00:04:46,670 entropy is in a password we have to 125 00:04:44,690 --> 00:04:48,680 consider and if the model for how the 126 00:04:46,670 --> 00:04:51,110 password was generated for example in 127 00:04:48,680 --> 00:04:53,450 the top left you could consider okay we 128 00:04:51,110 --> 00:04:55,310 take one dictionary word make some 129 00:04:53,450 --> 00:04:57,650 substitutions of some of the characters 130 00:04:55,310 --> 00:04:59,750 with numbers that look similar to that 131 00:04:57,650 --> 00:05:01,430 character add one punctuation mark at 132 00:04:59,750 --> 00:05:04,340 the end and add one numeral after that 133 00:05:01,430 --> 00:05:05,630 and we can take that model and then use 134 00:05:04,340 --> 00:05:07,400 common rhetoric to figure out how many 135 00:05:05,630 --> 00:05:09,260 possibilities there are and from that we 136 00:05:07,400 --> 00:05:10,550 can derive how many bits of entropy are 137 00:05:09,260 --> 00:05:13,130 in that password so in that particular 138 00:05:10,550 --> 00:05:14,180 example I don't know exactly what model 139 00:05:13,130 --> 00:05:16,750 they were using for the password but 140 00:05:14,180 --> 00:05:19,280 they calculated their 28 bits of entropy 141 00:05:16,750 --> 00:05:22,250 whereas in the bottom-left example that 142 00:05:19,280 --> 00:05:23,960 correct horse battery staple they assume 143 00:05:22,250 --> 00:05:26,630 that you're working from a dictionary of 144 00:05:23,960 --> 00:05:28,850 about 2,000 words and so when you 145 00:05:26,630 --> 00:05:30,770 combine four of those words together you 146 00:05:28,850 --> 00:05:31,850 get about 44 bits of entropy from that 147 00:05:30,770 --> 00:05:34,670 so it's much more secure than the 148 00:05:31,850 --> 00:05:36,620 example before it so any questions about 149 00:05:34,670 --> 00:05:43,520 this definition of entropy or why it's 150 00:05:36,620 --> 00:05:45,620 useful and so when you're generating 151 00:05:43,520 --> 00:05:47,060 your own passwords keep this in mind you 152 00:05:45,620 --> 00:05:48,800 want a high entropy password and the 153 00:05:47,060 --> 00:05:50,360 exact number you need depends on exactly 154 00:05:48,800 --> 00:05:52,190 what you're trying to protect against 155 00:05:50,360 --> 00:05:53,540 like in general a concept in securities 156 00:05:52,190 --> 00:05:55,700 you have to keep in mind what your 157 00:05:53,540 --> 00:05:56,930 threat model is like what attackers 158 00:05:55,700 --> 00:05:58,460 you're concerned about what kinds of 159 00:05:56,930 --> 00:06:01,160 technique the attackers might be using 160 00:05:58,460 --> 00:06:02,930 for example this comic refers to an 161 00:06:01,160 --> 00:06:04,580 attacker that can guess a thousand 162 00:06:02,930 --> 00:06:07,100 passwords a second this might be 163 00:06:04,580 --> 00:06:09,140 something that's possible for say a web 164 00:06:07,100 --> 00:06:11,480 service that allows people to try to log 165 00:06:09,140 --> 00:06:12,830 in with your email and then random 166 00:06:11,480 --> 00:06:15,430 passwords that the attacker is trying 167 00:06:12,830 --> 00:06:17,600 but this thousand passwords the second 168 00:06:15,430 --> 00:06:19,970 model might not be accurate for other 169 00:06:17,600 --> 00:06:21,590 scenarios for example an offline 170 00:06:19,970 --> 00:06:23,870 password cracking scenario or maybe the 171 00:06:21,590 --> 00:06:25,790 attacker has broken into a website and 172 00:06:23,870 --> 00:06:27,920 downloaded their database and they have 173 00:06:25,790 --> 00:06:28,700 some obfuscated form of your password 174 00:06:27,920 --> 00:06:30,560 and they're trying to figure out what 175 00:06:28,700 --> 00:06:31,880 the password is maybe they can paralyze 176 00:06:30,560 --> 00:06:34,010 this attack and make it go to million 177 00:06:31,880 --> 00:06:35,510 guesses a second and so exactly how much 178 00:06:34,010 --> 00:06:37,210 entropy you need depends on exactly what 179 00:06:35,510 --> 00:06:39,350 you're trying to protect against but 180 00:06:37,210 --> 00:06:40,230 roughly forty bits of entropy might be 181 00:06:39,350 --> 00:06:42,660 good enough for 182 00:06:40,230 --> 00:06:44,370 which is protected by a website and 183 00:06:42,660 --> 00:06:47,340 you're concerned about online password 184 00:06:44,370 --> 00:06:49,140 guesses and then maybe something like 80 185 00:06:47,340 --> 00:06:51,000 bits of entropy might be good if you're 186 00:06:49,140 --> 00:06:52,530 concerned about offline attacks and you 187 00:06:51,000 --> 00:06:58,260 want to be really really secure so 188 00:06:52,530 --> 00:07:00,390 they're rough guidelines you can use and 189 00:06:58,260 --> 00:07:02,370 then how do you actually generate strong 190 00:07:00,390 --> 00:07:03,990 passwords well you have some model for 191 00:07:02,370 --> 00:07:05,340 password for example the for dictionary 192 00:07:03,990 --> 00:07:07,320 works thing and you can actually get a 193 00:07:05,340 --> 00:07:08,700 dictionary and then you can use methods 194 00:07:07,320 --> 00:07:10,680 like dice where so there's a song we 195 00:07:08,700 --> 00:07:12,840 linked to in the lecture notes where you 196 00:07:10,680 --> 00:07:14,460 can actually get physical dye and roll 197 00:07:12,840 --> 00:07:15,900 them and then map dice rolls to 198 00:07:14,460 --> 00:07:17,730 dictionary words in order to eventually 199 00:07:15,900 --> 00:07:19,290 turn that into a password and doing 200 00:07:17,730 --> 00:07:21,540 something like this using some kind of 201 00:07:19,290 --> 00:07:24,150 physical token that you know is random 202 00:07:21,540 --> 00:07:26,280 like a balanced die or a coin that you 203 00:07:24,150 --> 00:07:27,870 know is balanced is a good thing to do 204 00:07:26,280 --> 00:07:29,610 because humans are actually not good at 205 00:07:27,870 --> 00:07:31,200 choosing random numbers right if I just 206 00:07:29,610 --> 00:07:32,940 asked you to name a random number for 1 207 00:07:31,200 --> 00:07:34,710 to 100 chances are that you're probably 208 00:07:32,940 --> 00:07:35,790 not doing so uniformly at random very 209 00:07:34,710 --> 00:07:37,200 well and so that's why it's actually 210 00:07:35,790 --> 00:07:42,300 good to use these physical tokens in 211 00:07:37,200 --> 00:07:44,610 order to produce randomness so entropy 212 00:07:42,300 --> 00:07:49,280 that's our first concept recovering any 213 00:07:44,610 --> 00:07:52,080 questions about that or about this comic 214 00:07:49,280 --> 00:07:54,600 great so getting into slightly more 215 00:07:52,080 --> 00:07:55,860 interesting and complicated topics the 216 00:07:54,600 --> 00:07:58,590 next thing we're going to talk about is 217 00:07:55,860 --> 00:08:00,510 hash functions so hopefully most of you 218 00:07:58,590 --> 00:08:02,280 were here during the get lecture where 219 00:08:00,510 --> 00:08:04,680 we talked about the sha-1 hash function 220 00:08:02,280 --> 00:08:12,150 used in get so now going into that topic 221 00:08:04,680 --> 00:08:14,100 in a little bit more detail hash 222 00:08:12,150 --> 00:08:16,620 functions at a high level are functions 223 00:08:14,100 --> 00:08:20,220 that map a variable amount of data into 224 00:08:16,620 --> 00:08:22,950 a fixed size output so for example the 225 00:08:20,220 --> 00:08:25,350 sha-1 hash functions is one example of a 226 00:08:22,950 --> 00:08:30,230 hash function takes in some input of 227 00:08:25,350 --> 00:08:35,340 some number of bytes and outputs exactly 228 00:08:30,230 --> 00:08:37,320 160 bits of output so that's kind of the 229 00:08:35,340 --> 00:08:39,570 type signature of this particular hash 230 00:08:37,320 --> 00:08:41,040 function and then these functions have 231 00:08:39,570 --> 00:08:42,690 some number of properties that are 232 00:08:41,040 --> 00:08:46,710 useful so at a high level 233 00:08:42,690 --> 00:08:47,820 these can be thought about as hard to 234 00:08:46,710 --> 00:08:50,010 invert functions that have 235 00:08:47,820 --> 00:08:53,190 random-looking outputs we can actually 236 00:08:50,010 --> 00:08:54,080 try this out on some random piece of 237 00:08:53,190 --> 00:08:56,930 data 238 00:08:54,080 --> 00:08:59,730 for example if I enter into my terminal 239 00:08:56,930 --> 00:09:02,130 printf hello this does exactly what you 240 00:08:59,730 --> 00:09:04,020 would expect it does prints the set to 241 00:09:02,130 --> 00:09:07,380 standard out and I can pipe this to the 242 00:09:04,020 --> 00:09:09,090 sha-1 sum command so this is a command 243 00:09:07,380 --> 00:09:11,700 line program that accepts input via 244 00:09:09,090 --> 00:09:13,170 standard in and computes this sha-1 245 00:09:11,700 --> 00:09:14,340 function which takes in some variable 246 00:09:13,170 --> 00:09:17,400 number of bytes from the input and 247 00:09:14,340 --> 00:09:20,130 produces a 160-bit output which in this 248 00:09:17,400 --> 00:09:21,720 particular case is represent or encoded 249 00:09:20,130 --> 00:09:24,000 as a hexadecimal string so it's a length 250 00:09:21,720 --> 00:09:26,130 40 hexadecimal string and you see this 251 00:09:24,000 --> 00:09:27,510 output right here this - just means it 252 00:09:26,130 --> 00:09:30,660 took it it took its input from 253 00:09:27,510 --> 00:09:32,400 standardin so this output just looks 254 00:09:30,660 --> 00:09:33,650 like some random number but one 255 00:09:32,400 --> 00:09:37,050 important thing is that this is a 256 00:09:33,650 --> 00:09:39,240 deterministic number if you try the same 257 00:09:37,050 --> 00:09:40,680 command on your own computer printf 258 00:09:39,240 --> 00:09:42,990 hello sha-1 something you will get the 259 00:09:40,680 --> 00:09:44,340 same number out so sha-1 is some 260 00:09:42,990 --> 00:09:47,190 well-known function that people have 261 00:09:44,340 --> 00:09:48,540 agreed upon for all its parameters we'll 262 00:09:47,190 --> 00:09:51,900 see that if we tweak the input a little 263 00:09:48,540 --> 00:09:54,390 bit like say changed hello to holo with 264 00:09:51,900 --> 00:09:55,920 a capital H now I get a completely 265 00:09:54,390 --> 00:09:57,540 different looking output and this also 266 00:09:55,920 --> 00:09:58,860 looks like some other kind of random ish 267 00:09:57,540 --> 00:10:00,090 number even though it is deterministic 268 00:09:58,860 --> 00:10:07,740 and you could reproduce this on your own 269 00:10:00,090 --> 00:10:16,140 computer hash functions have a number of 270 00:10:07,740 --> 00:10:17,610 properties that are pretty important the 271 00:10:16,140 --> 00:10:18,810 first property that cryptographic hash 272 00:10:17,610 --> 00:10:21,060 functions have is that their 273 00:10:18,810 --> 00:10:22,440 non-invertible and what that means is 274 00:10:21,060 --> 00:10:25,230 that if you take the output from this 275 00:10:22,440 --> 00:10:27,720 function for example that a a f4 276 00:10:25,230 --> 00:10:30,390 ballaugh 3 for D strings shown there 277 00:10:27,720 --> 00:10:32,850 from that output it's hard to figure out 278 00:10:30,390 --> 00:10:36,510 what the input was that produced that 279 00:10:32,850 --> 00:10:38,310 output so you can go one way compute the 280 00:10:36,510 --> 00:10:41,190 sha-1 hash easily but you can't go 281 00:10:38,310 --> 00:10:43,380 backwards another property that these 282 00:10:41,190 --> 00:10:50,910 functions have is that their collision 283 00:10:43,380 --> 00:10:53,400 resistant and what this property means 284 00:10:50,910 --> 00:10:56,780 is that it's hard to find two different 285 00:10:53,400 --> 00:10:59,910 inputs that produce the same output and 286 00:10:56,780 --> 00:11:02,460 so this basically describes what a 287 00:10:59,910 --> 00:11:04,040 cryptographic hash function is so any 288 00:11:02,460 --> 00:11:06,510 questions about the kind of 289 00:11:04,040 --> 00:11:08,900 specification of a cryptographic hash 290 00:11:06,510 --> 00:11:08,900 function 291 00:11:09,350 --> 00:11:13,380 okay so what are these hash functions 292 00:11:11,580 --> 00:11:16,170 actually useful for well we've already 293 00:11:13,380 --> 00:11:19,260 seen one application in git for content 294 00:11:16,170 --> 00:11:22,070 address storage so we want it get we 295 00:11:19,260 --> 00:11:24,180 want some uniform way of naming 296 00:11:22,070 --> 00:11:26,130 different objects that are in the object 297 00:11:24,180 --> 00:11:27,900 store and it turns out that get 298 00:11:26,130 --> 00:11:29,940 addresses all of them by their sha-1 299 00:11:27,900 --> 00:11:33,000 hash so you have the actual data you 300 00:11:29,940 --> 00:11:34,830 want to store and then to name that 301 00:11:33,000 --> 00:11:36,420 particular piece of data you just name 302 00:11:34,830 --> 00:11:37,650 the sha-1 hash and all of that is stored 303 00:11:36,420 --> 00:11:41,820 in the object store in that particular 304 00:11:37,650 --> 00:11:43,230 way we see this when looking at many 305 00:11:41,820 --> 00:11:45,030 different parts of git for example right 306 00:11:43,230 --> 00:11:47,520 here I'm going to get repository if I do 307 00:11:45,030 --> 00:11:50,610 get log it shows me the commits and for 308 00:11:47,520 --> 00:11:53,280 example this number up here is the 309 00:11:50,610 --> 00:11:55,290 cryptographic hash function sha-1 apply 310 00:11:53,280 --> 00:11:58,470 to the commit object that describes this 311 00:11:55,290 --> 00:12:00,360 particular commit so does anybody know 312 00:11:58,470 --> 00:12:02,370 why git uses a cryptographic hash 313 00:12:00,360 --> 00:12:03,480 function here as opposed to so you might 314 00:12:02,370 --> 00:12:04,830 have heard in your other computer 315 00:12:03,480 --> 00:12:06,570 science classes like say your 316 00:12:04,830 --> 00:12:08,850 introductory algorithms class there are 317 00:12:06,570 --> 00:12:11,010 things called hash functions without the 318 00:12:08,850 --> 00:12:13,740 word cryptographic appended in front of 319 00:12:11,010 --> 00:12:16,080 them and they have similar properties 320 00:12:13,740 --> 00:12:18,570 that they turn a variable sized input 321 00:12:16,080 --> 00:12:21,300 into some fixed size output but they 322 00:12:18,570 --> 00:12:23,400 don't quite have these properties where 323 00:12:21,300 --> 00:12:24,810 it's hard to find an input that produces 324 00:12:23,400 --> 00:12:26,820 a particular output or things like that 325 00:12:24,810 --> 00:12:29,010 it's a kind of weaker definition than 326 00:12:26,820 --> 00:12:29,880 this so why is it that in get we care 327 00:12:29,010 --> 00:12:32,100 about having a cryptographic hash 328 00:12:29,880 --> 00:12:33,810 function as opposed to just a regular 329 00:12:32,100 --> 00:12:36,080 old hash function does anybody have any 330 00:12:33,810 --> 00:12:36,080 ideas 331 00:12:45,390 --> 00:12:52,180 yeah that's that's basically it that we 332 00:12:49,180 --> 00:12:54,010 don't want to have kind of conflicts in 333 00:12:52,180 --> 00:12:55,660 the output from this hash function like 334 00:12:54,010 --> 00:12:57,700 every commit is identified by a hash 335 00:12:55,660 --> 00:12:59,770 function every file is identified by the 336 00:12:57,700 --> 00:13:01,000 hash of that file if it were ever the 337 00:12:59,770 --> 00:13:03,190 case that two different pieces of 338 00:13:01,000 --> 00:13:05,620 content in practice produce the same 339 00:13:03,190 --> 00:13:07,600 output that is if the function were not 340 00:13:05,620 --> 00:13:09,610 collision resistant that could be really 341 00:13:07,600 --> 00:13:12,010 problematic right because then you and I 342 00:13:09,610 --> 00:13:13,990 we could have to do to get repos that we 343 00:13:12,010 --> 00:13:15,760 think are the same we check out the same 344 00:13:13,990 --> 00:13:18,460 commit hash and we might end up with 345 00:13:15,760 --> 00:13:21,400 different files and this is concerning 346 00:13:18,460 --> 00:13:23,380 because git is used to track software a 347 00:13:21,400 --> 00:13:25,510 track development of software and it's 348 00:13:23,380 --> 00:13:28,180 also kind of involved in making sure 349 00:13:25,510 --> 00:13:29,440 that the right people are authoring the 350 00:13:28,180 --> 00:13:30,700 software nothing funny has happened in 351 00:13:29,440 --> 00:13:32,620 the process for example there all these 352 00:13:30,700 --> 00:13:34,540 open source projects like the Linux 353 00:13:32,620 --> 00:13:37,150 kernel where development is done using 354 00:13:34,540 --> 00:13:39,610 git it would be really bad if some 355 00:13:37,150 --> 00:13:41,290 contributor to get could say edit some 356 00:13:39,610 --> 00:13:42,970 file and propose some change that looks 357 00:13:41,290 --> 00:13:44,980 pretty benign like oh let me go and 358 00:13:42,970 --> 00:13:46,720 improve this part of Linux submit that 359 00:13:44,980 --> 00:13:48,970 change request to the Linux developers 360 00:13:46,720 --> 00:13:52,120 and then in practice actually supply a 361 00:13:48,970 --> 00:13:54,220 git repository that has the same commit 362 00:13:52,120 --> 00:13:55,570 hash and whatnot but actually the file 363 00:13:54,220 --> 00:13:59,080 contents are different there's something 364 00:13:55,570 --> 00:14:00,730 malicious so git actually relies on this 365 00:13:59,080 --> 00:14:03,190 sha-1 function being a cryptographic 366 00:14:00,730 --> 00:14:10,150 hash function in order to achieve 367 00:14:03,190 --> 00:14:11,680 security any questions about that and 368 00:14:10,150 --> 00:14:13,810 some other interesting applications of 369 00:14:11,680 --> 00:14:15,670 hash functions so as we saw hash 370 00:14:13,810 --> 00:14:17,950 functions turn big inputs into small 371 00:14:15,670 --> 00:14:19,750 outputs and in a way because the hash 372 00:14:17,950 --> 00:14:21,670 function is collision resistant the 373 00:14:19,750 --> 00:14:24,430 output can be used to kind of attest to 374 00:14:21,670 --> 00:14:27,040 or identify the input and so you can 375 00:14:24,430 --> 00:14:30,130 think of a hash as a short summary of a 376 00:14:27,040 --> 00:14:32,020 file for example in this directory of a 377 00:14:30,130 --> 00:14:34,780 bunch of files and I can compute the 378 00:14:32,020 --> 00:14:37,630 sha-1 sum of some file in this directory 379 00:14:34,780 --> 00:14:40,660 and this is the sha-1 algorithm applied 380 00:14:37,630 --> 00:14:41,770 to this readme MD file and what's 381 00:14:40,660 --> 00:14:43,810 interesting is that it is 382 00:14:41,770 --> 00:14:44,800 computationally hard or like impossible 383 00:14:43,810 --> 00:14:47,650 you can kind of think of it as 384 00:14:44,800 --> 00:14:50,230 impossible to find any other file so a 385 00:14:47,650 --> 00:14:52,870 different file that has the same hash 386 00:14:50,230 --> 00:14:55,030 output and one scenario in which this is 387 00:14:52,870 --> 00:14:56,090 useful is when you download files from 388 00:14:55,030 --> 00:14:59,220 the internet 389 00:14:56,090 --> 00:15:01,410 for example there are lots of Linux 390 00:14:59,220 --> 00:15:03,630 distributions that distribute large CD 391 00:15:01,410 --> 00:15:05,250 or DVD images from their website like I 392 00:15:03,630 --> 00:15:07,860 can go to Debian org and download the 393 00:15:05,250 --> 00:15:09,510 latest version of Debian the thing is 394 00:15:07,860 --> 00:15:10,950 that hosting those files can be 395 00:15:09,510 --> 00:15:12,420 expensive and so a lot of people are 396 00:15:10,950 --> 00:15:14,490 nice enough to host mirrors of these 397 00:15:12,420 --> 00:15:17,520 files so instead of downloading Debian 398 00:15:14,490 --> 00:15:20,310 from Debian org I can go to one of many 399 00:15:17,520 --> 00:15:21,750 other sites and download what are 400 00:15:20,310 --> 00:15:23,580 supposed to be the same files that are 401 00:15:21,750 --> 00:15:25,490 hosted at Debian org but how do I know 402 00:15:23,580 --> 00:15:28,920 that I actually got the correct file 403 00:15:25,490 --> 00:15:30,780 like what if I set up a malicious mirror 404 00:15:28,920 --> 00:15:33,180 and you go to like Anisha is evil Debian 405 00:15:30,780 --> 00:15:35,190 website calm and then try to download 406 00:15:33,180 --> 00:15:37,290 Debian turns out that your Linux 407 00:15:35,190 --> 00:15:38,880 installation is backdoored well one 408 00:15:37,290 --> 00:15:40,620 thing you could do is download a copy 409 00:15:38,880 --> 00:15:42,060 from the original double-unit website 410 00:15:40,620 --> 00:15:43,440 and then download my version and compare 411 00:15:42,060 --> 00:15:44,550 them but that kind of defeats the 412 00:15:43,440 --> 00:15:46,110 purpose right because we want to avoid 413 00:15:44,550 --> 00:15:47,580 downloading things from Debian org 414 00:15:46,110 --> 00:15:49,080 because hosting these files is expensive 415 00:15:47,580 --> 00:15:50,490 and we want all these different people 416 00:15:49,080 --> 00:15:53,660 to be able to mirror copies of the files 417 00:15:50,490 --> 00:15:55,770 elsewhere so does anybody see how 418 00:15:53,660 --> 00:15:57,630 cryptographic hash functions could be 419 00:15:55,770 --> 00:15:59,190 useful to solve this problem that I want 420 00:15:57,630 --> 00:16:02,970 to download a file from an untrusted 421 00:15:59,190 --> 00:16:04,950 source but and not from like the trusted 422 00:16:02,970 --> 00:16:06,210 source itself but maybe I can get some 423 00:16:04,950 --> 00:16:07,920 small piece of information from this 424 00:16:06,210 --> 00:16:09,660 trusted source in order to know whether 425 00:16:07,920 --> 00:16:11,460 the file I downloaded from the untrusted 426 00:16:09,660 --> 00:16:18,060 source is the thing I was supposed to 427 00:16:11,460 --> 00:16:19,200 get yes like it's basically just a 428 00:16:18,060 --> 00:16:20,610 straightforward application of 429 00:16:19,200 --> 00:16:22,980 cryptographic hash functions 430 00:16:20,610 --> 00:16:25,620 so what Debian org can do is they can 431 00:16:22,980 --> 00:16:27,870 produce their kind of correct ISO file 432 00:16:25,620 --> 00:16:29,730 or whatever they want and instead of 433 00:16:27,870 --> 00:16:32,550 publishing the file itself on their 434 00:16:29,730 --> 00:16:35,190 website they can publish a hash of that 435 00:16:32,550 --> 00:16:37,140 file so compared to the file itself 436 00:16:35,190 --> 00:16:39,390 which may be many gigabytes this is only 437 00:16:37,140 --> 00:16:41,190 like in this particular case 160 bits of 438 00:16:39,390 --> 00:16:43,620 data right so very cheap to host and 439 00:16:41,190 --> 00:16:46,260 then what I can do is a user is I can 440 00:16:43,620 --> 00:16:48,090 download that file from any random 441 00:16:46,260 --> 00:16:50,460 website it could be an untrusted website 442 00:16:48,090 --> 00:16:53,780 and after I download I just double check 443 00:16:50,460 --> 00:16:56,040 the sha-1 hash and if the hash matches 444 00:16:53,780 --> 00:16:57,330 then I know that I have the right file 445 00:16:56,040 --> 00:17:00,300 because it's computationally infeasible 446 00:16:57,330 --> 00:17:01,980 for somebody to give me some different 447 00:17:00,300 --> 00:17:04,260 file that happens to have the same hash 448 00:17:01,980 --> 00:17:07,230 because hash functions are collision 449 00:17:04,260 --> 00:17:10,130 resistant so any questions about that 450 00:17:07,230 --> 00:17:10,130 application yeah 451 00:17:18,170 --> 00:17:22,350 yeah so that's a good question like why 452 00:17:20,490 --> 00:17:24,120 do you need different people to host the 453 00:17:22,350 --> 00:17:26,100 information like wouldn't it be equally 454 00:17:24,120 --> 00:17:27,000 expensive for everybody so the answer is 455 00:17:26,100 --> 00:17:29,010 that question is a little bit 456 00:17:27,000 --> 00:17:30,840 complicated but like here's that here's 457 00:17:29,010 --> 00:17:32,970 a partial answer one thing is that 458 00:17:30,840 --> 00:17:34,500 downloading files from a server is 459 00:17:32,970 --> 00:17:36,360 affected by how far away the server is 460 00:17:34,500 --> 00:17:38,730 from you so for example if the servers 461 00:17:36,360 --> 00:17:40,830 in Massachusetts and you're in say China 462 00:17:38,730 --> 00:17:42,030 like you have to kind of make a big 463 00:17:40,830 --> 00:17:43,830 round trip across the internet and that 464 00:17:42,030 --> 00:17:46,200 may be expensive for a number of reasons 465 00:17:43,830 --> 00:17:47,700 like the latency is high and the traffic 466 00:17:46,200 --> 00:17:48,990 traffic needs to go through kind of lots 467 00:17:47,700 --> 00:17:50,610 of different wires to make its way all 468 00:17:48,990 --> 00:17:52,680 the way to where you are and so one 469 00:17:50,610 --> 00:17:54,180 thing that these websites do is that 470 00:17:52,680 --> 00:17:55,710 they distribute their content to servers 471 00:17:54,180 --> 00:17:57,060 that are all over the world and then as 472 00:17:55,710 --> 00:17:58,500 a user you download from the server 473 00:17:57,060 --> 00:18:00,960 that's closest to you like for example 474 00:17:58,500 --> 00:18:02,490 mit maintains a Debian package 475 00:18:00,960 --> 00:18:04,560 repository and like kind of mirrors all 476 00:18:02,490 --> 00:18:07,230 the Debbie and stuff so if you're a 477 00:18:04,560 --> 00:18:10,260 Debian user at MIT you can use the MIT 478 00:18:07,230 --> 00:18:12,060 copy of everything and then you can kind 479 00:18:10,260 --> 00:18:14,100 of access it over our fast local network 480 00:18:12,060 --> 00:18:15,690 and that traffic never needs to go to 481 00:18:14,100 --> 00:18:18,750 the outside Internet at all so it's very 482 00:18:15,690 --> 00:18:23,130 fast that's a good question any other 483 00:18:18,750 --> 00:18:24,180 questions ok and then one final kind of 484 00:18:23,130 --> 00:18:25,530 interesting application of hash 485 00:18:24,180 --> 00:18:28,650 functions is something called a 486 00:18:25,530 --> 00:18:30,120 commitment scheme so I want to play a 487 00:18:28,650 --> 00:18:31,590 game and I need a volunteer for this so 488 00:18:30,120 --> 00:18:32,850 you don't actually need to get up from 489 00:18:31,590 --> 00:18:34,830 your seat or anything I was need you to 490 00:18:32,850 --> 00:18:36,450 talk with me so any volunteers raise 491 00:18:34,830 --> 00:18:37,730 your hand yeah okay yeah what's your 492 00:18:36,450 --> 00:18:40,650 name 493 00:18:37,730 --> 00:18:42,360 Abdul Aziz okay great so Abdul Aziz 494 00:18:40,650 --> 00:18:44,250 we're going to play a game we're going 495 00:18:42,360 --> 00:18:46,140 to play a game where I'm going to flip a 496 00:18:44,250 --> 00:18:48,180 coin and then you're gonna call heads or 497 00:18:46,140 --> 00:18:49,860 tails and if you call it right you win 498 00:18:48,180 --> 00:18:52,140 and if you call it wrong you lose and 499 00:18:49,860 --> 00:18:56,220 there are no stakes for this game but 500 00:18:52,140 --> 00:18:57,660 just the pride of winning so sadly I 501 00:18:56,220 --> 00:18:58,950 checked my wallet and all I have is 502 00:18:57,660 --> 00:19:00,570 dollar bills I don't have any coins with 503 00:18:58,950 --> 00:19:02,190 me so instead I'm going to just flip the 504 00:19:00,570 --> 00:19:04,950 coin in my head all right 505 00:19:02,190 --> 00:19:07,740 so okay I flip the coin call heads or 506 00:19:04,950 --> 00:19:13,680 tails sorry you lost it was heads I 507 00:19:07,740 --> 00:19:15,420 don't I play again yeah I can cheat 508 00:19:13,680 --> 00:19:17,280 right I can just see what you say and 509 00:19:15,420 --> 00:19:19,950 say the opposite thing so let's try 510 00:19:17,280 --> 00:19:22,260 fixing this game how about you call 511 00:19:19,950 --> 00:19:26,250 heads or tails after I say what the 512 00:19:22,260 --> 00:19:27,510 flip result was okay yeah so if I say oh 513 00:19:26,250 --> 00:19:35,220 the result is tails what are you gonna 514 00:19:27,510 --> 00:19:39,030 say are you call tails yeah so is it 515 00:19:35,220 --> 00:19:40,890 possible to play this guess what guess 516 00:19:39,030 --> 00:19:43,020 what the coin flip result is game in a 517 00:19:40,890 --> 00:19:44,550 fair way without having a physical coin 518 00:19:43,020 --> 00:19:45,660 that we share like because I can't 519 00:19:44,550 --> 00:19:47,160 really manipulate your physical reality 520 00:19:45,660 --> 00:19:48,570 if I flip a coin in front of you 521 00:19:47,160 --> 00:19:50,820 probably trust that it's okay right 522 00:19:48,570 --> 00:19:51,990 so it turns out that hash functions give 523 00:19:50,820 --> 00:19:54,720 us a kind of cool way to solve this 524 00:19:51,990 --> 00:19:57,990 problem to through a idea called a 525 00:19:54,720 --> 00:19:59,370 commitment scheme so I can say they're 526 00:19:57,990 --> 00:20:02,670 like here's the construction of the 527 00:19:59,370 --> 00:20:05,250 solution I can pick heads or tails and 528 00:20:02,670 --> 00:20:09,390 I'm actually going to pick a big random 529 00:20:05,250 --> 00:20:14,370 number say like this number here and 530 00:20:09,390 --> 00:20:16,830 what I can do is compute the sha-1 sum 531 00:20:14,370 --> 00:20:17,910 of this number at this moment you 532 00:20:16,830 --> 00:20:19,980 haven't seen this number yet I'm just 533 00:20:17,910 --> 00:20:22,680 doing all this in my head and then what 534 00:20:19,980 --> 00:20:25,170 I do is I tell you okay I flipped a coin 535 00:20:22,680 --> 00:20:26,340 and I'm not going to tell you what the 536 00:20:25,170 --> 00:20:28,560 result is just yet because you haven't 537 00:20:26,340 --> 00:20:29,850 called heads or tails but I'll tell you 538 00:20:28,560 --> 00:20:31,770 what the shell wants some of the result 539 00:20:29,850 --> 00:20:34,230 is here you go and I tell you this value 540 00:20:31,770 --> 00:20:36,000 now after this you can call heads or 541 00:20:34,230 --> 00:20:38,310 tails so what do you say like say say 542 00:20:36,000 --> 00:20:40,080 heads afterwards what I can do is I can 543 00:20:38,310 --> 00:20:42,000 reveal to you what my input to this 544 00:20:40,080 --> 00:20:43,500 function was and then you can 545 00:20:42,000 --> 00:20:45,480 cross-check this right you can compute 546 00:20:43,500 --> 00:20:47,040 the sha-1 sum on the input to verify 547 00:20:45,480 --> 00:20:48,930 that the output is what I said it was 548 00:20:47,040 --> 00:20:50,760 earlier and then we can have some way of 549 00:20:48,930 --> 00:20:52,500 mapping these numbers to heads or tails 550 00:20:50,760 --> 00:20:54,360 so I might have agreed upon beforehand 551 00:20:52,500 --> 00:20:56,640 that even numbers are heads and odd 552 00:20:54,360 --> 00:20:58,080 numbers or tails and so this is a way of 553 00:20:56,640 --> 00:21:01,200 fixing that game so we can actually play 554 00:20:58,080 --> 00:21:03,600 this game in in our heads right I can 555 00:21:01,200 --> 00:21:05,970 pick a value but not reveal that value 556 00:21:03,600 --> 00:21:07,350 to you but I can commit to the value so 557 00:21:05,970 --> 00:21:09,390 this is a kind of binding commitment 558 00:21:07,350 --> 00:21:11,310 scheme that I can't change my mind after 559 00:21:09,390 --> 00:21:14,220 I've told you this but it doesn't reveal 560 00:21:11,310 --> 00:21:15,480 the original value to you and so this is 561 00:21:14,220 --> 00:21:17,520 one other neat application of 562 00:21:15,480 --> 00:21:18,330 cryptographic hash functions so any 563 00:21:17,520 --> 00:21:24,030 questions about this particular 564 00:21:18,330 --> 00:21:26,790 construction okay great so moving on to 565 00:21:24,030 --> 00:21:29,710 the next topic we're going to talk about 566 00:21:26,790 --> 00:21:35,369 key derivation functions 567 00:21:29,710 --> 00:21:35,369 [Applause] 568 00:21:38,650 --> 00:21:46,850 often abbreviate it as KDF so this is a 569 00:21:45,350 --> 00:21:49,580 concept that's very similar to hash 570 00:21:46,850 --> 00:21:51,710 functions except it has kind of one 571 00:21:49,580 --> 00:21:54,740 extra property that it is slow to 572 00:21:51,710 --> 00:21:56,320 compute for example there's a hash 573 00:21:54,740 --> 00:22:08,240 function of key derivation function 574 00:21:56,320 --> 00:22:12,110 known as P pbkdf2 pbkdf2 password-based 575 00:22:08,240 --> 00:22:14,180 key derivation function that has a kind 576 00:22:12,110 --> 00:22:15,410 of similar form as these hash functions 577 00:22:14,180 --> 00:22:16,850 we were talking about here that they 578 00:22:15,410 --> 00:22:18,380 take in some variable length input in 579 00:22:16,850 --> 00:22:19,280 pretty so fixed length output but 580 00:22:18,380 --> 00:22:20,450 they're meant to be used for one 581 00:22:19,280 --> 00:22:22,370 particular purpose 582 00:22:20,450 --> 00:22:24,740 the purpose is generally to use the 583 00:22:22,370 --> 00:22:26,510 fixed length output as a key in another 584 00:22:24,740 --> 00:22:28,340 cryptographic algorithm and we'll talk 585 00:22:26,510 --> 00:22:31,310 about those algorithms like what use the 586 00:22:28,340 --> 00:22:32,900 output of this thing for in a moment but 587 00:22:31,310 --> 00:22:36,380 a one property of these things is that 588 00:22:32,900 --> 00:22:39,410 they're slow does anybody have any idea 589 00:22:36,380 --> 00:22:40,700 why you'd want an algorithm to be slow 590 00:22:39,410 --> 00:22:42,920 like normally we want algorithms to be 591 00:22:40,700 --> 00:22:46,000 fast right so why would we want an 592 00:22:42,920 --> 00:22:46,000 algorithm to be slow yes 593 00:22:54,430 --> 00:22:59,630 yeah that's exactly it so I'll repeat so 594 00:22:57,860 --> 00:23:01,940 it goes into the microphone the reason 595 00:22:59,630 --> 00:23:04,130 you want these to be slow is when you're 596 00:23:01,940 --> 00:23:05,840 actually using it for something like 597 00:23:04,130 --> 00:23:07,520 password authentication where you have 598 00:23:05,840 --> 00:23:08,810 the hash of a password saved and then 599 00:23:07,520 --> 00:23:10,100 somebody inputs the password you want to 600 00:23:08,810 --> 00:23:12,440 know if that corresponds to the hash 601 00:23:10,100 --> 00:23:14,510 it's ok if it's slow because you're only 602 00:23:12,440 --> 00:23:15,680 doing this check kind of once but the 603 00:23:14,510 --> 00:23:17,090 other scenario in which you're going to 604 00:23:15,680 --> 00:23:18,500 be using this function is when 605 00:23:17,090 --> 00:23:20,540 somebody's trying to brute-force a 606 00:23:18,500 --> 00:23:22,430 password say a website has their 607 00:23:20,540 --> 00:23:23,360 password database stolen and somebody's 608 00:23:22,430 --> 00:23:25,430 going through all the accounts I'm 609 00:23:23,360 --> 00:23:27,530 trying to break all the passwords well 610 00:23:25,430 --> 00:23:28,610 in that case you want this to be slow 611 00:23:27,530 --> 00:23:29,990 because someone's gonna be doing this 612 00:23:28,610 --> 00:23:31,640 like millions and millions of times and 613 00:23:29,990 --> 00:23:33,260 you can slow down the attacker a lot by 614 00:23:31,640 --> 00:23:35,030 making this function slow and so it's 615 00:23:33,260 --> 00:23:36,500 fine if this takes you like one second 616 00:23:35,030 --> 00:23:38,750 upon logging in to compute this function 617 00:23:36,500 --> 00:23:40,100 but when your brute forcing it we don't 618 00:23:38,750 --> 00:23:41,960 go to a thousand guesses a second like 619 00:23:40,100 --> 00:23:46,220 in that xkcd comic we can slow it down a 620 00:23:41,960 --> 00:23:47,860 little bit so what is the output of key 621 00:23:46,220 --> 00:23:50,060 derivation functions actually used for 622 00:23:47,860 --> 00:23:52,490 well the next topic we're going to talk 623 00:23:50,060 --> 00:23:53,570 about probably like one of the most 624 00:23:52,490 --> 00:23:55,430 classic things when you think about 625 00:23:53,570 --> 00:24:00,470 cryptography is encryption and 626 00:23:55,430 --> 00:24:17,300 decryption the next topic is symmetric 627 00:24:00,470 --> 00:24:18,410 key cryptography and like the rest of 628 00:24:17,300 --> 00:24:19,700 this lecture we're not going to talk 629 00:24:18,410 --> 00:24:21,470 about how you implement these we're 630 00:24:19,700 --> 00:24:23,900 going to talk about the API for a 631 00:24:21,470 --> 00:24:24,740 symmetric key symmetric key crypto like 632 00:24:23,900 --> 00:24:28,070 how it's used 633 00:24:24,740 --> 00:24:30,530 so symmetric key cryptosystems have a 634 00:24:28,070 --> 00:24:32,930 couple different functions they have a 635 00:24:30,530 --> 00:24:35,240 key generation function which is a 636 00:24:32,930 --> 00:24:38,570 randomized function that produces a 637 00:24:35,240 --> 00:24:42,820 thing we call the key and then they have 638 00:24:38,570 --> 00:24:42,820 a pair of functions encrypt and decrypt 639 00:24:45,790 --> 00:24:52,940 and encrypt take as input something we 640 00:24:49,130 --> 00:24:54,620 refer to as the plaintext and this is 641 00:24:52,940 --> 00:24:57,710 just some sequence of bytes some data 642 00:24:54,620 --> 00:24:59,420 and it takes in a key so something that 643 00:24:57,710 --> 00:25:03,190 came as an output of this key generation 644 00:24:59,420 --> 00:25:03,190 function and produces 645 00:25:04,140 --> 00:25:08,730 what we call the ciphertext and then 646 00:25:06,750 --> 00:25:14,760 decrypt does the opposite of this so it 647 00:25:08,730 --> 00:25:23,130 takes the ciphertext along with the key 648 00:25:14,760 --> 00:25:24,929 and produces the plaintext and this 649 00:25:23,130 --> 00:25:29,429 triple of functions has a couple 650 00:25:24,929 --> 00:25:31,590 properties one is that like one one team 651 00:25:29,429 --> 00:25:33,450 you might expect is that this thing 652 00:25:31,590 --> 00:25:36,290 doesn't really tell you all that much 653 00:25:33,450 --> 00:25:44,700 about this input to the encryption so 654 00:25:36,290 --> 00:25:46,559 property number one is given the 655 00:25:44,700 --> 00:26:02,280 ciphertext you can't figure out the 656 00:25:46,559 --> 00:26:03,299 plaintext without the key and the other 657 00:26:02,280 --> 00:26:12,210 property is kind of the obvious 658 00:26:03,299 --> 00:26:14,460 correctness property that if you take 659 00:26:12,210 --> 00:26:16,710 something and you encrypt it some 660 00:26:14,460 --> 00:26:19,559 message M with a key K and then you 661 00:26:16,710 --> 00:26:24,470 decrypt that ciphertext using the same 662 00:26:19,559 --> 00:26:24,470 key that gives you back the same message 663 00:26:24,500 --> 00:26:30,360 this is the kind of obvious correctness 664 00:26:27,090 --> 00:26:32,280 property so does this description make 665 00:26:30,360 --> 00:26:33,990 sense does it fit your kind of intuitive 666 00:26:32,280 --> 00:26:36,000 understanding of taking some piece of 667 00:26:33,990 --> 00:26:37,679 data and obscuring it so you can't 668 00:26:36,000 --> 00:26:40,020 really tell anything about the original 669 00:26:37,679 --> 00:26:42,510 input but then taking that obscured 670 00:26:40,020 --> 00:26:44,760 result and then passing it there's some 671 00:26:42,510 --> 00:26:50,190 decryption function given that key to 672 00:26:44,760 --> 00:26:51,990 retrieve the original input and this 673 00:26:50,190 --> 00:26:53,130 this isn't really a rigorous definition 674 00:26:51,990 --> 00:26:55,440 of what it means for something to be 675 00:26:53,130 --> 00:27:01,799 secure but it's a good enough intuitive 676 00:26:55,440 --> 00:27:03,179 definition that we can work with it so 677 00:27:01,799 --> 00:27:08,220 any questions about that description 678 00:27:03,179 --> 00:27:09,780 there so where can key cryptography be 679 00:27:08,220 --> 00:27:11,610 useful we'll talk about a whole bunch of 680 00:27:09,780 --> 00:27:13,110 examples later in this lecture but one 681 00:27:11,610 --> 00:27:15,150 example we'll talk about right now is 682 00:27:13,110 --> 00:27:16,719 encrypting files for storage and 683 00:27:15,150 --> 00:27:20,450 untrusted cloud service 684 00:27:16,719 --> 00:27:23,539 so consider say something like Dropbox 685 00:27:20,450 --> 00:27:25,609 or Google Drive or things like that when 686 00:27:23,539 --> 00:27:27,649 you upload files there you're trusting 687 00:27:25,609 --> 00:27:30,200 the service to not look at your files or 688 00:27:27,649 --> 00:27:32,269 do anything malicious with them these 689 00:27:30,200 --> 00:27:34,129 services like at least the ones I named 690 00:27:32,269 --> 00:27:36,379 are not intend encrypted or anything 691 00:27:34,129 --> 00:27:38,119 like that like in theory any employee 692 00:27:36,379 --> 00:27:39,919 those companies could look at your files 693 00:27:38,119 --> 00:27:41,539 now of course these companies have lots 694 00:27:39,919 --> 00:27:43,219 of policies and technical controls in 695 00:27:41,539 --> 00:27:45,229 place for making sure that that sort of 696 00:27:43,219 --> 00:27:46,399 thing doesn't happen but that doesn't 697 00:27:45,229 --> 00:27:48,709 mean that it's not technically possible 698 00:27:46,399 --> 00:27:50,119 and so one thing you might want to do if 699 00:27:48,709 --> 00:27:52,639 you don't want to trust these cloud 700 00:27:50,119 --> 00:27:53,899 services to not peek at your data not do 701 00:27:52,639 --> 00:27:55,339 like machine learning over them or do 702 00:27:53,899 --> 00:27:57,259 other sorts of things that you wouldn't 703 00:27:55,339 --> 00:27:59,359 really want is you can just take your 704 00:27:57,259 --> 00:28:04,399 files and encrypt them before uploading 705 00:27:59,359 --> 00:28:05,779 them to these these web services so does 706 00:28:04,399 --> 00:28:07,039 that idea make sense that I can take my 707 00:28:05,779 --> 00:28:08,839 file like Center pictures or whatever 708 00:28:07,039 --> 00:28:10,039 pass it through an encryption function 709 00:28:08,839 --> 00:28:11,269 and peruse the cipher text and then 710 00:28:10,039 --> 00:28:13,190 place that cipher text on the web 711 00:28:11,269 --> 00:28:14,779 service safe for backup purposes or 712 00:28:13,190 --> 00:28:17,389 whatever and if I ever need that I can 713 00:28:14,779 --> 00:28:18,979 retrieve the cipher text then use my key 714 00:28:17,389 --> 00:28:20,269 to decrypt it back into the plaintext 715 00:28:18,979 --> 00:28:22,190 and they can use a result for doing 716 00:28:20,269 --> 00:28:29,029 whatever I need to do does that make 717 00:28:22,190 --> 00:28:30,349 sense yeah yeah so that's that's a good 718 00:28:29,029 --> 00:28:31,669 question the question is couldn't 719 00:28:30,349 --> 00:28:34,879 anybody else run it through the same 720 00:28:31,669 --> 00:28:36,109 encryption program one detail maybe I 721 00:28:34,879 --> 00:28:38,419 should have explained in a little bit 722 00:28:36,109 --> 00:28:46,399 more detail is this key generation 723 00:28:38,419 --> 00:28:48,139 function is randomized and this key has 724 00:28:46,399 --> 00:28:50,539 high entropy so going back to that topic 725 00:28:48,139 --> 00:28:55,129 we talked about earlier so like an 726 00:28:50,539 --> 00:28:58,249 example is we might have aes 256 this is 727 00:28:55,129 --> 00:29:01,279 one particular symmetric cipher and this 728 00:28:58,249 --> 00:29:03,589 as the name might indicate has 256 bits 729 00:29:01,279 --> 00:29:05,570 of entropy in the key and so that means 730 00:29:03,589 --> 00:29:07,190 that as long as the attacker like 731 00:29:05,570 --> 00:29:08,959 whoever downloads the cipher text from 732 00:29:07,190 --> 00:29:10,789 the web service doesn't know your key 733 00:29:08,959 --> 00:29:11,209 unless they have some better attack in 734 00:29:10,789 --> 00:29:13,219 place 735 00:29:11,209 --> 00:29:14,629 they'll have to try all the different 736 00:29:13,219 --> 00:29:16,940 possible keys and if they're two to the 737 00:29:14,629 --> 00:29:19,639 256 keys that's too many keys to try in 738 00:29:16,940 --> 00:29:21,289 a reasonable amount of time does that 739 00:29:19,639 --> 00:29:26,109 answer the question okay any other 740 00:29:21,289 --> 00:29:26,109 questions yeah 741 00:29:35,009 --> 00:29:38,679 that's an excellent question and that 742 00:29:37,089 --> 00:29:40,119 leads into what I was going to talk 743 00:29:38,679 --> 00:29:43,509 about next so thanks for that question 744 00:29:40,119 --> 00:29:45,099 so as you point out like if I lose my 745 00:29:43,509 --> 00:29:46,659 key I'm kind of stuck right 746 00:29:45,099 --> 00:29:47,949 I need my key to decrypt that's kind of 747 00:29:46,659 --> 00:29:49,269 the point of this thing like if I didn't 748 00:29:47,949 --> 00:29:50,589 need my key to decrypt then this 749 00:29:49,269 --> 00:29:53,139 wouldn't be a very good crypto system 750 00:29:50,589 --> 00:29:54,909 and so I can combine this idea of 751 00:29:53,139 --> 00:29:56,199 symmetric key cryptography with the 752 00:29:54,909 --> 00:29:58,359 topic we just talked about key 753 00:29:56,199 --> 00:30:00,039 derivation functions so instead of 754 00:29:58,359 --> 00:30:01,239 having some key that's randomly 755 00:30:00,039 --> 00:30:03,459 generated with my key generation 756 00:30:01,239 --> 00:30:04,239 function say sampling entropy from 757 00:30:03,459 --> 00:30:11,289 somewhere on my machine 758 00:30:04,239 --> 00:30:13,419 I can have a passphrase and pass it 759 00:30:11,289 --> 00:30:17,679 through my key derivation function box 760 00:30:13,419 --> 00:30:23,259 and this gives me my key and then I can 761 00:30:17,679 --> 00:30:29,259 take my plaintext and combine it with my 762 00:30:23,259 --> 00:30:35,259 key in my encrypt function and this 763 00:30:29,259 --> 00:30:37,359 produces my ciphertext and I store this 764 00:30:35,259 --> 00:30:39,459 cipher text on the web service but now I 765 00:30:37,359 --> 00:30:41,609 don't need to save this key instead I 766 00:30:39,459 --> 00:30:43,839 can just remember in my pass phrase and 767 00:30:41,609 --> 00:30:45,359 whenever I need my key I can reconstruct 768 00:30:43,839 --> 00:30:48,359 it from the key derivation function 769 00:30:45,359 --> 00:30:48,359 question 770 00:30:56,680 --> 00:30:59,930 yeah so that's a good question the 771 00:30:58,700 --> 00:31:02,390 question is is the key derivation 772 00:30:59,930 --> 00:31:05,000 function slow enough to prevent 773 00:31:02,390 --> 00:31:06,500 brute-force guessing and the answer is 774 00:31:05,000 --> 00:31:08,600 it depends on how long your passphrase 775 00:31:06,500 --> 00:31:11,060 is so for example if your passphrase is 776 00:31:08,600 --> 00:31:12,890 like the string password is probably 777 00:31:11,060 --> 00:31:14,360 gonna get broken very quickly but as 778 00:31:12,890 --> 00:31:16,460 long as there's enough entropy in your 779 00:31:14,360 --> 00:31:17,930 passphrase this is good enough so like 780 00:31:16,460 --> 00:31:19,370 if I was uploading something to Dropbox 781 00:31:17,930 --> 00:31:21,770 and I really want it to stay secret I 782 00:31:19,370 --> 00:31:23,540 think like a 64-bit passphrase really a 783 00:31:21,770 --> 00:31:24,590 passphrase with 64 bits of entropy it 784 00:31:23,540 --> 00:31:28,130 would be more than enough in that 785 00:31:24,590 --> 00:31:30,200 scenario for example and just a quick 786 00:31:28,130 --> 00:31:31,880 demo of this so there are tools to make 787 00:31:30,200 --> 00:31:34,310 this really easy to do this is actually 788 00:31:31,880 --> 00:31:37,100 one of the exercises but we can take a 789 00:31:34,310 --> 00:31:39,500 tool for example called open SSL and use 790 00:31:37,100 --> 00:31:42,050 it to apply a symmetric cipher to some 791 00:31:39,500 --> 00:31:44,060 file so I had my readme text here for 792 00:31:42,050 --> 00:31:47,090 example readme MD it has a bunch of 793 00:31:44,060 --> 00:31:50,680 stuff in it and I can do open SSL AES 794 00:31:47,090 --> 00:31:54,140 256 cbc this is the name of a particular 795 00:31:50,680 --> 00:31:57,650 symmetric cipher and i can say that i 796 00:31:54,140 --> 00:32:01,910 want to apply this to read me md and 797 00:31:57,650 --> 00:32:03,560 produce readme dot and MD let's give it 798 00:32:01,910 --> 00:32:05,090 some name and then it's asking you for a 799 00:32:03,560 --> 00:32:06,470 password so by default this works in 800 00:32:05,090 --> 00:32:08,300 this mode where I provide a passphrase 801 00:32:06,470 --> 00:32:10,250 is run through a KDF to produce a key 802 00:32:08,300 --> 00:32:12,410 and that's used for encryption so I'll 803 00:32:10,250 --> 00:32:15,320 type in some password type it in again 804 00:32:12,410 --> 00:32:19,310 and now I produce this readme MD file 805 00:32:15,320 --> 00:32:21,080 and if I look at this it kind of looks 806 00:32:19,310 --> 00:32:23,000 like garbage and that's at a high level 807 00:32:21,080 --> 00:32:24,890 the point of a symmetric cipher it 808 00:32:23,000 --> 00:32:26,450 produces some cipher text that should be 809 00:32:24,890 --> 00:32:29,690 kind of indistinguishable from random 810 00:32:26,450 --> 00:32:33,680 data and when I want to decrypt this I 811 00:32:29,690 --> 00:32:37,670 can run a similar command open SSL AES 812 00:32:33,680 --> 00:32:40,340 256 cbc dash D for decrypt take the 813 00:32:37,670 --> 00:32:44,230 input from readme tank done MD and I 814 00:32:40,340 --> 00:32:49,010 like do like readme dot read need 815 00:32:44,230 --> 00:32:53,930 decrypted MD as the output and I can 816 00:32:49,010 --> 00:32:55,850 compare these two files and the 817 00:32:53,930 --> 00:32:57,530 correctness property of symmetric 818 00:32:55,850 --> 00:32:59,120 cryptography tells me that this should 819 00:32:57,530 --> 00:33:01,070 be identical and this indeed is 820 00:32:59,120 --> 00:33:02,720 identical if I look at the return value 821 00:33:01,070 --> 00:33:04,920 compare return 0 so that means that are 822 00:33:02,720 --> 00:33:08,319 the same file 823 00:33:04,920 --> 00:33:08,319 [Applause] 824 00:33:08,960 --> 00:33:14,359 so any questions about symmetric key 825 00:33:11,549 --> 00:33:14,359 cryptography yeah 826 00:33:20,340 --> 00:33:26,039 so the this particular command did make 827 00:33:23,700 --> 00:33:29,100 a new file so it took us input readme MD 828 00:33:26,039 --> 00:33:31,289 and produces output this file so that is 829 00:33:29,100 --> 00:33:32,879 the encrypted version of the file it 830 00:33:31,289 --> 00:33:35,779 left the original untouched but then of 831 00:33:32,879 --> 00:33:47,639 course I could delete it if I wanted to 832 00:33:35,779 --> 00:33:48,599 yeah that's a good question this is 833 00:33:47,639 --> 00:33:50,190 something I wasn't gonna talk about in 834 00:33:48,599 --> 00:33:52,559 too much detail the question is I 835 00:33:50,190 --> 00:33:55,830 provided the salt argument here and 836 00:33:52,559 --> 00:33:58,489 where is that stored so the answer is 837 00:33:55,830 --> 00:34:01,859 that that is stored in this output here 838 00:33:58,489 --> 00:34:05,460 so this output format stores both the 839 00:34:01,859 --> 00:34:06,869 salt and the actual output ciphertext so 840 00:34:05,460 --> 00:34:13,319 can be used in the reconstruction and 841 00:34:06,869 --> 00:34:14,819 decrypt yeah that's correct it doesn't 842 00:34:13,319 --> 00:34:19,950 keep any database or anything this is 843 00:34:14,819 --> 00:34:23,190 fully self-contained yeah and as John 844 00:34:19,950 --> 00:34:24,869 says the salt is not the secret like the 845 00:34:23,190 --> 00:34:33,569 the passphrase is what is the secret 846 00:34:24,869 --> 00:34:36,839 thing here okay so let's go back to so 847 00:34:33,569 --> 00:34:39,809 the so the question is what is salt the 848 00:34:36,839 --> 00:34:42,089 idea of a cryptographic salt is probably 849 00:34:39,809 --> 00:34:47,790 best explained in the context of hash 850 00:34:42,089 --> 00:34:49,559 functions so one common application of 851 00:34:47,790 --> 00:34:51,299 hash functions is to store passwords in 852 00:34:49,559 --> 00:34:53,669 a password database if I have a website 853 00:34:51,299 --> 00:34:55,290 and I have logins for users like people 854 00:34:53,669 --> 00:34:57,329 log in with their username and password 855 00:34:55,290 --> 00:34:59,640 I don't actually want to store people's 856 00:34:57,329 --> 00:35:01,349 passwords in plain text just like as is 857 00:34:59,640 --> 00:35:06,440 does anybody know why I wouldn't want to 858 00:35:01,349 --> 00:35:08,280 do that yes 859 00:35:06,440 --> 00:35:10,350 exactly what if there was a breach and 860 00:35:08,280 --> 00:35:12,480 someone got all your data so it's really 861 00:35:10,350 --> 00:35:13,860 bad if you leak all your users passwords 862 00:35:12,480 --> 00:35:15,150 it's especially bad because a lot of 863 00:35:13,860 --> 00:35:17,130 people reuse their passwords across 864 00:35:15,150 --> 00:35:18,540 different sites so you'll see attackers 865 00:35:17,130 --> 00:35:20,250 break into one thing like there was big 866 00:35:18,540 --> 00:35:22,020 yahoo breach a while ago and they find 867 00:35:20,250 --> 00:35:23,730 all these usernames and passwords and 868 00:35:22,020 --> 00:35:25,560 then they go and try those same login 869 00:35:23,730 --> 00:35:27,270 credentials on Google and on Facebook 870 00:35:25,560 --> 00:35:30,030 and on YouTube and whatnot these people 871 00:35:27,270 --> 00:35:32,640 reuse passwords so it's bad to store 872 00:35:30,030 --> 00:35:33,750 plaintext passwords so one thing you 873 00:35:32,640 --> 00:35:35,400 should do is you should store hashed 874 00:35:33,750 --> 00:35:37,620 passwords with a hash function or 875 00:35:35,400 --> 00:35:39,260 ideally a password hashing function 876 00:35:37,620 --> 00:35:42,360 that's intentionally designed to be slow 877 00:35:39,260 --> 00:35:44,160 but one thing attackers one thing 878 00:35:42,360 --> 00:35:45,390 attacker started doing once they realize 879 00:35:44,160 --> 00:35:47,310 that people started storing hashed 880 00:35:45,390 --> 00:35:49,170 passwords is they built these things 881 00:35:47,310 --> 00:35:52,560 called rainbow tables what people did 882 00:35:49,170 --> 00:35:54,570 was they took a way of generating big 883 00:35:52,560 --> 00:35:56,700 password lists like the kind of model 884 00:35:54,570 --> 00:35:58,200 what passwords might look like say take 885 00:35:56,700 --> 00:36:00,510 all the dictionary words take all 886 00:35:58,200 --> 00:36:01,770 strings of like length from zero to 887 00:36:00,510 --> 00:36:03,840 eight and whatnot 888 00:36:01,770 --> 00:36:06,180 take all of those and then hash them and 889 00:36:03,840 --> 00:36:08,910 produce a big database mapping hashes 890 00:36:06,180 --> 00:36:10,530 back to their pre image and so given the 891 00:36:08,910 --> 00:36:12,030 output of a hash function rather than 892 00:36:10,530 --> 00:36:13,620 have to like brute force said on the fly 893 00:36:12,030 --> 00:36:15,210 you can just go look up in this database 894 00:36:13,620 --> 00:36:16,830 Oh what is the input that corresponds to 895 00:36:15,210 --> 00:36:19,410 this output and people have built these 896 00:36:16,830 --> 00:36:22,650 for reasonably large password databases 897 00:36:19,410 --> 00:36:25,080 and so one thing that you can do in 898 00:36:22,650 --> 00:36:28,500 reaction to that as a defense is rather 899 00:36:25,080 --> 00:36:31,350 than storing in your database as your to 900 00:36:28,500 --> 00:36:34,860 write it rather than storing just the 901 00:36:31,350 --> 00:36:42,080 hash of the password what you do is you 902 00:36:34,860 --> 00:36:44,640 compute what's called a salt value and 903 00:36:42,080 --> 00:36:47,010 what this is is a large random string 904 00:36:44,640 --> 00:36:50,160 and then what you do is you store in 905 00:36:47,010 --> 00:36:53,100 your password database the salt which is 906 00:36:50,160 --> 00:36:54,900 not really a secret like you can store 907 00:36:53,100 --> 00:36:59,730 this in your password database along 908 00:36:54,900 --> 00:37:04,980 with a hash of the password with the 909 00:36:59,730 --> 00:37:08,190 salt appended to it why is this useful 910 00:37:04,980 --> 00:37:10,320 well this salt is a random unique value 911 00:37:08,190 --> 00:37:12,060 for every user and so if someone has the 912 00:37:10,320 --> 00:37:14,640 password safe password one two three on 913 00:37:12,060 --> 00:37:16,350 one web service if you are just storing 914 00:37:14,640 --> 00:37:17,970 the hash of the password then the hash 915 00:37:16,350 --> 00:37:18,900 would be the same on both Web Services 916 00:37:17,970 --> 00:37:20,970 right because this hash 917 00:37:18,900 --> 00:37:22,619 function is a deterministic function but 918 00:37:20,970 --> 00:37:26,369 now since we're using this randomized 919 00:37:22,619 --> 00:37:28,470 salt value we store the hash of the 920 00:37:26,369 --> 00:37:29,700 password plus of salt and so even if 921 00:37:28,470 --> 00:37:32,640 someone's using the same password on 922 00:37:29,700 --> 00:37:34,589 multiple sites this thing looks 923 00:37:32,640 --> 00:37:37,140 different in both cases and it makes it 924 00:37:34,589 --> 00:37:40,770 so these big databases mapping these 925 00:37:37,140 --> 00:37:42,210 short passwords or hash outputs to the 926 00:37:40,770 --> 00:37:44,609 short passwords that they came from 927 00:37:42,210 --> 00:37:47,099 those are no longer useful when you have 928 00:37:44,609 --> 00:37:48,900 salted passwords you kind of need to do 929 00:37:47,099 --> 00:37:51,150 the brute-force attack for every user 930 00:37:48,900 --> 00:37:52,079 once you find their salt value rather 931 00:37:51,150 --> 00:37:54,210 than being able to use this big 932 00:37:52,079 --> 00:37:58,829 precomputed database does that answer 933 00:37:54,210 --> 00:38:00,450 the question of what a salt is and so 934 00:37:58,829 --> 00:38:02,960 that's what that salt argument is 935 00:38:00,450 --> 00:38:02,960 related to 936 00:38:05,580 --> 00:38:12,800 let's see so any questions about 937 00:38:08,430 --> 00:38:16,740 anything we talked about so far great 938 00:38:12,800 --> 00:38:20,040 okay so the I'm gonna go ahead and erase 939 00:38:16,740 --> 00:38:22,230 this and then the last topic we'll talk 940 00:38:20,040 --> 00:38:23,640 about is one of the most exciting 941 00:38:22,230 --> 00:38:24,600 developments of cryptography happen 942 00:38:23,640 --> 00:38:26,580 quite a while ago but it's still a 943 00:38:24,600 --> 00:38:42,030 really cool concept something called a 944 00:38:26,580 --> 00:38:43,680 symmetric key cryptography and so this 945 00:38:42,030 --> 00:38:45,720 is an idea that actually enables a lot 946 00:38:43,680 --> 00:38:48,510 of the security and privacy related 947 00:38:45,720 --> 00:38:50,160 features of basically anything you use 948 00:38:48,510 --> 00:38:53,430 today like we need to go and type in 949 00:38:50,160 --> 00:38:56,880 like www.google.com/mapmaker flog Rafi 950 00:38:53,430 --> 00:38:58,290 is used as part of what goes on there so 951 00:38:56,880 --> 00:38:59,700 this is going to look pretty similar to 952 00:38:58,290 --> 00:39:04,830 what we talked about in symmetric key 953 00:38:59,700 --> 00:39:06,120 cryptography except with a twist there's 954 00:39:04,830 --> 00:39:08,430 a key generation function which 955 00:39:06,120 --> 00:39:10,530 similarly is randomized but instead of 956 00:39:08,430 --> 00:39:16,860 producing a single key it produces a 957 00:39:10,530 --> 00:39:21,570 pair of keys two different things one of 958 00:39:16,860 --> 00:39:25,260 which is referred to as a public key and 959 00:39:21,570 --> 00:39:27,750 the other is referred to as a private 960 00:39:25,260 --> 00:39:29,550 key and then these can be used for 961 00:39:27,750 --> 00:39:31,650 encryption and decryption in a manner 962 00:39:29,550 --> 00:39:33,270 kind of similar to symmetric key crypto 963 00:39:31,650 --> 00:39:35,340 except these different keys have 964 00:39:33,270 --> 00:39:39,150 different uses now so we have an 965 00:39:35,340 --> 00:39:41,340 encryption function which takes in a 966 00:39:39,150 --> 00:39:46,830 plaintext Isles right P here and it 967 00:39:41,340 --> 00:39:49,110 takes in the public key and praises the 968 00:39:46,830 --> 00:39:53,250 ciphertext and then I have a decryption 969 00:39:49,110 --> 00:39:59,190 function which takes in my ciphertext 970 00:39:53,250 --> 00:40:05,790 and the private key and gives me back my 971 00:39:59,190 --> 00:40:08,520 plaintext and then similarly to those 972 00:40:05,790 --> 00:40:11,010 two properties we had over there given 973 00:40:08,520 --> 00:40:14,070 just the ciphertext we can't figure out 974 00:40:11,010 --> 00:40:15,720 the plaintext unless we have the private 975 00:40:14,070 --> 00:40:17,460 key and then we have the obvious 976 00:40:15,720 --> 00:40:18,809 correctness property that if we encrypt 977 00:40:17,460 --> 00:40:20,789 something with the private key 978 00:40:18,809 --> 00:40:23,219 sorry encrypt something with the public 979 00:40:20,789 --> 00:40:25,380 key and then take that cypher text and 980 00:40:23,219 --> 00:40:26,819 try decrypting it with the corresponding 981 00:40:25,380 --> 00:40:28,619 private key that came from this key 982 00:40:26,819 --> 00:40:30,809 generation process that outputs these 983 00:40:28,619 --> 00:40:36,059 two different things at once then I get 984 00:40:30,809 --> 00:40:37,890 the same result back so this is very 985 00:40:36,059 --> 00:40:39,029 similar to what's above but there's a 986 00:40:37,890 --> 00:40:40,769 twist that we have these two different 987 00:40:39,029 --> 00:40:42,839 keys that have different functions and 988 00:40:40,769 --> 00:40:44,999 it's really neat that this public key 989 00:40:42,839 --> 00:40:47,309 can actually be made as the name 990 00:40:44,999 --> 00:40:49,229 indicates public like I could be using a 991 00:40:47,309 --> 00:40:51,119 crypto system that works like this post 992 00:40:49,229 --> 00:40:53,279 a public key on the internet for anybody 993 00:40:51,119 --> 00:40:54,869 to see but keep my private key secret 994 00:40:53,279 --> 00:40:56,400 and then I have this interesting 995 00:40:54,869 --> 00:40:58,259 property that anybody on the internet 996 00:40:56,400 --> 00:41:00,779 can take any piece of content and 997 00:40:58,259 --> 00:41:02,400 encrypt it for me using my public key 998 00:41:00,779 --> 00:41:04,409 and send it over the Internet 999 00:41:02,400 --> 00:41:06,329 to me and then I can decrypt it using my 1000 00:41:04,409 --> 00:41:08,400 private key and as long as my private 1001 00:41:06,329 --> 00:41:10,409 key C's stays secret it doesn't matter 1002 00:41:08,400 --> 00:41:11,759 if my public key is available to anybody 1003 00:41:10,409 --> 00:41:15,089 on the Internet so here's where the 1004 00:41:11,759 --> 00:41:18,569 asymmetry comes from before we were in a 1005 00:41:15,089 --> 00:41:20,549 scenario where like suppose I was on the 1006 00:41:18,569 --> 00:41:21,150 internet but you weren't like talking to 1007 00:41:20,549 --> 00:41:22,890 me face-to-face 1008 00:41:21,150 --> 00:41:25,349 and you wanted to send me some data over 1009 00:41:22,890 --> 00:41:26,609 the Internet over some unencrypted 1010 00:41:25,349 --> 00:41:28,140 Channel where anybody could listen on 1011 00:41:26,609 --> 00:41:30,150 what you were saying and you wanted to 1012 00:41:28,140 --> 00:41:32,009 use symmetric key cryptography well we 1013 00:41:30,150 --> 00:41:33,479 need some way of exchanging a key in 1014 00:41:32,009 --> 00:41:34,769 advance so that you could encrypt some 1015 00:41:33,479 --> 00:41:36,659 plaintext with a key and give me that 1016 00:41:34,769 --> 00:41:38,549 ciphertext over the Internet so that I 1017 00:41:36,659 --> 00:41:41,279 could done decrypt it with that key in 1018 00:41:38,549 --> 00:41:42,839 symmetric key crypto if the keys public 1019 00:41:41,279 --> 00:41:45,299 it's game over like anybody can decrypt 1020 00:41:42,839 --> 00:41:47,339 your stuff whereas an asymmetric key 1021 00:41:45,299 --> 00:41:49,019 cryptography I could take my public key 1022 00:41:47,339 --> 00:41:50,909 and like post it on a bulletin board on 1023 00:41:49,019 --> 00:41:52,650 the Internet and you can go look at that 1024 00:41:50,909 --> 00:41:54,749 take some contents and encrypt them for 1025 00:41:52,650 --> 00:41:55,799 me and then send them over and that 1026 00:41:54,749 --> 00:41:57,269 would be totally fine 1027 00:41:55,799 --> 00:41:59,969 you can only decrypt it with the private 1028 00:41:57,269 --> 00:42:02,249 key and so one analogy that may be 1029 00:41:59,969 --> 00:42:05,609 helpful is comparing these mathematical 1030 00:42:02,249 --> 00:42:07,439 ideas to physical locks so you probably 1031 00:42:05,609 --> 00:42:09,329 have a lock on your door to your house 1032 00:42:07,439 --> 00:42:11,189 and you can put in a key and like turn 1033 00:42:09,329 --> 00:42:12,479 the thing in order to lock the door or 1034 00:42:11,189 --> 00:42:14,339 you can turn it the other way to unlock 1035 00:42:12,479 --> 00:42:15,929 the door so there's a single key and it 1036 00:42:14,339 --> 00:42:17,459 can both lock and unlock the door 1037 00:42:15,929 --> 00:42:19,259 but now consider this alternative 1038 00:42:17,459 --> 00:42:20,640 construction which you might use if say 1039 00:42:19,259 --> 00:42:23,429 I want you to be able to send me a 1040 00:42:20,640 --> 00:42:25,019 message and have it be sent over the 1041 00:42:23,429 --> 00:42:27,209 internet and you I don't really need a 1042 00:42:25,019 --> 00:42:29,549 way to exchange a key with you 1043 00:42:27,209 --> 00:42:30,719 beforehand I could get a box which you 1044 00:42:29,549 --> 00:42:32,350 could put a letter inside and you can 1045 00:42:30,719 --> 00:42:36,010 close the box and I can get one of the 1046 00:42:32,350 --> 00:42:37,540 padlock things which I can give you by I 1047 00:42:36,010 --> 00:42:39,820 could like take those padlock and open 1048 00:42:37,540 --> 00:42:41,530 it and give it to you and you at your 1049 00:42:39,820 --> 00:42:43,360 own leisure could put your message 1050 00:42:41,530 --> 00:42:45,850 inside a box and take this padlock which 1051 00:42:43,360 --> 00:42:48,430 is open and shut it around the box and 1052 00:42:45,850 --> 00:42:50,350 then send it over to me and then I could 1053 00:42:48,430 --> 00:42:52,090 put in my key and unlock it so do you 1054 00:42:50,350 --> 00:42:54,310 see how there is this asymmetry there as 1055 00:42:52,090 --> 00:42:56,050 opposed to the key that I used to open 1056 00:42:54,310 --> 00:42:57,760 the door to my house where the same key 1057 00:42:56,050 --> 00:42:59,920 opens and closes the thing instead I 1058 00:42:57,760 --> 00:43:01,750 give you this open padlock that you have 1059 00:42:59,920 --> 00:43:03,880 the ability to close but not open and 1060 00:43:01,750 --> 00:43:05,590 after you closed it I can use my key 1061 00:43:03,880 --> 00:43:07,120 which I've kept secret in order to open 1062 00:43:05,590 --> 00:43:09,100 the thing and retrieve what's inside 1063 00:43:07,120 --> 00:43:10,930 maybe this analogy is helpful maybe it's 1064 00:43:09,100 --> 00:43:13,720 not the mathematical construction works 1065 00:43:10,930 --> 00:43:17,350 just fine if that works for a year so 1066 00:43:13,720 --> 00:43:18,790 any questions about a symmetric key 1067 00:43:17,350 --> 00:43:21,190 encryption and decryption and how it 1068 00:43:18,790 --> 00:43:25,900 relates to symmetric key crypto how it's 1069 00:43:21,190 --> 00:43:27,940 a little bit different so before we talk 1070 00:43:25,900 --> 00:43:30,370 about applications of this idea I'm 1071 00:43:27,940 --> 00:43:33,580 going to talk about one other set of 1072 00:43:30,370 --> 00:43:36,160 concepts in a symmetric key cryptography 1073 00:43:33,580 --> 00:43:37,870 so these crypto systems give you another 1074 00:43:36,160 --> 00:43:39,940 set of tools which are related to 1075 00:43:37,870 --> 00:43:42,370 encryption and decryption something 1076 00:43:39,940 --> 00:43:44,350 called signing and verifying and this is 1077 00:43:42,370 --> 00:43:46,210 kind of similar to the real world like I 1078 00:43:44,350 --> 00:43:48,370 can get a document and sign it with my 1079 00:43:46,210 --> 00:43:50,260 signature except real world signatures 1080 00:43:48,370 --> 00:43:52,210 are I don't think that hard to forge 1081 00:43:50,260 --> 00:43:56,260 these are pretty hard to forge and 1082 00:43:52,210 --> 00:43:57,940 therefore more useful what does 1083 00:43:56,260 --> 00:44:00,600 signature schemes look like there's a 1084 00:43:57,940 --> 00:44:08,380 function sign that takes us some message 1085 00:44:00,600 --> 00:44:09,910 and the private key so notice this this 1086 00:44:08,380 --> 00:44:14,620 is the private key not the public key 1087 00:44:09,910 --> 00:44:18,370 and it produces a signature and then 1088 00:44:14,620 --> 00:44:23,640 there's another function verify which 1089 00:44:18,370 --> 00:44:27,540 takes in the message the signature and 1090 00:44:23,640 --> 00:44:27,540 the public key this time 1091 00:44:30,430 --> 00:44:35,750 and it tells me it returns a boolean 1092 00:44:33,890 --> 00:44:40,610 whether or not the signature checks out 1093 00:44:35,750 --> 00:44:43,640 and then this pair of functions has the 1094 00:44:40,610 --> 00:44:45,080 property that again these are kind of 1095 00:44:43,640 --> 00:44:49,070 properties that follow the intuition 1096 00:44:45,080 --> 00:44:54,170 that come from physical signatures that 1097 00:44:49,070 --> 00:44:56,990 uh without the private key it's hard to 1098 00:44:54,170 --> 00:44:58,850 produce a signature for any message such 1099 00:44:56,990 --> 00:45:00,380 that you can give the message in the 1100 00:44:58,850 --> 00:45:02,360 signature and the public key to the 1101 00:45:00,380 --> 00:45:07,540 verify function to get it to return true 1102 00:45:02,360 --> 00:45:07,540 like at a high level it's hard to Forge 1103 00:45:09,520 --> 00:45:20,720 it's hard to forge a signature of course 1104 00:45:11,930 --> 00:45:24,200 without the private key and then there's 1105 00:45:20,720 --> 00:45:25,610 the obvious correctness property that if 1106 00:45:24,200 --> 00:45:26,690 you signed a thing with a public key and 1107 00:45:25,610 --> 00:45:28,670 then try verifying it with the 1108 00:45:26,690 --> 00:45:30,170 corresponding sorry if you sign a thing 1109 00:45:28,670 --> 00:45:31,520 with the private key and try to verify 1110 00:45:30,170 --> 00:45:34,010 it with the corresponding public key 1111 00:45:31,520 --> 00:45:38,300 it returns okay that this verification 1112 00:45:34,010 --> 00:45:41,080 checks out so these are two different 1113 00:45:38,300 --> 00:45:44,030 kinds of things you can do with 1114 00:45:41,080 --> 00:45:46,280 asymmetric key cryptosystems 1115 00:45:44,030 --> 00:45:47,510 an example of an asymmetric key 1116 00:45:46,280 --> 00:45:50,360 cryptosystem that you might have heard 1117 00:45:47,510 --> 00:45:51,800 of is something called RSA so RSA is 1118 00:45:50,360 --> 00:45:53,180 designed by a number of people one of 1119 00:45:51,800 --> 00:45:59,330 whom is ron rivest who's a professor 1120 00:45:53,180 --> 00:46:01,730 here so there are couple of interesting 1121 00:45:59,330 --> 00:46:03,170 applications of asymmetric key crypto 1122 00:46:01,730 --> 00:46:04,580 actually like tons and tons and tons of 1123 00:46:03,170 --> 00:46:06,800 you spend like days talking about this 1124 00:46:04,580 --> 00:46:08,540 but a couple examples are email 1125 00:46:06,800 --> 00:46:10,400 encryption so we talked a little bit 1126 00:46:08,540 --> 00:46:12,410 about sending messages what we can do 1127 00:46:10,400 --> 00:46:14,360 with asymmetric key crypto is that you 1128 00:46:12,410 --> 00:46:16,670 can have public keys posted online I 1129 00:46:14,360 --> 00:46:18,530 think some of the instructors have PGP 1130 00:46:16,670 --> 00:46:19,850 public keys on their website so for 1131 00:46:18,530 --> 00:46:21,740 example you go to my website or John's 1132 00:46:19,850 --> 00:46:24,740 website you'll find a public key and 1133 00:46:21,740 --> 00:46:27,410 then what you can do is you can send us 1134 00:46:24,740 --> 00:46:29,060 an encrypted email and so even if that 1135 00:46:27,410 --> 00:46:30,260 message goes through Gmail or whatever 1136 00:46:29,060 --> 00:46:31,970 other mail service throughout my T's 1137 00:46:30,260 --> 00:46:34,040 mail servers if there happens to be an 1138 00:46:31,970 --> 00:46:35,540 attacker snooping on the messages they 1139 00:46:34,040 --> 00:46:38,000 can't make any sense of their contents 1140 00:46:35,540 --> 00:46:39,650 because they're all encrypted and this 1141 00:46:38,000 --> 00:46:42,440 is really cool because you can do this 1142 00:46:39,650 --> 00:46:43,109 without kind of finding us in person and 1143 00:46:42,440 --> 00:46:44,099 exchanging 1144 00:46:43,109 --> 00:46:45,989 which you might have to do in a 1145 00:46:44,099 --> 00:46:47,609 symmetric key cryptosystem you can just 1146 00:46:45,989 --> 00:46:49,289 find our public key which can be posted 1147 00:46:47,609 --> 00:46:52,229 online without causing any issues and 1148 00:46:49,289 --> 00:46:53,759 then send us encrypted email another 1149 00:46:52,229 --> 00:46:56,160 thing asymmetric key crypto is used for 1150 00:46:53,759 --> 00:46:58,019 is private messaging so raise your hand 1151 00:46:56,160 --> 00:47:00,509 if you've used anything like signal or 1152 00:46:58,019 --> 00:47:01,950 telegram or I think what's up is in 1153 00:47:00,509 --> 00:47:05,039 theory antenna encrypted so a good 1154 00:47:01,950 --> 00:47:07,380 number of you these private messaging 1155 00:47:05,039 --> 00:47:09,479 applications also use asymmetric key 1156 00:47:07,380 --> 00:47:11,819 crypto to establish private 1157 00:47:09,479 --> 00:47:14,309 communication channels basically every 1158 00:47:11,819 --> 00:47:16,499 person has associated with them a key 1159 00:47:14,309 --> 00:47:18,390 pair and so your device has run this key 1160 00:47:16,499 --> 00:47:20,400 generation function produced a public 1161 00:47:18,390 --> 00:47:22,049 key and a private key and automatically 1162 00:47:20,400 --> 00:47:23,670 posted your public key to the internet 1163 00:47:22,049 --> 00:47:25,529 so for example if you're using signal 1164 00:47:23,670 --> 00:47:27,029 your public key is on the signal servers 1165 00:47:25,529 --> 00:47:30,029 and then when someone wants to contact 1166 00:47:27,029 --> 00:47:31,709 you their phone can look up your public 1167 00:47:30,029 --> 00:47:33,150 key retreat and once it's retrieve your 1168 00:47:31,709 --> 00:47:35,279 public key they can encrypt information 1169 00:47:33,150 --> 00:47:36,599 for you this is a kind of approximation 1170 00:47:35,279 --> 00:47:38,509 of how their algorithm works but at a 1171 00:47:36,599 --> 00:47:40,979 high level that's what's going on 1172 00:47:38,509 --> 00:47:43,499 another neat application of asymmetric 1173 00:47:40,979 --> 00:47:44,880 key crypto is we were talking about 1174 00:47:43,499 --> 00:47:46,079 earlier like making sure you have the 1175 00:47:44,880 --> 00:47:46,519 right software we downloaded from the 1176 00:47:46,079 --> 00:47:48,599 internet 1177 00:47:46,519 --> 00:47:50,609 asymmetric key crypto can be used to 1178 00:47:48,599 --> 00:47:52,430 sign software releases and this is 1179 00:47:50,609 --> 00:47:55,199 something that people do for example 1180 00:47:52,430 --> 00:47:56,489 like Debian packages or whatever things 1181 00:47:55,199 --> 00:47:57,959 you download from the internet the 1182 00:47:56,489 --> 00:47:59,670 developer will try to sign their 1183 00:47:57,959 --> 00:48:00,630 software so that you can make sure that 1184 00:47:59,670 --> 00:48:01,769 whatever you've downloaded from the 1185 00:48:00,630 --> 00:48:04,799 internet is actually the right thing 1186 00:48:01,769 --> 00:48:06,660 that came from the right person we 1187 00:48:04,799 --> 00:48:07,920 talked about in the get lecture all the 1188 00:48:06,660 --> 00:48:10,440 interesting things you can do with git 1189 00:48:07,920 --> 00:48:15,239 one thing we didn't cover was signing 1190 00:48:10,440 --> 00:48:17,670 related functionality and get so git has 1191 00:48:15,239 --> 00:48:19,799 commits and you can associate with 1192 00:48:17,670 --> 00:48:21,150 commits something called tags at a high 1193 00:48:19,799 --> 00:48:22,949 level you can basically take a git 1194 00:48:21,150 --> 00:48:26,160 commit and attach a signature to it 1195 00:48:22,949 --> 00:48:28,589 which binds your public key to this 1196 00:48:26,160 --> 00:48:31,170 commit and then anybody who has your 1197 00:48:28,589 --> 00:48:32,789 public key can take the commit and your 1198 00:48:31,170 --> 00:48:35,519 public key and make sure that there's a 1199 00:48:32,789 --> 00:48:40,920 legitimate signature on the key or sorry 1200 00:48:35,519 --> 00:48:44,670 on the commit so let me go to like some 1201 00:48:40,920 --> 00:48:46,650 random repository that I have I can look 1202 00:48:44,670 --> 00:48:51,959 at a bunch of tags associated with 1203 00:48:46,650 --> 00:48:55,279 repository if I do look at the raw data 1204 00:48:51,959 --> 00:48:55,279 associated with this tag 1205 00:48:55,500 --> 00:49:02,820 it has some metadata and then a blob of 1206 00:48:59,400 --> 00:49:05,700 like ascii encoded information that i 1207 00:49:02,820 --> 00:49:08,880 can use the get tagged - v4 verify 1208 00:49:05,700 --> 00:49:11,040 command to make sure that oh this is a 1209 00:49:08,880 --> 00:49:13,050 good signature from this person happens 1210 00:49:11,040 --> 00:49:14,220 to be me so I sign the software release 1211 00:49:13,050 --> 00:49:15,630 so that anybody who downloads it from 1212 00:49:14,220 --> 00:49:18,150 the Internet can make sure that they 1213 00:49:15,630 --> 00:49:31,680 actually got an authentic copy yes 1214 00:49:18,150 --> 00:49:33,599 question so the question is what exactly 1215 00:49:31,680 --> 00:49:38,460 is the verify function doing or what is 1216 00:49:33,599 --> 00:49:39,830 it checking against the like if you want 1217 00:49:38,460 --> 00:49:41,130 to know mathematically what's going on 1218 00:49:39,830 --> 00:49:43,619 talk to me 1219 00:49:41,130 --> 00:49:45,510 after this lecture but for from kind of 1220 00:49:43,619 --> 00:49:48,000 an API perspective what's going on here 1221 00:49:45,510 --> 00:49:49,800 is that the signature and also the 1222 00:49:48,000 --> 00:49:52,320 message here are just a blob of bytes 1223 00:49:49,800 --> 00:49:56,130 and it happens to be the case that these 1224 00:49:52,320 --> 00:49:58,560 things are designed such that basically 1225 00:49:56,130 --> 00:50:00,660 if I take for some particular public key 1226 00:49:58,560 --> 00:50:02,910 like if you take my public key it's 1227 00:50:00,660 --> 00:50:06,450 impossible for you without knowledge of 1228 00:50:02,910 --> 00:50:07,950 my private key for any message to find a 1229 00:50:06,450 --> 00:50:10,800 second argument to this function that 1230 00:50:07,950 --> 00:50:12,990 makes it return true you can kind of 1231 00:50:10,800 --> 00:50:14,460 compare it to signing a document like 1232 00:50:12,990 --> 00:50:16,890 you don't know how to forge my signature 1233 00:50:14,460 --> 00:50:19,140 I can take any piece of paper and sign 1234 00:50:16,890 --> 00:50:20,970 it and then anybody who knows what my 1235 00:50:19,140 --> 00:50:22,200 signature looks like I can show my 1236 00:50:20,970 --> 00:50:24,690 document - you can be like yeah that 1237 00:50:22,200 --> 00:50:27,150 checks out but nobody without the 1238 00:50:24,690 --> 00:50:30,060 private key can produce a signature that 1239 00:50:27,150 --> 00:50:35,339 will make this function return true for 1240 00:50:30,060 --> 00:50:36,420 any particular message and any related 1241 00:50:35,339 --> 00:50:39,470 questions started you want me to explain 1242 00:50:36,420 --> 00:50:39,470 any other way or does that make sense 1243 00:50:50,180 --> 00:50:54,119 so any questions about signing software 1244 00:50:52,680 --> 00:50:55,920 or any of the other handful of 1245 00:50:54,119 --> 00:51:01,050 applications talked about of asymmetric 1246 00:50:55,920 --> 00:51:02,460 key crypto well so one final thing I 1247 00:51:01,050 --> 00:51:05,100 want to talk about we're almost out of 1248 00:51:02,460 --> 00:51:07,200 time is key distribution this is a kind 1249 00:51:05,100 --> 00:51:08,130 of interesting side effect of asymmetric 1250 00:51:07,200 --> 00:51:09,330 key cryptography 1251 00:51:08,130 --> 00:51:11,730 it enables a bunch of interesting 1252 00:51:09,330 --> 00:51:12,960 functionality like I can post my public 1253 00:51:11,730 --> 00:51:14,820 key on the internet you can go find it 1254 00:51:12,960 --> 00:51:16,170 and send me encrypted email but how do 1255 00:51:14,820 --> 00:51:18,060 you know that the public key found is 1256 00:51:16,170 --> 00:51:19,410 actually my public key it seems like 1257 00:51:18,060 --> 00:51:22,710 there's a bootstrapping problem here 1258 00:51:19,410 --> 00:51:24,810 right so there are a couple this is like 1259 00:51:22,710 --> 00:51:27,930 a really interesting and really hard 1260 00:51:24,810 --> 00:51:29,400 real-world problem and there are a 1261 00:51:27,930 --> 00:51:31,800 couple different approaches you might 1262 00:51:29,400 --> 00:51:33,780 take to this problem one is kind of a 1263 00:51:31,800 --> 00:51:35,000 lame solution but this thing solves a 1264 00:51:33,780 --> 00:51:37,050 lot of cryptography problems this 1265 00:51:35,000 --> 00:51:39,630 exchange the information out-of-band 1266 00:51:37,050 --> 00:51:41,430 what that means is you want to send me 1267 00:51:39,630 --> 00:51:43,710 encrypted email we'll just talk to me 1268 00:51:41,430 --> 00:51:45,210 after class I'll give you my public key 1269 00:51:43,710 --> 00:51:46,560 on a piece of paper and since you were 1270 00:51:45,210 --> 00:51:48,240 talking to me in person like you know 1271 00:51:46,560 --> 00:51:49,770 that it's actually my public key not 1272 00:51:48,240 --> 00:51:51,930 just somebody like hacked my website and 1273 00:51:49,770 --> 00:51:53,160 stuck some random number on there that 1274 00:51:51,930 --> 00:51:54,420 solves the problem but it's not the most 1275 00:51:53,160 --> 00:51:55,740 elegant there are a couple other 1276 00:51:54,420 --> 00:51:58,650 approaches that different applications 1277 00:51:55,740 --> 00:52:01,410 use so those of you who use signal have 1278 00:51:58,650 --> 00:52:02,970 you ever encountered the phrase safety 1279 00:52:01,410 --> 00:52:06,810 number like verify your safety number 1280 00:52:02,970 --> 00:52:09,180 with so and so so with signal they have 1281 00:52:06,810 --> 00:52:10,950 a way of exchanging public keys which is 1282 00:52:09,180 --> 00:52:12,840 through the signal servers whoever runs 1283 00:52:10,950 --> 00:52:14,280 the signal service just maintains on 1284 00:52:12,840 --> 00:52:16,170 their servers basically a mapping from 1285 00:52:14,280 --> 00:52:17,670 phone numbers to public keys and when I 1286 00:52:16,170 --> 00:52:19,109 say oh I want to message this person 1287 00:52:17,670 --> 00:52:20,190 with this number my phone just goes and 1288 00:52:19,109 --> 00:52:21,930 retrieves their public key from the 1289 00:52:20,190 --> 00:52:24,060 internet and then encrypts the message 1290 00:52:21,930 --> 00:52:27,080 for that public key now does anybody see 1291 00:52:24,060 --> 00:52:27,080 a problem with the setup 1292 00:52:29,750 --> 00:52:38,880 yeah yeah exactly the signal servers our 1293 00:52:36,900 --> 00:52:40,830 point of failure there because if the 1294 00:52:38,880 --> 00:52:42,900 signal servers give me the wrong public 1295 00:52:40,830 --> 00:52:44,610 key like supposed signal just produces a 1296 00:52:42,900 --> 00:52:46,200 new key pair and give me their public 1297 00:52:44,610 --> 00:52:47,940 key now they can read all my messages 1298 00:52:46,200 --> 00:52:50,220 and they could even sit in between and 1299 00:52:47,940 --> 00:52:51,540 transparently decrypt the messages I 1300 00:52:50,220 --> 00:52:52,950 send them and then re encrypt them and 1301 00:52:51,540 --> 00:52:55,170 send them on to their final destination 1302 00:52:52,950 --> 00:52:57,960 like basically I need some way of 1303 00:52:55,170 --> 00:52:59,850 authenticating the public key I get and 1304 00:52:57,960 --> 00:53:01,170 so signal has one solution to this which 1305 00:52:59,850 --> 00:53:04,350 is also just kind of punting the issue 1306 00:53:01,170 --> 00:53:05,220 to out-of-band key exchange you can meet 1307 00:53:04,350 --> 00:53:07,320 up with somebody and they have a 1308 00:53:05,220 --> 00:53:08,820 slightly streamline flow where they show 1309 00:53:07,320 --> 00:53:09,750 QR codes on the screen you take one 1310 00:53:08,820 --> 00:53:10,950 phone and take a picture of the other 1311 00:53:09,750 --> 00:53:12,840 phone screen and vice versa 1312 00:53:10,950 --> 00:53:14,580 and now you've exchanged public keys in 1313 00:53:12,840 --> 00:53:15,930 person and from that point on you've 1314 00:53:14,580 --> 00:53:19,350 bootstrap your encrypted end-to-end 1315 00:53:15,930 --> 00:53:22,230 communication it also has an issue of or 1316 00:53:19,350 --> 00:53:24,150 it also has approach of pinning a public 1317 00:53:22,230 --> 00:53:25,980 key so once you know that a particular 1318 00:53:24,150 --> 00:53:27,750 phone number has a particular public key 1319 00:53:25,980 --> 00:53:30,360 your phone remembers that and if that 1320 00:53:27,750 --> 00:53:32,340 ever changes it'll complain to you and 1321 00:53:30,360 --> 00:53:34,950 then there are a couple other solutions 1322 00:53:32,340 --> 00:53:36,750 to this problem PGP one pop to let used 1323 00:53:34,950 --> 00:53:38,460 to be popular a while ago has this idea 1324 00:53:36,750 --> 00:53:40,350 of web of trust so like I trust people 1325 00:53:38,460 --> 00:53:41,940 who my friends trust so if like John has 1326 00:53:40,350 --> 00:53:43,740 done an out-of-band exchange with say my 1327 00:53:41,940 --> 00:53:45,660 professor then I can probably email my 1328 00:53:43,740 --> 00:53:47,460 professor because like I know that John 1329 00:53:45,660 --> 00:53:48,780 trusts my professor and I trust John so 1330 00:53:47,460 --> 00:53:50,310 you got this chain of trust through 1331 00:53:48,780 --> 00:53:52,380 there that's one interesting approach 1332 00:53:50,310 --> 00:53:53,910 and then another model that's called 1333 00:53:52,380 --> 00:53:55,770 pretty recently as something that a tool 1334 00:53:53,910 --> 00:54:00,350 called key base uses this is a really 1335 00:53:55,770 --> 00:54:00,350 neat whoops 1336 00:54:00,500 --> 00:54:04,950 there's website called key based IO and 1337 00:54:03,750 --> 00:54:06,780 they have a really interesting solution 1338 00:54:04,950 --> 00:54:09,420 to this bootstrapping problem which is 1339 00:54:06,780 --> 00:54:10,590 social proof so saying you probably have 1340 00:54:09,420 --> 00:54:13,500 your friends on Facebook and on Twitter 1341 00:54:10,590 --> 00:54:15,600 and whatnot and it's probably pretty 1342 00:54:13,500 --> 00:54:17,070 hard for an attacker to break into your 1343 00:54:15,600 --> 00:54:18,390 friend's Facebook account at the same 1344 00:54:17,070 --> 00:54:19,680 time as their Twitter account as the 1345 00:54:18,390 --> 00:54:21,000 same time as their hacker news account 1346 00:54:19,680 --> 00:54:23,430 and so on and so there's this 1347 00:54:21,000 --> 00:54:25,860 interesting way of binding public keys 1348 00:54:23,430 --> 00:54:27,780 to a set of social identities such that 1349 00:54:25,860 --> 00:54:30,240 you can retrieve a public key once you 1350 00:54:27,780 --> 00:54:32,610 trust some number of social identities 1351 00:54:30,240 --> 00:54:33,960 corresponding to your friend we have 1352 00:54:32,610 --> 00:54:34,980 links to these in the lecture notes if 1353 00:54:33,960 --> 00:54:38,400 you want to see these things in more 1354 00:54:34,980 --> 00:54:41,160 detail so that's it for our security and 1355 00:54:38,400 --> 00:54:42,690 cryptography lecture and tomorrow's 1356 00:54:41,160 --> 00:54:43,230 lecture will be on a random collection 1357 00:54:42,690 --> 00:54:45,119 of top 1358 00:54:43,230 --> 00:54:48,650 that your instructors find interesting 1359 00:54:45,119 --> 00:54:48,650 so hopefully see you in lecture tomorrow 1360 00:54:51,260 --> 00:54:54,180 I'll also be here for a couple of 1361 00:54:53,040 --> 00:55:08,910 minutes after class if anybody has 1362 00:54:54,180 --> 00:55:09,869 questions yes okay so John feel free to 1363 00:55:08,910 --> 00:55:11,070 leave if you have to leave but I think 1364 00:55:09,869 --> 00:55:11,970 nobody's using the classroom after us 1365 00:55:11,070 --> 00:55:15,150 I'm going to talk about one other 1366 00:55:11,970 --> 00:55:16,770 interesting topic so john brought up the 1367 00:55:15,150 --> 00:55:19,320 fact that a symmetric key cryptography 1368 00:55:16,770 --> 00:55:23,130 is slow and symmetric key cryptography 1369 00:55:19,320 --> 00:55:25,680 is fast and so in practice you don't 1370 00:55:23,130 --> 00:55:28,080 really use just a symmetric key 1371 00:55:25,680 --> 00:55:31,440 cryptography by itself it's usually used 1372 00:55:28,080 --> 00:55:38,280 to bootstrap a more sophisticated 1373 00:55:31,440 --> 00:55:40,710 protocol that you're using one thing you 1374 00:55:38,280 --> 00:55:41,430 might want to do is use a symmetric key 1375 00:55:40,710 --> 00:55:43,890 cryptography 1376 00:55:41,430 --> 00:55:45,869 for signing encrypted email right we 1377 00:55:43,890 --> 00:55:47,730 talked about that example and the way 1378 00:55:45,869 --> 00:55:49,260 that works isn't what you might have 1379 00:55:47,730 --> 00:55:50,910 guessed from our straightforward 1380 00:55:49,260 --> 00:55:52,920 explanation of asymmetric key crypto 1381 00:55:50,910 --> 00:55:54,740 like you don't just use that encrypt 1382 00:55:52,920 --> 00:55:57,630 function up there and call it a day in 1383 00:55:54,740 --> 00:56:05,400 practice what you do is you use hybrid 1384 00:55:57,630 --> 00:56:07,410 encryption to use a combination of 1385 00:56:05,400 --> 00:56:09,109 symmetric key and asymmetric key 1386 00:56:07,410 --> 00:56:12,060 cryptography 1387 00:56:09,109 --> 00:56:14,190 what you do is here I'll draw this as a 1388 00:56:12,060 --> 00:56:21,270 big block diagram you take your message 1389 00:56:14,190 --> 00:56:23,490 m and then I have my public key that I 1390 00:56:21,270 --> 00:56:25,080 want to encrypt for but rather than just 1391 00:56:23,490 --> 00:56:27,119 take these two things and pass it 1392 00:56:25,080 --> 00:56:36,240 through the encryption up there what I 1393 00:56:27,119 --> 00:56:41,820 do is I use the symmetric key gen 1394 00:56:36,240 --> 00:56:43,380 function to produce a symmetric key okay 1395 00:56:41,820 --> 00:56:44,700 I'm gonna like prepend this with 1396 00:56:43,380 --> 00:56:46,380 symmetric so we can distinguish it from 1397 00:56:44,700 --> 00:56:49,410 the public key key generation function 1398 00:56:46,380 --> 00:56:52,050 and then what I do is I take these two 1399 00:56:49,410 --> 00:56:55,250 things pass them through my symmetric 1400 00:56:52,050 --> 00:56:55,250 encryption box 1401 00:57:02,250 --> 00:57:13,750 this produces the ciphertext and now 1402 00:57:09,400 --> 00:57:15,640 this by itself to the sender sorry this 1403 00:57:13,750 --> 00:57:17,530 by itself to the receiver who has the 1404 00:57:15,640 --> 00:57:19,900 private key corresponding to this public 1405 00:57:17,530 --> 00:57:20,890 key here this is not really useful right 1406 00:57:19,900 --> 00:57:26,020 because this is encrypted with a 1407 00:57:20,890 --> 00:57:28,690 symmetric cipher with this key K that 1408 00:57:26,020 --> 00:57:30,670 came from this function that I ran on my 1409 00:57:28,690 --> 00:57:32,830 local machine so I need some way of 1410 00:57:30,670 --> 00:57:34,660 getting this to the person who actually 1411 00:57:32,830 --> 00:57:37,570 used to decrypt the email and so what I 1412 00:57:34,660 --> 00:57:39,610 do is I take this thing and now this 1413 00:57:37,570 --> 00:57:40,840 email might have been big and I use 1414 00:57:39,610 --> 00:57:42,700 symmetric encryption with that because 1415 00:57:40,840 --> 00:57:44,980 symmetric encryption is fast but this 1416 00:57:42,700 --> 00:57:46,600 key is small like it might be 256 bits 1417 00:57:44,980 --> 00:57:49,000 or something so I can take this thing 1418 00:57:46,600 --> 00:58:05,620 and encrypt it with a symmetric 1419 00:57:49,000 --> 00:58:09,640 encryption using the public key and this 1420 00:58:05,620 --> 00:58:12,550 gives me an encrypted key and this thing 1421 00:58:09,640 --> 00:58:14,520 can be decrypted using the private key 1422 00:58:12,550 --> 00:58:18,250 corresponding to that public key to 1423 00:58:14,520 --> 00:58:21,520 reconstruct this so this is on the 1424 00:58:18,250 --> 00:58:23,590 sender's end now the receiver gets this 1425 00:58:21,520 --> 00:58:24,760 and this and kind of does these things 1426 00:58:23,590 --> 00:58:27,520 backwards so you start with the 1427 00:58:24,760 --> 00:58:29,890 encrypted key and use asymmetric 1428 00:58:27,520 --> 00:58:31,330 decryption using your public using your 1429 00:58:29,890 --> 00:58:34,180 private key that corresponds to the 1430 00:58:31,330 --> 00:58:35,350 posted public key to reconstruct this 1431 00:58:34,180 --> 00:58:37,570 key that were used for the symmetric 1432 00:58:35,350 --> 00:58:39,760 encryption box and then use symmetric 1433 00:58:37,570 --> 00:58:41,560 key decryption using that key that was 1434 00:58:39,760 --> 00:58:45,760 reconstructed to take this ciphertext 1435 00:58:41,560 --> 00:58:47,590 and produce the original message so 1436 00:58:45,760 --> 00:58:49,630 there's a kind of interesting example of 1437 00:58:47,590 --> 00:58:56,910 how in practice symmetric and asymmetric 1438 00:58:49,630 --> 00:58:56,910 key cryptography is combined question 1439 00:59:00,589 --> 00:59:08,220 so the question is will you be using the 1440 00:59:02,790 --> 00:59:11,040 same symmetric key generators yes okay 1441 00:59:08,220 --> 00:59:13,650 so you need to kind of agree ahead of 1442 00:59:11,040 --> 00:59:16,280 time which box you're using here so you 1443 00:59:13,650 --> 00:59:21,359 might be like oh I'm going to use AES 1444 00:59:16,280 --> 00:59:24,510 256 GC up here but this is a well known 1445 00:59:21,359 --> 00:59:26,010 function and it's public like the 1446 00:59:24,510 --> 00:59:27,930 attackers allowed to know all the 1447 00:59:26,010 --> 00:59:29,190 parameters this function this is the 1448 00:59:27,930 --> 00:59:34,190 only secret thing that the attacker 1449 00:59:29,190 --> 00:59:40,650 doesn't know the key any other questions 1450 00:59:34,190 --> 00:59:42,000 yeah that's a really good question what 1451 00:59:40,650 --> 00:59:46,020 kind of data is important enough to 1452 00:59:42,000 --> 00:59:47,910 encrypt and I think that depends on your 1453 00:59:46,020 --> 00:59:49,349 threat model like who what kind of 1454 00:59:47,910 --> 00:59:53,099 attackers are you concerned about what 1455 00:59:49,349 --> 00:59:54,180 are you trying to protect against so you 1456 00:59:53,099 --> 00:59:56,070 might have the stance that you just 1457 00:59:54,180 --> 00:59:57,570 don't really care and that like anything 1458 00:59:56,070 --> 00:59:58,980 you communicate with anybody is allowed 1459 00:59:57,570 --> 01:00:00,720 to be public I might be willing to like 1460 00:59:58,980 --> 01:00:03,359 post all my conversation with everybody 1461 01:00:00,720 --> 01:00:05,760 for everybody to see publicly on the 1462 01:00:03,359 --> 01:00:08,130 Internet on the other hand maybe you're 1463 01:00:05,760 --> 01:00:10,560 doing some like security sensitive works 1464 01:00:08,130 --> 01:00:11,970 here working under a contract for the US 1465 01:00:10,560 --> 01:00:13,770 government developing some sensitive 1466 01:00:11,970 --> 01:00:15,060 military stuff if you're sending that 1467 01:00:13,770 --> 01:00:16,560 through the open Internet while you're 1468 01:00:15,060 --> 01:00:18,900 traveling you probably want to be pretty 1469 01:00:16,560 --> 01:00:20,579 darn sure that no eavesdroppers or 1470 01:00:18,900 --> 01:00:22,079 anybody else along the way can see what 1471 01:00:20,579 --> 01:00:23,099 you're sending and that whatever you're 1472 01:00:22,079 --> 01:00:24,990 sending is in fact going to the right 1473 01:00:23,099 --> 01:00:26,339 place and that whoever is receiving it 1474 01:00:24,990 --> 01:00:29,849 can authenticate that it in fact came 1475 01:00:26,339 --> 01:00:31,260 from you so you might be worried about 1476 01:00:29,849 --> 01:00:33,359 all different kinds of adversaries 1477 01:00:31,260 --> 01:00:34,440 depending on your scenario from random 1478 01:00:33,359 --> 01:00:36,810 script kiddies who are trying to break 1479 01:00:34,440 --> 01:00:38,490 into websites to nation state level 1480 01:00:36,810 --> 01:00:40,260 attackers and you'll need different 1481 01:00:38,490 --> 01:00:41,339 types of techniques for defending 1482 01:00:40,260 --> 01:00:47,900 against the different categories of 1483 01:00:41,339 --> 01:00:47,900 attackers any other questions 1484 01:00:51,190 --> 01:00:55,300 well so hopefully see some of you 1485 01:00:53,620 --> 01:00:57,310 tomorrow for a random collection of 1486 01:00:55,300 --> 01:00:59,490 things that John Jose and I find 1487 01:00:57,310 --> 01:00:59,490 interesting