0:00:01.100,0:00:05.310 okay so again let's get started with 0:00:03.330,0:00:06.750 today's lecture so today we're going to 0:00:05.310,0:00:09.240 be talking about security and 0:00:06.750,0:00:10.410 cryptography and today's lecture is 0:00:09.240,0:00:12.780 gonna be a little bit different than our 0:00:10.410,0:00:14.849 treatment of this topic in last year's 0:00:12.780,0:00:16.619 class so last year we focused a little 0:00:14.849,0:00:19.350 bit more on security and privacy and 0:00:16.619,0:00:21.720 from the perspective of a user of a 0:00:19.350,0:00:22.949 computer but today we're going to focus 0:00:21.720,0:00:24.689 a little bit more on security and 0:00:22.949,0:00:26.730 cryptography concepts that are relevant 0:00:24.689,0:00:29.400 in understanding some of the tools that 0:00:26.730,0:00:30.810 we talked about earlier in this class so 0:00:29.400,0:00:32.250 for example we talked about hash 0:00:30.810,0:00:34.800 functions or cryptographic hash 0:00:32.250,0:00:37.829 functions like sha-1 in the get lecture 0:00:34.800,0:00:39.239 or we talked about public keys when we 0:00:37.829,0:00:41.010 talked about SSH in the command line 0:00:39.239,0:00:43.260 environment in a command line 0:00:41.010,0:00:44.610 environment lecture and so today we'll 0:00:43.260,0:00:46.320 talk about there's different 0:00:44.610,0:00:47.879 cryptographic primitives in more detail 0:00:46.320,0:00:49.350 and get an understanding of how they 0:00:47.879,0:00:50.579 work and how they're used in these 0:00:49.350,0:00:53.460 different tools that we're teaching in 0:00:50.579,0:00:55.890 this class this lecture is not a 0:00:53.460,0:00:58.199 substitute for a more rigorous class in 0:00:55.890,0:00:59.640 security so they're they're a bunch of 0:00:58.199,0:01:01.050 really good classes at MIT like six 0:00:59.640,0:01:03.660 eight five eight which is on computer 0:01:01.050,0:01:06.420 system security or six eight five seven 0:01:03.660,0:01:08.850 and six eight seven five which are more 0:01:06.420,0:01:10.950 focused on cryptography so don't do 0:01:08.850,0:01:13.380 security work without formal training in 0:01:10.950,0:01:16.380 security from these classes or elsewhere 0:01:13.380,0:01:17.580 and unless you're an expert don't roll 0:01:16.380,0:01:19.860 your own crypto don't build your own 0:01:17.580,0:01:21.960 crypto implementations or protocols and 0:01:19.860,0:01:23.909 the same principle applies to computer 0:01:21.960,0:01:25.560 system security this lecture is not 0:01:23.909,0:01:27.869 about building your own stuff it's about 0:01:25.560,0:01:29.369 understanding what's already out there 0:01:27.869,0:01:31.290 and so this lecture will have a very 0:01:29.369,0:01:32.820 informal but we think practical 0:01:31.290,0:01:35.400 treatment of these basic cryptography 0:01:32.820,0:01:36.659 concepts and yeah hopefully it'll help 0:01:35.400,0:01:39.540 you understand some of the tools we 0:01:36.659,0:01:40.590 talked about earlier in this class any 0:01:39.540,0:01:45.000 questions about the plan for today's 0:01:40.590,0:01:46.560 lecture great so the first topic for 0:01:45.000,0:01:48.780 today is something called entropy 0:01:46.560,0:01:51.119 entropy is a measure of randomness and 0:01:48.780,0:01:52.380 this is useful for example when trying 0:01:51.119,0:01:55.229 to determine the strength of a password 0:01:52.380,0:01:58.290 so let's take a look at this comic from 0:01:55.229,0:02:00.299 xkcd we're a big fan of xkcd comics so 0:01:58.290,0:02:02.490 this comic raise your hand if you've 0:02:00.299,0:02:05.159 seen this before know a good number of 0:02:02.490,0:02:07.530 you so this comic is complaining about 0:02:05.159,0:02:09.270 this common pattern that's been taught 0:02:07.530,0:02:11.879 to users of computers that when you 0:02:09.270,0:02:12.940 design passwords they should be things 0:02:11.879,0:02:16.750 like that T 0:02:12.940,0:02:19.420 our zero ub40 orm purse and three string 0:02:16.750,0:02:21.040 in the top-left like we should design 0:02:19.420,0:02:22.450 passwords that are full of funny 0:02:21.040,0:02:25.210 characters and things like that to make 0:02:22.450,0:02:26.620 it hard for attackers to guess and yet 0:02:25.210,0:02:28.690 turns out that passwords like that are 0:02:26.620,0:02:30.370 actually pretty weak and guessable by 0:02:28.690,0:02:32.530 computers that can guess postures really 0:02:30.370,0:02:34.660 fast and brute-force attacks and on the 0:02:32.530,0:02:37.510 other hand passwords which maybe 0:02:34.660,0:02:39.130 intuitively don't look as secure like 0:02:37.510,0:02:41.620 the one on the bottom left correct horse 0:02:39.130,0:02:45.100 battery staple that one turns out to be 0:02:41.620,0:02:47.650 way more secure so how do I actually 0:02:45.100,0:02:49.810 quantify the security of these different 0:02:47.650,0:02:51.940 passwords it's by measuring the amount 0:02:49.810,0:02:55.230 of randomness in the password how many 0:02:51.940,0:02:57.700 bits of randomness are in there and so 0:02:55.230,0:02:59.170 entropy is measured in bits this is like 0:02:57.700,0:03:07.330 the same bits from information theory 0:02:59.170,0:03:09.310 and we're only going to talk about the 0:03:07.330,0:03:11.110 simple case where you're trying to 0:03:09.310,0:03:12.610 measure the amount of randomness when 0:03:11.110,0:03:15.550 you're choosing from a set of things 0:03:12.610,0:03:17.320 uniformly at random so for example when 0:03:15.550,0:03:19.780 you're constructing a password that's in 0:03:17.320,0:03:22.120 the format of four random words you're 0:03:19.780,0:03:24.310 kind of considering all possible 0:03:22.120,0:03:25.720 sequences of four random words made from 0:03:24.310,0:03:26.739 some dictionary you have you might have 0:03:25.720,0:03:28.390 a dictionary would say a hundred 0:03:26.739,0:03:31.209 thousand words and you're selecting each 0:03:28.390,0:03:32.380 word uniformly at random how many 0:03:31.209,0:03:33.730 possibilities are there 0:03:32.380,0:03:35.830 well you can go and figure that out in 0:03:33.730,0:03:37.870 that example once you know how many 0:03:35.830,0:03:40.690 possibilities there the measure of 0:03:37.870,0:03:49.390 entropy is log base 2 of the number of 0:03:40.690,0:03:51.459 possibilities and as that comic suggests 0:03:49.390,0:03:52.900 this is related to how long it'll take 0:03:51.459,0:03:54.970 an attacker to try to brute-force 0:03:52.900,0:03:56.590 through your different passwords like if 0:03:54.970,0:03:57.820 you have a thousand possibilities you're 0:03:56.590,0:03:59.650 guessing passwords at a thousand 0:03:57.820,0:04:04.300 passwords a second that's not a very 0:03:59.650,0:04:07.840 good password so this is a couple of 0:04:04.300,0:04:09.340 quick examples a coin flip has two 0:04:07.840,0:04:15.280 possibilities and let's assume we have a 0:04:09.340,0:04:19.650 fair coin and so a coin flip has log 0:04:15.280,0:04:21.489 base 2 of 2 is one bit of entropy and 0:04:19.650,0:04:24.880 another thing we might look at is 0:04:21.489,0:04:26.540 something like a dice roll so there's 0:04:24.880,0:04:32.510 six possibilities and log 0:04:26.540,0:04:33.640 two of six is something like 2.6 bits of 0:04:32.510,0:04:36.380 entropy 0:04:33.640,0:04:40.100 so that's how we quantify the amount of 0:04:36.380,0:04:41.930 randomness in something so now going 0:04:40.100,0:04:43.400 back to that example in the xkcd comic 0:04:41.930,0:04:44.690 when we want to figure out how much 0:04:43.400,0:04:46.670 entropy is in a password we have to 0:04:44.690,0:04:48.680 consider and if the model for how the 0:04:46.670,0:04:51.110 password was generated for example in 0:04:48.680,0:04:53.450 the top left you could consider okay we 0:04:51.110,0:04:55.310 take one dictionary word make some 0:04:53.450,0:04:57.650 substitutions of some of the characters 0:04:55.310,0:04:59.750 with numbers that look similar to that 0:04:57.650,0:05:01.430 character add one punctuation mark at 0:04:59.750,0:05:04.340 the end and add one numeral after that 0:05:01.430,0:05:05.630 and we can take that model and then use 0:05:04.340,0:05:07.400 common rhetoric to figure out how many 0:05:05.630,0:05:09.260 possibilities there are and from that we 0:05:07.400,0:05:10.550 can derive how many bits of entropy are 0:05:09.260,0:05:13.130 in that password so in that particular 0:05:10.550,0:05:14.180 example I don't know exactly what model 0:05:13.130,0:05:16.750 they were using for the password but 0:05:14.180,0:05:19.280 they calculated their 28 bits of entropy 0:05:16.750,0:05:22.250 whereas in the bottom-left example that 0:05:19.280,0:05:23.960 correct horse battery staple they assume 0:05:22.250,0:05:26.630 that you're working from a dictionary of 0:05:23.960,0:05:28.850 about 2,000 words and so when you 0:05:26.630,0:05:30.770 combine four of those words together you 0:05:28.850,0:05:31.850 get about 44 bits of entropy from that 0:05:30.770,0:05:34.670 so it's much more secure than the 0:05:31.850,0:05:36.620 example before it so any questions about 0:05:34.670,0:05:43.520 this definition of entropy or why it's 0:05:36.620,0:05:45.620 useful and so when you're generating 0:05:43.520,0:05:47.060 your own passwords keep this in mind you 0:05:45.620,0:05:48.800 want a high entropy password and the 0:05:47.060,0:05:50.360 exact number you need depends on exactly 0:05:48.800,0:05:52.190 what you're trying to protect against 0:05:50.360,0:05:53.540 like in general a concept in securities 0:05:52.190,0:05:55.700 you have to keep in mind what your 0:05:53.540,0:05:56.930 threat model is like what attackers 0:05:55.700,0:05:58.460 you're concerned about what kinds of 0:05:56.930,0:06:01.160 technique the attackers might be using 0:05:58.460,0:06:02.930 for example this comic refers to an 0:06:01.160,0:06:04.580 attacker that can guess a thousand 0:06:02.930,0:06:07.100 passwords a second this might be 0:06:04.580,0:06:09.140 something that's possible for say a web 0:06:07.100,0:06:11.480 service that allows people to try to log 0:06:09.140,0:06:12.830 in with your email and then random 0:06:11.480,0:06:15.430 passwords that the attacker is trying 0:06:12.830,0:06:17.600 but this thousand passwords the second 0:06:15.430,0:06:19.970 model might not be accurate for other 0:06:17.600,0:06:21.590 scenarios for example an offline 0:06:19.970,0:06:23.870 password cracking scenario or maybe the 0:06:21.590,0:06:25.790 attacker has broken into a website and 0:06:23.870,0:06:27.920 downloaded their database and they have 0:06:25.790,0:06:28.700 some obfuscated form of your password 0:06:27.920,0:06:30.560 and they're trying to figure out what 0:06:28.700,0:06:31.880 the password is maybe they can paralyze 0:06:30.560,0:06:34.010 this attack and make it go to million 0:06:31.880,0:06:35.510 guesses a second and so exactly how much 0:06:34.010,0:06:37.210 entropy you need depends on exactly what 0:06:35.510,0:06:39.350 you're trying to protect against but 0:06:37.210,0:06:40.230 roughly forty bits of entropy might be 0:06:39.350,0:06:42.660 good enough for 0:06:40.230,0:06:44.370 which is protected by a website and 0:06:42.660,0:06:47.340 you're concerned about online password 0:06:44.370,0:06:49.140 guesses and then maybe something like 80 0:06:47.340,0:06:51.000 bits of entropy might be good if you're 0:06:49.140,0:06:52.530 concerned about offline attacks and you 0:06:51.000,0:06:58.260 want to be really really secure so 0:06:52.530,0:07:00.390 they're rough guidelines you can use and 0:06:58.260,0:07:02.370 then how do you actually generate strong 0:07:00.390,0:07:03.990 passwords well you have some model for 0:07:02.370,0:07:05.340 password for example the for dictionary 0:07:03.990,0:07:07.320 works thing and you can actually get a 0:07:05.340,0:07:08.700 dictionary and then you can use methods 0:07:07.320,0:07:10.680 like dice where so there's a song we 0:07:08.700,0:07:12.840 linked to in the lecture notes where you 0:07:10.680,0:07:14.460 can actually get physical dye and roll 0:07:12.840,0:07:15.900 them and then map dice rolls to 0:07:14.460,0:07:17.730 dictionary words in order to eventually 0:07:15.900,0:07:19.290 turn that into a password and doing 0:07:17.730,0:07:21.540 something like this using some kind of 0:07:19.290,0:07:24.150 physical token that you know is random 0:07:21.540,0:07:26.280 like a balanced die or a coin that you 0:07:24.150,0:07:27.870 know is balanced is a good thing to do 0:07:26.280,0:07:29.610 because humans are actually not good at 0:07:27.870,0:07:31.200 choosing random numbers right if I just 0:07:29.610,0:07:32.940 asked you to name a random number for 1 0:07:31.200,0:07:34.710 to 100 chances are that you're probably 0:07:32.940,0:07:35.790 not doing so uniformly at random very 0:07:34.710,0:07:37.200 well and so that's why it's actually 0:07:35.790,0:07:42.300 good to use these physical tokens in 0:07:37.200,0:07:44.610 order to produce randomness so entropy 0:07:42.300,0:07:49.280 that's our first concept recovering any 0:07:44.610,0:07:52.080 questions about that or about this comic 0:07:49.280,0:07:54.600 great so getting into slightly more 0:07:52.080,0:07:55.860 interesting and complicated topics the 0:07:54.600,0:07:58.590 next thing we're going to talk about is 0:07:55.860,0:08:00.510 hash functions so hopefully most of you 0:07:58.590,0:08:02.280 were here during the get lecture where 0:08:00.510,0:08:04.680 we talked about the sha-1 hash function 0:08:02.280,0:08:12.150 used in get so now going into that topic 0:08:04.680,0:08:14.100 in a little bit more detail hash 0:08:12.150,0:08:16.620 functions at a high level are functions 0:08:14.100,0:08:20.220 that map a variable amount of data into 0:08:16.620,0:08:22.950 a fixed size output so for example the 0:08:20.220,0:08:25.350 sha-1 hash functions is one example of a 0:08:22.950,0:08:30.230 hash function takes in some input of 0:08:25.350,0:08:35.340 some number of bytes and outputs exactly 0:08:30.230,0:08:37.320 160 bits of output so that's kind of the 0:08:35.340,0:08:39.570 type signature of this particular hash 0:08:37.320,0:08:41.040 function and then these functions have 0:08:39.570,0:08:42.690 some number of properties that are 0:08:41.040,0:08:46.710 useful so at a high level 0:08:42.690,0:08:47.820 these can be thought about as hard to 0:08:46.710,0:08:50.010 invert functions that have 0:08:47.820,0:08:53.190 random-looking outputs we can actually 0:08:50.010,0:08:54.080 try this out on some random piece of 0:08:53.190,0:08:56.930 data 0:08:54.080,0:08:59.730 for example if I enter into my terminal 0:08:56.930,0:09:02.130 printf hello this does exactly what you 0:08:59.730,0:09:04.020 would expect it does prints the set to 0:09:02.130,0:09:07.380 standard out and I can pipe this to the 0:09:04.020,0:09:09.090 sha-1 sum command so this is a command 0:09:07.380,0:09:11.700 line program that accepts input via 0:09:09.090,0:09:13.170 standard in and computes this sha-1 0:09:11.700,0:09:14.340 function which takes in some variable 0:09:13.170,0:09:17.400 number of bytes from the input and 0:09:14.340,0:09:20.130 produces a 160-bit output which in this 0:09:17.400,0:09:21.720 particular case is represent or encoded 0:09:20.130,0:09:24.000 as a hexadecimal string so it's a length 0:09:21.720,0:09:26.130 40 hexadecimal string and you see this 0:09:24.000,0:09:27.510 output right here this - just means it 0:09:26.130,0:09:30.660 took it it took its input from 0:09:27.510,0:09:32.400 standardin so this output just looks 0:09:30.660,0:09:33.650 like some random number but one 0:09:32.400,0:09:37.050 important thing is that this is a 0:09:33.650,0:09:39.240 deterministic number if you try the same 0:09:37.050,0:09:40.680 command on your own computer printf 0:09:39.240,0:09:42.990 hello sha-1 something you will get the 0:09:40.680,0:09:44.340 same number out so sha-1 is some 0:09:42.990,0:09:47.190 well-known function that people have 0:09:44.340,0:09:48.540 agreed upon for all its parameters we'll 0:09:47.190,0:09:51.900 see that if we tweak the input a little 0:09:48.540,0:09:54.390 bit like say changed hello to holo with 0:09:51.900,0:09:55.920 a capital H now I get a completely 0:09:54.390,0:09:57.540 different looking output and this also 0:09:55.920,0:09:58.860 looks like some other kind of random ish 0:09:57.540,0:10:00.090 number even though it is deterministic 0:09:58.860,0:10:07.740 and you could reproduce this on your own 0:10:00.090,0:10:16.140 computer hash functions have a number of 0:10:07.740,0:10:17.610 properties that are pretty important the 0:10:16.140,0:10:18.810 first property that cryptographic hash 0:10:17.610,0:10:21.060 functions have is that their 0:10:18.810,0:10:22.440 non-invertible and what that means is 0:10:21.060,0:10:25.230 that if you take the output from this 0:10:22.440,0:10:27.720 function for example that a a f4 0:10:25.230,0:10:30.390 ballaugh 3 for D strings shown there 0:10:27.720,0:10:32.850 from that output it's hard to figure out 0:10:30.390,0:10:36.510 what the input was that produced that 0:10:32.850,0:10:38.310 output so you can go one way compute the 0:10:36.510,0:10:41.190 sha-1 hash easily but you can't go 0:10:38.310,0:10:43.380 backwards another property that these 0:10:41.190,0:10:50.910 functions have is that their collision 0:10:43.380,0:10:53.400 resistant and what this property means 0:10:50.910,0:10:56.780 is that it's hard to find two different 0:10:53.400,0:10:59.910 inputs that produce the same output and 0:10:56.780,0:11:02.460 so this basically describes what a 0:10:59.910,0:11:04.040 cryptographic hash function is so any 0:11:02.460,0:11:06.510 questions about the kind of 0:11:04.040,0:11:08.900 specification of a cryptographic hash 0:11:06.510,0:11:08.900 function 0:11:09.350,0:11:13.380 okay so what are these hash functions 0:11:11.580,0:11:16.170 actually useful for well we've already 0:11:13.380,0:11:19.260 seen one application in git for content 0:11:16.170,0:11:22.070 address storage so we want it get we 0:11:19.260,0:11:24.180 want some uniform way of naming 0:11:22.070,0:11:26.130 different objects that are in the object 0:11:24.180,0:11:27.900 store and it turns out that get 0:11:26.130,0:11:29.940 addresses all of them by their sha-1 0:11:27.900,0:11:33.000 hash so you have the actual data you 0:11:29.940,0:11:34.830 want to store and then to name that 0:11:33.000,0:11:36.420 particular piece of data you just name 0:11:34.830,0:11:37.650 the sha-1 hash and all of that is stored 0:11:36.420,0:11:41.820 in the object store in that particular 0:11:37.650,0:11:43.230 way we see this when looking at many 0:11:41.820,0:11:45.030 different parts of git for example right 0:11:43.230,0:11:47.520 here I'm going to get repository if I do 0:11:45.030,0:11:50.610 get log it shows me the commits and for 0:11:47.520,0:11:53.280 example this number up here is the 0:11:50.610,0:11:55.290 cryptographic hash function sha-1 apply 0:11:53.280,0:11:58.470 to the commit object that describes this 0:11:55.290,0:12:00.360 particular commit so does anybody know 0:11:58.470,0:12:02.370 why git uses a cryptographic hash 0:12:00.360,0:12:03.480 function here as opposed to so you might 0:12:02.370,0:12:04.830 have heard in your other computer 0:12:03.480,0:12:06.570 science classes like say your 0:12:04.830,0:12:08.850 introductory algorithms class there are 0:12:06.570,0:12:11.010 things called hash functions without the 0:12:08.850,0:12:13.740 word cryptographic appended in front of 0:12:11.010,0:12:16.080 them and they have similar properties 0:12:13.740,0:12:18.570 that they turn a variable sized input 0:12:16.080,0:12:21.300 into some fixed size output but they 0:12:18.570,0:12:23.400 don't quite have these properties where 0:12:21.300,0:12:24.810 it's hard to find an input that produces 0:12:23.400,0:12:26.820 a particular output or things like that 0:12:24.810,0:12:29.010 it's a kind of weaker definition than 0:12:26.820,0:12:29.880 this so why is it that in get we care 0:12:29.010,0:12:32.100 about having a cryptographic hash 0:12:29.880,0:12:33.810 function as opposed to just a regular 0:12:32.100,0:12:36.080 old hash function does anybody have any 0:12:33.810,0:12:36.080 ideas 0:12:45.390,0:12:52.180 yeah that's that's basically it that we 0:12:49.180,0:12:54.010 don't want to have kind of conflicts in 0:12:52.180,0:12:55.660 the output from this hash function like 0:12:54.010,0:12:57.700 every commit is identified by a hash 0:12:55.660,0:12:59.770 function every file is identified by the 0:12:57.700,0:13:01.000 hash of that file if it were ever the 0:12:59.770,0:13:03.190 case that two different pieces of 0:13:01.000,0:13:05.620 content in practice produce the same 0:13:03.190,0:13:07.600 output that is if the function were not 0:13:05.620,0:13:09.610 collision resistant that could be really 0:13:07.600,0:13:12.010 problematic right because then you and I 0:13:09.610,0:13:13.990 we could have to do to get repos that we 0:13:12.010,0:13:15.760 think are the same we check out the same 0:13:13.990,0:13:18.460 commit hash and we might end up with 0:13:15.760,0:13:21.400 different files and this is concerning 0:13:18.460,0:13:23.380 because git is used to track software a 0:13:21.400,0:13:25.510 track development of software and it's 0:13:23.380,0:13:28.180 also kind of involved in making sure 0:13:25.510,0:13:29.440 that the right people are authoring the 0:13:28.180,0:13:30.700 software nothing funny has happened in 0:13:29.440,0:13:32.620 the process for example there all these 0:13:30.700,0:13:34.540 open source projects like the Linux 0:13:32.620,0:13:37.150 kernel where development is done using 0:13:34.540,0:13:39.610 git it would be really bad if some 0:13:37.150,0:13:41.290 contributor to get could say edit some 0:13:39.610,0:13:42.970 file and propose some change that looks 0:13:41.290,0:13:44.980 pretty benign like oh let me go and 0:13:42.970,0:13:46.720 improve this part of Linux submit that 0:13:44.980,0:13:48.970 change request to the Linux developers 0:13:46.720,0:13:52.120 and then in practice actually supply a 0:13:48.970,0:13:54.220 git repository that has the same commit 0:13:52.120,0:13:55.570 hash and whatnot but actually the file 0:13:54.220,0:13:59.080 contents are different there's something 0:13:55.570,0:14:00.730 malicious so git actually relies on this 0:13:59.080,0:14:03.190 sha-1 function being a cryptographic 0:14:00.730,0:14:10.150 hash function in order to achieve 0:14:03.190,0:14:11.680 security any questions about that and 0:14:10.150,0:14:13.810 some other interesting applications of 0:14:11.680,0:14:15.670 hash functions so as we saw hash 0:14:13.810,0:14:17.950 functions turn big inputs into small 0:14:15.670,0:14:19.750 outputs and in a way because the hash 0:14:17.950,0:14:21.670 function is collision resistant the 0:14:19.750,0:14:24.430 output can be used to kind of attest to 0:14:21.670,0:14:27.040 or identify the input and so you can 0:14:24.430,0:14:30.130 think of a hash as a short summary of a 0:14:27.040,0:14:32.020 file for example in this directory of a 0:14:30.130,0:14:34.780 bunch of files and I can compute the 0:14:32.020,0:14:37.630 sha-1 sum of some file in this directory 0:14:34.780,0:14:40.660 and this is the sha-1 algorithm applied 0:14:37.630,0:14:41.770 to this readme MD file and what's 0:14:40.660,0:14:43.810 interesting is that it is 0:14:41.770,0:14:44.800 computationally hard or like impossible 0:14:43.810,0:14:47.650 you can kind of think of it as 0:14:44.800,0:14:50.230 impossible to find any other file so a 0:14:47.650,0:14:52.870 different file that has the same hash 0:14:50.230,0:14:55.030 output and one scenario in which this is 0:14:52.870,0:14:56.090 useful is when you download files from 0:14:55.030,0:14:59.220 the internet 0:14:56.090,0:15:01.410 for example there are lots of Linux 0:14:59.220,0:15:03.630 distributions that distribute large CD 0:15:01.410,0:15:05.250 or DVD images from their website like I 0:15:03.630,0:15:07.860 can go to Debian org and download the 0:15:05.250,0:15:09.510 latest version of Debian the thing is 0:15:07.860,0:15:10.950 that hosting those files can be 0:15:09.510,0:15:12.420 expensive and so a lot of people are 0:15:10.950,0:15:14.490 nice enough to host mirrors of these 0:15:12.420,0:15:17.520 files so instead of downloading Debian 0:15:14.490,0:15:20.310 from Debian org I can go to one of many 0:15:17.520,0:15:21.750 other sites and download what are 0:15:20.310,0:15:23.580 supposed to be the same files that are 0:15:21.750,0:15:25.490 hosted at Debian org but how do I know 0:15:23.580,0:15:28.920 that I actually got the correct file 0:15:25.490,0:15:30.780 like what if I set up a malicious mirror 0:15:28.920,0:15:33.180 and you go to like Anisha is evil Debian 0:15:30.780,0:15:35.190 website calm and then try to download 0:15:33.180,0:15:37.290 Debian turns out that your Linux 0:15:35.190,0:15:38.880 installation is backdoored well one 0:15:37.290,0:15:40.620 thing you could do is download a copy 0:15:38.880,0:15:42.060 from the original double-unit website 0:15:40.620,0:15:43.440 and then download my version and compare 0:15:42.060,0:15:44.550 them but that kind of defeats the 0:15:43.440,0:15:46.110 purpose right because we want to avoid 0:15:44.550,0:15:47.580 downloading things from Debian org 0:15:46.110,0:15:49.080 because hosting these files is expensive 0:15:47.580,0:15:50.490 and we want all these different people 0:15:49.080,0:15:53.660 to be able to mirror copies of the files 0:15:50.490,0:15:55.770 elsewhere so does anybody see how 0:15:53.660,0:15:57.630 cryptographic hash functions could be 0:15:55.770,0:15:59.190 useful to solve this problem that I want 0:15:57.630,0:16:02.970 to download a file from an untrusted 0:15:59.190,0:16:04.950 source but and not from like the trusted 0:16:02.970,0:16:06.210 source itself but maybe I can get some 0:16:04.950,0:16:07.920 small piece of information from this 0:16:06.210,0:16:09.660 trusted source in order to know whether 0:16:07.920,0:16:11.460 the file I downloaded from the untrusted 0:16:09.660,0:16:18.060 source is the thing I was supposed to 0:16:11.460,0:16:19.200 get yes like it's basically just a 0:16:18.060,0:16:20.610 straightforward application of 0:16:19.200,0:16:22.980 cryptographic hash functions 0:16:20.610,0:16:25.620 so what Debian org can do is they can 0:16:22.980,0:16:27.870 produce their kind of correct ISO file 0:16:25.620,0:16:29.730 or whatever they want and instead of 0:16:27.870,0:16:32.550 publishing the file itself on their 0:16:29.730,0:16:35.190 website they can publish a hash of that 0:16:32.550,0:16:37.140 file so compared to the file itself 0:16:35.190,0:16:39.390 which may be many gigabytes this is only 0:16:37.140,0:16:41.190 like in this particular case 160 bits of 0:16:39.390,0:16:43.620 data right so very cheap to host and 0:16:41.190,0:16:46.260 then what I can do is a user is I can 0:16:43.620,0:16:48.090 download that file from any random 0:16:46.260,0:16:50.460 website it could be an untrusted website 0:16:48.090,0:16:53.780 and after I download I just double check 0:16:50.460,0:16:56.040 the sha-1 hash and if the hash matches 0:16:53.780,0:16:57.330 then I know that I have the right file 0:16:56.040,0:17:00.300 because it's computationally infeasible 0:16:57.330,0:17:01.980 for somebody to give me some different 0:17:00.300,0:17:04.260 file that happens to have the same hash 0:17:01.980,0:17:07.230 because hash functions are collision 0:17:04.260,0:17:10.130 resistant so any questions about that 0:17:07.230,0:17:10.130 application yeah 0:17:18.170,0:17:22.350 yeah so that's a good question like why 0:17:20.490,0:17:24.120 do you need different people to host the 0:17:22.350,0:17:26.100 information like wouldn't it be equally 0:17:24.120,0:17:27.000 expensive for everybody so the answer is 0:17:26.100,0:17:29.010 that question is a little bit 0:17:27.000,0:17:30.840 complicated but like here's that here's 0:17:29.010,0:17:32.970 a partial answer one thing is that 0:17:30.840,0:17:34.500 downloading files from a server is 0:17:32.970,0:17:36.360 affected by how far away the server is 0:17:34.500,0:17:38.730 from you so for example if the servers 0:17:36.360,0:17:40.830 in Massachusetts and you're in say China 0:17:38.730,0:17:42.030 like you have to kind of make a big 0:17:40.830,0:17:43.830 round trip across the internet and that 0:17:42.030,0:17:46.200 may be expensive for a number of reasons 0:17:43.830,0:17:47.700 like the latency is high and the traffic 0:17:46.200,0:17:48.990 traffic needs to go through kind of lots 0:17:47.700,0:17:50.610 of different wires to make its way all 0:17:48.990,0:17:52.680 the way to where you are and so one 0:17:50.610,0:17:54.180 thing that these websites do is that 0:17:52.680,0:17:55.710 they distribute their content to servers 0:17:54.180,0:17:57.060 that are all over the world and then as 0:17:55.710,0:17:58.500 a user you download from the server 0:17:57.060,0:18:00.960 that's closest to you like for example 0:17:58.500,0:18:02.490 mit maintains a Debian package 0:18:00.960,0:18:04.560 repository and like kind of mirrors all 0:18:02.490,0:18:07.230 the Debbie and stuff so if you're a 0:18:04.560,0:18:10.260 Debian user at MIT you can use the MIT 0:18:07.230,0:18:12.060 copy of everything and then you can kind 0:18:10.260,0:18:14.100 of access it over our fast local network 0:18:12.060,0:18:15.690 and that traffic never needs to go to 0:18:14.100,0:18:18.750 the outside Internet at all so it's very 0:18:15.690,0:18:23.130 fast that's a good question any other 0:18:18.750,0:18:24.180 questions ok and then one final kind of 0:18:23.130,0:18:25.530 interesting application of hash 0:18:24.180,0:18:28.650 functions is something called a 0:18:25.530,0:18:30.120 commitment scheme so I want to play a 0:18:28.650,0:18:31.590 game and I need a volunteer for this so 0:18:30.120,0:18:32.850 you don't actually need to get up from 0:18:31.590,0:18:34.830 your seat or anything I was need you to 0:18:32.850,0:18:36.450 talk with me so any volunteers raise 0:18:34.830,0:18:37.730 your hand yeah okay yeah what's your 0:18:36.450,0:18:40.650 name 0:18:37.730,0:18:42.360 Abdul Aziz okay great so Abdul Aziz 0:18:40.650,0:18:44.250 we're going to play a game we're going 0:18:42.360,0:18:46.140 to play a game where I'm going to flip a 0:18:44.250,0:18:48.180 coin and then you're gonna call heads or 0:18:46.140,0:18:49.860 tails and if you call it right you win 0:18:48.180,0:18:52.140 and if you call it wrong you lose and 0:18:49.860,0:18:56.220 there are no stakes for this game but 0:18:52.140,0:18:57.660 just the pride of winning so sadly I 0:18:56.220,0:18:58.950 checked my wallet and all I have is 0:18:57.660,0:19:00.570 dollar bills I don't have any coins with 0:18:58.950,0:19:02.190 me so instead I'm going to just flip the 0:19:00.570,0:19:04.950 coin in my head all right 0:19:02.190,0:19:07.740 so okay I flip the coin call heads or 0:19:04.950,0:19:13.680 tails sorry you lost it was heads I 0:19:07.740,0:19:15.420 don't I play again yeah I can cheat 0:19:13.680,0:19:17.280 right I can just see what you say and 0:19:15.420,0:19:19.950 say the opposite thing so let's try 0:19:17.280,0:19:22.260 fixing this game how about you call 0:19:19.950,0:19:26.250 heads or tails after I say what the 0:19:22.260,0:19:27.510 flip result was okay yeah so if I say oh 0:19:26.250,0:19:35.220 the result is tails what are you gonna 0:19:27.510,0:19:39.030 say are you call tails yeah so is it 0:19:35.220,0:19:40.890 possible to play this guess what guess 0:19:39.030,0:19:43.020 what the coin flip result is game in a 0:19:40.890,0:19:44.550 fair way without having a physical coin 0:19:43.020,0:19:45.660 that we share like because I can't 0:19:44.550,0:19:47.160 really manipulate your physical reality 0:19:45.660,0:19:48.570 if I flip a coin in front of you 0:19:47.160,0:19:50.820 probably trust that it's okay right 0:19:48.570,0:19:51.990 so it turns out that hash functions give 0:19:50.820,0:19:54.720 us a kind of cool way to solve this 0:19:51.990,0:19:57.990 problem to through a idea called a 0:19:54.720,0:19:59.370 commitment scheme so I can say they're 0:19:57.990,0:20:02.670 like here's the construction of the 0:19:59.370,0:20:05.250 solution I can pick heads or tails and 0:20:02.670,0:20:09.390 I'm actually going to pick a big random 0:20:05.250,0:20:14.370 number say like this number here and 0:20:09.390,0:20:16.830 what I can do is compute the sha-1 sum 0:20:14.370,0:20:17.910 of this number at this moment you 0:20:16.830,0:20:19.980 haven't seen this number yet I'm just 0:20:17.910,0:20:22.680 doing all this in my head and then what 0:20:19.980,0:20:25.170 I do is I tell you okay I flipped a coin 0:20:22.680,0:20:26.340 and I'm not going to tell you what the 0:20:25.170,0:20:28.560 result is just yet because you haven't 0:20:26.340,0:20:29.850 called heads or tails but I'll tell you 0:20:28.560,0:20:31.770 what the shell wants some of the result 0:20:29.850,0:20:34.230 is here you go and I tell you this value 0:20:31.770,0:20:36.000 now after this you can call heads or 0:20:34.230,0:20:38.310 tails so what do you say like say say 0:20:36.000,0:20:40.080 heads afterwards what I can do is I can 0:20:38.310,0:20:42.000 reveal to you what my input to this 0:20:40.080,0:20:43.500 function was and then you can 0:20:42.000,0:20:45.480 cross-check this right you can compute 0:20:43.500,0:20:47.040 the sha-1 sum on the input to verify 0:20:45.480,0:20:48.930 that the output is what I said it was 0:20:47.040,0:20:50.760 earlier and then we can have some way of 0:20:48.930,0:20:52.500 mapping these numbers to heads or tails 0:20:50.760,0:20:54.360 so I might have agreed upon beforehand 0:20:52.500,0:20:56.640 that even numbers are heads and odd 0:20:54.360,0:20:58.080 numbers or tails and so this is a way of 0:20:56.640,0:21:01.200 fixing that game so we can actually play 0:20:58.080,0:21:03.600 this game in in our heads right I can 0:21:01.200,0:21:05.970 pick a value but not reveal that value 0:21:03.600,0:21:07.350 to you but I can commit to the value so 0:21:05.970,0:21:09.390 this is a kind of binding commitment 0:21:07.350,0:21:11.310 scheme that I can't change my mind after 0:21:09.390,0:21:14.220 I've told you this but it doesn't reveal 0:21:11.310,0:21:15.480 the original value to you and so this is 0:21:14.220,0:21:17.520 one other neat application of 0:21:15.480,0:21:18.330 cryptographic hash functions so any 0:21:17.520,0:21:24.030 questions about this particular 0:21:18.330,0:21:26.790 construction okay great so moving on to 0:21:24.030,0:21:29.710 the next topic we're going to talk about 0:21:26.790,0:21:35.369 key derivation functions 0:21:29.710,0:21:35.369 [Applause] 0:21:38.650,0:21:46.850 often abbreviate it as KDF so this is a 0:21:45.350,0:21:49.580 concept that's very similar to hash 0:21:46.850,0:21:51.710 functions except it has kind of one 0:21:49.580,0:21:54.740 extra property that it is slow to 0:21:51.710,0:21:56.320 compute for example there's a hash 0:21:54.740,0:22:08.240 function of key derivation function 0:21:56.320,0:22:12.110 known as P pbkdf2 pbkdf2 password-based 0:22:08.240,0:22:14.180 key derivation function that has a kind 0:22:12.110,0:22:15.410 of similar form as these hash functions 0:22:14.180,0:22:16.850 we were talking about here that they 0:22:15.410,0:22:18.380 take in some variable length input in 0:22:16.850,0:22:19.280 pretty so fixed length output but 0:22:18.380,0:22:20.450 they're meant to be used for one 0:22:19.280,0:22:22.370 particular purpose 0:22:20.450,0:22:24.740 the purpose is generally to use the 0:22:22.370,0:22:26.510 fixed length output as a key in another 0:22:24.740,0:22:28.340 cryptographic algorithm and we'll talk 0:22:26.510,0:22:31.310 about those algorithms like what use the 0:22:28.340,0:22:32.900 output of this thing for in a moment but 0:22:31.310,0:22:36.380 a one property of these things is that 0:22:32.900,0:22:39.410 they're slow does anybody have any idea 0:22:36.380,0:22:40.700 why you'd want an algorithm to be slow 0:22:39.410,0:22:42.920 like normally we want algorithms to be 0:22:40.700,0:22:46.000 fast right so why would we want an 0:22:42.920,0:22:46.000 algorithm to be slow yes 0:22:54.430,0:22:59.630 yeah that's exactly it so I'll repeat so 0:22:57.860,0:23:01.940 it goes into the microphone the reason 0:22:59.630,0:23:04.130 you want these to be slow is when you're 0:23:01.940,0:23:05.840 actually using it for something like 0:23:04.130,0:23:07.520 password authentication where you have 0:23:05.840,0:23:08.810 the hash of a password saved and then 0:23:07.520,0:23:10.100 somebody inputs the password you want to 0:23:08.810,0:23:12.440 know if that corresponds to the hash 0:23:10.100,0:23:14.510 it's ok if it's slow because you're only 0:23:12.440,0:23:15.680 doing this check kind of once but the 0:23:14.510,0:23:17.090 other scenario in which you're going to 0:23:15.680,0:23:18.500 be using this function is when 0:23:17.090,0:23:20.540 somebody's trying to brute-force a 0:23:18.500,0:23:22.430 password say a website has their 0:23:20.540,0:23:23.360 password database stolen and somebody's 0:23:22.430,0:23:25.430 going through all the accounts I'm 0:23:23.360,0:23:27.530 trying to break all the passwords well 0:23:25.430,0:23:28.610 in that case you want this to be slow 0:23:27.530,0:23:29.990 because someone's gonna be doing this 0:23:28.610,0:23:31.640 like millions and millions of times and 0:23:29.990,0:23:33.260 you can slow down the attacker a lot by 0:23:31.640,0:23:35.030 making this function slow and so it's 0:23:33.260,0:23:36.500 fine if this takes you like one second 0:23:35.030,0:23:38.750 upon logging in to compute this function 0:23:36.500,0:23:40.100 but when your brute forcing it we don't 0:23:38.750,0:23:41.960 go to a thousand guesses a second like 0:23:40.100,0:23:46.220 in that xkcd comic we can slow it down a 0:23:41.960,0:23:47.860 little bit so what is the output of key 0:23:46.220,0:23:50.060 derivation functions actually used for 0:23:47.860,0:23:52.490 well the next topic we're going to talk 0:23:50.060,0:23:53.570 about probably like one of the most 0:23:52.490,0:23:55.430 classic things when you think about 0:23:53.570,0:24:00.470 cryptography is encryption and 0:23:55.430,0:24:17.300 decryption the next topic is symmetric 0:24:00.470,0:24:18.410 key cryptography and like the rest of 0:24:17.300,0:24:19.700 this lecture we're not going to talk 0:24:18.410,0:24:21.470 about how you implement these we're 0:24:19.700,0:24:23.900 going to talk about the API for a 0:24:21.470,0:24:24.740 symmetric key symmetric key crypto like 0:24:23.900,0:24:28.070 how it's used 0:24:24.740,0:24:30.530 so symmetric key cryptosystems have a 0:24:28.070,0:24:32.930 couple different functions they have a 0:24:30.530,0:24:35.240 key generation function which is a 0:24:32.930,0:24:38.570 randomized function that produces a 0:24:35.240,0:24:42.820 thing we call the key and then they have 0:24:38.570,0:24:42.820 a pair of functions encrypt and decrypt 0:24:45.790,0:24:52.940 and encrypt take as input something we 0:24:49.130,0:24:54.620 refer to as the plaintext and this is 0:24:52.940,0:24:57.710 just some sequence of bytes some data 0:24:54.620,0:24:59.420 and it takes in a key so something that 0:24:57.710,0:25:03.190 came as an output of this key generation 0:24:59.420,0:25:03.190 function and produces 0:25:04.140,0:25:08.730 what we call the ciphertext and then 0:25:06.750,0:25:14.760 decrypt does the opposite of this so it 0:25:08.730,0:25:23.130 takes the ciphertext along with the key 0:25:14.760,0:25:24.929 and produces the plaintext and this 0:25:23.130,0:25:29.429 triple of functions has a couple 0:25:24.929,0:25:31.590 properties one is that like one one team 0:25:29.429,0:25:33.450 you might expect is that this thing 0:25:31.590,0:25:36.290 doesn't really tell you all that much 0:25:33.450,0:25:44.700 about this input to the encryption so 0:25:36.290,0:25:46.559 property number one is given the 0:25:44.700,0:26:02.280 ciphertext you can't figure out the 0:25:46.559,0:26:03.299 plaintext without the key and the other 0:26:02.280,0:26:12.210 property is kind of the obvious 0:26:03.299,0:26:14.460 correctness property that if you take 0:26:12.210,0:26:16.710 something and you encrypt it some 0:26:14.460,0:26:19.559 message M with a key K and then you 0:26:16.710,0:26:24.470 decrypt that ciphertext using the same 0:26:19.559,0:26:24.470 key that gives you back the same message 0:26:24.500,0:26:30.360 this is the kind of obvious correctness 0:26:27.090,0:26:32.280 property so does this description make 0:26:30.360,0:26:33.990 sense does it fit your kind of intuitive 0:26:32.280,0:26:36.000 understanding of taking some piece of 0:26:33.990,0:26:37.679 data and obscuring it so you can't 0:26:36.000,0:26:40.020 really tell anything about the original 0:26:37.679,0:26:42.510 input but then taking that obscured 0:26:40.020,0:26:44.760 result and then passing it there's some 0:26:42.510,0:26:50.190 decryption function given that key to 0:26:44.760,0:26:51.990 retrieve the original input and this 0:26:50.190,0:26:53.130 this isn't really a rigorous definition 0:26:51.990,0:26:55.440 of what it means for something to be 0:26:53.130,0:27:01.799 secure but it's a good enough intuitive 0:26:55.440,0:27:03.179 definition that we can work with it so 0:27:01.799,0:27:08.220 any questions about that description 0:27:03.179,0:27:09.780 there so where can key cryptography be 0:27:08.220,0:27:11.610 useful we'll talk about a whole bunch of 0:27:09.780,0:27:13.110 examples later in this lecture but one 0:27:11.610,0:27:15.150 example we'll talk about right now is 0:27:13.110,0:27:16.719 encrypting files for storage and 0:27:15.150,0:27:20.450 untrusted cloud service 0:27:16.719,0:27:23.539 so consider say something like Dropbox 0:27:20.450,0:27:25.609 or Google Drive or things like that when 0:27:23.539,0:27:27.649 you upload files there you're trusting 0:27:25.609,0:27:30.200 the service to not look at your files or 0:27:27.649,0:27:32.269 do anything malicious with them these 0:27:30.200,0:27:34.129 services like at least the ones I named 0:27:32.269,0:27:36.379 are not intend encrypted or anything 0:27:34.129,0:27:38.119 like that like in theory any employee 0:27:36.379,0:27:39.919 those companies could look at your files 0:27:38.119,0:27:41.539 now of course these companies have lots 0:27:39.919,0:27:43.219 of policies and technical controls in 0:27:41.539,0:27:45.229 place for making sure that that sort of 0:27:43.219,0:27:46.399 thing doesn't happen but that doesn't 0:27:45.229,0:27:48.709 mean that it's not technically possible 0:27:46.399,0:27:50.119 and so one thing you might want to do if 0:27:48.709,0:27:52.639 you don't want to trust these cloud 0:27:50.119,0:27:53.899 services to not peek at your data not do 0:27:52.639,0:27:55.339 like machine learning over them or do 0:27:53.899,0:27:57.259 other sorts of things that you wouldn't 0:27:55.339,0:27:59.359 really want is you can just take your 0:27:57.259,0:28:04.399 files and encrypt them before uploading 0:27:59.359,0:28:05.779 them to these these web services so does 0:28:04.399,0:28:07.039 that idea make sense that I can take my 0:28:05.779,0:28:08.839 file like Center pictures or whatever 0:28:07.039,0:28:10.039 pass it through an encryption function 0:28:08.839,0:28:11.269 and peruse the cipher text and then 0:28:10.039,0:28:13.190 place that cipher text on the web 0:28:11.269,0:28:14.779 service safe for backup purposes or 0:28:13.190,0:28:17.389 whatever and if I ever need that I can 0:28:14.779,0:28:18.979 retrieve the cipher text then use my key 0:28:17.389,0:28:20.269 to decrypt it back into the plaintext 0:28:18.979,0:28:22.190 and they can use a result for doing 0:28:20.269,0:28:29.029 whatever I need to do does that make 0:28:22.190,0:28:30.349 sense yeah yeah so that's that's a good 0:28:29.029,0:28:31.669 question the question is couldn't 0:28:30.349,0:28:34.879 anybody else run it through the same 0:28:31.669,0:28:36.109 encryption program one detail maybe I 0:28:34.879,0:28:38.419 should have explained in a little bit 0:28:36.109,0:28:46.399 more detail is this key generation 0:28:38.419,0:28:48.139 function is randomized and this key has 0:28:46.399,0:28:50.539 high entropy so going back to that topic 0:28:48.139,0:28:55.129 we talked about earlier so like an 0:28:50.539,0:28:58.249 example is we might have aes 256 this is 0:28:55.129,0:29:01.279 one particular symmetric cipher and this 0:28:58.249,0:29:03.589 as the name might indicate has 256 bits 0:29:01.279,0:29:05.570 of entropy in the key and so that means 0:29:03.589,0:29:07.190 that as long as the attacker like 0:29:05.570,0:29:08.959 whoever downloads the cipher text from 0:29:07.190,0:29:10.789 the web service doesn't know your key 0:29:08.959,0:29:11.209 unless they have some better attack in 0:29:10.789,0:29:13.219 place 0:29:11.209,0:29:14.629 they'll have to try all the different 0:29:13.219,0:29:16.940 possible keys and if they're two to the 0:29:14.629,0:29:19.639 256 keys that's too many keys to try in 0:29:16.940,0:29:21.289 a reasonable amount of time does that 0:29:19.639,0:29:26.109 answer the question okay any other 0:29:21.289,0:29:26.109 questions yeah 0:29:35.009,0:29:38.679 that's an excellent question and that 0:29:37.089,0:29:40.119 leads into what I was going to talk 0:29:38.679,0:29:43.509 about next so thanks for that question 0:29:40.119,0:29:45.099 so as you point out like if I lose my 0:29:43.509,0:29:46.659 key I'm kind of stuck right 0:29:45.099,0:29:47.949 I need my key to decrypt that's kind of 0:29:46.659,0:29:49.269 the point of this thing like if I didn't 0:29:47.949,0:29:50.589 need my key to decrypt then this 0:29:49.269,0:29:53.139 wouldn't be a very good crypto system 0:29:50.589,0:29:54.909 and so I can combine this idea of 0:29:53.139,0:29:56.199 symmetric key cryptography with the 0:29:54.909,0:29:58.359 topic we just talked about key 0:29:56.199,0:30:00.039 derivation functions so instead of 0:29:58.359,0:30:01.239 having some key that's randomly 0:30:00.039,0:30:03.459 generated with my key generation 0:30:01.239,0:30:04.239 function say sampling entropy from 0:30:03.459,0:30:11.289 somewhere on my machine 0:30:04.239,0:30:13.419 I can have a passphrase and pass it 0:30:11.289,0:30:17.679 through my key derivation function box 0:30:13.419,0:30:23.259 and this gives me my key and then I can 0:30:17.679,0:30:29.259 take my plaintext and combine it with my 0:30:23.259,0:30:35.259 key in my encrypt function and this 0:30:29.259,0:30:37.359 produces my ciphertext and I store this 0:30:35.259,0:30:39.459 cipher text on the web service but now I 0:30:37.359,0:30:41.609 don't need to save this key instead I 0:30:39.459,0:30:43.839 can just remember in my pass phrase and 0:30:41.609,0:30:45.359 whenever I need my key I can reconstruct 0:30:43.839,0:30:48.359 it from the key derivation function 0:30:45.359,0:30:48.359 question 0:30:56.680,0:30:59.930 yeah so that's a good question the 0:30:58.700,0:31:02.390 question is is the key derivation 0:30:59.930,0:31:05.000 function slow enough to prevent 0:31:02.390,0:31:06.500 brute-force guessing and the answer is 0:31:05.000,0:31:08.600 it depends on how long your passphrase 0:31:06.500,0:31:11.060 is so for example if your passphrase is 0:31:08.600,0:31:12.890 like the string password is probably 0:31:11.060,0:31:14.360 gonna get broken very quickly but as 0:31:12.890,0:31:16.460 long as there's enough entropy in your 0:31:14.360,0:31:17.930 passphrase this is good enough so like 0:31:16.460,0:31:19.370 if I was uploading something to Dropbox 0:31:17.930,0:31:21.770 and I really want it to stay secret I 0:31:19.370,0:31:23.540 think like a 64-bit passphrase really a 0:31:21.770,0:31:24.590 passphrase with 64 bits of entropy it 0:31:23.540,0:31:28.130 would be more than enough in that 0:31:24.590,0:31:30.200 scenario for example and just a quick 0:31:28.130,0:31:31.880 demo of this so there are tools to make 0:31:30.200,0:31:34.310 this really easy to do this is actually 0:31:31.880,0:31:37.100 one of the exercises but we can take a 0:31:34.310,0:31:39.500 tool for example called open SSL and use 0:31:37.100,0:31:42.050 it to apply a symmetric cipher to some 0:31:39.500,0:31:44.060 file so I had my readme text here for 0:31:42.050,0:31:47.090 example readme MD it has a bunch of 0:31:44.060,0:31:50.680 stuff in it and I can do open SSL AES 0:31:47.090,0:31:54.140 256 cbc this is the name of a particular 0:31:50.680,0:31:57.650 symmetric cipher and i can say that i 0:31:54.140,0:32:01.910 want to apply this to read me md and 0:31:57.650,0:32:03.560 produce readme dot and MD let's give it 0:32:01.910,0:32:05.090 some name and then it's asking you for a 0:32:03.560,0:32:06.470 password so by default this works in 0:32:05.090,0:32:08.300 this mode where I provide a passphrase 0:32:06.470,0:32:10.250 is run through a KDF to produce a key 0:32:08.300,0:32:12.410 and that's used for encryption so I'll 0:32:10.250,0:32:15.320 type in some password type it in again 0:32:12.410,0:32:19.310 and now I produce this readme MD file 0:32:15.320,0:32:21.080 and if I look at this it kind of looks 0:32:19.310,0:32:23.000 like garbage and that's at a high level 0:32:21.080,0:32:24.890 the point of a symmetric cipher it 0:32:23.000,0:32:26.450 produces some cipher text that should be 0:32:24.890,0:32:29.690 kind of indistinguishable from random 0:32:26.450,0:32:33.680 data and when I want to decrypt this I 0:32:29.690,0:32:37.670 can run a similar command open SSL AES 0:32:33.680,0:32:40.340 256 cbc dash D for decrypt take the 0:32:37.670,0:32:44.230 input from readme tank done MD and I 0:32:40.340,0:32:49.010 like do like readme dot read need 0:32:44.230,0:32:53.930 decrypted MD as the output and I can 0:32:49.010,0:32:55.850 compare these two files and the 0:32:53.930,0:32:57.530 correctness property of symmetric 0:32:55.850,0:32:59.120 cryptography tells me that this should 0:32:57.530,0:33:01.070 be identical and this indeed is 0:32:59.120,0:33:02.720 identical if I look at the return value 0:33:01.070,0:33:04.920 compare return 0 so that means that are 0:33:02.720,0:33:08.319 the same file 0:33:04.920,0:33:08.319 [Applause] 0:33:08.960,0:33:14.359 so any questions about symmetric key 0:33:11.549,0:33:14.359 cryptography yeah 0:33:20.340,0:33:26.039 so the this particular command did make 0:33:23.700,0:33:29.100 a new file so it took us input readme MD 0:33:26.039,0:33:31.289 and produces output this file so that is 0:33:29.100,0:33:32.879 the encrypted version of the file it 0:33:31.289,0:33:35.779 left the original untouched but then of 0:33:32.879,0:33:47.639 course I could delete it if I wanted to 0:33:35.779,0:33:48.599 yeah that's a good question this is 0:33:47.639,0:33:50.190 something I wasn't gonna talk about in 0:33:48.599,0:33:52.559 too much detail the question is I 0:33:50.190,0:33:55.830 provided the salt argument here and 0:33:52.559,0:33:58.489 where is that stored so the answer is 0:33:55.830,0:34:01.859 that that is stored in this output here 0:33:58.489,0:34:05.460 so this output format stores both the 0:34:01.859,0:34:06.869 salt and the actual output ciphertext so 0:34:05.460,0:34:13.319 can be used in the reconstruction and 0:34:06.869,0:34:14.819 decrypt yeah that's correct it doesn't 0:34:13.319,0:34:19.950 keep any database or anything this is 0:34:14.819,0:34:23.190 fully self-contained yeah and as John 0:34:19.950,0:34:24.869 says the salt is not the secret like the 0:34:23.190,0:34:33.569 the passphrase is what is the secret 0:34:24.869,0:34:36.839 thing here okay so let's go back to so 0:34:33.569,0:34:39.809 the so the question is what is salt the 0:34:36.839,0:34:42.089 idea of a cryptographic salt is probably 0:34:39.809,0:34:47.790 best explained in the context of hash 0:34:42.089,0:34:49.559 functions so one common application of 0:34:47.790,0:34:51.299 hash functions is to store passwords in 0:34:49.559,0:34:53.669 a password database if I have a website 0:34:51.299,0:34:55.290 and I have logins for users like people 0:34:53.669,0:34:57.329 log in with their username and password 0:34:55.290,0:34:59.640 I don't actually want to store people's 0:34:57.329,0:35:01.349 passwords in plain text just like as is 0:34:59.640,0:35:06.440 does anybody know why I wouldn't want to 0:35:01.349,0:35:08.280 do that yes 0:35:06.440,0:35:10.350 exactly what if there was a breach and 0:35:08.280,0:35:12.480 someone got all your data so it's really 0:35:10.350,0:35:13.860 bad if you leak all your users passwords 0:35:12.480,0:35:15.150 it's especially bad because a lot of 0:35:13.860,0:35:17.130 people reuse their passwords across 0:35:15.150,0:35:18.540 different sites so you'll see attackers 0:35:17.130,0:35:20.250 break into one thing like there was big 0:35:18.540,0:35:22.020 yahoo breach a while ago and they find 0:35:20.250,0:35:23.730 all these usernames and passwords and 0:35:22.020,0:35:25.560 then they go and try those same login 0:35:23.730,0:35:27.270 credentials on Google and on Facebook 0:35:25.560,0:35:30.030 and on YouTube and whatnot these people 0:35:27.270,0:35:32.640 reuse passwords so it's bad to store 0:35:30.030,0:35:33.750 plaintext passwords so one thing you 0:35:32.640,0:35:35.400 should do is you should store hashed 0:35:33.750,0:35:37.620 passwords with a hash function or 0:35:35.400,0:35:39.260 ideally a password hashing function 0:35:37.620,0:35:42.360 that's intentionally designed to be slow 0:35:39.260,0:35:44.160 but one thing attackers one thing 0:35:42.360,0:35:45.390 attacker started doing once they realize 0:35:44.160,0:35:47.310 that people started storing hashed 0:35:45.390,0:35:49.170 passwords is they built these things 0:35:47.310,0:35:52.560 called rainbow tables what people did 0:35:49.170,0:35:54.570 was they took a way of generating big 0:35:52.560,0:35:56.700 password lists like the kind of model 0:35:54.570,0:35:58.200 what passwords might look like say take 0:35:56.700,0:36:00.510 all the dictionary words take all 0:35:58.200,0:36:01.770 strings of like length from zero to 0:36:00.510,0:36:03.840 eight and whatnot 0:36:01.770,0:36:06.180 take all of those and then hash them and 0:36:03.840,0:36:08.910 produce a big database mapping hashes 0:36:06.180,0:36:10.530 back to their pre image and so given the 0:36:08.910,0:36:12.030 output of a hash function rather than 0:36:10.530,0:36:13.620 have to like brute force said on the fly 0:36:12.030,0:36:15.210 you can just go look up in this database 0:36:13.620,0:36:16.830 Oh what is the input that corresponds to 0:36:15.210,0:36:19.410 this output and people have built these 0:36:16.830,0:36:22.650 for reasonably large password databases 0:36:19.410,0:36:25.080 and so one thing that you can do in 0:36:22.650,0:36:28.500 reaction to that as a defense is rather 0:36:25.080,0:36:31.350 than storing in your database as your to 0:36:28.500,0:36:34.860 write it rather than storing just the 0:36:31.350,0:36:42.080 hash of the password what you do is you 0:36:34.860,0:36:44.640 compute what's called a salt value and 0:36:42.080,0:36:47.010 what this is is a large random string 0:36:44.640,0:36:50.160 and then what you do is you store in 0:36:47.010,0:36:53.100 your password database the salt which is 0:36:50.160,0:36:54.900 not really a secret like you can store 0:36:53.100,0:36:59.730 this in your password database along 0:36:54.900,0:37:04.980 with a hash of the password with the 0:36:59.730,0:37:08.190 salt appended to it why is this useful 0:37:04.980,0:37:10.320 well this salt is a random unique value 0:37:08.190,0:37:12.060 for every user and so if someone has the 0:37:10.320,0:37:14.640 password safe password one two three on 0:37:12.060,0:37:16.350 one web service if you are just storing 0:37:14.640,0:37:17.970 the hash of the password then the hash 0:37:16.350,0:37:18.900 would be the same on both Web Services 0:37:17.970,0:37:20.970 right because this hash 0:37:18.900,0:37:22.619 function is a deterministic function but 0:37:20.970,0:37:26.369 now since we're using this randomized 0:37:22.619,0:37:28.470 salt value we store the hash of the 0:37:26.369,0:37:29.700 password plus of salt and so even if 0:37:28.470,0:37:32.640 someone's using the same password on 0:37:29.700,0:37:34.589 multiple sites this thing looks 0:37:32.640,0:37:37.140 different in both cases and it makes it 0:37:34.589,0:37:40.770 so these big databases mapping these 0:37:37.140,0:37:42.210 short passwords or hash outputs to the 0:37:40.770,0:37:44.609 short passwords that they came from 0:37:42.210,0:37:47.099 those are no longer useful when you have 0:37:44.609,0:37:48.900 salted passwords you kind of need to do 0:37:47.099,0:37:51.150 the brute-force attack for every user 0:37:48.900,0:37:52.079 once you find their salt value rather 0:37:51.150,0:37:54.210 than being able to use this big 0:37:52.079,0:37:58.829 precomputed database does that answer 0:37:54.210,0:38:00.450 the question of what a salt is and so 0:37:58.829,0:38:02.960 that's what that salt argument is 0:38:00.450,0:38:02.960 related to 0:38:05.580,0:38:12.800 let's see so any questions about 0:38:08.430,0:38:16.740 anything we talked about so far great 0:38:12.800,0:38:20.040 okay so the I'm gonna go ahead and erase 0:38:16.740,0:38:22.230 this and then the last topic we'll talk 0:38:20.040,0:38:23.640 about is one of the most exciting 0:38:22.230,0:38:24.600 developments of cryptography happen 0:38:23.640,0:38:26.580 quite a while ago but it's still a 0:38:24.600,0:38:42.030 really cool concept something called a 0:38:26.580,0:38:43.680 symmetric key cryptography and so this 0:38:42.030,0:38:45.720 is an idea that actually enables a lot 0:38:43.680,0:38:48.510 of the security and privacy related 0:38:45.720,0:38:50.160 features of basically anything you use 0:38:48.510,0:38:53.430 today like we need to go and type in 0:38:50.160,0:38:56.880 like www.google.com/mapmaker flog Rafi 0:38:53.430,0:38:58.290 is used as part of what goes on there so 0:38:56.880,0:38:59.700 this is going to look pretty similar to 0:38:58.290,0:39:04.830 what we talked about in symmetric key 0:38:59.700,0:39:06.120 cryptography except with a twist there's 0:39:04.830,0:39:08.430 a key generation function which 0:39:06.120,0:39:10.530 similarly is randomized but instead of 0:39:08.430,0:39:16.860 producing a single key it produces a 0:39:10.530,0:39:21.570 pair of keys two different things one of 0:39:16.860,0:39:25.260 which is referred to as a public key and 0:39:21.570,0:39:27.750 the other is referred to as a private 0:39:25.260,0:39:29.550 key and then these can be used for 0:39:27.750,0:39:31.650 encryption and decryption in a manner 0:39:29.550,0:39:33.270 kind of similar to symmetric key crypto 0:39:31.650,0:39:35.340 except these different keys have 0:39:33.270,0:39:39.150 different uses now so we have an 0:39:35.340,0:39:41.340 encryption function which takes in a 0:39:39.150,0:39:46.830 plaintext Isles right P here and it 0:39:41.340,0:39:49.110 takes in the public key and praises the 0:39:46.830,0:39:53.250 ciphertext and then I have a decryption 0:39:49.110,0:39:59.190 function which takes in my ciphertext 0:39:53.250,0:40:05.790 and the private key and gives me back my 0:39:59.190,0:40:08.520 plaintext and then similarly to those 0:40:05.790,0:40:11.010 two properties we had over there given 0:40:08.520,0:40:14.070 just the ciphertext we can't figure out 0:40:11.010,0:40:15.720 the plaintext unless we have the private 0:40:14.070,0:40:17.460 key and then we have the obvious 0:40:15.720,0:40:18.809 correctness property that if we encrypt 0:40:17.460,0:40:20.789 something with the private key 0:40:18.809,0:40:23.219 sorry encrypt something with the public 0:40:20.789,0:40:25.380 key and then take that cypher text and 0:40:23.219,0:40:26.819 try decrypting it with the corresponding 0:40:25.380,0:40:28.619 private key that came from this key 0:40:26.819,0:40:30.809 generation process that outputs these 0:40:28.619,0:40:36.059 two different things at once then I get 0:40:30.809,0:40:37.890 the same result back so this is very 0:40:36.059,0:40:39.029 similar to what's above but there's a 0:40:37.890,0:40:40.769 twist that we have these two different 0:40:39.029,0:40:42.839 keys that have different functions and 0:40:40.769,0:40:44.999 it's really neat that this public key 0:40:42.839,0:40:47.309 can actually be made as the name 0:40:44.999,0:40:49.229 indicates public like I could be using a 0:40:47.309,0:40:51.119 crypto system that works like this post 0:40:49.229,0:40:53.279 a public key on the internet for anybody 0:40:51.119,0:40:54.869 to see but keep my private key secret 0:40:53.279,0:40:56.400 and then I have this interesting 0:40:54.869,0:40:58.259 property that anybody on the internet 0:40:56.400,0:41:00.779 can take any piece of content and 0:40:58.259,0:41:02.400 encrypt it for me using my public key 0:41:00.779,0:41:04.409 and send it over the Internet 0:41:02.400,0:41:06.329 to me and then I can decrypt it using my 0:41:04.409,0:41:08.400 private key and as long as my private 0:41:06.329,0:41:10.409 key C's stays secret it doesn't matter 0:41:08.400,0:41:11.759 if my public key is available to anybody 0:41:10.409,0:41:15.089 on the Internet so here's where the 0:41:11.759,0:41:18.569 asymmetry comes from before we were in a 0:41:15.089,0:41:20.549 scenario where like suppose I was on the 0:41:18.569,0:41:21.150 internet but you weren't like talking to 0:41:20.549,0:41:22.890 me face-to-face 0:41:21.150,0:41:25.349 and you wanted to send me some data over 0:41:22.890,0:41:26.609 the Internet over some unencrypted 0:41:25.349,0:41:28.140 Channel where anybody could listen on 0:41:26.609,0:41:30.150 what you were saying and you wanted to 0:41:28.140,0:41:32.009 use symmetric key cryptography well we 0:41:30.150,0:41:33.479 need some way of exchanging a key in 0:41:32.009,0:41:34.769 advance so that you could encrypt some 0:41:33.479,0:41:36.659 plaintext with a key and give me that 0:41:34.769,0:41:38.549 ciphertext over the Internet so that I 0:41:36.659,0:41:41.279 could done decrypt it with that key in 0:41:38.549,0:41:42.839 symmetric key crypto if the keys public 0:41:41.279,0:41:45.299 it's game over like anybody can decrypt 0:41:42.839,0:41:47.339 your stuff whereas an asymmetric key 0:41:45.299,0:41:49.019 cryptography I could take my public key 0:41:47.339,0:41:50.909 and like post it on a bulletin board on 0:41:49.019,0:41:52.650 the Internet and you can go look at that 0:41:50.909,0:41:54.749 take some contents and encrypt them for 0:41:52.650,0:41:55.799 me and then send them over and that 0:41:54.749,0:41:57.269 would be totally fine 0:41:55.799,0:41:59.969 you can only decrypt it with the private 0:41:57.269,0:42:02.249 key and so one analogy that may be 0:41:59.969,0:42:05.609 helpful is comparing these mathematical 0:42:02.249,0:42:07.439 ideas to physical locks so you probably 0:42:05.609,0:42:09.329 have a lock on your door to your house 0:42:07.439,0:42:11.189 and you can put in a key and like turn 0:42:09.329,0:42:12.479 the thing in order to lock the door or 0:42:11.189,0:42:14.339 you can turn it the other way to unlock 0:42:12.479,0:42:15.929 the door so there's a single key and it 0:42:14.339,0:42:17.459 can both lock and unlock the door 0:42:15.929,0:42:19.259 but now consider this alternative 0:42:17.459,0:42:20.640 construction which you might use if say 0:42:19.259,0:42:23.429 I want you to be able to send me a 0:42:20.640,0:42:25.019 message and have it be sent over the 0:42:23.429,0:42:27.209 internet and you I don't really need a 0:42:25.019,0:42:29.549 way to exchange a key with you 0:42:27.209,0:42:30.719 beforehand I could get a box which you 0:42:29.549,0:42:32.350 could put a letter inside and you can 0:42:30.719,0:42:36.010 close the box and I can get one of the 0:42:32.350,0:42:37.540 padlock things which I can give you by I 0:42:36.010,0:42:39.820 could like take those padlock and open 0:42:37.540,0:42:41.530 it and give it to you and you at your 0:42:39.820,0:42:43.360 own leisure could put your message 0:42:41.530,0:42:45.850 inside a box and take this padlock which 0:42:43.360,0:42:48.430 is open and shut it around the box and 0:42:45.850,0:42:50.350 then send it over to me and then I could 0:42:48.430,0:42:52.090 put in my key and unlock it so do you 0:42:50.350,0:42:54.310 see how there is this asymmetry there as 0:42:52.090,0:42:56.050 opposed to the key that I used to open 0:42:54.310,0:42:57.760 the door to my house where the same key 0:42:56.050,0:42:59.920 opens and closes the thing instead I 0:42:57.760,0:43:01.750 give you this open padlock that you have 0:42:59.920,0:43:03.880 the ability to close but not open and 0:43:01.750,0:43:05.590 after you closed it I can use my key 0:43:03.880,0:43:07.120 which I've kept secret in order to open 0:43:05.590,0:43:09.100 the thing and retrieve what's inside 0:43:07.120,0:43:10.930 maybe this analogy is helpful maybe it's 0:43:09.100,0:43:13.720 not the mathematical construction works 0:43:10.930,0:43:17.350 just fine if that works for a year so 0:43:13.720,0:43:18.790 any questions about a symmetric key 0:43:17.350,0:43:21.190 encryption and decryption and how it 0:43:18.790,0:43:25.900 relates to symmetric key crypto how it's 0:43:21.190,0:43:27.940 a little bit different so before we talk 0:43:25.900,0:43:30.370 about applications of this idea I'm 0:43:27.940,0:43:33.580 going to talk about one other set of 0:43:30.370,0:43:36.160 concepts in a symmetric key cryptography 0:43:33.580,0:43:37.870 so these crypto systems give you another 0:43:36.160,0:43:39.940 set of tools which are related to 0:43:37.870,0:43:42.370 encryption and decryption something 0:43:39.940,0:43:44.350 called signing and verifying and this is 0:43:42.370,0:43:46.210 kind of similar to the real world like I 0:43:44.350,0:43:48.370 can get a document and sign it with my 0:43:46.210,0:43:50.260 signature except real world signatures 0:43:48.370,0:43:52.210 are I don't think that hard to forge 0:43:50.260,0:43:56.260 these are pretty hard to forge and 0:43:52.210,0:43:57.940 therefore more useful what does 0:43:56.260,0:44:00.600 signature schemes look like there's a 0:43:57.940,0:44:08.380 function sign that takes us some message 0:44:00.600,0:44:09.910 and the private key so notice this this 0:44:08.380,0:44:14.620 is the private key not the public key 0:44:09.910,0:44:18.370 and it produces a signature and then 0:44:14.620,0:44:23.640 there's another function verify which 0:44:18.370,0:44:27.540 takes in the message the signature and 0:44:23.640,0:44:27.540 the public key this time 0:44:30.430,0:44:35.750 and it tells me it returns a boolean 0:44:33.890,0:44:40.610 whether or not the signature checks out 0:44:35.750,0:44:43.640 and then this pair of functions has the 0:44:40.610,0:44:45.080 property that again these are kind of 0:44:43.640,0:44:49.070 properties that follow the intuition 0:44:45.080,0:44:54.170 that come from physical signatures that 0:44:49.070,0:44:56.990 uh without the private key it's hard to 0:44:54.170,0:44:58.850 produce a signature for any message such 0:44:56.990,0:45:00.380 that you can give the message in the 0:44:58.850,0:45:02.360 signature and the public key to the 0:45:00.380,0:45:07.540 verify function to get it to return true 0:45:02.360,0:45:07.540 like at a high level it's hard to Forge 0:45:09.520,0:45:20.720 it's hard to forge a signature of course 0:45:11.930,0:45:24.200 without the private key and then there's 0:45:20.720,0:45:25.610 the obvious correctness property that if 0:45:24.200,0:45:26.690 you signed a thing with a public key and 0:45:25.610,0:45:28.670 then try verifying it with the 0:45:26.690,0:45:30.170 corresponding sorry if you sign a thing 0:45:28.670,0:45:31.520 with the private key and try to verify 0:45:30.170,0:45:34.010 it with the corresponding public key 0:45:31.520,0:45:38.300 it returns okay that this verification 0:45:34.010,0:45:41.080 checks out so these are two different 0:45:38.300,0:45:44.030 kinds of things you can do with 0:45:41.080,0:45:46.280 asymmetric key cryptosystems 0:45:44.030,0:45:47.510 an example of an asymmetric key 0:45:46.280,0:45:50.360 cryptosystem that you might have heard 0:45:47.510,0:45:51.800 of is something called RSA so RSA is 0:45:50.360,0:45:53.180 designed by a number of people one of 0:45:51.800,0:45:59.330 whom is ron rivest who's a professor 0:45:53.180,0:46:01.730 here so there are couple of interesting 0:45:59.330,0:46:03.170 applications of asymmetric key crypto 0:46:01.730,0:46:04.580 actually like tons and tons and tons of 0:46:03.170,0:46:06.800 you spend like days talking about this 0:46:04.580,0:46:08.540 but a couple examples are email 0:46:06.800,0:46:10.400 encryption so we talked a little bit 0:46:08.540,0:46:12.410 about sending messages what we can do 0:46:10.400,0:46:14.360 with asymmetric key crypto is that you 0:46:12.410,0:46:16.670 can have public keys posted online I 0:46:14.360,0:46:18.530 think some of the instructors have PGP 0:46:16.670,0:46:19.850 public keys on their website so for 0:46:18.530,0:46:21.740 example you go to my website or John's 0:46:19.850,0:46:24.740 website you'll find a public key and 0:46:21.740,0:46:27.410 then what you can do is you can send us 0:46:24.740,0:46:29.060 an encrypted email and so even if that 0:46:27.410,0:46:30.260 message goes through Gmail or whatever 0:46:29.060,0:46:31.970 other mail service throughout my T's 0:46:30.260,0:46:34.040 mail servers if there happens to be an 0:46:31.970,0:46:35.540 attacker snooping on the messages they 0:46:34.040,0:46:38.000 can't make any sense of their contents 0:46:35.540,0:46:39.650 because they're all encrypted and this 0:46:38.000,0:46:42.440 is really cool because you can do this 0:46:39.650,0:46:43.109 without kind of finding us in person and 0:46:42.440,0:46:44.099 exchanging 0:46:43.109,0:46:45.989 which you might have to do in a 0:46:44.099,0:46:47.609 symmetric key cryptosystem you can just 0:46:45.989,0:46:49.289 find our public key which can be posted 0:46:47.609,0:46:52.229 online without causing any issues and 0:46:49.289,0:46:53.759 then send us encrypted email another 0:46:52.229,0:46:56.160 thing asymmetric key crypto is used for 0:46:53.759,0:46:58.019 is private messaging so raise your hand 0:46:56.160,0:47:00.509 if you've used anything like signal or 0:46:58.019,0:47:01.950 telegram or I think what's up is in 0:47:00.509,0:47:05.039 theory antenna encrypted so a good 0:47:01.950,0:47:07.380 number of you these private messaging 0:47:05.039,0:47:09.479 applications also use asymmetric key 0:47:07.380,0:47:11.819 crypto to establish private 0:47:09.479,0:47:14.309 communication channels basically every 0:47:11.819,0:47:16.499 person has associated with them a key 0:47:14.309,0:47:18.390 pair and so your device has run this key 0:47:16.499,0:47:20.400 generation function produced a public 0:47:18.390,0:47:22.049 key and a private key and automatically 0:47:20.400,0:47:23.670 posted your public key to the internet 0:47:22.049,0:47:25.529 so for example if you're using signal 0:47:23.670,0:47:27.029 your public key is on the signal servers 0:47:25.529,0:47:30.029 and then when someone wants to contact 0:47:27.029,0:47:31.709 you their phone can look up your public 0:47:30.029,0:47:33.150 key retreat and once it's retrieve your 0:47:31.709,0:47:35.279 public key they can encrypt information 0:47:33.150,0:47:36.599 for you this is a kind of approximation 0:47:35.279,0:47:38.509 of how their algorithm works but at a 0:47:36.599,0:47:40.979 high level that's what's going on 0:47:38.509,0:47:43.499 another neat application of asymmetric 0:47:40.979,0:47:44.880 key crypto is we were talking about 0:47:43.499,0:47:46.079 earlier like making sure you have the 0:47:44.880,0:47:46.519 right software we downloaded from the 0:47:46.079,0:47:48.599 internet 0:47:46.519,0:47:50.609 asymmetric key crypto can be used to 0:47:48.599,0:47:52.430 sign software releases and this is 0:47:50.609,0:47:55.199 something that people do for example 0:47:52.430,0:47:56.489 like Debian packages or whatever things 0:47:55.199,0:47:57.959 you download from the internet the 0:47:56.489,0:47:59.670 developer will try to sign their 0:47:57.959,0:48:00.630 software so that you can make sure that 0:47:59.670,0:48:01.769 whatever you've downloaded from the 0:48:00.630,0:48:04.799 internet is actually the right thing 0:48:01.769,0:48:06.660 that came from the right person we 0:48:04.799,0:48:07.920 talked about in the get lecture all the 0:48:06.660,0:48:10.440 interesting things you can do with git 0:48:07.920,0:48:15.239 one thing we didn't cover was signing 0:48:10.440,0:48:17.670 related functionality and get so git has 0:48:15.239,0:48:19.799 commits and you can associate with 0:48:17.670,0:48:21.150 commits something called tags at a high 0:48:19.799,0:48:22.949 level you can basically take a git 0:48:21.150,0:48:26.160 commit and attach a signature to it 0:48:22.949,0:48:28.589 which binds your public key to this 0:48:26.160,0:48:31.170 commit and then anybody who has your 0:48:28.589,0:48:32.789 public key can take the commit and your 0:48:31.170,0:48:35.519 public key and make sure that there's a 0:48:32.789,0:48:40.920 legitimate signature on the key or sorry 0:48:35.519,0:48:44.670 on the commit so let me go to like some 0:48:40.920,0:48:46.650 random repository that I have I can look 0:48:44.670,0:48:51.959 at a bunch of tags associated with 0:48:46.650,0:48:55.279 repository if I do look at the raw data 0:48:51.959,0:48:55.279 associated with this tag 0:48:55.500,0:49:02.820 it has some metadata and then a blob of 0:48:59.400,0:49:05.700 like ascii encoded information that i 0:49:02.820,0:49:08.880 can use the get tagged - v4 verify 0:49:05.700,0:49:11.040 command to make sure that oh this is a 0:49:08.880,0:49:13.050 good signature from this person happens 0:49:11.040,0:49:14.220 to be me so I sign the software release 0:49:13.050,0:49:15.630 so that anybody who downloads it from 0:49:14.220,0:49:18.150 the Internet can make sure that they 0:49:15.630,0:49:31.680 actually got an authentic copy yes 0:49:18.150,0:49:33.599 question so the question is what exactly 0:49:31.680,0:49:38.460 is the verify function doing or what is 0:49:33.599,0:49:39.830 it checking against the like if you want 0:49:38.460,0:49:41.130 to know mathematically what's going on 0:49:39.830,0:49:43.619 talk to me 0:49:41.130,0:49:45.510 after this lecture but for from kind of 0:49:43.619,0:49:48.000 an API perspective what's going on here 0:49:45.510,0:49:49.800 is that the signature and also the 0:49:48.000,0:49:52.320 message here are just a blob of bytes 0:49:49.800,0:49:56.130 and it happens to be the case that these 0:49:52.320,0:49:58.560 things are designed such that basically 0:49:56.130,0:50:00.660 if I take for some particular public key 0:49:58.560,0:50:02.910 like if you take my public key it's 0:50:00.660,0:50:06.450 impossible for you without knowledge of 0:50:02.910,0:50:07.950 my private key for any message to find a 0:50:06.450,0:50:10.800 second argument to this function that 0:50:07.950,0:50:12.990 makes it return true you can kind of 0:50:10.800,0:50:14.460 compare it to signing a document like 0:50:12.990,0:50:16.890 you don't know how to forge my signature 0:50:14.460,0:50:19.140 I can take any piece of paper and sign 0:50:16.890,0:50:20.970 it and then anybody who knows what my 0:50:19.140,0:50:22.200 signature looks like I can show my 0:50:20.970,0:50:24.690 document - you can be like yeah that 0:50:22.200,0:50:27.150 checks out but nobody without the 0:50:24.690,0:50:30.060 private key can produce a signature that 0:50:27.150,0:50:35.339 will make this function return true for 0:50:30.060,0:50:36.420 any particular message and any related 0:50:35.339,0:50:39.470 questions started you want me to explain 0:50:36.420,0:50:39.470 any other way or does that make sense 0:50:50.180,0:50:54.119 so any questions about signing software 0:50:52.680,0:50:55.920 or any of the other handful of 0:50:54.119,0:51:01.050 applications talked about of asymmetric 0:50:55.920,0:51:02.460 key crypto well so one final thing I 0:51:01.050,0:51:05.100 want to talk about we're almost out of 0:51:02.460,0:51:07.200 time is key distribution this is a kind 0:51:05.100,0:51:08.130 of interesting side effect of asymmetric 0:51:07.200,0:51:09.330 key cryptography 0:51:08.130,0:51:11.730 it enables a bunch of interesting 0:51:09.330,0:51:12.960 functionality like I can post my public 0:51:11.730,0:51:14.820 key on the internet you can go find it 0:51:12.960,0:51:16.170 and send me encrypted email but how do 0:51:14.820,0:51:18.060 you know that the public key found is 0:51:16.170,0:51:19.410 actually my public key it seems like 0:51:18.060,0:51:22.710 there's a bootstrapping problem here 0:51:19.410,0:51:24.810 right so there are a couple this is like 0:51:22.710,0:51:27.930 a really interesting and really hard 0:51:24.810,0:51:29.400 real-world problem and there are a 0:51:27.930,0:51:31.800 couple different approaches you might 0:51:29.400,0:51:33.780 take to this problem one is kind of a 0:51:31.800,0:51:35.000 lame solution but this thing solves a 0:51:33.780,0:51:37.050 lot of cryptography problems this 0:51:35.000,0:51:39.630 exchange the information out-of-band 0:51:37.050,0:51:41.430 what that means is you want to send me 0:51:39.630,0:51:43.710 encrypted email we'll just talk to me 0:51:41.430,0:51:45.210 after class I'll give you my public key 0:51:43.710,0:51:46.560 on a piece of paper and since you were 0:51:45.210,0:51:48.240 talking to me in person like you know 0:51:46.560,0:51:49.770 that it's actually my public key not 0:51:48.240,0:51:51.930 just somebody like hacked my website and 0:51:49.770,0:51:53.160 stuck some random number on there that 0:51:51.930,0:51:54.420 solves the problem but it's not the most 0:51:53.160,0:51:55.740 elegant there are a couple other 0:51:54.420,0:51:58.650 approaches that different applications 0:51:55.740,0:52:01.410 use so those of you who use signal have 0:51:58.650,0:52:02.970 you ever encountered the phrase safety 0:52:01.410,0:52:06.810 number like verify your safety number 0:52:02.970,0:52:09.180 with so and so so with signal they have 0:52:06.810,0:52:10.950 a way of exchanging public keys which is 0:52:09.180,0:52:12.840 through the signal servers whoever runs 0:52:10.950,0:52:14.280 the signal service just maintains on 0:52:12.840,0:52:16.170 their servers basically a mapping from 0:52:14.280,0:52:17.670 phone numbers to public keys and when I 0:52:16.170,0:52:19.109 say oh I want to message this person 0:52:17.670,0:52:20.190 with this number my phone just goes and 0:52:19.109,0:52:21.930 retrieves their public key from the 0:52:20.190,0:52:24.060 internet and then encrypts the message 0:52:21.930,0:52:27.080 for that public key now does anybody see 0:52:24.060,0:52:27.080 a problem with the setup 0:52:29.750,0:52:38.880 yeah yeah exactly the signal servers our 0:52:36.900,0:52:40.830 point of failure there because if the 0:52:38.880,0:52:42.900 signal servers give me the wrong public 0:52:40.830,0:52:44.610 key like supposed signal just produces a 0:52:42.900,0:52:46.200 new key pair and give me their public 0:52:44.610,0:52:47.940 key now they can read all my messages 0:52:46.200,0:52:50.220 and they could even sit in between and 0:52:47.940,0:52:51.540 transparently decrypt the messages I 0:52:50.220,0:52:52.950 send them and then re encrypt them and 0:52:51.540,0:52:55.170 send them on to their final destination 0:52:52.950,0:52:57.960 like basically I need some way of 0:52:55.170,0:52:59.850 authenticating the public key I get and 0:52:57.960,0:53:01.170 so signal has one solution to this which 0:52:59.850,0:53:04.350 is also just kind of punting the issue 0:53:01.170,0:53:05.220 to out-of-band key exchange you can meet 0:53:04.350,0:53:07.320 up with somebody and they have a 0:53:05.220,0:53:08.820 slightly streamline flow where they show 0:53:07.320,0:53:09.750 QR codes on the screen you take one 0:53:08.820,0:53:10.950 phone and take a picture of the other 0:53:09.750,0:53:12.840 phone screen and vice versa 0:53:10.950,0:53:14.580 and now you've exchanged public keys in 0:53:12.840,0:53:15.930 person and from that point on you've 0:53:14.580,0:53:19.350 bootstrap your encrypted end-to-end 0:53:15.930,0:53:22.230 communication it also has an issue of or 0:53:19.350,0:53:24.150 it also has approach of pinning a public 0:53:22.230,0:53:25.980 key so once you know that a particular 0:53:24.150,0:53:27.750 phone number has a particular public key 0:53:25.980,0:53:30.360 your phone remembers that and if that 0:53:27.750,0:53:32.340 ever changes it'll complain to you and 0:53:30.360,0:53:34.950 then there are a couple other solutions 0:53:32.340,0:53:36.750 to this problem PGP one pop to let used 0:53:34.950,0:53:38.460 to be popular a while ago has this idea 0:53:36.750,0:53:40.350 of web of trust so like I trust people 0:53:38.460,0:53:41.940 who my friends trust so if like John has 0:53:40.350,0:53:43.740 done an out-of-band exchange with say my 0:53:41.940,0:53:45.660 professor then I can probably email my 0:53:43.740,0:53:47.460 professor because like I know that John 0:53:45.660,0:53:48.780 trusts my professor and I trust John so 0:53:47.460,0:53:50.310 you got this chain of trust through 0:53:48.780,0:53:52.380 there that's one interesting approach 0:53:50.310,0:53:53.910 and then another model that's called 0:53:52.380,0:53:55.770 pretty recently as something that a tool 0:53:53.910,0:54:00.350 called key base uses this is a really 0:53:55.770,0:54:00.350 neat whoops 0:54:00.500,0:54:04.950 there's website called key based IO and 0:54:03.750,0:54:06.780 they have a really interesting solution 0:54:04.950,0:54:09.420 to this bootstrapping problem which is 0:54:06.780,0:54:10.590 social proof so saying you probably have 0:54:09.420,0:54:13.500 your friends on Facebook and on Twitter 0:54:10.590,0:54:15.600 and whatnot and it's probably pretty 0:54:13.500,0:54:17.070 hard for an attacker to break into your 0:54:15.600,0:54:18.390 friend's Facebook account at the same 0:54:17.070,0:54:19.680 time as their Twitter account as the 0:54:18.390,0:54:21.000 same time as their hacker news account 0:54:19.680,0:54:23.430 and so on and so there's this 0:54:21.000,0:54:25.860 interesting way of binding public keys 0:54:23.430,0:54:27.780 to a set of social identities such that 0:54:25.860,0:54:30.240 you can retrieve a public key once you 0:54:27.780,0:54:32.610 trust some number of social identities 0:54:30.240,0:54:33.960 corresponding to your friend we have 0:54:32.610,0:54:34.980 links to these in the lecture notes if 0:54:33.960,0:54:38.400 you want to see these things in more 0:54:34.980,0:54:41.160 detail so that's it for our security and 0:54:38.400,0:54:42.690 cryptography lecture and tomorrow's 0:54:41.160,0:54:43.230 lecture will be on a random collection 0:54:42.690,0:54:45.119 of top 0:54:43.230,0:54:48.650 that your instructors find interesting 0:54:45.119,0:54:48.650 so hopefully see you in lecture tomorrow 0:54:51.260,0:54:54.180 I'll also be here for a couple of 0:54:53.040,0:55:08.910 minutes after class if anybody has 0:54:54.180,0:55:09.869 questions yes okay so John feel free to 0:55:08.910,0:55:11.070 leave if you have to leave but I think 0:55:09.869,0:55:11.970 nobody's using the classroom after us 0:55:11.070,0:55:15.150 I'm going to talk about one other 0:55:11.970,0:55:16.770 interesting topic so john brought up the 0:55:15.150,0:55:19.320 fact that a symmetric key cryptography 0:55:16.770,0:55:23.130 is slow and symmetric key cryptography 0:55:19.320,0:55:25.680 is fast and so in practice you don't 0:55:23.130,0:55:28.080 really use just a symmetric key 0:55:25.680,0:55:31.440 cryptography by itself it's usually used 0:55:28.080,0:55:38.280 to bootstrap a more sophisticated 0:55:31.440,0:55:40.710 protocol that you're using one thing you 0:55:38.280,0:55:41.430 might want to do is use a symmetric key 0:55:40.710,0:55:43.890 cryptography 0:55:41.430,0:55:45.869 for signing encrypted email right we 0:55:43.890,0:55:47.730 talked about that example and the way 0:55:45.869,0:55:49.260 that works isn't what you might have 0:55:47.730,0:55:50.910 guessed from our straightforward 0:55:49.260,0:55:52.920 explanation of asymmetric key crypto 0:55:50.910,0:55:54.740 like you don't just use that encrypt 0:55:52.920,0:55:57.630 function up there and call it a day in 0:55:54.740,0:56:05.400 practice what you do is you use hybrid 0:55:57.630,0:56:07.410 encryption to use a combination of 0:56:05.400,0:56:09.109 symmetric key and asymmetric key 0:56:07.410,0:56:12.060 cryptography 0:56:09.109,0:56:14.190 what you do is here I'll draw this as a 0:56:12.060,0:56:21.270 big block diagram you take your message 0:56:14.190,0:56:23.490 m and then I have my public key that I 0:56:21.270,0:56:25.080 want to encrypt for but rather than just 0:56:23.490,0:56:27.119 take these two things and pass it 0:56:25.080,0:56:36.240 through the encryption up there what I 0:56:27.119,0:56:41.820 do is I use the symmetric key gen 0:56:36.240,0:56:43.380 function to produce a symmetric key okay 0:56:41.820,0:56:44.700 I'm gonna like prepend this with 0:56:43.380,0:56:46.380 symmetric so we can distinguish it from 0:56:44.700,0:56:49.410 the public key key generation function 0:56:46.380,0:56:52.050 and then what I do is I take these two 0:56:49.410,0:56:55.250 things pass them through my symmetric 0:56:52.050,0:56:55.250 encryption box 0:57:02.250,0:57:13.750 this produces the ciphertext and now 0:57:09.400,0:57:15.640 this by itself to the sender sorry this 0:57:13.750,0:57:17.530 by itself to the receiver who has the 0:57:15.640,0:57:19.900 private key corresponding to this public 0:57:17.530,0:57:20.890 key here this is not really useful right 0:57:19.900,0:57:26.020 because this is encrypted with a 0:57:20.890,0:57:28.690 symmetric cipher with this key K that 0:57:26.020,0:57:30.670 came from this function that I ran on my 0:57:28.690,0:57:32.830 local machine so I need some way of 0:57:30.670,0:57:34.660 getting this to the person who actually 0:57:32.830,0:57:37.570 used to decrypt the email and so what I 0:57:34.660,0:57:39.610 do is I take this thing and now this 0:57:37.570,0:57:40.840 email might have been big and I use 0:57:39.610,0:57:42.700 symmetric encryption with that because 0:57:40.840,0:57:44.980 symmetric encryption is fast but this 0:57:42.700,0:57:46.600 key is small like it might be 256 bits 0:57:44.980,0:57:49.000 or something so I can take this thing 0:57:46.600,0:58:05.620 and encrypt it with a symmetric 0:57:49.000,0:58:09.640 encryption using the public key and this 0:58:05.620,0:58:12.550 gives me an encrypted key and this thing 0:58:09.640,0:58:14.520 can be decrypted using the private key 0:58:12.550,0:58:18.250 corresponding to that public key to 0:58:14.520,0:58:21.520 reconstruct this so this is on the 0:58:18.250,0:58:23.590 sender's end now the receiver gets this 0:58:21.520,0:58:24.760 and this and kind of does these things 0:58:23.590,0:58:27.520 backwards so you start with the 0:58:24.760,0:58:29.890 encrypted key and use asymmetric 0:58:27.520,0:58:31.330 decryption using your public using your 0:58:29.890,0:58:34.180 private key that corresponds to the 0:58:31.330,0:58:35.350 posted public key to reconstruct this 0:58:34.180,0:58:37.570 key that were used for the symmetric 0:58:35.350,0:58:39.760 encryption box and then use symmetric 0:58:37.570,0:58:41.560 key decryption using that key that was 0:58:39.760,0:58:45.760 reconstructed to take this ciphertext 0:58:41.560,0:58:47.590 and produce the original message so 0:58:45.760,0:58:49.630 there's a kind of interesting example of 0:58:47.590,0:58:56.910 how in practice symmetric and asymmetric 0:58:49.630,0:58:56.910 key cryptography is combined question 0:59:00.589,0:59:08.220 so the question is will you be using the 0:59:02.790,0:59:11.040 same symmetric key generators yes okay 0:59:08.220,0:59:13.650 so you need to kind of agree ahead of 0:59:11.040,0:59:16.280 time which box you're using here so you 0:59:13.650,0:59:21.359 might be like oh I'm going to use AES 0:59:16.280,0:59:24.510 256 GC up here but this is a well known 0:59:21.359,0:59:26.010 function and it's public like the 0:59:24.510,0:59:27.930 attackers allowed to know all the 0:59:26.010,0:59:29.190 parameters this function this is the 0:59:27.930,0:59:34.190 only secret thing that the attacker 0:59:29.190,0:59:40.650 doesn't know the key any other questions 0:59:34.190,0:59:42.000 yeah that's a really good question what 0:59:40.650,0:59:46.020 kind of data is important enough to 0:59:42.000,0:59:47.910 encrypt and I think that depends on your 0:59:46.020,0:59:49.349 threat model like who what kind of 0:59:47.910,0:59:53.099 attackers are you concerned about what 0:59:49.349,0:59:54.180 are you trying to protect against so you 0:59:53.099,0:59:56.070 might have the stance that you just 0:59:54.180,0:59:57.570 don't really care and that like anything 0:59:56.070,0:59:58.980 you communicate with anybody is allowed 0:59:57.570,1:00:00.720 to be public I might be willing to like 0:59:58.980,1:00:03.359 post all my conversation with everybody 1:00:00.720,1:00:05.760 for everybody to see publicly on the 1:00:03.359,1:00:08.130 Internet on the other hand maybe you're 1:00:05.760,1:00:10.560 doing some like security sensitive works 1:00:08.130,1:00:11.970 here working under a contract for the US 1:00:10.560,1:00:13.770 government developing some sensitive 1:00:11.970,1:00:15.060 military stuff if you're sending that 1:00:13.770,1:00:16.560 through the open Internet while you're 1:00:15.060,1:00:18.900 traveling you probably want to be pretty 1:00:16.560,1:00:20.579 darn sure that no eavesdroppers or 1:00:18.900,1:00:22.079 anybody else along the way can see what 1:00:20.579,1:00:23.099 you're sending and that whatever you're 1:00:22.079,1:00:24.990 sending is in fact going to the right 1:00:23.099,1:00:26.339 place and that whoever is receiving it 1:00:24.990,1:00:29.849 can authenticate that it in fact came 1:00:26.339,1:00:31.260 from you so you might be worried about 1:00:29.849,1:00:33.359 all different kinds of adversaries 1:00:31.260,1:00:34.440 depending on your scenario from random 1:00:33.359,1:00:36.810 script kiddies who are trying to break 1:00:34.440,1:00:38.490 into websites to nation state level 1:00:36.810,1:00:40.260 attackers and you'll need different 1:00:38.490,1:00:41.339 types of techniques for defending 1:00:40.260,1:00:47.900 against the different categories of 1:00:41.339,1:00:47.900 attackers any other questions 1:00:51.190,1:00:55.300 well so hopefully see some of you 1:00:53.620,1:00:57.310 tomorrow for a random collection of 1:00:55.300,1:00:59.490 things that John Jose and I find 1:00:57.310,1:00:59.490 interesting