WEBVTT 00:00:01.100 --> 00:00:05.310 okay so again let's get started with 00:00:03.330 --> 00:00:06.750 today's lecture so today we're going to 00:00:05.310 --> 00:00:09.240 be talking about security and 00:00:06.750 --> 00:00:10.410 cryptography and today's lecture is 00:00:09.240 --> 00:00:12.780 gonna be a little bit different than our 00:00:10.410 --> 00:00:14.849 treatment of this topic in last year's 00:00:12.780 --> 00:00:16.619 class so last year we focused a little 00:00:14.849 --> 00:00:19.350 bit more on security and privacy and 00:00:16.619 --> 00:00:21.720 from the perspective of a user of a 00:00:19.350 --> 00:00:22.949 computer but today we're going to focus 00:00:21.720 --> 00:00:24.689 a little bit more on security and 00:00:22.949 --> 00:00:26.730 cryptography concepts that are relevant 00:00:24.689 --> 00:00:29.400 in understanding some of the tools that 00:00:26.730 --> 00:00:30.810 we talked about earlier in this class so 00:00:29.400 --> 00:00:32.250 for example we talked about hash 00:00:30.810 --> 00:00:34.800 functions or cryptographic hash 00:00:32.250 --> 00:00:37.829 functions like sha-1 in the get lecture 00:00:34.800 --> 00:00:39.239 or we talked about public keys when we 00:00:37.829 --> 00:00:41.010 talked about SSH in the command line 00:00:39.239 --> 00:00:43.260 environment in a command line 00:00:41.010 --> 00:00:44.610 environment lecture and so today we'll 00:00:43.260 --> 00:00:46.320 talk about there's different 00:00:44.610 --> 00:00:47.879 cryptographic primitives in more detail 00:00:46.320 --> 00:00:49.350 and get an understanding of how they 00:00:47.879 --> 00:00:50.579 work and how they're used in these 00:00:49.350 --> 00:00:53.460 different tools that we're teaching in 00:00:50.579 --> 00:00:55.890 this class this lecture is not a 00:00:53.460 --> 00:00:58.199 substitute for a more rigorous class in 00:00:55.890 --> 00:00:59.640 security so they're they're a bunch of 00:00:58.199 --> 00:01:01.050 really good classes at MIT like six 00:00:59.640 --> 00:01:03.660 eight five eight which is on computer 00:01:01.050 --> 00:01:06.420 system security or six eight five seven 00:01:03.660 --> 00:01:08.850 and six eight seven five which are more 00:01:06.420 --> 00:01:10.950 focused on cryptography so don't do 00:01:08.850 --> 00:01:13.380 security work without formal training in 00:01:10.950 --> 00:01:16.380 security from these classes or elsewhere 00:01:13.380 --> 00:01:17.580 and unless you're an expert don't roll 00:01:16.380 --> 00:01:19.860 your own crypto don't build your own 00:01:17.580 --> 00:01:21.960 crypto implementations or protocols and 00:01:19.860 --> 00:01:23.909 the same principle applies to computer 00:01:21.960 --> 00:01:25.560 system security this lecture is not 00:01:23.909 --> 00:01:27.869 about building your own stuff it's about 00:01:25.560 --> 00:01:29.369 understanding what's already out there 00:01:27.869 --> 00:01:31.290 and so this lecture will have a very 00:01:29.369 --> 00:01:32.820 informal but we think practical 00:01:31.290 --> 00:01:35.400 treatment of these basic cryptography 00:01:32.820 --> 00:01:36.659 concepts and yeah hopefully it'll help 00:01:35.400 --> 00:01:39.540 you understand some of the tools we 00:01:36.659 --> 00:01:40.590 talked about earlier in this class any 00:01:39.540 --> 00:01:45.000 questions about the plan for today's 00:01:40.590 --> 00:01:46.560 lecture great so the first topic for 00:01:45.000 --> 00:01:48.780 today is something called entropy 00:01:46.560 --> 00:01:51.119 entropy is a measure of randomness and 00:01:48.780 --> 00:01:52.380 this is useful for example when trying 00:01:51.119 --> 00:01:55.229 to determine the strength of a password 00:01:52.380 --> 00:01:58.290 so let's take a look at this comic from 00:01:55.229 --> 00:02:00.299 xkcd we're a big fan of xkcd comics so 00:01:58.290 --> 00:02:02.490 this comic raise your hand if you've 00:02:00.299 --> 00:02:05.159 seen this before know a good number of 00:02:02.490 --> 00:02:07.530 you so this comic is complaining about 00:02:05.159 --> 00:02:09.270 this common pattern that's been taught 00:02:07.530 --> 00:02:11.879 to users of computers that when you 00:02:09.270 --> 00:02:12.940 design passwords they should be things 00:02:11.879 --> 00:02:16.750 like that T 00:02:12.940 --> 00:02:19.420 our zero ub40 orm purse and three string 00:02:16.750 --> 00:02:21.040 in the top-left like we should design 00:02:19.420 --> 00:02:22.450 passwords that are full of funny 00:02:21.040 --> 00:02:25.210 characters and things like that to make 00:02:22.450 --> 00:02:26.620 it hard for attackers to guess and yet 00:02:25.210 --> 00:02:28.690 turns out that passwords like that are 00:02:26.620 --> 00:02:30.370 actually pretty weak and guessable by 00:02:28.690 --> 00:02:32.530 computers that can guess postures really 00:02:30.370 --> 00:02:34.660 fast and brute-force attacks and on the 00:02:32.530 --> 00:02:37.510 other hand passwords which maybe 00:02:34.660 --> 00:02:39.130 intuitively don't look as secure like 00:02:37.510 --> 00:02:41.620 the one on the bottom left correct horse 00:02:39.130 --> 00:02:45.100 battery staple that one turns out to be 00:02:41.620 --> 00:02:47.650 way more secure so how do I actually 00:02:45.100 --> 00:02:49.810 quantify the security of these different 00:02:47.650 --> 00:02:51.940 passwords it's by measuring the amount 00:02:49.810 --> 00:02:55.230 of randomness in the password how many 00:02:51.940 --> 00:02:57.700 bits of randomness are in there and so 00:02:55.230 --> 00:02:59.170 entropy is measured in bits this is like 00:02:57.700 --> 00:03:07.330 the same bits from information theory 00:02:59.170 --> 00:03:09.310 and we're only going to talk about the 00:03:07.330 --> 00:03:11.110 simple case where you're trying to 00:03:09.310 --> 00:03:12.610 measure the amount of randomness when 00:03:11.110 --> 00:03:15.550 you're choosing from a set of things 00:03:12.610 --> 00:03:17.320 uniformly at random so for example when 00:03:15.550 --> 00:03:19.780 you're constructing a password that's in 00:03:17.320 --> 00:03:22.120 the format of four random words you're 00:03:19.780 --> 00:03:24.310 kind of considering all possible 00:03:22.120 --> 00:03:25.720 sequences of four random words made from 00:03:24.310 --> 00:03:26.739 some dictionary you have you might have 00:03:25.720 --> 00:03:28.390 a dictionary would say a hundred 00:03:26.739 --> 00:03:31.209 thousand words and you're selecting each 00:03:28.390 --> 00:03:32.380 word uniformly at random how many 00:03:31.209 --> 00:03:33.730 possibilities are there 00:03:32.380 --> 00:03:35.830 well you can go and figure that out in 00:03:33.730 --> 00:03:37.870 that example once you know how many 00:03:35.830 --> 00:03:40.690 possibilities there the measure of 00:03:37.870 --> 00:03:49.390 entropy is log base 2 of the number of 00:03:40.690 --> 00:03:51.459 possibilities and as that comic suggests 00:03:49.390 --> 00:03:52.900 this is related to how long it'll take 00:03:51.459 --> 00:03:54.970 an attacker to try to brute-force 00:03:52.900 --> 00:03:56.590 through your different passwords like if 00:03:54.970 --> 00:03:57.820 you have a thousand possibilities you're 00:03:56.590 --> 00:03:59.650 guessing passwords at a thousand 00:03:57.820 --> 00:04:04.300 passwords a second that's not a very 00:03:59.650 --> 00:04:07.840 good password so this is a couple of 00:04:04.300 --> 00:04:09.340 quick examples a coin flip has two 00:04:07.840 --> 00:04:15.280 possibilities and let's assume we have a 00:04:09.340 --> 00:04:19.650 fair coin and so a coin flip has log 00:04:15.280 --> 00:04:21.489 base 2 of 2 is one bit of entropy and 00:04:19.650 --> 00:04:24.880 another thing we might look at is 00:04:21.489 --> 00:04:26.540 something like a dice roll so there's 00:04:24.880 --> 00:04:32.510 six possibilities and log 00:04:26.540 --> 00:04:33.640 two of six is something like 2.6 bits of 00:04:32.510 --> 00:04:36.380 entropy 00:04:33.640 --> 00:04:40.100 so that's how we quantify the amount of 00:04:36.380 --> 00:04:41.930 randomness in something so now going 00:04:40.100 --> 00:04:43.400 back to that example in the xkcd comic 00:04:41.930 --> 00:04:44.690 when we want to figure out how much 00:04:43.400 --> 00:04:46.670 entropy is in a password we have to 00:04:44.690 --> 00:04:48.680 consider and if the model for how the 00:04:46.670 --> 00:04:51.110 password was generated for example in 00:04:48.680 --> 00:04:53.450 the top left you could consider okay we 00:04:51.110 --> 00:04:55.310 take one dictionary word make some 00:04:53.450 --> 00:04:57.650 substitutions of some of the characters 00:04:55.310 --> 00:04:59.750 with numbers that look similar to that 00:04:57.650 --> 00:05:01.430 character add one punctuation mark at 00:04:59.750 --> 00:05:04.340 the end and add one numeral after that 00:05:01.430 --> 00:05:05.630 and we can take that model and then use 00:05:04.340 --> 00:05:07.400 common rhetoric to figure out how many 00:05:05.630 --> 00:05:09.260 possibilities there are and from that we 00:05:07.400 --> 00:05:10.550 can derive how many bits of entropy are 00:05:09.260 --> 00:05:13.130 in that password so in that particular 00:05:10.550 --> 00:05:14.180 example I don't know exactly what model 00:05:13.130 --> 00:05:16.750 they were using for the password but 00:05:14.180 --> 00:05:19.280 they calculated their 28 bits of entropy 00:05:16.750 --> 00:05:22.250 whereas in the bottom-left example that 00:05:19.280 --> 00:05:23.960 correct horse battery staple they assume 00:05:22.250 --> 00:05:26.630 that you're working from a dictionary of 00:05:23.960 --> 00:05:28.850 about 2,000 words and so when you 00:05:26.630 --> 00:05:30.770 combine four of those words together you 00:05:28.850 --> 00:05:31.850 get about 44 bits of entropy from that 00:05:30.770 --> 00:05:34.670 so it's much more secure than the 00:05:31.850 --> 00:05:36.620 example before it so any questions about 00:05:34.670 --> 00:05:43.520 this definition of entropy or why it's 00:05:36.620 --> 00:05:45.620 useful and so when you're generating 00:05:43.520 --> 00:05:47.060 your own passwords keep this in mind you 00:05:45.620 --> 00:05:48.800 want a high entropy password and the 00:05:47.060 --> 00:05:50.360 exact number you need depends on exactly 00:05:48.800 --> 00:05:52.190 what you're trying to protect against 00:05:50.360 --> 00:05:53.540 like in general a concept in securities 00:05:52.190 --> 00:05:55.700 you have to keep in mind what your 00:05:53.540 --> 00:05:56.930 threat model is like what attackers 00:05:55.700 --> 00:05:58.460 you're concerned about what kinds of 00:05:56.930 --> 00:06:01.160 technique the attackers might be using 00:05:58.460 --> 00:06:02.930 for example this comic refers to an 00:06:01.160 --> 00:06:04.580 attacker that can guess a thousand 00:06:02.930 --> 00:06:07.100 passwords a second this might be 00:06:04.580 --> 00:06:09.140 something that's possible for say a web 00:06:07.100 --> 00:06:11.480 service that allows people to try to log 00:06:09.140 --> 00:06:12.830 in with your email and then random 00:06:11.480 --> 00:06:15.430 passwords that the attacker is trying 00:06:12.830 --> 00:06:17.600 but this thousand passwords the second 00:06:15.430 --> 00:06:19.970 model might not be accurate for other 00:06:17.600 --> 00:06:21.590 scenarios for example an offline 00:06:19.970 --> 00:06:23.870 password cracking scenario or maybe the 00:06:21.590 --> 00:06:25.790 attacker has broken into a website and 00:06:23.870 --> 00:06:27.920 downloaded their database and they have 00:06:25.790 --> 00:06:28.700 some obfuscated form of your password 00:06:27.920 --> 00:06:30.560 and they're trying to figure out what 00:06:28.700 --> 00:06:31.880 the password is maybe they can paralyze 00:06:30.560 --> 00:06:34.010 this attack and make it go to million 00:06:31.880 --> 00:06:35.510 guesses a second and so exactly how much 00:06:34.010 --> 00:06:37.210 entropy you need depends on exactly what 00:06:35.510 --> 00:06:39.350 you're trying to protect against but 00:06:37.210 --> 00:06:40.230 roughly forty bits of entropy might be 00:06:39.350 --> 00:06:42.660 good enough for 00:06:40.230 --> 00:06:44.370 which is protected by a website and 00:06:42.660 --> 00:06:47.340 you're concerned about online password 00:06:44.370 --> 00:06:49.140 guesses and then maybe something like 80 00:06:47.340 --> 00:06:51.000 bits of entropy might be good if you're 00:06:49.140 --> 00:06:52.530 concerned about offline attacks and you 00:06:51.000 --> 00:06:58.260 want to be really really secure so 00:06:52.530 --> 00:07:00.390 they're rough guidelines you can use and 00:06:58.260 --> 00:07:02.370 then how do you actually generate strong 00:07:00.390 --> 00:07:03.990 passwords well you have some model for 00:07:02.370 --> 00:07:05.340 password for example the for dictionary 00:07:03.990 --> 00:07:07.320 works thing and you can actually get a 00:07:05.340 --> 00:07:08.700 dictionary and then you can use methods 00:07:07.320 --> 00:07:10.680 like dice where so there's a song we 00:07:08.700 --> 00:07:12.840 linked to in the lecture notes where you 00:07:10.680 --> 00:07:14.460 can actually get physical dye and roll 00:07:12.840 --> 00:07:15.900 them and then map dice rolls to 00:07:14.460 --> 00:07:17.730 dictionary words in order to eventually 00:07:15.900 --> 00:07:19.290 turn that into a password and doing 00:07:17.730 --> 00:07:21.540 something like this using some kind of 00:07:19.290 --> 00:07:24.150 physical token that you know is random 00:07:21.540 --> 00:07:26.280 like a balanced die or a coin that you 00:07:24.150 --> 00:07:27.870 know is balanced is a good thing to do 00:07:26.280 --> 00:07:29.610 because humans are actually not good at 00:07:27.870 --> 00:07:31.200 choosing random numbers right if I just 00:07:29.610 --> 00:07:32.940 asked you to name a random number for 1 00:07:31.200 --> 00:07:34.710 to 100 chances are that you're probably 00:07:32.940 --> 00:07:35.790 not doing so uniformly at random very 00:07:34.710 --> 00:07:37.200 well and so that's why it's actually 00:07:35.790 --> 00:07:42.300 good to use these physical tokens in 00:07:37.200 --> 00:07:44.610 order to produce randomness so entropy 00:07:42.300 --> 00:07:49.280 that's our first concept recovering any 00:07:44.610 --> 00:07:52.080 questions about that or about this comic 00:07:49.280 --> 00:07:54.600 great so getting into slightly more 00:07:52.080 --> 00:07:55.860 interesting and complicated topics the 00:07:54.600 --> 00:07:58.590 next thing we're going to talk about is 00:07:55.860 --> 00:08:00.510 hash functions so hopefully most of you 00:07:58.590 --> 00:08:02.280 were here during the get lecture where 00:08:00.510 --> 00:08:04.680 we talked about the sha-1 hash function 00:08:02.280 --> 00:08:12.150 used in get so now going into that topic 00:08:04.680 --> 00:08:14.100 in a little bit more detail hash 00:08:12.150 --> 00:08:16.620 functions at a high level are functions 00:08:14.100 --> 00:08:20.220 that map a variable amount of data into 00:08:16.620 --> 00:08:22.950 a fixed size output so for example the 00:08:20.220 --> 00:08:25.350 sha-1 hash functions is one example of a 00:08:22.950 --> 00:08:30.230 hash function takes in some input of 00:08:25.350 --> 00:08:35.340 some number of bytes and outputs exactly 00:08:30.230 --> 00:08:37.320 160 bits of output so that's kind of the 00:08:35.340 --> 00:08:39.570 type signature of this particular hash 00:08:37.320 --> 00:08:41.040 function and then these functions have 00:08:39.570 --> 00:08:42.690 some number of properties that are 00:08:41.040 --> 00:08:46.710 useful so at a high level 00:08:42.690 --> 00:08:47.820 these can be thought about as hard to 00:08:46.710 --> 00:08:50.010 invert functions that have 00:08:47.820 --> 00:08:53.190 random-looking outputs we can actually 00:08:50.010 --> 00:08:54.080 try this out on some random piece of 00:08:53.190 --> 00:08:56.930 data 00:08:54.080 --> 00:08:59.730 for example if I enter into my terminal 00:08:56.930 --> 00:09:02.130 printf hello this does exactly what you 00:08:59.730 --> 00:09:04.020 would expect it does prints the set to 00:09:02.130 --> 00:09:07.380 standard out and I can pipe this to the 00:09:04.020 --> 00:09:09.090 sha-1 sum command so this is a command 00:09:07.380 --> 00:09:11.700 line program that accepts input via 00:09:09.090 --> 00:09:13.170 standard in and computes this sha-1 00:09:11.700 --> 00:09:14.340 function which takes in some variable 00:09:13.170 --> 00:09:17.400 number of bytes from the input and 00:09:14.340 --> 00:09:20.130 produces a 160-bit output which in this 00:09:17.400 --> 00:09:21.720 particular case is represent or encoded 00:09:20.130 --> 00:09:24.000 as a hexadecimal string so it's a length 00:09:21.720 --> 00:09:26.130 40 hexadecimal string and you see this 00:09:24.000 --> 00:09:27.510 output right here this - just means it 00:09:26.130 --> 00:09:30.660 took it it took its input from 00:09:27.510 --> 00:09:32.400 standardin so this output just looks 00:09:30.660 --> 00:09:33.650 like some random number but one 00:09:32.400 --> 00:09:37.050 important thing is that this is a 00:09:33.650 --> 00:09:39.240 deterministic number if you try the same 00:09:37.050 --> 00:09:40.680 command on your own computer printf 00:09:39.240 --> 00:09:42.990 hello sha-1 something you will get the 00:09:40.680 --> 00:09:44.340 same number out so sha-1 is some 00:09:42.990 --> 00:09:47.190 well-known function that people have 00:09:44.340 --> 00:09:48.540 agreed upon for all its parameters we'll 00:09:47.190 --> 00:09:51.900 see that if we tweak the input a little 00:09:48.540 --> 00:09:54.390 bit like say changed hello to holo with 00:09:51.900 --> 00:09:55.920 a capital H now I get a completely 00:09:54.390 --> 00:09:57.540 different looking output and this also 00:09:55.920 --> 00:09:58.860 looks like some other kind of random ish 00:09:57.540 --> 00:10:00.090 number even though it is deterministic 00:09:58.860 --> 00:10:07.740 and you could reproduce this on your own 00:10:00.090 --> 00:10:16.140 computer hash functions have a number of 00:10:07.740 --> 00:10:17.610 properties that are pretty important the 00:10:16.140 --> 00:10:18.810 first property that cryptographic hash 00:10:17.610 --> 00:10:21.060 functions have is that their 00:10:18.810 --> 00:10:22.440 non-invertible and what that means is 00:10:21.060 --> 00:10:25.230 that if you take the output from this 00:10:22.440 --> 00:10:27.720 function for example that a a f4 00:10:25.230 --> 00:10:30.390 ballaugh 3 for D strings shown there 00:10:27.720 --> 00:10:32.850 from that output it's hard to figure out 00:10:30.390 --> 00:10:36.510 what the input was that produced that 00:10:32.850 --> 00:10:38.310 output so you can go one way compute the 00:10:36.510 --> 00:10:41.190 sha-1 hash easily but you can't go 00:10:38.310 --> 00:10:43.380 backwards another property that these 00:10:41.190 --> 00:10:50.910 functions have is that their collision 00:10:43.380 --> 00:10:53.400 resistant and what this property means 00:10:50.910 --> 00:10:56.780 is that it's hard to find two different 00:10:53.400 --> 00:10:59.910 inputs that produce the same output and 00:10:56.780 --> 00:11:02.460 so this basically describes what a 00:10:59.910 --> 00:11:04.040 cryptographic hash function is so any 00:11:02.460 --> 00:11:06.510 questions about the kind of 00:11:04.040 --> 00:11:08.900 specification of a cryptographic hash 00:11:06.510 --> 00:11:08.900 function 00:11:09.350 --> 00:11:13.380 okay so what are these hash functions 00:11:11.580 --> 00:11:16.170 actually useful for well we've already 00:11:13.380 --> 00:11:19.260 seen one application in git for content 00:11:16.170 --> 00:11:22.070 address storage so we want it get we 00:11:19.260 --> 00:11:24.180 want some uniform way of naming 00:11:22.070 --> 00:11:26.130 different objects that are in the object 00:11:24.180 --> 00:11:27.900 store and it turns out that get 00:11:26.130 --> 00:11:29.940 addresses all of them by their sha-1 00:11:27.900 --> 00:11:33.000 hash so you have the actual data you 00:11:29.940 --> 00:11:34.830 want to store and then to name that 00:11:33.000 --> 00:11:36.420 particular piece of data you just name 00:11:34.830 --> 00:11:37.650 the sha-1 hash and all of that is stored 00:11:36.420 --> 00:11:41.820 in the object store in that particular 00:11:37.650 --> 00:11:43.230 way we see this when looking at many 00:11:41.820 --> 00:11:45.030 different parts of git for example right 00:11:43.230 --> 00:11:47.520 here I'm going to get repository if I do 00:11:45.030 --> 00:11:50.610 get log it shows me the commits and for 00:11:47.520 --> 00:11:53.280 example this number up here is the 00:11:50.610 --> 00:11:55.290 cryptographic hash function sha-1 apply 00:11:53.280 --> 00:11:58.470 to the commit object that describes this 00:11:55.290 --> 00:12:00.360 particular commit so does anybody know 00:11:58.470 --> 00:12:02.370 why git uses a cryptographic hash 00:12:00.360 --> 00:12:03.480 function here as opposed to so you might 00:12:02.370 --> 00:12:04.830 have heard in your other computer 00:12:03.480 --> 00:12:06.570 science classes like say your 00:12:04.830 --> 00:12:08.850 introductory algorithms class there are 00:12:06.570 --> 00:12:11.010 things called hash functions without the 00:12:08.850 --> 00:12:13.740 word cryptographic appended in front of 00:12:11.010 --> 00:12:16.080 them and they have similar properties 00:12:13.740 --> 00:12:18.570 that they turn a variable sized input 00:12:16.080 --> 00:12:21.300 into some fixed size output but they 00:12:18.570 --> 00:12:23.400 don't quite have these properties where 00:12:21.300 --> 00:12:24.810 it's hard to find an input that produces 00:12:23.400 --> 00:12:26.820 a particular output or things like that 00:12:24.810 --> 00:12:29.010 it's a kind of weaker definition than 00:12:26.820 --> 00:12:29.880 this so why is it that in get we care 00:12:29.010 --> 00:12:32.100 about having a cryptographic hash 00:12:29.880 --> 00:12:33.810 function as opposed to just a regular 00:12:32.100 --> 00:12:36.080 old hash function does anybody have any 00:12:33.810 --> 00:12:36.080 ideas 00:12:45.390 --> 00:12:52.180 yeah that's that's basically it that we 00:12:49.180 --> 00:12:54.010 don't want to have kind of conflicts in 00:12:52.180 --> 00:12:55.660 the output from this hash function like 00:12:54.010 --> 00:12:57.700 every commit is identified by a hash 00:12:55.660 --> 00:12:59.770 function every file is identified by the 00:12:57.700 --> 00:13:01.000 hash of that file if it were ever the 00:12:59.770 --> 00:13:03.190 case that two different pieces of 00:13:01.000 --> 00:13:05.620 content in practice produce the same 00:13:03.190 --> 00:13:07.600 output that is if the function were not 00:13:05.620 --> 00:13:09.610 collision resistant that could be really 00:13:07.600 --> 00:13:12.010 problematic right because then you and I 00:13:09.610 --> 00:13:13.990 we could have to do to get repos that we 00:13:12.010 --> 00:13:15.760 think are the same we check out the same 00:13:13.990 --> 00:13:18.460 commit hash and we might end up with 00:13:15.760 --> 00:13:21.400 different files and this is concerning 00:13:18.460 --> 00:13:23.380 because git is used to track software a 00:13:21.400 --> 00:13:25.510 track development of software and it's 00:13:23.380 --> 00:13:28.180 also kind of involved in making sure 00:13:25.510 --> 00:13:29.440 that the right people are authoring the 00:13:28.180 --> 00:13:30.700 software nothing funny has happened in 00:13:29.440 --> 00:13:32.620 the process for example there all these 00:13:30.700 --> 00:13:34.540 open source projects like the Linux 00:13:32.620 --> 00:13:37.150 kernel where development is done using 00:13:34.540 --> 00:13:39.610 git it would be really bad if some 00:13:37.150 --> 00:13:41.290 contributor to get could say edit some 00:13:39.610 --> 00:13:42.970 file and propose some change that looks 00:13:41.290 --> 00:13:44.980 pretty benign like oh let me go and 00:13:42.970 --> 00:13:46.720 improve this part of Linux submit that 00:13:44.980 --> 00:13:48.970 change request to the Linux developers 00:13:46.720 --> 00:13:52.120 and then in practice actually supply a 00:13:48.970 --> 00:13:54.220 git repository that has the same commit 00:13:52.120 --> 00:13:55.570 hash and whatnot but actually the file 00:13:54.220 --> 00:13:59.080 contents are different there's something 00:13:55.570 --> 00:14:00.730 malicious so git actually relies on this 00:13:59.080 --> 00:14:03.190 sha-1 function being a cryptographic 00:14:00.730 --> 00:14:10.150 hash function in order to achieve 00:14:03.190 --> 00:14:11.680 security any questions about that and 00:14:10.150 --> 00:14:13.810 some other interesting applications of 00:14:11.680 --> 00:14:15.670 hash functions so as we saw hash 00:14:13.810 --> 00:14:17.950 functions turn big inputs into small 00:14:15.670 --> 00:14:19.750 outputs and in a way because the hash 00:14:17.950 --> 00:14:21.670 function is collision resistant the 00:14:19.750 --> 00:14:24.430 output can be used to kind of attest to 00:14:21.670 --> 00:14:27.040 or identify the input and so you can 00:14:24.430 --> 00:14:30.130 think of a hash as a short summary of a 00:14:27.040 --> 00:14:32.020 file for example in this directory of a 00:14:30.130 --> 00:14:34.780 bunch of files and I can compute the 00:14:32.020 --> 00:14:37.630 sha-1 sum of some file in this directory 00:14:34.780 --> 00:14:40.660 and this is the sha-1 algorithm applied 00:14:37.630 --> 00:14:41.770 to this readme MD file and what's 00:14:40.660 --> 00:14:43.810 interesting is that it is 00:14:41.770 --> 00:14:44.800 computationally hard or like impossible 00:14:43.810 --> 00:14:47.650 you can kind of think of it as 00:14:44.800 --> 00:14:50.230 impossible to find any other file so a 00:14:47.650 --> 00:14:52.870 different file that has the same hash 00:14:50.230 --> 00:14:55.030 output and one scenario in which this is 00:14:52.870 --> 00:14:56.090 useful is when you download files from 00:14:55.030 --> 00:14:59.220 the internet 00:14:56.090 --> 00:15:01.410 for example there are lots of Linux 00:14:59.220 --> 00:15:03.630 distributions that distribute large CD 00:15:01.410 --> 00:15:05.250 or DVD images from their website like I 00:15:03.630 --> 00:15:07.860 can go to Debian org and download the 00:15:05.250 --> 00:15:09.510 latest version of Debian the thing is 00:15:07.860 --> 00:15:10.950 that hosting those files can be 00:15:09.510 --> 00:15:12.420 expensive and so a lot of people are 00:15:10.950 --> 00:15:14.490 nice enough to host mirrors of these 00:15:12.420 --> 00:15:17.520 files so instead of downloading Debian 00:15:14.490 --> 00:15:20.310 from Debian org I can go to one of many 00:15:17.520 --> 00:15:21.750 other sites and download what are 00:15:20.310 --> 00:15:23.580 supposed to be the same files that are 00:15:21.750 --> 00:15:25.490 hosted at Debian org but how do I know 00:15:23.580 --> 00:15:28.920 that I actually got the correct file 00:15:25.490 --> 00:15:30.780 like what if I set up a malicious mirror 00:15:28.920 --> 00:15:33.180 and you go to like Anisha is evil Debian 00:15:30.780 --> 00:15:35.190 website calm and then try to download 00:15:33.180 --> 00:15:37.290 Debian turns out that your Linux 00:15:35.190 --> 00:15:38.880 installation is backdoored well one 00:15:37.290 --> 00:15:40.620 thing you could do is download a copy 00:15:38.880 --> 00:15:42.060 from the original double-unit website 00:15:40.620 --> 00:15:43.440 and then download my version and compare 00:15:42.060 --> 00:15:44.550 them but that kind of defeats the 00:15:43.440 --> 00:15:46.110 purpose right because we want to avoid 00:15:44.550 --> 00:15:47.580 downloading things from Debian org 00:15:46.110 --> 00:15:49.080 because hosting these files is expensive 00:15:47.580 --> 00:15:50.490 and we want all these different people 00:15:49.080 --> 00:15:53.660 to be able to mirror copies of the files 00:15:50.490 --> 00:15:55.770 elsewhere so does anybody see how 00:15:53.660 --> 00:15:57.630 cryptographic hash functions could be 00:15:55.770 --> 00:15:59.190 useful to solve this problem that I want 00:15:57.630 --> 00:16:02.970 to download a file from an untrusted 00:15:59.190 --> 00:16:04.950 source but and not from like the trusted 00:16:02.970 --> 00:16:06.210 source itself but maybe I can get some 00:16:04.950 --> 00:16:07.920 small piece of information from this 00:16:06.210 --> 00:16:09.660 trusted source in order to know whether 00:16:07.920 --> 00:16:11.460 the file I downloaded from the untrusted 00:16:09.660 --> 00:16:18.060 source is the thing I was supposed to 00:16:11.460 --> 00:16:19.200 get yes like it's basically just a 00:16:18.060 --> 00:16:20.610 straightforward application of 00:16:19.200 --> 00:16:22.980 cryptographic hash functions 00:16:20.610 --> 00:16:25.620 so what Debian org can do is they can 00:16:22.980 --> 00:16:27.870 produce their kind of correct ISO file 00:16:25.620 --> 00:16:29.730 or whatever they want and instead of 00:16:27.870 --> 00:16:32.550 publishing the file itself on their 00:16:29.730 --> 00:16:35.190 website they can publish a hash of that 00:16:32.550 --> 00:16:37.140 file so compared to the file itself 00:16:35.190 --> 00:16:39.390 which may be many gigabytes this is only 00:16:37.140 --> 00:16:41.190 like in this particular case 160 bits of 00:16:39.390 --> 00:16:43.620 data right so very cheap to host and 00:16:41.190 --> 00:16:46.260 then what I can do is a user is I can 00:16:43.620 --> 00:16:48.090 download that file from any random 00:16:46.260 --> 00:16:50.460 website it could be an untrusted website 00:16:48.090 --> 00:16:53.780 and after I download I just double check 00:16:50.460 --> 00:16:56.040 the sha-1 hash and if the hash matches 00:16:53.780 --> 00:16:57.330 then I know that I have the right file 00:16:56.040 --> 00:17:00.300 because it's computationally infeasible 00:16:57.330 --> 00:17:01.980 for somebody to give me some different 00:17:00.300 --> 00:17:04.260 file that happens to have the same hash 00:17:01.980 --> 00:17:07.230 because hash functions are collision 00:17:04.260 --> 00:17:10.130 resistant so any questions about that 00:17:07.230 --> 00:17:10.130 application yeah 00:17:18.170 --> 00:17:22.350 yeah so that's a good question like why 00:17:20.490 --> 00:17:24.120 do you need different people to host the 00:17:22.350 --> 00:17:26.100 information like wouldn't it be equally 00:17:24.120 --> 00:17:27.000 expensive for everybody so the answer is 00:17:26.100 --> 00:17:29.010 that question is a little bit 00:17:27.000 --> 00:17:30.840 complicated but like here's that here's 00:17:29.010 --> 00:17:32.970 a partial answer one thing is that 00:17:30.840 --> 00:17:34.500 downloading files from a server is 00:17:32.970 --> 00:17:36.360 affected by how far away the server is 00:17:34.500 --> 00:17:38.730 from you so for example if the servers 00:17:36.360 --> 00:17:40.830 in Massachusetts and you're in say China 00:17:38.730 --> 00:17:42.030 like you have to kind of make a big 00:17:40.830 --> 00:17:43.830 round trip across the internet and that 00:17:42.030 --> 00:17:46.200 may be expensive for a number of reasons 00:17:43.830 --> 00:17:47.700 like the latency is high and the traffic 00:17:46.200 --> 00:17:48.990 traffic needs to go through kind of lots 00:17:47.700 --> 00:17:50.610 of different wires to make its way all 00:17:48.990 --> 00:17:52.680 the way to where you are and so one 00:17:50.610 --> 00:17:54.180 thing that these websites do is that 00:17:52.680 --> 00:17:55.710 they distribute their content to servers 00:17:54.180 --> 00:17:57.060 that are all over the world and then as 00:17:55.710 --> 00:17:58.500 a user you download from the server 00:17:57.060 --> 00:18:00.960 that's closest to you like for example 00:17:58.500 --> 00:18:02.490 mit maintains a Debian package 00:18:00.960 --> 00:18:04.560 repository and like kind of mirrors all 00:18:02.490 --> 00:18:07.230 the Debbie and stuff so if you're a 00:18:04.560 --> 00:18:10.260 Debian user at MIT you can use the MIT 00:18:07.230 --> 00:18:12.060 copy of everything and then you can kind 00:18:10.260 --> 00:18:14.100 of access it over our fast local network 00:18:12.060 --> 00:18:15.690 and that traffic never needs to go to 00:18:14.100 --> 00:18:18.750 the outside Internet at all so it's very 00:18:15.690 --> 00:18:23.130 fast that's a good question any other 00:18:18.750 --> 00:18:24.180 questions ok and then one final kind of 00:18:23.130 --> 00:18:25.530 interesting application of hash 00:18:24.180 --> 00:18:28.650 functions is something called a 00:18:25.530 --> 00:18:30.120 commitment scheme so I want to play a 00:18:28.650 --> 00:18:31.590 game and I need a volunteer for this so 00:18:30.120 --> 00:18:32.850 you don't actually need to get up from 00:18:31.590 --> 00:18:34.830 your seat or anything I was need you to 00:18:32.850 --> 00:18:36.450 talk with me so any volunteers raise 00:18:34.830 --> 00:18:37.730 your hand yeah okay yeah what's your 00:18:36.450 --> 00:18:40.650 name 00:18:37.730 --> 00:18:42.360 Abdul Aziz okay great so Abdul Aziz 00:18:40.650 --> 00:18:44.250 we're going to play a game we're going 00:18:42.360 --> 00:18:46.140 to play a game where I'm going to flip a 00:18:44.250 --> 00:18:48.180 coin and then you're gonna call heads or 00:18:46.140 --> 00:18:49.860 tails and if you call it right you win 00:18:48.180 --> 00:18:52.140 and if you call it wrong you lose and 00:18:49.860 --> 00:18:56.220 there are no stakes for this game but 00:18:52.140 --> 00:18:57.660 just the pride of winning so sadly I 00:18:56.220 --> 00:18:58.950 checked my wallet and all I have is 00:18:57.660 --> 00:19:00.570 dollar bills I don't have any coins with 00:18:58.950 --> 00:19:02.190 me so instead I'm going to just flip the 00:19:00.570 --> 00:19:04.950 coin in my head all right 00:19:02.190 --> 00:19:07.740 so okay I flip the coin call heads or 00:19:04.950 --> 00:19:13.680 tails sorry you lost it was heads I 00:19:07.740 --> 00:19:15.420 don't I play again yeah I can cheat 00:19:13.680 --> 00:19:17.280 right I can just see what you say and 00:19:15.420 --> 00:19:19.950 say the opposite thing so let's try 00:19:17.280 --> 00:19:22.260 fixing this game how about you call 00:19:19.950 --> 00:19:26.250 heads or tails after I say what the 00:19:22.260 --> 00:19:27.510 flip result was okay yeah so if I say oh 00:19:26.250 --> 00:19:35.220 the result is tails what are you gonna 00:19:27.510 --> 00:19:39.030 say are you call tails yeah so is it 00:19:35.220 --> 00:19:40.890 possible to play this guess what guess 00:19:39.030 --> 00:19:43.020 what the coin flip result is game in a 00:19:40.890 --> 00:19:44.550 fair way without having a physical coin 00:19:43.020 --> 00:19:45.660 that we share like because I can't 00:19:44.550 --> 00:19:47.160 really manipulate your physical reality 00:19:45.660 --> 00:19:48.570 if I flip a coin in front of you 00:19:47.160 --> 00:19:50.820 probably trust that it's okay right 00:19:48.570 --> 00:19:51.990 so it turns out that hash functions give 00:19:50.820 --> 00:19:54.720 us a kind of cool way to solve this 00:19:51.990 --> 00:19:57.990 problem to through a idea called a 00:19:54.720 --> 00:19:59.370 commitment scheme so I can say they're 00:19:57.990 --> 00:20:02.670 like here's the construction of the 00:19:59.370 --> 00:20:05.250 solution I can pick heads or tails and 00:20:02.670 --> 00:20:09.390 I'm actually going to pick a big random 00:20:05.250 --> 00:20:14.370 number say like this number here and 00:20:09.390 --> 00:20:16.830 what I can do is compute the sha-1 sum 00:20:14.370 --> 00:20:17.910 of this number at this moment you 00:20:16.830 --> 00:20:19.980 haven't seen this number yet I'm just 00:20:17.910 --> 00:20:22.680 doing all this in my head and then what 00:20:19.980 --> 00:20:25.170 I do is I tell you okay I flipped a coin 00:20:22.680 --> 00:20:26.340 and I'm not going to tell you what the 00:20:25.170 --> 00:20:28.560 result is just yet because you haven't 00:20:26.340 --> 00:20:29.850 called heads or tails but I'll tell you 00:20:28.560 --> 00:20:31.770 what the shell wants some of the result 00:20:29.850 --> 00:20:34.230 is here you go and I tell you this value 00:20:31.770 --> 00:20:36.000 now after this you can call heads or 00:20:34.230 --> 00:20:38.310 tails so what do you say like say say 00:20:36.000 --> 00:20:40.080 heads afterwards what I can do is I can 00:20:38.310 --> 00:20:42.000 reveal to you what my input to this 00:20:40.080 --> 00:20:43.500 function was and then you can 00:20:42.000 --> 00:20:45.480 cross-check this right you can compute 00:20:43.500 --> 00:20:47.040 the sha-1 sum on the input to verify 00:20:45.480 --> 00:20:48.930 that the output is what I said it was 00:20:47.040 --> 00:20:50.760 earlier and then we can have some way of 00:20:48.930 --> 00:20:52.500 mapping these numbers to heads or tails 00:20:50.760 --> 00:20:54.360 so I might have agreed upon beforehand 00:20:52.500 --> 00:20:56.640 that even numbers are heads and odd 00:20:54.360 --> 00:20:58.080 numbers or tails and so this is a way of 00:20:56.640 --> 00:21:01.200 fixing that game so we can actually play 00:20:58.080 --> 00:21:03.600 this game in in our heads right I can 00:21:01.200 --> 00:21:05.970 pick a value but not reveal that value 00:21:03.600 --> 00:21:07.350 to you but I can commit to the value so 00:21:05.970 --> 00:21:09.390 this is a kind of binding commitment 00:21:07.350 --> 00:21:11.310 scheme that I can't change my mind after 00:21:09.390 --> 00:21:14.220 I've told you this but it doesn't reveal 00:21:11.310 --> 00:21:15.480 the original value to you and so this is 00:21:14.220 --> 00:21:17.520 one other neat application of 00:21:15.480 --> 00:21:18.330 cryptographic hash functions so any 00:21:17.520 --> 00:21:24.030 questions about this particular 00:21:18.330 --> 00:21:26.790 construction okay great so moving on to 00:21:24.030 --> 00:21:29.710 the next topic we're going to talk about 00:21:26.790 --> 00:21:35.369 key derivation functions 00:21:29.710 --> 00:21:35.369 [Applause] 00:21:38.650 --> 00:21:46.850 often abbreviate it as KDF so this is a 00:21:45.350 --> 00:21:49.580 concept that's very similar to hash 00:21:46.850 --> 00:21:51.710 functions except it has kind of one 00:21:49.580 --> 00:21:54.740 extra property that it is slow to 00:21:51.710 --> 00:21:56.320 compute for example there's a hash 00:21:54.740 --> 00:22:08.240 function of key derivation function 00:21:56.320 --> 00:22:12.110 known as P pbkdf2 pbkdf2 password-based 00:22:08.240 --> 00:22:14.180 key derivation function that has a kind 00:22:12.110 --> 00:22:15.410 of similar form as these hash functions 00:22:14.180 --> 00:22:16.850 we were talking about here that they 00:22:15.410 --> 00:22:18.380 take in some variable length input in 00:22:16.850 --> 00:22:19.280 pretty so fixed length output but 00:22:18.380 --> 00:22:20.450 they're meant to be used for one 00:22:19.280 --> 00:22:22.370 particular purpose 00:22:20.450 --> 00:22:24.740 the purpose is generally to use the 00:22:22.370 --> 00:22:26.510 fixed length output as a key in another 00:22:24.740 --> 00:22:28.340 cryptographic algorithm and we'll talk 00:22:26.510 --> 00:22:31.310 about those algorithms like what use the 00:22:28.340 --> 00:22:32.900 output of this thing for in a moment but 00:22:31.310 --> 00:22:36.380 a one property of these things is that 00:22:32.900 --> 00:22:39.410 they're slow does anybody have any idea 00:22:36.380 --> 00:22:40.700 why you'd want an algorithm to be slow 00:22:39.410 --> 00:22:42.920 like normally we want algorithms to be 00:22:40.700 --> 00:22:46.000 fast right so why would we want an 00:22:42.920 --> 00:22:46.000 algorithm to be slow yes 00:22:54.430 --> 00:22:59.630 yeah that's exactly it so I'll repeat so 00:22:57.860 --> 00:23:01.940 it goes into the microphone the reason 00:22:59.630 --> 00:23:04.130 you want these to be slow is when you're 00:23:01.940 --> 00:23:05.840 actually using it for something like 00:23:04.130 --> 00:23:07.520 password authentication where you have 00:23:05.840 --> 00:23:08.810 the hash of a password saved and then 00:23:07.520 --> 00:23:10.100 somebody inputs the password you want to 00:23:08.810 --> 00:23:12.440 know if that corresponds to the hash 00:23:10.100 --> 00:23:14.510 it's ok if it's slow because you're only 00:23:12.440 --> 00:23:15.680 doing this check kind of once but the 00:23:14.510 --> 00:23:17.090 other scenario in which you're going to 00:23:15.680 --> 00:23:18.500 be using this function is when 00:23:17.090 --> 00:23:20.540 somebody's trying to brute-force a 00:23:18.500 --> 00:23:22.430 password say a website has their 00:23:20.540 --> 00:23:23.360 password database stolen and somebody's 00:23:22.430 --> 00:23:25.430 going through all the accounts I'm 00:23:23.360 --> 00:23:27.530 trying to break all the passwords well 00:23:25.430 --> 00:23:28.610 in that case you want this to be slow 00:23:27.530 --> 00:23:29.990 because someone's gonna be doing this 00:23:28.610 --> 00:23:31.640 like millions and millions of times and 00:23:29.990 --> 00:23:33.260 you can slow down the attacker a lot by 00:23:31.640 --> 00:23:35.030 making this function slow and so it's 00:23:33.260 --> 00:23:36.500 fine if this takes you like one second 00:23:35.030 --> 00:23:38.750 upon logging in to compute this function 00:23:36.500 --> 00:23:40.100 but when your brute forcing it we don't 00:23:38.750 --> 00:23:41.960 go to a thousand guesses a second like 00:23:40.100 --> 00:23:46.220 in that xkcd comic we can slow it down a 00:23:41.960 --> 00:23:47.860 little bit so what is the output of key 00:23:46.220 --> 00:23:50.060 derivation functions actually used for 00:23:47.860 --> 00:23:52.490 well the next topic we're going to talk 00:23:50.060 --> 00:23:53.570 about probably like one of the most 00:23:52.490 --> 00:23:55.430 classic things when you think about 00:23:53.570 --> 00:24:00.470 cryptography is encryption and 00:23:55.430 --> 00:24:17.300 decryption the next topic is symmetric 00:24:00.470 --> 00:24:18.410 key cryptography and like the rest of 00:24:17.300 --> 00:24:19.700 this lecture we're not going to talk 00:24:18.410 --> 00:24:21.470 about how you implement these we're 00:24:19.700 --> 00:24:23.900 going to talk about the API for a 00:24:21.470 --> 00:24:24.740 symmetric key symmetric key crypto like 00:24:23.900 --> 00:24:28.070 how it's used 00:24:24.740 --> 00:24:30.530 so symmetric key cryptosystems have a 00:24:28.070 --> 00:24:32.930 couple different functions they have a 00:24:30.530 --> 00:24:35.240 key generation function which is a 00:24:32.930 --> 00:24:38.570 randomized function that produces a 00:24:35.240 --> 00:24:42.820 thing we call the key and then they have 00:24:38.570 --> 00:24:42.820 a pair of functions encrypt and decrypt 00:24:45.790 --> 00:24:52.940 and encrypt take as input something we 00:24:49.130 --> 00:24:54.620 refer to as the plaintext and this is 00:24:52.940 --> 00:24:57.710 just some sequence of bytes some data 00:24:54.620 --> 00:24:59.420 and it takes in a key so something that 00:24:57.710 --> 00:25:03.190 came as an output of this key generation 00:24:59.420 --> 00:25:03.190 function and produces 00:25:04.140 --> 00:25:08.730 what we call the ciphertext and then 00:25:06.750 --> 00:25:14.760 decrypt does the opposite of this so it 00:25:08.730 --> 00:25:23.130 takes the ciphertext along with the key 00:25:14.760 --> 00:25:24.929 and produces the plaintext and this 00:25:23.130 --> 00:25:29.429 triple of functions has a couple 00:25:24.929 --> 00:25:31.590 properties one is that like one one team 00:25:29.429 --> 00:25:33.450 you might expect is that this thing 00:25:31.590 --> 00:25:36.290 doesn't really tell you all that much 00:25:33.450 --> 00:25:44.700 about this input to the encryption so 00:25:36.290 --> 00:25:46.559 property number one is given the 00:25:44.700 --> 00:26:02.280 ciphertext you can't figure out the 00:25:46.559 --> 00:26:03.299 plaintext without the key and the other 00:26:02.280 --> 00:26:12.210 property is kind of the obvious 00:26:03.299 --> 00:26:14.460 correctness property that if you take 00:26:12.210 --> 00:26:16.710 something and you encrypt it some 00:26:14.460 --> 00:26:19.559 message M with a key K and then you 00:26:16.710 --> 00:26:24.470 decrypt that ciphertext using the same 00:26:19.559 --> 00:26:24.470 key that gives you back the same message 00:26:24.500 --> 00:26:30.360 this is the kind of obvious correctness 00:26:27.090 --> 00:26:32.280 property so does this description make 00:26:30.360 --> 00:26:33.990 sense does it fit your kind of intuitive 00:26:32.280 --> 00:26:36.000 understanding of taking some piece of 00:26:33.990 --> 00:26:37.679 data and obscuring it so you can't 00:26:36.000 --> 00:26:40.020 really tell anything about the original 00:26:37.679 --> 00:26:42.510 input but then taking that obscured 00:26:40.020 --> 00:26:44.760 result and then passing it there's some 00:26:42.510 --> 00:26:50.190 decryption function given that key to 00:26:44.760 --> 00:26:51.990 retrieve the original input and this 00:26:50.190 --> 00:26:53.130 this isn't really a rigorous definition 00:26:51.990 --> 00:26:55.440 of what it means for something to be 00:26:53.130 --> 00:27:01.799 secure but it's a good enough intuitive 00:26:55.440 --> 00:27:03.179 definition that we can work with it so 00:27:01.799 --> 00:27:08.220 any questions about that description 00:27:03.179 --> 00:27:09.780 there so where can key cryptography be 00:27:08.220 --> 00:27:11.610 useful we'll talk about a whole bunch of 00:27:09.780 --> 00:27:13.110 examples later in this lecture but one 00:27:11.610 --> 00:27:15.150 example we'll talk about right now is 00:27:13.110 --> 00:27:16.719 encrypting files for storage and 00:27:15.150 --> 00:27:20.450 untrusted cloud service 00:27:16.719 --> 00:27:23.539 so consider say something like Dropbox 00:27:20.450 --> 00:27:25.609 or Google Drive or things like that when 00:27:23.539 --> 00:27:27.649 you upload files there you're trusting 00:27:25.609 --> 00:27:30.200 the service to not look at your files or 00:27:27.649 --> 00:27:32.269 do anything malicious with them these 00:27:30.200 --> 00:27:34.129 services like at least the ones I named 00:27:32.269 --> 00:27:36.379 are not intend encrypted or anything 00:27:34.129 --> 00:27:38.119 like that like in theory any employee 00:27:36.379 --> 00:27:39.919 those companies could look at your files 00:27:38.119 --> 00:27:41.539 now of course these companies have lots 00:27:39.919 --> 00:27:43.219 of policies and technical controls in 00:27:41.539 --> 00:27:45.229 place for making sure that that sort of 00:27:43.219 --> 00:27:46.399 thing doesn't happen but that doesn't 00:27:45.229 --> 00:27:48.709 mean that it's not technically possible 00:27:46.399 --> 00:27:50.119 and so one thing you might want to do if 00:27:48.709 --> 00:27:52.639 you don't want to trust these cloud 00:27:50.119 --> 00:27:53.899 services to not peek at your data not do 00:27:52.639 --> 00:27:55.339 like machine learning over them or do 00:27:53.899 --> 00:27:57.259 other sorts of things that you wouldn't 00:27:55.339 --> 00:27:59.359 really want is you can just take your 00:27:57.259 --> 00:28:04.399 files and encrypt them before uploading 00:27:59.359 --> 00:28:05.779 them to these these web services so does 00:28:04.399 --> 00:28:07.039 that idea make sense that I can take my 00:28:05.779 --> 00:28:08.839 file like Center pictures or whatever 00:28:07.039 --> 00:28:10.039 pass it through an encryption function 00:28:08.839 --> 00:28:11.269 and peruse the cipher text and then 00:28:10.039 --> 00:28:13.190 place that cipher text on the web 00:28:11.269 --> 00:28:14.779 service safe for backup purposes or 00:28:13.190 --> 00:28:17.389 whatever and if I ever need that I can 00:28:14.779 --> 00:28:18.979 retrieve the cipher text then use my key 00:28:17.389 --> 00:28:20.269 to decrypt it back into the plaintext 00:28:18.979 --> 00:28:22.190 and they can use a result for doing 00:28:20.269 --> 00:28:29.029 whatever I need to do does that make 00:28:22.190 --> 00:28:30.349 sense yeah yeah so that's that's a good 00:28:29.029 --> 00:28:31.669 question the question is couldn't 00:28:30.349 --> 00:28:34.879 anybody else run it through the same 00:28:31.669 --> 00:28:36.109 encryption program one detail maybe I 00:28:34.879 --> 00:28:38.419 should have explained in a little bit 00:28:36.109 --> 00:28:46.399 more detail is this key generation 00:28:38.419 --> 00:28:48.139 function is randomized and this key has 00:28:46.399 --> 00:28:50.539 high entropy so going back to that topic 00:28:48.139 --> 00:28:55.129 we talked about earlier so like an 00:28:50.539 --> 00:28:58.249 example is we might have aes 256 this is 00:28:55.129 --> 00:29:01.279 one particular symmetric cipher and this 00:28:58.249 --> 00:29:03.589 as the name might indicate has 256 bits 00:29:01.279 --> 00:29:05.570 of entropy in the key and so that means 00:29:03.589 --> 00:29:07.190 that as long as the attacker like 00:29:05.570 --> 00:29:08.959 whoever downloads the cipher text from 00:29:07.190 --> 00:29:10.789 the web service doesn't know your key 00:29:08.959 --> 00:29:11.209 unless they have some better attack in 00:29:10.789 --> 00:29:13.219 place 00:29:11.209 --> 00:29:14.629 they'll have to try all the different 00:29:13.219 --> 00:29:16.940 possible keys and if they're two to the 00:29:14.629 --> 00:29:19.639 256 keys that's too many keys to try in 00:29:16.940 --> 00:29:21.289 a reasonable amount of time does that 00:29:19.639 --> 00:29:26.109 answer the question okay any other 00:29:21.289 --> 00:29:26.109 questions yeah 00:29:35.009 --> 00:29:38.679 that's an excellent question and that 00:29:37.089 --> 00:29:40.119 leads into what I was going to talk 00:29:38.679 --> 00:29:43.509 about next so thanks for that question 00:29:40.119 --> 00:29:45.099 so as you point out like if I lose my 00:29:43.509 --> 00:29:46.659 key I'm kind of stuck right 00:29:45.099 --> 00:29:47.949 I need my key to decrypt that's kind of 00:29:46.659 --> 00:29:49.269 the point of this thing like if I didn't 00:29:47.949 --> 00:29:50.589 need my key to decrypt then this 00:29:49.269 --> 00:29:53.139 wouldn't be a very good crypto system 00:29:50.589 --> 00:29:54.909 and so I can combine this idea of 00:29:53.139 --> 00:29:56.199 symmetric key cryptography with the 00:29:54.909 --> 00:29:58.359 topic we just talked about key 00:29:56.199 --> 00:30:00.039 derivation functions so instead of 00:29:58.359 --> 00:30:01.239 having some key that's randomly 00:30:00.039 --> 00:30:03.459 generated with my key generation 00:30:01.239 --> 00:30:04.239 function say sampling entropy from 00:30:03.459 --> 00:30:11.289 somewhere on my machine 00:30:04.239 --> 00:30:13.419 I can have a passphrase and pass it 00:30:11.289 --> 00:30:17.679 through my key derivation function box 00:30:13.419 --> 00:30:23.259 and this gives me my key and then I can 00:30:17.679 --> 00:30:29.259 take my plaintext and combine it with my 00:30:23.259 --> 00:30:35.259 key in my encrypt function and this 00:30:29.259 --> 00:30:37.359 produces my ciphertext and I store this 00:30:35.259 --> 00:30:39.459 cipher text on the web service but now I 00:30:37.359 --> 00:30:41.609 don't need to save this key instead I 00:30:39.459 --> 00:30:43.839 can just remember in my pass phrase and 00:30:41.609 --> 00:30:45.359 whenever I need my key I can reconstruct 00:30:43.839 --> 00:30:48.359 it from the key derivation function 00:30:45.359 --> 00:30:48.359 question 00:30:56.680 --> 00:30:59.930 yeah so that's a good question the 00:30:58.700 --> 00:31:02.390 question is is the key derivation 00:30:59.930 --> 00:31:05.000 function slow enough to prevent 00:31:02.390 --> 00:31:06.500 brute-force guessing and the answer is 00:31:05.000 --> 00:31:08.600 it depends on how long your passphrase 00:31:06.500 --> 00:31:11.060 is so for example if your passphrase is 00:31:08.600 --> 00:31:12.890 like the string password is probably 00:31:11.060 --> 00:31:14.360 gonna get broken very quickly but as 00:31:12.890 --> 00:31:16.460 long as there's enough entropy in your 00:31:14.360 --> 00:31:17.930 passphrase this is good enough so like 00:31:16.460 --> 00:31:19.370 if I was uploading something to Dropbox 00:31:17.930 --> 00:31:21.770 and I really want it to stay secret I 00:31:19.370 --> 00:31:23.540 think like a 64-bit passphrase really a 00:31:21.770 --> 00:31:24.590 passphrase with 64 bits of entropy it 00:31:23.540 --> 00:31:28.130 would be more than enough in that 00:31:24.590 --> 00:31:30.200 scenario for example and just a quick 00:31:28.130 --> 00:31:31.880 demo of this so there are tools to make 00:31:30.200 --> 00:31:34.310 this really easy to do this is actually 00:31:31.880 --> 00:31:37.100 one of the exercises but we can take a 00:31:34.310 --> 00:31:39.500 tool for example called open SSL and use 00:31:37.100 --> 00:31:42.050 it to apply a symmetric cipher to some 00:31:39.500 --> 00:31:44.060 file so I had my readme text here for 00:31:42.050 --> 00:31:47.090 example readme MD it has a bunch of 00:31:44.060 --> 00:31:50.680 stuff in it and I can do open SSL AES 00:31:47.090 --> 00:31:54.140 256 cbc this is the name of a particular 00:31:50.680 --> 00:31:57.650 symmetric cipher and i can say that i 00:31:54.140 --> 00:32:01.910 want to apply this to read me md and 00:31:57.650 --> 00:32:03.560 produce readme dot and MD let's give it 00:32:01.910 --> 00:32:05.090 some name and then it's asking you for a 00:32:03.560 --> 00:32:06.470 password so by default this works in 00:32:05.090 --> 00:32:08.300 this mode where I provide a passphrase 00:32:06.470 --> 00:32:10.250 is run through a KDF to produce a key 00:32:08.300 --> 00:32:12.410 and that's used for encryption so I'll 00:32:10.250 --> 00:32:15.320 type in some password type it in again 00:32:12.410 --> 00:32:19.310 and now I produce this readme MD file 00:32:15.320 --> 00:32:21.080 and if I look at this it kind of looks 00:32:19.310 --> 00:32:23.000 like garbage and that's at a high level 00:32:21.080 --> 00:32:24.890 the point of a symmetric cipher it 00:32:23.000 --> 00:32:26.450 produces some cipher text that should be 00:32:24.890 --> 00:32:29.690 kind of indistinguishable from random 00:32:26.450 --> 00:32:33.680 data and when I want to decrypt this I 00:32:29.690 --> 00:32:37.670 can run a similar command open SSL AES 00:32:33.680 --> 00:32:40.340 256 cbc dash D for decrypt take the 00:32:37.670 --> 00:32:44.230 input from readme tank done MD and I 00:32:40.340 --> 00:32:49.010 like do like readme dot read need 00:32:44.230 --> 00:32:53.930 decrypted MD as the output and I can 00:32:49.010 --> 00:32:55.850 compare these two files and the 00:32:53.930 --> 00:32:57.530 correctness property of symmetric 00:32:55.850 --> 00:32:59.120 cryptography tells me that this should 00:32:57.530 --> 00:33:01.070 be identical and this indeed is 00:32:59.120 --> 00:33:02.720 identical if I look at the return value 00:33:01.070 --> 00:33:04.920 compare return 0 so that means that are 00:33:02.720 --> 00:33:08.319 the same file 00:33:04.920 --> 00:33:08.319 [Applause] 00:33:08.960 --> 00:33:14.359 so any questions about symmetric key 00:33:11.549 --> 00:33:14.359 cryptography yeah 00:33:20.340 --> 00:33:26.039 so the this particular command did make 00:33:23.700 --> 00:33:29.100 a new file so it took us input readme MD 00:33:26.039 --> 00:33:31.289 and produces output this file so that is 00:33:29.100 --> 00:33:32.879 the encrypted version of the file it 00:33:31.289 --> 00:33:35.779 left the original untouched but then of 00:33:32.879 --> 00:33:47.639 course I could delete it if I wanted to 00:33:35.779 --> 00:33:48.599 yeah that's a good question this is 00:33:47.639 --> 00:33:50.190 something I wasn't gonna talk about in 00:33:48.599 --> 00:33:52.559 too much detail the question is I 00:33:50.190 --> 00:33:55.830 provided the salt argument here and 00:33:52.559 --> 00:33:58.489 where is that stored so the answer is 00:33:55.830 --> 00:34:01.859 that that is stored in this output here 00:33:58.489 --> 00:34:05.460 so this output format stores both the 00:34:01.859 --> 00:34:06.869 salt and the actual output ciphertext so 00:34:05.460 --> 00:34:13.319 can be used in the reconstruction and 00:34:06.869 --> 00:34:14.819 decrypt yeah that's correct it doesn't 00:34:13.319 --> 00:34:19.950 keep any database or anything this is 00:34:14.819 --> 00:34:23.190 fully self-contained yeah and as John 00:34:19.950 --> 00:34:24.869 says the salt is not the secret like the 00:34:23.190 --> 00:34:33.569 the passphrase is what is the secret 00:34:24.869 --> 00:34:36.839 thing here okay so let's go back to so 00:34:33.569 --> 00:34:39.809 the so the question is what is salt the 00:34:36.839 --> 00:34:42.089 idea of a cryptographic salt is probably 00:34:39.809 --> 00:34:47.790 best explained in the context of hash 00:34:42.089 --> 00:34:49.559 functions so one common application of 00:34:47.790 --> 00:34:51.299 hash functions is to store passwords in 00:34:49.559 --> 00:34:53.669 a password database if I have a website 00:34:51.299 --> 00:34:55.290 and I have logins for users like people 00:34:53.669 --> 00:34:57.329 log in with their username and password 00:34:55.290 --> 00:34:59.640 I don't actually want to store people's 00:34:57.329 --> 00:35:01.349 passwords in plain text just like as is 00:34:59.640 --> 00:35:06.440 does anybody know why I wouldn't want to 00:35:01.349 --> 00:35:08.280 do that yes 00:35:06.440 --> 00:35:10.350 exactly what if there was a breach and 00:35:08.280 --> 00:35:12.480 someone got all your data so it's really 00:35:10.350 --> 00:35:13.860 bad if you leak all your users passwords 00:35:12.480 --> 00:35:15.150 it's especially bad because a lot of 00:35:13.860 --> 00:35:17.130 people reuse their passwords across 00:35:15.150 --> 00:35:18.540 different sites so you'll see attackers 00:35:17.130 --> 00:35:20.250 break into one thing like there was big 00:35:18.540 --> 00:35:22.020 yahoo breach a while ago and they find 00:35:20.250 --> 00:35:23.730 all these usernames and passwords and 00:35:22.020 --> 00:35:25.560 then they go and try those same login 00:35:23.730 --> 00:35:27.270 credentials on Google and on Facebook 00:35:25.560 --> 00:35:30.030 and on YouTube and whatnot these people 00:35:27.270 --> 00:35:32.640 reuse passwords so it's bad to store 00:35:30.030 --> 00:35:33.750 plaintext passwords so one thing you 00:35:32.640 --> 00:35:35.400 should do is you should store hashed 00:35:33.750 --> 00:35:37.620 passwords with a hash function or 00:35:35.400 --> 00:35:39.260 ideally a password hashing function 00:35:37.620 --> 00:35:42.360 that's intentionally designed to be slow 00:35:39.260 --> 00:35:44.160 but one thing attackers one thing 00:35:42.360 --> 00:35:45.390 attacker started doing once they realize 00:35:44.160 --> 00:35:47.310 that people started storing hashed 00:35:45.390 --> 00:35:49.170 passwords is they built these things 00:35:47.310 --> 00:35:52.560 called rainbow tables what people did 00:35:49.170 --> 00:35:54.570 was they took a way of generating big 00:35:52.560 --> 00:35:56.700 password lists like the kind of model 00:35:54.570 --> 00:35:58.200 what passwords might look like say take 00:35:56.700 --> 00:36:00.510 all the dictionary words take all 00:35:58.200 --> 00:36:01.770 strings of like length from zero to 00:36:00.510 --> 00:36:03.840 eight and whatnot 00:36:01.770 --> 00:36:06.180 take all of those and then hash them and 00:36:03.840 --> 00:36:08.910 produce a big database mapping hashes 00:36:06.180 --> 00:36:10.530 back to their pre image and so given the 00:36:08.910 --> 00:36:12.030 output of a hash function rather than 00:36:10.530 --> 00:36:13.620 have to like brute force said on the fly 00:36:12.030 --> 00:36:15.210 you can just go look up in this database 00:36:13.620 --> 00:36:16.830 Oh what is the input that corresponds to 00:36:15.210 --> 00:36:19.410 this output and people have built these 00:36:16.830 --> 00:36:22.650 for reasonably large password databases 00:36:19.410 --> 00:36:25.080 and so one thing that you can do in 00:36:22.650 --> 00:36:28.500 reaction to that as a defense is rather 00:36:25.080 --> 00:36:31.350 than storing in your database as your to 00:36:28.500 --> 00:36:34.860 write it rather than storing just the 00:36:31.350 --> 00:36:42.080 hash of the password what you do is you 00:36:34.860 --> 00:36:44.640 compute what's called a salt value and 00:36:42.080 --> 00:36:47.010 what this is is a large random string 00:36:44.640 --> 00:36:50.160 and then what you do is you store in 00:36:47.010 --> 00:36:53.100 your password database the salt which is 00:36:50.160 --> 00:36:54.900 not really a secret like you can store 00:36:53.100 --> 00:36:59.730 this in your password database along 00:36:54.900 --> 00:37:04.980 with a hash of the password with the 00:36:59.730 --> 00:37:08.190 salt appended to it why is this useful 00:37:04.980 --> 00:37:10.320 well this salt is a random unique value 00:37:08.190 --> 00:37:12.060 for every user and so if someone has the 00:37:10.320 --> 00:37:14.640 password safe password one two three on 00:37:12.060 --> 00:37:16.350 one web service if you are just storing 00:37:14.640 --> 00:37:17.970 the hash of the password then the hash 00:37:16.350 --> 00:37:18.900 would be the same on both Web Services 00:37:17.970 --> 00:37:20.970 right because this hash 00:37:18.900 --> 00:37:22.619 function is a deterministic function but 00:37:20.970 --> 00:37:26.369 now since we're using this randomized 00:37:22.619 --> 00:37:28.470 salt value we store the hash of the 00:37:26.369 --> 00:37:29.700 password plus of salt and so even if 00:37:28.470 --> 00:37:32.640 someone's using the same password on 00:37:29.700 --> 00:37:34.589 multiple sites this thing looks 00:37:32.640 --> 00:37:37.140 different in both cases and it makes it 00:37:34.589 --> 00:37:40.770 so these big databases mapping these 00:37:37.140 --> 00:37:42.210 short passwords or hash outputs to the 00:37:40.770 --> 00:37:44.609 short passwords that they came from 00:37:42.210 --> 00:37:47.099 those are no longer useful when you have 00:37:44.609 --> 00:37:48.900 salted passwords you kind of need to do 00:37:47.099 --> 00:37:51.150 the brute-force attack for every user 00:37:48.900 --> 00:37:52.079 once you find their salt value rather 00:37:51.150 --> 00:37:54.210 than being able to use this big 00:37:52.079 --> 00:37:58.829 precomputed database does that answer 00:37:54.210 --> 00:38:00.450 the question of what a salt is and so 00:37:58.829 --> 00:38:02.960 that's what that salt argument is 00:38:00.450 --> 00:38:02.960 related to 00:38:05.580 --> 00:38:12.800 let's see so any questions about 00:38:08.430 --> 00:38:16.740 anything we talked about so far great 00:38:12.800 --> 00:38:20.040 okay so the I'm gonna go ahead and erase 00:38:16.740 --> 00:38:22.230 this and then the last topic we'll talk 00:38:20.040 --> 00:38:23.640 about is one of the most exciting 00:38:22.230 --> 00:38:24.600 developments of cryptography happen 00:38:23.640 --> 00:38:26.580 quite a while ago but it's still a 00:38:24.600 --> 00:38:42.030 really cool concept something called a 00:38:26.580 --> 00:38:43.680 symmetric key cryptography and so this 00:38:42.030 --> 00:38:45.720 is an idea that actually enables a lot 00:38:43.680 --> 00:38:48.510 of the security and privacy related 00:38:45.720 --> 00:38:50.160 features of basically anything you use 00:38:48.510 --> 00:38:53.430 today like we need to go and type in 00:38:50.160 --> 00:38:56.880 like www.google.com/mapmaker flog Rafi 00:38:53.430 --> 00:38:58.290 is used as part of what goes on there so 00:38:56.880 --> 00:38:59.700 this is going to look pretty similar to 00:38:58.290 --> 00:39:04.830 what we talked about in symmetric key 00:38:59.700 --> 00:39:06.120 cryptography except with a twist there's 00:39:04.830 --> 00:39:08.430 a key generation function which 00:39:06.120 --> 00:39:10.530 similarly is randomized but instead of 00:39:08.430 --> 00:39:16.860 producing a single key it produces a 00:39:10.530 --> 00:39:21.570 pair of keys two different things one of 00:39:16.860 --> 00:39:25.260 which is referred to as a public key and 00:39:21.570 --> 00:39:27.750 the other is referred to as a private 00:39:25.260 --> 00:39:29.550 key and then these can be used for 00:39:27.750 --> 00:39:31.650 encryption and decryption in a manner 00:39:29.550 --> 00:39:33.270 kind of similar to symmetric key crypto 00:39:31.650 --> 00:39:35.340 except these different keys have 00:39:33.270 --> 00:39:39.150 different uses now so we have an 00:39:35.340 --> 00:39:41.340 encryption function which takes in a 00:39:39.150 --> 00:39:46.830 plaintext Isles right P here and it 00:39:41.340 --> 00:39:49.110 takes in the public key and praises the 00:39:46.830 --> 00:39:53.250 ciphertext and then I have a decryption 00:39:49.110 --> 00:39:59.190 function which takes in my ciphertext 00:39:53.250 --> 00:40:05.790 and the private key and gives me back my 00:39:59.190 --> 00:40:08.520 plaintext and then similarly to those 00:40:05.790 --> 00:40:11.010 two properties we had over there given 00:40:08.520 --> 00:40:14.070 just the ciphertext we can't figure out 00:40:11.010 --> 00:40:15.720 the plaintext unless we have the private 00:40:14.070 --> 00:40:17.460 key and then we have the obvious 00:40:15.720 --> 00:40:18.809 correctness property that if we encrypt 00:40:17.460 --> 00:40:20.789 something with the private key 00:40:18.809 --> 00:40:23.219 sorry encrypt something with the public 00:40:20.789 --> 00:40:25.380 key and then take that cypher text and 00:40:23.219 --> 00:40:26.819 try decrypting it with the corresponding 00:40:25.380 --> 00:40:28.619 private key that came from this key 00:40:26.819 --> 00:40:30.809 generation process that outputs these 00:40:28.619 --> 00:40:36.059 two different things at once then I get 00:40:30.809 --> 00:40:37.890 the same result back so this is very 00:40:36.059 --> 00:40:39.029 similar to what's above but there's a 00:40:37.890 --> 00:40:40.769 twist that we have these two different 00:40:39.029 --> 00:40:42.839 keys that have different functions and 00:40:40.769 --> 00:40:44.999 it's really neat that this public key 00:40:42.839 --> 00:40:47.309 can actually be made as the name 00:40:44.999 --> 00:40:49.229 indicates public like I could be using a 00:40:47.309 --> 00:40:51.119 crypto system that works like this post 00:40:49.229 --> 00:40:53.279 a public key on the internet for anybody 00:40:51.119 --> 00:40:54.869 to see but keep my private key secret 00:40:53.279 --> 00:40:56.400 and then I have this interesting 00:40:54.869 --> 00:40:58.259 property that anybody on the internet 00:40:56.400 --> 00:41:00.779 can take any piece of content and 00:40:58.259 --> 00:41:02.400 encrypt it for me using my public key 00:41:00.779 --> 00:41:04.409 and send it over the Internet 00:41:02.400 --> 00:41:06.329 to me and then I can decrypt it using my 00:41:04.409 --> 00:41:08.400 private key and as long as my private 00:41:06.329 --> 00:41:10.409 key C's stays secret it doesn't matter 00:41:08.400 --> 00:41:11.759 if my public key is available to anybody 00:41:10.409 --> 00:41:15.089 on the Internet so here's where the 00:41:11.759 --> 00:41:18.569 asymmetry comes from before we were in a 00:41:15.089 --> 00:41:20.549 scenario where like suppose I was on the 00:41:18.569 --> 00:41:21.150 internet but you weren't like talking to 00:41:20.549 --> 00:41:22.890 me face-to-face 00:41:21.150 --> 00:41:25.349 and you wanted to send me some data over 00:41:22.890 --> 00:41:26.609 the Internet over some unencrypted 00:41:25.349 --> 00:41:28.140 Channel where anybody could listen on 00:41:26.609 --> 00:41:30.150 what you were saying and you wanted to 00:41:28.140 --> 00:41:32.009 use symmetric key cryptography well we 00:41:30.150 --> 00:41:33.479 need some way of exchanging a key in 00:41:32.009 --> 00:41:34.769 advance so that you could encrypt some 00:41:33.479 --> 00:41:36.659 plaintext with a key and give me that 00:41:34.769 --> 00:41:38.549 ciphertext over the Internet so that I 00:41:36.659 --> 00:41:41.279 could done decrypt it with that key in 00:41:38.549 --> 00:41:42.839 symmetric key crypto if the keys public 00:41:41.279 --> 00:41:45.299 it's game over like anybody can decrypt 00:41:42.839 --> 00:41:47.339 your stuff whereas an asymmetric key 00:41:45.299 --> 00:41:49.019 cryptography I could take my public key 00:41:47.339 --> 00:41:50.909 and like post it on a bulletin board on 00:41:49.019 --> 00:41:52.650 the Internet and you can go look at that 00:41:50.909 --> 00:41:54.749 take some contents and encrypt them for 00:41:52.650 --> 00:41:55.799 me and then send them over and that 00:41:54.749 --> 00:41:57.269 would be totally fine 00:41:55.799 --> 00:41:59.969 you can only decrypt it with the private 00:41:57.269 --> 00:42:02.249 key and so one analogy that may be 00:41:59.969 --> 00:42:05.609 helpful is comparing these mathematical 00:42:02.249 --> 00:42:07.439 ideas to physical locks so you probably 00:42:05.609 --> 00:42:09.329 have a lock on your door to your house 00:42:07.439 --> 00:42:11.189 and you can put in a key and like turn 00:42:09.329 --> 00:42:12.479 the thing in order to lock the door or 00:42:11.189 --> 00:42:14.339 you can turn it the other way to unlock 00:42:12.479 --> 00:42:15.929 the door so there's a single key and it 00:42:14.339 --> 00:42:17.459 can both lock and unlock the door 00:42:15.929 --> 00:42:19.259 but now consider this alternative 00:42:17.459 --> 00:42:20.640 construction which you might use if say 00:42:19.259 --> 00:42:23.429 I want you to be able to send me a 00:42:20.640 --> 00:42:25.019 message and have it be sent over the 00:42:23.429 --> 00:42:27.209 internet and you I don't really need a 00:42:25.019 --> 00:42:29.549 way to exchange a key with you 00:42:27.209 --> 00:42:30.719 beforehand I could get a box which you 00:42:29.549 --> 00:42:32.350 could put a letter inside and you can 00:42:30.719 --> 00:42:36.010 close the box and I can get one of the 00:42:32.350 --> 00:42:37.540 padlock things which I can give you by I 00:42:36.010 --> 00:42:39.820 could like take those padlock and open 00:42:37.540 --> 00:42:41.530 it and give it to you and you at your 00:42:39.820 --> 00:42:43.360 own leisure could put your message 00:42:41.530 --> 00:42:45.850 inside a box and take this padlock which 00:42:43.360 --> 00:42:48.430 is open and shut it around the box and 00:42:45.850 --> 00:42:50.350 then send it over to me and then I could 00:42:48.430 --> 00:42:52.090 put in my key and unlock it so do you 00:42:50.350 --> 00:42:54.310 see how there is this asymmetry there as 00:42:52.090 --> 00:42:56.050 opposed to the key that I used to open 00:42:54.310 --> 00:42:57.760 the door to my house where the same key 00:42:56.050 --> 00:42:59.920 opens and closes the thing instead I 00:42:57.760 --> 00:43:01.750 give you this open padlock that you have 00:42:59.920 --> 00:43:03.880 the ability to close but not open and 00:43:01.750 --> 00:43:05.590 after you closed it I can use my key 00:43:03.880 --> 00:43:07.120 which I've kept secret in order to open 00:43:05.590 --> 00:43:09.100 the thing and retrieve what's inside 00:43:07.120 --> 00:43:10.930 maybe this analogy is helpful maybe it's 00:43:09.100 --> 00:43:13.720 not the mathematical construction works 00:43:10.930 --> 00:43:17.350 just fine if that works for a year so 00:43:13.720 --> 00:43:18.790 any questions about a symmetric key 00:43:17.350 --> 00:43:21.190 encryption and decryption and how it 00:43:18.790 --> 00:43:25.900 relates to symmetric key crypto how it's 00:43:21.190 --> 00:43:27.940 a little bit different so before we talk 00:43:25.900 --> 00:43:30.370 about applications of this idea I'm 00:43:27.940 --> 00:43:33.580 going to talk about one other set of 00:43:30.370 --> 00:43:36.160 concepts in a symmetric key cryptography 00:43:33.580 --> 00:43:37.870 so these crypto systems give you another 00:43:36.160 --> 00:43:39.940 set of tools which are related to 00:43:37.870 --> 00:43:42.370 encryption and decryption something 00:43:39.940 --> 00:43:44.350 called signing and verifying and this is 00:43:42.370 --> 00:43:46.210 kind of similar to the real world like I 00:43:44.350 --> 00:43:48.370 can get a document and sign it with my 00:43:46.210 --> 00:43:50.260 signature except real world signatures 00:43:48.370 --> 00:43:52.210 are I don't think that hard to forge 00:43:50.260 --> 00:43:56.260 these are pretty hard to forge and 00:43:52.210 --> 00:43:57.940 therefore more useful what does 00:43:56.260 --> 00:44:00.600 signature schemes look like there's a 00:43:57.940 --> 00:44:08.380 function sign that takes us some message 00:44:00.600 --> 00:44:09.910 and the private key so notice this this 00:44:08.380 --> 00:44:14.620 is the private key not the public key 00:44:09.910 --> 00:44:18.370 and it produces a signature and then 00:44:14.620 --> 00:44:23.640 there's another function verify which 00:44:18.370 --> 00:44:27.540 takes in the message the signature and 00:44:23.640 --> 00:44:27.540 the public key this time 00:44:30.430 --> 00:44:35.750 and it tells me it returns a boolean 00:44:33.890 --> 00:44:40.610 whether or not the signature checks out 00:44:35.750 --> 00:44:43.640 and then this pair of functions has the 00:44:40.610 --> 00:44:45.080 property that again these are kind of 00:44:43.640 --> 00:44:49.070 properties that follow the intuition 00:44:45.080 --> 00:44:54.170 that come from physical signatures that 00:44:49.070 --> 00:44:56.990 uh without the private key it's hard to 00:44:54.170 --> 00:44:58.850 produce a signature for any message such 00:44:56.990 --> 00:45:00.380 that you can give the message in the 00:44:58.850 --> 00:45:02.360 signature and the public key to the 00:45:00.380 --> 00:45:07.540 verify function to get it to return true 00:45:02.360 --> 00:45:07.540 like at a high level it's hard to Forge 00:45:09.520 --> 00:45:20.720 it's hard to forge a signature of course 00:45:11.930 --> 00:45:24.200 without the private key and then there's 00:45:20.720 --> 00:45:25.610 the obvious correctness property that if 00:45:24.200 --> 00:45:26.690 you signed a thing with a public key and 00:45:25.610 --> 00:45:28.670 then try verifying it with the 00:45:26.690 --> 00:45:30.170 corresponding sorry if you sign a thing 00:45:28.670 --> 00:45:31.520 with the private key and try to verify 00:45:30.170 --> 00:45:34.010 it with the corresponding public key 00:45:31.520 --> 00:45:38.300 it returns okay that this verification 00:45:34.010 --> 00:45:41.080 checks out so these are two different 00:45:38.300 --> 00:45:44.030 kinds of things you can do with 00:45:41.080 --> 00:45:46.280 asymmetric key cryptosystems 00:45:44.030 --> 00:45:47.510 an example of an asymmetric key 00:45:46.280 --> 00:45:50.360 cryptosystem that you might have heard 00:45:47.510 --> 00:45:51.800 of is something called RSA so RSA is 00:45:50.360 --> 00:45:53.180 designed by a number of people one of 00:45:51.800 --> 00:45:59.330 whom is ron rivest who's a professor 00:45:53.180 --> 00:46:01.730 here so there are couple of interesting 00:45:59.330 --> 00:46:03.170 applications of asymmetric key crypto 00:46:01.730 --> 00:46:04.580 actually like tons and tons and tons of 00:46:03.170 --> 00:46:06.800 you spend like days talking about this 00:46:04.580 --> 00:46:08.540 but a couple examples are email 00:46:06.800 --> 00:46:10.400 encryption so we talked a little bit 00:46:08.540 --> 00:46:12.410 about sending messages what we can do 00:46:10.400 --> 00:46:14.360 with asymmetric key crypto is that you 00:46:12.410 --> 00:46:16.670 can have public keys posted online I 00:46:14.360 --> 00:46:18.530 think some of the instructors have PGP 00:46:16.670 --> 00:46:19.850 public keys on their website so for 00:46:18.530 --> 00:46:21.740 example you go to my website or John's 00:46:19.850 --> 00:46:24.740 website you'll find a public key and 00:46:21.740 --> 00:46:27.410 then what you can do is you can send us 00:46:24.740 --> 00:46:29.060 an encrypted email and so even if that 00:46:27.410 --> 00:46:30.260 message goes through Gmail or whatever 00:46:29.060 --> 00:46:31.970 other mail service throughout my T's 00:46:30.260 --> 00:46:34.040 mail servers if there happens to be an 00:46:31.970 --> 00:46:35.540 attacker snooping on the messages they 00:46:34.040 --> 00:46:38.000 can't make any sense of their contents 00:46:35.540 --> 00:46:39.650 because they're all encrypted and this 00:46:38.000 --> 00:46:42.440 is really cool because you can do this 00:46:39.650 --> 00:46:43.109 without kind of finding us in person and 00:46:42.440 --> 00:46:44.099 exchanging 00:46:43.109 --> 00:46:45.989 which you might have to do in a 00:46:44.099 --> 00:46:47.609 symmetric key cryptosystem you can just 00:46:45.989 --> 00:46:49.289 find our public key which can be posted 00:46:47.609 --> 00:46:52.229 online without causing any issues and 00:46:49.289 --> 00:46:53.759 then send us encrypted email another 00:46:52.229 --> 00:46:56.160 thing asymmetric key crypto is used for 00:46:53.759 --> 00:46:58.019 is private messaging so raise your hand 00:46:56.160 --> 00:47:00.509 if you've used anything like signal or 00:46:58.019 --> 00:47:01.950 telegram or I think what's up is in 00:47:00.509 --> 00:47:05.039 theory antenna encrypted so a good 00:47:01.950 --> 00:47:07.380 number of you these private messaging 00:47:05.039 --> 00:47:09.479 applications also use asymmetric key 00:47:07.380 --> 00:47:11.819 crypto to establish private 00:47:09.479 --> 00:47:14.309 communication channels basically every 00:47:11.819 --> 00:47:16.499 person has associated with them a key 00:47:14.309 --> 00:47:18.390 pair and so your device has run this key 00:47:16.499 --> 00:47:20.400 generation function produced a public 00:47:18.390 --> 00:47:22.049 key and a private key and automatically 00:47:20.400 --> 00:47:23.670 posted your public key to the internet 00:47:22.049 --> 00:47:25.529 so for example if you're using signal 00:47:23.670 --> 00:47:27.029 your public key is on the signal servers 00:47:25.529 --> 00:47:30.029 and then when someone wants to contact 00:47:27.029 --> 00:47:31.709 you their phone can look up your public 00:47:30.029 --> 00:47:33.150 key retreat and once it's retrieve your 00:47:31.709 --> 00:47:35.279 public key they can encrypt information 00:47:33.150 --> 00:47:36.599 for you this is a kind of approximation 00:47:35.279 --> 00:47:38.509 of how their algorithm works but at a 00:47:36.599 --> 00:47:40.979 high level that's what's going on 00:47:38.509 --> 00:47:43.499 another neat application of asymmetric 00:47:40.979 --> 00:47:44.880 key crypto is we were talking about 00:47:43.499 --> 00:47:46.079 earlier like making sure you have the 00:47:44.880 --> 00:47:46.519 right software we downloaded from the 00:47:46.079 --> 00:47:48.599 internet 00:47:46.519 --> 00:47:50.609 asymmetric key crypto can be used to 00:47:48.599 --> 00:47:52.430 sign software releases and this is 00:47:50.609 --> 00:47:55.199 something that people do for example 00:47:52.430 --> 00:47:56.489 like Debian packages or whatever things 00:47:55.199 --> 00:47:57.959 you download from the internet the 00:47:56.489 --> 00:47:59.670 developer will try to sign their 00:47:57.959 --> 00:48:00.630 software so that you can make sure that 00:47:59.670 --> 00:48:01.769 whatever you've downloaded from the 00:48:00.630 --> 00:48:04.799 internet is actually the right thing 00:48:01.769 --> 00:48:06.660 that came from the right person we 00:48:04.799 --> 00:48:07.920 talked about in the get lecture all the 00:48:06.660 --> 00:48:10.440 interesting things you can do with git 00:48:07.920 --> 00:48:15.239 one thing we didn't cover was signing 00:48:10.440 --> 00:48:17.670 related functionality and get so git has 00:48:15.239 --> 00:48:19.799 commits and you can associate with 00:48:17.670 --> 00:48:21.150 commits something called tags at a high 00:48:19.799 --> 00:48:22.949 level you can basically take a git 00:48:21.150 --> 00:48:26.160 commit and attach a signature to it 00:48:22.949 --> 00:48:28.589 which binds your public key to this 00:48:26.160 --> 00:48:31.170 commit and then anybody who has your 00:48:28.589 --> 00:48:32.789 public key can take the commit and your 00:48:31.170 --> 00:48:35.519 public key and make sure that there's a 00:48:32.789 --> 00:48:40.920 legitimate signature on the key or sorry 00:48:35.519 --> 00:48:44.670 on the commit so let me go to like some 00:48:40.920 --> 00:48:46.650 random repository that I have I can look 00:48:44.670 --> 00:48:51.959 at a bunch of tags associated with 00:48:46.650 --> 00:48:55.279 repository if I do look at the raw data 00:48:51.959 --> 00:48:55.279 associated with this tag 00:48:55.500 --> 00:49:02.820 it has some metadata and then a blob of 00:48:59.400 --> 00:49:05.700 like ascii encoded information that i 00:49:02.820 --> 00:49:08.880 can use the get tagged - v4 verify 00:49:05.700 --> 00:49:11.040 command to make sure that oh this is a 00:49:08.880 --> 00:49:13.050 good signature from this person happens 00:49:11.040 --> 00:49:14.220 to be me so I sign the software release 00:49:13.050 --> 00:49:15.630 so that anybody who downloads it from 00:49:14.220 --> 00:49:18.150 the Internet can make sure that they 00:49:15.630 --> 00:49:31.680 actually got an authentic copy yes 00:49:18.150 --> 00:49:33.599 question so the question is what exactly 00:49:31.680 --> 00:49:38.460 is the verify function doing or what is 00:49:33.599 --> 00:49:39.830 it checking against the like if you want 00:49:38.460 --> 00:49:41.130 to know mathematically what's going on 00:49:39.830 --> 00:49:43.619 talk to me 00:49:41.130 --> 00:49:45.510 after this lecture but for from kind of 00:49:43.619 --> 00:49:48.000 an API perspective what's going on here 00:49:45.510 --> 00:49:49.800 is that the signature and also the 00:49:48.000 --> 00:49:52.320 message here are just a blob of bytes 00:49:49.800 --> 00:49:56.130 and it happens to be the case that these 00:49:52.320 --> 00:49:58.560 things are designed such that basically 00:49:56.130 --> 00:50:00.660 if I take for some particular public key 00:49:58.560 --> 00:50:02.910 like if you take my public key it's 00:50:00.660 --> 00:50:06.450 impossible for you without knowledge of 00:50:02.910 --> 00:50:07.950 my private key for any message to find a 00:50:06.450 --> 00:50:10.800 second argument to this function that 00:50:07.950 --> 00:50:12.990 makes it return true you can kind of 00:50:10.800 --> 00:50:14.460 compare it to signing a document like 00:50:12.990 --> 00:50:16.890 you don't know how to forge my signature 00:50:14.460 --> 00:50:19.140 I can take any piece of paper and sign 00:50:16.890 --> 00:50:20.970 it and then anybody who knows what my 00:50:19.140 --> 00:50:22.200 signature looks like I can show my 00:50:20.970 --> 00:50:24.690 document - you can be like yeah that 00:50:22.200 --> 00:50:27.150 checks out but nobody without the 00:50:24.690 --> 00:50:30.060 private key can produce a signature that 00:50:27.150 --> 00:50:35.339 will make this function return true for 00:50:30.060 --> 00:50:36.420 any particular message and any related 00:50:35.339 --> 00:50:39.470 questions started you want me to explain 00:50:36.420 --> 00:50:39.470 any other way or does that make sense 00:50:50.180 --> 00:50:54.119 so any questions about signing software 00:50:52.680 --> 00:50:55.920 or any of the other handful of 00:50:54.119 --> 00:51:01.050 applications talked about of asymmetric 00:50:55.920 --> 00:51:02.460 key crypto well so one final thing I 00:51:01.050 --> 00:51:05.100 want to talk about we're almost out of 00:51:02.460 --> 00:51:07.200 time is key distribution this is a kind 00:51:05.100 --> 00:51:08.130 of interesting side effect of asymmetric 00:51:07.200 --> 00:51:09.330 key cryptography 00:51:08.130 --> 00:51:11.730 it enables a bunch of interesting 00:51:09.330 --> 00:51:12.960 functionality like I can post my public 00:51:11.730 --> 00:51:14.820 key on the internet you can go find it 00:51:12.960 --> 00:51:16.170 and send me encrypted email but how do 00:51:14.820 --> 00:51:18.060 you know that the public key found is 00:51:16.170 --> 00:51:19.410 actually my public key it seems like 00:51:18.060 --> 00:51:22.710 there's a bootstrapping problem here 00:51:19.410 --> 00:51:24.810 right so there are a couple this is like 00:51:22.710 --> 00:51:27.930 a really interesting and really hard 00:51:24.810 --> 00:51:29.400 real-world problem and there are a 00:51:27.930 --> 00:51:31.800 couple different approaches you might 00:51:29.400 --> 00:51:33.780 take to this problem one is kind of a 00:51:31.800 --> 00:51:35.000 lame solution but this thing solves a 00:51:33.780 --> 00:51:37.050 lot of cryptography problems this 00:51:35.000 --> 00:51:39.630 exchange the information out-of-band 00:51:37.050 --> 00:51:41.430 what that means is you want to send me 00:51:39.630 --> 00:51:43.710 encrypted email we'll just talk to me 00:51:41.430 --> 00:51:45.210 after class I'll give you my public key 00:51:43.710 --> 00:51:46.560 on a piece of paper and since you were 00:51:45.210 --> 00:51:48.240 talking to me in person like you know 00:51:46.560 --> 00:51:49.770 that it's actually my public key not 00:51:48.240 --> 00:51:51.930 just somebody like hacked my website and 00:51:49.770 --> 00:51:53.160 stuck some random number on there that 00:51:51.930 --> 00:51:54.420 solves the problem but it's not the most 00:51:53.160 --> 00:51:55.740 elegant there are a couple other 00:51:54.420 --> 00:51:58.650 approaches that different applications 00:51:55.740 --> 00:52:01.410 use so those of you who use signal have 00:51:58.650 --> 00:52:02.970 you ever encountered the phrase safety 00:52:01.410 --> 00:52:06.810 number like verify your safety number 00:52:02.970 --> 00:52:09.180 with so and so so with signal they have 00:52:06.810 --> 00:52:10.950 a way of exchanging public keys which is 00:52:09.180 --> 00:52:12.840 through the signal servers whoever runs 00:52:10.950 --> 00:52:14.280 the signal service just maintains on 00:52:12.840 --> 00:52:16.170 their servers basically a mapping from 00:52:14.280 --> 00:52:17.670 phone numbers to public keys and when I 00:52:16.170 --> 00:52:19.109 say oh I want to message this person 00:52:17.670 --> 00:52:20.190 with this number my phone just goes and 00:52:19.109 --> 00:52:21.930 retrieves their public key from the 00:52:20.190 --> 00:52:24.060 internet and then encrypts the message 00:52:21.930 --> 00:52:27.080 for that public key now does anybody see 00:52:24.060 --> 00:52:27.080 a problem with the setup 00:52:29.750 --> 00:52:38.880 yeah yeah exactly the signal servers our 00:52:36.900 --> 00:52:40.830 point of failure there because if the 00:52:38.880 --> 00:52:42.900 signal servers give me the wrong public 00:52:40.830 --> 00:52:44.610 key like supposed signal just produces a 00:52:42.900 --> 00:52:46.200 new key pair and give me their public 00:52:44.610 --> 00:52:47.940 key now they can read all my messages 00:52:46.200 --> 00:52:50.220 and they could even sit in between and 00:52:47.940 --> 00:52:51.540 transparently decrypt the messages I 00:52:50.220 --> 00:52:52.950 send them and then re encrypt them and 00:52:51.540 --> 00:52:55.170 send them on to their final destination 00:52:52.950 --> 00:52:57.960 like basically I need some way of 00:52:55.170 --> 00:52:59.850 authenticating the public key I get and 00:52:57.960 --> 00:53:01.170 so signal has one solution to this which 00:52:59.850 --> 00:53:04.350 is also just kind of punting the issue 00:53:01.170 --> 00:53:05.220 to out-of-band key exchange you can meet 00:53:04.350 --> 00:53:07.320 up with somebody and they have a 00:53:05.220 --> 00:53:08.820 slightly streamline flow where they show 00:53:07.320 --> 00:53:09.750 QR codes on the screen you take one 00:53:08.820 --> 00:53:10.950 phone and take a picture of the other 00:53:09.750 --> 00:53:12.840 phone screen and vice versa 00:53:10.950 --> 00:53:14.580 and now you've exchanged public keys in 00:53:12.840 --> 00:53:15.930 person and from that point on you've 00:53:14.580 --> 00:53:19.350 bootstrap your encrypted end-to-end 00:53:15.930 --> 00:53:22.230 communication it also has an issue of or 00:53:19.350 --> 00:53:24.150 it also has approach of pinning a public 00:53:22.230 --> 00:53:25.980 key so once you know that a particular 00:53:24.150 --> 00:53:27.750 phone number has a particular public key 00:53:25.980 --> 00:53:30.360 your phone remembers that and if that 00:53:27.750 --> 00:53:32.340 ever changes it'll complain to you and 00:53:30.360 --> 00:53:34.950 then there are a couple other solutions 00:53:32.340 --> 00:53:36.750 to this problem PGP one pop to let used 00:53:34.950 --> 00:53:38.460 to be popular a while ago has this idea 00:53:36.750 --> 00:53:40.350 of web of trust so like I trust people 00:53:38.460 --> 00:53:41.940 who my friends trust so if like John has 00:53:40.350 --> 00:53:43.740 done an out-of-band exchange with say my 00:53:41.940 --> 00:53:45.660 professor then I can probably email my 00:53:43.740 --> 00:53:47.460 professor because like I know that John 00:53:45.660 --> 00:53:48.780 trusts my professor and I trust John so 00:53:47.460 --> 00:53:50.310 you got this chain of trust through 00:53:48.780 --> 00:53:52.380 there that's one interesting approach 00:53:50.310 --> 00:53:53.910 and then another model that's called 00:53:52.380 --> 00:53:55.770 pretty recently as something that a tool 00:53:53.910 --> 00:54:00.350 called key base uses this is a really 00:53:55.770 --> 00:54:00.350 neat whoops 00:54:00.500 --> 00:54:04.950 there's website called key based IO and 00:54:03.750 --> 00:54:06.780 they have a really interesting solution 00:54:04.950 --> 00:54:09.420 to this bootstrapping problem which is 00:54:06.780 --> 00:54:10.590 social proof so saying you probably have 00:54:09.420 --> 00:54:13.500 your friends on Facebook and on Twitter 00:54:10.590 --> 00:54:15.600 and whatnot and it's probably pretty 00:54:13.500 --> 00:54:17.070 hard for an attacker to break into your 00:54:15.600 --> 00:54:18.390 friend's Facebook account at the same 00:54:17.070 --> 00:54:19.680 time as their Twitter account as the 00:54:18.390 --> 00:54:21.000 same time as their hacker news account 00:54:19.680 --> 00:54:23.430 and so on and so there's this 00:54:21.000 --> 00:54:25.860 interesting way of binding public keys 00:54:23.430 --> 00:54:27.780 to a set of social identities such that 00:54:25.860 --> 00:54:30.240 you can retrieve a public key once you 00:54:27.780 --> 00:54:32.610 trust some number of social identities 00:54:30.240 --> 00:54:33.960 corresponding to your friend we have 00:54:32.610 --> 00:54:34.980 links to these in the lecture notes if 00:54:33.960 --> 00:54:38.400 you want to see these things in more 00:54:34.980 --> 00:54:41.160 detail so that's it for our security and 00:54:38.400 --> 00:54:42.690 cryptography lecture and tomorrow's 00:54:41.160 --> 00:54:43.230 lecture will be on a random collection 00:54:42.690 --> 00:54:45.119 of top 00:54:43.230 --> 00:54:48.650 that your instructors find interesting 00:54:45.119 --> 00:54:48.650 so hopefully see you in lecture tomorrow 00:54:51.260 --> 00:54:54.180 I'll also be here for a couple of 00:54:53.040 --> 00:55:08.910 minutes after class if anybody has 00:54:54.180 --> 00:55:09.869 questions yes okay so John feel free to 00:55:08.910 --> 00:55:11.070 leave if you have to leave but I think 00:55:09.869 --> 00:55:11.970 nobody's using the classroom after us 00:55:11.070 --> 00:55:15.150 I'm going to talk about one other 00:55:11.970 --> 00:55:16.770 interesting topic so john brought up the 00:55:15.150 --> 00:55:19.320 fact that a symmetric key cryptography 00:55:16.770 --> 00:55:23.130 is slow and symmetric key cryptography 00:55:19.320 --> 00:55:25.680 is fast and so in practice you don't 00:55:23.130 --> 00:55:28.080 really use just a symmetric key 00:55:25.680 --> 00:55:31.440 cryptography by itself it's usually used 00:55:28.080 --> 00:55:38.280 to bootstrap a more sophisticated 00:55:31.440 --> 00:55:40.710 protocol that you're using one thing you 00:55:38.280 --> 00:55:41.430 might want to do is use a symmetric key 00:55:40.710 --> 00:55:43.890 cryptography 00:55:41.430 --> 00:55:45.869 for signing encrypted email right we 00:55:43.890 --> 00:55:47.730 talked about that example and the way 00:55:45.869 --> 00:55:49.260 that works isn't what you might have 00:55:47.730 --> 00:55:50.910 guessed from our straightforward 00:55:49.260 --> 00:55:52.920 explanation of asymmetric key crypto 00:55:50.910 --> 00:55:54.740 like you don't just use that encrypt 00:55:52.920 --> 00:55:57.630 function up there and call it a day in 00:55:54.740 --> 00:56:05.400 practice what you do is you use hybrid 00:55:57.630 --> 00:56:07.410 encryption to use a combination of 00:56:05.400 --> 00:56:09.109 symmetric key and asymmetric key 00:56:07.410 --> 00:56:12.060 cryptography 00:56:09.109 --> 00:56:14.190 what you do is here I'll draw this as a 00:56:12.060 --> 00:56:21.270 big block diagram you take your message 00:56:14.190 --> 00:56:23.490 m and then I have my public key that I 00:56:21.270 --> 00:56:25.080 want to encrypt for but rather than just 00:56:23.490 --> 00:56:27.119 take these two things and pass it 00:56:25.080 --> 00:56:36.240 through the encryption up there what I 00:56:27.119 --> 00:56:41.820 do is I use the symmetric key gen 00:56:36.240 --> 00:56:43.380 function to produce a symmetric key okay 00:56:41.820 --> 00:56:44.700 I'm gonna like prepend this with 00:56:43.380 --> 00:56:46.380 symmetric so we can distinguish it from 00:56:44.700 --> 00:56:49.410 the public key key generation function 00:56:46.380 --> 00:56:52.050 and then what I do is I take these two 00:56:49.410 --> 00:56:55.250 things pass them through my symmetric 00:56:52.050 --> 00:56:55.250 encryption box 00:57:02.250 --> 00:57:13.750 this produces the ciphertext and now 00:57:09.400 --> 00:57:15.640 this by itself to the sender sorry this 00:57:13.750 --> 00:57:17.530 by itself to the receiver who has the 00:57:15.640 --> 00:57:19.900 private key corresponding to this public 00:57:17.530 --> 00:57:20.890 key here this is not really useful right 00:57:19.900 --> 00:57:26.020 because this is encrypted with a 00:57:20.890 --> 00:57:28.690 symmetric cipher with this key K that 00:57:26.020 --> 00:57:30.670 came from this function that I ran on my 00:57:28.690 --> 00:57:32.830 local machine so I need some way of 00:57:30.670 --> 00:57:34.660 getting this to the person who actually 00:57:32.830 --> 00:57:37.570 used to decrypt the email and so what I 00:57:34.660 --> 00:57:39.610 do is I take this thing and now this 00:57:37.570 --> 00:57:40.840 email might have been big and I use 00:57:39.610 --> 00:57:42.700 symmetric encryption with that because 00:57:40.840 --> 00:57:44.980 symmetric encryption is fast but this 00:57:42.700 --> 00:57:46.600 key is small like it might be 256 bits 00:57:44.980 --> 00:57:49.000 or something so I can take this thing 00:57:46.600 --> 00:58:05.620 and encrypt it with a symmetric 00:57:49.000 --> 00:58:09.640 encryption using the public key and this 00:58:05.620 --> 00:58:12.550 gives me an encrypted key and this thing 00:58:09.640 --> 00:58:14.520 can be decrypted using the private key 00:58:12.550 --> 00:58:18.250 corresponding to that public key to 00:58:14.520 --> 00:58:21.520 reconstruct this so this is on the 00:58:18.250 --> 00:58:23.590 sender's end now the receiver gets this 00:58:21.520 --> 00:58:24.760 and this and kind of does these things 00:58:23.590 --> 00:58:27.520 backwards so you start with the 00:58:24.760 --> 00:58:29.890 encrypted key and use asymmetric 00:58:27.520 --> 00:58:31.330 decryption using your public using your 00:58:29.890 --> 00:58:34.180 private key that corresponds to the 00:58:31.330 --> 00:58:35.350 posted public key to reconstruct this 00:58:34.180 --> 00:58:37.570 key that were used for the symmetric 00:58:35.350 --> 00:58:39.760 encryption box and then use symmetric 00:58:37.570 --> 00:58:41.560 key decryption using that key that was 00:58:39.760 --> 00:58:45.760 reconstructed to take this ciphertext 00:58:41.560 --> 00:58:47.590 and produce the original message so 00:58:45.760 --> 00:58:49.630 there's a kind of interesting example of 00:58:47.590 --> 00:58:56.910 how in practice symmetric and asymmetric 00:58:49.630 --> 00:58:56.910 key cryptography is combined question 00:59:00.589 --> 00:59:08.220 so the question is will you be using the 00:59:02.790 --> 00:59:11.040 same symmetric key generators yes okay 00:59:08.220 --> 00:59:13.650 so you need to kind of agree ahead of 00:59:11.040 --> 00:59:16.280 time which box you're using here so you 00:59:13.650 --> 00:59:21.359 might be like oh I'm going to use AES 00:59:16.280 --> 00:59:24.510 256 GC up here but this is a well known 00:59:21.359 --> 00:59:26.010 function and it's public like the 00:59:24.510 --> 00:59:27.930 attackers allowed to know all the 00:59:26.010 --> 00:59:29.190 parameters this function this is the 00:59:27.930 --> 00:59:34.190 only secret thing that the attacker 00:59:29.190 --> 00:59:40.650 doesn't know the key any other questions 00:59:34.190 --> 00:59:42.000 yeah that's a really good question what 00:59:40.650 --> 00:59:46.020 kind of data is important enough to 00:59:42.000 --> 00:59:47.910 encrypt and I think that depends on your 00:59:46.020 --> 00:59:49.349 threat model like who what kind of 00:59:47.910 --> 00:59:53.099 attackers are you concerned about what 00:59:49.349 --> 00:59:54.180 are you trying to protect against so you 00:59:53.099 --> 00:59:56.070 might have the stance that you just 00:59:54.180 --> 00:59:57.570 don't really care and that like anything 00:59:56.070 --> 00:59:58.980 you communicate with anybody is allowed 00:59:57.570 --> 01:00:00.720 to be public I might be willing to like 00:59:58.980 --> 01:00:03.359 post all my conversation with everybody 01:00:00.720 --> 01:00:05.760 for everybody to see publicly on the 01:00:03.359 --> 01:00:08.130 Internet on the other hand maybe you're 01:00:05.760 --> 01:00:10.560 doing some like security sensitive works 01:00:08.130 --> 01:00:11.970 here working under a contract for the US 01:00:10.560 --> 01:00:13.770 government developing some sensitive 01:00:11.970 --> 01:00:15.060 military stuff if you're sending that 01:00:13.770 --> 01:00:16.560 through the open Internet while you're 01:00:15.060 --> 01:00:18.900 traveling you probably want to be pretty 01:00:16.560 --> 01:00:20.579 darn sure that no eavesdroppers or 01:00:18.900 --> 01:00:22.079 anybody else along the way can see what 01:00:20.579 --> 01:00:23.099 you're sending and that whatever you're 01:00:22.079 --> 01:00:24.990 sending is in fact going to the right 01:00:23.099 --> 01:00:26.339 place and that whoever is receiving it 01:00:24.990 --> 01:00:29.849 can authenticate that it in fact came 01:00:26.339 --> 01:00:31.260 from you so you might be worried about 01:00:29.849 --> 01:00:33.359 all different kinds of adversaries 01:00:31.260 --> 01:00:34.440 depending on your scenario from random 01:00:33.359 --> 01:00:36.810 script kiddies who are trying to break 01:00:34.440 --> 01:00:38.490 into websites to nation state level 01:00:36.810 --> 01:00:40.260 attackers and you'll need different 01:00:38.490 --> 01:00:41.339 types of techniques for defending 01:00:40.260 --> 01:00:47.900 against the different categories of 01:00:41.339 --> 01:00:47.900 attackers any other questions 01:00:51.190 --> 01:00:55.300 well so hopefully see some of you 01:00:53.620 --> 01:00:57.310 tomorrow for a random collection of 01:00:55.300 --> 01:00:59.490 things that John Jose and I find 01:00:57.310 --> 01:00:59.490 interesting