WEBVTT 00:00:00.000 --> 00:00:02.951 Before we start with the technical material, I wanna give you a quick 00:00:02.951 --> 00:00:06.487 overview of what cryptography is about and the different areas of cryptography. So 00:00:06.487 --> 00:00:10.487 the core of cryptography of course is secure communication that essentially 00:00:10.487 --> 00:00:14.539 consists of two parts. The first is secured key establishment and then how do 00:00:14.539 --> 00:00:18.697 we communicate securely once we have a shared key. We already said that secured 00:00:18.697 --> 00:00:22.854 key establishment essentially amounts to Alice and Bob sending messages to one 00:00:22.854 --> 00:00:26.906 another such that at the end of this protocol, there's a shared key that they 00:00:26.906 --> 00:00:30.906 both agree on, shared key K, and beyond that, beyond just a shared key, in fact 00:00:30.906 --> 00:00:35.274 Alice would know that she's talking to Bob and Bob would know that he's talking to 00:00:35.274 --> 00:00:39.964 Alice. But a poor attacker who listens in on this conversation has no idea what the 00:00:39.964 --> 00:00:44.011 shared key is. And we'll see how to do that later on in the course. Now once they 00:00:44.011 --> 00:00:47.657 have a shared key, they want exchange messages securely using this key, and 00:00:47.657 --> 00:00:51.698 we'll talk about encryption schemes that allow them to do that in such a way that 00:00:51.698 --> 00:00:55.491 an attacker can't figure out what messages are being sent back and forth. And 00:00:55.491 --> 00:00:59.630 furthermore an attacker cannot even tamper with this traffic without being detected. 00:00:59.630 --> 00:01:03.227 In other words, these encryption schemes provide both confidentiality and 00:01:03.227 --> 00:01:06.774 integrity. But cryptography does much, much, much more than just these two 00:01:06.774 --> 00:01:10.519 things. And I want to give you a few examples of that. So the first example I 00:01:10.519 --> 00:01:14.468 want to give you is what's called a digital signature. So a digital signature, 00:01:14.468 --> 00:01:18.892 basically, is the analog of the signature in the physical world. In the physical 00:01:18.892 --> 00:01:23.372 world, remember when you sign a document, essentially, you write your signature on 00:01:23.372 --> 00:01:27.740 that document and your signature is always the same. You always write the same 00:01:27.740 --> 00:01:32.164 signature on all documents that you wanna sign. In the digital world, this can't 00:01:32.164 --> 00:01:36.812 possibly work because if the attacker just obtained one signed document from me, he 00:01:36.812 --> 00:01:41.180 can cut and paste my signature unto some other document that I might not have 00:01:41.180 --> 00:01:45.247 wanted to sign. And so, it's simply not possible in a digital world that my 00:01:45.247 --> 00:01:49.590 signature is the same for all documents that I want to sign. So we're gonna talk 00:01:49.590 --> 00:01:53.830 about how to construct digital signatures in the second half of the course. It's 00:01:53.830 --> 00:01:58.123 really quite an interesting primitive and we'll see exactly how to do it. Just to 00:01:58.123 --> 00:02:02.098 give you a hint, the way digital signatures work is basically by making the 00:02:02.098 --> 00:02:06.232 digital signature via function of the content being signed. So an attacker who 00:02:06.232 --> 00:02:10.313 tries to copy my signature from one document to another is not gonna succeed 00:02:10.313 --> 00:02:14.541 because the signature. On the new document is not gonna be the proper function of the 00:02:14.541 --> 00:02:18.526 data in the new document, and as a result, the signature won't verify. And as I said, 00:02:18.526 --> 00:02:22.608 we're gonna see exactly how to construct digital signatures later on and then we'll 00:02:22.608 --> 00:02:27.193 prove that those constructions are secure. Another application of cryptography that I 00:02:27.193 --> 00:02:31.096 wanted to mention, is anonymous communication. So, here, imagine user 00:02:31.096 --> 00:02:35.828 Alice wants to talk to some chat server, Bob. And, perhaps she wants to talk about 00:02:35.828 --> 00:02:40.382 a medical condition, and so she wants to do that anonymously, so that the chat 00:02:40.382 --> 00:02:45.113 server doesn't actually know who she is. Well, there's a standard method, called a NOTE Paragraph 00:02:45.113 --> 00:02:49.946 mixnet, that allows Alice to communicate over the public internet with Bob through 00:02:49.946 --> 00:02:54.856 a sequence of proxies such that at the end of the communication Bob has no idea who he 00:02:54.856 --> 00:02:59.537 just talked to. The way mixnets work is basically as Alice sends her messages 00:02:59.537 --> 00:03:03.818 to Bob through a sequence of proxies, these messages get encrypted and 00:03:03.818 --> 00:03:08.271 decrypted appropriately so that Bob has no idea who he talked to and the proxies 00:03:08.271 --> 00:03:12.724 themselves don't even know that Alice is talking to Bob, or that actually who is 00:03:12.724 --> 00:03:16.750 talking to whom more generally. One interesting thing about this anonymous 00:03:16.750 --> 00:03:20.498 communication channel is, this is bi-directional. In other words, even 00:03:20.498 --> 00:03:24.743 though Bob has no idea who he's talking to, he can still respond to Alice and 00:03:24.743 --> 00:03:29.153 Alice will get those messages. Once we have anonymous communication, we can build 00:03:29.153 --> 00:03:33.784 other privacy mechanisms. And I wanna give you one example which is called anonymous 00:03:33.784 --> 00:03:37.643 digital cash. Remember that in the physical world if I have a physical 00:03:37.643 --> 00:03:42.108 dollar, I can walk into a bookstore and buy a book and the merchant would have no 00:03:42.108 --> 00:03:46.876 idea who I am. The question is whether we can do the exact same thing in the digital 00:03:46.876 --> 00:03:50.963 world. In the digital world, basically, Alice might have a digital dollar, 00:03:50.963 --> 00:03:55.984 a digital dollar coin. And she might wanna spend that digital dollar at some online 00:03:55.984 --> 00:04:00.760 merchants, perhaps some online bookstore. Now, what we'd like to do is make it so 00:04:00.760 --> 00:04:05.539 that when Alice spends her coin at the bookstore, the bookstore would have no 00:04:05.539 --> 00:04:10.629 idea who Alice is. So we provide the same anonymity that we get from physical cash. 00:04:10.629 --> 00:04:15.470 Now the problem is that in the digital world, Alice can take the coin that she 00:04:15.470 --> 00:04:20.250 had, this one dollar coin, and before she spent it, she can actually make copies of it. 00:04:20.250 --> 00:04:24.086 And then all of a sudden instead of just having just one dollar coin now all 00:04:24.093 --> 00:04:27.936 of a sudden she has three dollar coins and they're all the same of course, and 00:04:27.936 --> 00:04:31.828 there's nothing preventing her from taking those replicas of a dollar coin and 00:04:31.828 --> 00:04:35.819 spending it at other merchants. And so the question is how do we provide anonymous 00:04:35.819 --> 00:04:39.849 digital cash? But at the same time, also prevent Alice from double spending the 00:04:39.849 --> 00:04:43.760 dollar coin at different merchants. In some sense there's a paradox here where 00:04:43.760 --> 00:04:47.879 anonymity is in conflict with security because if we have anonymous cash there's 00:04:47.879 --> 00:04:51.999 nothing to prevent Alice from double spending the coin and because the coin is 00:04:51.999 --> 00:04:56.244 anonymous we have no way of telling who committed this fraud. And so the question 00:04:56.244 --> 00:05:00.394 is how do we resolve this tension. And it turns out, it's completely doable. And 00:05:00.394 --> 00:05:04.757 we'll talk about anonymous digital cash later on. Just to give you a hint, I'll 00:05:04.757 --> 00:05:09.173 say that the way we do it is basically by making sure that if Alice spends the coin 00:05:09.173 --> 00:05:13.764 once then no one knows who she is, but if she spends the coin more than once, all of 00:05:13.764 --> 00:05:17.878 a sudden, her identity is completely exposed and then she could be subject to 00:05:17.878 --> 00:05:22.096 all sorts of legal problems. And so that's how anonymous digital cash would 00:05:22.096 --> 00:05:26.158 work at a high level and we'll see how to implement it later on in the course. 00:05:26.158 --> 00:05:30.219 Another application of cryptography has to do with more abstract protocols, but 00:05:30.219 --> 00:05:34.333 before I speak the general result, I want to give you two examples. So the 00:05:34.333 --> 00:05:38.343 first example has to do with election systems. So here's the election problem. 00:05:38.343 --> 00:05:42.656 Suppose ... we have two parties, party zero and party one. And voters vote for these 00:05:42.656 --> 00:05:47.101 parties. So for example, this voter could have voted for party zero, this voter voted for 00:05:47.101 --> 00:05:52.313 party one. And so on. So in this election, party zero got three votes and party two 00:05:52.313 --> 00:05:56.590 got two votes. So the winner of the election, of course, is party zero. But 00:05:56.590 --> 00:06:01.579 more generally, the winner of the election is the party who receives the majority of 00:06:01.579 --> 00:06:06.453 the votes. Now, the voting problem is the following. The voters would somehow like 00:06:06.453 --> 00:06:11.720 to compute the majority of the votes, but do it in such a way such that nothing else 00:06:11.720 --> 00:06:16.797 is revealed about their individual votes. Okay? So the question is how to do that? 00:06:16.797 --> 00:06:21.493 And to do so, we're gonna introduce an election center who's gonna help us 00:06:21.493 --> 00:06:26.633 compute the majority, but keep the votes otherwise secret. And what the parties 00:06:26.633 --> 00:06:32.027 will do is they will each send the funny encryption of their votes to the election 00:06:32.027 --> 00:06:36.949 center in such a way that at the end of the election, the election center is able 00:06:36.949 --> 00:06:41.615 to compute and output the winner of the election. However, other than the winner 00:06:41.615 --> 00:06:46.580 of the election, nothing else is revealed about the individual votes. The individual 00:06:46.580 --> 00:06:51.366 votes otherwise remain completely private. Of course the election center is also 00:06:51.366 --> 00:06:56.331 gonna verify that this voter for example is allowed to vote and that the voter has 00:06:56.331 --> 00:07:00.818 only voted once. But other than that information the election center and the 00:07:00.818 --> 00:07:05.484 rest of the world learned nothing else about that voter's vote other than the 00:07:05.484 --> 00:07:10.104 result of the election. So this is an example of a protocol that involves six 00:07:10.104 --> 00:07:14.430 parties. In this case, there are five voters in one election center. These 00:07:14.430 --> 00:07:19.417 parties compute amongst themselves. And at the end of the computation, the result of 00:07:19.417 --> 00:07:24.404 the election is known but nothing else is revealed about the individual inputs. Now 00:07:24.404 --> 00:07:29.156 a similar problem comes up in the context of private auctions. So in a private 00:07:29.156 --> 00:07:34.160 auction every bidder has his own bid that he wants to bid. And now suppose the 00:07:34.160 --> 00:07:39.356 auction mechanism that's being used is what's called a Vickrey auction where the 00:07:39.356 --> 00:07:45.287 definition of a Vickrey auction is that the winner is the highest bidder. But the 00:07:45.287 --> 00:07:50.099 amounts that the winner pays is actually the second highest bid. So he pays the 00:07:50.099 --> 00:07:54.850 second highest bid. Okay, so this is a standard auction mechanism called the 00:07:54.850 --> 00:08:00.028 Vickrey auction. And now what we'd like to do is basically enable the participants to 00:08:00.028 --> 00:08:04.779 compute, to figure out who the highest bidder is and how much he's supposed to 00:08:04.779 --> 00:08:09.165 pay, but other than that, all other information about the individual bids 00:08:09.165 --> 00:08:14.160 should remain secret. So for example, the actual amount that the highest bidder bid 00:08:14.160 --> 00:08:19.225 should remain secret. The only thing that should become public is the second highest 00:08:19.225 --> 00:08:23.526 bid and the identity of the highest bidder. So again now, the way we will do 00:08:23.526 --> 00:08:28.172 that is we'll introduce an auction center, and in a similar way, essentially, everybody 00:08:28.172 --> 00:08:32.588 will send their encrypted bids to the auction center. The auction center will 00:08:32.588 --> 00:08:37.119 compute the identity of the winner and in fact he will also compute the second 00:08:37.119 --> 00:08:41.822 highest bid but other than these two values, nothing else is revealed about the 00:08:41.822 --> 00:08:46.126 individual bids. Now, this is actually an example of a much more general problem 00:08:46.126 --> 00:08:50.264 called secure multi-party computation. Let me explain what secure multi-party 00:08:50.264 --> 00:08:54.618 computation is about. So here, basically abstractly, the participants have a secret 00:08:54.618 --> 00:08:58.649 inputs to themselves. So, in the case of an election, the inputs would be the 00:08:58.649 --> 00:09:02.787 votes. In the case of an auction, the inputs would be the secret bids. And then 00:09:02.787 --> 00:09:06.959 what they would like to do is compute some sort of a function of their inputs. 00:09:06.959 --> 00:09:10.840 Again, in the case of an election, the function's a majority. In the case of 00:09:10.840 --> 00:09:15.088 auction, the function happens to be the second highest, largest number among x one 00:09:15.088 --> 00:09:19.179 to x four. And the question is, how can they do that, such that the value of the 00:09:19.179 --> 00:09:23.375 function is revealed, but nothing else is revealed about the individual votes? So 00:09:23.375 --> 00:09:27.675 let me show you kind of a dumb, insecure way of doing it. What we do is introduce a 00:09:27.675 --> 00:09:31.774 trusted party. And then, this trusted authority basically collects individual 00:09:31.774 --> 00:09:36.223 inputs. And it kinda promises to keep the individual inputs secret, so that only it 00:09:36.223 --> 00:09:40.510 would know what they are. And then, it publishes the value of the function, to 00:09:40.510 --> 00:09:44.742 the world. So, the idea is now that the value of the function became public, but 00:09:44.742 --> 00:09:48.812 nothing else is revealed about the individual inputs. But, of course, you got 00:09:48.812 --> 00:09:52.990 this trusted authority that you got to trust, and if for some reason it's not 00:09:52.990 --> 00:09:57.168 trustworthy, then you have a problem. And so, there's a very central theorem in 00:09:57.168 --> 00:10:01.001 crypto and it really is quite a surprising fact. That says that any 00:10:01.001 --> 00:10:05.204 computation you'd like to do, any function F you'd like to compute, that you can 00:10:05.204 --> 00:10:09.302 compute with a trusted authority, you can also do without a trusted authority. 00:10:09.302 --> 00:10:13.559 Let me at a high level explain what this means. Basically, what we're gonna do, is 00:10:13.559 --> 00:10:17.816 we're gonna get rid of the authority. So the parties are actually not gonna send 00:10:17.816 --> 00:10:21.807 their inputs to the authority. And, in fact, there's no longer going to be an 00:10:21.807 --> 00:10:26.011 authority in the system. Instead, what the parties are gonna do, is they're gonna 00:10:26.011 --> 00:10:30.567 talk to one another using some protocol. Such that at the end of the protocol all 00:10:30.567 --> 00:10:34.890 of a sudden the value of the function becomes known to everybody. And yet 00:10:34.890 --> 00:10:39.390 nothing other than the value of the function is revealed. In other words, the 00:10:39.390 --> 00:10:43.639 individual inputs is still kept secret. But again there's no authority, there's 00:10:43.639 --> 00:10:47.867 just a way for them to talk to one another such that the final output is revealed. So 00:10:47.867 --> 00:10:51.846 this is a fairly general result, it's kind of a surprising fact that is at all 00:10:51.846 --> 00:10:56.024 doable. And in fact it is and towards the end of the class we'll actually see how to 00:10:56.024 --> 00:11:00.577 make this happen. Now, there are some applications of cryptography that I can't 00:11:00.577 --> 00:11:05.560 classify any other way other than to say that they are purely magical. Let me give 00:11:05.560 --> 00:11:10.240 you two examples of that. So the first is what's called privately outsourcing 00:11:10.240 --> 00:11:15.224 computation. So I'll give you an example of a Google search just to illustrate the 00:11:15.224 --> 00:11:20.329 point. So imagine Alice has a search query that she wants to issue. It turns out that 00:11:20.329 --> 00:11:25.434 there are very special encryption schemes such that Alice can send an encryption of 00:11:25.434 --> 00:11:30.368 her query to Google. And then because of the property of the encryption scheme 00:11:30.368 --> 00:11:35.304 Google can actually compute on the encrypted values without knowing what the 00:11:35.304 --> 00:11:40.368 plain texts are. So Google can actually run its massive search algorithm on the 00:11:40.368 --> 00:11:44.903 encrypted query and recover in encrypted results. Okay. Google will send the 00:11:44.903 --> 00:11:49.242 encrypted results back to Alice. Alice will decrypt and then she will receive the 00:11:49.242 --> 00:11:53.689 results. But the magic here is all Google saw was just encryptions of her queries 00:11:53.689 --> 00:11:57.493 and nothing else. And so, Google as a result has no idea what Alice just 00:11:57.493 --> 00:12:01.672 searched for and nevertheless Alice actually learned exactly what she 00:12:01.672 --> 00:12:05.812 wanted to learn. Okay so, these are magical kind of encryption schemes. They're 00:12:05.812 --> 00:12:09.985 fairly recent, this is only a new development from about two or three years 00:12:09.985 --> 00:12:14.436 ago, that allows us to compute on encrypted data, even though we don't really know 00:12:14.436 --> 00:12:18.667 what's inside the encryption. Now, before you rush off and think about implementing 00:12:18.667 --> 00:12:22.470 this, I should warn you that this is really at this point just theoretical, in 00:12:22.470 --> 00:12:26.422 the sense that running a Google search on encryption data probably would take a 00:12:26.422 --> 00:12:30.521 billion years. But nevertheless, just the fact that this is doable is already really 00:12:30.521 --> 00:12:34.473 surprising, and is already quite useful for relatively simple computations. So in 00:12:34.473 --> 00:12:38.671 fact, we'll see some applications of this later on. The other magical application I 00:12:38.671 --> 00:12:42.474 want to show you is what's called zero knowledge. And in particular, I'll tell 00:12:42.474 --> 00:12:46.080 you about something called the zero knowledge proof of knowledge. So here ... 00:12:46.080 --> 00:12:50.177 what happens is there's a certain number N, which Alice knows. And the way 00:12:50.177 --> 00:12:54.169 the number N was constructed is as a product of two large primes. So, imagine 00:12:54.169 --> 00:12:58.835 here we have two primes, P and Q. Each one you can think of it as like 1000 digits. 00:12:58.835 --> 00:13:03.892 And you probably know that multiplying two 1000-digit numbers is fairly easy. But if 00:13:03.892 --> 00:13:08.235 I just give you their product, figuring out their factorization into primes is 00:13:08.235 --> 00:13:12.427 actually quite difficult. And, in fact, we're gonna use the fact that factoring is 00:13:12.427 --> 00:13:16.566 difficult to build public key cryptosystems in the second half of the course. 00:13:16.566 --> 00:13:20.968 Okay, so Alice happens to have this number N, and she also knows the factorization of 00:13:20.968 --> 00:13:24.898 N. Now Bob just has the number N. He doesn't actually know the factorization. 00:13:24.898 --> 00:13:28.723 Now, the magical fact about the zero knowledge proof of knowledge, is that 00:13:28.723 --> 00:13:33.144 Alice can prove to Bob that she knows the factorization of N. Yes, you can actually 00:13:33.144 --> 00:13:37.457 give this proof to Bob, that Bob can check, and become convinced that Alice 00:13:37.457 --> 00:13:42.386 knows the factorization of N, however Bob learns nothing at all. About the factors P 00:13:42.386 --> 00:13:47.034 and Q, and this is provable. Bob absolutely learns nothing at all about the 00:13:47.034 --> 00:13:50.997 factors P and Q. And the statement actually is very, very general. This is 00:13:50.997 --> 00:13:55.275 not just about proving the factorization of N. In fact, almost any puzzle that you 00:13:55.275 --> 00:13:59.606 want to prove that you know an answer to, you can prove it is your knowledge. So if 00:13:59.606 --> 00:14:03.831 you have a crossword puzzle that you've solved. Well, maybe crosswords is not the 00:14:03.831 --> 00:14:07.845 best example. But if you have like a Sudoku puzzle, for example, that you want 00:14:07.845 --> 00:14:12.282 to prove that you've solved, you can prove it to Bob in a way that Bob would learn 00:14:12.282 --> 00:14:16.718 nothing at all about the solution, and yet Bob would be convinced that you really do 00:14:16.718 --> 00:14:20.930 have a solution to this puzzle. Okay. So those are kind of magical applications. 00:14:20.930 --> 00:14:25.000 And so the last thing I want to say is that modern cryptography is a very 00:14:25.000 --> 00:14:29.015 rigorous science. And in fact, every concept we're gonna describe is gonna 00:14:29.015 --> 00:14:33.129 follow three very rigorous steps, okay, and we're gonna see these three steps 00:14:33.129 --> 00:14:37.338 again and again and again so I wanna explain what they are. So the first thing 00:14:37.338 --> 00:14:41.493 we're gonna do when we introduce a new primitive like a digital signature is 00:14:41.493 --> 00:14:45.540 we're gonna specify precisely what the threat model is. That is, what can an 00:14:45.540 --> 00:14:49.534 attacker do to attack a digital signature and what is his goal in forging 00:14:49.534 --> 00:14:53.851 signatures? Okay, so we're gonna define exactly what does it mean for a signature 00:14:53.851 --> 00:14:57.760 for example to be unforgeable. Unforgeable. Okay and I'm giving digital 00:14:57.760 --> 00:15:01.998 signatures just as an example. For every primitive we describe we're gonna 00:15:01.998 --> 00:15:06.464 precisely define what the threat model is. Then we're gonna propose a construction 00:15:06.464 --> 00:15:10.931 and then we're gonna give a proof that any attacker that's able to attack the 00:15:10.931 --> 00:15:15.955 construction under this threat model. That attacker can also be used to solve some 00:15:15.955 --> 00:15:20.150 underlying hard problem. And as a result, if the problem really is hard, that 00:15:20.150 --> 00:15:24.350 actually proves that no attacker can break the construction under the threat model. 00:15:24.350 --> 00:15:27.843 Okay. But these three steps are actually quite important. In the case of 00:15:27.843 --> 00:15:31.928 signatures, we'll define what it means for a signature to be, forgeable, then we'll 00:15:31.928 --> 00:15:35.914 give a construction, and then for example we'll say that anyone who can break our 00:15:35.914 --> 00:15:39.801 construction can then be used to say factor integers, which is believed to be a 00:15:39.801 --> 00:15:43.541 hard problem. Okay, so we're going to follow these three steps throughout, and 00:15:43.541 --> 00:15:47.331 then you'll see how this actually comes about. Okay, so this is the end of the 00:15:47.331 --> 00:15:51.218 segment. And then in the next segment we'll talk a little bit about the history 00:15:51.218 --> 00:15:52.006 of cryptography.