1 00:00:00,000 --> 00:00:02,951 Before we start with the technical material, I wanna give you a quick 2 00:00:02,951 --> 00:00:06,487 overview of what cryptography is about and the different areas of cryptography. So 3 00:00:06,487 --> 00:00:10,487 the core of cryptography of course is secure communication that essentially 4 00:00:10,487 --> 00:00:14,539 consists of two parts. The first is secured key establishment and then how do 5 00:00:14,539 --> 00:00:18,697 we communicate securely once we have a shared key. We already said that secured 6 00:00:18,697 --> 00:00:22,854 key establishment essentially amounts to Alice and Bob sending messages to one 7 00:00:22,854 --> 00:00:26,906 another such that at the end of this protocol, there's a shared key that they 8 00:00:26,906 --> 00:00:30,906 both agree on, shared key K, and beyond that, beyond just a shared key, in fact 9 00:00:30,906 --> 00:00:35,274 Alice would know that she's talking to Bob and Bob would know that he's talking to 10 00:00:35,274 --> 00:00:39,964 Alice. But a poor attacker who listens in on this conversation has no idea what the 11 00:00:39,964 --> 00:00:44,011 shared key is. And we'll see how to do that later on in the course. Now once they 12 00:00:44,011 --> 00:00:47,657 have a shared key, they want exchange messages securely using this key, and 13 00:00:47,657 --> 00:00:51,698 we'll talk about encryption schemes that allow them to do that in such a way that 14 00:00:51,698 --> 00:00:55,491 an attacker can't figure out what messages are being sent back and forth. And 15 00:00:55,491 --> 00:00:59,630 furthermore an attacker cannot even tamper with this traffic without being detected. 16 00:00:59,630 --> 00:01:03,227 In other words, these encryption schemes provide both confidentiality and 17 00:01:03,227 --> 00:01:06,774 integrity. But cryptography does much, much, much more than just these two 18 00:01:06,774 --> 00:01:10,519 things. And I want to give you a few examples of that. So the first example I 19 00:01:10,519 --> 00:01:14,468 want to give you is what's called a digital signature. So a digital signature, 20 00:01:14,468 --> 00:01:18,892 basically, is the analog of the signature in the physical world. In the physical 21 00:01:18,892 --> 00:01:23,372 world, remember when you sign a document, essentially, you write your signature on 22 00:01:23,372 --> 00:01:27,740 that document and your signature is always the same. You always write the same 23 00:01:27,740 --> 00:01:32,164 signature on all documents that you wanna sign. In the digital world, this can't 24 00:01:32,164 --> 00:01:36,812 possibly work because if the attacker just obtained one signed document from me, he 25 00:01:36,812 --> 00:01:41,180 can cut and paste my signature unto some other document that I might not have 26 00:01:41,180 --> 00:01:45,247 wanted to sign. And so, it's simply not possible in a digital world that my 27 00:01:45,247 --> 00:01:49,590 signature is the same for all documents that I want to sign. So we're gonna talk 28 00:01:49,590 --> 00:01:53,830 about how to construct digital signatures in the second half of the course. It's 29 00:01:53,830 --> 00:01:58,123 really quite an interesting primitive and we'll see exactly how to do it. Just to 30 00:01:58,123 --> 00:02:02,098 give you a hint, the way digital signatures work is basically by making the 31 00:02:02,098 --> 00:02:06,232 digital signature via function of the content being signed. So an attacker who 32 00:02:06,232 --> 00:02:10,313 tries to copy my signature from one document to another is not gonna succeed 33 00:02:10,313 --> 00:02:14,541 because the signature. On the new document is not gonna be the proper function of the 34 00:02:14,541 --> 00:02:18,526 data in the new document, and as a result, the signature won't verify. And as I said, 35 00:02:18,526 --> 00:02:22,608 we're gonna see exactly how to construct digital signatures later on and then we'll 36 00:02:22,608 --> 00:02:27,193 prove that those constructions are secure. Another application of cryptography that I 37 00:02:27,193 --> 00:02:31,096 wanted to mention, is anonymous communication. So, here, imagine user 38 00:02:31,096 --> 00:02:35,828 Alice wants to talk to some chat server, Bob. And, perhaps she wants to talk about 39 00:02:35,828 --> 00:02:40,382 a medical condition, and so she wants to do that anonymously, so that the chat 40 00:02:40,382 --> 00:02:45,113 server doesn't actually know who she is. Well, there's a standard method, called a 41 00:02:45,113 --> 00:02:49,946 mixnet, that allows Alice to communicate over the public internet with Bob through 42 00:02:49,946 --> 00:02:54,856 a sequence of proxies such that at the end of the communication Bob has no idea who he 43 00:02:54,856 --> 00:02:59,537 just talked to. The way mixnets work is basically as Alice sends her messages 44 00:02:59,537 --> 00:03:03,818 to Bob through a sequence of proxies, these messages get encrypted and 45 00:03:03,818 --> 00:03:08,271 decrypted appropriately so that Bob has no idea who he talked to and the proxies 46 00:03:08,271 --> 00:03:12,724 themselves don't even know that Alice is talking to Bob, or that actually who is 47 00:03:12,724 --> 00:03:16,750 talking to whom more generally. One interesting thing about this anonymous 48 00:03:16,750 --> 00:03:20,498 communication channel is, this is bi-directional. In other words, even 49 00:03:20,498 --> 00:03:24,743 though Bob has no idea who he's talking to, he can still respond to Alice and 50 00:03:24,743 --> 00:03:29,153 Alice will get those messages. Once we have anonymous communication, we can build 51 00:03:29,153 --> 00:03:33,784 other privacy mechanisms. And I wanna give you one example which is called anonymous 52 00:03:33,784 --> 00:03:37,643 digital cash. Remember that in the physical world if I have a physical 53 00:03:37,643 --> 00:03:42,108 dollar, I can walk into a bookstore and buy a book and the merchant would have no 54 00:03:42,108 --> 00:03:46,876 idea who I am. The question is whether we can do the exact same thing in the digital 55 00:03:46,876 --> 00:03:50,963 world. In the digital world, basically, Alice might have a digital dollar, 56 00:03:50,963 --> 00:03:55,984 a digital dollar coin. And she might wanna spend that digital dollar at some online 57 00:03:55,984 --> 00:04:00,760 merchants, perhaps some online bookstore. Now, what we'd like to do is make it so 58 00:04:00,760 --> 00:04:05,539 that when Alice spends her coin at the bookstore, the bookstore would have no 59 00:04:05,539 --> 00:04:10,629 idea who Alice is. So we provide the same anonymity that we get from physical cash. 60 00:04:10,629 --> 00:04:15,470 Now the problem is that in the digital world, Alice can take the coin that she 61 00:04:15,470 --> 00:04:20,250 had, this one dollar coin, and before she spent it, she can actually make copies of it. 62 00:04:20,250 --> 00:04:24,086 And then all of a sudden instead of just having just one dollar coin now all 63 00:04:24,093 --> 00:04:27,936 of a sudden she has three dollar coins and they're all the same of course, and 64 00:04:27,936 --> 00:04:31,828 there's nothing preventing her from taking those replicas of a dollar coin and 65 00:04:31,828 --> 00:04:35,819 spending it at other merchants. And so the question is how do we provide anonymous 66 00:04:35,819 --> 00:04:39,849 digital cash? But at the same time, also prevent Alice from double spending the 67 00:04:39,849 --> 00:04:43,760 dollar coin at different merchants. In some sense there's a paradox here where 68 00:04:43,760 --> 00:04:47,879 anonymity is in conflict with security because if we have anonymous cash there's 69 00:04:47,879 --> 00:04:51,999 nothing to prevent Alice from double spending the coin and because the coin is 70 00:04:51,999 --> 00:04:56,244 anonymous we have no way of telling who committed this fraud. And so the question 71 00:04:56,244 --> 00:05:00,394 is how do we resolve this tension. And it turns out, it's completely doable. And 72 00:05:00,394 --> 00:05:04,757 we'll talk about anonymous digital cash later on. Just to give you a hint, I'll 73 00:05:04,757 --> 00:05:09,173 say that the way we do it is basically by making sure that if Alice spends the coin 74 00:05:09,173 --> 00:05:13,764 once then no one knows who she is, but if she spends the coin more than once, all of 75 00:05:13,764 --> 00:05:17,878 a sudden, her identity is completely exposed and then she could be subject to 76 00:05:17,878 --> 00:05:22,096 all sorts of legal problems. And so that's how anonymous digital cash would 77 00:05:22,096 --> 00:05:26,158 work at a high level and we'll see how to implement it later on in the course. 78 00:05:26,158 --> 00:05:30,219 Another application of cryptography has to do with more abstract protocols, but 79 00:05:30,219 --> 00:05:34,333 before I speak the general result, I want to give you two examples. So the 80 00:05:34,333 --> 00:05:38,343 first example has to do with election systems. So here's the election problem. 81 00:05:38,343 --> 00:05:42,656 Suppose ... we have two parties, party zero and party one. And voters vote for these 82 00:05:42,656 --> 00:05:47,101 parties. So for example, this voter could have voted for party zero, this voter voted for 83 00:05:47,101 --> 00:05:52,313 party one. And so on. So in this election, party zero got three votes and party two 84 00:05:52,313 --> 00:05:56,590 got two votes. So the winner of the election, of course, is party zero. But 85 00:05:56,590 --> 00:06:01,579 more generally, the winner of the election is the party who receives the majority of 86 00:06:01,579 --> 00:06:06,453 the votes. Now, the voting problem is the following. The voters would somehow like 87 00:06:06,453 --> 00:06:11,720 to compute the majority of the votes, but do it in such a way such that nothing else 88 00:06:11,720 --> 00:06:16,797 is revealed about their individual votes. Okay? So the question is how to do that? 89 00:06:16,797 --> 00:06:21,493 And to do so, we're gonna introduce an election center who's gonna help us 90 00:06:21,493 --> 00:06:26,633 compute the majority, but keep the votes otherwise secret. And what the parties 91 00:06:26,633 --> 00:06:32,027 will do is they will each send the funny encryption of their votes to the election 92 00:06:32,027 --> 00:06:36,949 center in such a way that at the end of the election, the election center is able 93 00:06:36,949 --> 00:06:41,615 to compute and output the winner of the election. However, other than the winner 94 00:06:41,615 --> 00:06:46,580 of the election, nothing else is revealed about the individual votes. The individual 95 00:06:46,580 --> 00:06:51,366 votes otherwise remain completely private. Of course the election center is also 96 00:06:51,366 --> 00:06:56,331 gonna verify that this voter for example is allowed to vote and that the voter has 97 00:06:56,331 --> 00:07:00,818 only voted once. But other than that information the election center and the 98 00:07:00,818 --> 00:07:05,484 rest of the world learned nothing else about that voter's vote other than the 99 00:07:05,484 --> 00:07:10,104 result of the election. So this is an example of a protocol that involves six 100 00:07:10,104 --> 00:07:14,430 parties. In this case, there are five voters in one election center. These 101 00:07:14,430 --> 00:07:19,417 parties compute amongst themselves. And at the end of the computation, the result of 102 00:07:19,417 --> 00:07:24,404 the election is known but nothing else is revealed about the individual inputs. Now 103 00:07:24,404 --> 00:07:29,156 a similar problem comes up in the context of private auctions. So in a private 104 00:07:29,156 --> 00:07:34,160 auction every bidder has his own bid that he wants to bid. And now suppose the 105 00:07:34,160 --> 00:07:39,356 auction mechanism that's being used is what's called a Vickrey auction where the 106 00:07:39,356 --> 00:07:45,287 definition of a Vickrey auction is that the winner is the highest bidder. But the 107 00:07:45,287 --> 00:07:50,099 amounts that the winner pays is actually the second highest bid. So he pays the 108 00:07:50,099 --> 00:07:54,850 second highest bid. Okay, so this is a standard auction mechanism called the 109 00:07:54,850 --> 00:08:00,028 Vickrey auction. And now what we'd like to do is basically enable the participants to 110 00:08:00,028 --> 00:08:04,779 compute, to figure out who the highest bidder is and how much he's supposed to 111 00:08:04,779 --> 00:08:09,165 pay, but other than that, all other information about the individual bids 112 00:08:09,165 --> 00:08:14,160 should remain secret. So for example, the actual amount that the highest bidder bid 113 00:08:14,160 --> 00:08:19,225 should remain secret. The only thing that should become public is the second highest 114 00:08:19,225 --> 00:08:23,526 bid and the identity of the highest bidder. So again now, the way we will do 115 00:08:23,526 --> 00:08:28,172 that is we'll introduce an auction center, and in a similar way, essentially, everybody 116 00:08:28,172 --> 00:08:32,588 will send their encrypted bids to the auction center. The auction center will 117 00:08:32,588 --> 00:08:37,119 compute the identity of the winner and in fact he will also compute the second 118 00:08:37,119 --> 00:08:41,822 highest bid but other than these two values, nothing else is revealed about the 119 00:08:41,822 --> 00:08:46,126 individual bids. Now, this is actually an example of a much more general problem 120 00:08:46,126 --> 00:08:50,264 called secure multi-party computation. Let me explain what secure multi-party 121 00:08:50,264 --> 00:08:54,618 computation is about. So here, basically abstractly, the participants have a secret 122 00:08:54,618 --> 00:08:58,649 inputs to themselves. So, in the case of an election, the inputs would be the 123 00:08:58,649 --> 00:09:02,787 votes. In the case of an auction, the inputs would be the secret bids. And then 124 00:09:02,787 --> 00:09:06,959 what they would like to do is compute some sort of a function of their inputs. 125 00:09:06,959 --> 00:09:10,840 Again, in the case of an election, the function's a majority. In the case of 126 00:09:10,840 --> 00:09:15,088 auction, the function happens to be the second highest, largest number among x one 127 00:09:15,088 --> 00:09:19,179 to x four. And the question is, how can they do that, such that the value of the 128 00:09:19,179 --> 00:09:23,375 function is revealed, but nothing else is revealed about the individual votes? So 129 00:09:23,375 --> 00:09:27,675 let me show you kind of a dumb, insecure way of doing it. What we do is introduce a 130 00:09:27,675 --> 00:09:31,774 trusted party. And then, this trusted authority basically collects individual 131 00:09:31,774 --> 00:09:36,223 inputs. And it kinda promises to keep the individual inputs secret, so that only it 132 00:09:36,223 --> 00:09:40,510 would know what they are. And then, it publishes the value of the function, to 133 00:09:40,510 --> 00:09:44,742 the world. So, the idea is now that the value of the function became public, but 134 00:09:44,742 --> 00:09:48,812 nothing else is revealed about the individual inputs. But, of course, you got 135 00:09:48,812 --> 00:09:52,990 this trusted authority that you got to trust, and if for some reason it's not 136 00:09:52,990 --> 00:09:57,168 trustworthy, then you have a problem. And so, there's a very central theorem in 137 00:09:57,168 --> 00:10:01,001 crypto and it really is quite a surprising fact. That says that any 138 00:10:01,001 --> 00:10:05,204 computation you'd like to do, any function F you'd like to compute, that you can 139 00:10:05,204 --> 00:10:09,302 compute with a trusted authority, you can also do without a trusted authority. 140 00:10:09,302 --> 00:10:13,559 Let me at a high level explain what this means. Basically, what we're gonna do, is 141 00:10:13,559 --> 00:10:17,816 we're gonna get rid of the authority. So the parties are actually not gonna send 142 00:10:17,816 --> 00:10:21,807 their inputs to the authority. And, in fact, there's no longer going to be an 143 00:10:21,807 --> 00:10:26,011 authority in the system. Instead, what the parties are gonna do, is they're gonna 144 00:10:26,011 --> 00:10:30,567 talk to one another using some protocol. Such that at the end of the protocol all 145 00:10:30,567 --> 00:10:34,890 of a sudden the value of the function becomes known to everybody. And yet 146 00:10:34,890 --> 00:10:39,390 nothing other than the value of the function is revealed. In other words, the 147 00:10:39,390 --> 00:10:43,639 individual inputs is still kept secret. But again there's no authority, there's 148 00:10:43,639 --> 00:10:47,867 just a way for them to talk to one another such that the final output is revealed. So 149 00:10:47,867 --> 00:10:51,846 this is a fairly general result, it's kind of a surprising fact that is at all 150 00:10:51,846 --> 00:10:56,024 doable. And in fact it is and towards the end of the class we'll actually see how to 151 00:10:56,024 --> 00:11:00,577 make this happen. Now, there are some applications of cryptography that I can't 152 00:11:00,577 --> 00:11:05,560 classify any other way other than to say that they are purely magical. Let me give 153 00:11:05,560 --> 00:11:10,240 you two examples of that. So the first is what's called privately outsourcing 154 00:11:10,240 --> 00:11:15,224 computation. So I'll give you an example of a Google search just to illustrate the 155 00:11:15,224 --> 00:11:20,329 point. So imagine Alice has a search query that she wants to issue. It turns out that 156 00:11:20,329 --> 00:11:25,434 there are very special encryption schemes such that Alice can send an encryption of 157 00:11:25,434 --> 00:11:30,368 her query to Google. And then because of the property of the encryption scheme 158 00:11:30,368 --> 00:11:35,304 Google can actually compute on the encrypted values without knowing what the 159 00:11:35,304 --> 00:11:40,368 plain texts are. So Google can actually run its massive search algorithm on the 160 00:11:40,368 --> 00:11:44,903 encrypted query and recover in encrypted results. Okay. Google will send the 161 00:11:44,903 --> 00:11:49,242 encrypted results back to Alice. Alice will decrypt and then she will receive the 162 00:11:49,242 --> 00:11:53,689 results. But the magic here is all Google saw was just encryptions of her queries 163 00:11:53,689 --> 00:11:57,493 and nothing else. And so, Google as a result has no idea what Alice just 164 00:11:57,493 --> 00:12:01,672 searched for and nevertheless Alice actually learned exactly what she 165 00:12:01,672 --> 00:12:05,812 wanted to learn. Okay so, these are magical kind of encryption schemes. They're 166 00:12:05,812 --> 00:12:09,985 fairly recent, this is only a new development from about two or three years 167 00:12:09,985 --> 00:12:14,436 ago, that allows us to compute on encrypted data, even though we don't really know 168 00:12:14,436 --> 00:12:18,667 what's inside the encryption. Now, before you rush off and think about implementing 169 00:12:18,667 --> 00:12:22,470 this, I should warn you that this is really at this point just theoretical, in 170 00:12:22,470 --> 00:12:26,422 the sense that running a Google search on encryption data probably would take a 171 00:12:26,422 --> 00:12:30,521 billion years. But nevertheless, just the fact that this is doable is already really 172 00:12:30,521 --> 00:12:34,473 surprising, and is already quite useful for relatively simple computations. So in 173 00:12:34,473 --> 00:12:38,671 fact, we'll see some applications of this later on. The other magical application I 174 00:12:38,671 --> 00:12:42,474 want to show you is what's called zero knowledge. And in particular, I'll tell 175 00:12:42,474 --> 00:12:46,080 you about something called the zero knowledge proof of knowledge. So here ... 176 00:12:46,080 --> 00:12:50,177 what happens is there's a certain number N, which Alice knows. And the way 177 00:12:50,177 --> 00:12:54,169 the number N was constructed is as a product of two large primes. So, imagine 178 00:12:54,169 --> 00:12:58,835 here we have two primes, P and Q. Each one you can think of it as like 1000 digits. 179 00:12:58,835 --> 00:13:03,892 And you probably know that multiplying two 1000-digit numbers is fairly easy. But if 180 00:13:03,892 --> 00:13:08,235 I just give you their product, figuring out their factorization into primes is 181 00:13:08,235 --> 00:13:12,427 actually quite difficult. And, in fact, we're gonna use the fact that factoring is 182 00:13:12,427 --> 00:13:16,566 difficult to build public key cryptosystems in the second half of the course. 183 00:13:16,566 --> 00:13:20,968 Okay, so Alice happens to have this number N, and she also knows the factorization of 184 00:13:20,968 --> 00:13:24,898 N. Now Bob just has the number N. He doesn't actually know the factorization. 185 00:13:24,898 --> 00:13:28,723 Now, the magical fact about the zero knowledge proof of knowledge, is that 186 00:13:28,723 --> 00:13:33,144 Alice can prove to Bob that she knows the factorization of N. Yes, you can actually 187 00:13:33,144 --> 00:13:37,457 give this proof to Bob, that Bob can check, and become convinced that Alice 188 00:13:37,457 --> 00:13:42,386 knows the factorization of N, however Bob learns nothing at all. About the factors P 189 00:13:42,386 --> 00:13:47,034 and Q, and this is provable. Bob absolutely learns nothing at all about the 190 00:13:47,034 --> 00:13:50,997 factors P and Q. And the statement actually is very, very general. This is 191 00:13:50,997 --> 00:13:55,275 not just about proving the factorization of N. In fact, almost any puzzle that you 192 00:13:55,275 --> 00:13:59,606 want to prove that you know an answer to, you can prove it is your knowledge. So if 193 00:13:59,606 --> 00:14:03,831 you have a crossword puzzle that you've solved. Well, maybe crosswords is not the 194 00:14:03,831 --> 00:14:07,845 best example. But if you have like a Sudoku puzzle, for example, that you want 195 00:14:07,845 --> 00:14:12,282 to prove that you've solved, you can prove it to Bob in a way that Bob would learn 196 00:14:12,282 --> 00:14:16,718 nothing at all about the solution, and yet Bob would be convinced that you really do 197 00:14:16,718 --> 00:14:20,930 have a solution to this puzzle. Okay. So those are kind of magical applications. 198 00:14:20,930 --> 00:14:25,000 And so the last thing I want to say is that modern cryptography is a very 199 00:14:25,000 --> 00:14:29,015 rigorous science. And in fact, every concept we're gonna describe is gonna 200 00:14:29,015 --> 00:14:33,129 follow three very rigorous steps, okay, and we're gonna see these three steps 201 00:14:33,129 --> 00:14:37,338 again and again and again so I wanna explain what they are. So the first thing 202 00:14:37,338 --> 00:14:41,493 we're gonna do when we introduce a new primitive like a digital signature is 203 00:14:41,493 --> 00:14:45,540 we're gonna specify precisely what the threat model is. That is, what can an 204 00:14:45,540 --> 00:14:49,534 attacker do to attack a digital signature and what is his goal in forging 205 00:14:49,534 --> 00:14:53,851 signatures? Okay, so we're gonna define exactly what does it mean for a signature 206 00:14:53,851 --> 00:14:57,760 for example to be unforgeable. Unforgeable. Okay and I'm giving digital 207 00:14:57,760 --> 00:15:01,998 signatures just as an example. For every primitive we describe we're gonna 208 00:15:01,998 --> 00:15:06,464 precisely define what the threat model is. Then we're gonna propose a construction 209 00:15:06,464 --> 00:15:10,931 and then we're gonna give a proof that any attacker that's able to attack the 210 00:15:10,931 --> 00:15:15,955 construction under this threat model. That attacker can also be used to solve some 211 00:15:15,955 --> 00:15:20,150 underlying hard problem. And as a result, if the problem really is hard, that 212 00:15:20,150 --> 00:15:24,350 actually proves that no attacker can break the construction under the threat model. 213 00:15:24,350 --> 00:15:27,843 Okay. But these three steps are actually quite important. In the case of 214 00:15:27,843 --> 00:15:31,928 signatures, we'll define what it means for a signature to be, forgeable, then we'll 215 00:15:31,928 --> 00:15:35,914 give a construction, and then for example we'll say that anyone who can break our 216 00:15:35,914 --> 00:15:39,801 construction can then be used to say factor integers, which is believed to be a 217 00:15:39,801 --> 00:15:43,541 hard problem. Okay, so we're going to follow these three steps throughout, and 218 00:15:43,541 --> 00:15:47,331 then you'll see how this actually comes about. Okay, so this is the end of the 219 00:15:47,331 --> 00:15:51,218 segment. And then in the next segment we'll talk a little bit about the history 220 00:15:51,218 --> 00:15:52,006 of cryptography.