1 00:00:00,000 --> 00:00:04,152 Now that we understand chosen plaintext security, let's build encryption schemes 2 00:00:04,152 --> 00:00:08,515 that are chosen plaintext secure. And the first such encryption scheme is going to 3 00:00:08,515 --> 00:00:12,510 be called cipher bock chaining. So here is how cipher block chaining works. 4 00:00:12,510 --> 00:00:16,610 Cipher block chaining is a way of using a block cipher to get chosen plaintext 5 00:00:16,610 --> 00:00:20,868 security. In particular, we are going to look at a mode called cipher block chaining 6 00:00:20,868 --> 00:00:25,021 with a random IV. CBC stands for cipher block chaning. So suppose we have a block 7 00:00:25,021 --> 00:00:28,963 cipher, so EB is a block cipher. So now let's define CBC to be the following 8 00:00:28,963 --> 00:00:33,248 encryption scheme. So the encryption algorithm when it's asked to encrypt a 9 00:00:33,248 --> 00:00:37,991 message m, the first thing it's going to do is it's going to choose a random IV that's 10 00:00:37,991 --> 00:00:41,958 exactly one block of the block cipher. So IV is one cypher block. 11 00:00:41,958 --> 00:00:46,035 So in the case of AES the IV would be 16 bytes. And then we're 12 00:00:46,035 --> 00:00:50,649 gonna run through the algorithm here, the IV basically that we chose is gonna be XORed 13 00:00:50,649 --> 00:00:54,726 to the first plain text block. And then the result is gonna be 14 00:00:54,726 --> 00:00:58,857 encrypted using the block cipher and output of the first block of the ciphertext. 15 00:00:58,857 --> 00:01:03,041 And now comes the chaining part where we actually use the first block of 16 00:01:03,041 --> 00:01:07,436 the ciphertext to kind of mask the second block of the plaintext. So we XOR 17 00:01:07,436 --> 00:01:11,588 the two together and the encryption of that becomes the second ciphertext block. 18 00:01:11,588 --> 00:01:15,535 And so on, and so on, and so forth. So this is cIpher block chaining, you can 19 00:01:17,559 --> 00:01:19,584 see that each cIpher block is chained and XORed into the next plaintext 20 00:01:19,584 --> 00:01:24,118 block, and the final ciphertext is going to be essentially the IV, the initial IV 21 00:01:24,118 --> 00:01:30,024 that we chose along with all the ciphertext blocks. I should say that IV stands 22 00:01:30,024 --> 00:01:35,795 for Initialization Vector. And we're going to be seeing that term used quite a bit, 23 00:01:35,795 --> 00:01:39,717 every time we need to pick something at random at the beginning of the encryption 24 00:01:39,717 --> 00:01:43,543 scheme typically we'll call that an IV for initialization vector. So you notice 25 00:01:43,543 --> 00:01:47,322 that the cIphertext is a little bit longer than the plain text because we had 26 00:01:47,322 --> 00:01:51,149 to include this IV in the cIphertexts which basically captures the randomness 27 00:01:51,149 --> 00:01:55,450 that was used during encryption. So the first question is how do we decrypt the 28 00:01:55,450 --> 00:02:00,226 results of CBC encryption, and so let me remind you again that if when we 29 00:02:00,226 --> 00:02:04,470 encrypt the first message block we XOR it with the IV, encrypt the 30 00:02:04,470 --> 00:02:09,187 result and that becomes the first ciphertext block. So let me ask you how would 31 00:02:09,187 --> 00:02:13,667 you decrypt that? So given the first ciphertext block, how would you recover 32 00:02:13,667 --> 00:02:17,915 the original first plaintext block? So decryption is actually very similar to 33 00:02:17,915 --> 00:02:21,660 encryption, here I wrote down the decryption circuit, you can see basically 34 00:02:21,660 --> 00:02:25,961 it's almost the same thing except the XOR is on the bottom, instead of on the top, and 35 00:02:25,961 --> 00:02:29,605 again you realize that essentially we chopped off the IV as part of the 36 00:02:29,605 --> 00:02:33,754 decryption process and we only output the original message back, the IV is dropped 37 00:02:33,754 --> 00:02:38,438 by the decryption algorithm. Okay, so the following theorem is going to show that in 38 00:02:38,438 --> 00:02:43,762 fact CBC mode encryption with a random IV is in fact semantically secure under a 39 00:02:43,762 --> 00:02:48,956 chosen plaintext attack, and so let's take that more precisely, basically if we 40 00:02:48,956 --> 00:02:54,083 start with a PRP, in other words, our block cipher E, that is defined over a 41 00:02:54,083 --> 00:02:59,079 space X, then we are gonna to end up with a encryption algorithm Ecbc that takes 42 00:02:59,079 --> 00:03:03,944 messages of length L and outputs ciphertexts of length L+1. And then 43 00:03:03,944 --> 00:03:09,324 suppose we have an adversary that makes q chosen plaintext queries. Then we can 44 00:03:09,324 --> 00:03:15,024 state the following security fact, that for every such adversary that's attacking 45 00:03:15,024 --> 00:03:20,184 Ecbc, to exist an adversary that's attacking the PRP, the block cipher, with 46 00:03:20,184 --> 00:03:24,926 the following relation between the two algorithms, in other words, the advantage 47 00:03:24,926 --> 00:03:29,851 of algorithm A against the encryption scheme is less than the advantage of algorithm B 48 00:03:29,851 --> 00:03:35,080 against the original PRP plus some noise term. So let me interpret this theorem for 49 00:03:35,080 --> 00:03:40,005 you as usual, so what this means is that essentially since E is a secure PRP this 50 00:03:40,005 --> 00:03:45,051 quantity here is negligible, and our goal is to say that adversary A's advantage is 51 00:03:45,051 --> 00:03:49,794 also negligible. However, here we are prevented from saying that because we got 52 00:03:49,794 --> 00:03:54,630 this extra error term. This is often called an error term and to argue that CBC 53 00:03:54,630 --> 00:03:59,676 is secure we have to make sure that the error term is also negligible. Because if 54 00:03:59,676 --> 00:04:04,474 both of these terms on the right are negligible, there sum is negligible and 55 00:04:04,474 --> 00:04:09,458 therefore the advantage of A against Ecbc would also be negligible. So this says 56 00:04:09,458 --> 00:04:14,565 that in fact for Ecbc to be secure it has better be the case that q squared L squared Is 57 00:04:14,565 --> 00:04:19,564 much, much, much smaller than the value X, so let me remind you what q and L are, so 58 00:04:19,564 --> 00:04:24,566 L is simply the length of the messages that we're encrypting. Okay, so L could be 59 00:04:24,566 --> 00:04:29,902 like say a 1000, which means that we are encrypting messages that are at most 1000 60 00:04:29,902 --> 00:04:35,303 AES blocks. q is the number of ciphertexts that the adversary gets to see under the 61 00:04:35,303 --> 00:04:40,770 CPA attack, but in real life what q is, is basically the number of times that we have 62 00:04:40,770 --> 00:04:46,041 used the key K to encrypt messages, in other words if we use a particular AES key to 63 00:04:46,041 --> 00:04:51,052 encrypt 100 messages, Q would be 100. It is because the adversary would then see 64 00:04:51,052 --> 00:04:56,224 at most 100 messages encrypted under this key K. Okay so lets see what this means in the real 65 00:04:56,224 --> 00:05:00,866 world. So here I've rewrote the error balance from the theorem. And just to remind 66 00:05:00,866 --> 00:05:05,093 you to use the messages encrypted with K and L with the lengths of the messages and so 67 00:05:05,093 --> 00:05:09,370 suppose we want the adversary's advantage to be less than one over two to the thirty 68 00:05:09,370 --> 00:05:13,346 two. This means that the error term had better be less than one over two to the 69 00:05:13,346 --> 00:05:17,853 thirty two. Okay, so let's look at AES and see what this mean. For AES, AES of course uses 70 00:05:17,853 --> 00:05:22,300 128 bit blocks, so X is going to be two to the 128, the 71 00:05:22,300 --> 00:05:26,363 size of X is going to be 2 to the 128, and if you 72 00:05:26,363 --> 00:05:30,865 plug this into the expression you see that basically the product q times L had 73 00:05:30,865 --> 00:05:35,477 better be less than two to the forty eight. This means that after we use a particular 74 00:05:35,477 --> 00:05:40,014 key to encrypt 2 to the 48 AES blocks we have to change the key. Okay, so 75 00:05:40,014 --> 00:05:46,966 essentially CBC stops being secure after the key is used to encrypt 2 to the 48 different AES blocks. 76 00:05:46,966 --> 00:05:49,572 So its kinda nice that the security theorem tells 77 00:05:49,572 --> 00:05:54,499 you exactly how long the key can be used and then how frequently, essentially, you have to 78 00:05:54,499 --> 00:05:59,575 replace the key. Now interestingly if you apply the same analogy to the 3DES it 79 00:05:59,575 --> 00:06:04,909 actually has a much shorter block, maybe only 64 bits, you see the key has to be 80 00:06:04,909 --> 00:06:10,485 changed much more frequently, maybe after every 65 thousand DES blocks, essentially you need to generate a new key. So 81 00:06:10,485 --> 00:06:15,275 this is one of the reasons why AES has a larger block size so that in fact modes 82 00:06:15,275 --> 00:06:20,240 like CBC would be more secure and one can use the keys for a longer period of time, before having 83 00:06:20,240 --> 00:06:24,796 to replace it. What this means is having to replace two to the sixteen blocks, 84 00:06:24,796 --> 00:06:29,586 each block of course is 8 bytes, so after you encrypt about half a megabyte of 85 00:06:29,586 --> 00:06:33,868 data you would have to change the DES key which is actually quite low. And you 86 00:06:33,868 --> 00:06:37,645 notice with AES you can encrypt quite a bit more data before you have to change the 87 00:06:37,645 --> 00:06:42,604 key. So I want to warn you about a very common mistake that people have made when 88 00:06:42,604 --> 00:06:47,627 using CBC with a random IV. That is that the minute that the attacker can predict 89 00:06:47,627 --> 00:06:52,712 the IV that you're going to be using for encrypting a particular message decipher 90 00:06:52,712 --> 00:06:57,797 this Ecbc is no longer CPA secure. So when using CBC with a random IV like we've 91 00:06:57,797 --> 00:07:02,246 just shown It's crucial that the IV is not predictable. But lets see an attack. So 92 00:07:02,246 --> 00:07:06,282 suppose it so happens that given a particular encryption in a message that 93 00:07:06,282 --> 00:07:10,695 attacker can actually predict that IV that will be used for the next message. Well 94 00:07:10,695 --> 00:07:14,839 let's show that in fact the resulting system is not CPA secure. So the first thing the 95 00:07:14,839 --> 00:07:19,197 adversary is going to do is, he is going to ask for the encryption of a one block 96 00:07:19,197 --> 00:07:23,449 message. In particular that one block is going to be zero. So what the adversary 97 00:07:23,449 --> 00:07:27,592 gets back is the encryption of one block, which namely is the encryption of 98 00:07:27,592 --> 00:07:31,748 the message namely zero, XOR the IV. Okay and of course the adversary also gets the 99 00:07:31,748 --> 00:07:35,877 IV. Okay so now the adversary by assumption can predict the IV that's gonna 100 00:07:35,877 --> 00:07:40,196 be used for the next encryption. Okay so let's say that IV is called, well IV. So 101 00:07:40,196 --> 00:07:44,460 next the adversary is going to issue his semantic security challenge and the 102 00:07:44,460 --> 00:07:49,167 message m0 is going to be the predicted IV XOR IV1 which was used in the encryption 103 00:07:49,167 --> 00:07:53,707 of c1. And the, the message of m1 is just going to be some other message, it doesn't 104 00:07:53,707 --> 00:07:58,248 really matter what it is. So now let's see what happens when the adversary receives 105 00:07:58,248 --> 00:08:02,346 the result of the semantic security challenge. Well, he is going to get the 106 00:08:02,346 --> 00:08:06,470 encryption of m0 or m1. So when the adversary receives the encryption of m0, 107 00:08:06,470 --> 00:08:10,800 tell me what is the actual plain text that is encrypted in the ciphertext c? 108 00:08:11,260 --> 00:08:17,368 Well so the answer is that what is actually encrypted is the message which is 109 00:08:17,368 --> 00:08:22,826 IV XOR IV1 XOR the IV that's used to encrypt that message which happens to be 110 00:08:22,826 --> 00:08:28,301 IV and this of course is IV1. So when the adversary receives the encryption of m0, 111 00:08:28,301 --> 00:08:33,167 he is actually receiving the block cipher encryption of IV1. And lo and behold, 112 00:08:33,167 --> 00:08:38,440 you'll notice that he already has that value from his chosen plaintext query. 113 00:08:38,440 --> 00:08:42,800 And then, when he is receiving the encryption of message m1, he just received 114 00:08:42,800 --> 00:08:47,825 a normal CBC encryption of the message m1. So you realize that now he has a simple 115 00:08:47,825 --> 00:08:53,057 way of breaking the scheme, namely what he'll do is he'll say, he's gonna ask, "Is the second 116 00:08:53,057 --> 00:08:58,354 block of the ciphertext c equal to the value that I received in my CPA query?" If 117 00:08:58,354 --> 00:09:03,843 so I'll say that I received the encryption of m0, otherwise I'll say that I received 118 00:09:03,843 --> 00:09:09,209 the encryption of m1. So really his test is c1 he refers to the second block 119 00:09:09,209 --> 00:09:14,441 of c and c11 refers to the second block of c1, if the two are equal he says zero, 120 00:09:14,441 --> 00:09:20,095 otherwise he says one. So the advantage of this adversary is going to be 1 and as a 121 00:09:20,095 --> 00:09:25,650 result, he completely breaks CPA security of this CBC encryption. So the lesson here 122 00:09:25,650 --> 00:09:30,334 is, if the IV is predictable then, in fact, there is no CPA security and 123 00:09:30,334 --> 00:09:35,621 unfortunately, this is actually a very common mistake in practice. In particular 124 00:09:35,621 --> 00:09:41,339 even in SSL protocol and in TLS 1.1, it turns out that IV for record number I is in fact 125 00:09:41,339 --> 00:09:46,363 the last ciphertext block of record I-1. That means that exactly given 126 00:09:46,363 --> 00:09:51,579 the encryption of record I-1, the adversary knows exactly what IV is going 127 00:09:51,579 --> 00:09:56,031 to be used as record number I. Very recently, just last summer, this was 128 00:09:56,031 --> 00:10:00,737 actually converted into a pretty devastating attack on SSL. We'll describe 129 00:10:00,737 --> 00:10:06,016 that attack once we talk about SSL in more detail, but for now I wanted to make sure 130 00:10:06,016 --> 00:10:12,371 you understand than when you use CBC encryption, its absolutely crucial that the IV be random. 131 00:10:12,371 --> 00:10:16,372 Okay, so now I going to show you the nonce based version of CBC encryption 132 00:10:16,372 --> 00:10:21,443 So in this mode the IV is replaced by non random but unique nonce 133 00:10:21,443 --> 00:10:23,928 for example the numbers 1,2,3,4,5, could all be used as a nonce, and now, the appeal of this mode 134 00:10:23,928 --> 00:10:25,246 is that if the recipient actually knows what the nonce is supposed to be 135 00:10:25,246 --> 00:10:25,880 then there's no reason to include the nonce in the ciphertext, in which case, the ciphertext 136 00:10:25,880 --> 00:10:26,197 is exactly the same length as the plaintext, unlike CBC with the random IV, 137 00:10:26,197 --> 00:10:26,276 where we had to expand the ciphertext to include the IV, here, if the nonce is already known to the recipient, 138 00:10:26,276 --> 00:10:26,315 there's no reason to include it in the ciphertext, and the ciphertext is exactly the same length as the plaintext. 139 00:10:26,315 --> 00:10:26,335 So it's perfectly fine to use a non-random but unique nonce. However, it's absolutely crucial to know that, 140 00:10:26,335 --> 00:10:26,345 if you do this, there's one more step that you have to do before you use the nonce in the CBC chain. 141 00:10:26,345 --> 00:10:26,355 In particular, in this mode now we're going to be using two independent keys, k and k1. 142 00:10:26,355 --> 00:10:26,434 The key k is, as before, going to be used to encrypt the individual message blocks, 143 00:10:26,434 --> 00:10:26,474 However, this key k1 is going to be used to encrypt the non-random but unique nonce, 144 00:10:26,474 --> 00:10:26,494 so that the output is going to be a random IV, which is then used in the CBC chain. 145 00:10:26,494 --> 00:10:26,504 So this extra step here, encrypting the nonce with the key k1, is absolutely crucial. 146 00:10:26,504 --> 00:10:26,509 Without it, CBC mode encryption would not be secure. 147 00:10:26,509 --> 00:10:26,514 However it if is going to be a counter you 148 00:10:26,514 --> 00:10:32,046 need to do one more step. Before actually encryption CBC and in particular you have 149 00:10:32,046 --> 00:10:37,380 to actually encrypt the notes to obtain the IV that will actually be used for 150 00:10:37,380 --> 00:10:42,919 encryption. The notes on CBC is similar to a random IV, the difference is that the 151 00:10:42,919 --> 00:10:48,047 notes is first encrypted and the results is that the IV is used in the CBC 152 00:10:48,047 --> 00:10:52,728 encryption Now the beauty of this mode is that the Nance doesn't necessarily have to 153 00:10:52,728 --> 00:10:56,975 be included in the cipher text. It only needs to be in there if its unknowns are 154 00:10:56,975 --> 00:11:01,116 the decrypter but it if the decrypter happens to already know the value of the 155 00:11:01,116 --> 00:11:05,310 counter by some other means then in fact the cipher text is only as big as the 156 00:11:05,310 --> 00:11:09,291 plain text. There's no extra value transmitted in the cipher text. And again, 157 00:11:09,291 --> 00:11:13,591 I warn that when you're using non spaced encryption, it's absolutely crucial that 158 00:11:13,591 --> 00:11:17,679 the key common Nance spare is only used for one message so for every message, 159 00:11:17,679 --> 00:11:22,028 either the Nance has changed or the key has changed. Okay, so here emphasize the 160 00:11:22,028 --> 00:11:26,499 fact that you need to do this extra encryption step before actual using the 161 00:11:26,499 --> 00:11:31,088 Nance. This is very common mistake that actually forgotten in practice and for 162 00:11:31,088 --> 00:11:35,795 example in TLS, this was not done and as a result there was a significant attack 163 00:11:35,795 --> 00:11:40,282 against CBC encryption in TLS. Remember the reason that this is so important to 164 00:11:40,282 --> 00:11:44,950 know is that in fact many crypto APIs are set up to almost deliberately mislead the 165 00:11:44,950 --> 00:11:49,451 user to using CBC incorrectly. So let's look to see how CBC implemented inside of 166 00:11:49,451 --> 00:11:53,840 open SSL. So here are the arguments of the function. Basically this is the plain 167 00:11:53,840 --> 00:11:58,119 text, this is the place where the cipher text will get written to. This is the 168 00:11:58,119 --> 00:12:02,760 length of the plain text. This is a, a Yes key Finally there is an argument here that 169 00:12:02,760 --> 00:12:06,438 says whether you are crypting or decrypting. And the most important 170 00:12:06,438 --> 00:12:10,884 parameter that I wanted to point out here is the actual IV and unfortunately, the 171 00:12:10,884 --> 00:12:15,330 user is asked to supply this IV and the function uses the IV directly in the CBC 172 00:12:15,330 --> 00:12:19,831 encryption mechanism. It doesn't encrypt the IV before using it and as a result, if 173 00:12:19,831 --> 00:12:24,332 you ever call this function using a non random IV, the resulting encryption system 174 00:12:24,332 --> 00:12:28,816 won't be CPA secure. Okay, so it's very important to know that when calling 175 00:12:28,816 --> 00:12:33,960 functions like this. Cbc encryption or open SSL either supply a truly random IV 176 00:12:33,960 --> 00:12:38,836 or if you want the IV to be a counter than you have to encrypt a counter using AAS 177 00:12:38,836 --> 00:12:43,668 before you actually call a CBC encrypt and you have to that yourself. So again, it's 178 00:12:43,668 --> 00:12:48,340 very important that the programmer knows that it needs to be done otherwise the CBC 179 00:12:48,340 --> 00:12:52,456 encryption is insecure. One last technicality about CBC is what to do when 180 00:12:52,456 --> 00:12:57,183 the message is not a multiple of the block cipher block length? That is what do we do 181 00:12:57,183 --> 00:13:01,689 if the last message block is shorter than the block length of AES, for example? So 182 00:13:01,689 --> 00:13:06,281 the last message block is less than sixteen bytes. And the answer is if we add 183 00:13:06,281 --> 00:13:11,586 a pad to the last block so it becomes as long as sixteen bytes, as long as the AES 184 00:13:11,586 --> 00:13:16,633 block size. And this pad, of course, if going to be removed during encryption. So 185 00:13:16,633 --> 00:13:21,873 here is a typical path, this is the path that is used in TLS. Basically a pad with 186 00:13:21,873 --> 00:13:26,919 N bytes then essentially what you do is you write the number N, N times. So for 187 00:13:26,919 --> 00:13:32,036 example if you pad with five bytes, you pad with the string 555555. So five bytes 188 00:13:32,036 --> 00:13:37,175 where each byte is the value five. And the key thing about this pad is basically when 189 00:13:37,175 --> 00:13:42,012 the decrypter receives the message, what he does is he looks at the last byte of 190 00:13:42,012 --> 00:13:46,970 the last block. So suppose that value is five, then he simply removes the last five 191 00:13:46,970 --> 00:13:51,818 bytes of the message. Now the question is what do we do if in fact the message is a 192 00:13:51,818 --> 00:13:56,262 multiple of sixteen bytes so in fact no pad is needed? If we don't pad at all, 193 00:13:56,262 --> 00:14:00,476 well that's a problem because the decrypter is going to look at the very 194 00:14:00,476 --> 00:14:05,267 last byte of the last block which is not part of the actual message and he's going 195 00:14:05,267 --> 00:14:10,000 to remove that many bytes from the plain text. So that actually would be a problem. 196 00:14:10,000 --> 00:14:15,363 So the solution is, if in fact there is no pad that's needed, nevertheless we still 197 00:14:15,363 --> 00:14:20,662 have to add a dummy block. And since we add the dummy block this would be a block 198 00:14:20,662 --> 00:14:25,830 that's basically contains sixteen bytes each one containing the number sixteen. 199 00:14:25,830 --> 00:14:30,042 Okay, so we add essentially sixteen dummy blocks. The decrypter, that when he's 200 00:14:30,042 --> 00:14:34,473 decrypting, he looks at the last byte of the last block, he sees that the value is 201 00:14:34,473 --> 00:14:38,823 sixteen, therefore he removes the entire block. And whatever's left is the actual 202 00:14:38,823 --> 00:14:42,975 plain text. So it's a bit unfortunate that in fact if you're encrypting short 203 00:14:42,975 --> 00:14:47,019 messages with CBC and the messages happen to be, say, 32 bytes, so they are a 204 00:14:47,019 --> 00:14:51,387 multiple of sixteen bytes, then you have to add one more block and make all these 205 00:14:51,387 --> 00:14:55,108 ciphertexts be 48 bytes just to accommodate the CBC padding. I should 206 00:14:55,108 --> 00:14:59,584 mention there's a variant of CBC called CBC with ciphertext stealing that actually 207 00:14:59,584 --> 00:15:03,790 avoids this problem, but I'm not gonna describe that here. If you're interested 208 00:15:03,790 --> 00:15:07,907 you can look that up online. Okay, so that's the end of our discussion of CBC 209 00:15:07,907 --> 00:15:12,198 and in the next segment we'll see how to use counter modes to encrypt multiple 210 00:15:12,198 --> 00:15:13,720 messages using a single key.