In this segment, we're gonna construct authenticated encryption systems. Since we already have CPA secured encryption, and we have secure MACs, the natural question is whether we can combine the two somehow, in order to get authenticated encryption. And if, that's exactly what we're gonna do in this segment. Authenticated encryption was introduced in the year 2000, in two independent papers that I point to at the end of this module. But before then, many crytpolibraries provided an API that separately supported CPA secure encryption, and MAC-ing. So there was one function for implementing CPA secure encryption. For example, CBC with a random IV. And another function for implementing a MAC. And then every developer that wanted to implement encryption, had to, himself, call separately the CPA secure encryption scheme and the MAC scheme. In particular, every developer had to invent his own way of combining encryption and MAC-ing to provide some sort of authenticated encryption. But since the goals of combining encryption and MAC-ing wasn't well understood since authenticated encryption hasn't yet been defined, it wasn't really clear which combinations of encryption and MAC-ing are correct and which aren't. And so, every project as I said had to invent its own combination. And in fact, not all combinations were correct. And I can tell you that the most common mistake in software projects were basically incorrectly combining the encryption and integrity mechanisms. So hopefully, by the end of this module, you will know how to combine them correctly, and you won't be making these mistakes yourself. So let's look at some combinations of CPA secure encryption and MAC, that were introduced by different projects. So here are three examples. So, first of all, in all three examples, there's a separate key for encryption, and a separate key for MACing. These two keys are independent of one another, and both are generated at session setup time. And we're gonna see how to generate these two keys later on in the course. So the first example is the SSL protocol. So the way SSL combines encryption and MAC in the hope of achieving authenticated encryption is the following. Basically you take the plain text, m, and then you compute a MAC on the plain text, m. So you use your MAC key, kI, to compute tag for this message m. And then you can concatenate the tag to the message and then you encrypt the concatenation of the message and the tag and what comes out is the actual final cipher text. So that's option number one. The second option is what IPsec does. So here, you take the message. The first thing you do is you encrypt the message. And then, you compute a tag on the resulting cipher text. So you notice the tag itself is computed on the resulting cipher text. A third option is what the SSH protocol does. So here, the SSH takes the message, and encrypts it using a CPA secure encryption scheme. And then, to it, it concatenates a tag of the message. The difference between IPsec and SSH, is that in IPsec, the tag is computed over the cipher text, whereas, in SSH, the tag is computed over the message. And so these are three completely different ways of combining encryption and MAC. And the question is, which one of these is secure? So, I will let you think about this for a second, and then when we continue we'll do the analysis together. Okay. So let's start with the SSH method. So in the SSH method you notice that the tag is computed on the message and then concatenated in the clear to the cipher text. Now this is actually quite a problem because MACs themselves are not designed to provide confidentiality. MACs are only designed for integrity. And in fact, there's nothing wrong with a MAC that as part of the tag outputs a few bits of the plain text. Outputs a few bits of the message M. That would be a perfectly fine tag. And yet if we did that, that would completely break CPA security here, because some bits of the message are leaked in the cipher text. And so the SSH approach, even though the specifics of SSH are fine and the protocol itself is not compromised by this specific combination, generally it's advisable not to use this approach. Simply because the output of the MAC signing algorithm might leak bits of the message. So now let's look at SSL and IPsec. As it turns out, the recommended method actually is the IPsec method. Because it turns out no matter what CPA secure system and MAC key you use the combination is always gonna provide authenticated encryption. Now let me very, very briefly explain why. Basically what happens is once we encrypt the message well the message contents now is hidden inside the cipher text and now when we compute a tag of the cipher text basically we're locking, this tag locks the cipher text and makes sure no one can produce a different cipher text that would look valid. And as a result this approach ensures that any modifications to the cipher text will be detected by the decrypter simply because the MAC isn't gonna verify. As it turns out, for the SSL approach, there actually are kind of pathological examples, where you combine CPA secure encryption system with a secure MAC. And the result is vulnerable to a chosen cipher text attack, so that it does not actually provide authenticated encryption. And basically, the reason that could happen, is that there's some sort of a bad interaction between the encryption scheme and the MAC algorithm. Such that, in fact, there will be a chosen cipher text attack. So if you're designing a new project the recommendation now is to always use encrypt then MAC because that is secure no matter which CPA secure encryption and secure MAC algorithm you're combining. Now, just to set the terminology, the SSL method is sometimes called MAC-then-encrypt. And the IPsec method is called encrypt-then-MAC. The SSH method even though you're not supposed to use it, is called encrypt-and-MAC. Okay, so I'll often refer to encrypt-then-MAC, and MAC-then-encrypt to differentiate SSL and IPsec. Okay, so just to repeat what I've just said. The IPsec method encrypt-then-MAC always provides authenticated encryption. If you start from a CPA secure cipher and a secure MAC you will always get authenticated encryption. As I said, MAC-then-encrypt in fact, there are pathological cases where the result is vulnerable to CCA attacks and therefore does not provide authenticated encryption. However, the story's a little bit more interesting than that, in that, it turns out, if you're actually using randomized counter mode or randomized CBC, then it turns out, for those particular CPA secure encryption schemes, MAC-then-encrypt actually does provide authenticated encryption and therefore it is secure. In fact, there's even a more interesting twist here in that if you're using randomized counter mode. Then, it's enough that your MAC algorithm just be one time secure. It doesn't have to be a fully secure MAC. It just has to be secure when a key is used to encrypt a single message, okay? And when we talked about message integrity, we saw that there are actually much faster MACs that are one time secure than MACs that are fully secure. As a result, if you're using randomized counter mode MAC-then-encrypt could actually result in a more efficient encryption mechanism. However, I'm going to repeat this again. The recommendation is to use encrypt-then-MAC and we're going to see a number of attacks on systems that didn't use encrypt-then-MAC. And so just to make sure things are secure without you having to think too hard about this. Again, I am going to recommend that you always use encrypt-then-MAC. Now, once the concept of authenticated encryption became more popular, a number of standardized approaches for combining encryption and MAC turned up. And those were even standardized by the National Institute of Standards. So I'm just gonna mention three of these standards. Two of these were standardized by NIST. And these are called Galois counter mode and CBC counter mode. And so let me explain what they do. Galois counter mode basically uses counter mode encryption, so a randomized counter mode with a Carter-Wegman MAC, so a very fact Carter-Wegman MAC. And the way the Carter-Wegman MAC works in GCM is it's basically a hash function of the message that's being MACed. And then the result is encrypted using a PRF. Now this hash function in GCM is already quite fast to the point where the bulk of the running time of GCM is dominated by the counter mode encryption and it's even made more so in that Intel introduces a special instruction PCLMULQDQ specifically designed for the purpose of making the hash function in GCM run as fast as possible. Now CCM counter mode is another NIST standard. It uses a CBC MAC and then counter mode encryption. So this mechanism, you know, this uses MAC, then encrypt, like SSL does. So this is actually not the recommended way of doing things, but because counter mode encryption is used. This is actually a perfectly fine encryption mechanism. One thing that I'd like to point out about CCM, is that everything is based on AES. You notice, it's using AES for the CBC MAC, and it's using AES for the counter mode encryption. And as a result, CCM can be implemented with relatively little code. Cause all you need is an AES engine and nothing else. And because of this, CCM actually was adopted by the Wi-Fi alliance, and in fact, you're probably using CCM on a daily basis if you're using encrypted Wi-Fi 802.11i then you're basically using CCM to encrypt traffic between your laptop and the access point. There's another mode called a EAX that uses counter mode encryption, and then CMAC. So, again you notice encrypt-then-MAC and that's another fine mode to use. We'll do a comparison of all these modes in just a minute. Now I wanted to mention that first of all, all these modes are nonce-based. In other words, they don't use any randomness but they do take as input a nonce and the nonce has to be unique per key. In other words, as you remember, the pair (key, nonce) should never ever, ever repeat. But the nonce itself need not be random, so it's perfectly fine to use a counter, for example, as a nonce. And the other important point is that, in fact, all these modes are what's called authenticated encryption with associated data. This is an extension of authenticated encryption, that comes up very often in networking protocols. So the idea between AEAD is that, in fact, the message that's provided to the encryption mode is not intended to be fully encrypted. Only part of the message is intended to be encrypted, but all of the message is intended to be authenticated. A good example of this is a network packet. Think of like a IP packet where there's a header and then there's a payload. And typically the header is not gonna be encrypted. For example, the header might contain the destination of the packet, but then the header had better not be encrypted otherwise routers along the way wouldn't know where to route the packet. And so, typically the header is sent in the clear, but the payload, of course, is always encrypted, but what you'd like to do is have the header be authenticated. Not encrypted but authenticated. So this is exactly what these AEAD modes do. They will authenticate the header and then encrypt the payload. But the header and the payload are bound together in the authentication so they can't actually be separated. So this is not difficult to do. What happens is in these three modes GCM, CCM, and EAX, basically the MAC is applied to the entire data. But the encryption is only applied to the part of the data that needs to be encrypted. So I wanted to show you what an API to these authenticated encryption with associated data encryption schemes look like. So here's what it looks like in OpenSSL. For example, this is, an API for GCM. So what you do is you call the init function to initialize the encryption mode, and you notice you give it a key and the nonce. The nonce again, doesn't have to be random, but it has to be unique. And after initialization, you would call this encrypt function, where you see that you give it the associated data that's gonna be authenticated, but not encrypted. You give it the data, and it's gonna be both authenticated and encrypted. And it gives you back the full cipher text, which is an encryption of the data, but of course does not include the AEAD, because the AEAD is gonna be sent in the clear. So now that we understand this mode of encrypt-then-MAC, we can go back to the definition of MAC security and I can explain to you something that might have been a little obscure when we looked at that definition. So if you remember, one of the requirements that followed from our definition of secure MACs meant that given a message-MAC pair on a message M, the attacker cannot produce another tag on the same message M. In other words, even though the attacker already has a tag for the message M, he shouldn't be able to produce a new tag for the same message M. And it's really not clear, why does that matter? Who cares, if the adversary already has a tag on the message M, who cares if he can produce another tag? Well, it turns out if the MAC didn't have this property. In other words, given a message-MAC pair you can produce another MAC on the same message, then that MAC would result in an insecure encrypt-then-MAC mode. And so if we want our encrypt-then-MAC to have cipher text integrity, it's crucial that our MAC security would imply this strong notion of security, which, of course, it does because we defined it correctly. So let's see what would go wrong, if, in fact, it was easy to produce this type of forgery. So what I'll do is I'll show you a chosen cipher text attack on the resulting encrypt-then-MAC system. And since the system has a chosen cipher text attack on it, it necessarily means that it doesn't provide authenticated encryption. So let's see. So the adversary's gonnna start by sending two messages, M0 and M1. And he's gonna receive, as usual, the encryption of one of them, either the encryption of M0 or the encryption of M1. And since we're using encrypt-then-MAC, the adversary receives the cipher text we'll call it C0 and a MAC on the cipher text C0. Well now we said that given the MAC on a message the adversary can produce another MAC on the same message. So what he's gonna do is he's gonna produce another MAC on the message C0. Now he has a new cipher text (C0,T'), which is a perfectly valid cipher text. T' is a valid MAC of C0. Therefore, the adversary now can submit a chosen cipher text query on C' and this is a valid chosen cipher text query because it's different from C. It's a new cipher text. The poor challenger now is forced to decrypt this cipher text C' so he's going to send back the decryption of C'. It's a valid cipher text therefore the decryption of C prime is the message Mb but now the attacker just learned the value of B because he can test whether Mb is equal to M0 or MB is equal to M1. As a result he can just output B and he gets advantage one in defeating the scheme. And so again if our MAC security did not imply this property here. Then, there would be a chosen cipher text attack on encrypt-then-MAC. And therefore, it would not be secure. So the fact that we define MAC security correctly means that encrypt-then-MAC really does provide authenticated encryption. And throughout all the MACs that we discussed actually do satisfy this strong notion of unforgeability. So, interestingly, this is not the end of the story. So, as we said before the concept of authenticated encryption was introduced everyone was just combining MACs and encryption in various ways in the hope of achieving some authenticated encryption. After the notion of authenticated encryption became formalized and rigorous, people kind of started scratching their heads and said, hey, wait a minute. Maybe we can achieve authenticated encryption more efficiently than by combining a MAC and an encryption scheme. In fact, if you think about how this combination of MAC and encryption works, let's say we combine counter mode with CMAC, then for every block of plaintext, you first of all have to use your block cipher for counter mode, and then you have to use to your block cipher again, for the CBC-MAC. This means that if you're combining CPA secure encryption with a MAC, for every block of plaintext, you have to evaluate your block cipher twice, once for the MAC and once for the encryption scheme. So the natural question was, can we construct an authenticated encryption scheme directly from a PRP, such that we would have to only evaluate the PRP once per block? And it turns out the answer is yes, and there's this beautiful construction called OCB, that pretty much does everything you want, and is much faster than constructions that are separately built from an encryption and a MAC. So I wrote down, kind of a schematic of OCB. I don't want to explain it in detail. I'll just kind of explain it at a high level. So here we have our input plain text, here at the top. And you notice that, first of all, OCB is parallelizable, completely parallelizable. So every block can be encrypted separately of every other block. The other thing to notice is that as I promised, you only evaluate your block cipher once per plain text block. And then you evaluate it one more time at the end to build your authentication tag and then the overhead of OCB beyond just a block cipher is minimal. All you have to do is evaluate a certain very simple function P. The nonce goes into the P you notice, the key goes into this P and then there is a block counter that goes into this P. So you just evaluate this function P, twice for every block and you XOR the result before and after encryption using the block cipher and that's it. That's all you have to do and then you get a very fast and efficient authenticated encryption scheme built from a block cipher. So OCB actually has a nice security theorem associated with it and I am going to point to a paper on OCB when we get to end of this module where I list some further reading papers that you can take a look at. So you might be wondering if OCB is so much better than everything you've seen so far, all these three standards CCM, GCM and EAX why isn't OCB being used or why isn't OCB the standard? And the answer is a little sad. The primary answer that OCB is not being used is actually because of various patents. And I'll just leave it at that. So to conclude this section I wanted to show you some performance numbers. So here on the right I listed performance numbers for modes that you shouldn't be using. So this is for randomized counter mode, and this is for randomized CBC. And you can see also the performance of CBC MAC is basically the same as the performance of CBC encryption. Okay. Now here are the authenticated encryption modes, so these are the ones that you're supposed to using, these you're not supposed to be using on their own, right. These two, you should never ever use these two because they only provide CPA security, they don't actually provide security against active attacks. You're only supposed to use authenticated encryption for encryption. And so I listed performance numbers for the three standards. And let me remind you that GCM basically uses a very fast hash. And then it uses counter mode for actual encryption. And you can see that the overhead of GCM over counter mode is relatively small. CCM and EAX both use a block cipher based encryption and a block cipher based MAC. And as a result they're about twice as slow as counter modes. You see that OCB is actually the fastest of these, primarily because it only use the block cipher once per message block. So based on these performance numbers, you would think that GCM is exactly the right mode to always use. But it turns out if you're on the space constrained hardware, GCM is not ideal. Primarily because its implementation requires larger code than the other two modes. However, as I said, Intel specifically added instructions to speed up GCM mode. And as a result, implementing GCM on an Intel architecture takes very little code. But on other hardware platforms, say in smart cards or other constrained environments, the code size for implementing GCM would be considerably larger than for the other two modes. But if code size is not a constraint then GCM is the right mode to use. So to summarize this segment I want to say it one more time that when you want to encrypt messages you have to use an authenticated encryption mode and the recommended way to do it is to use one of the standards, namely one of these three modes for providing authenticated encryption. Don't implement the encryption scheme yourself. In other words don't implement encrypt-then-MAC yourself. Just use one of these three standards. Many crypto libraries now provide standard API's for these three modes and these are the one's you should be using and nothing else. In the next segment we're going to see what else can go wrong when you try to implement authenticated encryption by yourself.