36C3 preroll music Herald: So, our next talk is Practical Cache Attacks from the Network. And the speaker, Michael Kurth, is the person who discovered the attack; it’s the first attack of its type. So he’s the first author of the paper. And this talk is going to be amazing! We’ve also been promised a lot of bad cat puns, so I’m going to hold you to that. A round of applause for Michael Kurth! applause Michael: Hey everyone, and thank you so much for making it to my talk tonight. My name is Michael and I want to share with you the research that I was able to conduct at the amazing VUSec group during my master’s thesis. Briefly about myself: I pursued my master’s degree in Computer Science at ETH Zürich and could do my master’s thesis in Amsterdam. Nowadays, I work as a security analyst at InfoGuard. What you see here are the people that actually made this research possible. These are my supervisors and research colleagues, who supported me all the way and put so much time and effort into the research. So these are the true rockstars behind this research. But let’s start with cache attacks. Cache attacks were previously known as local code execution attacks. For example, in a cloud setting here on the left-hand side, we have two VMs that basically share the hardware. So they’re time-sharing the CPU and the cache, and therefore an attacker that controls VM2 can actually attack VM1 via a cache attack. Similarly with JavaScript: a malicious JavaScript gets served to your browser, which then executes it, and because it shares the resources on your computer, it can also attack other processes. Well, this JavaScript thing gives you a feeling of remoteness, right? But still, it requires this JavaScript to be executed on your machine to actually be effective. So we wanted to really push this further and have a true network cache attack. We have this basic setting where a client does SSH to a server, and we have a third machine that is controlled by the attacker.
And as I will show you today, we can break the confidentiality of this SSH session from the third machine without any malicious software running on either the client or the server. Furthermore, the CPU on the server is not even involved in any of these cache attacks. So it’s just there, not even noticing that we actually leak secrets. So, let’s look a bit more closely. We have this nice cat doing an SSH session to the server, and every time the cat presses a key, one packet gets sent to the server. This is always true for interactive SSH sessions, because, as the name says, it gives you this feeling of interactiveness. When we look a bit under the hood at what’s happening on the server, we see that these packets actually activate the Last Level Cache. More on that later in the talk. Now, the attacker at the same time launches a remote cache attack on the Last Level Cache by just sending network packets. And by this, we can actually leak the arrival times of individual SSH packets. Now, you might ask yourself: “How would arrival times of SSH packets break the confidentiality of my SSH session?” Well, humans have distinct typing patterns. And here we see an example of a user typing the word “because”. And you see that typing e right after b is faster than, for example, c after e. And this can be generalised. And we can use this to launch a statistical analysis. So here on the orange dots, if we’re able to reconstruct these arrival times correctly—and correctly means we can reconstruct the exact times of when the user was typing—we can then launch this statistical analysis on the inter-arrival timings. And therefore, we can leak what you were typing in your private SSH session. Sounds very scary and futuristic, but I will demystify this during my talk. Alright! There is something I want to bring up right here at the beginning: as per tradition and for ease of writing, you give a name to your paper.
And if you’re following InfoSec Twitter closely, you probably already know what I’m talking about. Because in our case, we named our paper NetCAT. Of course, it was a pun. In our case, NetCAT stands for “Network Cache Attack,” and as it is with humour, it can backfire sometimes. And in our case, it backfired massively. With that we caused a small Twitter drama this September. One of the most-liked tweets about this research was the one from Jake. These talks are great, because you can put a face to such tweets, and yes: I’m this idiot. So let’s fix this! Intel acknowledged us with a bounty and also a CVE number, so from now on, we can just refer to it by the CVE number. Or if that is inconvenient for you: during that Twitter drama, somebody sent us a nice little alternative name, including a logo, which I actually quite like. It’s called NeoCAT. Anyway, lessons learned on that whole naming thing. So, let’s move on. Let’s get back to the actually interesting bits and pieces of our research! A quick outline: I’m first going to talk about the background, so general cache attacks. Then DDIO and RDMA, which are the key technologies that we were abusing for our remote cache attack. Then about the attack itself: how we reverse-engineered DDIO, the end-to-end attack, and, of course, a small demo. So, cache attacks are all about observing a microarchitectural state which should be hidden from software. And we do this by leveraging shared resources to leak information. An analogy here is safe cracking with a stethoscope, where the shared resource is actually the air, which just transmits the sounds the lock makes on the different inputs that you’re doing. And it actually works quite similarly in computers; here, it’s just the cache. So, caches solve the problem that the latency of loads from memory is really bad, right? And loads make up roughly a quarter of all instructions.
And with caches, we can reuse specific data and also use spatial locality in programs. Modern CPUs usually have this 3-layer cache hierarchy: L1, which is split between data and instruction cache; L2; and then L3, which is shared amongst the cores. If data that you access is already in the cache, that results in a cache hit. And if it has to be fetched from main memory, that’s considered a cache miss. So, how do we actually know if an access hits or misses the cache? Because we cannot actually read data directly from the caches. We can do this, for example, with prime+probe. It’s a well-known technique that we also used in the network setting, so I want to quickly go through what’s actually happening. The first step of prime+probe is that the attacker brings the cache to a known state: basically priming the cache. So it fills it with its own data, and then the attacker waits until the victim accesses it. The last step is then probing, which is basically doing priming again, but this time timing the accesses. Fast accesses, cache hits, mean that the cache was not touched in between. And cache misses tell us that the victim actually accessed one of the cache lines in the time between prime and probe. So what can we do with these cache hits and misses? Well: we can analyse them! And this timing information tells us a lot about the behaviour of programs and users. Based on cache hits and misses alone, researchers were able to leak crypto keys, guess visited websites, or leak memory content; that’s what Spectre and Meltdown did. So let’s see how we can actually launch such an attack over the network! One of the key technologies is DDIO. But first, I want to talk about DMA, because it’s the predecessor to it. DMA is basically a technology that allows your PCIe device, for example the network card, to interact directly with main memory by itself, without involving the CPU.
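The prime+probe procedure described above can be sketched as a toy simulation. This models a single abstract LRU cache set, not real hardware or real timing; the set size and all names are purely illustrative:

```python
# Toy prime+probe simulation over one cache set.
# A set holds WAYS lines; a miss evicts the least recently used line.

from collections import OrderedDict

WAYS = 4  # associativity of our toy cache set (illustrative)

class CacheSet:
    def __init__(self):
        self.lines = OrderedDict()  # address -> None, kept in LRU order

    def access(self, addr):
        """Return True on hit, False on miss; update LRU state."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            if len(self.lines) >= WAYS:
                self.lines.popitem(last=False)  # evict LRU line
            self.lines[addr] = None
        return hit

def prime_probe(victim_accesses):
    cache = CacheSet()
    attacker_addrs = [f"attacker_{i}" for i in range(WAYS)]
    # Prime: fill the set with attacker data.
    for a in attacker_addrs:
        cache.access(a)
    # The victim runs (or stays idle) in between.
    for v in victim_accesses:
        cache.access(v)
    # Probe: any miss reveals that the victim touched the set.
    misses = sum(not cache.access(a) for a in attacker_addrs)
    return misses

print(prime_probe([]))            # 0 misses: victim was idle
print(prime_probe(["victim_0"]))  # >0 misses: victim activity detected
```

In a real attack the attacker measures access latencies instead of asking a simulator, but the logic of prime, wait, probe is the same.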
So for example, if a packet is received, the PCIe device just puts it in main memory, and then, when the program or application wants to work on that data, it can fetch it from main memory. Now with DDIO, this is a bit different. With DDIO, the PCIe device can directly put data into the Last Level Cache. And that’s great, because now the application, when working on the data, doesn’t have to go through the costly main-memory walk and can just fetch the data directly from the Last Level Cache. DDIO stands for “Data Direct I/O Technology,” and it’s enabled on all Intel server-grade processors since 2012. It’s enabled by default and transparent to drivers and operating systems. So I guess most people didn’t even notice that something changed under the hood. And it changed some things quite drastically. But why is DDIO actually needed? Well: it’s for performance reasons. Here we have a nice study from Intel, which shows, on the bottom, different numbers of NICs. So we have settings with 2 NICs, 4 NICs, 6, and 8 NICs, and you have the throughput for each. And as you can see with the dark blue, without DDIO it basically stops scaling after 4 NICs. With the light blue you then see that it still scales up when you add more network cards. So DDIO is specifically built to scale network applications. The other technology that we were abusing is RDMA. It stands for “Remote Direct Memory Access,” and it basically offloads transport-layer tasks to silicon. It’s basically a kernel bypass. And there’s also no CPU involvement, so applications can access remote memory without consuming any CPU time on the remote server. I brought a little illustration to showcase RDMA. On the left we have the initiator, and on the right we have the target server. A memory region gets allocated on startup of the server, and from then on, applications can perform data transfers without the involvement of the network software stack.
So you bypass the TCP/IP stack completely. With one-sided RDMA operations you even allow the initiator to read and write to arbitrary offsets within that allocated space on the target. I quote here a statement from the market leader for these high-performance NICs: “Moreover, the caches of the remote CPU will not be filled with the accessed memory content.” Well, that’s not true anymore with DDIO, and that’s exactly what we attacked. You might ask yourself, “where is this RDMA used,” right? And I can tell you that RDMA is one of these technologies that you don’t hear about often but that are actually extensively used in the backends of the big data centres and cloud infrastructures. You can get your own RDMA-enabled infrastructure from public clouds like Azure, Oracle Cloud, Huawei, or Alibaba. Also, file protocols like SMB and NFS can support RDMA. And other applications are High Performance Computing, Big Data, Machine Learning, Data Centres, Clouds, and so on. But let’s get a bit into the details of the research and how we abused these two technologies. We know now that we have a shared resource exposed to the network via DDIO, and RDMA gives us the necessary read and write primitives to launch such a cache attack over the network. But first, we needed to clarify some things. Of course, we did many experiments and extensively tested DDIO to understand its inner workings. But here, I brought with me the two major questions which we had to answer. First of all: can we distinguish a cache hit from a miss over the network? We still have network latency and packet queueing and so on. So would it be possible to actually get the timing right? Which is an absolute must for launching a side-channel. The second question is then: can we actually access the full Last Level Cache? This corresponds to the attack surface that we actually have for the attack.
The first question we can answer with this very simple experiment. On the left, we have a very small code snippet: we have a timed RDMA read to a certain offset, then we write to that offset, and we read again from the offset. What you can see is that, when doing this 50 000 times over multiple different offsets, you can clearly distinguish the two distributions. The blue one corresponds to data that was fetched from main memory, and the orange one to data that was fetched from the Last Level Cache over the network. You can also see the effects of the network: for example, the long tails, which correspond to some packets that were slowed down in the network or were queued. A side note here for all the side-channel experts: we really need that write, because with DDIO, reads do not allocate anything in the Last Level Cache. So basically, this is the building block to launch a prime+probe attack over the network. However, we still need a target that we can actually profile. So let’s see what kind of attack surface we actually have, which brings us to the question: can we access the full Last Level Cache? Unfortunately, this is not the case. DDIO has an allocation limitation of two ways: here in the example, 2 out of 20 ways, so roughly 10%. They are not dedicated ways; the CPU still uses them. But we would only have access to roughly 10% of the CPU’s cache activity in the Last Level Cache. So that was not working so well for a first attack. But the good news is that other PCIe devices—let’s say a second network card—will also use the same two cache ways. And with that, we have 100% visibility of what other PCIe devices are doing in the cache. So let’s look at the end-to-end attack! As I told you before, we have this basic setup of a client and a server. And we have the machine that is controlled by us, the attackers.
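The hit/miss separation from this experiment can be sketched with synthetic numbers. This is a model, not real RDMA code: the latency means, the jitter, and the sample count are assumptions standing in for the measured distributions, but the calibration and thresholding mirror how an attacker would classify timed reads:

```python
# Distinguishing cache hits from misses by timing, on synthetic data.
# Real measurements would come from timed RDMA reads; here we model the
# two latency distributions with Gaussians (values in nanoseconds).

import random
import statistics

random.seed(0)

def measure(cached):
    # Assumed latencies: ~1500 ns from the LLC, ~2000 ns from DRAM, plus jitter.
    mean = 1500 if cached else 2000
    return random.gauss(mean, 60)

# Calibration: sample both cases and put the threshold between the means.
hits = [measure(True) for _ in range(1000)]
misses = [measure(False) for _ in range(1000)]
threshold = (statistics.mean(hits) + statistics.mean(misses)) / 2

def classify(latency):
    return "hit" if latency < threshold else "miss"

correct = sum(classify(t) == "hit" for t in hits) + \
          sum(classify(t) == "miss" for t in misses)
print(f"threshold ~ {threshold:.0f} ns, accuracy {correct / 2000:.1%}")
```

The long tails mentioned above would show up as queued packets misclassified as misses; repeating measurements averages that noise out.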
The client just sends these packets over a normal Ethernet NIC, and there is a second NIC attached to the server which allows the attacker to launch RDMA operations. We also know now that all the keystrokes that the user is typing are sent in individual packets, which are allocated in the Last Level Cache through DDIO. But how can we actually get the arrival times of these packets? Because that’s what we are interested in! Now we have to look a bit more closely at how the arrival of network packets actually works. The IP stack has a ring buffer, which is basically there to allow asynchronous operation between the hardware—so the NIC—and the CPU. If a packet arrives, it will be allocated in the first ring buffer position. On the right-hand side you see the view of the attacker, who can just profile the cache activity. And we see that the cache line at position 1 lights up. So we see activity there. It could also be cache line 2; we don’t know on which cache line this will actually pop up. But what is important is what happens with the second packet. Because the second packet will also light up a cache line, but this time a different one. And it’s actually the next cache line after the previous packet’s. And if we do this for 3 and 4 packets, we can see that we suddenly have this nice staircase pattern. So now we have a predictable pattern that we can exploit to get information about when packets were received. And this is just because the ring buffer is allocated in a way that it doesn’t evict itself, right? If packet 2 arrives, it doesn’t evict the cache content of packet 1. Which is great for us as attackers, because we can profile it well. Well, let’s look at a real-life example. This is the cache activity when the server receives constant pings. You can see this nice staircase pattern, and you can also see that the ring buffer reuses locations, as it is a circular buffer.
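The staircase behaviour can be modelled in a few lines. The ring size here is illustrative (real RX rings are much larger), and the one-slot-per-cache-line mapping is the simplifying assumption the tracking relies on:

```python
# Model of how successive packet arrivals light up successive cache lines.
# The NIC writes the descriptor for packet i into ring slot i mod RING_SIZE,
# and each slot maps to its own cache line, so arrivals trace a staircase.

RING_SIZE = 8  # illustrative; real RX rings are typically 128-4096 slots

def ring_slot(packet_index):
    return packet_index % RING_SIZE

arrivals = list(range(12))  # 12 packets arrive one after another
pattern = [ring_slot(i) for i in arrivals]
print(pattern)  # staircase 0..7, then the ring wraps around: 0..3

def count_packets(observed_slots):
    """Recover the number of arrivals from consecutive observed ring slots."""
    count = 0
    for prev, cur in zip(observed_slots, observed_slots[1:]):
        count += (cur - prev) % RING_SIZE
    return count

print(count_packets(pattern))  # 11 steps between 12 observations
```

Because the slot sequence is predictable, an observer who sees only “which cache line lit up” can still count arrivals, even across ring wrap-arounds.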
Here, it is important to know that the ring buffer doesn’t hold the data content, just the descriptors to the data. So this is reused. Unfortunately, when the user types over SSH, the pattern is not as nice as this one here. Otherwise we would already have a done deal and could just work on this. When a user types, you will have more delay between packets. Generally, you also don’t know when the user is typing, so you have to profile all the time to get the timings right. Therefore, we needed to build a bit more of a sophisticated pipeline. It is basically a 2-stage pipeline, which consists of an online tracker that is just looking at a bunch of cache lines, observing them all the time. And when it sees that certain cache lines were activated, it moves that window forward to the next position where it believes an activation will happen. The reason is that we need a speed advantage: we have to profile much faster than the network packets of the SSH session are arriving. What you can see here on the left-hand side is a visual output of what the online tracker does. It just profiles this window, which you can see in red. And if you look very closely, you can also see more lit-up cache lines in the middle, which correspond to arrived network packets. You can also see that there is plenty of noise involved, so we’re not able to directly get the packet arrival times from it. That’s why we need a second stage: the offline extractor. The offline extractor is in charge of computing the most likely occurrences of client SSH network packets. It uses the information from the online tracker and the predictable pattern of the ring buffer to do so. And then, it outputs the inter-packet arrival times for different words, as shown here on the right. Great. So, now we’re again at the point where we have just packet arrival times but no words, which we need for breaking the confidentiality of your private SSH session.
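The last step of the extractor, turning detected arrival timestamps into an inter-arrival feature vector, can be sketched as follows (the timestamps here are made-up values, not measured data):

```python
# From detected packet-arrival timestamps to an inter-arrival feature vector,
# the kind of output the offline-extractor stage produces for one typed word.

def inter_arrival_times(timestamps):
    """Differences between consecutive arrival timestamps, in order."""
    timestamps = sorted(timestamps)
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

# Hypothetical arrival times (milliseconds) for the 7 keystrokes of "because":
arrivals_ms = [0.0, 102.3, 290.1, 401.8, 540.2, 655.9, 823.4]
features = inter_arrival_times(arrivals_ms)
print([round(f, 1) for f in features])  # 6 gaps between 7 keystrokes
```

A word of n keystrokes thus becomes a point in an (n-1)-dimensional space, which is what the statistical analysis in the next step operates on.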
So, as I told you before, users, or generally humans, have distinctive typing patterns. And with that, we were able to launch a statistical attack. More concretely, we just learn a mapping between user typing behaviour and actual words, so that in the end, we can output the words that you were typing in your SSH session. We used 20 subjects that were typing free and transcribed text, which resulted in a total of 4 574 unique words, each represented as a point in a multi-dimensional space. And we used really simple machine-learning techniques, like the k-nearest-neighbours algorithm, which basically categorises a measurement by its Euclidean distance to other words. The reason why we just used a very basic machine-learning algorithm is that we just wanted to prove that the signal that we were extracting from the remote cache is actually strong enough to launch such an attack. We didn’t want to improve, in general, this kind of mapping between users and their typing behaviour. So let’s look at how this worked out! Firstly, on the left-hand side, you see we used our classifier on raw keyboard data. That means we just used the signal that was emitted during the typing, when the subjects were typing on their local keyboard, which gives us perfectly precise timing data. And we can see that this is already quite challenging to mount: we have an accuracy of roughly 35%. But look at the top-10 accuracy, which basically means: the attacker can guess 10 words, and if the correct word was among these 10 words, then that’s considered accurate. And with the top 10 guesses, we have an accuracy of 58%. That’s just on the raw keyboard data. And then we used the same data and also the same classifier on the remote signal. Of course, this is less precise, because we have noise factors and we could even add or miss out on keystrokes. And the accuracy is roughly 11% less, and the top-10 accuracy is roughly 60%.
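The classification idea can be sketched with synthetic data. Everything here is an assumption for illustration: the word list, the timing profiles, and the noise model. The nearest-neighbour ranking and the top-k accuracy metric are the parts that mirror the evaluation described above:

```python
# Nearest-neighbour word guessing on inter-keystroke timing vectors,
# with top-k accuracy as the metric. Data is synthetic: each "word" gets
# a characteristic timing profile plus per-sample noise.

import math
import random

random.seed(1)

WORDS = ["because", "network", "secret", "cache"]
PROFILE = {w: [random.uniform(80, 250) for _ in range(5)] for w in WORDS}

def sample(word):
    """One noisy observation of a user typing `word` (inter-key times, ms)."""
    return [t + random.gauss(0, 10) for t in PROFILE[word]]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

train = [(sample(w), w) for w in WORDS for _ in range(20)]

def rank_words(measurement):
    """Words ordered by the distance of their nearest training sample."""
    best = {}
    for vec, w in train:
        d = euclidean(measurement, vec)
        best[w] = min(best.get(w, float("inf")), d)
    return sorted(best, key=best.get)

def top_k_accuracy(k, trials=200):
    hits = 0
    for _ in range(trials):
        w = random.choice(WORDS)
        hits += w in rank_words(sample(w))[:k]
    return hits / trials

print(f"top-1: {top_k_accuracy(1):.0%}, top-3: {top_k_accuracy(3):.0%}")
```

With only four well-separated words this toy version scores far better than the real attack; the point is the mechanics of distance-based ranking and the top-k metric, not the numbers.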
So as we used a very basic machine-learning algorithm, many subjects, and a relatively large word corpus, we believe we could showcase that the signal is strong enough to launch such attacks. Of course, now we want to see this whole thing working, right? As I’m a bit nervous here on stage, I’m not going to do a live demo, because it would involve me doing some typing, which would probably confuse myself and, of course, also the machine-learning model. Therefore, I brought a video with me. Here on the right-hand side, you see the victim. It will shortly begin an SSH session. And then on the left-hand side, you see the attacker. On the bottom you see the online tracker, and on top you see the extractor and, hopefully, the predicted words. So now the victim starts an SSH session to the server called “father.” And the attacker, who is on the machine “son,” now launches the attack. You saw we profiled the ring buffer location, and now the victim starts to type. And as this pipeline takes a bit to process these words and to predict the right thing, you will shortly see, slowly, the words popping up in the correct—hopefully the correct—order. And as you can see, we can correctly guess the right words over the network by just sending network packets to the same server, and with that, getting out the crucial information of when such SSH packets arrived. applause So now you might ask yourself: how do you mitigate against these things? Well, luckily it’s just server-grade processors, so no clients and so on. But from our viewpoint, the only true mitigation at the moment is to either disable DDIO or not use RDMA. Both come with quite a performance impact. With DDIO, you’re talking roughly about 10-18% less performance, depending, of course, on your application. And if you decide not to use RDMA, you probably have to rewrite your whole application. Intel’s publication on disclosure day sounded a bit different, though.
But read it for yourself! I mean, the meaning of “untrusted network” can, I guess, be quite debatable. And yeah. But it is what it is. So I’m very proud that we got accepted at Security and Privacy 2020. Also, Intel acknowledged our findings, public disclosure was in September, and we also got a bug bounty payment. someone cheering in crowd laughs Increased peripheral performance has forced Intel to place the Last Level Cache on the fast I/O path in its processors. And by this, it exposed even more shared microarchitectural components, which, we know by now, have a direct security impact. Our research is the first DDIO side-channel vulnerability, but we believe that we have just scratched the surface with it. Remember: there are more PCIe devices attached! There could be storage devices, so you could profile the cache activity of storage devices, and so on. There is even such a thing as GPUDirect, which gives you access to the GPU’s cache. But that’s a whole other story. So, yeah. I think there’s much more to discover on that side, so stay tuned! All that is left to say is a massive “thank you” to you and, of course, to all the volunteers here at the conference. Thank you! applause Herald: Thank you, Michael! We have time for questions. So you can line up behind the microphones. And I can see someone at microphone 7! Question: So, thank you for your talk! I had a question: when I’m working on a remote machine using SSH, I’m usually not typing nice words like you’ve shown, but usually it’s weird bash things like dollar signs, and dashes, and I don’t know. Have you looked into that as well? Michael: Well, I think … I mean, of course, what we would’ve wanted to showcase is that we could leak passwords, right? If you would do “sudo” or whatsoever. The thing with passwords is that it’s kind of its own dynamic. You type passwords differently than you type normal words.
And then it gets a bit difficult, because when you want to do a large study of how users type passwords, you either ask them for their real password—which is not so ethical anymore—or you train them on different passwords. And that’s also difficult, because they might adopt a different style of typing these passwords than if it were their real password. And of course, the same goes for the command line in general, and we just didn’t have the word corpus for it to launch such an attack. Herald: Thank you! Microphone 1! Q: Hi. Thanks for your talk! I’d like to ask: the original SSH timing attack paper is from, like, 2001? Michael: Yeah, exactly. Exactly! Q: And do you have some idea why there are no circumventions on the side of SSH clients, to add some padding or some random delays or something like that? Do you have some idea why nothing is happening there? Is there some technical reason, or what’s the deal? Michael: So, we were also afraid that between 2001 and nowadays, they had added some kind of delay or batching or whatsoever. I’m not sure if it’s just a tradeoff with the interactiveness of your SSH session, or if there’s a true reason behind it. But what I do know is that it’s oftentimes quite difficult to add these artificial packets in between. Because if it’s not random at all, you could even filter out the additional packets that just get inserted by SSH. But other than that, I’m not familiar with why they didn’t adapt, or why this wasn’t on their radar. Herald: Thank you! Microphone 4. Q: How much do you rely on the skill of the typists? I’m thinking of a user that has to search for each letter on the keyboard, or someone that is distracted while typing, so there is no real pattern behind the typing. Michael: Oh, we’re actually absolutely relying on the pattern being reproducible.
As I said, we’re just using this very simple machine-learning algorithm that just looks at the Euclidean distance between previous words that you were typing and a new word, or the new arrival times that we were observing. So if that is completely different, then the accuracy would drop. Herald: Thank you! Microphone 8! Q: As a follow-up to what was said before: wouldn’t this make it a targeted attack, since you would need to train the machine-learning algorithm exactly for the person that you want to extract the data from? Michael: So, yeah. The goal of our research was not to do next-level, let’s say, machine-learning-type recognition of your typing behaviour. So we actually used the information about which user was typing to profile that correctly. But I still think you could maybe generalise. There is other research showing that you can categorise users into different types of typists, and if I remember correctly, they found that you can categorise each person into, like, 7 different typing categories. And I also know that some online trackers are using your typing behaviour to re-identify you, so as to serve you personalised ads and so on. But still, I mean, we didn’t want to go into that depth of improving the state of this whole thing. Herald: Thank you! And we’ll take a question from the Internet next! Signal angel: Did you ever try this with a high-latency network like the Internet? Michael: So of course, we rely on a, let’s say, constant latency. Because otherwise it would basically screw up our timing attack. As we’re talking about RDMA, which is usually found in datacenters, we also tested it in datacenter-like topologies. High latency would make it, I guess, quite hard, which means that you would have to do a lot of repetition, which is actually bad, because you cannot tell the user “please retype what you just did because I have to profile it again,” right? So yeah, the answer is: no. Herald: Thank you! Mic 1, please.
Q: If the victim pastes something into the SSH session, would you be able to carry out the attack successfully? Michael: No. If you paste stuff, it is just sent out as a batch when you press enter. Q: OK, thanks! Herald: Thank you! The angels tell me there is a person behind mic 6 whom I’m completely unable to see because of all the lights. Q: So as far as I understood, the attacker can only see that some packet arrived on the NIC. So if there’s a second SSH session running simultaneously on the machine under attack, would this already interfere with the attack? Michael: Yeah, absolutely! Even distinguishing SSH packets from normal network packets is challenging. We use kind of a heuristic here, because the thing with SSH is that it always sends two packets right after each other. So not just one, but two. I omitted this part for the simplicity of this talk. But we also rely on these kinds of heuristics to even filter out SSH packets. And if you had a second SSH session, I can imagine that this would completely… so, we cannot distinguish which SSH session it was. Herald: Thank you. Mic 7 again! Q: You always said you were using two connectors, like—what was it called? NICs? Michael: Yes, exactly. Q: Does it have to be two different ones? Can it be the same? Or how does it work? Michael: So in our setting we used one NIC that has the capability of doing RDMA. In our case, this was a fabric, so InfiniBand. And the other was just a normal Ethernet connection. Q: But could it be the same, or could it be both over InfiniBand, for example? Michael: Yes, I mean… the thing with InfiniBand is that it doesn’t use the ring buffer, so we would have to come up with a different kind of tracking ability. Which could even get a bit more complicated, because it does this kernel bypass. But if there’s a predictable pattern, we could potentially also do this. Herald: Thank you. Mic 1? Q: Hello again!
I would like to ask, I know it was not the main focus of your study, but do you have some estimate of how practical this timing attack can be? Like, in a real-world scenario, not a prepared one? How big a problem can it really be? What do you think, what’s the state of the art in this field? How do you judge the risk? Michael: You’re just referring to the typing attack, right? Q: The timing attack. SSH timing. Not necessarily the cache version. Michael: So, the original research has been out there since 2001. And since then, many researchers have shown that it’s possible to launch such typing attacks in different scenarios; for example, JavaScript is another one. It’s always a bit difficult to judge, because most researchers are using different datasets, so it’s difficult to compare. But I think in general, I mean, we used quite a large word corpus and it still worked. Not super precisely, but it still worked. So yeah, I do believe it’s possible. But to make it a real-world attack where an attacker wants high accuracy, he would probably need a lot of data and even more sophisticated techniques. Which there are: there are a couple of other machine-learning techniques that you could use, which have their pros and cons. Q: Thanks. Herald: Thank you! Ladies and gentlemen: the man who named an attack NetCAT, Michael Kurth! Give him a round of applause, please! applause Michael: Thanks a lot! 36C3 postscroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!