1 00:00:00,000 --> 00:00:19,500 36C3 preroll music 2 00:00:19,500 --> 00:00:26,220 Herald: So, our next talk is practical cache attacks from the network. And the 3 00:00:26,220 --> 00:00:33,960 speaker, Michael Kurth, is the person who discovered the attack; it’s the first 4 00:00:33,960 --> 00:00:42,640 attack of its type. So he’s the first author of the paper. And this talk is 5 00:00:42,640 --> 00:00:47,470 going to be amazing! We’ve also been promised a lot of bad cat puns, so I’m 6 00:00:47,470 --> 00:00:52,750 going to hold you to that. A round of applause for Michael Kurth! 7 00:00:52,750 --> 00:00:58,690 applause 8 00:00:58,690 --> 00:01:03,800 Michael: Hey everyone and thank you so much for making it to my talk tonight. My 9 00:01:03,800 --> 00:01:08,780 name is Michael and I want to share with you the research that I was able to 10 00:01:08,780 --> 00:01:15,659 conduct at the amazing VUSec group during my master’s thesis. Briefly about myself: I 11 00:01:15,659 --> 00:01:20,260 pursued my master’s degree in Computer Science at ETH Zürich and could do my 12 00:01:20,260 --> 00:01:27,869 Master’s thesis in Amsterdam. Nowadays, I work as a security analyst at InfoGuard. 13 00:01:27,869 --> 00:01:33,450 So what you see here are the people that actually made this research possible. 14 00:01:33,450 --> 00:01:37,869 These are my supervisors and research colleagues who supported me all the way 15 00:01:37,869 --> 00:01:43,500 along and put so much time and effort into the research. So these are the true 16 00:01:43,500 --> 00:01:50,990 rockstars behind this research. So, but let’s start with cache attacks. So, cache 17 00:01:50,990 --> 00:01:56,850 attacks were previously known to be local code execution attacks. So, for example, 18 00:01:56,850 --> 00:02:03,679 in a cloud setting here on the left-hand side, we have two VMs that basically share 19 00:02:03,679 --> 00:02:10,270 the hardware. 
So they’re time-sharing the CPU and the cache and therefore an 20 00:02:10,270 --> 00:02:18,120 attacker that controls VM2 can actually attack VM1 via a cache attack. Similarly, 21 00:02:18,120 --> 00:02:23,100 JavaScript. So, a malicious JavaScript gets served to your browser which then 22 00:02:23,100 --> 00:02:28,030 executes it and because you share the resource on your computer, it can also 23 00:02:28,030 --> 00:02:33,330 attack other processes. Well, this JavaScript thing gives you a feeling of 24 00:02:33,330 --> 00:02:39,340 remoteness, right? But still, it requires this JavaScript to be executed on 25 00:02:39,340 --> 00:02:46,170 your machine to be actually effective. So we wanted to really push this further and 26 00:02:46,170 --> 00:02:54,060 have a true network cache attack. We have this basic setting where a client does SSH 27 00:02:54,060 --> 00:03:00,790 to a server and we have a third machine that is controlled by the attacker. And as I 28 00:03:00,790 --> 00:03:08,360 will show you today, we can break the confidentiality of this SSH session from 29 00:03:08,360 --> 00:03:13,269 the third machine without any malicious software running either on the client or 30 00:03:13,269 --> 00:03:20,540 the server. Furthermore, the CPU on the server is not even involved in any of 31 00:03:20,540 --> 00:03:25,390 these cache attacks. So it’s just there and not even noticing that we actually 32 00:03:25,390 --> 00:03:34,689 leak secrets. So, let’s look a bit more closely. So, we have this nice cat doing 33 00:03:34,689 --> 00:03:41,409 an SSH session to the server and every time the cat presses a key, one packet gets 34 00:03:41,409 --> 00:03:49,700 sent to the server. So this is always true for interactive SSH sessions. Because, as 35 00:03:49,700 --> 00:03:56,530 the name says, it gives you this feeling of interactiveness. 
When we look a 36 00:03:56,530 --> 00:04:01,459 bit more under the hood at what’s happening on the server, we see that these packets 37 00:04:01,459 --> 00:04:06,950 are actually activating the Last Level Cache. More on that later in the 38 00:04:06,950 --> 00:04:13,349 talk. Now, the attacker at the same time launches a remote cache attack on the Last 39 00:04:13,349 --> 00:04:19,340 Level Cache by just sending network packets. And by this, we can actually leak 40 00:04:19,340 --> 00:04:28,020 arrival times of individual SSH packets. Now, you might ask yourself: “How would 41 00:04:28,020 --> 00:04:36,800 arrival times of SSH packets break the confidentiality of my SSH session?” Well, 42 00:04:36,800 --> 00:04:43,210 humans have distinct typing patterns. And here we see an example of a user typing 43 00:04:43,210 --> 00:04:50,460 the word “because”. And you see that typing e right after b is faster than for 44 00:04:50,460 --> 00:04:56,870 example c after e. And this can be generalised. And we can use this to launch 45 00:04:56,870 --> 00:05:03,960 a statistical analysis. So here on the orange dots, if we’re able to reconstruct 46 00:05:03,960 --> 00:05:10,530 these arrival times correctly—and what “correctly” means: we can reconstruct the 47 00:05:10,530 --> 00:05:16,270 exact times of when the user was typing—we can then launch this statistical 48 00:05:16,270 --> 00:05:22,690 analysis on the inter-arrival timings. And therefore, we can leak what you were 49 00:05:22,690 --> 00:05:29,809 typing in your private SSH session. Sounds very scary and futuristic, but I will 50 00:05:29,809 --> 00:05:36,580 demystify this during my talk. So, alright! There is something I want to 51 00:05:36,580 --> 00:05:42,730 bring up right here at the beginning: As per tradition and for ease of writing, you 52 00:05:42,730 --> 00:05:48,180 give a name to your paper. 
And if you’re following InfoSec Twitter closely, you 53 00:05:48,180 --> 00:05:53,930 probably already know what I’m talking about. Because in our case, we named our 54 00:05:53,930 --> 00:06:00,740 paper NetCAT. Well, of course, it was a pun. In our case, NetCAT stands for 55 00:06:00,740 --> 00:06:08,560 “Network Cache Attack,” and as it is with humour, it can backfire sometimes. And in 56 00:06:08,560 --> 00:06:17,830 our case, it backfired massively. And with that we caused like a small Twitter drama 57 00:06:17,830 --> 00:06:24,400 this September. One of the most-liked tweets about this research was the one 58 00:06:24,400 --> 00:06:32,889 from Jake. These talks are great, because you can put a face to such tweets and 59 00:06:32,889 --> 00:06:42,599 yes: I’m this idiot. So let’s fix this! Intel acknowledged us with a bounty and 60 00:06:42,599 --> 00:06:48,720 also a CVE number, so from now on, we can just refer to it by the CVE number. Or 61 00:06:48,720 --> 00:06:54,479 if that is inconvenient to you, during that Twitter drama, somebody sent us like 62 00:06:54,479 --> 00:06:59,800 a nice little alternative name, including a logo, which actually I quite 63 00:06:59,800 --> 00:07:09,240 like. It’s called NeoCAT. Anyway, lessons learned on that whole naming thing. And 64 00:07:09,240 --> 00:07:15,250 so, let’s move on. Let’s get back to the actually interesting bits and pieces of our 65 00:07:15,250 --> 00:07:22,460 research! So, a quick outline: I’m firstly going to talk about the background, so 66 00:07:22,460 --> 00:07:28,240 general cache attacks. Then DDIO and RDMA which are the key technologies that we 67 00:07:28,240 --> 00:07:34,330 were abusing for our remote cache attack. Then about the attack itself, how we 68 00:07:34,330 --> 00:07:42,190 reverse-engineered DDIO, the end-to-end attack, and, of course, a small demo. 
So, 69 00:07:42,190 --> 00:07:47,050 cache attacks are all about observing a microarchitectural state which should be 70 00:07:47,050 --> 00:07:53,160 hidden from software. And we do this by leveraging shared resources to leak 71 00:07:53,160 --> 00:07:59,759 information. An analogy here is safe cracking with a stethoscope, where the 72 00:07:59,759 --> 00:08:06,300 shared resource is actually air that just transmits the sounds the lock makes 73 00:08:06,300 --> 00:08:11,990 in response to your different inputs. And it actually works quite similarly in 74 00:08:11,990 --> 00:08:21,949 computers. But here, it’s just the cache. So, caches solve the problem that the latency 75 00:08:21,949 --> 00:08:28,389 of loads from memory is really bad, right? And loads make up roughly a quarter of 76 00:08:28,389 --> 00:08:34,320 all instructions. And with caches, we can reuse specific data and also use spatial 77 00:08:34,320 --> 00:08:41,980 locality in programs. Modern CPUs usually have this 3-layer cache hierarchy: L1, 78 00:08:41,980 --> 00:08:47,041 which is split between data and instruction cache; L2; and then L3, which 79 00:08:47,041 --> 00:08:54,290 is shared amongst the cores. If data that you access is already in the cache, that 80 00:08:54,290 --> 00:08:58,780 results in a cache hit. And if it has to be fetched from main memory, that’s 81 00:08:58,780 --> 00:09:06,290 considered a cache miss. So, how do we actually know now whether an access hits or 82 00:09:06,290 --> 00:09:11,549 misses? Because we cannot actually read data directly from the caches. We can do 83 00:09:11,549 --> 00:09:15,700 this, for example, with prime+probe. It’s a well-known technique that we 84 00:09:15,700 --> 00:09:20,980 actually also used in the network setting. So I want to quickly go through what’s 85 00:09:20,980 --> 00:09:26,430 actually happening. So the first step of prime+probe is that the attacker brings the 86 00:09:26,430 --> 00:09:33,860 cache to a known state. 
Basically priming the cache. So it fills it with its own 87 00:09:33,860 --> 00:09:42,310 data and then the attacker waits until the victim accesses it. The last step is then 88 00:09:42,310 --> 00:09:49,040 probing which is basically doing priming again, but this time measuring the 89 00:09:49,040 --> 00:09:56,260 access times. So, fast accesses, cache hits, mean that the cache was not touched 90 00:09:56,260 --> 00:10:02,750 in-between. And cache misses tell us that the victim 91 00:10:02,750 --> 00:10:10,270 actually accessed one of the cache lines in the time between prime and probe. So 92 00:10:10,270 --> 00:10:15,750 what can we do with these cache hits and misses now? Well: We can analyse them! And 93 00:10:15,750 --> 00:10:21,410 this timing information tells us a lot about the behaviour of programs and users. 94 00:10:21,410 --> 00:10:28,519 And based on cache hits and misses alone, we can—or researchers were able to—leak 95 00:10:28,519 --> 00:10:35,829 crypto keys, guess visited websites, or leak memory content. That’s with Spectre 96 00:10:35,829 --> 00:10:42,260 and Meltdown. So let’s see how we can actually launch such an attack over the 97 00:10:42,260 --> 00:10:50,550 network! So, one of the key technologies is DDIO. But first, I want to talk about DMA, 98 00:10:50,550 --> 00:10:55,420 because it’s like the predecessor to it. So DMA is basically a technology that 99 00:10:55,420 --> 00:11:02,010 allows your PCIe device, for example the network card, to interact with main memory 100 00:11:02,010 --> 00:11:08,519 directly, on its own, without interrupting the CPU. So for example if a packet is 101 00:11:08,519 --> 00:11:14,339 received, the PCIe device then just puts it in main memory and then, when the 102 00:11:14,339 --> 00:11:19,110 program or the application wants to work on that data, then it can fetch it from main 103 00:11:19,110 --> 00:11:27,089 memory. Now with DDIO, this is a bit different. 
With DDIO, the PCIe device can 104 00:11:27,089 --> 00:11:33,110 directly put data into the Last Level Cache. And that’s great, because now the 105 00:11:33,110 --> 00:11:38,620 application, when working on the data, just doesn’t have to go through the costly 106 00:11:38,620 --> 00:11:43,910 main-memory walk and can just directly work on the data from—or fetch it from—the 107 00:11:43,910 --> 00:11:52,010 Last Level Cache. So DDIO stands for “Data Direct I/O Technology,” and it has been 108 00:11:52,010 --> 00:11:58,560 in all Intel server-grade processors since 2012. It’s enabled by default and 109 00:11:58,560 --> 00:12:04,069 transparent to drivers and operating systems. So I guess most people didn’t 110 00:12:04,069 --> 00:12:09,279 even notice that something changed under the hood. And it changed some things quite 111 00:12:09,279 --> 00:12:17,100 drastically. But why is DDIO actually needed? Well: It’s for performance 112 00:12:17,100 --> 00:12:23,489 reasons. So here we have a nice study from Intel, which shows on the bottom 113 00:12:23,489 --> 00:12:29,090 different numbers of NICs. So we have a setting with 2 NICs, 4 NICs, 6, and 8 114 00:12:29,090 --> 00:12:35,750 NICs. And you have the throughput for it. And as you can see in dark blue, 115 00:12:35,750 --> 00:12:42,850 without DDIO, it basically stops scaling after having 4 NICs. With the 116 00:12:42,850 --> 00:12:47,890 light-blue you then see that it still scales up when you add more network cards 117 00:12:47,890 --> 00:12:56,770 to it. So DDIO is specifically built to scale network applications. The other 118 00:12:56,770 --> 00:13:02,250 technology that we were abusing is RDMA. It stands for “Remote Direct Memory 119 00:13:02,250 --> 00:13:08,750 Access,” and it basically offloads transport-layer tasks to silicon. It’s 120 00:13:08,750 --> 00:13:15,390 basically a kernel bypass. 
And there’s also no CPU involvement, so an application can 121 00:13:15,390 --> 00:13:23,520 access remote memory without consuming any CPU time on the remote server. So I 122 00:13:23,520 --> 00:13:28,329 brought here a little illustration to show you how RDMA works. So on the left we 123 00:13:28,329 --> 00:13:34,230 have the initiator and on the right we have the target server. A memory region 124 00:13:34,230 --> 00:13:39,670 gets allocated on startup of the server and from then on, applications can perform 125 00:13:39,670 --> 00:13:44,490 data transfer without the involvement of the network software stack. So you bypass 126 00:13:44,490 --> 00:13:52,779 the TCP/IP stack completely. With one-sided RDMA operations you even allow the 127 00:13:52,779 --> 00:13:59,740 initiator to read and write to arbitrary offsets within that allocated space on the 128 00:13:59,740 --> 00:14:06,880 target. I quote here a statement from the market leader for these high 129 00:14:06,880 --> 00:14:12,900 performance NICs: “Moreover, the caches of the remote CPU will not be filled with 130 00:14:12,900 --> 00:14:20,639 the accessed memory content.” Well, that’s not true anymore with DDIO and that’s 131 00:14:20,639 --> 00:14:28,540 exactly what we attacked. So you might ask yourself, “where is this RDMA used,” 132 00:14:28,540 --> 00:14:33,749 right? And I can tell you that RDMA is one of these technologies that you don’t hear about 133 00:14:33,749 --> 00:14:38,780 often but that are actually extensively used in the backends of the big data centres and 134 00:14:38,780 --> 00:14:45,509 cloud infrastructures. So you can get your own RDMA-enabled infrastructures from 135 00:14:45,509 --> 00:14:52,550 public clouds like Azure, Oracle Cloud, Huawei, or Alibaba. Also, file protocols 136 00:14:52,550 --> 00:14:59,230 like SMB and NFS can support RDMA. 
And other applications are High 137 00:14:59,230 --> 00:15:07,320 Performance Computing, Big Data, Machine Learning, Data Centres, Clouds, and so on. 138 00:15:07,320 --> 00:15:12,810 But let’s get a bit into detail about the research and how we abused the 2 139 00:15:12,810 --> 00:15:19,339 technologies. So we know now that we have a shared resource exposed to the network 140 00:15:19,339 --> 00:15:26,291 via DDIO, and RDMA gives us the necessary read and write primitives to launch such a 141 00:15:26,291 --> 00:15:34,310 cache attack over the network. But first, we needed to clarify some things. Of 142 00:15:34,310 --> 00:15:39,320 course, we did many experiments and extensively tested the DDIO port to 143 00:15:39,320 --> 00:15:44,630 understand the inner workings. But here, I brought with me like 2 major questions 144 00:15:44,630 --> 00:15:50,420 which we had to answer. So the first one is, of course: Can we distinguish a cache 145 00:15:50,420 --> 00:15:57,860 hit or miss over the network? After all, we still have network latency and packet queueing 146 00:15:57,860 --> 00:16:04,020 and so on. So would it be possible to actually get the timing right? Which is an 147 00:16:04,020 --> 00:16:09,040 absolute must for launching a side-channel attack. Well, the second question is 148 00:16:09,040 --> 00:16:14,240 then: Can we actually access the full Last Level Cache? This would correspond more to 149 00:16:14,240 --> 00:16:20,589 the attack surface that we actually have. So the first question, we can 150 00:16:20,589 --> 00:16:26,640 answer with this very simple experiment: So we have on the left a very small code 151 00:16:26,640 --> 00:16:33,180 snippet. We have a timed RDMA read to a certain offset. Then we write to that 152 00:16:33,180 --> 00:16:41,850 offset and we read again from the offset. 
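Editor's sketch: the read, write, read experiment described above boils down to separating two latency distributions with a threshold. The following Python sketch assumes made-up latency values and hypothetical helper names (`pick_threshold`, `is_cache_hit`); it does not issue real RDMA operations and is not the code shown on the slide.

```python
from statistics import median

def pick_threshold(miss_latencies, hit_latencies):
    """Midpoint between the medians of the two latency distributions.

    The median is used because a few delayed packets should not
    shift the threshold much.
    """
    return (median(miss_latencies) + median(hit_latencies)) / 2

def is_cache_hit(latency, threshold):
    """Classify one timed read: faster than the threshold counts as a hit."""
    return latency < threshold

# Illustrative latencies in nanoseconds (assumed values, not measurements).
first_reads = [2950, 3010, 2980, 3400, 2990]   # before the write: main memory
second_reads = [2710, 2690, 2740, 2700, 3100]  # after the write: Last Level Cache

threshold = pick_threshold(first_reads, second_reads)
hits = [is_cache_hit(t, threshold) for t in second_reads]
print(threshold, hits)
```

In this toy data set the last "second read" is a delayed packet, so it is misclassified as a miss, which is the kind of network noise a real measurement has to tolerate.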
So what you can see is that, when doing 153 00:16:41,850 --> 00:16:46,040 this like 50 000 times over multiple different offsets, you can clearly 154 00:16:46,040 --> 00:16:52,000 distinguish the two distributions. So the blue one corresponds to data that was 155 00:16:52,000 --> 00:16:58,149 fetched from main memory and the orange one to the data that was fetched from the Last 156 00:16:58,149 --> 00:17:03,250 Level Cache over the network. You can also see the effects of the network. For 157 00:17:03,250 --> 00:17:09,820 example, you can see the long tails which correspond to some packets that were 158 00:17:09,820 --> 00:17:16,430 slowed down in the network or were queued. So on a sidenote here for all the 159 00:17:16,430 --> 00:17:23,280 side-channel experts: We really need that write, because actually with DDIO reads do not 160 00:17:23,280 --> 00:17:30,290 allocate anything in the Last Level Cache. So basically, this is the building block 161 00:17:30,290 --> 00:17:36,030 to launch a prime+probe attack over the network. However, we still need to 162 00:17:36,030 --> 00:17:40,500 have a target that we can actually profile. So let’s see what kind of an 163 00:17:40,500 --> 00:17:46,350 attack surface we actually have. Which brings us to the question: Can we access 164 00:17:46,350 --> 00:17:51,470 the full Last Level Cache? And unfortunately, this is not the case. So 165 00:17:51,470 --> 00:17:58,930 DDIO has this allocation limitation of two ways. Here in the example out of 20 ways. 166 00:17:58,930 --> 00:18:08,080 So roughly 10%. These are not dedicated ways, so the CPU still uses them. But we would 167 00:18:08,080 --> 00:18:16,610 only have like access to 10% of the cache activity of the CPU in the Last Level Cache. 168 00:18:16,610 --> 00:18:22,560 So that would not work so well for a first attack. 
But the good news is that 169 00:18:22,560 --> 00:18:31,760 other PCIe devices—let’s say a second network card—will also use the same two 170 00:18:31,760 --> 00:18:38,780 cache ways. And with that, we have 100% visibility of what other PCIe devices are 171 00:18:38,780 --> 00:18:48,690 doing in the cache. So let’s look at the end-to-end attack! So as I told you 172 00:18:48,690 --> 00:18:54,050 before, we have this basic setup of a client and a server. And we have the 173 00:18:54,050 --> 00:19:01,470 machine that is controlled by us, the attackers. So the client just sends this 174 00:19:01,470 --> 00:19:06,770 packet over a normal Ethernet NIC and there is a second NIC attached to the 175 00:19:06,770 --> 00:19:15,410 server which allows the attacker to launch RDMA operations. So we also know now that 176 00:19:15,410 --> 00:19:19,960 all the keystrokes that the user is typing are 177 00:19:19,960 --> 00:19:25,540 sent in individual packets which are activated in the Last Level Cache through 178 00:19:25,540 --> 00:19:33,750 DDIO. But how can we actually now get these arrival times of packets? Because 179 00:19:33,750 --> 00:19:39,420 that’s what we are interested in! So now we have to look a bit more closely at how 180 00:19:39,420 --> 00:19:46,830 the arrival of network packets actually works. So the IP stack has a ring buffer 181 00:19:46,830 --> 00:19:52,960 which is basically there to have an asynchronous operation between the 182 00:19:52,960 --> 00:20:01,720 hardware—so the NIC—and the CPU. So if a packet arrives, it will be placed in 183 00:20:01,720 --> 00:20:07,530 the first ring buffer position. On the right-hand side you see the view of the 184 00:20:07,530 --> 00:20:13,700 attacker, who can just profile the cache activity. And we see that the cache line 185 00:20:13,700 --> 00:20:18,930 at position 1 lights up. So we see activity there. 
It could also be on cache 186 00:20:18,930 --> 00:20:24,750 line 2; we don’t know on which cache line this will actually pop up. But 187 00:20:24,750 --> 00:20:29,200 what is important is: What happens with the second packet? Because the second 188 00:20:29,200 --> 00:20:35,380 packet will also light up a cache line, but this time a different one. And it’s actually 189 00:20:35,380 --> 00:20:41,760 the cache line next to the one from the previous packet. And if we do this for 3 and 4 190 00:20:41,760 --> 00:20:51,310 packets, we can see that we suddenly have this nice staircase pattern. So now we 191 00:20:51,310 --> 00:20:56,940 have a predictable pattern that we can exploit to get information about when packets 192 00:20:56,940 --> 00:21:04,290 were received. And this is just because the ring buffer is allocated in a way that 193 00:21:04,290 --> 00:21:10,300 it doesn’t evict itself, right? If packet 2 arrives, it doesn’t 194 00:21:10,300 --> 00:21:16,660 evict the cache content of packet 1. Which is great for us as an attacker, 195 00:21:16,660 --> 00:21:22,260 because we can profile it well. Well, let’s look at a real-life example. So 196 00:21:22,260 --> 00:21:28,010 this is the cache activity when the server receives constant pings. You can see this 197 00:21:28,010 --> 00:21:34,750 nice staircase pattern and you can also see that the ring buffer reuses locations 198 00:21:34,750 --> 00:21:40,650 as it is a circular buffer. Here, it is important to know that the ring buffer 199 00:21:40,650 --> 00:21:48,940 doesn’t hold the data content, just the descriptor to the data. So this is reused. 200 00:21:48,940 --> 00:21:55,520 Unfortunately when the user types over SSH, the pattern is not as nice as this 201 00:21:55,520 --> 00:22:00,000 one here. Otherwise we would already have a done deal and could just work on 202 00:22:00,000 --> 00:22:05,780 this. When a user types, you will have more delays between packets. 
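Editor's sketch: the staircase pattern described above follows from consecutive packets using consecutive ring buffer positions, wrapping around at the end. The Python below is a toy model with hypothetical names (`descriptor_slots`, `follows_staircase`) and an assumed ring size of 8; real ring sizes and the mapping from descriptors to cache lines depend on the driver and hardware.

```python
def descriptor_slots(first_slot, packet_count, ring_size):
    """Ring buffer positions touched by consecutive packets (circular)."""
    return [(first_slot + i) % ring_size for i in range(packet_count)]

def follows_staircase(observed, ring_size):
    """True if each observed activation is exactly one slot after the previous one."""
    return all((b - a) % ring_size == 1 for a, b in zip(observed, observed[1:]))

# Four packets arriving when the ring is currently at slot 6 of 8:
slots = descriptor_slots(first_slot=6, packet_count=4, ring_size=8)
print(slots)                          # the ring wraps around after slot 7
print(follows_staircase(slots, 8))
print(follows_staircase([6, 0], 8))   # a skipped slot breaks the pattern
```

An attacker who has seen an activation at one slot can therefore predict where the next one should appear, which is what makes the pattern exploitable for timing packet arrivals.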
203 00:22:05,780 --> 00:22:11,470 Also, you generally don’t know when the user is typing, so you have to profile all 204 00:22:11,470 --> 00:22:16,060 the time to get the timings right. Therefore, we needed to build a somewhat more 205 00:22:16,060 --> 00:22:23,880 sophisticated pipeline. So it basically is a 2-stage pipeline which 206 00:22:23,880 --> 00:22:31,520 consists of an online tracker that is just looking at a bunch of cache lines that 207 00:22:31,520 --> 00:22:37,990 it’s observing all the time. And when it sees that certain cache lines were 208 00:22:37,990 --> 00:22:44,300 activated, it moves that window forward to the next position where it believes an 209 00:22:44,300 --> 00:22:50,260 activation will occur. The reason is that this gives us a speed advantage. So we need 210 00:22:50,260 --> 00:22:57,090 to profile much faster than the network packets of the SSH session are arriving. 211 00:22:57,090 --> 00:23:00,710 And what you can see here on the left-hand side is a visual output of what the 212 00:23:00,710 --> 00:23:07,260 online tracker does. So it just profiles this window which you can see in red. And 213 00:23:07,260 --> 00:23:15,030 if you look very closely, you can also see more lit-up lines in the middle, which 214 00:23:15,030 --> 00:23:19,690 correspond to arrived network packets. You can also see that there is plenty of 215 00:23:19,690 --> 00:23:27,280 noise involved, so we’re not able to get the packet 216 00:23:27,280 --> 00:23:35,250 arrival times from it directly. That’s why we need a second stage: the offline extractor. And 217 00:23:35,250 --> 00:23:40,590 the offline extractor is in charge of computing the most likely occurrence of 218 00:23:40,590 --> 00:23:46,010 the client’s SSH network packets. It uses the information from the online tracker and 219 00:23:46,010 --> 00:23:52,451 the predictable pattern of the ring buffer to do so. 
And then, it outputs the 220 00:23:52,451 --> 00:23:59,380 inter-packet arrival times for different words as shown here on the right. Great. So, now 221 00:23:59,380 --> 00:24:04,900 we’re again at the point where we have just packet arrival times but no words, 222 00:24:04,900 --> 00:24:10,040 which we need for breaking the confidentiality of your private SSH 223 00:24:10,040 --> 00:24:19,260 session. So, as I told you before, users or generally humans have distinctive 224 00:24:19,260 --> 00:24:27,330 typing patterns. And with that, we were able to launch a statistical attack. More 225 00:24:27,330 --> 00:24:33,060 precisely, we use machine learning to learn a mapping between user typing 226 00:24:33,060 --> 00:24:39,340 behaviour and actual words. So that in the end, we can output the two words that you 227 00:24:39,340 --> 00:24:48,090 were typing in your SSH session. So we used 20 subjects who were typing free and 228 00:24:48,090 --> 00:24:55,830 transcribed text which resulted in a total of 4 574 unique words. Each is 229 00:24:55,830 --> 00:25:01,230 represented as a point in a multi-dimensional space. And we used really 230 00:25:01,230 --> 00:25:06,431 simple machine learning techniques like the k-nearest neighbours algorithm which 231 00:25:06,431 --> 00:25:11,960 basically categorises the measurements by their Euclidean distance to other 232 00:25:11,960 --> 00:25:17,550 words. The reason why we just used like a very basic machine learning algorithm is 233 00:25:17,550 --> 00:25:21,330 that we just wanted to prove that the signal that we were extracting from the 234 00:25:21,330 --> 00:25:26,590 remote cache is actually strong enough to launch such an attack. So we didn’t want 235 00:25:26,590 --> 00:25:32,910 to improve in general, like, this kind of mapping between users and their typing 236 00:25:32,910 --> 00:25:40,050 behaviour. So let’s look at how this worked out! 
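Editor's sketch: a k-nearest neighbours classifier over inter-keystroke intervals, as described above, can be written in a few lines. Everything below is illustrative: the training vectors are made up, the helper name `knn_predict` is hypothetical, and it only compares words with the same number of intervals, which is a simplification rather than the authors' actual pipeline.

```python
from collections import Counter
from math import dist

def knn_predict(sample, training, k=3):
    """Return the majority word among the k training points closest to `sample`.

    `training` is a list of (interval_vector, word) pairs; closeness is
    plain Euclidean distance between interval vectors.
    """
    nearest = sorted(training, key=lambda item: dist(sample, item[0]))[:k]
    votes = Counter(word for _, word in nearest)
    return votes.most_common(1)[0][0]

# Two inter-key intervals (in seconds) per three-letter word; assumed values.
training = [
    ((0.10, 0.08), "the"), ((0.11, 0.09), "the"), ((0.12, 0.07), "the"),
    ((0.20, 0.15), "and"), ((0.21, 0.16), "and"), ((0.19, 0.14), "and"),
]

print(knn_predict((0.105, 0.085), training))
print(knn_predict((0.20, 0.15), training))
```

A top-10 style evaluation would return the ten closest distinct words instead of a single majority vote; that variant is left out here to keep the sketch short.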
So, firstly, on the left-hand side, 237 00:25:40,050 --> 00:25:47,090 you see we used our classifier on raw keyboard data. That means that we just used 238 00:25:47,090 --> 00:25:52,880 the signal that was emitted during the typing. So when they were typing on their 239 00:25:52,880 --> 00:25:58,900 local keyboard. Which gives us perfect and precise timing data. And we can see that 240 00:25:58,900 --> 00:26:02,450 this is already quite challenging to mount. So we have an accuracy of 241 00:26:02,450 --> 00:26:09,500 roughly 35%. Then there is the top-10 accuracy, which basically means: the attacker 242 00:26:09,500 --> 00:26:15,580 can guess 10 words, and if the correct word was among these 10 words, then that’s 243 00:26:15,580 --> 00:26:22,930 considered to be accurate. And with the top 10 guesses, we have an accuracy of 244 00:26:22,930 --> 00:26:30,750 58%. That’s just on the raw keyboard data. And then we used the same data and also 245 00:26:30,750 --> 00:26:35,730 the same classifier on the remote signal. And of course, this is less precise 246 00:26:35,730 --> 00:26:43,840 because we have noise factors and we could even add or miss out on keystrokes. And 247 00:26:43,840 --> 00:26:54,610 the accuracy is roughly 11% less and the top 10 accuracy is roughly 60%. So as we 248 00:26:54,610 --> 00:27:00,851 used a very basic machine learning algorithm, many subjects, and a relatively 249 00:27:00,851 --> 00:27:07,600 large word corpus, we believe that we can showcase that the signal is strong enough 250 00:27:07,600 --> 00:27:15,470 to launch such attacks. So of course, now we want to see this whole thing working, 251 00:27:15,470 --> 00:27:21,030 right? As I’m a bit nervous here on stage, I’m not going to do a live demo because it 252 00:27:21,030 --> 00:27:27,630 would involve me doing some typing which probably would confuse me and of 253 00:27:27,630 --> 00:27:34,060 course also the machine-learning model. Therefore, I brought a video with me. 
So 254 00:27:34,060 --> 00:27:39,890 here on the right-hand side, you see the victim. It will shortly begin 255 00:27:39,890 --> 00:27:45,480 an SSH session. And then on the left-hand side, you see the attacker. So 256 00:27:45,480 --> 00:27:51,260 mainly on the bottom you see this online tracker and on top you see the extractor 257 00:27:51,260 --> 00:27:58,080 and hopefully the predicted words. So now the victim starts this SSH session to 258 00:27:58,080 --> 00:28:04,720 the server called “father.” And the attacker, who is on the machine “son,” 259 00:28:04,720 --> 00:28:10,590 now launches this attack. So you saw we profiled the ring buffer location and now 260 00:28:10,590 --> 00:28:19,790 the victim starts to type. And as this pipeline takes a bit to process these words 261 00:28:19,790 --> 00:28:24,350 and to predict the right thing, you will shortly see, like slowly, the words 262 00:28:24,350 --> 00:28:41,600 popping up in the correct—hopefully the correct—order. And as you can see, we can 263 00:28:41,600 --> 00:28:48,010 correctly guess the right words over the network by just sending network packets to 264 00:28:48,010 --> 00:28:53,620 the same server. And with that, getting out the crucial information of when such 265 00:28:53,620 --> 00:29:05,450 SSH packets arrived. applause 266 00:29:05,450 --> 00:29:10,330 So now you might ask yourself: How do you mitigate against these things? Well, 267 00:29:10,330 --> 00:29:16,860 luckily it affects just server-grade processors, so no clients and so on. But then, from 268 00:29:16,860 --> 00:29:22,960 our viewpoint, the only true mitigation at the moment is to either disable DDIO or 269 00:29:22,960 --> 00:29:30,260 not use RDMA. Both come with quite a performance impact. So for DDIO, you are talking 270 00:29:30,260 --> 00:29:37,130 about roughly 10-18% less performance, depending, of course, on your application. 
271 00:29:37,130 --> 00:29:42,640 And if you decide to just not use RDMA, you will probably have to rewrite your whole 272 00:29:42,640 --> 00:29:50,500 application. So, Intel, in their publication on disclosure day, sounded a bit different 273 00:29:50,500 --> 00:30:00,430 on that. But read it for yourself! I mean, the meaning of “untrusted network” can, 274 00:30:00,430 --> 00:30:10,250 I guess, be quite debatable. And yeah. But it is what it is. So I’m very proud that 275 00:30:10,250 --> 00:30:17,420 we got accepted at Security and Privacy 2020. Also, Intel acknowledged our 276 00:30:17,420 --> 00:30:22,540 findings, public disclosure was in September, and we also got a bug bounty 277 00:30:22,540 --> 00:30:26,950 payment. someone cheering in crowd 278 00:30:26,950 --> 00:30:29,640 laughs Increased peripheral performance has 279 00:30:29,640 --> 00:30:36,550 forced Intel to place the Last Level Cache on the fast I/O path in its processors. 280 00:30:36,550 --> 00:30:43,250 And by this, it exposed even more shared microarchitectural components which we 281 00:30:43,250 --> 00:30:51,631 know by now have a direct security impact. Our research is the first DDIO 282 00:30:51,631 --> 00:30:55,730 side-channel vulnerability but we still believe that we just scratched the surface with 283 00:30:55,730 --> 00:31:03,320 it. Remember: There are more PCIe devices attached to them! So there could be 284 00:31:03,320 --> 00:31:10,900 storage devices—so you could profile cache activity of storage devices and so on! 285 00:31:10,900 --> 00:31:20,419 There are even such things as GPUDirect which gives you access to the GPU’s cache. 286 00:31:20,419 --> 00:31:25,740 But that’s a whole other story. So, yeah. I think there’s much more to discover on 287 00:31:25,740 --> 00:31:33,090 that side and stay tuned with that! All that is left to say is a massive “thank you” to 288 00:31:33,090 --> 00:31:38,480 you and, of course, to all the volunteers here at the conference. Thank you! 
289 00:31:38,480 --> 00:31:46,970 applause 290 00:31:46,970 --> 00:31:52,740 Herald: Thank you, Michael! We have time for questions. So you can line up behind 291 00:31:52,740 --> 00:31:58,220 the microphones. And I can see someone at microphone 7! 292 00:31:58,220 --> 00:32:02,720 Question: So, thank you for your talk! I had a question about—when I’m working on a 293 00:32:02,720 --> 00:32:08,920 remote machine using SSH, I’m usually not typing nice words like you’ve shown, but 294 00:32:08,920 --> 00:32:13,750 usually it’s weird bash things like dollar signs, and dashes, and I don’t know. Have 295 00:32:13,750 --> 00:32:18,120 you looked into that as well? Michael: Well, I think … I mean, of 296 00:32:18,120 --> 00:32:22,230 course: What we would’ve wanted to showcase is that we could leak passwords, 297 00:32:22,230 --> 00:32:27,720 right? If you would do “sudo” or whatever. The thing with passwords is 298 00:32:27,720 --> 00:32:35,620 that it’s kind of its own dynamic. So you type key… passwords differently than you 299 00:32:35,620 --> 00:32:40,470 type normal keywords. And then it gets a bit difficult because when you want to do 300 00:32:40,470 --> 00:32:45,870 a large study of how users would type passwords, you either ask them for their 301 00:32:45,870 --> 00:32:51,030 real password—which is not so ethical anymore—or you train them on different 302 00:32:51,030 --> 00:32:57,600 passwords. And that’s also difficult because they might adopt a different style 303 00:32:57,600 --> 00:33:03,180 of how they type these passwords than if it were the real password. And of course, 304 00:33:03,180 --> 00:33:09,580 the same would go for command line in general and we just didn’t have, like, the 305 00:33:09,580 --> 00:33:13,050 word corpus for it to launch such an attack. 306 00:33:13,050 --> 00:33:18,880 Herald: Thank you! Microphone 1! Q: Hi. Thanks for your talk! I’d like to 307 00:33:18,880 --> 00:33:27,180 ask: the original SSH timing attack paper is from, like, 2001?
308 00:33:27,180 --> 00:33:31,270 Michael: Yeah, exactly. Exactly! Q: And do you have some idea why there are 309 00:33:31,270 --> 00:33:37,650 no circumventions on the side of SSH clients to add some padding or some random 310 00:33:37,650 --> 00:33:41,980 delays or something like that? Do you have some idea why there’s nothing happening 311 00:33:41,980 --> 00:33:46,260 there? Is it some technical reason or what’s the deal? 312 00:33:46,260 --> 00:33:52,752 Michael: So, we also were afraid that between 2001 and nowadays, that they added 313 00:33:52,752 --> 00:33:59,360 some kind of a delay or batching or whatever. I’m not sure if it’s just a 314 00:33:59,360 --> 00:34:04,580 tradeoff between the interactiveness of your SSH session or if there’s, like, a 315 00:34:04,580 --> 00:34:09,450 true reason behind it. But what I do know is that it’s oftentimes quite difficult to 316 00:34:09,450 --> 00:34:15,649 add, like, these artificial packets in-between. Because if it’s, like, not random 317 00:34:15,649 --> 00:34:21,389 at all, you could even filter out, like, additional packets that just get inserted 318 00:34:21,389 --> 00:34:27,289 by the SSH. But other than that, I’m not familiar with anything, why they didn’t 319 00:34:27,289 --> 00:34:34,770 adapt, or why this wasn’t on their radar. Herald: Thank you! Microphone 4. 320 00:34:34,770 --> 00:34:42,389 Q: How much do you rely on the skill of the typers? So I think of a user that has 321 00:34:42,389 --> 00:34:49,220 to search each letter on the keyboard or someone that is distracted while typing, 322 00:34:49,220 --> 00:34:56,520 so not having a real pattern behind the typing. 323 00:34:56,520 --> 00:35:01,900 Michael: Oh, we’re actually absolutely relying on the pattern being reproducible.
As 324 00:35:01,900 --> 00:35:06,640 I said: We’re just using this very simple machine learning algorithm that just looks 325 00:35:06,640 --> 00:35:11,820 at the Euclidean distance of previous words that you were typing and a new word 326 00:35:11,820 --> 00:35:17,260 or the new arrival times that we were observing. And so if that is completely 327 00:35:17,260 --> 00:35:24,440 different, then the accuracy would drop. Herald: Thank you! Microphone 8! 328 00:35:24,440 --> 00:35:29,120 Q: As a follow-up to what was said before. Wouldn’t this make it a targeted attack 329 00:35:29,120 --> 00:35:33,220 since you would need to train the machine-learning algorithm exactly for the person 330 00:35:33,220 --> 00:35:40,340 that you want to extract the data from? Michael: So, yeah. Our goal of the 331 00:35:40,340 --> 00:35:47,410 research was not, like, to do next-level, let’s say machine-learning type of 332 00:35:47,410 --> 00:35:53,510 recognition of your typing behaviours. So we actually used the information about which 333 00:35:53,510 --> 00:36:01,310 user was typing to profile that correctly. But still I think you could 334 00:36:01,310 --> 00:36:06,540 maybe generalize. So there is other research showing that you can categorize 335 00:36:06,540 --> 00:36:12,740 users in different types of typers and if I remember correctly, they found that you 336 00:36:12,740 --> 00:36:20,260 can categorize each person into, like, 7 different typing, let’s say, categories. 337 00:36:20,260 --> 00:36:26,800 And I also know that some kind of online trackers are using your typing behaviour 338 00:36:26,800 --> 00:36:34,530 to re-identify you. So just to, like, serve you personalized ads, and so on. But 339 00:36:34,530 --> 00:36:41,400 still, I mean—we didn’t, like, want to go into that depth of improving the state of 340 00:36:41,400 --> 00:36:45,550 this whole thing. Herald: Thank you! And we’ll take a 341 00:36:45,550 --> 00:36:49,470 question from the Internet next!
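[Editor's illustration] The classifier Michael describes above (comparing the Euclidean distance between newly observed packet arrival times and the timings of previously profiled words) can be sketched roughly as a nearest-neighbour lookup. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `inter_arrival` and `predict_word`, the profile layout, the example words, and all timing values (in seconds) are hypothetical and chosen only for illustration.

```python
import math


def inter_arrival(times):
    """Turn absolute packet arrival times into gaps between consecutive packets."""
    return [b - a for a, b in zip(times, times[1:])]


def predict_word(profile, observed_times):
    """Return the profiled word whose timing vector is closest (Euclidean) to the observation.

    profile maps a word to a list of previously recorded inter-keystroke gap vectors.
    Only samples with the same number of gaps as the observation are compared.
    Returns None if no profiled sample has a matching length.
    """
    gaps = inter_arrival(observed_times)
    candidates = [
        (word, sample)
        for word, samples in profile.items()
        for sample in samples
        if len(sample) == len(gaps)
    ]
    if not candidates:
        return None
    best_word, _ = min(candidates, key=lambda c: math.dist(c[1], gaps))
    return best_word


# Hypothetical training data: gaps (seconds) between keystrokes for three-letter words.
profile = {
    "cat": [[0.12, 0.20], [0.11, 0.22]],
    "dog": [[0.25, 0.09]],
    "ssh": [[0.08, 0.08]],
}

# Hypothetical arrival times of three packets, one per keystroke.
observed = [10.00, 10.13, 10.34]
print(predict_word(profile, observed))  # prints "cat"
```

Because the lookup only compares against timings recorded earlier, it also shows why accuracy drops when a user's typing rhythm is not reproducible: an observation far from every profiled sample still gets mapped to whichever word happens to be nearest.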
Signal angel: Did you ever try this with a 342 00:36:49,470 --> 00:36:56,240 high-latency network like the Internet? Michael: So of course, we rely on a—let’s 343 00:36:56,240 --> 00:37:02,740 say—a constant latency. Because otherwise it would basically screw up our timing 344 00:37:02,740 --> 00:37:09,290 attack. So as we’re talking with RDMA, which is usually in datacenters, we also 345 00:37:09,290 --> 00:37:15,940 tested it in datacenter kind of topologies. It would make it, I guess, 346 00:37:15,940 --> 00:37:20,620 quite hard, which means that you would have to do a lot of repetition which is 347 00:37:20,620 --> 00:37:25,510 actually bad because you cannot tell the users “please retype what you just did 348 00:37:25,510 --> 00:37:32,730 because I have to profile it again,” right? So yeah, the answer is: No. 349 00:37:32,730 --> 00:37:39,520 Herald: Thank you! Mic 1, please. Q: If the victim pastes something into the 350 00:37:39,520 --> 00:37:44,760 SSH session. Would you be able to carry out the attacks successfully? 351 00:37:44,760 --> 00:37:51,200 Michael: No. This is … so if you paste stuff, this is just sent out as a batch 352 00:37:51,200 --> 00:37:54,310 when you enter. Q: OK, thanks! 353 00:37:54,310 --> 00:37:59,920 Herald: Thank you! The angels tell me there is a person behind mic 6 whom I’m 354 00:37:59,920 --> 00:38:03,020 completely unable to see because of all the lights. 355 00:38:03,020 --> 00:38:08,410 Q: So as far as I understood, the attacker can only see that some packet arrived on 356 00:38:08,410 --> 00:38:13,490 their NIC. So if there’s a second SSH session running simultaneously on the 357 00:38:13,490 --> 00:38:18,210 machine under attack, would this already interfere with this attack? 358 00:38:18,210 --> 00:38:23,910 Michael: Yeah, absolutely! So even distinguishing SSH packets from normal 359 00:38:23,910 --> 00:38:31,840 network packets is challenging.
So we use kind of a heuristic here because the thing 360 00:38:31,840 --> 00:38:37,505 with SSH is that it always sends two packets right after each other. So not only 1, but 361 00:38:37,505 --> 00:38:43,800 2. But I omitted this part for the simplicity of this talk. But we also rely 362 00:38:43,800 --> 00:38:48,990 on these kinds of heuristics to even filter out SSH packets. And if you would have a 363 00:38:48,990 --> 00:38:54,850 second SSH session, I can imagine that this would completely… so we cannot 364 00:38:54,850 --> 00:39:05,140 distinguish which SSH session it was. Herald: Thank you. Mic 7 again! 365 00:39:05,140 --> 00:39:11,760 Q: You always said you were using two connectors, like—what was it called? NICs? 366 00:39:11,760 --> 00:39:15,970 Michael: Yes, exactly. Q: Does it have to be two different ones? Can 367 00:39:15,970 --> 00:39:21,210 it be the same? Or how does it work? Michael: So in our setting we used one NIC 368 00:39:21,210 --> 00:39:27,461 that has the capability of doing RDMA. So in our case, this was Fabric, so 369 00:39:27,461 --> 00:39:31,950 InfiniBand. And the other was just like a normal Ethernet connection. 370 00:39:31,950 --> 00:39:36,910 Q: But could it be the same or could it be both over InfiniBand, for example? 371 00:39:36,910 --> 00:39:43,400 Michael: Yes, I mean … the thing with InfiniBand: It doesn’t use the ring buffer 372 00:39:43,400 --> 00:39:49,720 so we would have to come up with a different kind of tracking ability to get 373 00:39:49,720 --> 00:39:54,020 this. Which could even get a bit more complicated because it does this kernel 374 00:39:54,020 --> 00:39:58,730 bypass. But if there’s a predictable pattern, we could potentially also do 375 00:39:58,730 --> 00:40:03,730 this. Herald: Thank you. Mic 1? 376 00:40:03,730 --> 00:40:08,840 Q: Hello again!
I would like to ask, I know it was not the main focus of your 377 00:40:08,840 --> 00:40:13,710 study, but do you have some estimation how practical this can be, this timing attack? 378 00:40:13,710 --> 00:40:20,050 Like, if you do, like, real-world simulation, not the, like, prepared one? 379 00:40:20,050 --> 00:40:23,190 How big a problem can it really be? What would you think, like, what’s 380 00:40:23,190 --> 00:40:27,170 the state-of-the-art in this field? How do you feel the risk? 381 00:40:27,170 --> 00:40:30,300 Michael: You’re just referring to the typing attack, right? 382 00:40:30,300 --> 00:40:34,330 Q: Timing attack. SSH timing. Not necessarily the cache version. 383 00:40:34,330 --> 00:40:40,500 Michael: So, the original research that was conducted has been out there since 2001. And 384 00:40:40,500 --> 00:40:45,900 since then, many researchers have shown that it’s possible to launch such typing 385 00:40:45,900 --> 00:40:52,180 attacks over different scenarios, for example JavaScript is another one. It’s 386 00:40:52,180 --> 00:40:56,820 always a bit difficult to judge because most of the researchers are using different 387 00:40:56,820 --> 00:41:03,340 datasets so it’s difficult to compare. But I think in general, I mean, we have used, 388 00:41:03,340 --> 00:41:09,400 like, quite a large word corpus and it still worked. Not super-precisely, but it 389 00:41:09,400 --> 00:41:15,910 still worked. So yeah, I do believe it’s possible. But to even make it a real-world 390 00:41:15,910 --> 00:41:21,210 attack where an attacker wants to have high accuracy, he probably would need a 391 00:41:21,210 --> 00:41:25,950 lot of data and even, like, more sophisticated techniques. Which there are. 392 00:41:25,950 --> 00:41:29,970 So there are a couple of other machine- 393 00:41:29,970 --> 00:41:34,180 learning techniques that you could use which have their pros and cons. Q: Thanks. 394 00:41:34,180 --> 00:41:39,750 Herald: Thank you!
Ladies and Gentlemen—the man who named an attack 395 00:41:39,750 --> 00:41:44,737 NetCAT: Michael Kurth! Give him a round of applause, please! 396 00:41:44,737 --> 00:41:58,042 applause Michael: Thanks a lot! 397 00:41:57,048 --> 00:42:01,400 36C3 postroll music 398 00:42:01,400 --> 00:42:16,000 Subtitles created by c3subtitles.de in the year 2020. Join, and help us!