(Music)

Herald Angel: We are here with a motto, and the motto of this year is "Works For Me", and I think... how many people in here are programmers? Raise your hands or shout or... Whoa, that's a lot. Okay. So I think many of you work on x86. Yeah. And I think you assume that it works, and that everything works as intended. And I mean: what could go wrong? Our next talk, the first one today, will be by Clémentine Maurice, who previously was here with Rowhammer.js, something I would call scary, and Moritz Lipp, who has worked on the ARMageddon attack. Okay, so... I would like to hear a really warm applause for the speakers for the talk "What could possibly go wrong with insert x86 instruction here?" Thank you.

(Applause)

Clémentine Maurice (CM): Well, thank you all for being here this morning. Yes, this is our talk "What could possibly go wrong with insert x86 instruction here". Just a few words about ourselves: I'm Clémentine Maurice, I got my PhD last year in computer science, and I'm now working as a postdoc at Graz University of Technology in Austria. You can reach me on Twitter or by email, but there's also, I think, lots of time before the Congress is over.

Moritz Lipp (ML): Hi, my name is Moritz Lipp, I'm a PhD student at Graz University of Technology, and you can also reach me on Twitter, or just after our talk and in the next days.
CM: So, about this talk: the title says this is a talk about x86 instructions, but this is not a talk about software. Don't leave yet! I'm actually even assuming safe software, and the point that we want to make is that safe software does not mean safe execution: we have information leakage because of the underlying hardware, and this is what we're going to talk about today. So we'll be talking about cache attacks: what they are, what we can do with them, and also a special kind of cache attack that we found this year, doing cache attacks without memory accesses, and how to use that even to bypass kernel ASLR. So again, the title says it's a talk about x86 instructions, but this is even more global than that. We can also mount these cache attacks on ARM, not only on x86, so some of the examples that you will see also apply to ARM. Today we'll have a bit of background, but actually most of the background will come along the way, because this covers a really huge chunk of our research. We'll see mainly three instructions: "mov", and how we can perform these cache attacks and what they are... The instruction "clflush", where we'll be doing cache attacks without any memory accesses.
Then we'll see "prefetch", and how we can bypass kernel ASLR and lots of translation levels, and then there's even a bonus track. That will not be our work, but even more instructions and even more attacks.

Okay, so let's start with a bit of an introduction. We will be mainly focusing on Intel CPUs, and this is roughly how it looks today in terms of cores and caches. We have different cores, here four cores, and different levels of caches, usually three levels. Level 1 and level 2 are private to each core, which means that core 0 can only access its own level 1 and level 2, and not the level 1 and level 2 of, for example, core 3. And we have the last-level cache. This one is divided into slices, and we have as many slices as cores, so here four slices, but all the slices are shared across cores: core 0 can access the whole last-level cache, slices 0, 1, 2 and 3. We also have a nice property on Intel CPUs: this level of cache is inclusive, which means that everything that is contained in level 1 and level 2 will also be contained in the last-level cache, and this will prove to be quite useful for cache attacks. Today we mostly have set-associative caches, which means that data is loaded into specific sets, depending only on its address.
So we have some bits of the address that give us the index and say "okay, the line is going to be loaded into this cache set". Then we have several ways per set, here four ways, and the cache line is going to be loaded into a specific way, which depends only on the replacement policy and not on the address itself. When you load a line into the cache, usually the cache is already full, and you have to make room for a new line. This is what the replacement policy does: it says "okay, I'm going to remove this line to make room for the next line". So today we're going to see only three instructions, as I've been telling you. The "mov" instruction does a lot of things, but the only aspect we're interested in is that it can access data in main memory. We're going to see "clflush": what it does is remove a cache line from the whole cache. And we're going to see "prefetch": it prefetches a cache line for future use. So we're going to see what they do, the kind of side effects that they have, and all the attacks that we can do with them. And that's basically all the x86 you need for today, so even if you're not an expert on x86, don't worry, it's not just slides full of assembly and stuff. Okay, so on to the first one.
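(Before moving on: the set-and-way behaviour just described can be sketched as a toy model. The parameters below, 64-byte lines, 64 sets, 4 ways, are illustrative assumptions, not the geometry of any particular CPU.)

```python
# Toy model of a set-associative cache as described above.
# Parameters are illustrative only.
LINE_SIZE = 64
N_SETS = 64
N_WAYS = 4

def set_index(addr):
    """The set depends only on the address: drop the line-offset
    bits, then take the next log2(N_SETS) bits."""
    return (addr // LINE_SIZE) % N_SETS

class CacheSet:
    """One set with LRU replacement: which way a line lands in depends
    only on the replacement policy, never on the address."""
    def __init__(self):
        self.lines = []  # least recently used first

    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == N_WAYS:
            self.lines.pop(0)  # evict the least recently used line
        self.lines.append(tag)
        return hit
```

In this toy model, addresses LINE_SIZE * N_SETS = 4096 bytes apart collide in the same set, so accessing five such addresses evicts the first one, which is exactly the effect Prime+Probe exploits.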
ML: So we will first start with the "mov" instruction, and actually the first slide is full of code. As you can see, the mov instruction is used to move data from registers to registers, from main memory and back to main memory, and there are many movs you can use, but basically it's just there to move data, and that's all we need to know. In addition, a lot of exceptions can occur, so we can assume that the restrictions are so tight that nothing can go wrong when you just move data, because moving data is simple. However, while there are a lot of exceptions, the data that is accessed is always loaded into the cache, and this is transparent to the program that is running. There are side effects when you run these instructions, and we will see what they look like with the mov instruction. You probably all know that data can either be in CPU registers, in the different levels of the cache that Clémentine showed you earlier, in main memory, or on disk, and depending on where the data is located, it takes a longer time to be loaded back to the CPU. This is what we can see in this plot: we try to measure the access time of an address over and over again, assuming that when we access it more often, it is already stored in the cache.
So most of the time, when we load an address and it takes around 70 cycles, we can assume it's loaded from the cache. However, when the data is loaded from main memory, we can clearly see that it needs a much longer time, a bit more than 200 cycles. So depending on the time it takes to load the address, we can say the data has been loaded from the cache, or the data was still located in main memory. And this property is what we can exploit using cache attacks: we measure the timing differences on memory accesses. What an attacker does is monitor cache lines, but he has no way to know what's actually the content of a cache line. We can only monitor that a cache line has been accessed, not what's actually stored in it. What you can do with this is implement covert channels, so you can allow two processes to communicate with each other, evading the permission system, which we will see later on. In addition, you can also do side-channel attacks: you can spy with a malicious attacking application on benign processes, and you can use this to steal cryptographic keys or to spy on keystrokes. We have different types of cache attacks, and I want to explain the most popular one, the "Flush+Reload" attack, first.
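(The hit-versus-miss distinction above reduces to a threshold test. A minimal sketch; the two peaks, roughly 70 cycles for a cache hit and 200+ for main memory, are the values from this talk's plot, and a real attack would calibrate the threshold per machine.)

```python
# Classify one timed memory access as cache hit or miss.
# 150 is an assumed threshold between the two measured peaks.
CACHE_HIT_THRESHOLD = 150  # cycles

def was_cached(access_cycles):
    """True if the access time indicates the data was in the cache."""
    return access_cycles < CACHE_HIT_THRESHOLD
```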
On the left, you have the address space of the victim, and on the right you have the address space of the attacker, who maps a shared library (an executable) that the victim is using into its own address space, like the red rectangle. This means that when this data is stored in the cache, it's cached for both processes. Now the attacker can use the flush instruction to remove the data from the cache, so it's not in the cache anymore, and thus it's also not cached for the victim. Now the attacker can schedule the victim, and if the victim decides "yeah, I need this data", it will be loaded back into the cache. Then the attacker can reload the data, measure how long it took, and decide "okay, the victim has accessed the data in the meantime" or "the victim has not accessed the data in the meantime". By that you can spy on whether this address has been used.

The second type of attack is called "Prime+Probe", and it does not rely on shared memory like the "Flush+Reload" attack. It works as follows: instead of mapping anything into its own address space, the attacker loads a lot of data into one cache set, here, and fills the cache. Now he again schedules the victim, and the victim can access data that maps to the same cache set. So the cache set is used by the attacker and the victim at the same time.
Now the attacker can start measuring the access time to the addresses he loaded into the cache before. When he accesses an address that is still in the cache, it's faster, so he measures a lower time. If it's not in the cache anymore, it has to be reloaded into the cache, so it takes a longer time. He can sum this up and detect whether the victim has loaded data into the cache as well.

So the first thing we want to show you that you can do with cache attacks is implement a covert channel, and this could happen in the following scenario: You install an app on your phone to view your favorite images and to apply some filters, and you don't know that it's malicious, because the only permission it requires is access to your images, which makes sense. So you can easily install it without any fear. In addition, you want to know what the weather is outside, so you install a nice little weather widget, and the only permission it has is to access the internet, because it has to load the information from somewhere. So what happens if you're able to implement a covert channel between these two applications, without any permissions and privileges, so they can communicate with each other without using any mechanisms provided by the operating system? It's hidden. It can happen that the gallery app sends the image to the internet, where it will be uploaded and exposed for everyone.
So maybe you don't want to see that cat picture everywhere. While we can do this with both Prime+Probe and Flush+Reload attacks, we will discuss a covert channel using Prime+Probe. So how can we transmit data? We need to transmit ones and zeros at some point. The sender and the receiver agree on one cache set that they both use, and the receiver probes the set all the time. When the sender wants to transmit a zero, he just does nothing, so the lines of the receiver stay in the cache all the time, and the receiver knows "okay, he's sending nothing, so it's a zero". On the other hand, if the sender wants to transmit a one, he starts accessing addresses that map to the same cache set, so it will take a longer time for the receiver to access its addresses again, and he knows "okay, the sender just sent me a one". Clémentine will show you what you can do with this covert channel.

CM: The really nice thing about Prime+Probe is that it has really low requirements. It doesn't need any kind of shared memory. For example, if you have two virtual machines, you could have some shared memory via memory deduplication; the thing is that this is highly insecure, so cloud providers like Amazon EC2 disable it. We can still use Prime+Probe, because it doesn't need this shared memory. Another problem with cache covert channels is that they are quite noisy.
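The bit encoding just described can be sketched as a toy simulation. The 4-way shared cache set and the tag names below are made-up modelling assumptions, not real hardware behaviour:

```python
# Toy simulation of the Prime+Probe bit encoding described above.
N_WAYS = 4
SENDER = ["S0", "S1", "S2", "S3"]    # sender's addresses mapping to the set
RECEIVER = ["R0", "R1", "R2", "R3"]  # receiver's addresses, same set

class SharedSet:
    """LRU cache set that both processes' addresses map into."""
    def __init__(self):
        self.lines = []
    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == N_WAYS:
            self.lines.pop(0)  # evict least recently used
        self.lines.append(tag)
        return hit

def probe(cache):
    """Access all receiver lines, counting misses (slow accesses)."""
    return sum(not cache.access(tag) for tag in RECEIVER)

def send_bit(cache, bit):
    # A 1 is sent by touching the set (evicting the receiver's lines);
    # a 0 is sent by doing nothing.
    if bit:
        for tag in SENDER:
            cache.access(tag)

def transmit(bits):
    cache = SharedSet()
    received = []
    for b in bits:
        probe(cache)                          # prime: fill set with receiver lines
        send_bit(cache, b)                    # sender acts (or not)
        received.append(1 if probe(cache) else 0)  # probe: misses mean a 1
    return received
```

In this noise-free model every bit arrives intact; the error detection and correction discussed next exist precisely because a real cache is shared with everything else on the machine.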
When you have other applications that are also running on the system, they are all competing for the cache, and they might evict some cache lines, especially if it's an application that is very memory-intensive. You also have noise due to the fact that the sender and the receiver might not be scheduled at the same time: if your sender sends all the things while the receiver is not scheduled, then some part of the transmission can get lost. So what we did is try to build an error-free covert channel. We took care of all these noise issues by using some error detection to resynchronize the sender and the receiver, and then we used some error correction to correct the remaining errors. So we managed to have a completely error-free covert channel even with a lot of noise: let's say another virtual machine on the same physical machine is serving files through a web server and doing lots of memory-intensive tasks at the same time, the covert channel stayed completely error-free, at around 40 to 75 kilobytes per second, which is still quite a lot. All of this is between virtual machines on Amazon EC2. And the really neat thing, we wanted to do something with that, is that we basically managed to create an SSH connection really over the cache. The two virtual machines don't have any network between them, but just by sending the zeros and the ones we have an SSH connection between them.
So you could say that cache covert channels are nothing, but I think this shows it's a real threat. If you want more details about this work in particular, it will be published soon at NDSS.

The second application that we wanted to show you is that we can attack crypto with cache attacks. In particular, we are going to show an attack on AES, on a special implementation of AES that uses T-tables. That's the fast software implementation, because it uses some precomputed lookup tables. It has been known to be vulnerable to side-channel attacks since 2006, by Osvik et al., and it's a one-round known-plaintext attack: you have p, your plaintext, and k, your secret key, and the AES algorithm computes an intermediate state at each round r. In the first round, the accessed table indices are just p XOR k. Now, it's a known-plaintext attack, which means that if you can recover the accessed table indices, you have also managed to recover the key, because it's just an XOR. So that would be bad, right, if we could recover these accessed table indices? Well, we can, with cache attacks! We did that with Flush+Reload and with Prime+Probe. On the x-axis you have the plaintext byte values, and on the y-axis you have the addresses, which are essentially the T-table entries. A black cell means that we've monitored the cache line and seen a lot of cache hits. So basically, the blacker it is, the more certain we are that the T-table entry has been accessed.
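Since the first-round indices are just p XOR k, recovering a key byte from an observed index is a single XOR. A minimal sketch; in reality the attack only resolves indices at cache-line granularity, which is why only the upper key bits are recovered directly:

```python
def first_round_index(p, k):
    """Table index accessed in AES round 1 for plaintext byte p, key byte k."""
    return p ^ k

def recover_key_byte(observations):
    """observations: (plaintext_byte, observed_index) pairs.
    Each pair yields k = p ^ index; all pairs must agree."""
    candidates = {p ^ idx for p, idx in observations}
    assert len(candidates) == 1
    return candidates.pop()

def recover_key_high_nibble(observations):
    """A cache attack only sees the cache *line*, i.e. (p ^ k) >> 4 for
    16-entry lines, so only the upper 4 key bits are recovered this way.
    observations: (plaintext_byte, observed_cache_line) pairs."""
    candidates = {(p >> 4) ^ line for p, line in observations}
    assert len(candidates) == 1
    return candidates.pop()
```

This matches the "upper four bits of a key byte" measured later in the talk: the cache-line granularity hides the low bits of the index.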
In the figure, it's a toy example: the key is all zeros, but you would basically just have a different pattern if the key was not all zeros, and as long as you can see this nice diagonal, or a pattern, you have recovered the key. So it's an old attack, from 2006; it's been 10 years, everything should be fixed by now, and you see where I'm going: it's not. On Android, the Bouncy Castle implementation uses the T-table implementation by default, so that's bad. Also, many implementations that you can find online use precomputed values, so be wary about this kind of attack.

The last application we wanted to show you is how we can spy on keystrokes. For that we will use Flush+Reload, because it's a really fine-grained attack: we can see very precisely which cache line has been accessed, and a cache line is only 64 bytes, so it's really not a lot. We're going to use that to spy on keystrokes, and we even have a small demo for you.
ML: What you can see on the screen is not an Intel x86, it's a smartphone, a Galaxy S6, but you can also apply these cache attacks there; that's what we want to emphasize. On the left you see the screen, and on the right we have connected a shell with no privileges and permissions, so it can basically be an app that you install from the app store. On the right we are going to start our spy tool, and on the left we just open the messenger app, and whenever the user hits any key on the keyboard, our spy tool notices that. If he presses the spacebar, we can also measure that. If the user decides "okay, I want to delete the word" because he changed his mind, we can also register that the user pressed the backspace button. So in the end we can see exactly how long the words were that the user typed into his phone, without any permissions and privileges, which is bad.

(Laughter)
(Applause)

ML: So, enough about the mov instruction, let's head to clflush.

CM: So, the clflush instruction: what it does is invalidate, from every cache level, the cache line that contains the address you pass to it. In itself it's kind of bad, because it enables the Flush+Reload attack that we showed earlier: that was just flush, reload, and the flush part is done with clflush. But there's actually more to it, how wonderful.
So there's a first timing leakage: the clflush instruction has a different timing depending on whether the data you pass to it is cached or not. Imagine you have a cache line that is in level 1; with the inclusion property, it also has to be in the last-level cache. Now this is quite convenient, and this is also why we have this inclusion property on Intel CPUs, for performance reasons: if you want to see whether a line is present at all in the cache, you just have to look in the last-level cache. This is basically what the clflush instruction does. It goes to the last-level cache, sees "okay, there's a line, I'm going to flush this one", and then there's something that says the line is also present somewhere else, so it flushes the line in level 1 and/or level 2 as well. So that's slow. Now, if you perform clflush on some data that is not cached, it basically does the same: it goes to the last-level cache and sees that there's no line, and the data can't be anywhere else in the cache, because it would be in the last-level cache if it was anywhere. So it does nothing and stops there. That's fast. So exactly how fast and slow am I talking about? It's actually only a very few cycles. We did these experiments on different microarchitectures: Sandy Bridge, Ivy Bridge and Haswell. The different colors correspond to the different microarchitectures.
The first thing that is already kind of funny is that you can distinguish the microarchitectures quite nicely with this, but the real point is that you have really different zones. The solid line is when we performed the measurement of clflush on a line that was already in the cache, and the dashed line is when the line was not in the cache, and on all microarchitectures you can see a difference. It's only a few cycles, it's a bit noisy, so what could go wrong? Well, exploiting these few cycles, we still managed to perform a new cache attack that we call "Flush+Flush", and I'm going to explain it to you. Basically, everything that we could do with Flush+Reload, we can also do with Flush+Flush: we can build covert channels and side-channel attacks. It's stealthier than previous cache attacks, and I'm going to come back to this, and it's also faster than previous cache attacks. So how does it work exactly? The principle is a bit similar to Flush+Reload: we have the attacker and the victim that have some kind of shared memory, let's say a shared library.
It will be shared in the cache. The attacker starts by flushing the cache line, then lets the victim perform whatever it does, let's say an encryption. The victim will load some data into the cache, automatically, and now the attacker wants to know again whether the victim accessed this precise cache line, and instead of reloading it, he is going to flush it again. Since we have this timing difference depending on whether the data is in the cache or not, it gives us the same information as if we reloaded it, except it's way faster. So, I talked about stealthiness. The thing is that these cache attacks, and that also applies to Rowhammer, are already stealthy in themselves, because there's no antivirus today that can detect them. But some people thought that we could detect them with performance counters, because of the many cache misses and cache references that happen when data is flushed and when you re-access memory. Now, what we thought is: yes, but other programs also trigger lots of cache misses and cache references, so we would like to have a better metric.
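The Flush+Flush decision step itself can be sketched as a toy simulation. The cycle numbers and threshold below are invented for illustration; on real hardware the gap is only a few cycles and noisy, as the plots show:

```python
# Toy model of the Flush+Flush observation: clflush takes a few cycles
# longer when the line is cached (it must also be evicted from the
# lower levels). All cycle numbers here are made up for illustration.
FLUSH_CACHED_CYCLES = 12
FLUSH_UNCACHED_CYCLES = 9
FLUSH_THRESHOLD = 10

def flush(cache, line):
    """Flush a line and return the simulated latency."""
    if line in cache:
        cache.discard(line)
        return FLUSH_CACHED_CYCLES
    return FLUSH_UNCACHED_CYCLES

def probe(cache, line):
    # One Flush+Flush round: the flush itself is the measurement, so
    # the attacker never reloads the line (no memory access of his own).
    return flush(cache, line) > FLUSH_THRESHOLD
```

Note that probing leaves the line flushed, so the attack loop needs no separate flush step, which is why it is both faster and quieter on the cache-miss counters.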
These cache attacks have a very heavy activity on the cache, but they're also very particular, because they are very short loops of code: if you take Flush+Reload, it just flushes one line, reloads the line, and then again flush, reload. That's a very short loop, and it creates a very low pressure on the instruction TLB, which is quite characteristic of cache attacks. So what we decided to do is normalize the cache events, the cache misses and cache references, by events that have to do with the instruction TLB, and with that we managed to detect cache attacks and Rowhammer without having false positives. So that is the metric I'm going to use when I talk about stealthiness. We started by creating a covert channel. First we wanted to have it as fast as possible, so we created a protocol to evaluate all the kinds of cache attacks that we had, Flush+Flush, Flush+Reload and Prime+Probe, and we started with a packet size of 28, which doesn't really matter. We measured the capacity of our covert channel: Flush+Flush is around 500 kB/s, whereas Flush+Reload was only 300 kB/s, so Flush+Flush is already quite an improvement in speed. Then we measured the stealthiness: at this speed, only Flush+Flush was stealthy. Now, Flush+Flush and Flush+Reload, as you've seen, have some similarities, so as covert channels they also share the same sender and only the receiver is different, and here the sender was not stealthy for either of them. Anyway,
if you want a fast covert channel, just try Flush+Flush; that works. Now let's try to make it completely stealthy, because if the sender is not stealthy, maybe we give away the whole attack. So we said: okay, maybe if we just slow down all the attacks, there will be fewer cache hits and cache misses, and then maybe all the attacks are actually stealthy, why not? So we tried that: we slowed down everything, so Flush+Reload and Flush+Flush are around 50 kB/s now; Prime+Probe is a bit slower, because it takes more time to prime and probe anything. But still, even with this slowdown, only Flush+Flush has its receiver stealthy, and we also managed to have the sender stealthy now. So basically, whether you want a fast covert channel or a stealthy covert channel, Flush+Flush is really great. Now we also wanted to evaluate whether it wasn't too noisy to perform side-channel attacks, so we did the side channel on the AES T-table implementation, the attack that we have shown you earlier. We computed how many encryptions we needed to determine the upper four bits of a key byte, so here, the lower, the better the attack. Flush+Reload is a bit better, we need only 250 encryptions to recover these bits, but Flush+Flush comes quite close with 350, and Prime+Probe is actually the noisiest of them all; it needs
close to 5000 encryptions. So we have[br]around the same performance for 0:29:06.101,0:29:13.520 Flush+Flush and Flush+Reload.[br]Now let's evaluate the stealthiness again. 0:29:13.520,0:29:19.320 So what we did here is we perform 256[br]million encryptions in a synchronous 0:29:19.320,0:29:25.740 attack, so we really had the spy and the[br]victim synchronized, and we evaluated the 0:29:25.740,0:29:31.409 stealthiness of them all, and here only[br]Flush+Flush again is stealth. And while 0:29:31.409,0:29:36.279 you can always slow down a covert channel,[br]you can't actually slow down a side 0:29:36.279,0:29:40.700 channel because, in a real-life scenario,[br]you're not going to say "Hey victim, 0:29:40.700,0:29:47.179 wait for me a bit, I am trying to do an[br]attack here." That won't work. 0:29:47.179,0:29:51.429 So there's even more to it, but I will need[br]again a bit of background before 0:29:51.429,0:29:56.910 continuing. So I've shown you the[br]different levels of caches and here I'm 0:29:56.910,0:30:04.009 going to focus more on the last-level[br]cache. So we have here our four slices, so 0:30:04.009,0:30:09.830 this is the last-level cache, and we have[br]some bits of the address here that 0:30:09.830,0:30:14.330 correspond to the set, but more[br]importantly, we need to know in 0:30:14.330,0:30:19.899 which slice an address is going to be.[br]And that is given by some 0:30:19.899,0:30:23.850 bits of the set and the tag of the[br]address that are passed into a function 0:30:23.850,0:30:27.960 that says in which slice the line is going[br]to be. 0:30:27.960,0:30:32.460 Now the thing is that this hash function[br]is undocumented by Intel. Wouldn't be fun 0:30:32.460,0:30:39.250 otherwise.
So we have this: as many slices[br]as cores, an undocumented hash function 0:30:39.250,0:30:43.980 that maps a physical address to a slice,[br]and while it's actually a bit of a pain 0:30:43.980,0:30:48.710 for attacks, it was not designed[br]for security originally but for 0:30:48.710,0:30:53.570 performance, because you want all the[br]accesses to be evenly distributed in the 0:30:53.570,0:31:00.399 different slices, for performance reasons.[br]So what the hash function basically does is it 0:31:00.399,0:31:05.279 takes some bits of the physical address[br]and outputs k bits for the slice, so just one 0:31:05.279,0:31:09.309 bit if you have a two-core machine, two[br]bits if you have a four-core machine and 0:31:09.309,0:31:16.830 so on. Now let's go back to clflush, see[br]what's the relation with that. 0:31:16.830,0:31:21.169 So the thing that we noticed is that[br]clflush is actually faster to flush a line 0:31:21.169,0:31:28.549 on the local slice.[br]So if you're always flushing 0:31:28.549,0:31:33.340 one line and you run your program on core[br]zero, core one, core two and core three, 0:31:33.340,0:31:37.899 you will observe that on one core in[br]particular, when you run the program on 0:31:37.899,0:31:44.632 that core, the clflush is faster. And so[br]here this is on core one, and you can see 0:31:44.632,0:31:51.139 that on core zero, two, and three it's[br]a bit slower, and here, 0:31:51.139,0:31:55.320 since we run the program on core one and we[br]always flush the same line, we can 0:31:55.320,0:32:01.850 deduce that the line belongs to slice one.[br]And what we can do with that is that we 0:32:01.850,0:32:06.500 can map physical addresses to slices.[br]And that's one way to reverse-engineer 0:32:06.500,0:32:10.639 this addressing function that was not[br]documented.
0:32:10.639,0:32:15.880 Funnily enough that's not the only way:[br]What I did before that was using the 0:32:15.880,0:32:21.229 performance counters to reverse-engineer[br]this function, but that's actually a whole 0:32:21.229,0:32:27.770 other story and if you want more detail on[br]that, there's also an article on that. 0:32:27.770,0:32:30.139 ML: So the next instruction we want to 0:32:30.139,0:32:35.110 talk about is the prefetch instruction.[br]And the prefetch instruction is used to 0:32:35.110,0:32:40.841 tell the CPU: "Okay, please load the data[br]I need later on, into the cache, if you 0:32:40.841,0:32:45.968 have some time." And in the end there are[br]actually six different prefetch 0:32:45.968,0:32:52.929 instructions: prefetcht0 to t2 which[br]means: "CPU, please load the data into the 0:32:52.929,0:32:58.640 first-level cache", or in the last-level[br]cache, whatever you want to use, but we 0:32:58.640,0:33:02.250 spare you the details because it's not so[br]interesting in the end. 0:33:02.250,0:33:06.940 However, what's more interesting is when[br]we take a look at the Intel manual and 0:33:06.940,0:33:11.880 what it says there. So, "Using the[br]PREFETCH instruction is recommended only 0:33:11.880,0:33:17.049 if data does not fit in the cache." So you[br]can tell the CPU: "Please load data I want 0:33:17.049,0:33:23.210 to stream into the cache, so it's more[br]performant." "Use of software prefetch 0:33:23.210,0:33:27.740 should be limited to memory addresses that[br]are managed or owned within the 0:33:27.740,0:33:33.620 application context."[br]So one might wonder what happens if this 0:33:33.620,0:33:40.940 address is not managed by myself. Sounds[br]interesting. "Prefetching to addresses 0:33:40.940,0:33:46.289 that are not mapped to physical pages can[br]experience non-deterministic performance 0:33:46.289,0:33:52.030 penalty. 
For example specifying a NULL[br]pointer as an address for prefetch can 0:33:52.030,0:33:56.000 cause long delays."[br]So we don't want to do that, because our 0:33:56.000,0:34:02.919 program will be slow. So, let's take a[br]look at what they mean by non-deterministic 0:34:02.919,0:34:08.889 performance penalty, because we want to[br]write good software, right? But before 0:34:08.889,0:34:12.510 that, we have to take a look at a little[br]bit more background information to 0:34:12.510,0:34:17.710 understand the attacks.[br]So on modern operating systems, every 0:34:17.710,0:34:22.850 application has its own virtual address[br]space. So at some point, the CPU needs to 0:34:22.850,0:34:27.479 translate these addresses to the physical[br]addresses actually in the DRAM. And for 0:34:27.479,0:34:33.690 that we have this very complex-looking[br]data structure. So we have a 48-bit 0:34:33.690,0:34:40.409 virtual address, and some of those bits[br]map to a table, like the Page Map Level 4 0:34:40.409,0:34:47.760 table, with 512 entries, so depending on[br]those bits the CPU knows at which entry it 0:34:47.760,0:34:51.520 has to look.[br]And if there is an entry there, because the 0:34:51.520,0:34:56.900 address is mapped, it can proceed and look[br]at the page directory pointer table, 0:34:56.900,0:35:04.620 and so on down. So everything[br]is the same for each level until you come 0:35:04.620,0:35:09.130 to your page table, where you have[br]4-kilobyte pages. So it's in the end not 0:35:09.130,0:35:13.851 that complicated, but it's a bit[br]confusing, because you want to know a 0:35:13.851,0:35:20.310 physical address, so you have to look it[br]up somewhere in the main memory, 0:35:20.310,0:35:25.420 with physical addresses, to translate your[br]virtual addresses.
And if you have to go 0:35:25.420,0:35:31.890 through all those levels, it takes a long[br]time, so we can do better than that, and 0:35:31.890,0:35:39.160 that's why Intel introduced additional[br]caches, also for all of those levels. So, 0:35:39.160,0:35:45.560 if you want to translate an address, you[br]take a look at the ITLB for instructions, 0:35:45.560,0:35:51.150 and the data TLB for data. If it's there,[br]you can stop, otherwise you go down all 0:35:51.150,0:35:58.700 those levels, and if it's not in any cache[br]you have to look it up in the DRAM. In 0:35:58.700,0:36:03.300 addition, the address space you have is[br]shared, because you have, on the one hand, 0:36:03.300,0:36:07.470 the user memory and, on the other hand,[br]you have mapped the kernel for convenience 0:36:07.470,0:36:12.870 and performance also in the address space.[br]And if your user program wants to access 0:36:12.870,0:36:18.310 some kernel functionality like reading a[br]file, it will switch to the kernel memory, 0:36:18.310,0:36:23.880 there's a switch to kernel privileges, and then[br]you can read the file, and so on. So, 0:36:23.880,0:36:30.420 that's it. However, you have drivers in[br]the kernel, and if you know the addresses 0:36:30.420,0:36:35.771 of those drivers, you can do code-reuse[br]attacks, and as a countermeasure, they 0:36:35.771,0:36:40.150 introduced address-space layout[br]randomization, also for the kernel. 0:36:40.150,0:36:47.040 And this means that when you have your[br]program running, the kernel is mapped at 0:36:47.040,0:36:51.630 one address, and if you reboot the machine[br]it's not at the same address anymore but 0:36:51.630,0:36:58.390 somewhere else. So if there is a way to[br]find out at which address the kernel is 0:36:58.390,0:37:04.450 loaded, you have circumvented this[br]countermeasure and defeated kernel address 0:37:04.450,0:37:11.060 space layout randomization. So this would[br]be nice for some attacks.
In addition, 0:37:11.060,0:37:16.947 there's also the kernel direct physical[br]map. And what does this mean? It's 0:37:16.947,0:37:23.320 implemented on many operating systems like[br]OS X, Linux, also on the Xen hypervisor 0:37:23.320,0:37:27.860 and[br]BSD, but not on Windows. But what it means 0:37:27.860,0:37:33.870 is that the complete physical memory is[br]additionally mapped in the kernel 0:37:33.870,0:37:40.460 memory at a fixed offset. So, for every[br]page that is mapped in the user space, 0:37:40.460,0:37:45.160 there's something like a twin page in the[br]kernel memory, which you can't access 0:37:45.160,0:37:50.371 because it's in the kernel memory.[br]However, we will need it later, because 0:37:50.371,0:37:58.230 now we go back to prefetch and see what we[br]can do with that. So, prefetch is not a 0:37:58.230,0:38:04.150 usual instruction, because it just tells[br]the CPU "I might need that data later on. 0:38:04.150,0:38:10.000 If you have time, load it for me;" if not,[br]the CPU can ignore it because it's busy 0:38:10.000,0:38:15.810 with other stuff. So, there's no guarantee[br]that this instruction is really executed, 0:38:15.810,0:38:22.070 but most of the time it is. And an[br]interesting thing is that it generates no 0:38:22.070,0:38:29.000 faults, so whatever you pass to this[br]instruction, your program won't crash, and 0:38:29.000,0:38:33.990 it does not check any privileges, so I can[br]also pass a kernel address to it and it 0:38:33.990,0:38:37.510 won't say "No, stop, you accessed an[br]address that you are not allowed to 0:38:37.510,0:38:45.530 access, so I crash," it just continues,[br]which is nice. 0:38:45.530,0:38:49.810 The second interesting thing is that the[br]operand is a virtual address, so every 0:38:49.810,0:38:55.534 time you execute this instruction, the CPU[br]has to go and check "OK, what physical 0:38:55.534,0:38:59.600 address does this virtual address[br]correspond to?"
So it has to do the lookup 0:38:59.600,0:39:05.750 with all those tables we've seen earlier,[br]and as you probably have guessed already, 0:39:05.750,0:39:10.370 the execution time varies also for the[br]prefetch instruction and we will see later 0:39:10.370,0:39:16.090 on what we can do with that.[br]So, let's get back to the direct physical 0:39:16.090,0:39:22.870 map. Because we can create an oracle for[br]address translation, so we can find out 0:39:22.870,0:39:27.540 what physical address belongs to the[br]virtual address. Because nowadays you 0:39:27.540,0:39:31.990 don't want that the user to know, because[br]you can craft nice rowhammer attacks with 0:39:31.990,0:39:37.520 that information, and more advanced cache[br]attacks, so you restrict this information 0:39:37.520,0:39:44.270 to the user. But let's check if we find a[br]way to still get this information. So, as 0:39:44.270,0:39:50.150 I've told you earlier, if you have a[br]paired page in the user space map, 0:39:50.150,0:39:54.505 you have the twin page in the kernel [br]space, and if it's cached, 0:39:54.505,0:39:56.710 its cached for both of them again. 0:39:56.710,0:40:03.170 So, the attack now works as the following:[br]From the attacker you flush your user 0:40:03.170,0:40:09.760 space page, so it's not in the cache for[br]the... also for the kernel memory, and 0:40:09.760,0:40:15.850 then you call prefetch on the address of[br]the kernel, because as I told you, you 0:40:15.850,0:40:22.050 still can do that because it doesn't[br]create any faults. 
So, you tell the CPU 0:40:22.050,0:40:28.310 "Please load me this data into the cache[br]even if I don't have access to this data 0:40:28.310,0:40:32.550 normally."[br]And if we now measure on our user space 0:40:32.550,0:40:37.100 page the address again, and we measure a[br]cache hit, because it has been loaded by 0:40:37.100,0:40:42.630 the CPU into the cache, we know exactly at[br]which position, since we passed the 0:40:42.630,0:40:48.250 address to the function, this address[br]corresponds to. And because this is at a 0:40:48.250,0:40:53.280 fixed offset, we can just do a simple[br]subtraction and know the physical address 0:40:53.280,0:40:59.180 again. So we have a nice way to find[br]physical addresses for virtual addresses. 0:40:59.180,0:41:04.390 And in practice this looks like this[br]following plot. So, it's pretty simple, 0:41:04.390,0:41:08.910 because we just do this for every address,[br]and at some point we measure a cache hit. 0:41:08.910,0:41:14.260 So, there's a huge difference. And exactly[br]at this point we know this physical 0:41:14.260,0:41:20.140 address corresponds to our virtual[br]address. The second thing is that we can 0:41:20.140,0:41:27.070 exploit the timing differences it needs[br]for the prefetch instruction. Because, as 0:41:27.070,0:41:31.850 I told you, when you go down this cache[br]levels, at some point you see "it's here" 0:41:31.850,0:41:37.500 or "it's not here," so it can abort early.[br]And with that we can know exactly 0:41:37.500,0:41:41.800 when the prefetch[br]instruction aborted, and know how the 0:41:41.800,0:41:48.070 pages are mapped into the address space.[br]So, the timing depends on where the 0:41:48.070,0:41:57.090 translation stops. And using those two[br]properties and those information, we can 0:41:57.090,0:42:02.227 do the following: On the one hand, we can[br]build variants of cache attacks. So, 0:42:02.227,0:42:07.444 instead of Flush+Reload, we can do[br]Flush+Prefetch, for instance. 
We can 0:42:07.444,0:42:12.060 also use prefetch to mount Rowhammer[br]attacks on privileged addresses, because 0:42:12.060,0:42:18.069 it doesn't generate any faults when we pass[br]those addresses, and it works as well. In 0:42:18.069,0:42:23.330 addition, we can use it to recover the[br]translation levels of a process, which you 0:42:23.330,0:42:27.870 could do earlier with the pagemap file,[br]but as I told you it's now privileged, so 0:42:27.870,0:42:32.890 you don't have access to that, and by[br]doing that you can bypass address space 0:42:32.890,0:42:38.170 layout randomization. In addition, as I[br]told you, you can translate virtual 0:42:38.170,0:42:43.530 addresses to physical addresses, which is[br]now also privileged with the pagemap 0:42:43.530,0:42:48.790 file, and using that re-enables ret2dir[br]exploits, which have been 0:42:48.790,0:42:55.550 demonstrated last year. On top of that, we[br]can also use this to locate kernel 0:42:55.550,0:43:00.850 drivers, as I told you. It would be nice[br]if we could circumvent KASLR as well, and I 0:43:00.850,0:43:08.380 will show you now how this is possible.[br]So, with the first oracle we find out all 0:43:08.380,0:43:15.430 the pages that are mapped, and for each of[br]those pages, we evict the translation 0:43:15.430,0:43:18.210 caches, and we can do that by either[br]calling sleep, 0:43:18.210,0:43:24.450 which schedules another program, or by accessing[br]a large memory buffer. Then, we 0:43:24.450,0:43:28.260 perform a syscall to the driver. So,[br]there's code of the driver executed and 0:43:28.260,0:43:33.540 loaded into the cache, and then we just[br]measure the time prefetch takes on this 0:43:33.540,0:43:40.840 address. And in the end, the page with the fastest[br]average access time is the driver page. 0:43:40.840,0:43:46.770 So, we can mount this attack on Windows 10[br]in less than 12 seconds. So, we can defeat 0:43:46.770,0:43:52.110 KASLR in less than 12 seconds, which is[br]very nice.
And in practice, the 0:43:52.110,0:43:58.330 measurements look like the following: So,[br]we have a lot of long measurements, and at 0:43:58.330,0:44:05.060 some point you have a low one, and you[br]know exactly that this is the driver region and 0:44:05.060,0:44:09.930 the address where the driver is located. And[br]you can mount those ret2dir 0:44:09.930,0:44:16.210 attacks again. However, that's not[br]everything, because there are more 0:44:16.210,0:44:20.795 instructions on Intel CPUs.[br]CM: Yeah, so, the following is not our 0:44:20.795,0:44:24.350 work, but we thought that would be[br]interesting, because it's basically more 0:44:24.350,0:44:30.740 instructions, more attacks, more fun. So[br]there's the RDSEED instruction, and what 0:44:30.740,0:44:35.340 it does is request a random seed from[br]the hardware random number generator. So, 0:44:35.340,0:44:39.310 the thing is that there is a fixed number[br]of precomputed random bits, and it takes 0:44:39.310,0:44:44.320 time to regenerate them. So, as with everything[br]that takes time, you can create a covert 0:44:44.320,0:44:50.180 channel with that. There are also FADD and[br]FMUL, which are floating-point operations. 0:44:50.180,0:44:56.740 Here, the running time of these instructions[br]depends on the operands. Some people 0:44:56.740,0:45:01.530 managed to bypass Firefox's same-origin[br]policy with an SVG filter timing attack 0:45:01.530,0:45:08.540 with that. There are also the JMP[br]instructions. So, in modern CPUs you have 0:45:08.540,0:45:14.520 branch prediction, and branch target[br]prediction. With that, and it's actually been 0:45:14.520,0:45:18.250 studied a lot, you can create a covert[br]channel. You can do side-channel attacks 0:45:18.250,0:45:26.028 on crypto. You can also bypass KASLR, and[br]finally, there are the TSX instructions, which 0:45:26.028,0:45:31.010 are an extension for hardware transactional[br]memory support, which has also been used 0:45:31.010,0:45:37.150 to bypass KASLR.
So, in case you're not[br]sure, KASLR is dead. You have lots of 0:45:37.150,0:45:45.650 different things to read. Okay, so, on to the[br]conclusion now. So, as you've seen, it's 0:45:45.650,0:45:50.190 actually more a problem of CPU design[br]than really of the instruction-set 0:45:50.190,0:45:55.720 architecture. The thing is that all these[br]issues are really hard to patch. They 0:45:55.720,0:45:59.966 are all linked to performance[br]optimizations, and we are not getting rid 0:45:59.966,0:46:03.890 of performance optimizations. It's[br]basically a trade-off between performance 0:46:03.890,0:46:11.530 and security, and performance seems to[br]always win. There have been some 0:46:11.530,0:46:20.922 propositions against cache attacks,[br]for instance to remove the clflush 0:46:20.922,0:46:26.640 instruction. The thing is that all these[br]quick fixes won't work, because we always 0:46:26.640,0:46:31.450 find new ways to do the same thing without[br]these precise instructions, and also, we 0:46:31.450,0:46:37.410 keep finding new instructions that leak[br]information. So, it's really, let's say, 0:46:37.410,0:46:43.740 quite a big topic that we have to fix[br]this. So, thank you very much for your 0:46:43.740,0:46:47.046 attention. If you have any questions we'd[br]be happy to answer them. 0:46:47.046,0:46:52.728 applause 0:46:52.728,0:47:01.510 applause[br]Herald: Okay. Thank you very much again 0:47:01.510,0:47:06.571 for your talk, and now we will have a Q&A,[br]and we have, I think, about 15 minutes, so 0:47:06.571,0:47:11.330 you can start lining up behind the[br]microphones. They are in the gangways in 0:47:11.330,0:47:18.130 the middle. Except, I think that one...[br]oh, no, it's back up, so it will work. And 0:47:18.130,0:47:22.180 while we wait, I think we will take[br]questions from our signal angel, if there 0:47:22.180,0:47:28.810 are any. Okay, there aren't any, so...[br]microphone questions. I think, you in
I think, you in 0:47:28.810,0:47:33.440 front.[br]Microphone: Hi. Can you hear me? 0:47:33.440,0:47:40.050 Herald: Try again.[br]Microphone: Okay. Can you hear me now? 0:47:40.050,0:47:46.480 Okay. Yeah, I'd like to know what exactly[br]was your stealthiness metric? Was it that 0:47:46.480,0:47:51.310 you can't distinguish it from a normal[br]process, or...? 0:47:51.310,0:47:56.500 CM: So...[br]Herald: Wait a second. We have still Q&A, 0:47:56.500,0:47:59.780 so could you quiet down a bit? That would[br]be nice. 0:47:59.780,0:48:08.180 CM: So, the question was about the[br]stealthiness metric. Basically, we use the 0:48:08.180,0:48:14.320 metric with cache misses and cache[br]references, normalized by the instructions 0:48:14.320,0:48:21.080 TLB events, and we[br]just found the threshold under which 0:48:21.080,0:48:25.820 pretty much every benign application was[br]below this, and rowhammer and cache 0:48:25.820,0:48:30.520 attacks were after that. So we fixed the[br]threshold, basically. 0:48:30.520,0:48:35.520 H: That microphone.[br]Microphone: Hello. Thanks for your talk. 0:48:35.520,0:48:42.760 It was great. First question: Did you[br]inform Intel before doing this talk? 0:48:42.760,0:48:47.520 CM: Nope.[br]Microphone: Okay. The second question: 0:48:47.520,0:48:51.050 What's your future plans?[br]CM: Sorry? 0:48:51.050,0:48:55.780 M: What's your future plans?[br]CM: Ah, future plans. Well, what I did, 0:48:55.780,0:49:01.220 that is interesting, is that we keep[br]finding these more or less by accident, or 0:49:01.220,0:49:06.440 manually, so having a good idea of what's[br]the attack surface here would be a good 0:49:06.440,0:49:10.050 thing, and doing that automatically would[br]be even better. 0:49:10.050,0:49:14.170 M: Great, thanks.[br]H: Okay, the microphone in the back, 0:49:14.170,0:49:18.770 over there. The guy in white.[br]M: Hi. One question. 
If you have,[br]like, a daemon that randomly invalidates 0:49:18.770,0:49:24.410 some cache lines, would that be a better[br]countermeasure than disabling the caches? 0:49:24.410,0:49:31.120 ML: What was the question?[br]CM: If invalidating cache lines would be 0:49:31.120,0:49:39.580 better than disabling the whole cache. So,[br]I'm... 0:49:39.580,0:49:42.680 ML: If you know which cache lines have[br]been accessed by the process, you can 0:49:42.680,0:49:47.300 invalidate those cache lines before you[br]swap those processes, but it's also a 0:49:47.300,0:49:52.820 trade-off with performance. Like, you 0:49:52.820,0:49:57.940 can also, if you switch processes, flush[br]the whole cache, and then it's empty, and 0:49:57.940,0:50:01.900 then you don't see any activity anymore,[br]but there's also the trade-off of 0:50:01.900,0:50:07.510 performance with this.[br]M: Okay, maybe a second question. 0:50:07.510,0:50:12.240 There are some ARM architectures[br]that have random cache line invalidations. 0:50:12.240,0:50:16.010 Did you try those, if you can see a[br][unintelligible] channel there. 0:50:16.010,0:50:21.960 ML: If they're truly random... but probably[br]you just have to make more measurements 0:50:21.960,0:50:27.180 and more measurements, and then you can[br]average out the noise, and then you can do 0:50:27.180,0:50:30.350 these attacks again. It's like with Prime+[br]Probe, where you need more 0:50:30.350,0:50:34.080 measurements, because it's much more[br]noisy, so in the end you will just need 0:50:34.080,0:50:37.870 much more measurements.[br]CM: So, on ARM, it's supposed to be pretty 0:50:37.870,0:50:43.260 random. At least it's in the manual, but[br]we actually found nice ways to evict cache 0:50:43.260,0:50:47.230 lines that we really wanted to evict, so[br]it's not actually that pseudo-random. 0:50:47.230,0:50:51.960 So, even...
let's say, if something is[br]truly random, it might be nice, but then 0:50:51.960,0:50:57.170 it's also quite complicated to implement.[br]I mean, you probably don't want a random 0:50:57.170,0:51:01.480 number generator just for the cache.[br]M: Okay. Thanks. 0:51:01.480,0:51:05.980 H: Okay, and then the three guys here on[br]the microphone in the front. 0:51:05.980,0:51:13.450 M: My question is about a detail with the[br]keylogger. You could distinguish between 0:51:13.450,0:51:18.150 space, backspace and alphabet, which is[br]quite interesting. But could you also 0:51:18.150,0:51:22.320 figure out the specific keys that were[br]pressed, and if so, how? 0:51:22.320,0:51:25.650 ML: Yeah, that depends on the[br]implementation of the keyboard. But what 0:51:25.650,0:51:29.310 we did, we used the Android stock[br]keyboard, which is shipped with the 0:51:29.310,0:51:34.520 Samsung, so it's pre-installed. And if you[br]have a table somewhere in your code, which 0:51:34.520,0:51:39.540 says "Okay, if you press this exact[br]location or this image, it's an A or it's 0:51:39.540,0:51:44.450 a B", then you can also do a more[br]sophisticated attack. So, if you find any 0:51:44.450,0:51:49.050 functions or data in the code which[br]directly tell you "Okay, this is this 0:51:49.050,0:51:54.520 character," you can also spy on the actual[br]key characters on the keyboard. 0:51:54.520,0:52:02.900 M: Thank you.[br]M: Hi. Thank you for your talk. My first 0:52:02.900,0:52:08.570 question is: What can we actually do now[br]to mitigate this kind of attack? By, for 0:52:08.570,0:52:11.980 example, switching off TSX or using ECC[br]RAM. 0:52:11.980,0:52:17.410 CM: So, I think the very important thing[br]to protect would be crypto, and the 0:52:17.410,0:52:20.840 good thing is that today we know how to[br]build crypto that is resistant to side- 0:52:20.840,0:52:24.490 channel attacks.
So the good thing would[br]be to stop using implementations that 0:52:24.490,0:52:31.360 have been known to be vulnerable for 10 years.[br]Then things like keystrokes are way harder 0:52:31.360,0:52:36.830 to protect, so let's say crypto is[br]manageable; the whole system is clearly 0:52:36.830,0:52:41.490 another problem. And you can have[br]different types of countermeasures on the 0:52:41.490,0:52:45.780 hardware side, but that would mean that[br]Intel and ARM actually want to fix that, 0:52:45.780,0:52:48.560 and that they know how to fix that. I[br]don't even know how to fix that in 0:52:48.560,0:52:55.500 hardware. Then on the system side, if you[br]prevent some kinds of memory sharing, you 0:52:55.500,0:52:58.540 don't have Flush+Reload involved anymore[br]and Prime+Probe is much 0:52:58.540,0:53:04.880 noisier, so it would be an improvement.[br]M: Thank you. 0:53:04.880,0:53:11.880 H: Do we have signal angel questions? No.[br]OK, then more microphone. 0:53:11.880,0:53:16.630 M: Hi, thank you. I wanted to ask about[br]the way you establish the covert channel 0:53:16.630,0:53:23.280 between the two processes, because it[br]would obviously have to be timed in a way to 0:53:23.280,0:53:28.511 transmit information from one process[br]to the other. Is there anywhere that you 0:53:28.511,0:53:32.970 documented the whole thing? You know, it's[br]actually almost like the seven layers or 0:53:32.970,0:53:36.580 something like that. Are there any ways[br]that you documented that?
It would be 0:53:36.580,0:53:40.260 really interesting to know how it worked.[br]ML: You can find this information in the 0:53:40.260,0:53:46.120 paper, because there are several papers on[br]covert channels using that. So the NDSS 0:53:46.120,0:53:51.300 paper will be published in February, I guess,[br]but the Armageddon paper also includes 0:53:51.300,0:53:55.670 a covert channel, and you can[br]find more information about what the 0:53:55.670,0:53:59.320 packets look like and how the[br]synchronization works in the paper. 0:53:59.320,0:54:04.020 M: Thank you.[br]H: One last question? 0:54:04.020,0:54:09.750 M: Hi! You mentioned that you used Osvik's[br]attack for the AES side-channel attack. 0:54:09.750,0:54:17.350 Did you solve the AES round detection, and[br]is it different with some scheduler 0:54:17.350,0:54:21.441 manipulation?[br]CM: So on this one I think we only did 0:54:21.441,0:54:24.280 a synchronous attack, so we already[br]knew when 0:54:24.280,0:54:27.770 the victim was going to be scheduled, and[br]we didn't have anything to do with 0:54:27.770,0:54:32.930 schedulers.[br]M: Alright, thank you. 0:54:32.930,0:54:37.140 H: Are there any more questions? No, I[br]don't see anyone. Then, thank you very 0:54:37.140,0:54:39.132 much again to our speakers. 0:54:39.132,0:54:42.162 applause 0:54:42.162,0:54:58.970 music 0:54:58.970,0:55:06.000 subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!