(Music)

Herald Angel: We are here with a motto, and the motto of this year is "Works For Me", and I think... how many people in here are programmers? Raise your hands or shout or... Whoa, that's a lot. Okay. So I think many of you work on x86. Yeah. And I think you assume that it works, and that everything works as intended. And I mean: what could go wrong? Our next talk, the first one today, will be by Clémentine Maurice, who previously was here with Rowhammer.js, something I would call scary, and Moritz Lipp, who has worked on the ARMageddon attack. Okay, so... I would like to hear a really warm applause for the speakers for the talk "What could possibly go wrong with insert x86 instruction here?" Thank you.

(Applause)

Clémentine Maurice (CM): Well, thank you all for being here this morning. Yes, this is our talk "What could possibly go wrong with insert x86 instruction here". Just a few words about ourselves: I'm Clémentine Maurice, I got my PhD last year in computer science, and I'm now working as a postdoc at Graz University of Technology in Austria. You can reach me on Twitter or by email, but there's also, I think, lots of time before the Congress is over.

Moritz Lipp (ML): Hi, my name is Moritz Lipp, I'm a PhD student at Graz University of Technology, and you can also reach me on Twitter, or just after our talk and in the next days.
CM: So, about this talk: the title says this is a talk about x86 instructions, but this is not a talk about software. Don't leave yet! I'm actually even assuming safe software, and the point that we want to make is that safe software does not mean safe execution: we have information leakage because of the underlying hardware, and this is what we're going to talk about today. So we'll be talking about cache attacks: what they are, what we can do with them, and also a special kind of cache attack that we found this year, doing cache attacks without memory accesses, and how to use that even to bypass kernel ASLR. So again, the title says it's a talk about x86 instructions, but this is even more global than that. We can also mount these cache attacks on ARM, not only on x86, so some of the examples that you will see also apply to ARM. Today we'll have a bit of background, but actually most of the background will come along the way, because this covers a really huge chunk of our research. We'll see mainly three instructions: "mov", and how we can perform these cache attacks and what they are... The instruction "clflush", where we'll be doing cache attacks without any memory accesses.
Then we'll see "prefetch", and how we can bypass kernel ASLR and lots of translation levels, and then there's even a bonus track. That will not be our work, but even more instructions and even more attacks.

Okay, so let's start with a bit of an introduction. We will be mainly focusing on Intel CPUs, and this is roughly how it looks today in terms of cores and caches. We have different cores, here four cores, and different levels of caches, usually three levels. Level 1 and level 2 are private to each core, which means that core 0 can only access its own level 1 and level 2, and not the level 1 and level 2 of, for example, core 3. And we have the last-level cache. This one is divided into slices, and we have as many slices as cores, so here four slices, but all the slices are shared across cores: core 0 can access the whole last-level cache, slices 0, 1, 2 and 3. We also have a nice property on Intel CPUs: this level of cache is inclusive, which means that everything that is contained in level 1 and level 2 will also be contained in the last-level cache, and this will prove to be quite useful for cache attacks. Today we mostly have set-associative caches, which means that data is loaded into specific sets, depending only on its address.
So we have some bits of the address that give us the index and say "okay, the line is going to be loaded into this cache set". Then we have several ways per set, here four ways, and the cache line is going to be loaded into a specific way, which depends only on the replacement policy and not on the address itself. When you load a line into the cache, usually the cache is already full, and you have to make room for a new line. This is what the replacement policy does: it says "okay, I'm going to remove this line to make room for the next line". So today we're going to see only three instructions, as I've been telling you. The "mov" instruction does a lot of things, but the only aspect we're interested in is that it can access data in main memory. We're going to see "clflush": what it does is remove a cache line from the whole cache. And we're going to see "prefetch": it prefetches a cache line for future use. So we're going to see what they do, the kind of side effects that they have, and all the attacks that we can do with them. And that's basically all the x86 you need for today, so even if you're not an expert on x86, don't worry, it's not just slides full of assembly and stuff. Okay, so on to the first one.
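(Before moving on: the set-and-way behaviour just described can be sketched as a toy model. The parameters below, 64-byte lines, 64 sets, 4 ways, are illustrative assumptions, not the geometry of any particular CPU.)

```python
# Toy model of a set-associative cache as described above.
# Parameters are illustrative only.
LINE_SIZE = 64
N_SETS = 64
N_WAYS = 4

def set_index(addr):
    """The set depends only on the address: drop the line-offset
    bits, then take the next log2(N_SETS) bits."""
    return (addr // LINE_SIZE) % N_SETS

class CacheSet:
    """One set with LRU replacement: which way a line lands in depends
    only on the replacement policy, never on the address."""
    def __init__(self):
        self.lines = []  # least recently used first

    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == N_WAYS:
            self.lines.pop(0)  # evict the least recently used line
        self.lines.append(tag)
        return hit
```

In this toy model, addresses LINE_SIZE * N_SETS = 4096 bytes apart collide in the same set, so accessing five such addresses evicts the first one, which is exactly the effect Prime+Probe exploits.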
ML: So we will first start with the "mov" instruction, and actually the first slide is full of code. As you can see, the mov instruction is used to move data from registers to registers, from main memory and back to main memory, and there are many movs you can use, but basically it's just there to move data, and that's all we need to know. In addition, a lot of exceptions can occur, so we can assume that the restrictions are so tight that nothing can go wrong when you just move data, because moving data is simple. However, while there are a lot of exceptions, the data that is accessed is always loaded into the cache, and this is transparent to the program that is running. There are side effects when you run these instructions, and we will see what they look like with the mov instruction. You probably all know that data can either be in CPU registers, in the different levels of the cache that Clémentine showed you earlier, in main memory, or on disk, and depending on where the data is located, it takes a longer time to be loaded back to the CPU. This is what we can see in this plot: we try to measure the access time of an address over and over again, assuming that when we access it more often, it is already stored in the cache.
So most of the time, when we load an address and it takes around 70 cycles, we can assume it's loaded from the cache. However, when the data is loaded from main memory, we can clearly see that it needs a much longer time, a bit more than 200 cycles. So depending on the time it takes to load the address, we can say the data has been loaded from the cache, or the data was still located in main memory. And this property is what we can exploit using cache attacks: we measure the timing differences on memory accesses. What an attacker does is monitor cache lines, but he has no way to know what's actually the content of a cache line. We can only monitor that a cache line has been accessed, not what's actually stored in it. What you can do with this is implement covert channels, so you can allow two processes to communicate with each other, evading the permission system, which we will see later on. In addition, you can also do side-channel attacks: you can spy with a malicious attacking application on benign processes, and you can use this to steal cryptographic keys or to spy on keystrokes. We have different types of cache attacks, and I want to explain the most popular one, the "Flush+Reload" attack, first.
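(The hit-versus-miss distinction above reduces to a threshold test. A minimal sketch; the two peaks, roughly 70 cycles for a cache hit and 200+ for main memory, are the values from this talk's plot, and a real attack would calibrate the threshold per machine.)

```python
# Classify one timed memory access as cache hit or miss.
# 150 is an assumed threshold between the two measured peaks.
CACHE_HIT_THRESHOLD = 150  # cycles

def was_cached(access_cycles):
    """True if the access time indicates the data was in the cache."""
    return access_cycles < CACHE_HIT_THRESHOLD
```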
On the left, you have the address space of the victim, and on the right you have the address space of the attacker, who maps a shared library (an executable) that the victim is using into its own address space, like the red rectangle. This means that when this data is stored in the cache, it's cached for both processes. Now the attacker can use the flush instruction to remove the data from the cache, so it's not in the cache anymore, and thus it's also not cached for the victim. Now the attacker can schedule the victim, and if the victim decides "yeah, I need this data", it will be loaded back into the cache. Then the attacker can reload the data, measure how long it took, and decide "okay, the victim has accessed the data in the meantime" or "the victim has not accessed the data in the meantime". By that you can spy on whether this address has been used.

The second type of attack is called "Prime+Probe", and it does not rely on shared memory like the "Flush+Reload" attack. It works as follows: instead of mapping anything into its own address space, the attacker loads a lot of data into one cache set, here, and fills the cache. Now he again schedules the victim, and the victim can access data that maps to the same cache set. So the cache set is used by the attacker and the victim at the same time.
Now the attacker can start measuring the access time to the addresses he loaded into the cache before. When he accesses an address that is still in the cache, it's faster, so he measures a lower time. If it's not in the cache anymore, it has to be reloaded into the cache, so it takes a longer time. He can sum this up and detect whether the victim has loaded data into the cache as well.

So the first thing we want to show you that you can do with cache attacks is implement a covert channel, and this could happen in the following scenario: You install an app on your phone to view your favorite images and to apply some filters, and you don't know that it's malicious, because the only permission it requires is access to your images, which makes sense. So you can easily install it without any fear. In addition, you want to know what the weather is outside, so you install a nice little weather widget, and the only permission it has is to access the internet, because it has to load the information from somewhere. So what happens if you're able to implement a covert channel between these two applications, without any permissions and privileges, so they can communicate with each other without using any mechanisms provided by the operating system? It's hidden. It can happen that the gallery app sends the image to the internet, where it will be uploaded and exposed for everyone.
So maybe you don't want to see that cat picture everywhere. While we can do this with both Prime+Probe and Flush+Reload attacks, we will discuss a covert channel using Prime+Probe. So how can we transmit data? We need to transmit ones and zeros at some point. The sender and the receiver agree on one cache set that they both use, and the receiver probes the set all the time. When the sender wants to transmit a zero, he just does nothing, so the lines of the receiver stay in the cache all the time, and the receiver knows "okay, he's sending nothing, so it's a zero". On the other hand, if the sender wants to transmit a one, he starts accessing addresses that map to the same cache set, so it will take a longer time for the receiver to access its addresses again, and he knows "okay, the sender just sent me a one". Clémentine will show you what you can do with this covert channel.

CM: The really nice thing about Prime+Probe is that it has really low requirements. It doesn't need any kind of shared memory. For example, if you have two virtual machines, you could have some shared memory via memory deduplication; the thing is that this is highly insecure, so cloud providers like Amazon EC2 disable it. We can still use Prime+Probe, because it doesn't need this shared memory. Another problem with cache covert channels is that they are quite noisy.
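The bit encoding just described can be sketched as a toy simulation. The 4-way shared cache set and the tag names below are made-up modelling assumptions, not real hardware behaviour:

```python
# Toy simulation of the Prime+Probe bit encoding described above.
N_WAYS = 4
SENDER = ["S0", "S1", "S2", "S3"]    # sender's addresses mapping to the set
RECEIVER = ["R0", "R1", "R2", "R3"]  # receiver's addresses, same set

class SharedSet:
    """LRU cache set that both processes' addresses map into."""
    def __init__(self):
        self.lines = []
    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == N_WAYS:
            self.lines.pop(0)  # evict least recently used
        self.lines.append(tag)
        return hit

def probe(cache):
    """Access all receiver lines, counting misses (slow accesses)."""
    return sum(not cache.access(tag) for tag in RECEIVER)

def send_bit(cache, bit):
    # A 1 is sent by touching the set (evicting the receiver's lines);
    # a 0 is sent by doing nothing.
    if bit:
        for tag in SENDER:
            cache.access(tag)

def transmit(bits):
    cache = SharedSet()
    received = []
    for b in bits:
        probe(cache)                          # prime: fill set with receiver lines
        send_bit(cache, b)                    # sender acts (or not)
        received.append(1 if probe(cache) else 0)  # probe: misses mean a 1
    return received
```

In this noise-free model every bit arrives intact; the error detection and correction discussed next exist precisely because a real cache is shared with everything else on the machine.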
When you have other applications that are also running on the system, they are all competing for the cache, and they might evict some cache lines, especially if it's an application that is very memory-intensive. You also have noise due to the fact that the sender and the receiver might not be scheduled at the same time: if your sender sends all the things while the receiver is not scheduled, then some part of the transmission can get lost. So what we did is try to build an error-free covert channel. We took care of all these noise issues by using some error detection to resynchronize the sender and the receiver, and then we used some error correction to correct the remaining errors. So we managed to have a completely error-free covert channel even with a lot of noise: let's say another virtual machine on the same physical machine is serving files through a web server and doing lots of memory-intensive tasks at the same time, the covert channel stayed completely error-free, at around 40 to 75 kilobytes per second, which is still quite a lot. All of this is between virtual machines on Amazon EC2. And the really neat thing, we wanted to do something with that, is that we basically managed to create an SSH connection really over the cache. The two virtual machines don't have any network between them, but just by sending the zeros and the ones we have an SSH connection between them.
So you could say that cache covert channels are nothing, but I think this shows it's a real threat. If you want more details about this work in particular, it will be published soon at NDSS.

The second application that we wanted to show you is that we can attack crypto with cache attacks. In particular, we are going to show an attack on AES, on a special implementation of AES that uses T-tables. That's the fast software implementation, because it uses some precomputed lookup tables. It has been known to be vulnerable to side-channel attacks since 2006, by Osvik et al., and it's a one-round known-plaintext attack: you have p, your plaintext, and k, your secret key, and the AES algorithm computes an intermediate state at each round r. In the first round, the accessed table indices are just p XOR k. Now, it's a known-plaintext attack, which means that if you can recover the accessed table indices, you have also managed to recover the key, because it's just an XOR. So that would be bad, right, if we could recover these accessed table indices? Well, we can, with cache attacks! We did that with Flush+Reload and with Prime+Probe. On the x-axis you have the plaintext byte values, and on the y-axis you have the addresses, which are essentially the T-table entries. A black cell means that we've monitored the cache line and seen a lot of cache hits. So basically, the blacker it is, the more certain we are that the T-table entry has been accessed.
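Since the first-round indices are just p XOR k, recovering a key byte from an observed index is a single XOR. A minimal sketch; in reality the attack only resolves indices at cache-line granularity, which is why only the upper key bits are recovered directly:

```python
def first_round_index(p, k):
    """Table index accessed in AES round 1 for plaintext byte p, key byte k."""
    return p ^ k

def recover_key_byte(observations):
    """observations: (plaintext_byte, observed_index) pairs.
    Each pair yields k = p ^ index; all pairs must agree."""
    candidates = {p ^ idx for p, idx in observations}
    assert len(candidates) == 1
    return candidates.pop()

def recover_key_high_nibble(observations):
    """A cache attack only sees the cache *line*, i.e. (p ^ k) >> 4 for
    16-entry lines, so only the upper 4 key bits are recovered this way.
    observations: (plaintext_byte, observed_cache_line) pairs."""
    candidates = {(p >> 4) ^ line for p, line in observations}
    assert len(candidates) == 1
    return candidates.pop()
```

This matches the "upper four bits of a key byte" measured later in the talk: the cache-line granularity hides the low bits of the index.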
In the figure, it's a toy example: the key is all zeros, but you would basically just have a different pattern if the key was not all zeros, and as long as you can see this nice diagonal, or a pattern, you have recovered the key. So it's an old attack, from 2006; it's been 10 years, everything should be fixed by now, and you see where I'm going: it's not. On Android, the Bouncy Castle implementation uses the T-table implementation by default, so that's bad. Also, many implementations that you can find online use precomputed values, so be wary about this kind of attack.

The last application we wanted to show you is how we can spy on keystrokes. For that we will use Flush+Reload, because it's a really fine-grained attack: we can see very precisely which cache line has been accessed, and a cache line is only 64 bytes, so it's really not a lot. We're going to use that to spy on keystrokes, and we even have a small demo for you.
ML: What you can see on the screen is not an Intel x86, it's a smartphone, a Galaxy S6, but you can also apply these cache attacks there; that's what we want to emphasize. On the left you see the screen, and on the right we have connected a shell with no privileges and permissions, so it can basically be an app that you install from the app store. On the right we are going to start our spy tool, and on the left we just open the messenger app, and whenever the user hits any key on the keyboard, our spy tool notices that. If he presses the spacebar, we can also measure that. If the user decides "okay, I want to delete the word" because he changed his mind, we can also register that the user pressed the backspace button. So in the end we can see exactly how long the words were that the user typed into his phone, without any permissions and privileges, which is bad.

(Laughter)
(Applause)

ML: So, enough about the mov instruction, let's head to clflush.

CM: So, the clflush instruction: what it does is invalidate, from every cache level, the cache line that contains the address you pass to it. In itself it's kind of bad, because it enables the Flush+Reload attack that we showed earlier: that was just flush, reload, and the flush part is done with clflush. But there's actually more to it, how wonderful.
So there's a first timing leakage: the clflush instruction has a different timing depending on whether the data you pass to it is cached or not. Imagine you have a cache line that is in level 1; with the inclusion property, it also has to be in the last-level cache. Now this is quite convenient, and this is also why we have this inclusion property on Intel CPUs, for performance reasons: if you want to see whether a line is present at all in the cache, you just have to look in the last-level cache. This is basically what the clflush instruction does. It goes to the last-level cache, sees "okay, there's a line, I'm going to flush this one", and then there's something that says the line is also present somewhere else, so it flushes the line in level 1 and/or level 2 as well. So that's slow. Now, if you perform clflush on some data that is not cached, it basically does the same: it goes to the last-level cache and sees that there's no line, and the data can't be anywhere else in the cache, because it would be in the last-level cache if it was anywhere. So it does nothing and stops there. That's fast. So exactly how fast and slow am I talking about? It's actually only a very few cycles. We did these experiments on different microarchitectures: Sandy Bridge, Ivy Bridge and Haswell. The different colors correspond to the different microarchitectures.
The first thing that is already kind of funny is that you can distinguish the microarchitectures quite nicely with this, but the real point is that you have really different zones. The solid line is when we performed the measurement of clflush on a line that was already in the cache, and the dashed line is when the line was not in the cache, and on all microarchitectures you can see a difference. It's only a few cycles, it's a bit noisy, so what could go wrong? Well, exploiting these few cycles, we still managed to perform a new cache attack that we call "Flush+Flush", and I'm going to explain it to you. Basically, everything that we could do with Flush+Reload, we can also do with Flush+Flush: we can build covert channels and side-channel attacks. It's stealthier than previous cache attacks, and I'm going to come back to this, and it's also faster than previous cache attacks. So how does it work exactly? The principle is a bit similar to Flush+Reload: we have the attacker and the victim that have some kind of shared memory, let's say a shared library.
It will be shared in the cache. The attacker starts by flushing the cache line, then lets the victim perform whatever it does, let's say an encryption. The victim will load some data into the cache, automatically, and now the attacker wants to know again whether the victim accessed this precise cache line, and instead of reloading it, he is going to flush it again. Since we have this timing difference depending on whether the data is in the cache or not, it gives us the same information as if we reloaded it, except it's way faster. So, I talked about stealthiness. The thing is that these cache attacks, and that also applies to Rowhammer, are already stealthy in themselves, because there's no antivirus today that can detect them. But some people thought that we could detect them with performance counters, because of the many cache misses and cache references that happen when data is flushed and when you re-access memory. Now, what we thought is: yes, but other programs also trigger lots of cache misses and cache references, so we would like to have a better metric.
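The Flush+Flush decision step itself can be sketched as a toy simulation. The cycle numbers and threshold below are invented for illustration; on real hardware the gap is only a few cycles and noisy, as the plots show:

```python
# Toy model of the Flush+Flush observation: clflush takes a few cycles
# longer when the line is cached (it must also be evicted from the
# lower levels). All cycle numbers here are made up for illustration.
FLUSH_CACHED_CYCLES = 12
FLUSH_UNCACHED_CYCLES = 9
FLUSH_THRESHOLD = 10

def flush(cache, line):
    """Flush a line and return the simulated latency."""
    if line in cache:
        cache.discard(line)
        return FLUSH_CACHED_CYCLES
    return FLUSH_UNCACHED_CYCLES

def probe(cache, line):
    # One Flush+Flush round: the flush itself is the measurement, so
    # the attacker never reloads the line (no memory access of his own).
    return flush(cache, line) > FLUSH_THRESHOLD
```

Note that probing leaves the line flushed, so the attack loop needs no separate flush step, which is why it is both faster and quieter on the cache-miss counters.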
These cache attacks have a very heavy activity on the cache, but they're also very particular, because they are very short loops of code: if you take Flush+Reload, it just flushes one line, reloads the line, and then again flush, reload. That's a very short loop, and it creates a very low pressure on the instruction TLB, which is quite characteristic of cache attacks. So what we decided to do is normalize the cache events, the cache misses and cache references, by events that have to do with the instruction TLB, and with that we managed to detect cache attacks and Rowhammer without having false positives. So that is the metric I'm going to use when I talk about stealthiness. We started by creating a covert channel. First we wanted to have it as fast as possible, so we created a protocol to evaluate all the kinds of cache attacks that we had, Flush+Flush, Flush+Reload and Prime+Probe, and we started with a packet size of 28, which doesn't really matter. We measured the capacity of our covert channel: Flush+Flush is around 500 kB/s, whereas Flush+Reload was only 300 kB/s, so Flush+Flush is already quite an improvement in speed. Then we measured the stealthiness: at this speed, only Flush+Flush was stealthy. Now, Flush+Flush and Flush+Reload, as you've seen, have some similarities, so as covert channels they also share the same sender and only the receiver is different, and here the sender was not stealthy for either of them. Anyway,
if you want a fast covert channel, just try Flush+Flush; that works. Now let's try to make it completely stealthy, because if the sender is not stealthy, maybe we give away the whole attack. So we said: okay, maybe if we just slow down all the attacks, there will be fewer cache hits and cache misses, and then maybe all the attacks are actually stealthy, why not? So we tried that: we slowed down everything, so Flush+Reload and Flush+Flush are around 50 kB/s now; Prime+Probe is a bit slower, because it takes more time to prime and probe anything. But still, even with this slowdown, only Flush+Flush has its receiver stealthy, and we also managed to have the sender stealthy now. So basically, whether you want a fast covert channel or a stealthy covert channel, Flush+Flush is really great. Now we also wanted to evaluate whether it wasn't too noisy to perform side-channel attacks, so we did the side channel on the AES T-table implementation, the attack that we have shown you earlier. We computed how many encryptions we needed to determine the upper four bits of a key byte, so here, the lower, the better the attack. Flush+Reload is a bit better, we need only 250 encryptions to recover these bits, but Flush+Flush comes quite close with 350, and Prime+Probe is actually the noisiest of them all; it needs
close to 5000 encryptions. So we have[br]around the same performance for 0:29:06.101,0:29:13.520 Flush+Flush and Flush+Reload.[br]Now let's evaluate the stealthiness again. 0:29:13.520,0:29:19.320 So what we did here is we perform 256[br]million encryptions in a synchronous 0:29:19.320,0:29:25.740 attack, so we really had the spy and the[br]victim synchronized, and we evaluated the 0:29:25.740,0:29:31.409 stealthiness of them all, and here only[br]Flush+Flush again is stealth. And while 0:29:31.409,0:29:36.279 you can always slow down a covert channel,[br]you can't actually slow down a side 0:29:36.279,0:29:40.700 channel because, in a real-life scenario,[br]you're not going to say "Hey victim, 0:29:40.700,0:29:47.179 wait for me a bit, I am trying to do an[br]attack here." That won't work. 0:29:47.179,0:29:51.429 So there's even more to it, but I will need[br]again a bit of background before 0:29:51.429,0:29:56.910 continuing. So I've shown you the[br]different levels of caches and here I'm 0:29:56.910,0:30:04.009 going to focus more on the last-level[br]cache. So we have here our four slices, so 0:30:04.009,0:30:09.830 this is the last-level cache, and we have[br]some bits of the address here that 0:30:09.830,0:30:14.330 correspond to the set, but more[br]importantly, we need to know in 0:30:14.330,0:30:19.899 which slice an address is going to be.[br]And that is given by some 0:30:19.899,0:30:23.850 bits of the set and the tag of the[br]address that are passed into a function 0:30:23.850,0:30:27.960 that says in which slice the line is going[br]to be. 0:30:27.960,0:30:32.460 Now the thing is that this hash function[br]is undocumented by Intel. Wouldn't be fun 0:30:32.460,0:30:39.250 otherwise.
So we have this: as many slices[br]as cores, an undocumented hash function 0:30:39.250,0:30:43.980 that maps a physical address to a slice,[br]and while it's actually a bit of a pain 0:30:43.980,0:30:48.710 for attacks, it was not designed[br]for security originally but for 0:30:48.710,0:30:53.570 performance, because you want all the[br]accesses to be evenly distributed in the 0:30:53.570,0:31:00.399 different slices, for performance reasons.[br]So what the hash function basically does is it 0:31:00.399,0:31:05.279 takes some bits of the physical address[br]and outputs k bits for the slice, so just one 0:31:05.279,0:31:09.309 bit if you have a two-core machine, two[br]bits if you have a four-core machine and 0:31:09.309,0:31:16.830 so on. Now let's go back to clflush, see[br]what's the relation with that. 0:31:16.830,0:31:21.169 So the thing that we noticed is that[br]clflush is actually faster to flush a line 0:31:21.169,0:31:28.549 on the local slice.[br]So if you're always flushing 0:31:28.549,0:31:33.340 one line and you run your program on core[br]zero, core one, core two and core three, 0:31:33.340,0:31:37.899 you will observe that on one core in[br]particular, when you run the program on 0:31:37.899,0:31:44.632 that core, the clflush is faster. And so[br]here this is on core one, and you can see 0:31:44.632,0:31:51.139 that on core zero, two, and three it's[br]a bit slower, and here, 0:31:51.139,0:31:55.320 since we run the program on core one and we[br]always flush the same line, we can 0:31:55.320,0:32:01.850 deduce that the line belongs to slice one.[br]And what we can do with that is that we 0:32:01.850,0:32:06.500 can map physical addresses to slices.[br]And that's one way to reverse-engineer 0:32:06.500,0:32:10.639 this addressing function that was not[br]documented.
0:32:10.639,0:32:15.880 Funnily enough that's not the only way:[br]What I did before that was using the 0:32:15.880,0:32:21.229 performance counters to reverse-engineer[br]this function, but that's actually a whole 0:32:21.229,0:32:27.770 other story and if you want more detail on[br]that, there's also an article on that. 0:32:27.770,0:32:30.139 ML: So the next instruction we want to 0:32:30.139,0:32:35.110 talk about is the prefetch instruction.[br]And the prefetch instruction is used to 0:32:35.110,0:32:40.841 tell the CPU: "Okay, please load the data[br]I need later on, into the cache, if you 0:32:40.841,0:32:45.968 have some time." And in the end there are[br]actually six different prefetch 0:32:45.968,0:32:52.929 instructions: prefetcht0 to t2 which[br]means: "CPU, please load the data into the 0:32:52.929,0:32:58.640 first-level cache", or in the last-level[br]cache, whatever you want to use, but we 0:32:58.640,0:33:02.250 spare you the details because it's not so[br]interesting in the end. 0:33:02.250,0:33:06.940 However, what's more interesting is when[br]we take a look at the Intel manual and 0:33:06.940,0:33:11.880 what it says there. So, "Using the[br]PREFETCH instruction is recommended only 0:33:11.880,0:33:17.049 if data does not fit in the cache." So you[br]can tell the CPU: "Please load data I want 0:33:17.049,0:33:23.210 to stream into the cache, so it's more[br]performant." "Use of software prefetch 0:33:23.210,0:33:27.740 should be limited to memory addresses that[br]are managed or owned within the 0:33:27.740,0:33:33.620 application context."[br]So one might wonder what happens if this 0:33:33.620,0:33:40.940 address is not managed by myself. Sounds[br]interesting. "Prefetching to addresses 0:33:40.940,0:33:46.289 that are not mapped to physical pages can[br]experience non-deterministic performance 0:33:46.289,0:33:52.030 penalty. 
For example specifying a NULL[br]pointer as an address for prefetch can 0:33:52.030,0:33:56.000 cause long delays."[br]So we don't want to do that, because our 0:33:56.000,0:34:02.919 program will be slow. So, let's take a[br]look at what they mean by non-deterministic 0:34:02.919,0:34:08.889 performance penalty, because we want to[br]write good software, right? But before 0:34:08.889,0:34:12.510 that, we have to take a look at a little[br]bit more background information to 0:34:12.510,0:34:17.710 understand the attacks.[br]So on modern operating systems, every 0:34:17.710,0:34:22.850 application has its own virtual address[br]space. So at some point, the CPU needs to 0:34:22.850,0:34:27.479 translate these addresses to the physical[br]addresses actually in the DRAM. And for 0:34:27.479,0:34:33.690 that we have this very complex-looking[br]data structure. So we have a 48-bit 0:34:33.690,0:34:40.409 virtual address, and some of those bits[br]map to a table, like the Page Map Level 4 0:34:40.409,0:34:47.760 table, with 512 entries, so depending on[br]those bits the CPU knows at which entry it 0:34:47.760,0:34:51.520 has to look.[br]And if there is an entry there, because the 0:34:51.520,0:34:56.900 address is mapped, it can proceed and look[br]at the page directory pointer table, 0:34:56.900,0:35:04.620 and so on down. So everything[br]is the same for each level until you come 0:35:04.620,0:35:09.130 to your page table, where you have[br]4-kilobyte pages. So it's in the end not 0:35:09.130,0:35:13.851 that complicated, but it's a bit[br]confusing, because you want to know a 0:35:13.851,0:35:20.310 physical address, so you have to look it[br]up somewhere in the main memory, 0:35:20.310,0:35:25.420 with physical addresses, to translate your[br]virtual addresses.
And if you have to go 0:35:25.420,0:35:31.890 through all those levels, it takes a long[br]time, so we can do better than that, and 0:35:31.890,0:35:39.160 that's why Intel introduced additional[br]caches, also for all of those levels. So, 0:35:39.160,0:35:45.560 if you want to translate an address, you[br]take a look at the ITLB for instructions, 0:35:45.560,0:35:51.150 and the data TLB for data. If it's there,[br]you can stop, otherwise you go down all 0:35:51.150,0:35:58.700 those levels, and if it's not in any cache[br]you have to look it up in the DRAM. In 0:35:58.700,0:36:03.300 addition, the address space you have is[br]shared, because you have, on the one hand, 0:36:03.300,0:36:07.470 the user memory and, on the other hand,[br]you have mapped the kernel for convenience 0:36:07.470,0:36:12.870 and performance also in the address space.[br]And if your user program wants to access 0:36:12.870,0:36:18.310 some kernel functionality like reading a[br]file, it will switch to the kernel memory, 0:36:18.310,0:36:23.880 there's a switch to kernel privileges, and then[br]you can read the file, and so on. So, 0:36:23.880,0:36:30.420 that's it. However, you have drivers in[br]the kernel, and if you know the addresses 0:36:30.420,0:36:35.771 of those drivers, you can do code-reuse[br]attacks, and as a countermeasure, they 0:36:35.771,0:36:40.150 introduced address-space layout[br]randomization, also for the kernel. 0:36:40.150,0:36:47.040 And this means that when you have your[br]program running, the kernel is mapped at 0:36:47.040,0:36:51.630 one address, and if you reboot the machine[br]it's not at the same address anymore but 0:36:51.630,0:36:58.390 somewhere else. So if there is a way to[br]find out at which address the kernel is 0:36:58.390,0:37:04.450 loaded, you have circumvented this[br]countermeasure and defeated kernel address 0:37:04.450,0:37:11.060 space layout randomization. So this would[br]be nice for some attacks.
In addition, 0:37:11.060,0:37:16.947 there's also the kernel direct physical[br]map. And what does this mean? It's 0:37:16.947,0:37:23.320 implemented on many operating systems like[br]OS X, Linux, also on the Xen hypervisor 0:37:23.320,0:37:27.860 and[br]BSD, but not on Windows. But what it means 0:37:27.860,0:37:33.870 is that the complete physical memory is[br]additionally mapped in the kernel 0:37:33.870,0:37:40.460 memory at a fixed offset. So, for every[br]page that is mapped in the user space, 0:37:40.460,0:37:45.160 there's something like a twin page in the[br]kernel memory, which you can't access 0:37:45.160,0:37:50.371 because it's in the kernel memory.[br]However, we will need it later, because 0:37:50.371,0:37:58.230 now we go back to prefetch and see what we[br]can do with that. So, prefetch is not a 0:37:58.230,0:38:04.150 usual instruction, because it just tells[br]the CPU "I might need that data later on. 0:38:04.150,0:38:10.000 If you have time, load it for me;" if not,[br]the CPU can ignore it because it's busy 0:38:10.000,0:38:15.810 with other stuff. So, there's no guarantee[br]that this instruction is really executed, 0:38:15.810,0:38:22.070 but most of the time it is. And an[br]interesting thing is that it generates no 0:38:22.070,0:38:29.000 faults, so whatever you pass to this[br]instruction, your program won't crash, and 0:38:29.000,0:38:33.990 it does not check any privileges, so I can[br]also pass a kernel address to it and it 0:38:33.990,0:38:37.510 won't say "No, stop, you accessed an[br]address that you are not allowed to 0:38:37.510,0:38:45.530 access, so I crash," it just continues,[br]which is nice. 0:38:45.530,0:38:49.810 The second interesting thing is that the[br]operand is a virtual address, so every 0:38:49.810,0:38:55.534 time you execute this instruction, the CPU[br]has to go and check "OK, what physical 0:38:55.534,0:38:59.600 address does this virtual address[br]correspond to?"
So it has to do the lookup 0:38:59.600,0:39:05.750 with all those tables we've seen earlier,[br]and as you probably have guessed already, 0:39:05.750,0:39:10.370 the execution time varies also for the[br]prefetch instruction and we will see later 0:39:10.370,0:39:16.090 on what we can do with that.[br]So, let's get back to the direct physical 0:39:16.090,0:39:22.870 map. Because we can create an oracle for[br]address translation, so we can find out 0:39:22.870,0:39:27.540 what physical address belongs to the[br]virtual address. Because nowadays you 0:39:27.540,0:39:31.990 don't want that the user to know, because[br]you can craft nice rowhammer attacks with 0:39:31.990,0:39:37.520 that information, and more advanced cache[br]attacks, so you restrict this information 0:39:37.520,0:39:44.270 to the user. But let's check if we find a[br]way to still get this information. So, as 0:39:44.270,0:39:50.150 I've told you earlier, if you have a[br]paired page in the user space map, 0:39:50.150,0:39:54.505 you have the twin page in the kernel [br]space, and if it's cached, 0:39:54.505,0:39:56.710 its cached for both of them again. 0:39:56.710,0:40:03.170 So, the attack now works as the following:[br]From the attacker you flush your user 0:40:03.170,0:40:09.760 space page, so it's not in the cache for[br]the... also for the kernel memory, and 0:40:09.760,0:40:15.850 then you call prefetch on the address of[br]the kernel, because as I told you, you 0:40:15.850,0:40:22.050 still can do that because it doesn't[br]create any faults. 
So, you tell the CPU 0:40:22.050,0:40:28.310 "Please load me this data into the cache[br]even if I don't have access to this data 0:40:28.310,0:40:32.550 normally."[br]And if we now measure on our user space 0:40:32.550,0:40:37.100 page the address again, and we measure a[br]cache hit, because it has been loaded by 0:40:37.100,0:40:42.630 the CPU into the cache, we know exactly at[br]which position, since we passed the 0:40:42.630,0:40:48.250 address to the function, this address[br]corresponds to. And because this is at a 0:40:48.250,0:40:53.280 fixed offset, we can just do a simple[br]subtraction and know the physical address 0:40:53.280,0:40:59.180 again. So we have a nice way to find[br]physical addresses for virtual addresses. 0:40:59.180,0:41:04.390 And in practice this looks like this[br]following plot. So, it's pretty simple, 0:41:04.390,0:41:08.910 because we just do this for every address,[br]and at some point we measure a cache hit. 0:41:08.910,0:41:14.260 So, there's a huge difference. And exactly[br]at this point we know this physical 0:41:14.260,0:41:20.140 address corresponds to our virtual[br]address. The second thing is that we can 0:41:20.140,0:41:27.070 exploit the timing differences it needs[br]for the prefetch instruction. Because, as 0:41:27.070,0:41:31.850 I told you, when you go down this cache[br]levels, at some point you see "it's here" 0:41:31.850,0:41:37.500 or "it's not here," so it can abort early.[br]And with that we can know exactly 0:41:37.500,0:41:41.800 when the prefetch[br]instruction aborted, and know how the 0:41:41.800,0:41:48.070 pages are mapped into the address space.[br]So, the timing depends on where the 0:41:48.070,0:41:57.090 translation stops. And using those two[br]properties and those information, we can 0:41:57.090,0:42:02.227 do the following: On the one hand, we can[br]build variants of cache attacks. So, 0:42:02.227,0:42:07.444 instead of Flush+Reload, we can do[br]Flush+Prefetch, for instance. 
We can 0:42:07.444,0:42:12.060 also use prefetch to mount Rowhammer[br]attacks on privileged addresses, because 0:42:12.060,0:42:18.069 it doesn't generate any faults when we pass[br]those addresses, and it works as well. In 0:42:18.069,0:42:23.330 addition, we can use it to recover the[br]translation levels of a process, which you 0:42:23.330,0:42:27.870 could do earlier with the pagemap file,[br]but as I told you it's now privileged, so 0:42:27.870,0:42:32.890 you don't have access to that, and by[br]doing that you can bypass address space 0:42:32.890,0:42:38.170 layout randomization. In addition, as I[br]told you, you can translate virtual 0:42:38.170,0:42:43.530 addresses to physical addresses, which is[br]now also privileged with the pagemap 0:42:43.530,0:42:48.790 file, and using that re-enables ret2dir[br]exploits, which have been 0:42:48.790,0:42:55.550 demonstrated last year. On top of that, we[br]can also use this to locate kernel 0:42:55.550,0:43:00.850 drivers, as I told you. It would be nice[br]if we could circumvent KASLR as well, and I 0:43:00.850,0:43:08.380 will show you now how this is possible.[br]So, with the first oracle we find out all 0:43:08.380,0:43:15.430 the pages that are mapped, and for each of[br]those pages, we evict the translation 0:43:15.430,0:43:18.210 caches, and we can do that by either[br]calling sleep, 0:43:18.210,0:43:24.450 which schedules another program, or by accessing[br]a large memory buffer. Then, we 0:43:24.450,0:43:28.260 perform a syscall to the driver. So,[br]there's code of the driver executed and 0:43:28.260,0:43:33.540 loaded into the cache, and then we just[br]measure the time prefetch takes on this 0:43:33.540,0:43:40.840 address. And in the end, the page with the fastest[br]average access time is the driver page. 0:43:40.840,0:43:46.770 So, we can mount this attack on Windows 10[br]in less than 12 seconds. So, we can defeat 0:43:46.770,0:43:52.110 KASLR in less than 12 seconds, which is[br]very nice.
And in practice, the 0:43:52.110,0:43:58.330 measurements look like the following: So,[br]we have a lot of long measurements, and at 0:43:58.330,0:44:05.060 some point you have a low one, and you[br]know exactly that this is the driver region and 0:44:05.060,0:44:09.930 the address where the driver is located. And[br]you can mount those ret2dir 0:44:09.930,0:44:16.210 attacks again. However, that's not[br]everything, because there are more 0:44:16.210,0:44:20.795 instructions on Intel CPUs.[br]CM: Yeah, so, the following is not our 0:44:20.795,0:44:24.350 work, but we thought that would be[br]interesting, because it's basically more 0:44:24.350,0:44:30.740 instructions, more attacks, more fun. So[br]there's the RDSEED instruction, and what 0:44:30.740,0:44:35.340 it does is request a random seed from[br]the hardware random number generator. So, 0:44:35.340,0:44:39.310 the thing is that there is a fixed number[br]of precomputed random bits, and it takes 0:44:39.310,0:44:44.320 time to regenerate them. So, as with everything[br]that takes time, you can create a covert 0:44:44.320,0:44:50.180 channel with that. There are also FADD and[br]FMUL, which are floating-point operations. 0:44:50.180,0:44:56.740 Here, the running time of these instructions[br]depends on the operands. Some people 0:44:56.740,0:45:01.530 managed to bypass Firefox's same-origin[br]policy with an SVG filter timing attack 0:45:01.530,0:45:08.540 with that. There are also the JMP[br]instructions. So, in modern CPUs you have 0:45:08.540,0:45:14.520 branch prediction, and branch target[br]prediction. With that, and it's actually been 0:45:14.520,0:45:18.250 studied a lot, you can create a covert[br]channel. You can do side-channel attacks 0:45:18.250,0:45:26.028 on crypto. You can also bypass KASLR, and[br]finally, there are the TSX instructions, which 0:45:26.028,0:45:31.010 are an extension for hardware transactional[br]memory support, which has also been used 0:45:31.010,0:45:37.150 to bypass KASLR.
So, in case you're not[br]sure, KASLR is dead. You have lots of 0:45:37.150,0:45:45.650 different things to read. Okay, so, on to the[br]conclusion now. So, as you've seen, it's 0:45:45.650,0:45:50.190 actually more a problem of CPU design[br]than really of the instruction-set 0:45:50.190,0:45:55.720 architecture. The thing is that all these[br]issues are really hard to patch. They 0:45:55.720,0:45:59.966 are all linked to performance[br]optimizations, and we are not getting rid 0:45:59.966,0:46:03.890 of performance optimizations. It's[br]basically a trade-off between performance 0:46:03.890,0:46:11.530 and security, and performance seems to[br]always win. There have been some 0:46:11.530,0:46:20.922 propositions against cache attacks,[br]for instance to remove the clflush 0:46:20.922,0:46:26.640 instruction. The thing is that all these[br]quick fixes won't work, because we always 0:46:26.640,0:46:31.450 find new ways to do the same thing without[br]these precise instructions, and also, we 0:46:31.450,0:46:37.410 keep finding new instructions that leak[br]information. So, it's really, let's say, 0:46:37.410,0:46:43.740 quite a big topic that we have to fix[br]this. So, thank you very much for your 0:46:43.740,0:46:47.046 attention. If you have any questions we'd[br]be happy to answer them. 0:46:47.046,0:46:52.728 applause 0:46:52.728,0:47:01.510 applause[br]Herald: Okay. Thank you very much again 0:47:01.510,0:47:06.571 for your talk, and now we will have a Q&A,[br]and we have, I think, about 15 minutes, so 0:47:06.571,0:47:11.330 you can start lining up behind the[br]microphones. They are in the gangways in 0:47:11.330,0:47:18.130 the middle. Except, I think that one...[br]oh, no, it's back up, so it will work. And 0:47:18.130,0:47:22.180 while we wait, I think we will take[br]questions from our signal angel, if there 0:47:22.180,0:47:28.810 are any. Okay, there aren't any, so...[br]microphone questions. I think, you in
I think, you in 0:47:28.810,0:47:33.440 front.[br]Microphone: Hi. Can you hear me? 0:47:33.440,0:47:40.050 Herald: Try again.[br]Microphone: Okay. Can you hear me now? 0:47:40.050,0:47:46.480 Okay. Yeah, I'd like to know what exactly[br]was your stealthiness metric? Was it that 0:47:46.480,0:47:51.310 you can't distinguish it from a normal[br]process, or...? 0:47:51.310,0:47:56.500 CM: So...[br]Herald: Wait a second. We have still Q&A, 0:47:56.500,0:47:59.780 so could you quiet down a bit? That would[br]be nice. 0:47:59.780,0:48:08.180 CM: So, the question was about the[br]stealthiness metric. Basically, we use the 0:48:08.180,0:48:14.320 metric with cache misses and cache[br]references, normalized by the instructions 0:48:14.320,0:48:21.080 TLB events, and we[br]just found the threshold under which 0:48:21.080,0:48:25.820 pretty much every benign application was[br]below this, and rowhammer and cache 0:48:25.820,0:48:30.520 attacks were after that. So we fixed the[br]threshold, basically. 0:48:30.520,0:48:35.520 H: That microphone.[br]Microphone: Hello. Thanks for your talk. 0:48:35.520,0:48:42.760 It was great. First question: Did you[br]inform Intel before doing this talk? 0:48:42.760,0:48:47.520 CM: Nope.[br]Microphone: Okay. The second question: 0:48:47.520,0:48:51.050 What's your future plans?[br]CM: Sorry? 0:48:51.050,0:48:55.780 M: What's your future plans?[br]CM: Ah, future plans. Well, what I did, 0:48:55.780,0:49:01.220 that is interesting, is that we keep[br]finding these more or less by accident, or 0:49:01.220,0:49:06.440 manually, so having a good idea of what's[br]the attack surface here would be a good 0:49:06.440,0:49:10.050 thing, and doing that automatically would[br]be even better. 0:49:10.050,0:49:14.170 M: Great, thanks.[br]H: Okay, the microphone in the back, 0:49:14.170,0:49:18.770 over there. The guy in white.[br]M: Hi. One question. 
If you have,[br]like, a daemon that randomly invalidates 0:49:18.770,0:49:24.410 some cache lines, would that be a better[br]countermeasure than disabling the caches? 0:49:24.410,0:49:31.120 ML: What was the question?[br]CM: If invalidating cache lines would be 0:49:31.120,0:49:39.580 better than disabling the whole cache. So,[br]I'm... 0:49:39.580,0:49:42.680 ML: If you know which cache lines have[br]been accessed by the process, you can 0:49:42.680,0:49:47.300 invalidate those cache lines before you[br]swap those processes, but it's also a 0:49:47.300,0:49:52.820 trade-off with performance. Like, you 0:49:52.820,0:49:57.940 can also, if you switch processes, flush[br]the whole cache, and then it's empty, and 0:49:57.940,0:50:01.900 then you don't see any activity anymore,[br]but there's also the trade-off of 0:50:01.900,0:50:07.510 performance with this.[br]M: Okay, maybe a second question. 0:50:07.510,0:50:12.240 There are some ARM architectures[br]that have random cache line invalidations. 0:50:12.240,0:50:16.010 Did you try those, if you can see a[br][unintelligible] channel there. 0:50:16.010,0:50:21.960 ML: If they're truly random... but probably[br]you just have to make more measurements 0:50:21.960,0:50:27.180 and more measurements, and then you can[br]average out the noise, and then you can do 0:50:27.180,0:50:30.350 these attacks again. It's like with Prime+[br]Probe, where you need more 0:50:30.350,0:50:34.080 measurements, because it's much more[br]noisy, so in the end you will just need 0:50:34.080,0:50:37.870 much more measurements.[br]CM: So, on ARM, it's supposed to be pretty 0:50:37.870,0:50:43.260 random. At least it's in the manual, but[br]we actually found nice ways to evict cache 0:50:43.260,0:50:47.230 lines that we really wanted to evict, so[br]it's not actually that pseudo-random. 0:50:47.230,0:50:51.960 So, even...
let's say, if something is[br]truly random, it might be nice, but then 0:50:51.960,0:50:57.170 it's also quite complicated to implement.[br]I mean, you probably don't want a random 0:50:57.170,0:51:01.480 number generator just for the cache.[br]M: Okay. Thanks. 0:51:01.480,0:51:05.980 H: Okay, and then the three guys here on[br]the microphone in the front. 0:51:05.980,0:51:13.450 M: My question is about a detail with the[br]keylogger. You could distinguish between 0:51:13.450,0:51:18.150 space, backspace and alphabet, which is[br]quite interesting. But could you also 0:51:18.150,0:51:22.320 figure out the specific keys that were[br]pressed, and if so, how? 0:51:22.320,0:51:25.650 ML: Yeah, that depends on the[br]implementation of the keyboard. But what 0:51:25.650,0:51:29.310 we did, we used the Android stock[br]keyboard, which is shipped with the 0:51:29.310,0:51:34.520 Samsung, so it's pre-installed. And if you[br]have a table somewhere in your code, which 0:51:34.520,0:51:39.540 says "Okay, if you press this exact[br]location or this image, it's an A or it's 0:51:39.540,0:51:44.450 a B", then you can also do a more[br]sophisticated attack. So, if you find any 0:51:44.450,0:51:49.050 functions or data in the code which[br]directly tell you "Okay, this is this 0:51:49.050,0:51:54.520 character," you can also spy on the actual[br]key characters on the keyboard. 0:51:54.520,0:52:02.900 M: Thank you.[br]M: Hi. Thank you for your talk. My first 0:52:02.900,0:52:08.570 question is: What can we actually do now[br]to mitigate this kind of attack? By, for 0:52:08.570,0:52:11.980 example, switching off TSX or using ECC[br]RAM. 0:52:11.980,0:52:17.410 CM: So, I think the very important thing[br]to protect would be crypto, and the 0:52:17.410,0:52:20.840 good thing is that today we know how to[br]build crypto that is resistant to side- 0:52:20.840,0:52:24.490 channel attacks.
So the good thing would[br]be to stop using implementations that 0:52:24.490,0:52:31.360 have been known to be vulnerable for 10 years.[br]Then things like keystrokes are way harder 0:52:31.360,0:52:36.830 to protect, so let's say crypto is[br]manageable; the whole system is clearly 0:52:36.830,0:52:41.490 another problem. And you can have[br]different types of countermeasures on the 0:52:41.490,0:52:45.780 hardware side, but that would mean that[br]Intel and ARM actually want to fix that, 0:52:45.780,0:52:48.560 and that they know how to fix that. I[br]don't even know how to fix that in 0:52:48.560,0:52:55.500 hardware. Then on the system side, if you[br]prevent some kinds of memory sharing, you 0:52:55.500,0:52:58.540 don't have Flush+Reload involved anymore[br]and Prime+Probe is much 0:52:58.540,0:53:04.880 noisier, so it would be an improvement.[br]M: Thank you. 0:53:04.880,0:53:11.880 H: Do we have signal angel questions? No.[br]OK, then more microphone. 0:53:11.880,0:53:16.630 M: Hi, thank you. I wanted to ask about[br]the way you establish the covert channel 0:53:16.630,0:53:23.280 between the two processes, because it[br]would obviously have to be timed in a way to 0:53:23.280,0:53:28.511 transmit information from one process[br]to the other. Is there anywhere that you 0:53:28.511,0:53:32.970 documented the whole thing? You know, it's[br]actually almost like the seven layers or 0:53:32.970,0:53:36.580 something like that. Are there any ways[br]that you documented that?
It would be 0:53:36.580,0:53:40.260 really interesting to know how it worked.[br]ML: You can find this information in the 0:53:40.260,0:53:46.120 paper, because there are several papers on[br]covert channels using that. So the NDSS 0:53:46.120,0:53:51.300 paper will be published in February, I guess,[br]but the Armageddon paper also includes 0:53:51.300,0:53:55.670 a covert channel, and you can[br]find more information about what the 0:53:55.670,0:53:59.320 packets look like and how the[br]synchronization works in the paper. 0:53:59.320,0:54:04.020 M: Thank you.[br]H: One last question? 0:54:04.020,0:54:09.750 M: Hi! You mentioned that you used Osvik's[br]attack for the AES side-channel attack. 0:54:09.750,0:54:17.350 Did you solve the AES round detection, and[br]is it different with some scheduler 0:54:17.350,0:54:21.441 manipulation?[br]CM: So on this one I think we only did 0:54:21.441,0:54:24.280 a synchronous attack, so we already[br]knew when 0:54:24.280,0:54:27.770 the victim was going to be scheduled, and[br]we didn't have anything to do with 0:54:27.770,0:54:32.930 schedulers.[br]M: Alright, thank you. 0:54:32.930,0:54:37.140 H: Are there any more questions? No, I[br]don't see anyone. Then, thank you very 0:54:37.140,0:54:39.132 much again to our speakers. 0:54:39.132,0:54:42.162 applause 0:54:42.162,0:54:58.970 music 0:54:58.970,0:55:06.000 subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!