0:00:00.000,0:00:15.030
34C3 preroll music
0:00:15.030,0:00:22.570
Herald: Hello fellow creatures.[br]Welcome and
0:00:22.570,0:00:30.140
I wanna start with a question. [br]Another one: Who do we trust?
0:00:30.140,0:00:36.500
Do we trust the TrustZones [br]on our smartphones?
0:00:36.500,0:00:41.710
Well Keegan Ryan, he's really [br]fortunate to be here and
0:00:41.710,0:00:51.710
he was inspired by another talk from the[br]CCC before - I think it was 29C3 and his
0:00:51.710,0:00:57.550
research on smartphones and the systems on a[br]chip used in smartphones will answer
0:00:57.550,0:01:02.520
the question of whether you can trust those[br]Trusted Execution Environments. Please
0:01:02.520,0:01:06.330
give a warm round of applause[br]to Keegan and enjoy!
0:01:06.330,0:01:10.630
Applause
0:01:10.630,0:01:14.220
Keegan Ryan: All right, thank you! So I'm[br]Keegan Ryan, I'm a consultant with NCC
0:01:14.220,0:01:19.740
Group and this is microarchitectural[br]attacks on Trusted Execution Environments.
0:01:19.740,0:01:23.250
So, in order to understand what a Trusted[br]Execution Environment is we need to go
0:01:23.250,0:01:29.729
back into processor security, specifically[br]on x86. So as many of you are probably
0:01:29.729,0:01:33.729
aware there are a couple different modes[br]which we can execute code under in x86
0:01:33.729,0:01:39.290
processors and that includes ring 3, which[br]is the user code and the applications, and
0:01:39.290,0:01:45.570
also ring 0 which is the kernel code. Now[br]there's also a ring 1 and ring 2 that are
0:01:45.570,0:01:50.229
supposedly used for drivers or guest[br]operating systems but really it just boils
0:01:50.229,0:01:56.159
down to ring 0 and ring 3. And in this[br]diagram we have here we see that privilege
0:01:56.159,0:02:02.149
increases as we go up the diagram, so ring[br]0 is the most privileged ring and ring 3
0:02:02.149,0:02:05.470
is the least privileged ring. So all of[br]our secrets, all of our sensitive
0:02:05.470,0:02:10.030
information, all of the attacker's goals[br]are in ring 0 and the attacker is trying
0:02:10.030,0:02:15.890
to access those from the unprivileged[br]world of ring 3. Now you may have a
0:02:15.890,0:02:20.150
question what if I want to add a processor[br]feature that I don't want ring 0 to be
0:02:20.150,0:02:26.240
able to access? Well then you add ring -1[br]which is often used for a hypervisor. Now
0:02:26.240,0:02:30.610
the hypervisor has all the secrets and the[br]hypervisor can manage different guest
0:02:30.610,0:02:35.680
operating systems and each of these guest[br]operating systems can execute in ring 0
0:02:35.680,0:02:41.300
without having any idea of the other[br]operating systems. So this way now the
0:02:41.300,0:02:45.230
secrets are all in ring -1 so now the[br]attacker's goals have shifted from ring 0
0:02:45.230,0:02:50.760
to ring -1. The attacker has to attack[br]ring -1 from a less privileged ring and
0:02:50.760,0:02:55.430
tries to access those secrets. But what if[br]you want to add a processor feature that
0:02:55.430,0:03:00.560
you don't want ring -1 to be able to[br]access? So you add ring -2 which is System
0:03:00.560,0:03:05.230
Management Mode and that's capable of[br]monitoring power, directly interfacing
0:03:05.230,0:03:10.350
with firmware and other chips on a[br]motherboard and it's able to access and do
0:03:10.350,0:03:13.820
a lot of things that the hypervisor is not[br]able to and now all of your secrets and
0:03:13.820,0:03:17.880
all of the attacker's goals are in ring -2[br]and the attacker has to attack those from
0:03:17.880,0:03:22.400
a less privileged ring. Now maybe you want[br]to add something to your processor that
0:03:22.400,0:03:26.900
you don't want ring -2 to be able to access,[br]so you add ring -3 and I think you get the
0:03:26.900,0:03:31.450
picture now. And we just keep on adding[br]more and more privilege rings and keep
0:03:31.450,0:03:35.260
putting our secrets and our attacker's[br]goals in these higher and higher
0:03:35.260,0:03:41.260
privileged rings but what if we're[br]thinking about it wrong? What if instead
0:03:41.260,0:03:46.710
we want to put all the secrets in the[br]least privileged ring? So this is sort of
0:03:46.710,0:03:51.490
the idea behind SGX and it's useful for[br]things like DRM where you want to run
0:03:51.490,0:03:56.980
ring 3 code but have sensitive secrets or[br]other signing capabilities running in
0:03:56.980,0:04:02.050
ring 3. But this picture is getting a[br]little bit complicated, this diagram is a
0:04:02.050,0:04:06.250
little bit complex so let's simplify it a[br]little bit. We'll only be looking at ring
0:04:06.250,0:04:12.100
0 through ring 3 which is the kernel, the[br]userland and the SGX enclave which also
0:04:12.100,0:04:16.910
executes in ring 3. Now when you're[br]executing code in the SGX enclave you
0:04:16.910,0:04:22.170
first load the code into the enclave and[br]then from that point on you trust the
0:04:22.170,0:04:26.980
execution of whatever's going on in that[br]enclave. You trust that the other elements
0:04:26.980,0:04:31.640
the kernel, the userland, the other rings[br]are not going to be able to access what's
0:04:31.640,0:04:38.020
in that enclave so you've made your[br]Trusted Execution Environment. This is a
0:04:38.020,0:04:44.750
bit of a weird model because now your[br]attacker is in the ring 0 kernel and your
0:04:44.750,0:04:48.840
target victim here is in ring 3. So[br]instead of the attacker trying to move up
0:04:48.840,0:04:54.070
the privilege chain, the attacker is[br]trying to move down. Which is pretty
0:04:54.070,0:04:57.820
strange and you might have some questions[br]like "under this model who handles memory
0:04:57.820,0:05:01.470
management?" because traditionally that's[br]something that ring 0 would manage and
0:05:01.470,0:05:05.290
ring 0 would be responsible for paging[br]memory in and out for different processes
0:05:05.290,0:05:10.460
in different code that's executing it in[br]ring 3. But on the other hand you don't
0:05:10.460,0:05:16.030
want that to happen with the SGX enclave[br]because what if the malicious ring 0 adds
0:05:16.030,0:05:22.410
a page to the enclave that the enclave[br]doesn't expect? So in order to solve this
0:05:22.410,0:05:28.950
problem, SGX does allow ring 0 to handle[br]page faults. But simultaneously and in
0:05:28.950,0:05:35.380
parallel it verifies every memory load to[br]make sure that no access violations are
0:05:35.380,0:05:40.139
made so that all the SGX memory is safe.[br]So it allows ring 0 to do its job but it
0:05:40.139,0:05:45.010
sort of watches over at the same time to[br]make sure that nothing is messed up. So
0:05:45.010,0:05:51.120
it's a bit of a weird convoluted solution[br]to a strange inverted problem but it works
0:05:51.120,0:05:57.580
and that's essentially how SGX works and[br]the idea behind SGX. Now we can look at
0:05:57.580,0:06:02.530
x86 and we can see that ARMv8 is[br]constructed in a similar way but it
0:06:02.530,0:06:08.450
improves on x86 in a couple key ways. So[br]first of all ARMv8 gets rid of ring 1 and
0:06:08.450,0:06:12.170
ring 2 so you don't have to worry about[br]those and it just has different privilege
0:06:12.170,0:06:17.370
levels for userland and the kernel. And[br]these different privilege levels are
0:06:17.370,0:06:21.520
called exception levels in the ARM[br]terminology. And the second thing that ARM
0:06:21.520,0:06:25.930
gets right compared to x86 is that instead[br]of starting at 3 and counting down as
0:06:25.930,0:06:30.730
privilege goes up, ARM starts at 0 and[br]counts up so we don't have to worry about
0:06:30.730,0:06:35.940
negative numbers anymore. Now when we add[br]the next privilege level the hypervisor we
0:06:35.940,0:06:40.860
call it exception level 2 and the next one[br]after that is the monitor in exception
0:06:40.860,0:06:47.210
level 3. So at this point we still want to[br]have the ability to run trusted code in
0:06:47.210,0:06:52.650
exception level 0 the least privileged[br]level of the ARMv8 processor. So in order
0:06:52.650,0:06:59.060
to support this we need to separate this[br]diagram into two different sections. In
0:06:59.060,0:07:03.510
ARMv8 these are called the secure world[br]and the non-secure world. So we have the
0:07:03.510,0:07:07.740
non-secure world on the left in blue that[br]consists of the userland, the kernel and
0:07:07.740,0:07:11.900
the hypervisor and we have the secure[br]world on the right which consists of the
0:07:11.900,0:07:17.360
monitor in exception level 3, a trusted[br]operating system in exception level 1 and
0:07:17.360,0:07:23.030
trusted applications in exception level 0.[br]So the idea is that if you run anything in
0:07:23.030,0:07:27.360
the secure world, it should not be[br]accessible or modifiable by anything in
0:07:27.360,0:07:32.320
the non secure world. So that's what our[br]attacker is trying to access. The
0:07:32.320,0:07:36.371
attacker has access to the non secure[br]kernel, which is often Linux, and they're
0:07:36.371,0:07:40.120
trying to go after the trusted apps. So[br]once again we have this weird inversion
0:07:40.120,0:07:43.330
where we're trying to go from a more[br]privileged level to a less privileged
0:07:43.330,0:07:48.260
level and trying to extract secrets in[br]that way. So the question that arises when
0:07:48.260,0:07:53.070
using these Trusted Execution Environments[br]that are implemented in SGX and TrustZone
0:07:53.070,0:07:58.330
in ARM is "can we use these privilege[br]modes in our privilege access in order to
0:07:58.330,0:08:03.330
attack these Trusted Execution[br]Environments?". With that question in mind,
0:08:03.330,0:08:06.260
we can start looking at a few[br]different research papers. The first one
0:08:06.260,0:08:11.360
that I want to go into is one called[br]CLKSCREW and it's an attack on TrustZone.
0:08:11.360,0:08:14.360
So throughout this presentation I'm going[br]to go through a few different papers and
0:08:14.360,0:08:18.050
just to make it clear which papers have[br]already been published and which ones are
0:08:18.050,0:08:21.400
new, I'll include the citations in the[br]upper right hand corner so that way you
0:08:21.400,0:08:26.580
can tell what's old and what's new. And as[br]far as papers go this CLKSCREW paper is
0:08:26.580,0:08:31.430
relatively new. It was released in 2017.[br]And the way CLKSCREW works is it takes
0:08:31.430,0:08:38.009
advantage of the energy management[br]features of a processor. So a non-secure
0:08:38.009,0:08:41.679
operating system has the ability to manage[br]the energy consumption of the different
0:08:41.679,0:08:47.970
cores. So if a certain target core doesn't[br]have much scheduled to do then the
0:08:47.970,0:08:52.350
operating system is able to scale back[br]that voltage or dial down the frequency on
0:08:52.350,0:08:56.449
that core so that core uses less energy[br]which is a great thing for performance: it
0:08:56.449,0:09:00.971
really extends battery life, it makes[br]the cores last longer and it gives better
0:09:00.971,0:09:07.009
performance overall. But the problem here[br]is what if you have two separate cores and
0:09:07.009,0:09:11.740
one of your cores is running this non-[br]trusted operating system and the other
0:09:11.740,0:09:15.579
core is running code in the secure world?[br]It's running that trusted code those
0:09:15.579,0:09:21.240
trusted applications so that non secure[br]operating system can still dial down that
0:09:21.240,0:09:25.629
voltage and it can still change that[br]frequency and those changes will affect
0:09:25.629,0:09:30.740
the secure world code. So what the[br]CLKSCREW attack does is the non secure
0:09:30.740,0:09:36.470
operating system core will dial down the[br]voltage, it will overclock the frequency
0:09:36.470,0:09:40.749
on the target secure world core in order[br]to induce faults, to make the
0:09:40.749,0:09:45.909
computation on that core fail in some way[br]and when that computation fails you get
0:09:45.909,0:09:50.439
certain cryptographic errors that the[br]attack can use to infer things like secret
0:09:50.439,0:09:56.040
keys, secret AES keys and to bypass code[br]signing implemented in the secure world.
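To see why a single induced computation fault is so devastating to signing, here is the classic Bellcore-style RSA-CRT fault attack with toy parameters (an illustration of the general principle, not necessarily the exact fault CLKSCREW induces):

```python
from math import gcd

# Toy RSA-CRT signing with textbook-sized primes (illustration only).
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

def sign_crt(m, fault=False):
    sp = pow(m, d % (p - 1), p)     # half-size exponentiation mod p
    sq = pow(m, d % (q - 1), q)     # and mod q
    if fault:
        sq ^= 1                     # one glitched bit in a single CRT branch
    q_inv = pow(q, -1, p)           # Garner recombination
    return (sq + q * (((sp - sq) * q_inv) % p)) % N

m = 1234
s_good = sign_crt(m)
s_bad = sign_crt(m, fault=True)
assert pow(s_good, e, N) == m       # the unfaulted signature verifies
# Bellcore-style recovery: the faulty signature leaks a secret factor of N.
print(gcd((pow(s_bad, e, N) - m) % N, N))   # → 61, i.e. the prime p
```

Because the glitch corrupts only one CRT branch, the faulty signature stays correct mod p but wrong mod q, and a single gcd recovers the private factorization.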
0:09:56.040,0:09:59.680
So it's a very powerful attack that's made[br]possible because the non-secure operating
0:09:59.680,0:10:06.099
system is privileged enough in order to[br]use these energy management features. Now
0:10:06.099,0:10:10.189
CLKSCREW is an example of an active attack[br]where the attacker is actively changing
0:10:10.189,0:10:15.470
the outcome of the victim code of that[br]code in the secure world. But what about
0:10:15.470,0:10:20.540
passive attacks? So in a passive attack,[br]the attacker does not modify the actual
0:10:20.540,0:10:25.220
outcome of the process. The attacker just[br]tries to monitor that process and infer what's
0:10:25.220,0:10:29.200
going on and that is the sort of attack[br]that we'll be considering for the rest of
0:10:29.200,0:10:35.769
the presentation. So in a lot of SGX and[br]TrustZone implementations, the trusted and
0:10:35.769,0:10:39.759
the non-trusted code both share the same[br]hardware and this shared hardware could be
0:10:39.759,0:10:45.800
a shared cache, it could be a branch[br]predictor, it could be a TLB. The point is
0:10:45.800,0:10:53.230
that they share the same hardware so that[br]the changes made by the secure code may be
0:10:53.230,0:10:57.209
reflected in the behavior of the non-[br]secure code. So the trusted code might
0:10:57.209,0:11:02.259
execute, change the state of that shared[br]cache for example and then the untrusted
0:11:02.259,0:11:07.179
code may be able to go in, see the changes[br]in that cache and infer information about
0:11:07.179,0:11:11.720
the behavior of the secure code. So that's[br]essentially how our side channel attacks
0:11:11.720,0:11:16.160
are going to work: the non-secure code[br]is going to monitor these shared hardware
0:11:16.160,0:11:23.050
resources for state changes that reflect[br]the behavior of the secure code. Now we've
0:11:23.050,0:11:27.899
already talked about how Intel's SGX addresses[br]the problem of memory management and who's
0:11:27.899,0:11:33.399
responsible for making sure that those[br]attacks don't work on SGX. So what do they
0:11:33.399,0:11:37.050
have to say on how they protect against[br]these side channel attacks and attacks on
0:11:37.050,0:11:45.490
this shared cache hardware? They don't...[br]at all. They essentially say "we do not
0:11:45.490,0:11:48.931
consider this part of our threat model. It[br]is up to the developer to implement the
0:11:48.931,0:11:53.530
protections needed to protect against[br]these side-channel attacks". Which is
0:11:53.530,0:11:56.769
great news for us because these side[br]channel attacks can be very powerful and
0:11:56.769,0:12:00.350
if there aren't any hardware features that[br]are necessarily stopping us from being
0:12:00.350,0:12:06.910
able to accomplish our goal it makes us[br]that much more likely to succeed. So with that
0:12:06.910,0:12:11.430
we can sort of take a step back from[br]TrustZone and SGX and just take a look at
0:12:11.430,0:12:14.959
cache attacks to make sure that we all[br]have the same understanding of how the
0:12:14.959,0:12:19.549
cache attacks will be applied to these[br]Trusted Execution Environments. To start
0:12:19.549,0:12:25.619
that let's go over a brief recap of how a[br]cache works. So caches are necessary in
0:12:25.619,0:12:29.949
processors because accessing the main[br]memory is slow. When you try to access
0:12:29.949,0:12:34.079
something from the main memory it takes a[br]while to be read into the processor. So the
0:12:34.079,0:12:40.389
cache exists as sort of a layer to[br]remember what that information is so if
0:12:40.389,0:12:45.040
the processor ever needs information from[br]that same address it just reloads it from
0:12:45.040,0:12:49.699
the cache and that access is going to be[br]fast. So it really speeds up the memory
0:12:49.699,0:12:55.810
access for repeated accesses to the same[br]address. And then if we try to access a
0:12:55.810,0:13:00.069
different address then that will also be[br]read into the cache, slowly at first but
0:13:00.069,0:13:06.720
then quickly for repeated accesses and so[br]on and so forth. Now as you can probably
0:13:06.720,0:13:10.970
tell from all of these examples the memory[br]blocks have been moving horizontally
0:13:10.970,0:13:15.649
they've always been staying in the same[br]row. And that is reflective of the idea of
0:13:15.649,0:13:20.360
sets in a cache. So there are a number of[br]different set IDs and that corresponds to
0:13:20.360,0:13:24.189
the different rows in this diagram. So for[br]our example there are four different set
0:13:24.189,0:13:30.889
IDs and each address in the main memory[br]maps to a specific set ID. So that
0:13:30.889,0:13:35.100
address in main memory will only go into[br]that location in the cache with the same
0:13:35.100,0:13:39.730
set ID so it will only travel along those[br]rows. So that means if you have two
0:13:39.730,0:13:43.410
different blocks of memory that mapped to[br]different set IDs they're not going to
0:13:43.410,0:13:48.899
interfere with each other in the cache.[br]But that raises the question "what about
0:13:48.899,0:13:53.310
two memory blocks that do map to the same[br]set ID?". Well if there's room in the
0:13:53.310,0:13:58.759
cache then the same thing will happen as[br]before: those memory contents will be
0:13:58.759,0:14:03.769
loaded into the cache and then retrieved[br]from the cache for future accesses. And
0:14:03.769,0:14:08.110
the number of possible entries for a[br]particular set ID within a cache is called
0:14:08.110,0:14:11.800
the associativity. And on this diagram[br]that's represented by the number of
0:14:11.800,0:14:16.819
columns in the cache. So we will call our[br]cache in this example a 2-way set-
0:14:16.819,0:14:22.350
associative cache. Now the next question[br]is "what happens if you try to read a
0:14:22.350,0:14:27.049
memory address that maps to the same set ID[br]but all of those entries within that set ID
0:14:27.049,0:14:32.529
within the cache are full?". Well one of[br]those entries is chosen, it's evicted from
0:14:32.529,0:14:38.729
the cache, the new memory is read in and[br]then that's fed to the processor. So it
0:14:38.729,0:14:43.779
doesn't really matter how the evicted[br]cache entry is chosen; for the
0:14:43.779,0:14:47.960
purposes of this presentation you can just[br]assume that it's random. But the important
0:14:47.960,0:14:51.899
thing is that if you try to access that[br]same memory that was evicted before you're
0:14:51.899,0:14:55.689
now going to have to wait for that time[br]penalty for that to be reloaded into the
0:14:55.689,0:15:01.329
cache and read into the processor. So those[br]are caches in a nutshell, particularly
0:15:01.329,0:15:05.749
set-associative caches. With that, we can[br]begin looking at the different types of cache
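As a quick, hypothetical sketch of the set-associative behaviour just recapped (4 set IDs, 2 ways, random eviction; the sizes are illustrative, not any real processor's):

```python
import random

# Toy 2-way set-associative cache with 4 set IDs (the rows in the diagram).
NUM_SETS, WAYS = 4, 2

class Cache:
    def __init__(self):
        self.sets = {s: [] for s in range(NUM_SETS)}

    def access(self, addr):
        """Return 'hit' (fast) or 'miss' (slow) for a memory block."""
        ways = self.sets[addr % NUM_SETS]   # the address picks its row
        if addr in ways:
            return 'hit'
        if len(ways) == WAYS:               # row full: evict an entry
            ways.pop(random.randrange(WAYS))
        ways.append(addr)                   # read in from slow main memory
        return 'miss'

c = Cache()
print(c.access(1))   # miss: first access is slow
print(c.access(1))   # hit: repeated access is fast
print(c.access(5))   # miss, but 1 and 5 share set 1 without conflict
print(c.access(9))   # miss: set 1 is now full, so 1 or 5 gets evicted
```

Blocks that map to different set IDs never contend; a third block in the same set forces an eviction, which is exactly the effect the attacks below measure.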
0:15:05.749,0:15:09.319
attacks. So for a cache attack we have two[br]different processes: an attacker
0:15:09.319,0:15:13.779
process and a victim process. For this[br]type of attack that we're considering both
0:15:13.779,0:15:17.290
of them share the same underlying code so[br]they're trying to access the same
0:15:17.290,0:15:21.829
resources which could be the case if you[br]have page deduplication in virtual
0:15:21.829,0:15:26.009
machines or if you have copy-on-write[br]mechanisms for shared code and shared
0:15:26.009,0:15:31.649
libraries. But the point is that they[br]share the same underlying memory. Now the
0:15:31.649,0:15:35.659
Flush and Reload Attack works in two[br]stages for the attacker. The attacker
0:15:35.659,0:15:39.420
first starts by flushing out the cache.[br]They flush each and every address in the
0:15:39.420,0:15:44.309
cache so the cache is just empty. Then the[br]attacker lets the victim execute for a
0:15:44.309,0:15:48.769
small amount of time so the victim might[br]read an address from main memory
0:15:48.769,0:15:53.489
loading that into the cache and then the[br]second stage of the attack is the reload
0:15:53.489,0:15:58.099
phase. In the reload phase the attacker[br]tries to load different memory addresses
0:15:58.099,0:16:04.171
from main memory and see if those entries[br]are in the cache or not. Here the attacker
0:16:04.171,0:16:09.380
will first try to load address 0 and see[br]that because it takes a long time to read
0:16:09.380,0:16:14.429
the contents of address 0 the attacker can[br]infer that address 0 was not part of the
0:16:14.429,0:16:17.499
cache which makes sense because the[br]attacker flushed it from the cache in the
0:16:17.499,0:16:23.330
first stage. The attacker then tries to[br]read the memory at address 1 and sees that
0:16:23.330,0:16:29.089
this operation is fast so the attacker[br]infers that the contents of address 1 are
0:16:29.089,0:16:32.859
in the cache and because the attacker[br]flushed everything from the cache before
0:16:32.859,0:16:37.119
the victim executed, the attacker then[br]concludes that the victim is responsible
0:16:37.119,0:16:42.540
for bringing address 1 into the cache.[br]This Flush+Reload attack reveals which
0:16:42.540,0:16:47.370
memory addresses the victim accesses[br]during that small slice of time. Then
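The Flush+Reload loop just described can be simulated in a few lines (a toy model where a Python set stands in for the shared cache; real attacks use flush instructions such as clflush and cycle counters):

```python
# Toy Flush+Reload round over 8 shared addresses.
cache = set()

def load(addr):
    """Simulated load: 'slow' on a cache miss, 'fast' on a hit."""
    if addr in cache:
        return 'fast'
    cache.add(addr)
    return 'slow'

def victim(secret):
    load(secret)                 # the victim's access depends on its secret

def flush_reload(victim_fn, addrs):
    cache.clear()                # 1. flush: empty the whole cache
    victim_fn()                  # 2. let the victim execute briefly
    # 3. reload: only addresses the victim touched come back fast
    return [a for a in addrs if load(a) == 'fast']

print(flush_reload(lambda: victim(3), range(8)))   # → [3]
```

The attacker learns which shared address the victim touched during the time slice, without ever reading the victim's data.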
0:16:47.370,0:16:50.970
after that reload phase, the attack[br]repeats so the attacker flushes again
0:16:50.970,0:16:57.739
lets the victim execute, reloads again[br]and so on. There's also a variant on the
0:16:57.739,0:17:01.050
Flush+Reload attack that's called the[br]Flush+Flush attack which I'm not going to
0:17:01.050,0:17:05.569
go into the details of, but essentially[br]it's the same idea. But instead of using
0:17:05.569,0:17:08.980
load instructions to determine whether[br]a piece of memory is in the cache or
0:17:08.980,0:17:13.720
not, it uses flush instructions because[br]flush instructions will take longer if
0:17:13.720,0:17:19.138
something is in the cache already. The[br]important thing is that both the
0:17:19.138,0:17:22.819
Flush+Reload attack and the Flush+Flush[br]attack rely on the attacker and the victim
0:17:22.819,0:17:27.029
sharing the same memory. But this isn't[br]always the case so we need to consider
0:17:27.029,0:17:30.810
what happens when the attacker and the[br]victim do not share memory. For this we
0:17:30.810,0:17:35.670
have the Prime+Probe attack. The[br]Prime+Probe attack once again works in two
0:17:35.670,0:17:40.380
separate stages. In the first stage the[br]attacker primes the cache by reading all
0:17:40.380,0:17:44.401
the attacker memory into the cache and[br]then the attacker lets the victim execute
0:17:44.401,0:17:49.750
for a small amount of time. So no matter[br]what the victim accesses from main memory
0:17:49.750,0:17:54.460
since the cache is full of the attacker[br]data, one of those attacker entries will
0:17:54.460,0:17:59.190
be replaced by a victim entry. Then in the[br]second phase of the attack, during the
0:17:59.190,0:18:03.529
probe phase, the attacker checks the[br]different cache entries for particular set
0:18:03.529,0:18:08.959
IDs and sees if all of the attacker[br]entries are still in the cache. So maybe
0:18:08.959,0:18:13.440
our attacker is curious about the last set[br]ID, the bottom row, so the attacker first
0:18:13.440,0:18:18.090
tries to load the memory at address 3 and[br]because this operation is fast the
0:18:18.090,0:18:23.000
attacker knows that address 3 is in the[br]cache. The attacker tries the same thing
0:18:23.000,0:18:28.159
with address 7, sees that this operation[br]is slow and infers that at some point
0:18:28.159,0:18:33.279
address 7 was evicted from the cache so[br]the attacker knows that something had to
0:18:33.279,0:18:37.490
have evicted it from the cache and it had to[br]be the victim so the attacker concludes
0:18:37.490,0:18:42.840
that the victim accessed something in that[br]last set ID, in that bottom row. The
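That Prime+Probe round can be sketched the same way (a toy 4-set, 2-way model; note the attacker and victim entries are distinct objects, since no memory is shared):

```python
# Toy Prime+Probe round: attacker and victim share no memory, only the
# cache geometry (4 set IDs, 2 ways, as in the diagram).
NUM_SETS, WAYS = 4, 2
sets = {s: [] for s in range(NUM_SETS)}

def load(owner, addr):
    """Simulated load; evicts the oldest entry when a set is full."""
    entry, set_id = (owner, addr), addr % NUM_SETS
    if entry in sets[set_id]:
        return 'fast'
    if len(sets[set_id]) == WAYS:
        sets[set_id].pop(0)          # something gets evicted
    sets[set_id].append(entry)
    return 'slow'

def prime_probe(victim_fn):
    for a in range(8):               # 1. prime: fill every way with our data
        load('attacker', a)
    victim_fn()                      # 2. the victim evicts one of our entries
    # 3. probe: a slow reload reveals which set the victim touched
    return sorted({a % NUM_SETS for a in range(8)
                   if load('attacker', a) == 'slow'})

print(prime_probe(lambda: load('victim', 11)))   # → [3], since 11 maps to set 3
```

The attacker recovers only the set ID, never the victim's address or the contents, matching the limitation described here.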
0:18:42.840,0:18:47.230
attacker doesn't know if it was the[br]contents of address 11 or the contents of
0:18:47.230,0:18:51.260
address 15 or even what those contents[br]are, but the attacker has a good idea of
0:18:51.260,0:18:57.090
which set ID it was. So, the important[br]things to remember about
0:18:57.090,0:19:01.179
cache attacks are that caches are very[br]important, they're crucial for performance
0:19:01.179,0:19:06.059
on processors, they give a huge speed[br]boost and there's a huge time difference
0:19:06.059,0:19:11.569
between having a cache and not having a[br]cache for your executables. But the
0:19:11.569,0:19:16.080
downside to this is that big time[br]difference also allows the attacker to
0:19:16.080,0:19:21.620
infer information about how the victim is[br]using the cache. We're able to use these
0:19:21.620,0:19:24.429
cache attacks in two different scenarios:[br]where memory is shared, in
0:19:24.429,0:19:28.230
the case of the Flush+Reload and[br]Flush+Flush attacks and in the case where
0:19:28.230,0:19:31.739
memory is not shared, in the case of the[br]Prime+Probe attack. And finally the
0:19:31.739,0:19:36.659
important thing to keep in mind is that,[br]for these cache attacks, we know where the
0:19:36.659,0:19:40.480
victim is looking, but we don't know what[br]they see. So we don't know the contents of
0:19:40.480,0:19:44.360
the memory that the victim is actually[br]seeing, we just know the location and the
0:19:44.360,0:19:51.549
addresses. So, what does an example trace[br]of these attacks look like? Well, there's
0:19:51.549,0:19:56.451
an easy way to represent these as two-[br]dimensional images. So in this image, we
0:19:56.451,0:20:01.760
have our horizontal axis as time, so each[br]column in this image represents a
0:20:01.760,0:20:07.159
different time slice, a different[br]iteration of the Prime, measure, and Probe cycle.
0:20:07.159,0:20:11.440
So, then we also have the vertical axis[br]which is the different set IDs, which is
0:20:11.440,0:20:18.360
the location that's accessed by the victim[br]process, and then here a pixel is white if
0:20:18.360,0:20:24.159
the victim accessed that set ID during[br]that time slice. So, as you look from left
0:20:24.159,0:20:28.139
to right as time moves forward, you can[br]sort of see the changes in the patterns of
0:20:28.139,0:20:34.070
the memory accesses made by the victim[br]process. Now, for this particular example
0:20:34.070,0:20:39.860
the trace is captured on an execution of[br]AES repeated several times, an AES
0:20:39.860,0:20:44.519
encryption repeated about 20 times. And[br]you can tell that this is a repeated
0:20:44.519,0:20:49.070
action because you see the same repeated[br]memory access patterns in the data, you
0:20:49.070,0:20:55.320
see the same structures repeated over and[br]over. So, you know that this is reflecting
0:20:55.320,0:21:00.749
at what's going on throughout time, but[br]what does it have to do with AES itself?
0:21:00.749,0:21:05.950
Well, if we take the same trace with the[br]same settings, but a different key, we see
0:21:05.950,0:21:11.590
that there is a different memory access[br]pattern with different repetition within
0:21:11.590,0:21:18.200
the trace. So, only the key changed, the[br]code didn't change. So, even though we're
0:21:18.200,0:21:22.130
not able to read the contents of the key[br]directly using this cache attack, we know
0:21:22.130,0:21:25.610
that the key is changing these memory[br]access patterns, and if we can see these
0:21:25.610,0:21:30.850
memory access patterns, then we can infer[br]the key. So, that's the essential idea: we
0:21:30.850,0:21:35.380
want to make these images as clear as[br]possible and as descriptive as possible so
0:21:35.380,0:21:42.279
we have the best chance of learning what[br]those secrets are. And we can define the
0:21:42.279,0:21:47.389
metrics for what makes these cache attacks[br]powerful in a few different ways. So, the
0:21:47.389,0:21:51.759
three ways we'll be looking at are spatial[br]resolution, temporal resolution and noise.
0:21:51.759,0:21:56.300
So, spatial resolution refers to how[br]accurately we can determine the where. If
0:21:56.300,0:22:00.510
we know the memory address the victim[br]accessed to within 1,000 bytes, that's
0:22:00.510,0:22:06.820
obviously not as powerful as knowing where[br]they accessed within 512 bytes. Temporal
0:22:06.820,0:22:12.049
resolution is similar, where we want to[br]know the order of what accesses the victim
0:22:12.049,0:22:17.769
made. So if that time slice during our[br]attack is 1 millisecond, we're going to
0:22:17.769,0:22:22.139
get much better ordering information on[br]those memory access than we would get if
0:22:22.139,0:22:27.350
we only saw all the memory accesses over[br]the course of one second. So the shorter
0:22:27.350,0:22:32.159
that time slice, the better the temporal[br]resolution, the longer our picture will be
0:22:32.159,0:22:37.790
on the horizontal axis, and the clearer[br]an image of the cache we'll see.
0:22:37.790,0:22:41.419
And the last metric to evaluate our[br]attacks on is noise and that reflects how
0:22:41.419,0:22:46.070
accurately our measurements reflect the[br]true state of the cache. So, right now
0:22:46.070,0:22:49.950
we've been using timing data to infer[br]whether an item was in the cache or
0:22:49.950,0:22:54.340
not, but this is a little bit noisy. It's[br]possible that we'll have false positives
0:22:54.340,0:22:57.370
or false negatives, so we want to keep[br]that in mind as we look at the different
0:22:57.370,0:23:03.081
attacks. So, that's essentially cache[br]attacks in a nutshell, and
0:23:03.081,0:23:06.519
that's all you really need to understand[br]in order to understand these attacks as
0:23:06.519,0:23:11.389
they've been implemented on Trusted[br]Execution Environments. And the first
0:23:11.389,0:23:14.510
particular attack that we're going to be[br]looking at is called a Controlled-Channel
0:23:14.510,0:23:19.890
Attack on SGX, and this attack isn't[br]necessarily a cache attack, but we can
0:23:19.890,0:23:23.770
analyze it in the same way that we analyze[br]the cache attacks. So, it's still useful
0:23:23.770,0:23:30.940
to look at. Now, if you remember how[br]memory management occurs with SGX, we know
0:23:30.940,0:23:36.210
that if a page fault occurs during SGX[br]Enclave code execution, that page fault is
0:23:36.210,0:23:43.019
handled by the kernel. So, the kernel has[br]to know which page the Enclave needs so it
0:23:43.019,0:23:48.050
can be paged in. The kernel already gets some[br]information about what the Enclave is
0:23:48.050,0:23:54.789
looking at. Now, in the Controlled-Channel[br]attack, what the attacker does
0:23:54.789,0:23:59.839
from the non-trusted OS is page almost[br]every page of the
0:23:59.839,0:24:05.260
Enclave out of memory. So no matter[br]which page the Enclave tries to
0:24:05.260,0:24:09.770
access, it's very likely to cause a page[br]fault, which will be redirected to the
0:24:09.770,0:24:14.150
non-trusted OS, where the non-trusted OS[br]can record it, page out any other pages
0:24:14.150,0:24:20.429
and continue execution. So, the OS[br]essentially gets a list of sequential page
0:24:20.429,0:24:26.259
accesses made by the SGX Enclaves, all by[br]capturing the page fault handler. This is
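That page-fault channel can be sketched as follows (hypothetical page numbers; the point is that the log records which page faulted, never the offset within it):

```python
# Toy controlled-channel trace: the untrusted OS keeps enclave pages
# unmapped and logs the page number of every fault (offsets stay hidden).
PAGE = 4096
fault_log = []
mapped = set()                          # no enclave pages are mapped yet

def enclave_access(addr):
    page = addr // PAGE
    if page not in mapped:
        fault_log.append(page)          # the OS handles the fault, logs it,
        mapped.clear()                  # pages everything else out again,
        mapped.add(page)                # and pages in only the faulting page

# Hypothetical secret-dependent page accesses inside the enclave:
for page in [2, 7, 2, 5]:
    enclave_access(page * PAGE + 123)   # offset 123 never reaches the log

print(fault_log)   # → [2, 7, 2, 5]: a page-granular trace of the accesses
```

Note that two back-to-back accesses to the same page would produce only a single fault, since that page must stay mapped in between.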
0:24:26.259,0:24:29.669
a very general attack, you don't need to[br]know what's going on in the Enclave in
0:24:29.669,0:24:33.460
order to pull this off. You just load up[br]an arbitrary Enclave and you're able to
0:24:33.460,0:24:40.720
see which pages that Enclave is trying to[br]access. So, how does it do on our metrics?
0:24:40.720,0:24:44.270
First of all, this spatial resolution is[br]not great. We can only see where the
0:24:44.270,0:24:50.470
victim is accessing within 4096 bytes or[br]the size of a full page because SGX
0:24:50.470,0:24:55.519
obscures the offset into the page where[br]the page fault occurs. The temporal
0:24:55.519,0:24:58.760
resolution is good but not great, because[br]even though we're able to see any
0:24:58.760,0:25:04.450
sequential accesses to different pages[br]we're not able to see sequential accesses
0:25:04.450,0:25:09.970
to the same page because we need to keep[br]that same page paged-in while we let our
0:25:09.970,0:25:15.490
SGX Enclave run for that small time slice.[br]So temporal resolution is good but not
0:25:15.490,0:25:22.440
perfect. But the noise is, there is no[br]noise in this attack because no matter
0:25:22.440,0:25:26.149
where the page fault occurs, the untrusted[br]operating system is going to capture that
0:25:26.149,0:25:30.180
page fault and is going to handle it. So,[br]it's very low noise, not great spatial
0:25:30.180,0:25:37.490
resolution but overall still a powerful[br]attack. But we still want to improve on
0:25:37.490,0:25:40.700
that spatial resolution, we want to be[br]able to see what the Enclave is doing at
a resolution finer than one page of[br]four kilobytes. So that's exactly what the
0:25:45.970,0:25:50.179
CacheZoom paper does, and instead of[br]interrupting the SGX Enclave execution
0:25:50.179,0:25:55.370
with page faults, it uses timer[br]interrupts. Because the untrusted
0:25:55.370,0:25:59.280
operating system is able to schedule when[br]timer interrupts occur, so it's able to
0:25:59.280,0:26:03.320
schedule them at very tight intervals, so[br]it's able to get that small and tight
0:26:03.320,0:26:08.549
temporal resolution. And essentially what[br]happens in between is this timer
0:26:08.549,0:26:13.410
interrupts fires, the untrusted operating[br]system runs the Prime+Probe attack code in
0:26:13.410,0:26:18.240
this case, and resumes execution of the[br]enclave process, and this repeats. So this
0:26:18.240,0:26:24.549
is a Prime+Probe attack on the L1 data[br]cache. So, this attack lets you see what
data the Enclave is looking at. Now, this[br]attack could be easily modified to use the
L1 instruction cache, so in that case you[br]learn which instructions the Enclave is
executing. And overall this is an even[br]more powerful attack than the Controlled-
Channel attack. If we look at the metrics,[br]we can see that the spatial resolution is
0:26:46.429,0:26:50.360
a lot better, now we're looking at spatial[br]resolution of 64 bytes or the size of an
0:26:50.360,0:26:55.370
individual line. The temporal resolution[br]is very good, it's "almost unlimited", to
0:26:55.370,0:27:00.250
quote the paper, because the untrusted[br]operating system has the privilege to keep
0:27:00.250,0:27:05.179
scheduling those timer interrupts closer[br]and closer together until it's able to
capture very small time slices of the[br]victim process. And the noise itself is
0:27:10.260,0:27:14.559
low, we're still using a cycle counter to[br]measure the time it takes to load memory
0:27:14.559,0:27:20.629
in and out of the cache, but it's[br]useful: the chances of having a false
0:27:20.629,0:27:26.809
positive or false negative are low, so the[br]noise is low as well. Now, we can also
0:27:26.809,0:27:31.129
look at TrustZone attacks, because so far[br]the attacks that we've looked at, the
passive attacks, have been against SGX and[br]those attacks on SGX have been pretty
powerful. So, what are the published[br]attacks on TrustZone? Well, there's one
called TruSpy, which is kind of similar in[br]concept to the CacheZoom attack that we
just looked at on SGX. It's once again a[br]Prime+Probe attack on the L1 data cache,
0:27:51.629,0:27:57.129
and the difference here is that instead of[br]interrupting the victim code execution
0:27:57.129,0:28:04.460
multiple times, the TruSpy attack does the[br]prime step, does the full AES encryption,
0:28:04.460,0:28:08.539
and then does the probe step. And the[br]reason they do this, is because as they
0:28:08.539,0:28:13.330
say, the secure world is protected, and is[br]not interruptible in the same way that SGX
0:28:13.330,0:28:20.690
is interruptible. But even despite this,[br]just having one measurement per execution,
0:28:20.690,0:28:24.940
the TruSpy authors were able to use some[br]statistics to still recover the AES key
0:28:24.940,0:28:30.460
from that noise. And their methods were so[br]powerful, they are able to do this from an
0:28:30.460,0:28:34.539
unprivileged application in userland, so[br]they don't even need to be running within
0:28:34.539,0:28:39.820
the kernel in order to be able to pull off[br]this attack. So, how does this attack
0:28:39.820,0:28:43.360
measure up? The spatial resolution is once[br]again 64 bytes because that's the size of
0:28:43.360,0:28:48.559
a cache line on this processor, and the[br]temporal resolution is pretty poor
0:28:48.559,0:28:54.190
here, because we only get one measurement[br]per execution of the AES encryption. This
0:28:54.190,0:28:58.700
is also a particularly noisy attack[br]because we're making the measurements from
0:28:58.700,0:29:02.659
userland, but even if we make the[br]measurements from the kernel, we're still
0:29:02.659,0:29:05.789
going to have the same issues of false[br]positives and false negatives associated
0:29:05.789,0:29:12.470
with using a cycle counter to measure[br]membership in a cache. So, we'd like to
0:29:12.470,0:29:16.389
improve this a little bit. We'd like to[br]improve the temporal resolution, so that
the power of the cache attack on TrustZone[br]is a little bit closer to what it is
on SGX. So, we want to improve that[br]temporal resolution. Let's dig into that
0:29:27.149,0:29:30.549
statement a little bit, that the secure[br]world is protected and not interruptible.
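The routing decision at the heart of this (spelled out in the Q&A at the end: the non-secure OS handles IRQs, the secure world handles FIQs) can be modeled as a toy dispatch function. The function and its names are illustrative, not a real ARM interface.

```python
# Toy model of how the monitor routes interrupts in the ARMv8 TrustZone setup
# described here. On ARMv8, IRQs are delivered to the non-secure OS and FIQs
# to the secure world; everything else about this sketch is invented.

def monitor_route(interrupt_type):
    """Where the monitor delivers an interrupt that arrives during execution."""
    if interrupt_type == "IRQ":
        # Linux in the non-secure world needs IRQs to function, so the monitor
        # forwards them there even if secure-world code was interrupted. This
        # forwarding is what hands control back to the attacker's kernel.
        return "non-secure OS"
    if interrupt_type == "FIQ":
        return "secure world"
    raise ValueError("unknown interrupt type")

print(monitor_route("IRQ"))  # the case the interrupt-driven attack relies on
```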
0:29:30.549,0:29:36.499
And to do, this we go back to this diagram[br]of ARMv8 and how that TrustZone is set up.
0:29:36.499,0:29:41.490
So, it is true that when an interrupt[br]occurs, it is directed to the monitor and,
0:29:41.490,0:29:45.530
because the monitor operates in the secure[br]world, if we interrupt secure code that's
0:29:45.530,0:29:49.081
running at exception level 0, we're just[br]going to end up running secure code at
0:29:49.081,0:29:54.239
exception level 3. So, this doesn't[br]necessarily get us anything. I think,
0:29:54.239,0:29:57.880
that's what the authors mean by saying[br]that it's protected against this. Just by
0:29:57.880,0:30:02.780
sending an interrupt, we don't have a[br]way to redirect our flow to the non-
0:30:02.780,0:30:08.190
trusted code. At least that's how it works[br]in theory. In practice, the Linux
0:30:08.190,0:30:11.840
operating system, running at exception[br]level 1 in the non-secure world, kind of
0:30:11.840,0:30:15.299
needs interrupts in order to be able to[br]work, so if an interrupt occurs and it's
0:30:15.299,0:30:18.120
being sent to the monitor, the monitor[br]will just forward it right to the non-
0:30:18.120,0:30:22.500
secure operating system. So, we have[br]interrupts just the same way as we did in
0:30:22.500,0:30:28.930
CacheZoom. And we can improve the[br]TrustZone attacks by using this idea: We
0:30:28.930,0:30:33.549
have 2 cores, where one core is running[br]the secure code, the other core is running
0:30:33.549,0:30:38.101
the non-secure code, and the non-secure[br]code is sending interrupts to the secure-
0:30:38.101,0:30:42.809
world core and that will give us that[br]interleaving of attacker process and
0:30:42.809,0:30:47.409
victim process that allows us to have a[br]powerful prime-and-probe attack. So, what
0:30:47.409,0:30:51.139
does this look like? We have the attack[br]core and the victim core. The attack core
0:30:51.139,0:30:54.909
sends an interrupt to the victim core.[br]This interrupt is captured by the monitor,
0:30:54.909,0:30:58.769
which passes it to the non-secure[br]operating system. The non-secure operating
0:30:58.769,0:31:02.979
system transfers this to our attack code,[br]which runs the prime-and-probe attack.
0:31:02.979,0:31:06.529
Then, we leave the interrupt handler, and[br]execution of the victim code in the
0:31:06.529,0:31:10.910
secure world resumes and we just repeat[br]this over and over. So, now we have that
0:31:10.910,0:31:16.690
interleaving of data... of the processes[br]of the attacker and the victim. So, now,
0:31:16.690,0:31:22.690
instead of having a temporal resolution of[br]one measurement per execution, we once
0:31:22.690,0:31:26.320
again have almost unlimited temporal[br]resolution, because we can just schedule
0:31:26.320,0:31:32.229
when we send those interrupts from the[br]attacker core. Now, we'd also like to
0:31:32.229,0:31:37.590
improve the noise measurements, because[br]if we can reduce the noise, we'll
0:31:37.590,0:31:42.159
get clearer pictures and we'll be able to[br]infer those secrets more clearly. So, we
0:31:42.159,0:31:45.720
can get some improvement by switching the[br]measurements from userland and starting to
0:31:45.720,0:31:50.830
do those in the kernel, but again we have[br]the cycle counters. So, what if, instead
0:31:50.830,0:31:54.330
of using the cycle counter to measure[br]whether or not something is in the cache,
0:31:54.330,0:32:00.070
we use the other performance counters?[br]Because on ARMv8 platforms, there is a way
0:32:00.070,0:32:03.769
to use performance counters to measure[br]different events, such as cache hits and
0:32:03.769,0:32:09.809
cache misses. So, these events and these[br]performance monitors require privileged
0:32:09.809,0:32:15.330
access in order to use, which, for this[br]attack, we do have. Now, in a typical
0:32:15.330,0:32:18.779
cache attack scenario we wouldn't have[br]access to these performance monitors,
0:32:18.779,0:32:22.259
which is why they haven't really been[br]explored before, but in this weird
0:32:22.259,0:32:25.250
scenario where we're attacking the less[br]privileged code from the more privileged
0:32:25.250,0:32:29.340
code, we do have access to these[br]performance monitors and we can use these
0:32:29.340,0:32:33.640
monitors during the probe step to get a[br]very accurate count of whether or not a
0:32:33.640,0:32:39.519
certain memory load caused a cache miss or[br]a cache hit. So, we're able to essentially
0:32:39.519,0:32:45.720
get rid of the different levels of noise.[br]Now, one thing to point out is that maybe
0:32:45.720,0:32:49.230
we'd like to use these ARMv8 performance[br]counters in order to count the different
0:32:49.230,0:32:53.729
events that are occurring in the secure[br]world code. So, maybe we start the
0:32:53.729,0:32:57.909
performance counters from the non-secure[br]world, let the secure world run and then,
0:32:57.909,0:33:01.669
when they secure world exits, we use the[br]non-secure world to read these performance
0:33:01.669,0:33:05.440
counters and maybe we'd like to see how[br]many instructions the secure world
0:33:05.440,0:33:09.019
executed or how many branch instructions[br]or how many arithmetic instructions or how
0:33:09.019,0:33:13.179
many cache misses there were. But[br]unfortunately, ARMv8 took this into
0:33:13.179,0:33:17.350
account and by default, performance[br]counters that are started in the non-
0:33:17.350,0:33:20.769
secure world will not measure events that[br]happen in the secure world, which is
0:33:20.769,0:33:24.570
smart; which is how it should be. And the[br]only reason I bring this up is because
0:33:24.570,0:33:29.320
that's not how it is in ARMv7. We could go[br]into a whole different talk with that,
0:33:29.320,0:33:33.909
just exploring the different implications[br]of what that means, but I want to focus on
0:33:33.909,0:33:39.230
ARMv8, because that's the newest of[br]the new. So, we'll keep looking at that.
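Before that, a small simulation of why the performance-counter probe is less noisy than the cycle-counter probe. All latencies and the threshold below are made up; the point is only that a timing threshold can misclassify a delayed hit, while an exact event counter cannot.

```python
# Sketch: why an exact event counter beats a timing threshold in the probe step.
# All numbers are invented for illustration; real hit/miss latencies vary.

THRESHOLD = 100  # cycles: classify a load as a miss if it took longer than this

def classify_by_time(latency_cycles):
    """Noisy approach: infer hit/miss from a cycle-counter measurement."""
    return "miss" if latency_cycles > THRESHOLD else "hit"

# Simulated probe measurements: (true outcome, measured latency in cycles).
# The 250-cycle "hit" stands in for a hit delayed by, say, an interrupt.
samples = [("hit", 40), ("miss", 210), ("hit", 250), ("miss", 190)]

timing_errors = sum(1 for truth, t in samples if classify_by_time(t) != truth)

# Counter approach: read a cache-refill event counter around the load.
# The counter increments only on a real miss, so classification is exact.
def classify_by_counter(refills_before, refills_after):
    return "miss" if refills_after > refills_before else "hit"

counter = 0
counter_errors = 0
for truth, _ in samples:
    before = counter
    if truth == "miss":  # model the hardware incrementing the event counter
        counter += 1
    if classify_by_counter(before, counter) != truth:
        counter_errors += 1

print(timing_errors, counter_errors)
```

The timing threshold misclassifies the delayed hit; the counter-based probe makes no errors, which matches the "virtually no noise" claim.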
0:33:39.230,0:33:42.540
So, we instrument the Prime+Probe attack[br]to use these performance counters, so we
0:33:42.540,0:33:46.509
can get a clear picture of what is and[br]what is not in the cache. And instead of
0:33:46.509,0:33:52.399
having noisy measurements based on time,[br]we have virtually no noise at all, because
0:33:52.399,0:33:55.919
we get the truth straight from the[br]processor itself, whether or not we
0:33:55.919,0:34:01.660
experience a cache miss. So, how do we[br]implement these attacks, where do we go
0:34:01.660,0:34:05.549
from here? We have all these ideas; we[br]have ways to make these TrustZone attacks
0:34:05.549,0:34:11.840
more powerful, but that's not worthwhile,[br]unless we actually implement them. So, the
0:34:11.840,0:34:16.510
goal here is to implement these attacks on[br]TrustZone and since typically the non-
0:34:16.510,0:34:20.960
secure world operating system is based on[br]Linux, we'll take that into account when
0:34:20.960,0:34:25.360
making our implementation. So, we'll write[br]a kernel module that uses these
0:34:25.360,0:34:29.340
performance counters and these inter-[br]processor interrupts, in order to actually
0:34:29.340,0:34:33.179
accomplish these attacks; and we'll write[br]it in such a way that it's very
0:34:33.179,0:34:37.300
generalizable. So you can take this kernel[br]module that was written for one device
-- in my case I focused most of my attention[br]on the Nexus 5X -- and it's very easy to
transfer this module to any other Linux-[br]based device that has a TrustZone and
0:34:46.739,0:34:52.139
these shared caches, so it should be very[br]easy to port this over and to perform
0:34:52.139,0:34:57.810
these same powerful cache attacks on[br]different platforms. We can also do clever
0:34:57.810,0:35:01.500
things based on the Linux operating[br]system, so that we limit that collection
0:35:01.500,0:35:05.500
window to just when we're executing within[br]the secure world, so we can align our
0:35:05.500,0:35:10.580
traces a lot more easily that way. And the[br]end result is having a synchronized trace
0:35:10.580,0:35:14.930
for each different attack, because, since[br]it's written in a modular way, we're able
0:35:14.930,0:35:19.440
to run different attacks simultaneously.[br]So, maybe we're running one prime-and-
0:35:19.440,0:35:23.050
probe attack on the L1 data cache, to[br]learn where the victim is accessing
0:35:23.050,0:35:27.050
memory, and we're simultaneously running[br]an attack on the L1 instruction cache, so
0:35:27.050,0:35:33.910
we can see what instructions the victim is[br]executing. And these can be aligned. So,
0:35:33.910,0:35:37.080
the tool that I've written is a[br]combination of a kernel module which
0:35:37.080,0:35:41.580
actually performs this attack, a userland[br]binary which schedules these processes to
0:35:41.580,0:35:45.860
different cores, and a GUI that will allow[br]you to interact with this kernel module
0:35:45.860,0:35:49.710
and rapidly start doing these cache[br]attacks for yourself and perform them
0:35:49.710,0:35:56.860
against different processes and secure[br]code and secure world code. So, the
0:35:56.860,0:36:02.820
intention behind this tool is to be very[br]generalizable to make it very easy to use
0:36:02.820,0:36:08.430
this platform for different devices and to[br]allow people a way to, once again, quickly
0:36:08.430,0:36:12.360
develop these attacks; and also to see if[br]their own code is vulnerable to these
0:36:12.360,0:36:18.490
cache attacks, to see if their code has[br]these secret-dependent memory accesses.
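As an example of the kind of secret-dependent memory access such a tool is meant to flag, consider a table lookup indexed by a secret byte. The table address and layout here are hypothetical, not taken from any real implementation; the point is that which cache line the load touches depends on the secret.

```python
# Illustration of a secret-dependent memory access: a lookup table indexed by
# a secret byte leaks the high bits of that byte through the cache line it
# touches. TABLE_BASE is a made-up address; entries are 1 byte each.

LINE_SIZE = 64        # bytes per cache line on the target
TABLE_BASE = 0x7000   # pretend (line-aligned) address of a 256-byte table

def touched_line(secret_byte):
    """Index of the cache line touched by table[secret_byte]."""
    return (TABLE_BASE + secret_byte) // LINE_SIZE

# Different secret bytes can land in different cache lines, so a Prime+Probe
# attacker distinguishes them; 64 entries share a line, so the low 6 bits of
# the index stay hidden at this spatial resolution.
print(touched_line(0x00), touched_line(0x80))    # different lines: leaks
print(touched_line(0x00) == touched_line(0x3F))  # same line: hidden bits
```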
0:36:18.490,0:36:25.349
So, can we get even better... spatial[br]resolution? Right now, we're down to 64
0:36:25.349,0:36:30.320
bytes and that's the size of a cache line,[br]which is the size of our shared hardware.
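The resolution numbers quoted in this talk can be put side by side: the controlled-channel attack reveals addresses to page granularity, L1 Prime+Probe to cache-line granularity, and the BTB attack on the Nexus 5X reportedly resolves to 16 bytes. A quick sketch, with an arbitrary address:

```python
# Compare the spatial resolution of the attacks discussed: each attack
# reveals an address only down to some granularity, hiding the low bits.
# The example address is arbitrary.

import math

GRANULARITY = {
    "controlled-channel (page)": 4096,
    "L1 Prime+Probe (cache line)": 64,
    "BTB Prime+Probe (as reported for the Nexus 5X)": 16,
}

addr = 0x12365  # some victim code or data address

for name, gran in GRANULARITY.items():
    visible = addr - addr % gran            # what the attacker can resolve
    hidden_bits = int(math.log2(gran))      # low address bits that stay hidden
    print(f"{name}: sees {hex(visible)}, hides the low {hidden_bits} bits")
```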
0:36:30.320,0:36:35.510
And on SGX, we actually can get better[br]than 64 bytes, based on something called a
0:36:35.510,0:36:39.160
branch-shadowing attack. So, a branch-[br]shadowing attack takes advantage of
0:36:39.160,0:36:42.730
something called the branch target buffer.[br]And the branch target buffer is a
0:36:42.730,0:36:48.490
structure that's used for branch[br]prediction. It's similar to a cache, but
0:36:48.490,0:36:51.740
there's a key difference where the branch[br]target buffer doesn't compare the full
0:36:51.740,0:36:54.770
address, when seeing if something is[br]already in the cache or not: It doesn't
0:36:54.770,0:36:59.701
compare all of the upper level bits. So,[br]that means that it's possible that two
0:36:59.701,0:37:04.140
different addresses will experience a[br]collision, and the same entry from that
0:37:04.140,0:37:08.870
BTB cache will be read out for an improper[br]address. Now, since this is just for
0:37:08.870,0:37:12.090
branch prediction, the worst that can[br]happen is, you'll get a misprediction and
0:37:12.090,0:37:18.070
a small time penalty, but that's about it.[br]The idea behind the branch-shadowing
0:37:18.070,0:37:22.440
attack is leveraging the small difference[br]in this overlapping and this collision of
0:37:22.440,0:37:28.540
addresses in order to sort of execute a[br]shared-code-style flush-and-reload attack
0:37:28.540,0:37:35.330
on the branch target buffer. So, here what[br]goes on is, during the attack the attacker
0:37:35.330,0:37:39.650
modifies the SGX Enclave to make sure that[br]the branches that are within the Enclave
0:37:39.650,0:37:44.340
will collide with branches that are not in[br]the Enclave. The attacker executes the
0:37:44.340,0:37:50.440
Enclave code and then the attacker[br]executes their own code and based on the
0:37:50.440,0:37:55.460
outcome of the victim code in that[br]cache, the attacker code may or may not
experience a branch misprediction. So, the[br]attacker is able to tell the outcome of a
0:37:59.210,0:38:03.310
branch, because of this overlap and this[br]collision, like there would be in a flush-and-
0:38:03.310,0:38:06.570
reload attack, where those memories[br]overlap between the attacker and the
0:38:06.570,0:38:14.020
victim. So here, our spatial resolution is[br]fantastic: We can tell down to individual
0:38:14.020,0:38:19.440
branch instructions in SGX; we can tell[br]exactly, which branches were executed and
0:38:19.440,0:38:25.010
which directions they were taken, in the[br]case of conditional branches. The temporal
0:38:25.010,0:38:29.720
resolution is also, once again, almost[br]unlimited, because we can use the same
0:38:29.720,0:38:33.880
timer interrupts in order to schedule our[br]process, our attacker process. And the
0:38:33.880,0:38:39.120
noise is, once again, very low, because we[br]can, once again, use the same sort of
0:38:39.120,0:38:43.980
branch misprediction counters, that exist[br]in the Intel world, in order to measure
0:38:43.980,0:38:51.510
this noise. So, does any of that[br]apply to the TrustZone attacks? Well, in
0:38:51.510,0:38:55.040
this case the victim and attacker don't[br]share entries in the branch target buffer,
0:38:55.040,0:39:01.610
because the attacker is not able to map[br]the virtual address of the victim process.
0:39:01.610,0:39:05.340
But this is kind of reminiscent of our[br]earlier cache attacks, so our flush-and-
0:39:05.340,0:39:10.100
reload attack only worked when the attacker[br]and the victim shared that memory, but we
0:39:10.100,0:39:13.930
still have the prime-and-probe attack for[br]when they don't. So, what if we use a
0:39:13.930,0:39:21.380
prime-and-probe-style attack on the branch[br]target buffer cache in ARM processors? So,
0:39:21.380,0:39:25.320
essentially what we do here is, we prime[br]the branch target buffer by executing many
0:39:25.320,0:39:29.531
attacker branches to sort of fill up this[br]BTB cache with the attacker branch
0:39:29.531,0:39:34.770
prediction data; we let the victim execute[br]a branch which will evict an attacker BTB
0:39:34.770,0:39:39.120
entry; and then we have the attacker re-[br]execute those branches and see if there
0:39:39.120,0:39:45.120
have been any mispredictions. So now, the[br]cool thing about this attack is, the
0:39:45.120,0:39:50.320
structure of the BTB cache is different[br]from that of the L1 caches. So, instead of
0:39:50.320,0:39:59.750
having 256 different sets in the L1 cache,[br]the BTB cache has 2048 different sets, so
0:39:59.750,0:40:06.380
we can tell which branch is taken, based[br]on which one of 2048 different set IDs
0:40:06.380,0:40:11.230
that it could fall into. And even more[br]than that, on the ARM platform, at least
0:40:11.230,0:40:15.730
on the Nexus 5x that I was working with,[br]the granularity is no longer 64 bytes,
0:40:15.730,0:40:21.830
which is the size of the line, it's now 16[br]bytes. So, we can see which branches the
0:40:21.830,0:40:27.620
trusted code within TrustZone is[br]executing within 16 bytes. So, what does
0:40:27.620,0:40:31.820
this look like? So, previously with the[br]TruSpy attack, this is sort of the
0:40:31.820,0:40:37.410
outcome of our prime-and-probe attack: We[br]get 1 measurement for those 256 different
0:40:37.410,0:40:43.420
set IDs. When we added those interrupts,[br]we're able to get that time resolution,
0:40:43.420,0:40:48.090
and it looks something like this. Now,[br]maybe you can see a little bit at the top
0:40:48.090,0:40:52.660
of the screen, how there's these repeated[br]sections of little white blocks, and you
0:40:52.660,0:40:56.720
can sort of use that to infer, maybe[br]there's the same cache line and cache
0:40:56.720,0:41:00.870
instructions that are called over and[br]over. So, just looking at this L1-I cache
0:41:00.870,0:41:06.920
attack, you can tell some information[br]about how the process went. Now, let's
0:41:06.920,0:41:11.870
compare that to the BTB attack. And I[br]don't know if you can see too clearly --
0:41:11.870,0:41:17.190
it's a bit too high a resolution[br]right now -- so let's just focus in on one
0:41:17.190,0:41:22.580
small part of this overall trace. And this[br]is what it looks like. So, each of those
0:41:22.580,0:41:27.720
white pixels represents a branch that was[br]taken by that secure-world code and we can
0:41:27.720,0:41:31.070
see repeated patterns, we can see maybe[br]different functions that were called, we
0:41:31.070,0:41:35.310
can see different loops. And just by[br]looking at this 1 trace, we can infer a
0:41:35.310,0:41:40.110
lot of information on how that secure[br]world executed. So, it's incredibly
0:41:40.110,0:41:44.230
powerful and all of those secrets are just[br]waiting to be uncovered using these new
0:41:44.230,0:41:52.890
tools. So, where do we go from here? What[br]sort of countermeasures do we have? Well,
0:41:52.890,0:41:56.690
first of all I think, the long term[br]solution is going to be moving to no more
0:41:56.690,0:42:00.200
shared hardware. We need to have separate[br]hardware and no more shared caches in
0:42:00.200,0:42:05.750
order to fully get rid of these different[br]cache attacks. And we've already seen this
0:42:05.750,0:42:11.420
trend in different cell phones. So, for[br]example, in Apple SoCs for a long time now
0:42:11.420,0:42:15.521
-- I think since the Apple A7 -- the[br]secure Enclave, which runs the secure
0:42:15.521,0:42:21.000
code, has its own cache. So, these cache[br]attacks can't be accomplished from code
0:42:21.000,0:42:27.400
outside of that secure Enclave. So, just[br]by using that separate hardware, it knocks
0:42:27.400,0:42:30.970
out a whole class of different potential[br]side-channel and microarchitectural
0:42:30.970,0:42:35.610
attacks. And just recently, the Pixel 2 is[br]moving in the same direction. The Pixel 2
0:42:35.610,0:42:40.540
now includes a hardware security module[br]that includes cryptographic operations;
0:42:40.540,0:42:45.890
and that chip also has its own memory and[br]its own caches, so now we can no longer
0:42:45.890,0:42:51.270
use this attack to extract information[br]about what's going on in this external
0:42:51.270,0:42:56.530
hardware security module. But even then,[br]using this separate hardware, that doesn't
0:42:56.530,0:43:00.800
solve all of our problems. Because we[br]still have the question of "What do we
0:43:00.800,0:43:05.900
include in this separate hardware?" On the[br]one hand, we want to include more code in
0:43:05.900,0:43:11.370
that separate hardware, so we're less[br]vulnerable to these side-channel attacks,
0:43:11.370,0:43:16.490
but on the other hand, we don't want to[br]expand the attack surface anymore. Because
0:43:16.490,0:43:19.060
the more code we include in these secure[br]environments, the more likely that a
vulnerability will be found and the[br]attacker will be able to get a foothold
0:43:22.600,0:43:26.470
within the secure, trusted environment.[br]So, there's going to be a balance between
0:43:26.470,0:43:30.270
what do you choose to include in the[br]separate hardware and what you don't. So,
0:43:30.270,0:43:35.220
do you include DRM code? Do you include[br]cryptographic code? It's still an open
0:43:35.220,0:43:41.800
question. And that's sort of the long-term[br]approach. In the short term, you just kind
0:43:41.800,0:43:46.370
of have to write side-channel-free[br]software: Just be very careful about what
0:43:46.370,0:43:50.811
your process does, if there are any[br]secret-dependent memory accesses or
secret-dependent branching or secret-[br]dependent function calls, because any of
0:43:55.310,0:44:00.010
those can leak the secrets out of your[br]trusted execution environment. So, here
0:44:00.010,0:44:03.460
are the things that, if you are a[br]developer of trusted execution environment
0:44:03.460,0:44:08.150
code, that I want you to keep in mind:[br]First of all, performance is very often at
0:44:08.150,0:44:13.130
odds with security. We've seen over and[br]over that the performance enhancements to
0:44:13.130,0:44:18.880
these processors open up the ability for[br]these microarchitectural attacks to be
0:44:18.880,0:44:23.750
more efficient. Additionally, these[br]trusted execution environments don't
0:44:23.750,0:44:27.160
protect against everything; there are[br]still these side-channel attacks and these
0:44:27.160,0:44:32.310
microarchitectural attacks that these[br]systems are vulnerable to. These attacks
0:44:32.310,0:44:37.650
are very powerful; they can be[br]accomplished simply; and with the
0:44:37.650,0:44:41.770
publication of the code that I've written,[br]it should be very simple to get set up and
0:44:41.770,0:44:46.070
to analyze your own code to see "Am I[br]vulnerable, do I expose information in the
0:44:46.070,0:44:52.760
same way?" And lastly, it only takes 1[br]small error, 1 tiny leak from your trusted
0:44:52.760,0:44:56.670
and secure code, in order to extract the[br]entire secret, in order to bring the whole
0:44:56.670,0:45:03.920
thing down. So, what I want to leave you[br]with is: I want you to remember that you
0:45:03.920,0:45:08.520
are responsible for making sure that your[br]program is not vulnerable to these
0:45:08.520,0:45:13.110
microarchitectural attacks, because if you[br]do not take responsibility for this, who
0:45:13.110,0:45:16.645
will? Thank you!
0:45:16.645,0:45:25.040
Applause
0:45:25.040,0:45:29.821
Herald: Thank you very much. Please, if[br]you want to leave the hall, please do it
0:45:29.821,0:45:35.000
quiet and take all your belongings with[br]you and respect the speaker. We have
0:45:35.000,0:45:43.230
plenty of time, 16, 17 minutes for Q&A, so[br]please line up on the microphones. No
0:45:43.230,0:45:50.650
questions from the signal angel, all[br]right. So, we can start with microphone 6,
0:45:50.650,0:45:54.770
please.[br]Mic 6: Okay. There was a symbol of a secure
OS in the ARM TrustZone diagram. What is the idea of[br]them if the non-secure OS gets all the
interrupts? What is[br]the secure OS for?
0:46:04.210,0:46:08.880
Keegan: Yeah so, in the ARMv8 there are a[br]couple different kinds of interrupts. So,
0:46:08.880,0:46:11.760
I think -- if I'm remembering the[br]terminology correctly -- there is an IRQ
0:46:11.760,0:46:16.800
and an FIQ interrupt. So, the non-secure[br]mode handles the IRQ interrupts and the
0:46:16.800,0:46:20.440
secure mode handles the FIQ interrupts.[br]So, depending on which one you send, the
monitor will direct that[br]interrupt one way or the other.
0:46:29.640,0:46:32.010
Mic 6: Thank you.[br]Herald: Okay, thank you. Microphone number
0:46:32.010,0:46:37.930
7, please.[br]Mic 7: Does any of your present attacks on
0:46:37.930,0:46:45.290
TrustZone also apply to the AMD[br]implementation of TrustZone or are you
0:46:45.290,0:46:48.380
looking into it?[br]Keegan: I haven't looked into AMD too
0:46:48.380,0:46:54.011
much, because, as far as I can tell,[br]that's not used as commonly, but there are
0:46:54.011,0:46:57.490
many different types of trusted execution[br]environments. The two that I focused on were
0:46:57.490,0:47:04.760
SGX and TrustZone, because those are the[br]most common examples that I've seen.
0:47:04.760,0:47:09.250
Herald: Thank you. Microphone[br]number 8, please.
0:47:09.250,0:47:20.370
Mic 8: When TrustZone is moved to[br]dedicated hardware, dedicated memory,
0:47:20.370,0:47:27.780
couldn't you replicate the userspace[br]attacks by loading your own trusted
0:47:27.780,0:47:32.210
userspace app and use it as an[br]oracle of some sorts?
0:47:32.210,0:47:35.760
Keegan: If you can load your own trust[br]code, then yes, you could do that. But in
0:47:35.760,0:47:39.650
many of the models I've seen today, that's[br]not possible. So, that's why you have
0:47:39.650,0:47:44.250
things like code signing, which prevent[br]the arbitrary user from running their own
0:47:44.250,0:47:50.310
code in the trusted OS... or in the the[br]trusted environment.
0:47:50.310,0:47:55.010
Herald: All right. Microphone number 1.[br]Mic 1: So, these attacks are more powerful
0:47:55.010,0:48:00.720
against code that's running in... just the[br]execution environments than similar
0:48:00.720,0:48:07.100
attacks would be against ring-3 code, or,[br]in general, trusted code. Does that mean
0:48:07.100,0:48:10.910
that trusted execution environments are[br]basically an attractive nuisance that we
0:48:10.910,0:48:15.080
shouldn't use?[br]Keegan: There's still a large benefit to
0:48:15.080,0:48:17.600
using these trusted execution[br]environments. The point I want to get
0:48:17.600,0:48:21.390
across is that, although they add a lot of[br]features, they don't protect against
0:48:21.390,0:48:25.450
everything, so you should keep in mind[br]that these side-channel attacks do still
0:48:25.450,0:48:28.820
exist and you still need to protect[br]against them. But overall, these are
0:48:28.820,0:48:35.930
good things and worthwhile to include.[br]Herald: Thank you. Microphone number 1
0:48:35.930,0:48:41.580
again, please[br]Mic 1: So, AMD is doing something with
0:48:41.580,0:48:47.780
encrypting memory and I'm not sure if they[br]encrypt addresses, too, but would that
0:48:47.780,0:48:53.090
be a defense against such attacks?[br]Keegan: So, I'm not too familiar with AMD,
0:48:53.090,0:48:57.690
but SGX also encrypts memory. It encrypts[br]it in between the lowest-level cache and
0:48:57.690,0:49:02.170
the main memory. But that doesn't really[br]have an impact on the actual operation,
0:49:02.170,0:49:06.220
because the memory is encrypted at the cache[br]line level and as the attacker, we don't
0:49:06.220,0:49:10.380
care what that data is within that cache[br]line, we only care which cache line is
0:49:10.380,0:49:16.150
being accessed.[br]Mic 1: If you encrypt addresses, wouldn't
0:49:16.150,0:49:20.551
that help against that?[br]Keegan: I'm not sure, how you would
0:49:20.551,0:49:25.070
encrypt the addresses yourself. As long as[br]those addresses map into the same set IDs
0:49:25.070,0:49:30.200
that the victim can map into, then the[br]victim could still pull off the same style
0:49:30.200,0:49:35.030
of attacks.[br]Herald: Great. We have a question from the
0:49:35.030,0:49:38.200
internet, please.[br]Signal Angel: The question is "Does the
0:49:38.200,0:49:42.410
secure enclave on the Samsung Exynos[br]distinguish the receiver of the message, so
0:49:42.410,0:49:46.830
that if the user application asked to[br]decode an AES message, can one sniff on
0:49:46.830,0:49:52.220
the value that the secure[br]enclave returns?"
0:49:52.220,0:49:56.680
Keegan: So, that sounds like it's asking[br]about the TruSpy-style attack, where
0:49:56.680,0:50:01.270
it's calling to the secure world to[br]encrypt something with AES. I think, that
0:50:01.270,0:50:04.830
would all depend on the particular[br]implementation: As long as it's encrypting
0:50:04.830,0:50:09.790
for a certain key and it's able to do that[br]repeatedly, then the attack,
0:50:09.790,0:50:16.290
assuming a vulnerable AES implementation,[br]would be able to extract that key out.
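The leak Keegan describes can be made concrete with a toy sketch of the classic first-round T-table attack: the attacker never sees the table data, only which 64-byte cache line of the table each encryption touches, and that alone reveals the top bits of each key byte. This is a simulation, not code for a real TEE; the table geometry and the secret key byte below are illustrative assumptions.

```python
import random

LINE_SIZE = 64          # bytes per cache line
ENTRY_SIZE = 4          # each T-table entry is a 32-bit word
ENTRIES_PER_LINE = LINE_SIZE // ENTRY_SIZE  # 16 entries per line

def leaked_line(plaintext_byte, key_byte):
    """First-round AES T-table index is p XOR k; the side channel
    only reveals which cache line of the table that index falls in."""
    index = plaintext_byte ^ key_byte
    return index // ENTRIES_PER_LINE   # exposes the top 4 bits of p ^ k

secret_key_byte = 0x3C   # hypothetical unknown key byte
candidates = set(range(256))

# Attacker: observe (plaintext, touched line) pairs over repeated
# encryptions under the fixed key, and keep only consistent guesses.
for _ in range(50):
    p = random.randrange(256)
    line = leaked_line(p, secret_key_byte)   # side-channel observation
    candidates = {k for k in candidates if leaked_line(p, k) == line}

# Cache-line granularity pins down the top nibble of the key byte:
# 16 candidates remain, all of the form 0x30..0x3F.
print(sorted(hex(k) for k in candidates))
```

Because XOR acts bitwise, observing the line index of `p ^ k` for known plaintexts determines the key byte's upper nibble exactly; the remaining bits need further rounds or finer-grained observations, which is what the papers Keegan cites work out in detail.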
0:50:16.290,0:50:20.750
Herald: Cool. Microphone number 2, please.[br]Mic 2: Do you recommend a reference to
0:50:20.750,0:50:25.350
understand how these cache line attacks[br]and branch oracles actually lead to key
0:50:25.350,0:50:29.540
recovery?[br]Keegan: Yeah. So, I will flip through
0:50:29.540,0:50:33.620
these pages which include a lot of the[br]references for the attacks that I've
0:50:33.620,0:50:38.030
mentioned, so if you're watching the[br]video, you can see these right away or
0:50:38.030,0:50:43.200
just access the slides. And a lot of these[br]contain good starting points. So, I didn't
0:50:43.200,0:50:46.340
go into a lot of the details on how, for[br]example, the TruSpy attack recovered
0:50:46.340,0:50:53.090
that AES key, but that paper does have a[br]lot of good links on how those leaks can
0:50:53.090,0:50:56.350
lead to key recovery. Same thing with the[br]CLKSCREW attack, how the different fault
0:50:56.350,0:51:03.070
injection can lead to key recovery.[br]Herald: Microphone number 6, please.
0:51:03.070,0:51:07.900
Mic 6: I think my question might have been[br]almost the same thing: How hard is
0:51:07.900,0:51:11.920
it actually to recover the keys? Is this[br]like a massive machine learning problem or
0:51:11.920,0:51:18.500
is this something that you can do[br]practically on a single machine?
0:51:18.500,0:51:21.640
Keegan: It varies entirely by the end[br]implementation. So, for all these attacks to
0:51:21.640,0:51:25.750
work, you need to have some sort of[br]vulnerable implementation and some
0:51:25.750,0:51:29.010
implementations leak more data than[br]others. In the case of a lot of the AES
0:51:29.010,0:51:33.880
attacks, where you're doing the passive[br]attacks, those are very easy to do on just
0:51:33.880,0:51:37.630
your own computer. For the AES fault[br]injection attack, I think that one
0:51:37.630,0:51:42.340
required more brute force, in the CLKSCREW[br]paper, so that one required more computing
0:51:42.340,0:51:49.780
resources, but still, it was entirely[br]practical to do in a realistic setting.
0:51:49.780,0:51:53.770
Herald: Cool, thank you. So, we have one[br]more: Microphone number 1, please.
0:51:53.770,0:51:59.080
Mic 1: So, I hope it's not a too naive[br]question, but I was wondering, since all
0:51:59.080,0:52:04.730
these attacks are based on cache hits and[br]misses, isn't it possible to forcibly
0:52:04.730,0:52:11.280
flush or invalidate or insert noise in the[br]cache after each operation in this trusted
0:52:11.280,0:52:23.520
environment, in order to mess up the[br]guesswork of the attacker? So, discarding
0:52:23.520,0:52:29.180
optimization and performance for[br]additional security benefits.
0:52:29.180,0:52:32.420
Keegan: Yeah, and that is absolutely[br]possible and you are absolutely right: It
0:52:32.420,0:52:36.300
does lead to a performance degradation,[br]because if you always flush the entire
0:52:36.300,0:52:41.190
cache every time you do a context switch,[br]that will be a huge performance hit. So
0:52:41.190,0:52:45.190
again, that comes down to the question of[br]the performance and security trade-off:
0:52:45.190,0:52:49.540
Which one do you end up going with? And it[br]seems historically the choice has been
0:52:49.540,0:52:54.000
more in the direction of performance.[br]Mic 1: Thank you.
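The trade-off in this exchange can be illustrated with a toy direct-mapped cache model: a prime+probe attacker learns which set the victim touched, and flushing the cache on every context switch destroys that signal at the cost of refilling the whole cache afterwards. The cache size and addresses here are made-up illustrative values, not real hardware parameters.

```python
NUM_SETS = 8  # toy direct-mapped cache: one line per set

class ToyCache:
    def __init__(self):
        self.sets = [None] * NUM_SETS          # stored tag per set
    def access(self, addr):
        """Return True on a hit; install the line on a miss."""
        s, tag = addr % NUM_SETS, addr // NUM_SETS
        hit = self.sets[s] == tag
        self.sets[s] = tag
        return hit
    def flush(self):
        self.sets = [None] * NUM_SETS

def probe(cache, attacker_addrs):
    """Prime+probe: re-read the attacker's lines, note which sets miss."""
    return [s for s, a in enumerate(attacker_addrs) if not cache.access(a)]

cache = ToyCache()
attacker = list(range(NUM_SETS))               # one attacker address per set
secret_set = 5                                 # the set the victim touches

# Without flushing: the victim's access evicts exactly one attacker line.
probe(cache, attacker)                         # prime the cache
cache.access(NUM_SETS + secret_set)            # victim runs
leak = probe(cache, attacker)
print(leak)                                    # -> [5]: the secret set leaks

# With a flush on every context switch: every probe misses, no signal.
probe(cache, attacker)                         # prime again
cache.access(NUM_SETS + secret_set)            # victim runs
cache.flush()                                  # kernel flushes before returning
no_leak = probe(cache, attacker)
print(no_leak)                                 # -> all sets miss uniformly
```

The uniform miss pattern after the flush carries no information, which is exactly the security gain; the cost is that every legitimate line must be refetched from memory after each switch, which is the performance hit Keegan mentions.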
0:52:54.000,0:52:56.920
Herald: But we have one more: Microphone[br]number 1, please.
0:52:56.920,0:53:01.500
Mic 1: So, I have more of a moral[br]question: So, how well should we really
0:53:01.500,0:53:07.720
protect from attacks which need some[br]ring-0 cooperation? Because, basically,
0:53:07.720,0:53:14.350
when we use TrustZone for a purpose we[br]would see as clear, like protecting the
0:53:14.350,0:53:20.250
browser from interacting with the outside[br]world, then we are basically using the
0:53:20.250,0:53:27.280
secure execution environment for sandboxing[br]the process. But once we need some
0:53:27.280,0:53:32.281
cooperation from the kernel, some of these[br]attacks, in fact, empower the user
0:53:32.281,0:53:36.320
instead of the hardware producer.[br]Keegan: Yeah, and you're right. It
0:53:36.320,0:53:39.210
depends entirely on what your application[br]is and what your threat model is that
0:53:39.210,0:53:43.020
you're looking at. So, if you're using[br]these trusted execution environments to do
0:53:43.020,0:53:48.430
DRM, for example, then maybe you would[br]be worried about that ring-0 attack or
0:53:48.430,0:53:51.620
that privileged attacker who has their[br]phone rooted and is trying to recover
0:53:51.620,0:53:56.740
these media encryption keys from this[br]execution environment. But maybe there are
0:53:56.740,0:54:01.230
other scenarios where you're not as[br]worried about having an attack with a
0:54:01.230,0:54:05.580
compromised ring 0. So, it entirely[br]depends on context.
0:54:05.580,0:54:09.000
Herald: Alright, thank you. So, we have[br]one more: Microphone number 1, again.
0:54:09.000,0:54:10.990
Mic 1: Hey there. Great talk, thank you[br]very much.
0:54:10.990,0:54:13.040
Keegan: Thank you.[br]Mic 1: Just a short question: Do you have
0:54:13.040,0:54:16.980
any success stories about attacking the[br]TrustZone and the different
0:54:16.980,0:54:24.010
implementations of TEEs by some vendors,[br]like some OEMs creating phones and stuff?
0:54:24.010,0:54:29.750
Keegan: Not that I'm announcing[br]at this time.
0:54:29.750,0:54:35.584
Herald: So, thank you very much. Please,[br]again a warm round of applause for Keegan!
0:54:35.584,0:54:39.998
Applause
0:54:39.998,0:54:45.489
34c3 postroll music
0:54:45.489,0:55:02.000
subtitles created by c3subtitles.de[br]in the year 2018. Join, and help us!