WEBVTT
00:00:00.000 --> 00:00:15.030
34C3 preroll music
00:00:15.030 --> 00:00:22.570
Herald: Hello fellow creatures.
Welcome and
00:00:22.570 --> 00:00:30.140
I wanna start with a question.
Another one: Who do we trust?
00:00:30.140 --> 00:00:36.500
Do we trust the TrustZones
on our smartphones?
00:00:36.500 --> 00:00:41.710
Well Keegan Ryan, he's really
fortunate to be here and
00:00:41.710 --> 00:00:51.710
he was inspired by another talk from the
CCC before - I think it was 29C3 and his
00:00:51.710 --> 00:00:57.550
research on smartphones and systems on a
chip used in smartphones will answer
00:00:57.550 --> 00:01:02.520
these questions - whether you can trust those
trusted execution environments. Please
00:01:02.520 --> 00:01:06.330
give a warm round of applause
to Keegan and enjoy!
00:01:06.330 --> 00:01:10.630
Applause
00:01:10.630 --> 00:01:14.220
Keegan Ryan: All right, thank you! So I'm
Keegan Ryan, I'm a consultant with NCC
00:01:14.220 --> 00:01:19.740
Group and this is microarchitectural
attacks on Trusted Execution Environments.
00:01:19.740 --> 00:01:23.250
So, in order to understand what a Trusted
Execution Environment is we need to go
00:01:23.250 --> 00:01:29.729
back into processor security, specifically
on x86. So as many of you are probably
00:01:29.729 --> 00:01:33.729
aware there are a couple different modes
which we can execute code under in x86
00:01:33.729 --> 00:01:39.290
processors and that includes ring 3, which
is the user code and the applications, and
00:01:39.290 --> 00:01:45.570
also ring 0 which is the kernel code. Now
there's also a ring 1 and ring 2 that are
00:01:45.570 --> 00:01:50.229
supposedly used for drivers or guest
operating systems but really it just boils
00:01:50.229 --> 00:01:56.159
down to ring 0 and ring 3. And in this
diagram we have here we see that privilege
00:01:56.159 --> 00:02:02.149
increases as we go up the diagram, so ring
0 is the most privileged ring and ring 3
00:02:02.149 --> 00:02:05.470
is the least privileged ring. So all of
our secrets, all of our sensitive
00:02:05.470 --> 00:02:10.030
information, all of the attacker's goals
are in ring 0 and the attacker is trying
00:02:10.030 --> 00:02:15.890
to access those from the unprivileged
world of ring 3. Now you may have a
00:02:15.890 --> 00:02:20.150
question what if I want to add a processor
feature that I don't want ring 0 to be
00:02:20.150 --> 00:02:26.240
able to access? Well then you add ring -1
which is often used for a hypervisor. Now
00:02:26.240 --> 00:02:30.610
the hypervisor has all the secrets and the
hypervisor can manage different guest
00:02:30.610 --> 00:02:35.680
operating systems and each of these guest
operating systems can execute in ring 0
00:02:35.680 --> 00:02:41.300
without having any idea of the other
operating systems. So this way now the
00:02:41.300 --> 00:02:45.230
secrets are all in ring -1 so now the
attacker's goals have shifted from ring 0
00:02:45.230 --> 00:02:50.760
to ring -1. The attacker has to attack
ring -1 from a less privileged ring and
00:02:50.760 --> 00:02:55.430
try to access those secrets. But what if
you want to add a processor feature that
00:02:55.430 --> 00:03:00.560
you don't want ring -1 to be able to
access? So you add ring -2 which is System
00:03:00.560 --> 00:03:05.230
Management Mode and that's capable of
monitoring power, directly interfacing
00:03:05.230 --> 00:03:10.350
with firmware and other chips on a
motherboard and it's able to access and do
00:03:10.350 --> 00:03:13.820
a lot of things that the hypervisor is not
able to and now all of your secrets and
00:03:13.820 --> 00:03:17.880
all of your attacker's goals are in ring -2
and the attacker has to attack those from
00:03:17.880 --> 00:03:22.400
a less privileged ring. Now maybe you want
to add something to your processor that
00:03:22.400 --> 00:03:26.900
you don't want ring -2 to be able to access,
so you add ring -3 and I think you get the
00:03:26.900 --> 00:03:31.450
picture now. And we just keep on adding
more and more privilege rings and keep
00:03:31.450 --> 00:03:35.260
putting our secrets and our attacker's
goals in these higher and higher
00:03:35.260 --> 00:03:41.260
privileged rings but what if we're
thinking about it wrong? What if instead
00:03:41.260 --> 00:03:46.710
we want to put all the secrets in the
least privileged ring? So this is sort of
00:03:46.710 --> 00:03:51.490
the idea behind SGX and it's useful for
things like DRM where you want to run
00:03:51.490 --> 00:03:56.980
ring 3 code but have sensitive secrets or
other signing capabilities running in
00:03:56.980 --> 00:04:02.050
ring 3. But this picture is getting a
little bit complicated, this diagram is a
00:04:02.050 --> 00:04:06.250
little bit complex so let's simplify it a
little bit. We'll only be looking at ring
00:04:06.250 --> 00:04:12.100
0 through ring 3 which is the kernel, the
userland and the SGX enclave which also
00:04:12.100 --> 00:04:16.910
executes in ring 3. Now when you're
executing code in the SGX enclave you
00:04:16.910 --> 00:04:22.170
first load the code into the enclave and
then from that point on you trust the
00:04:22.170 --> 00:04:26.980
execution of whatever's going on in that
enclave. You trust that the other elements
00:04:26.980 --> 00:04:31.640
the kernel, the userland, the other rings
are not going to be able to access what's
00:04:31.640 --> 00:04:38.020
in that enclave so you've made your
Trusted Execution Environment. This is a
00:04:38.020 --> 00:04:44.750
bit of a weird model because now your
attacker is in the ring 0 kernel and your
00:04:44.750 --> 00:04:48.840
target victim here is in ring 3. So
instead of the attacker trying to move up
00:04:48.840 --> 00:04:54.070
the privilege chain, the attacker is
trying to move down. Which is pretty
00:04:54.070 --> 00:04:57.820
strange and you might have some questions
like "under this model who handles memory
00:04:57.820 --> 00:05:01.470
management?" because traditionally that's
something that ring 0 would manage and
00:05:01.470 --> 00:05:05.290
ring 0 would be responsible for paging
memory in and out for different processes
00:05:05.290 --> 00:05:10.460
and different code that's executing in
ring 3. But on the other hand you don't
00:05:10.460 --> 00:05:16.030
want that to happen with the SGX enclave
because what if the malicious ring 0 adds
00:05:16.030 --> 00:05:22.410
a page to the enclave that the enclave
doesn't expect? So in order to solve this
00:05:22.410 --> 00:05:28.950
problem, SGX does allow ring 0 to handle
page faults. But simultaneously and in
00:05:28.950 --> 00:05:35.380
parallel it verifies every memory load to
make sure that no access violations are
00:05:35.380 --> 00:05:40.139
made so that all the SGX memory is safe.
So it allows ring 0 to do its job but it
00:05:40.139 --> 00:05:45.010
sort of watches over at the same time to
make sure that nothing is messed up. So
00:05:45.010 --> 00:05:51.120
it's a bit of a weird convoluted solution
to a strange inverted problem but it works
00:05:51.120 --> 00:05:57.580
and that's essentially how SGX works and
the idea behind SGX. Now we can look at
00:05:57.580 --> 00:06:02.530
x86 and we can see that ARMv8 is
constructed in a similar way but it
00:06:02.530 --> 00:06:08.450
improves on x86 in a couple key ways. So
first of all ARMv8 gets rid of ring 1 and
00:06:08.450 --> 00:06:12.170
ring 2 so you don't have to worry about
those and it just has different privilege
00:06:12.170 --> 00:06:17.370
levels for userland and the kernel. And
these different privilege levels are
00:06:17.370 --> 00:06:21.520
called exception levels in the ARM
terminology. And the second thing that ARM
00:06:21.520 --> 00:06:25.930
gets right compared to x86 is that instead
of starting at 3 and counting down as
00:06:25.930 --> 00:06:30.730
privilege goes up, ARM starts at 0 and
counts up so we don't have to worry about
00:06:30.730 --> 00:06:35.940
negative numbers anymore. Now when we add
the next privilege level the hypervisor we
00:06:35.940 --> 00:06:40.860
call it exception level 2 and the next one
after that is the monitor in exception
00:06:40.860 --> 00:06:47.210
level 3. So at this point we still want to
have the ability to run trusted code in
00:06:47.210 --> 00:06:52.650
exception level 0 the least privileged
level of the ARMv8 processor. So in order
00:06:52.650 --> 00:06:59.060
to support this we need to separate this
diagram into two different sections. In
00:06:59.060 --> 00:07:03.510
ARMv8 these are called the secure world
and the non-secure world. So we have the
00:07:03.510 --> 00:07:07.740
non-secure world on the left in blue that
consists of the userland, the kernel and
00:07:07.740 --> 00:07:11.900
the hypervisor and we have the secure
world on the right which consists of the
00:07:11.900 --> 00:07:17.360
monitor in exception level 3, a trusted
operating system in exception level 1 and
00:07:17.360 --> 00:07:23.030
trusted applications in exception level 0.
So the idea is that if you run anything in
00:07:23.030 --> 00:07:27.360
the secure world, it should not be
accessible or modifiable by anything in
00:07:27.360 --> 00:07:32.320
the non secure world. So that's how our
attacker is trying to access it. The
00:07:32.320 --> 00:07:36.371
attacker has access to the non secure
kernel, which is often Linux, and they're
00:07:36.371 --> 00:07:40.120
trying to go after the trusted apps. So
once again we have this weird inversion
00:07:40.120 --> 00:07:43.330
where we're trying to go from a more
privileged level to a less privileged
00:07:43.330 --> 00:07:48.260
level and trying to extract secrets in
that way. So the question that arises when
00:07:48.260 --> 00:07:53.070
using these Trusted Execution Environments
that are implemented in SGX and TrustZone
00:07:53.070 --> 00:07:58.330
in ARM is "can we use these privilege
modes and our privileged access in order to
00:07:58.330 --> 00:08:03.330
attack these Trusted Execution
Environments?". Now, to answer that question,
00:08:03.330 --> 00:08:06.260
we can start looking at a few
different research papers. The first one
00:08:06.260 --> 00:08:11.360
that I want to go into is one called
CLKSCREW and it's an attack on TrustZone.
00:08:11.360 --> 00:08:14.360
So throughout this presentation I'm going
to go through a few different papers and
00:08:14.360 --> 00:08:18.050
just to make it clear which papers have
already been published and which ones are
00:08:18.050 --> 00:08:21.400
old I'll include the citations in the
upper right hand corner so that way you
00:08:21.400 --> 00:08:26.580
can tell what's old and what's new. And as
far as papers go this CLKSCREW paper is
00:08:26.580 --> 00:08:31.430
relatively new. It was released in 2017.
And the way CLKSCREW works is it takes
00:08:31.430 --> 00:08:38.009
advantage of the energy management
features of a processor. So a non-secure
00:08:38.009 --> 00:08:41.679
operating system has the ability to manage
the energy consumption of the different
00:08:41.679 --> 00:08:47.970
cores. So if a certain target core doesn't
have much scheduled to do then the
00:08:47.970 --> 00:08:52.350
operating system is able to scale back
that voltage or dial down the frequency on
00:08:52.350 --> 00:08:56.449
that core so that core uses less energy
which is a great thing for performance: it
00:08:56.449 --> 00:09:00.971
really extends battery life, it makes
the cores last longer and it gives better
00:09:00.971 --> 00:09:07.009
performance overall. But the problem here
is what if you have two separate cores and
00:09:07.009 --> 00:09:11.740
one of your cores is running this non-
trusted operating system and the other
00:09:11.740 --> 00:09:15.579
core is running code in the secure world?
It's running that trusted code, those
00:09:15.579 --> 00:09:21.240
trusted applications, so that non-secure
operating system can still dial down that
00:09:21.240 --> 00:09:25.629
voltage and it can still change that
frequency and those changes will affect
00:09:25.629 --> 00:09:30.740
the secure world code. So what the
CLKSCREW attack does is the non secure
00:09:30.740 --> 00:09:36.470
operating system core will dial down the
voltage, it will overclock the frequency
00:09:36.470 --> 00:09:40.749
on the target secure world core in order
to induce faults and make the
00:09:40.749 --> 00:09:45.909
computation on that core fail in some way
and when that computation fails you get
00:09:45.909 --> 00:09:50.439
certain cryptographic errors that the
attacker can use to infer things like secret
00:09:50.439 --> 00:09:56.040
keys, secret AES keys and to bypass code
signing implemented in the secure world.
00:09:56.040 --> 00:09:59.680
So it's a very powerful attack that's made
possible because the non-secure operating
00:09:59.680 --> 00:10:06.099
system is privileged enough in order to
use these energy management features. Now
00:10:06.099 --> 00:10:10.189
CLKSCREW is an example of an active attack
where the attacker is actively changing
00:10:10.189 --> 00:10:15.470
the outcome of the victim code of that
code in the secure world. But what about
00:10:15.470 --> 00:10:20.540
passive attacks? So in a passive attack,
the attacker does not modify the actual
00:10:20.540 --> 00:10:25.220
outcome of the process. The attacker just
tries to monitor that process and infer what's
00:10:25.220 --> 00:10:29.200
going on and that is the sort of attack
that we'll be considering for the rest of
00:10:29.200 --> 00:10:35.769
the presentation. So in a lot of SGX and
TrustZone implementations, the trusted and
00:10:35.769 --> 00:10:39.759
the non-trusted code both share the same
hardware and this shared hardware could be
00:10:39.759 --> 00:10:45.800
a shared cache, it could be a branch
predictor, it could be a TLB. The point is
00:10:45.800 --> 00:10:53.230
that they share the same hardware so that
the changes made by the secure code may be
00:10:53.230 --> 00:10:57.209
reflected in the behavior of the non-
secure code. So the trusted code might
00:10:57.209 --> 00:11:02.259
execute, change the state of that shared
cache for example and then the untrusted
00:11:02.259 --> 00:11:07.179
code may be able to go in, see the changes
in that cache and infer information about
00:11:07.179 --> 00:11:11.720
the behavior of the secure code. So that's
essentially how our side channel attacks
00:11:11.720 --> 00:11:16.160
are going to work: the non-secure code
is going to monitor these shared hardware
00:11:16.160 --> 00:11:23.050
resources for state changes that reflect
the behavior of the secure code. Now we've
00:11:23.050 --> 00:11:27.899
already talked about how Intel and SGX address
the problem of memory management and who's
00:11:27.899 --> 00:11:33.399
responsible for making sure that those
attacks don't work on SGX. So what do they
00:11:33.399 --> 00:11:37.050
have to say on how they protect against
these side channel attacks and attacks on
00:11:37.050 --> 00:11:45.490
this shared cache hardware? They don't...
at all. They essentially say "we do not
00:11:45.490 --> 00:11:48.931
consider this part of our threat model. It
is up to the developer to implement the
00:11:48.931 --> 00:11:53.530
protections needed to protect against
these side-channel attacks". Which is
00:11:53.530 --> 00:11:56.769
great news for us because these side
channel attacks can be very powerful and
00:11:56.769 --> 00:12:00.350
if there aren't any hardware features that
are necessarily stopping us from being
00:12:00.350 --> 00:12:06.910
able to accomplish our goal it makes us
that much more likely to succeed. So with that
00:12:06.910 --> 00:12:11.430
we can sort of take a step back from
TrustZone and SGX and just take a look at
00:12:11.430 --> 00:12:14.959
cache attacks to make sure that we all
have the same understanding of how the
00:12:14.959 --> 00:12:19.549
cache attacks will be applied to these
Trusted Execution Environments. To start
00:12:19.549 --> 00:12:25.619
that let's go over a brief recap of how a
cache works. So caches are necessary in
00:12:25.619 --> 00:12:29.949
processors because accessing the main
memory is slow. When you try to access
00:12:29.949 --> 00:12:34.079
something from the main memory it takes a
while to be read into the processor. So the
00:12:34.079 --> 00:12:40.389
cache exists as sort of a layer to
remember what that information is so if
00:12:40.389 --> 00:12:45.040
the process ever needs information from
that same address it just reloads it from
00:12:45.040 --> 00:12:49.699
the cache and that access is going to be
fast. So it really speeds up the memory
00:12:49.699 --> 00:12:55.810
access for repeated accesses to the same
address. And then if we try to access a
00:12:55.810 --> 00:13:00.069
different address then that will also be
read into the cache, slowly at first but
00:13:00.069 --> 00:13:06.720
then quickly for repeated accesses and so
on and so forth. Now as you can probably
00:13:06.720 --> 00:13:10.970
tell from all of these examples the memory
blocks have been moving horizontally
00:13:10.970 --> 00:13:15.649
they've always been staying in the same
row. And that is reflective of the idea of
00:13:15.649 --> 00:13:20.360
sets in a cache. So there are a number of
different set IDs and that corresponds to
00:13:20.360 --> 00:13:24.189
the different rows in this diagram. So for
our example there are four different set
00:13:24.189 --> 00:13:30.889
IDs and each address in the main memory
maps to a particular set ID. So that
00:13:30.889 --> 00:13:35.100
address in main memory will only go into
that location in the cache with the same
00:13:35.100 --> 00:13:39.730
set ID so it will only travel along those
rows. So that means if you have two
00:13:39.730 --> 00:13:43.410
different blocks of memory that map to
different set IDs they're not going to
00:13:43.410 --> 00:13:48.899
interfere with each other in the cache.
But that raises the question "what about
00:13:48.899 --> 00:13:53.310
two memory blocks that do map to the same
set ID?". Well if there's room in the
00:13:53.310 --> 00:13:58.759
cache then the same thing will happen as
before: those memory contents will be
00:13:58.759 --> 00:14:03.769
loaded into the cache and then retrieved
from the cache for future accesses. And
00:14:03.769 --> 00:14:08.110
the number of possible entries for a
particular set ID within a cache is called
00:14:08.110 --> 00:14:11.800
the associativity. And on this diagram
that's represented by the number of
00:14:11.800 --> 00:14:16.819
columns in the cache. So we will call our
cache in this example a 2-way set-
00:14:16.819 --> 00:14:22.350
associative cache. Now the next question
is "what happens if you try to read a
00:14:22.350 --> 00:14:27.049
memory address that maps to the same set ID
but all of those entries within that set ID
00:14:27.049 --> 00:14:32.529
within the cache are full?". Well one of
those entries is chosen, it's evicted from
00:14:32.529 --> 00:14:38.729
the cache, the new memory is read in and
then that's fed to the processor. So it
00:14:38.729 --> 00:14:43.779
doesn't really matter how the cache entry
that you're evicting is chosen; for the
00:14:43.779 --> 00:14:47.960
purpose of the presentation you can just
assume that it's random. But the important
00:14:47.960 --> 00:14:51.899
thing is that if you try to access that
same memory that was evicted before you're
00:14:51.899 --> 00:14:55.689
now going to have to wait for that time
penalty for that to be reloaded into the
00:14:55.689 --> 00:15:01.329
cache and read into the processor.
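As a minimal sketch of that mapping (the line size and set count here are made up to match the four-row example, not any real device), an address picks its row like this:

#include <stdint.h>

/* Illustrative parameters only, chosen to match the talk's
 * 4-row diagram; a real L1 cache has many more sets. */
#define LINE_SIZE 64u   /* bytes per cache line      */
#define NUM_SETS   4u   /* rows in the example cache */

/* Drop the offset inside the line, then keep the low bits as the
 * set ID: addresses with the same set ID compete for the same row. */
static inline unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}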
00:15:01.329 --> 00:15:05.749
So those are caches in a nutshell, in
particular set-associative caches. With
that, we can begin looking at the
different types of cache
00:15:05.749 --> 00:15:09.319
attacks. So for a cache attack we have two
different processes: an attacker
00:15:09.319 --> 00:15:13.779
process and a victim process. For this
type of attack that we're considering both
00:15:13.779 --> 00:15:17.290
of them share the same underlying code so
they're trying to access the same
00:15:17.290 --> 00:15:21.829
resources which could be the case if you
have page deduplication in virtual
00:15:21.829 --> 00:15:26.009
machines or if you have copy-on-write
mechanisms for shared code and shared
00:15:26.009 --> 00:15:31.649
libraries. But the point is that they
share the same underlying memory. Now the
00:15:31.649 --> 00:15:35.659
Flush+Reload attack works in two
stages for the attacker. The attacker
00:15:35.659 --> 00:15:39.420
first starts by flushing out the cache.
They flush each and every address in the
00:15:39.420 --> 00:15:44.309
cache so the cache is just empty. Then the
attacker lets the victim execute for a
00:15:44.309 --> 00:15:48.769
small amount of time so the victim might
read in an address from main memory
00:15:48.769 --> 00:15:53.489
loading that into the cache and then the
second stage of the attack is the reload
00:15:53.489 --> 00:15:58.099
phase. In the reload phase the attacker
tries to load different memory addresses
00:15:58.099 --> 00:16:04.171
from main memory and see if those entries
are in the cache or not. Here the attacker
00:16:04.171 --> 00:16:09.380
will first try to load address 0 and see
that because it takes a long time to read
00:16:09.380 --> 00:16:14.429
the contents of address 0 the attacker can
infer that address 0 was not part of the
00:16:14.429 --> 00:16:17.499
cache which makes sense because the
attacker flushed it from the cache in the
00:16:17.499 --> 00:16:23.330
first stage. The attacker then tries to
read the memory at address 1 and sees that
00:16:23.330 --> 00:16:29.089
this operation is fast so the attacker
infers that the contents of address 1 are
00:16:29.089 --> 00:16:32.859
in the cache and because the attacker
flushed everything from the cache before
00:16:32.859 --> 00:16:37.119
the victim executed, the attacker then
concludes that the victim is responsible
00:16:37.119 --> 00:16:42.540
for bringing address 1 into the cache.
This Flush+Reload attack reveals which
00:16:42.540 --> 00:16:47.370
memory addresses the victim accesses
during that small slice of time. Then
00:16:47.370 --> 00:16:50.970
after that reload phase, the attack
repeats so the attacker flushes again
00:16:50.970 --> 00:16:57.739
lets the victim execute, reloads again,
and so on.
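A minimal sketch of those two steps on x86 (assuming an address shared with the victim; clflush and the cycle counter are the usual primitives, and the 100-cycle threshold is illustrative and must be calibrated per machine):

#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_lfence, __rdtsc */

/* Flush step: push the shared address out of every cache level. */
static void flush(const void *addr)
{
    _mm_clflush(addr);
}

/* Reload step: time one load. A fast load means the line came from
 * the cache, so the victim must have touched it since our flush. */
static int was_accessed(const volatile uint8_t *addr)
{
    uint64_t start, end;
    _mm_lfence();
    start = __rdtsc();
    (void)*addr;                  /* the probed load            */
    _mm_lfence();
    end = __rdtsc();
    return (end - start) < 100;   /* illustrative hit threshold */
}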
00:16:57.739 --> 00:17:01.050
There's also a variant on the Flush+Reload
attack, called the Flush+Flush attack, which I'm not going to
00:17:01.050 --> 00:17:05.569
go into the details of, but essentially
it's the same idea. But instead of using
00:17:05.569 --> 00:17:08.980
load instructions to determine whether or
not a piece of memory is in the cache,
00:17:08.980 --> 00:17:13.720
it uses flush instructions, because
flush instructions will take longer if
00:17:13.720 --> 00:17:19.138
something is in the cache already. The
important thing is that both the
00:17:19.138 --> 00:17:22.819
Flush+Reload attack and the Flush+Flush
attack rely on the attacker and the victim
00:17:22.819 --> 00:17:27.029
sharing the same memory. But this isn't
always the case so we need to consider
00:17:27.029 --> 00:17:30.810
what happens when the attacker and the
victim do not share memory. For this we
00:17:30.810 --> 00:17:35.670
have the Prime+Probe attack. The
Prime+Probe attack once again works in two
00:17:35.670 --> 00:17:40.380
separate stages. In the first stage the
attacker primes the cache by reading all
00:17:40.380 --> 00:17:44.401
the attacker memory into the cache and
then the attacker lets the victim execute
00:17:44.401 --> 00:17:49.750
for a small amount of time. So no matter
what the victim accesses from main memory
00:17:49.750 --> 00:17:54.460
since the cache is full of the attacker
data, one of those attacker entries will
00:17:54.460 --> 00:17:59.190
be replaced by a victim entry. Then in the
second phase of the attack, during the
00:17:59.190 --> 00:18:03.529
probe phase, the attacker checks the
different cache entries for particular set
00:18:03.529 --> 00:18:08.959
IDs and sees if all of the attacker
entries are still in the cache. So maybe
00:18:08.959 --> 00:18:13.440
our attacker is curious about the last set
ID, the bottom row, so the attacker first
00:18:13.440 --> 00:18:18.090
tries to load the memory at address 3 and
because this operation is fast the
00:18:18.090 --> 00:18:23.000
attacker knows that address 3 is in the
cache. The attacker tries the same thing
00:18:23.000 --> 00:18:28.159
with address 7, sees that this operation
is slow and infers that at some point
00:18:28.159 --> 00:18:33.279
address 7 was evicted from the cache so
the attacker knows that something had to
00:18:33.279 --> 00:18:37.490
have evicted it from the cache and it had to be
from the victim so the attacker concludes
00:18:37.490 --> 00:18:42.840
that the victim accessed something in that
last set ID and that bottom row. The
00:18:42.840 --> 00:18:47.230
attacker doesn't know if it was the
contents of address 11 or the contents of
00:18:47.230 --> 00:18:51.260
address 15 or even what those contents
are, but the attacker has a good idea of
00:18:51.260 --> 00:18:57.090
which set ID it was.
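A rough sketch of priming and probing one set, again on x86 for concreteness (evset is assumed to hold attacker-owned addresses that all map to the target set ID; the threshold is illustrative):

#include <stdint.h>
#include <x86intrin.h>

#define WAYS 2   /* associativity in the talk's 2-way example */

/* Prime: load our own lines so the whole set holds attacker data. */
static void prime(volatile uint8_t *evset[WAYS])
{
    for (int i = 0; i < WAYS; i++)
        (void)*evset[i];
}

/* Probe, after the victim ran: reload our lines and time them. A slow
 * reload means one was evicted, so the victim touched this set ID. */
static int probe(volatile uint8_t *evset[WAYS])
{
    int evicted = 0;
    for (int i = 0; i < WAYS; i++) {
        uint64_t t0 = __rdtsc();
        (void)*evset[i];
        _mm_lfence();
        if (__rdtsc() - t0 > 100)   /* illustrative miss threshold */
            evicted = 1;
    }
    return evicted;
}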
00:18:57.090 --> 00:19:01.179
So, the important things to remember
about cache attacks are that caches are
very important, they're crucial for performance
00:19:01.179 --> 00:19:06.059
on processors, they give a huge speed
boost and there's a huge time difference
00:19:06.059 --> 00:19:11.569
between having a cache and not having a
cache for your executables. But the
00:19:11.569 --> 00:19:16.080
downside to this is that big time
difference also allows the attacker to
00:19:16.080 --> 00:19:21.620
infer information about how the victim is
using the cache. We're able to use these
00:19:21.620 --> 00:19:24.429
cache attacks in two different
scenarios: where memory is shared, in
00:19:24.429 --> 00:19:28.230
the case of the Flush+Reload and
Flush+Flush attacks and in the case where
00:19:28.230 --> 00:19:31.739
memory is not shared, in the case of the
Prime+Probe attack. And finally the
00:19:31.739 --> 00:19:36.659
important thing to keep in mind is that,
for these cache attacks, we know where the
00:19:36.659 --> 00:19:40.480
victim is looking, but we don't know what
they see. So we don't know the contents of
00:19:40.480 --> 00:19:44.360
the memory that the victim is actually
seeing, we just know the location and the
00:19:44.360 --> 00:19:51.549
addresses. So, what does an example trace
of these attacks look like? Well, there's
00:19:51.549 --> 00:19:56.451
an easy way to represent these as two-
dimensional images. So in this image, we
00:19:56.451 --> 00:20:01.760
have our horizontal axis as time, so each
column in this image represents a
00:20:01.760 --> 00:20:07.159
different time slice, a different
iteration of the prime, measure, and probe cycle.
00:20:07.159 --> 00:20:11.440
So, then we also have the vertical axis
which is the different set IDs, which is
00:20:11.440 --> 00:20:18.360
the location that's accessed by the victim
process, and then here a pixel is white if
00:20:18.360 --> 00:20:24.159
the victim accessed that set ID during
that time slice. So, as you look from left
00:20:24.159 --> 00:20:28.139
to right as time moves forward, you can
sort of see the changes in the patterns of
00:20:28.139 --> 00:20:34.070
the memory accesses made by the victim
process. Now, for this particular example
00:20:34.070 --> 00:20:39.860
the trace is captured on an execution of
AES repeated several times, an AES
00:20:39.860 --> 00:20:44.519
encryption repeated about 20 times. And
you can tell that this is a repeated
00:20:44.519 --> 00:20:49.070
action because you see the same repeated
memory access patterns in the data, you
00:20:49.070 --> 00:20:55.320
see the same structures repeated over and
over. So, you know that this is reflecting
00:20:55.320 --> 00:21:00.749
what's going on throughout time, but
what does it have to do with AES itself?
00:21:00.749 --> 00:21:05.950
Well, if we take the same trace with the
same settings, but a different key, we see
00:21:05.950 --> 00:21:11.590
that there is a different memory access
pattern with different repetition within
00:21:11.590 --> 00:21:18.200
the trace. So, only the key changed, the
code didn't change. So, even though we're
00:21:18.200 --> 00:21:22.130
not able to read the contents of the key
directly using this cache attack, we know
00:21:22.130 --> 00:21:25.610
that the key is changing these memory
access patterns, and if we can see these
00:21:25.610 --> 00:21:30.850
memory access patterns, then we can infer
the key.
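To see why the picture depends on the key, this is the shape of a first-round lookup in a table-based AES (Te0 and the names here are illustrative, not the victim's actual code):

#include <stdint.h>

/* The table index mixes a plaintext byte with a key byte, so which
 * 64-byte line of Te0 gets loaded -- and therefore which set ID
 * lights up in the trace -- is a function of the secret key byte. */
extern const uint32_t Te0[256];     /* 1 KiB table, 16 cache lines */

uint32_t first_round_lookup(uint8_t pt0, uint8_t key0)
{
    return Te0[pt0 ^ key0];         /* secret-dependent memory access */
}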
00:21:30.850 --> 00:21:35.380
So, that's the essential idea: we want to
make these images as clear as possible
and as descriptive as possible so
00:21:35.380 --> 00:21:42.279
we have the best chance of learning what
those secrets are. And we can define the
00:21:42.279 --> 00:21:47.389
metrics for what makes these cache attacks
powerful in a few different ways. So, the
00:21:47.389 --> 00:21:51.759
three ways we'll be looking at are spatial
resolution, temporal resolution and noise.
00:21:51.759 --> 00:21:56.300
So, spatial resolution refers to how
accurately we can determine the where. If
00:21:56.300 --> 00:22:00.510
we know that the victim accessed a memory
address to within 1,000 bytes, that's
00:22:00.510 --> 00:22:06.820
obviously not as powerful as knowing where
they accessed within 512 bytes. Temporal
00:22:06.820 --> 00:22:12.049
resolution is similar, where we want to
know the order of what accesses the victim
00:22:12.049 --> 00:22:17.769
made. So if that time slice during our
attack is 1 millisecond, we're going to
00:22:17.769 --> 00:22:22.139
get much better ordering information on
those memory accesses than we would get if
00:22:22.139 --> 00:22:27.350
we only saw all the memory accesses over
the course of one second. So the shorter
00:22:27.350 --> 00:22:32.159
that time slice, the better the temporal
resolution, the longer our picture will be
00:22:32.159 --> 00:22:37.790
on the horizontal axis, and the clearer
an image of the cache we'll see.
00:22:37.790 --> 00:22:41.419
And the last metric to evaluate our
attacks on is noise and that reflects how
00:22:41.419 --> 00:22:46.070
accurately our measurements reflect the
true state of the cache. So, right now
00:22:46.070 --> 00:22:49.950
we've been using timing data to infer
whether or not an item was in the cache,
00:22:49.950 --> 00:22:54.340
but this is a little bit noisy. It's
possible that we'll have false positives
00:22:54.340 --> 00:22:57.370
or false negatives, so we want to keep
that in mind as we look at the different
00:22:57.370 --> 00:23:03.081
attacks. So, that's essentially cache
attacks in a nutshell, and
00:23:03.081 --> 00:23:06.519
that's all you really need to understand
in order to understand these attacks as
00:23:06.519 --> 00:23:11.389
they've been implemented on Trusted
Execution Environments. And the first
00:23:11.389 --> 00:23:14.510
particular attack that we're going to be
looking at is called a Controlled-Channel
00:23:14.510 --> 00:23:19.890
Attack on SGX, and this attack isn't
necessarily a cache attack, but we can
00:23:19.890 --> 00:23:23.770
analyze it in the same way that we analyze
the cache attacks. So, it's still useful
00:23:23.770 --> 00:23:30.940
to look at. Now, if you remember how
memory management occurs with SGX, we know
00:23:30.940 --> 00:23:36.210
that if a page fault occurs during SGX
Enclave code execution, that page fault is
00:23:36.210 --> 00:23:43.019
handled by the kernel. So, the kernel has
to know which page the Enclave needs to be
00:23:43.019 --> 00:23:48.050
paged in. The kernel already gets some
information about what the Enclave is
00:23:48.050 --> 00:23:54.789
looking at. Now, in the Controlled-Channel
attack, what the attacker does
00:23:54.789 --> 00:23:59.839
from the non-trusted OS is that the attacker
pages almost every other page from the
00:23:59.839 --> 00:24:05.260
Enclave out of memory. So no matter
what page that Enclave tries to
00:24:05.260 --> 00:24:09.770
access, it's very likely to cause a page
fault, which will be redirected to the
00:24:09.770 --> 00:24:14.150
non-trusted OS, where the non-trusted OS
can record it, page out any other pages
00:24:14.150 --> 00:24:20.429
and continue execution. So, the OS
essentially gets a list of sequential page
00:24:20.429 --> 00:24:26.259
accesses made by the SGX Enclave, all by
capturing the page fault handler. This is
00:24:26.259 --> 00:24:29.669
a very general attack, you don't need to
know what's going on in the Enclave in
00:24:29.669 --> 00:24:33.460
order to pull this off. You just load up
an arbitrary Enclave and you're able to
00:24:33.460 --> 00:24:40.720
see which pages that Enclave is trying to
access.
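A userspace analogy of that page-fault channel (a sketch only, not the SGX mechanism itself): protect a buffer with mprotect and record, in a SIGSEGV handler, which page each access lands on.

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define PAGE  4096
#define PAGES 8

static uint8_t *buf;                          /* stands in for victim memory */
static volatile sig_atomic_t touched[PAGES];  /* which pages faulted          */

/* Fault handler: note the page, map it back in, let the access retry --
 * the same record-and-continue loop the untrusted OS plays with the
 * enclave's page faults. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(PAGE - 1);
    (void)sig; (void)ctx;
    touched[(page - (uintptr_t)buf) / PAGE] = 1;
    mprotect((void *)page, PAGE, PROT_READ | PROT_WRITE);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    buf = mmap(NULL, PAGES * PAGE, PROT_NONE,     /* "page everything out" */
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    buf[3 * PAGE] = 1;                            /* the "victim" accesses   */
    buf[6 * PAGE] = 2;                            /* are seen per page only  */

    for (int i = 0; i < PAGES; i++)
        printf("page %d: %s\n", i, touched[i] ? "accessed" : "-");
    return 0;
}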
00:24:40.720 --> 00:24:44.270
So, how does it do on our metrics?
First of all, the spatial resolution is
not great. We can only see where the
00:24:44.270 --> 00:24:50.470
victim is accessing within 4096 bytes or
the size of a full page because SGX
00:24:50.470 --> 00:24:55.519
obscures the offset into the page where
the page fault occurs. The temporal
00:24:55.519 --> 00:24:58.760
resolution is good but not great, because
even though we're able to see any
00:24:58.760 --> 00:25:04.450
sequential accesses to different pages
we're not able to see sequential accesses
00:25:04.450 --> 00:25:09.970
to the same page because we need to keep
that same page paged-in while we let our
00:25:09.970 --> 00:25:15.490
SGX Enclave run for that small time slice.
So temporal resolution is good but not
00:25:15.490 --> 00:25:22.440
perfect. As for the noise, there is no
noise in this attack because no matter
00:25:22.440 --> 00:25:26.149
where the page fault occurs, the untrusted
operating system is going to capture that
00:25:26.149 --> 00:25:30.180
page fault and is going to handle it. So,
it's very low noise, not great spatial
00:25:30.180 --> 00:25:37.490
resolution but overall still a powerful
attack. But we still want to improve on
00:25:37.490 --> 00:25:40.700
that spatial resolution, we want to be
able to see what the Enclave is doing at
00:25:40.700 --> 00:25:45.970
a resolution finer than one page of
four kilobytes. So that's exactly what the
00:25:45.970 --> 00:25:50.179
CacheZoom paper does, and instead of
interrupting the SGX Enclave execution
00:25:50.179 --> 00:25:55.370
with page faults, it uses timer
interrupts, because the untrusted
00:25:55.370 --> 00:25:59.280
operating system is able to schedule when
timer interrupts occur, so it's able to
00:25:59.280 --> 00:26:03.320
schedule them at very tight intervals, so
it's able to get that small and tight
00:26:03.320 --> 00:26:08.549
temporal resolution. And essentially what
happens in between is this: the timer
00:26:08.549 --> 00:26:13.410
interrupt fires, the untrusted operating
system runs the Prime+Probe attack code in
00:26:13.410 --> 00:26:18.240
this case, and resumes execution of the
enclave process, and this repeats. So this
00:26:18.240 --> 00:26:24.549
is a Prime+Probe attack on the L1 data
cache. So, this attack lets you see what
00:26:24.549 --> 00:26:30.529
data the Enclave is looking at. Now, this
attack could be easily modified to use the
00:26:30.529 --> 00:26:36.000
L1 instruction cache, so in that case you
learn which instructions the Enclave is
00:26:36.000 --> 00:26:41.419
executing. And overall this is an even
more powerful attack than the Controlled-
00:26:41.419 --> 00:26:46.429
Channel attack. If we look at the metrics,
we can see that the spatial resolution is
00:26:46.429 --> 00:26:50.360
a lot better, now we're looking at spatial
resolution of 64 bytes or the size of an
00:26:50.360 --> 00:26:55.370
individual line. The temporal resolution
is very good, it's "almost unlimited", to
00:26:55.370 --> 00:27:00.250
quote the paper, because the untrusted
operating system has the privilege to keep
00:27:00.250 --> 00:27:05.179
scheduling those timer interrupts closer
and closer together until it's able to
00:27:05.179 --> 00:27:10.260
capture very small time slices of the
victim process. And the noise itself is
00:27:10.260 --> 00:27:14.559
low, we're still using a cycle counter to
measure the time it takes to load memory
00:27:14.559 --> 00:27:20.629
in and out of the cache, but it's
useful: the chances of having a false
00:27:20.629 --> 00:27:26.809
positive or false negative are low, so the
noise is low as well. Now, we can also
00:27:26.809 --> 00:27:31.129
look at TrustZone attacks, because so far
the attacks that we've looked at, the
00:27:31.129 --> 00:27:35.130
passive attacks, have been against SGX and
those attacks on SGX have been pretty
00:27:35.130 --> 00:27:40.669
powerful. So, what are the published
attacks on TrustZone? Well, there's one
00:27:40.669 --> 00:27:44.990
called TruSpy, which is kind of similar in
concept to the CacheZoom attack that we
00:27:44.990 --> 00:27:51.629
just looked at on SGX. It's once again a
Prime+Probe attack on the L1 data cache,
00:27:51.629 --> 00:27:57.129
and the difference here is that instead of
interrupting the victim code execution
00:27:57.129 --> 00:28:04.460
multiple times, the TruSpy attack does the
prime step, does the full AES encryption,
00:28:04.460 --> 00:28:08.539
and then does the probe step. And the
reason they do this, is because as they
00:28:08.539 --> 00:28:13.330
say, the secure world is protected, and is
not interruptible in the same way that SGX
00:28:13.330 --> 00:28:20.690
is interruptible. But even despite this,
just having one measurement per execution,
00:28:20.690 --> 00:28:24.940
the TruSpy authors were able to use some
statistics to still recover the AES key
00:28:24.940 --> 00:28:30.460
from that noise. And their methods were so
powerful, they were able to do this from an
00:28:30.460 --> 00:28:34.539
unprivileged application in userland, so
they don't even need to be running within
00:28:34.539 --> 00:28:39.820
the kernel in order to be able to pull off
this attack. So, how does this attack
00:28:39.820 --> 00:28:43.360
measure up? The spatial resolution is once
again 64 bytes because that's the size of
00:28:43.360 --> 00:28:48.559
a cache line on this processor, and the
temporal resolution is pretty poor
00:28:48.559 --> 00:28:54.190
here, because we only get one measurement
per execution of the AES encryption. This
00:28:54.190 --> 00:28:58.700
is also a particularly noisy attack
because we're making the measurements from
00:28:58.700 --> 00:29:02.659
userland, but even if we make the
measurements from the kernel, we're still
00:29:02.659 --> 00:29:05.789
going to have the same issues of false
positives and false negatives associated
00:29:05.789 --> 00:29:12.470
with using a cycle counter to measure
membership in a cache. So, we'd like to
00:29:12.470 --> 00:29:16.389
improve this a little bit. We'd like to
improve the temporal resolution, so we
00:29:16.389 --> 00:29:20.749
have the power of the cache attack to be a
little bit closer on TrustZone, as it is
00:29:20.749 --> 00:29:27.149
on SGX. So, we want to improve that
temporal resolution. Let's dig into that
00:29:27.149 --> 00:29:30.549
statement a little bit, that the secure
world is protected and not interruptible.
00:29:30.549 --> 00:29:36.499
And to do this, we go back to this diagram
of ARMv8 and how that TrustZone is set up.
00:29:36.499 --> 00:29:41.490
So, it is true that when an interrupt
occurs, it is directed to the monitor and,
00:29:41.490 --> 00:29:45.530
because the monitor operates in the secure
world, if we interrupt secure code that's
00:29:45.530 --> 00:29:49.081
running at exception level 0, we're just
going to end up running secure code at
00:29:49.081 --> 00:29:54.239
exception level 3. So, this doesn't
necessarily get us anything. I think
00:29:54.239 --> 00:29:57.880
that's what the authors mean by saying
that it's protected against this. Just by
00:29:57.880 --> 00:30:02.780
setting an interrupt, we don't have a
way to redirect our flow to the non-
00:30:02.780 --> 00:30:08.190
trusted code. At least that's how it works
in theory. In practice, the Linux
00:30:08.190 --> 00:30:11.840
operating system, running in exception
level 1 in the non-secure world, kind of
00:30:11.840 --> 00:30:15.299
needs interrupts in order to be able to
work, so if an interrupt occurs and it's
00:30:15.299 --> 00:30:18.120
being sent to the monitor, the monitor
will just forward it right to the non-
00:30:18.120 --> 00:30:22.500
secure operating system. So, we have
interrupts just the same way as we did in
00:30:22.500 --> 00:30:28.930
CacheZoom. And we can improve the
TrustZone attacks by using this idea: We
00:30:28.930 --> 00:30:33.549
have 2 cores, where one core is running
the secure code, the other core is running
00:30:33.549 --> 00:30:38.101
the non-secure code, and the non-secure
code is sending interrupts to the secure-
00:30:38.101 --> 00:30:42.809
world core and that will give us that
interleaving of attacker process and
00:30:42.809 --> 00:30:47.409
victim process that allows us to have a
powerful prime-and-probe attack. So, what
00:30:47.409 --> 00:30:51.139
does this look like? We have the attack
core and the victim core. The attack core
00:30:51.139 --> 00:30:54.909
sends an interrupt to the victim core.
This interrupt is captured by the monitor,
00:30:54.909 --> 00:30:58.769
which passes it to the non-secure
operating system. The non-secure operating
00:30:58.769 --> 00:31:02.979
system transfers this to our attack code,
which runs the prime-and-probe attack.
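One way to get that interleaving from a Linux kernel module is to fire an inter-processor interrupt at the victim core and run the probe in its handler; a sketch, where trace_slot and run_prime_probe are placeholders rather than an existing API:

#include <linux/smp.h>
#include <linux/delay.h>
#include <linux/types.h>

/* Placeholder per-slice result and probe routine. */
struct trace_slot { u64 set_state[256]; };

static void run_prime_probe(struct trace_slot *slot)
{
    /* Fill in: prime the L1 sets, reload them, record hits in slot. */
}

/* Runs on the victim core, inside the IPI handler, right after the
 * secure world has been kicked back out to the non-secure OS. */
static void probe_on_victim_core(void *info)
{
    run_prime_probe(info);
}

/* Runs on the attack core: keep interrupting the victim core so the
 * secure-world code only executes in short slices between probes. */
static void interleave(int victim_cpu, struct trace_slot *trace, int slices)
{
    int i;

    for (i = 0; i < slices; i++) {
        /* Sends an IPI to victim_cpu and runs the probe function there. */
        smp_call_function_single(victim_cpu, probe_on_victim_core,
                                 &trace[i], 1 /* wait */);
        udelay(5);   /* illustrative victim time slice between interrupts */
    }
}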
00:31:02.979 --> 00:31:06.529
Then, we leave the interrupt handler, the
execution within the victim code in the
00:31:06.529 --> 00:31:10.910
secure world resumes and we just repeat
this over and over. So, now we have that
00:31:10.910 --> 00:31:16.690
interleaving of data... of the processes
of the attacker and the victim. So, now,
00:31:16.690 --> 00:31:22.690
instead of having a temporal resolution of
one measurement per execution, we once
00:31:22.690 --> 00:31:26.320
again have almost unlimited temporal
resolution, because we can just schedule
00:31:26.320 --> 00:31:32.229
when we send those interrupts from the
attacker core. Now, we'd also like to
00:31:32.229 --> 00:31:37.590
improve the noise of the measurements,
because if we can improve the noise, we'll
00:31:37.590 --> 00:31:42.159
get clearer pictures and we'll be able to
infer those secrets more clearly. So, we
00:31:42.159 --> 00:31:45.720
can get some improvement by switching the
measurements from userland and starting to
00:31:45.720 --> 00:31:50.830
do those in the kernel, but again we have
the cycle counters. So, what if, instead
00:31:50.830 --> 00:31:54.330
of using the cycle counter to measure
whether or not something is in the cache,
00:31:54.330 --> 00:32:00.070
we use the other performance counters?
Because on ARMv8 platforms, there is a way
00:32:00.070 --> 00:32:03.769
to use performance counters to measure
different events, such as cache hits and
00:32:03.769 --> 00:32:09.809
cache misses. So, these events and these
performance monitors require privileged
00:32:09.809 --> 00:32:15.330
access in order to use, which, for this
attack, we do have. Now, in a typical
00:32:15.330 --> 00:32:18.779
cache attack scenario we wouldn't have
access to these performance monitors,
00:32:18.779 --> 00:32:22.259
which is why they haven't really been
explored before, but in this weird
00:32:22.259 --> 00:32:25.250
scenario where we're attacking the less
privileged code from the more privileged
00:32:25.250 --> 00:32:29.340
code, we do have access to these
performance monitors and we can use these
00:32:29.340 --> 00:32:33.640
monitors during the probe step to get a
very accurate count of whether or not a
00:32:33.640 --> 00:32:39.519
certain memory load caused a cache miss or
a cache hit. So, we're able to essentially
00:32:39.519 --> 00:32:45.720
get rid of the different levels of noise.
Now, one thing to point out is that maybe
00:32:45.720 --> 00:32:49.230
we'd like to use these ARMv8 performance
counters in order to count the different
00:32:49.230 --> 00:32:53.729
events that are occurring in the secure
world code. So, maybe we start the
00:32:53.729 --> 00:32:57.909
performance counters from the non-secure
world, let the secure world run and then,
00:32:57.909 --> 00:33:01.669
when the secure world exits, we use the
non-secure world to read these performance
00:33:01.669 --> 00:33:05.440
counters and maybe we'd like to see how
many instructions the secure world
00:33:05.440 --> 00:33:09.019
executed or how many branch instructions
or how many arithmetic instructions or how
00:33:09.019 --> 00:33:13.179
many cache misses there were. But
unfortunately, ARMv8 took this into
00:33:13.179 --> 00:33:17.350
account and by default, performance
counters that are started in the non-
00:33:17.350 --> 00:33:20.769
secure world will not measure events that
happen in the secure world, which is
00:33:20.769 --> 00:33:24.570
smart; which is how it should be. And the
only reason I bring this up is because
00:33:24.570 --> 00:33:29.320
that's not how it is on ARMv7. So, we could go
into a whole different talk with that,
00:33:29.320 --> 00:33:33.909
just exploring the different implications
of what that means, but I want to focus on
00:33:33.909 --> 00:33:39.230
ARMv8, because that's the newest of
the new. So, we'll keep looking at that.
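Coming back to that probe-step idea, a sketch of reading such a counter on AArch64 (assumptions: this runs at EL1, e.g. inside a kernel module, PMU counter 0 is free, and 0x03 is the architectural L1D_CACHE_REFILL event):

#include <stdint.h>

/* Point PMU event counter 0 at L1 data cache refills (misses). */
static void pmu_setup(void)
{
    uint64_t pmcr;
    asm volatile("msr pmselr_el0, %0"     :: "r"((uint64_t)0));    /* pick counter 0   */
    asm volatile("msr pmxevtyper_el0, %0" :: "r"((uint64_t)0x03)); /* L1D_CACHE_REFILL */
    asm volatile("msr pmcntenset_el0, %0" :: "r"((uint64_t)1));    /* enable counter 0 */
    asm volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    asm volatile("msr pmcr_el0, %0" :: "r"(pmcr | 1));             /* PMCR_EL0.E = 1   */
    asm volatile("isb");
}

/* Counter 0 is still selected via PMSELR_EL0, so just read it. */
static inline uint64_t l1d_refills(void)
{
    uint64_t v;
    asm volatile("mrs %0, pmxevcntr_el0" : "=r"(v));
    return v;
}

/* Probe one primed line with the counter instead of the cycle counter:
 * a refill on the reload means the victim evicted our line. */
static int probe_line(const volatile uint8_t *addr)
{
    uint64_t before = l1d_refills();
    (void)*addr;
    asm volatile("dsb sy");
    asm volatile("isb");
    return l1d_refills() != before;   /* 1 = miss = victim activity */
}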
00:33:39.230 --> 00:33:42.540
So, we instrument the prime-and-probe attack
to use these performance counters, so we
00:33:42.540 --> 00:33:46.509
can get a clear picture of what is and
what is not in the cache. And instead of
00:33:46.509 --> 00:33:52.399
having noisy measurements based on time,
we have virtually no noise at all, because
00:33:52.399 --> 00:33:55.919
we get the truth straight from the
processor itself, whether or not we
00:33:55.919 --> 00:34:01.660
experience a cache miss. So, how do we
implement these attacks, where do we go
00:34:01.660 --> 00:34:05.549
from here? We have all these ideas; we
have ways to make these TrustZone attacks
00:34:05.549 --> 00:34:11.840
more powerful, but that's not worthwhile,
unless we actually implement them. So, the
00:34:11.840 --> 00:34:16.510
goal here is to implement these attacks on
TrustZone and since typically the non-
00:34:16.510 --> 00:34:20.960
secure world operating system is based on
Linux, we'll take that into account when
00:34:20.960 --> 00:34:25.360
making our implementation. So, we'll write
a kernel module that uses these
00:34:25.360 --> 00:34:29.340
performance counters and these inter-
processor interrupts, in order to actually
00:34:29.340 --> 00:34:33.179
accomplish these attacks; and we'll write
it in such a way that it's very
00:34:33.179 --> 00:34:37.300
generalizable. So you can take this kernel
module that was written for one device
00:34:37.300 --> 00:34:41.650
-- in my case I focused most of my attention
on the Nexus 5x -- and it's very easy to
00:34:41.650 --> 00:34:46.739
transfer this module to any other Linux-
based device that has a TrustZone and has
00:34:46.739 --> 00:34:52.139
these shared caches, so it should be very
easy to port this over and to perform
00:34:52.139 --> 00:34:57.810
these same powerful cache attacks on
different platforms. We can also do clever
00:34:57.810 --> 00:35:01.500
things based on the Linux operating
system, so that we limit that collection
00:35:01.500 --> 00:35:05.500
window to just when we're executing within
the secure world, so we can align our
00:35:05.500 --> 00:35:10.580
traces a lot more easily that way. And the
end result is having a synchronized trace
00:35:10.580 --> 00:35:14.930
for each of the different attacks, because, since
we've written it in a modular way, we're able
00:35:14.930 --> 00:35:19.440
to run different attacks simultaneously.
So, maybe we're running one prime-and-
00:35:19.440 --> 00:35:23.050
probe attack on the L1 data cache, to
learn where the victim is accessing
00:35:23.050 --> 00:35:27.050
memory, and we're simultaneously running
an attack on the L1 instruction cache, so
00:35:27.050 --> 00:35:33.910
we can see what instructions the victim is
executing. And these can be aligned. So,
00:35:33.910 --> 00:35:37.080
the tool that I've written is a
combination of a kernel module which
00:35:37.080 --> 00:35:41.580
actually performs this attack, a userland
binary which schedules these processes to
00:35:41.580 --> 00:35:45.860
different cores, and a GUI that will allow
you to interact with this kernel module
00:35:45.860 --> 00:35:49.710
and rapidly start doing these cache
attacks for yourself and perform them
00:35:49.710 --> 00:35:56.860
against different processes and secure
code and secure world code. So, the
00:35:56.860 --> 00:36:02.820
intention behind this tool is to be very
generalizable to make it very easy to use
00:36:02.820 --> 00:36:08.430
this platform for different devices and to
allow people a way to, once again, quickly
00:36:08.430 --> 00:36:12.360
develop these attacks; and also to see if
their own code is vulnerable to these
00:36:12.360 --> 00:36:18.490
cache attacks, to see if their code has
these secret-dependent memory accesses.
00:36:18.490 --> 00:36:25.349
So, can we get even better... spatial
resolution? Right now, we're down to 64
00:36:25.349 --> 00:36:30.320
bytes and that's the size of a cache line,
which is the size of our shared hardware.
00:36:30.320 --> 00:36:35.510
And on SGX, we actually can get better
than 64 bytes, based on something called a
00:36:35.510 --> 00:36:39.160
branch-shadowing attack. So, a branch-
shadowing attack takes advantage of
00:36:39.160 --> 00:36:42.730
something called the branch target buffer.
And the branch target buffer is a
00:36:42.730 --> 00:36:48.490
structure that's used for branch
prediction. It's similar to a cache, but
00:36:48.490 --> 00:36:51.740
there's a key difference where the branch
target buffer doesn't compare the full
00:36:51.740 --> 00:36:54.770
address, when seeing if something is
already in the cache or not: It doesn't
00:36:54.770 --> 00:36:59.701
compare all of the upper level bits. So,
that means that it's possible that two
00:36:59.701 --> 00:37:04.140
different addresses will experience a
collision, and the same entry from that
00:37:04.140 --> 00:37:08.870
BTB cache will be read out for an improper
address. Now, since this is just for
00:37:08.870 --> 00:37:12.090
branch prediction, the worst that can
happen is, you'll get a misprediction and
00:37:12.090 --> 00:37:18.070
a small time penalty, but that's about it.
The idea behind the branch-shadowing
00:37:18.070 --> 00:37:22.440
attack is leveraging the small difference
in this overlapping and this collision of
00:37:22.440 --> 00:37:28.540
addresses in order to sort of execute a
shared-code-style flush-and-reload attack
00:37:28.540 --> 00:37:35.330
on the branch target buffer. So, here what
goes on is, during the attack the attacker
00:37:35.330 --> 00:37:39.650
modifies the SGX Enclave to make sure that
the branches that are within the Enclave
00:37:39.650 --> 00:37:44.340
will collide with branches that are not in
the Enclave. The attacker executes the
00:37:44.340 --> 00:37:50.440
Enclave code and then the attacker
executes their own code and based on the
00:37:50.440 --> 00:37:55.460
outcome of the victim code in that
cache, the attacker code may or may not
00:37:55.460 --> 00:37:59.210
experience a branch misprediction. So, the
attacker is able to tell the outcome of a
00:37:59.210 --> 00:38:03.310
branch, because of this overlap in this
collision, like would be in a flush-and-
00:38:03.310 --> 00:38:06.570
reload attack, where those memories
overlap between the attacker and the
00:38:06.570 --> 00:38:14.020
victim. So here, our spatial resolution is
fantastic: We can tell down to individual
00:38:14.020 --> 00:38:19.440
branch instructions in SGX; we can tell
exactly which branches were executed and
00:38:19.440 --> 00:38:25.010
which directions they were taken, in the
case of conditional branches. The temporal
00:38:25.010 --> 00:38:29.720
resolution is also, once again, almost
unlimited, because we can use the same
00:38:29.720 --> 00:38:33.880
timer interrupts in order to schedule our
process, our attacker process. And the
00:38:33.880 --> 00:38:39.120
noise is, once again, very low, because we
can, once again, use the same sort of
00:38:39.120 --> 00:38:43.980
branch misprediction counters, that exist
in the Intel world, in order to measure
00:38:43.980 --> 00:38:51.510
this noise. So, does any of that
apply to the TrustZone attacks? Well, in
00:38:51.510 --> 00:38:55.040
this case the victim and attacker don't
share entries in the branch target buffer,
00:38:55.040 --> 00:39:01.610
because the attacker is not able to map
the virtual address of the victim process.
00:39:01.610 --> 00:39:05.340
But this is kind of reminiscent of our
earlier cache attacks, so our flush-and-
00:39:05.340 --> 00:39:10.100
reload attack only worked when the attacker
and the victim shared that memory, but we
00:39:10.100 --> 00:39:13.930
still have the prime-and-probe attack for
when they don't. So, what if we use a
00:39:13.930 --> 00:39:21.380
prime-and-probe-style attack on the branch
target buffer cache in ARM processors? So,
00:39:21.380 --> 00:39:25.320
essentially what we do here is, we prime
the branch target buffer by executing many
00:39:25.320 --> 00:39:29.531
attacker branches to sort of fill up this
BTB cache with the attacker branch
00:39:29.531 --> 00:39:34.770
prediction data; we let the victim execute
a branch which will evict an attacker BTB
00:39:34.770 --> 00:39:39.120
entry; and then we have the attacker re-
execute those branches and see if there
00:39:39.120 --> 00:39:45.120
have been any mispredictions. So now, the
cool thing about this attack is that the
00:39:45.120 --> 00:39:50.320
structure of the BTB cache is different
from that of the L1 caches. So, instead of
00:39:50.320 --> 00:39:59.750
having 256 different sets in the L1 cache,
the BTB cache has 2048 different sets, so
00:39:59.750 --> 00:40:06.380
we can tell which branch it was, based
on which one of 2048 different set IDs
00:40:06.380 --> 00:40:11.230
that it could fall into. And even more
than that, on the ARM platform, at least
00:40:11.230 --> 00:40:15.730
on the Nexus 5x that I was working with,
the granularity is no longer 64 bytes,
00:40:15.730 --> 00:40:21.830
which is the size of the line, it's now 16
bytes. So, we can see which branches the
00:40:21.830 --> 00:40:27.620
trusted code within TrustZone is
executing within 16 bytes. So, what does
00:40:27.620 --> 00:40:31.820
this look like? So, previously with the
TruSpy attack, this is sort of the
00:40:31.820 --> 00:40:37.410
outcome of our prime-and-probe attack: We
get 1 measurement for those 256 different
00:40:37.410 --> 00:40:43.420
set IDs. When we added those interrupts,
we're able to get that time resolution,
00:40:43.420 --> 00:40:48.090
and it looks something like this. Now,
maybe you can see a little bit at the top
00:40:48.090 --> 00:40:52.660
of the screen, how there's these repeated
sections of little white blocks, and you
00:40:52.660 --> 00:40:56.720
can sort of use that to infer that maybe
it's the same cache lines of
00:40:56.720 --> 00:41:00.870
instructions that are called over and
over. So, just looking at this L1-I cache
00:41:00.870 --> 00:41:06.920
attack, you can tell some information
about how the process went. Now, let's
00:41:06.920 --> 00:41:11.870
compare that to the BTB attack. And I
don't know if you can see too clearly --
00:41:11.870 --> 00:41:17.190
it's a bit too high of a resolution
right now -- so let's just focus in on one
00:41:17.190 --> 00:41:22.580
small part of this overall trace. And this
is what it looks like. So, each of those
00:41:22.580 --> 00:41:27.720
white pixels represents a branch that was
taken by that secure-world code and we can
00:41:27.720 --> 00:41:31.070
see repeated patterns, we can see maybe
different functions that were called, we
00:41:31.070 --> 00:41:35.310
can see different loops. And just by
looking at this one trace, we can infer a
00:41:35.310 --> 00:41:40.110
lot of information on how that secure
world executed. So, it's incredibly
00:41:40.110 --> 00:41:44.230
powerful and all of those secrets are just
waiting to be uncovered using these new
00:41:44.230 --> 00:41:52.890
tools. So, where do we go from here? What
sort of countermeasures do we have? Well,
00:41:52.890 --> 00:41:56.690
first of all I think the long-term
solution is going to be moving to no more
00:41:56.690 --> 00:42:00.200
shared hardware. We need to have separate
hardware and no more shared caches in
00:42:00.200 --> 00:42:05.750
order to fully get rid of these different
cache attacks. And we've already seen this
00:42:05.750 --> 00:42:11.420
trend in different cell phones. So, for
example, in Apple SoCs for a long time now
00:42:11.420 --> 00:42:15.521
-- I think since the Apple A7 -- the
secure Enclave, which runs the secure
00:42:15.521 --> 00:42:21.000
code, has its own cache. So, these cache
attacks can't be accomplished from code
00:42:21.000 --> 00:42:27.400
outside of that Secure Enclave. So, just
by using that separate hardware, it knocks
00:42:27.400 --> 00:42:30.970
out a whole class of different potential
side-channel and microarchitectural
00:42:30.970 --> 00:42:35.610
attacks. And just recently, the Pixel 2 is
moving in the same direction. The Pixel 2
00:42:35.610 --> 00:42:40.540
now includes a hardware security module
that includes cryptographic operations;
00:42:40.540 --> 00:42:45.890
and that chip also has its own memory and
its own caches, so now we can no longer
00:42:45.890 --> 00:42:51.270
use this attack to extract information
about what's going on in this external
00:42:51.270 --> 00:42:56.530
hardware security module. But even then,
using this separate hardware, that doesn't
00:42:56.530 --> 00:43:00.800
solve all of our problems. Because we
still have the question of "What do we
00:43:00.800 --> 00:43:05.900
include in this separate hardware?" On the
one hand, we want to include more code in
00:43:05.900 --> 00:43:11.370
that separate hardware, so we're less
vulnerable to these side-channel attacks,
00:43:11.370 --> 00:43:16.490
but on the other hand, we don't want to
expand the attack surface anymore. Because
00:43:16.490 --> 00:43:19.060
the more code we include in these secure
environments, the more likely it is that a
00:43:19.060 --> 00:43:22.600
vulnerability will be found and the
attacker will be able to get a foothold
00:43:22.600 --> 00:43:26.470
within the secure, trusted environment.
So, there's going to be a balance between
00:43:26.470 --> 00:43:30.270
what you choose to include in the
separate hardware and what you don't. So,
00:43:30.270 --> 00:43:35.220
do you include DRM code? Do you include
cryptographic code? It's still an open
00:43:35.220 --> 00:43:41.800
question. And that's sort of the long-term
approach. In the short term, you just kind
00:43:41.800 --> 00:43:46.370
of have to write side-channel-free
software: Just be very careful about what
00:43:46.370 --> 00:43:50.811
your process does, if there are any
secret-dependent memory accesses or
00:43:50.811 --> 00:43:55.310
secret-dependent branching or secret-
dependent function calls, because any of
00:43:55.310 --> 00:44:00.010
those can leak the secrets out of your
trusted execution environment.
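A small, made-up example of what removing a secret-dependent branch can look like in C; the function names are invented for this illustration.

```c
/* Made-up example of removing a secret-dependent branch. */
#include <stdint.h>

/* Leaky: which side of the branch runs depends on the secret bit,
 * which is exactly what a BTB probe like the one above can observe. */
uint32_t select_leaky(int secret_bit, uint32_t a, uint32_t b)
{
    if (secret_bit)
        return a;
    return b;
}

/* Branch-free: control flow and memory behavior are identical for
 * either value of the secret bit. */
uint32_t select_ct(int secret_bit, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)-(uint32_t)(secret_bit & 1); /* 0 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}
```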
00:44:00.010 --> 00:44:03.460
So, here are the things that I want you
to keep in mind if you are a developer of
00:44:03.460 --> 00:44:08.150
trusted execution environment code:
First of all, performance is very often at
00:44:08.150 --> 00:44:13.130
odds with security. We've seen over and
over that the performance enhancements to
00:44:13.130 --> 00:44:18.880
these processors open up the ability for
these microarchitectural attacks to be
00:44:18.880 --> 00:44:23.750
more efficient. Additionally, these
trusted execution environments don't
00:44:23.750 --> 00:44:27.160
protect against everything; there are
still these side-channel attacks and these
00:44:27.160 --> 00:44:32.310
microarchitectural attacks that these
systems are vulnerable to. These attacks
00:44:32.310 --> 00:44:37.650
are very powerful; they can be
accomplished simply; and with the
00:44:37.650 --> 00:44:41.770
publication of the code that I've written,
it should be very simple to get set up and
00:44:41.770 --> 00:44:46.070
to analyze your own code to see "Am I
vulnerable, do I expose information in the
00:44:46.070 --> 00:44:52.760
same way?" And lastly, it only takes 1
small error, 1 tiny leak from your trusted
00:44:52.760 --> 00:44:56.670
and secure code, in order to extract the
entire secret, in order to bring the whole
00:44:56.670 --> 00:45:03.920
thing down. So, what I want to leave you
with is: I want you to remember that you
00:45:03.920 --> 00:45:08.520
are responsible for making sure that your
program is not vulnerable to these
00:45:08.520 --> 00:45:13.110
microarchitectural attacks, because if you
do not take responsibility for this, who
00:45:13.110 --> 00:45:16.645
will? Thank you!
00:45:16.645 --> 00:45:25.040
Applause
00:45:25.040 --> 00:45:29.821
Herald: Thank you very much. Please, if
you want to leave the hall, please do it
00:45:29.821 --> 00:45:35.000
quietly and take all your belongings with
you and respect the speaker. We have
00:45:35.000 --> 00:45:43.230
plenty of time, 16, 17 minutes for Q&A, so
please line up on the microphones. No
00:45:43.230 --> 00:45:50.650
questions from the signal angel, all
right. So, we can start with microphone 6,
00:45:50.650 --> 00:45:54.770
please.
Mic 6: Okay. There was a symbol of secure
00:45:54.770 --> 00:46:01.160
OSes in the ARM TrustZone. What is the idea
of them if the non-secure OS gets all the
00:46:01.160 --> 00:46:04.210
interrupts? What is
the secure OS for?
00:46:04.210 --> 00:46:08.880
Keegan: Yeah so, in the ARMv8 there are a
couple different kinds of interrupts. So,
00:46:08.880 --> 00:46:11.760
I think -- if I'm remembering the
terminology correctly -- there is an IRQ
00:46:11.760 --> 00:46:16.800
and an FIQ interrupt. So, the non-secure
mode handles the IRQ interrupts and the
00:46:16.800 --> 00:46:20.440
secure mode handles the FIQ interrupts.
So, depending on which one you send, that
00:46:20.440 --> 00:46:24.840
determines which direction the
monitor will route that interrupt.
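For context, this IRQ/FIQ split is usually configured by the secure monitor through SCR_EL3 on ARMv8-A. The sketch below uses the architectural bit names but is not taken from any shipping firmware, which would read-modify-write and preserve the other SCR_EL3 fields.

```c
/* Illustrative EL3 (secure monitor) setup using the ARMv8-A SCR_EL3
 * bit assignments; a sketch only, not any particular firmware's code. */
#include <stdint.h>

#define SCR_NS   (1u << 0)   /* lower exception levels are non-secure */
#define SCR_IRQ  (1u << 1)   /* 1: route physical IRQs to EL3         */
#define SCR_FIQ  (1u << 2)   /* 1: route physical FIQs to EL3         */

static inline void scr_el3_write(uint64_t v)
{
    __asm__ volatile("msr scr_el3, %0" :: "r"(v)); /* only legal at EL3 */
    __asm__ volatile("isb");
}

void monitor_route_interrupts(void)
{
    /* The split described above: FIQs trap to the monitor (and on to
     * the secure OS), IRQs stay with the non-secure OS. */
    scr_el3_write(SCR_NS | SCR_FIQ);
}
```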
00:46:29.640 --> 00:46:32.010
Mic 6: Thank you.
Herald: Okay, thank you. Microphone number
00:46:32.010 --> 00:46:37.930
7, please.
Mic 7: Does any of your present attacks on
00:46:37.930 --> 00:46:45.290
TrustZone also apply to the AMD
implementation of TrustZone or are you
00:46:45.290 --> 00:46:48.380
looking into it?
Keegan: I haven't looked into AMD too
00:46:48.380 --> 00:46:54.011
much, because, as far as I can tell,
that's not used as commonly, but there are
00:46:54.011 --> 00:46:57.490
many different types of trusted execution
environments. The 2 that I focused on were
00:46:57.490 --> 00:47:04.760
SGX and TrustZone, because those are the
most common examples that I've seen.
00:47:04.760 --> 00:47:09.250
Herald: Thank you. Microphone
number 8, please.
00:47:09.250 --> 00:47:20.370
Mic 8: When TrustZone is moved to
dedicated hardware, dedicated memory,
00:47:20.370 --> 00:47:27.780
couldn't you replicate the userspace
attacks by loading your own trusted
00:47:27.780 --> 00:47:32.210
userspace app and use it as an
oracle of some sorts?
00:47:32.210 --> 00:47:35.760
Keegan: If you can load your own trusted
code, then yes, you could do that. But in
00:47:35.760 --> 00:47:39.650
many of the models I've seen today, that's
not possible. So, that's why you have
00:47:39.650 --> 00:47:44.250
things like code signing, which prevent
the arbitrary user from running their own
00:47:44.250 --> 00:47:50.310
code in the trusted OS... or in the
trusted environment.
00:47:50.310 --> 00:47:55.010
Herald: All right. Microphone number 1.
Mic 1: So, these attacks are more powerful
00:47:55.010 --> 00:48:00.720
against code that's running in trusted
execution environments than similar
00:48:00.720 --> 00:48:07.100
attacks would be against ring-3 code or,
in general, untrusted code. Does that mean
00:48:07.100 --> 00:48:10.910
that trusted execution environments are
basically an attractive nuisance that we
00:48:10.910 --> 00:48:15.080
shouldn't use?
Keegan: There's still a large benefit to
00:48:15.080 --> 00:48:17.600
using these trusted execution
environments. The point I want to get
00:48:17.600 --> 00:48:21.390
across is that, although they add a lot of
features, they don't protect against
00:48:21.390 --> 00:48:25.450
everything, so you should keep in mind
that these side-channel attacks do still
00:48:25.450 --> 00:48:28.820
exist and you still need to protect
against them. But overall, these are
00:48:28.820 --> 00:48:35.930
good things and worthwhile to include.
Herald: Thank you. Microphone number 1
00:48:35.930 --> 00:48:41.580
again, please
Mic 1: So, AMD is doing something with
00:48:41.580 --> 00:48:47.780
encrypting memory and I'm not sure if they
encrypt addresses, too, but would that
00:48:47.780 --> 00:48:53.090
be a defense against such attacks?
Keegan: So, I'm not too familiar with AMD,
00:48:53.090 --> 00:48:57.690
but SGX also encrypts memory. It encrypts
it in between the last-level cache and
00:48:57.690 --> 00:49:02.170
the main memory. But that doesn't really
have an impact on the actual operation,
00:49:02.170 --> 00:49:06.220
because the memory is encrypted at the cache
line level and as the attacker, we don't
00:49:06.220 --> 00:49:10.380
care what that data is within that cache
line, we only care which cache line is
00:49:10.380 --> 00:49:16.150
being accessed.
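A tiny model of that point: under the simple indexing assumed below, the set an access lands in is a function of the address alone, so encrypting the line's contents changes nothing the attacker observes. The sizes and the table base are generic placeholders, not SGX-specific values.

```c
/* Generic sizes, not SGX-specific; the argument does not depend on them. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES   64
#define NUM_SETS   1024

static uint32_t set_of(uintptr_t addr) { return (addr / LINE_BYTES) % NUM_SETS; }

int main(void)
{
    uintptr_t table = 0x400000; /* hypothetical base of a secret-indexed table */

    /* Whatever ciphertext the memory encryption engine stores for these
     * lines, the attacker-visible quantity is only which set lights up: */
    printf("entry 0  -> set %u\n", set_of(table +  0 * LINE_BYTES));
    printf("entry 13 -> set %u\n", set_of(table + 13 * LINE_BYTES));
    return 0;
}
```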
Mic 1: If you encrypt addresses, wouldn't
00:49:16.150 --> 00:49:20.551
that help against that?
Keegan: I'm not sure, how you would
00:49:20.551 --> 00:49:25.070
encrypt the addresses yourself. As long as
those addresses map into the same set IDs
00:49:25.070 --> 00:49:30.200
that the victim can map into, then the
attacker could still pull off the same style
00:49:30.200 --> 00:49:35.030
of attacks.
Herald: Great. We have a question from the
00:49:35.030 --> 00:49:38.200
internet, please.
Signal Angel: The question is "Does the
00:49:38.200 --> 00:49:42.410
secure enclave on the Samsung Exynos
distinguish the receiver of the message, so
00:49:42.410 --> 00:49:46.830
that if the user application asked to
decode an AES message, can one sniff on
00:49:46.830 --> 00:49:52.220
the value that the secure
enclave returns?"
00:49:52.220 --> 00:49:56.680
Keegan: So, that sounds like it's asking
about the TruSpy-style attack, where
00:49:56.680 --> 00:50:01.270
it's calling to the secure world to
encrypt something with AES. I think, that
00:50:01.270 --> 00:50:04.830
would all depend on the different
implementation: As long as it's encrypting
00:50:04.830 --> 00:50:09.790
for a certain key and it's able to do that
repeatably, then the attack,
00:50:09.790 --> 00:50:16.290
assuming a vulnerable AES implementation,
would be able to extract that key out.
00:50:16.290 --> 00:50:20.750
Herald: Cool. Microphone number 2, please.
Mic 2: Do you recommend a reference to
00:50:20.750 --> 00:50:25.350
understand how these cache line attacks
and branch oracles actually lead to key
00:50:25.350 --> 00:50:29.540
recovery?
Keegan: Yeah. So, I will flip through
00:50:29.540 --> 00:50:33.620
these pages which include a lot of the
references for the attacks that I've
00:50:33.620 --> 00:50:38.030
mentioned, so if you're watching the
video, you can see these right away or
00:50:38.030 --> 00:50:43.200
just access the slides. And a lot of these
contain good starting points. So, I didn't
00:50:43.200 --> 00:50:46.340
go into a lot of the details on how, for
example, the TruSpy attack recovered
00:50:46.340 --> 00:50:53.090
that AES key, but that paper does have a
lot of good references on how those leaks can
00:50:53.090 --> 00:50:56.350
lead to key recovery. Same thing with the
CLKSCREW attack, how the different fault
00:50:56.350 --> 00:51:03.070
injection can lead to key recovery.
Herald: Microphone number 6, please.
00:51:03.070 --> 00:51:07.900
Mic 6: I think my question might have been
very, almost the same thing: How hard is
00:51:07.900 --> 00:51:11.920
it actually to recover the keys? Is this
like a massive machine learning problem or
00:51:11.920 --> 00:51:18.500
is this something that you can do
practically on a single machine?
00:51:18.500 --> 00:51:21.640
Keegan: It varies entirely by the end
implementation. So, for all these attacks
00:51:21.640 --> 00:51:25.750
to work, you need to have some sort of
vulnerable implementation and some
00:51:25.750 --> 00:51:29.010
implementations leak more data than
others. In the case of a lot of the AES
00:51:29.010 --> 00:51:33.880
attacks, where you're doing the passive
attacks, those are very easy to do on just
00:51:33.880 --> 00:51:37.630
your own computer. For the AES fault
injection attack, I think that one
00:51:37.630 --> 00:51:42.340
required more brute force, in the CLKSCREW
paper, so that one required more computing
00:51:42.340 --> 00:51:49.780
resources, but still, it was entirely
practical to do in a realistic setting.
00:51:49.780 --> 00:51:53.770
Herald: Cool, thank you. So, we have one
more: Microphone number 1, please.
00:51:53.770 --> 00:51:59.080
Mic 1: So, I hope it's not a too naive
question, but I was wondering, since all
00:51:59.080 --> 00:52:04.730
these attacks are based on cache hits and
misses, isn't it possible to forcibly
00:52:04.730 --> 00:52:11.280
flush or invalidate or insert noise in
cache after each operation in this trusted
00:52:11.280 --> 00:52:23.520
environment, in order to mess up the
guesswork of the attacker? So, discarding
00:52:23.520 --> 00:52:29.180
optimization and performance for
additional security benefits.
00:52:29.180 --> 00:52:32.420
Keegan: Yeah, and that is absolutely
possible and you are absolutely right: It
00:52:32.420 --> 00:52:36.300
does lead to a performance degradation,
because if you always flush the entire
00:52:36.300 --> 00:52:41.190
cache every time you do a context switch,
that will be a huge performance hit. So
00:52:41.190 --> 00:52:45.190
again, that comes down to the question of
the performance and security trade-off:
00:52:45.190 --> 00:52:49.540
Which one do you end up going with? And it
seems historically the choice has been
00:52:49.540 --> 00:52:54.000
more in the direction of performance.
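A rough sketch of what "flush on every world switch" could look like on AArch64, assuming it runs in the monitor. Only the instruction cache is shown; real firmware would also have to clean and invalidate the data caches, which is where most of the performance cost comes from.

```c
/* Sketch of a "flush on every world switch" mitigation for AArch64,
 * assumed to run in the secure monitor; illustrative only. */
static inline void flush_icache_on_world_switch(void)
{
    __asm__ volatile(
        "ic iallu \n\t"   /* invalidate all I-cache to the point of unification */
        "dsb ish  \n\t"   /* ensure completion of the invalidation              */
        "isb      \n\t"   /* resynchronize the instruction stream               */
        ::: "memory");
}
```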
Mic 1: Thank you.
00:52:54.000 --> 00:52:56.920
Herald: But we have one more: Microphone
number 1, please.
00:52:56.920 --> 00:53:01.500
Mic 1: So, I have more of a moral
question: So, how well should we really
00:53:01.500 --> 00:53:07.720
protect from attacks which need some
ring-0 cooperation? Because, basically,
00:53:07.720 --> 00:53:14.350
when we use TrustZone for a purpose we
would see as clearly good, like protecting the
00:53:14.350 --> 00:53:20.250
browser from interacting with the outside
world, then we are basically using the
00:53:20.250 --> 00:53:27.280
secure execution environment for sandboxing
the process. But once an attack needs some
00:53:27.280 --> 00:53:32.281
cooperation from the kernel, some of those
attacks in fact empower the user
00:53:32.281 --> 00:53:36.320
instead of the hardware producer.
Keegan: Yeah, and you're right. It
00:53:36.320 --> 00:53:39.210
depends entirely on what your application
is and what your threat model is that
00:53:39.210 --> 00:53:43.020
you're looking at. So, if you're using
these trusted execution environments to do
00:53:43.020 --> 00:53:48.430
DRM, for example, then maybe you would
be worried about that ring-0 attack or
00:53:48.430 --> 00:53:51.620
that privileged attacker who has their
phone rooted and is trying to recover
00:53:51.620 --> 00:53:56.740
these media encryption keys from this
execution environment. But maybe there are
00:53:56.740 --> 00:54:01.230
other scenarios where you're not as
worried about having an attack with a
00:54:01.230 --> 00:54:05.580
compromised ring 0. So, it entirely
depends on context.
00:54:05.580 --> 00:54:09.000
Herald: Alright, thank you. So, we have
one more: Microphone number 1, again.
00:54:09.000 --> 00:54:10.990
Mic 1: Hey there. Great talk, thank you
very much.
00:54:10.990 --> 00:54:13.040
Keegan: Thank you.
Mic 1: Just a short question: Do you have
00:54:13.040 --> 00:54:16.980
any success stories about attacking the
TrustZone and the different
00:54:16.980 --> 00:54:24.010
implementations of TEEs with some vendors
like some OEMs creating phones and stuff?
00:54:24.010 --> 00:54:29.750
Keegan: Not that I'm announcing
at this time.
00:54:29.750 --> 00:54:35.584
Herald: So, thank you very much. Please,
again a warm round of applause for Keegan!
00:54:35.584 --> 00:54:39.998
Applause
00:54:39.998 --> 00:54:45.489
34C3 postroll music
00:54:45.489 --> 00:55:02.000
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!