Return to Video

34C3 - Microarchitectural Attacks on Trusted Execution Environments

  • 0:00 - 0:15
    34C3 preroll music
  • 0:15 - 0:23
    Herald: Hello fellow creatures.
    Welcome and
  • 0:23 - 0:30
    I wanna start with a question.
    Another one: Who do we trust?
  • 0:30 - 0:36
    Do we trust the TrustZones
    on our smartphones?
  • 0:36 - 0:42
    Well Keegan Ryan, he's really
    fortunate to be here and
  • 0:42 - 0:52
    he was inspired by another talk from the
    CCC before -- I think it was 29C3 -- and his
  • 0:52 - 0:58
    research on smartphones and systems on a
    chip used in smartphones will answer
  • 0:58 - 1:03
    these questions of whether you can trust those
    trusted execution environments. Please
  • 1:03 - 1:06
    give a warm round of applause
    to Keegan and enjoy!
  • 1:06 - 1:11
    Applause
  • 1:11 - 1:14
    Keegan Ryan: All right, thank you! So I'm
    Keegan Ryan, I'm a consultant with NCC
  • 1:14 - 1:20
    Group and this is microarchitectural
    attacks on Trusted Execution Environments.
  • 1:20 - 1:23
    So, in order to understand what a Trusted
    Execution Environment is we need to go
  • 1:23 - 1:30
    back into processor security, specifically
    on x86. So as many of you are probably
  • 1:30 - 1:34
    aware there are a couple different modes
    which we can execute code under in x86
  • 1:34 - 1:39
    processors and that includes ring 3, which
    is the user code and the applications, and
  • 1:39 - 1:46
    also ring 0 which is the kernel code. Now
    there's also a ring 1 and ring 2 that are
  • 1:46 - 1:50
    supposedly used for drivers or guest
    operating systems but really it just boils
  • 1:50 - 1:56
    down to ring 0 and ring 3. And in this
    diagram we have here we see that privilege
  • 1:56 - 2:02
    increases as we go up the diagram, so ring
    0 is the most privileged ring and ring 3
  • 2:02 - 2:05
    is the least privileged ring. So all of
    our secrets, all of our sensitive
  • 2:05 - 2:10
    information, all of the attacker's goals
    are in ring 0 and the attacker is trying
  • 2:10 - 2:16
    to access those from the unprivileged
    world of ring 3. Now you may have a
  • 2:16 - 2:20
    question what if I want to add a processor
    feature that I don't want ring 0 to be
  • 2:20 - 2:26
    able to access? Well then you add ring -1
    which is often used for a hypervisor. Now
  • 2:26 - 2:31
    the hypervisor has all the secrets and the
    hypervisor can manage different guest
  • 2:31 - 2:36
    operating systems and each of these guest
    operating systems can execute in ring 0
  • 2:36 - 2:41
    without having any idea of the other
    operating systems. So this way now the
  • 2:41 - 2:45
    secrets are all in ring -1 so now the
    attacker's goals have shifted from ring 0
  • 2:45 - 2:51
    to ring -1. The attacker has to attack
    ring -1 from a less privileged ring and
  • 2:51 - 2:55
    tries to access those secrets. But what if
    you want to add a processor feature that
  • 2:55 - 3:01
    you don't want ring -1 to be able to
    access? So you add ring -2 which is System
  • 3:01 - 3:05
    Management Mode and that's capable of
    monitoring power, directly interfacing
  • 3:05 - 3:10
    with firmware and other chips on a
    motherboard and it's able to access and do
  • 3:10 - 3:14
    a lot of things that the hypervisor is not
    able to and now all of your secrets and
  • 3:14 - 3:18
    all of your attacker goals are in ring -2
    and the attacker has to attack those from
  • 3:18 - 3:22
    a less privileged ring. Now maybe you want
    to add something to your processor that
  • 3:22 - 3:27
    you don't want ring -2 to be able to access,
    so you add ring -3 and I think you get the
  • 3:27 - 3:31
    picture now. And we just keep on adding
    more and more privilege rings and keep
  • 3:31 - 3:35
    putting our secrets and our attacker's
    goals in these higher and higher
  • 3:35 - 3:41
    privileged rings but what if we're
    thinking about it wrong? What if instead
  • 3:41 - 3:47
    we want to put all the secrets in the
    least privileged ring? So this is sort of
  • 3:47 - 3:51
    the idea behind SGX and it's useful for
    things like DRM where you want to run
  • 3:51 - 3:57
    ring 3 code but have sensitive secrets or
    other signing capabilities running in
  • 3:57 - 4:02
    ring 3. But this picture is getting a
    little bit complicated, this diagram is a
  • 4:02 - 4:06
    little bit complex so let's simplify it a
    little bit. We'll only be looking at ring
  • 4:06 - 4:12
    0 through ring 3 which is the kernel, the
    userland and the SGX enclave which also
  • 4:12 - 4:17
    executes in ring 3. Now when you're
    executing code in the SGX enclave you
  • 4:17 - 4:22
    first load the code into the enclave and
    then from that point on you trust the
  • 4:22 - 4:27
    execution of whatever's going on in that
    enclave. You trust that the other elements
  • 4:27 - 4:32
    the kernel, the userland, the other rings
    are not going to be able to access what's
  • 4:32 - 4:38
    in that enclave so you've made your
    Trusted Execution Environment. This is a
  • 4:38 - 4:45
    bit of a weird model because now your
    attacker is in the ring 0 kernel and your
  • 4:45 - 4:49
    target victim here is in ring 3. So
    instead of the attacker trying to move up
  • 4:49 - 4:54
    the privilege chain, the attacker is
    trying to move down. Which is pretty
  • 4:54 - 4:58
    strange and you might have some questions
    like "under this model who handles memory
  • 4:58 - 5:01
    management?" because traditionally that's
    something that ring 0 would manage and
  • 5:01 - 5:05
    ring 0 would be responsible for paging
    memory in and out for different processes
  • 5:05 - 5:10
    and different code that's executing in
    ring 3. But on the other hand you don't
  • 5:10 - 5:16
    want that to happen with the SGX enclave
    because what if the malicious ring 0 adds
  • 5:16 - 5:22
    a page to the enclave that the enclave
    doesn't expect? So in order to solve this
  • 5:22 - 5:29
    problem, SGX does allow ring 0 to handle
    page faults. But simultaneously and in
  • 5:29 - 5:35
    parallel it verifies every memory load to
    make sure that no access violations are
  • 5:35 - 5:40
    made so that all the SGX memory is safe.
    So it allows ring 0 to do its job but it
  • 5:40 - 5:45
    sort of watches over at the same time to
    make sure that nothing is messed up. So
  • 5:45 - 5:51
    it's a bit of a weird convoluted solution
    to a strange inverted problem but it works
  • 5:51 - 5:58
    and that's essentially how SGX works and
    the idea behind SGX. Now we can look at
  • 5:58 - 6:03
    x86 and we can see that ARMv8 is
    constructed in a similar way but it
  • 6:03 - 6:08
    improves on x86 in a couple key ways. So
    first of all ARMv8 gets rid of ring 1 and
  • 6:08 - 6:12
    ring 2 so you don't have to worry about
    those and it just has different privilege
  • 6:12 - 6:17
    levels for userland and the kernel. And
    these different privilege levels are
  • 6:17 - 6:22
    called exception levels in the ARM
    terminology. And the second thing that ARM
  • 6:22 - 6:26
    gets right compared to x86 is that instead
    of starting at 3 and counting down as
  • 6:26 - 6:31
    privilege goes up, ARM starts at 0 and
    counts up so we don't have to worry about
  • 6:31 - 6:36
    negative numbers anymore. Now when we add
    the next privilege level, the hypervisor, we
  • 6:36 - 6:41
    call it exception level 2 and the next one
    after that is the monitor in exception
  • 6:41 - 6:47
    level 3. So at this point we still want to
    have the ability to run trusted code in
  • 6:47 - 6:53
    exception level 0 the least privileged
    level of the ARMv8 processor. So in order
  • 6:53 - 6:59
    to support this we need to separate this
    diagram into two different sections. In
  • 6:59 - 7:04
    ARMv8 these are called the secure world
    and the non-secure world. So we have the
  • 7:04 - 7:08
    non-secure world on the left in blue that
    consists of the userland, the kernel and
  • 7:08 - 7:12
    the hypervisor and we have the secure
    world on the right which consists of the
  • 7:12 - 7:17
    monitor in exception level 3, a trusted
    operating system in exception level 1 and
  • 7:17 - 7:23
    trusted applications in exception level 0.
    So the idea is that if you run anything in
  • 7:23 - 7:27
    the secure world, it should not be
    accessible or modifiable by anything in
  • 7:27 - 7:32
    the non-secure world. So that's what our
    attacker is trying to access. The
  • 7:32 - 7:36
    attacker has access to the non-secure
    kernel, which is often Linux, and they're
  • 7:36 - 7:40
    trying to go after the trusted apps. So
    once again we have this weird inversion
  • 7:40 - 7:43
    where we're trying to go from a more
    privileged level to a less privileged
  • 7:43 - 7:48
    level and trying to extract secrets in
    that way. So the question that arises when
  • 7:48 - 7:53
    using these Trusted Execution Environments
    that are implemented in SGX and TrustZone
  • 7:53 - 7:58
    in ARM is "can we use these privileged
    modes and our privileged access in order to
  • 7:58 - 8:03
    attack these Trusted Execution
    Environments?". Now, to answer that question,
  • 8:03 - 8:06
    we can start looking at a few
    different research papers. The first one
  • 8:06 - 8:11
    that I want to go into is one called
    CLKSCREW and it's an attack on TrustZone.
  • 8:11 - 8:14
    So throughout this presentation I'm going
    to go through a few different papers and
  • 8:14 - 8:18
    just to make it clear which papers have
    already been published and which ones are
  • 8:18 - 8:21
    old, I'll include the citations in the
    upper right hand corner so that way you
  • 8:21 - 8:27
    can tell what's old and what's new. And as
    far as papers go this CLKSCREW paper is
  • 8:27 - 8:31
    relatively new. It was released in 2017.
    And the way CLKSCREW works is it takes
  • 8:31 - 8:38
    advantage of the energy management
    features of a processor. So a non-secure
  • 8:38 - 8:42
    operating system has the ability to manage
    the energy consumption of the different
  • 8:42 - 8:48
    cores. So if a certain target core doesn't
    have much scheduled to do then the
  • 8:48 - 8:52
    operating system is able to scale back
    that voltage or dial down the frequency on
  • 8:52 - 8:56
    that core so that core uses less energy
    which is a great thing for performance: it
  • 8:56 - 9:01
    really extends battery life, it makes
    the cores last longer and it gives better
  • 9:01 - 9:07
    performance overall. But the problem here
    is what if you have two separate cores and
  • 9:07 - 9:12
    one of your cores is running this non-
    trusted operating system and the other
  • 9:12 - 9:16
    core is running code in the secure world?
    It's running that trusted code, those
  • 9:16 - 9:21
    trusted applications. So that non-secure
    operating system can still dial down that
  • 9:21 - 9:26
    voltage and it can still change that
    frequency and those changes will affect
  • 9:26 - 9:31
    the secure world code. So what the
    CLKSCREW attack does is the non-secure
  • 9:31 - 9:36
    operating system core will dial down the
    voltage, it will overclock the frequency
  • 9:36 - 9:41
    on the target secure world core in order
    to induce faults to make sure to make the
  • 9:41 - 9:46
    computation on that core fail in some way
    and when that computation fails you get
  • 9:46 - 9:50
    certain cryptographic errors that the
    attack can use to infer things like secret
  • 9:50 - 9:56
    keys, secret AES keys and to bypass code
    signing implemented in the secure world.
  • 9:56 - 10:00
    So it's a very powerful attack that's made
    possible because the non-secure operating
  • 10:00 - 10:06
    system is privileged enough in order to
    use these energy management features. Now
  • 10:06 - 10:10
    CLKSCREW is an example of an active attack
    where the attacker is actively changing
  • 10:10 - 10:15
    the outcome of the victim code of that
    code in the secure world. But what about
  • 10:15 - 10:21
    passive attacks? So in a passive attack,
    the attacker does not modify the actual
  • 10:21 - 10:25
    outcome of the process. The attacker just
    tries to monitor that process and infer what's
  • 10:25 - 10:29
    going on and that is the sort of attack
    that we'll be considering for the rest of
  • 10:29 - 10:36
    the presentation. So in a lot of SGX and
    TrustZone implementations, the trusted and
  • 10:36 - 10:40
    the non-trusted code both share the same
    hardware and this shared hardware could be
  • 10:40 - 10:46
    a shared cache, it could be a branch
    predictor, it could be a TLB. The point is
  • 10:46 - 10:53
    that they share the same hardware so that
    the changes made by the secure code may be
  • 10:53 - 10:57
    reflected in the behavior of the non-
    secure code. So the trusted code might
  • 10:57 - 11:02
    execute, change the state of that shared
    cache for example and then the untrusted
  • 11:02 - 11:07
    code may be able to go in, see the changes
    in that cache and infer information about
  • 11:07 - 11:12
    the behavior of the secure code. So that's
    essentially how our side channel attacks
  • 11:12 - 11:16
    are going to work. If the non-secure code
    is going to monitor these shared hardware
  • 11:16 - 11:23
    resources for state changes that reflect
    the behavior of the secure code. Now we've
  • 11:23 - 11:28
    already talked about how Intel and SGX address
    the problem of memory management and who's
  • 11:28 - 11:33
    responsible for making sure that those
    attacks don't work on SGX. So what do they
  • 11:33 - 11:37
    have to say on how they protect against
    these side channel attacks and attacks on
  • 11:37 - 11:45
    this shared cache hardware? They don't...
    at all. They essentially say "we do not
  • 11:45 - 11:49
    consider this part of our threat model. It
    is up to the developer to implement the
  • 11:49 - 11:54
    protections needed to protect against
    these side-channel attacks". Which is
  • 11:54 - 11:57
    great news for us because these side
    channel attacks can be very powerful and
  • 11:57 - 12:00
    if there aren't any hardware features that
    are necessarily stopping us from being
  • 12:00 - 12:07
    able to accomplish our goal it makes us
    that much more likely to succeed. So with that
  • 12:07 - 12:11
    we can sort of take a step back from TrustZone
    and SGX and just take a look at
  • 12:11 - 12:15
    cache attacks to make sure that we all
    have the same understanding of how the
  • 12:15 - 12:20
    cache attacks will be applied to these
    Trusted Execution Environments. To start
  • 12:20 - 12:26
    that let's go over a brief recap of how a
    cache works. So caches are necessary in
  • 12:26 - 12:30
    processors because accessing the main
    memory is slow. When you try to access
  • 12:30 - 12:34
    something from the main memory it takes a
    while to be read into the processor. So the
  • 12:34 - 12:40
    cache exists as sort of a layer to
    remember what that information is so if
  • 12:40 - 12:45
    the processor ever needs information from
    that same address it just reloads it from
  • 12:45 - 12:50
    the cache and that access is going to be
    fast. So it really speeds up the memory
  • 12:50 - 12:56
    access for repeated accesses to the same
    address. And then if we try to access a
  • 12:56 - 13:00
    different address then that will also be
    read into the cache, slowly at first but
  • 13:00 - 13:07
    then quickly for repeated accesses and so
    on and so forth. Now as you can probably
  • 13:07 - 13:11
    tell from all of these examples the memory
    blocks have been moving horizontally
  • 13:11 - 13:16
    they've always been staying in the same
    row. And that is reflective of the idea of
  • 13:16 - 13:20
    sets in a cache. So there are a number of
    different set IDs and that corresponds to
  • 13:20 - 13:24
    the different rows in this diagram. So for
    our example there are four different set
  • 13:24 - 13:31
    IDs and each address in the main memory
    maps to a particular set ID. So that
  • 13:31 - 13:35
    address in main memory will only go into
    that location in the cache with the same
  • 13:35 - 13:40
    set ID so it will only travel along those
    rows. So that means if you have two
  • 13:40 - 13:43
    different blocks of memory that mapped to
    different set IDs they're not going to
  • 13:43 - 13:49
    interfere with each other in the cache.
    But that raises the question "what about
  • 13:49 - 13:53
    two memory blocks that do map to the same
    set ID?". Well if there's room in the
  • 13:53 - 13:59
    cache then the same thing will happen as
    before: those memory contents will be
  • 13:59 - 14:04
    loaded into the cache and then retrieved
    from the cache for future accesses. And
  • 14:04 - 14:08
    the number of possible entries for a
    particular set ID within a cache is called
  • 14:08 - 14:12
    the associativity. And on this diagram
    that's represented by the number of
  • 14:12 - 14:17
    columns in the cache. So we will call our
    cache in this example a 2-way set-
  • 14:17 - 14:22
    associative cache. Now the next question
    is "what happens if you try to read a
  • 14:22 - 14:27
    memory address that maps to the same set ID
    but all of those entries within that set ID
  • 14:27 - 14:33
    within the cache are full?". Well one of
    those entries is chosen, it's evicted from
  • 14:33 - 14:39
    the cache, the new memory is read in and
    then that's fed to the processor. So it
  • 14:39 - 14:44
    doesn't really matter how the cache entry
    that you're evicting is chosen; for the
  • 14:44 - 14:48
    purposes of this presentation you can just
    assume that it's random. But the important
  • 14:48 - 14:52
    thing is that if you try to access that
    same memory that was evicted before you're
  • 14:52 - 14:56
    now going to have to wait for that time
    penalty for that to be reloaded into the
  • 14:56 - 15:01
    cache and read into the processor. So those
    are caches in a nutshell, particularly
  • 15:01 - 15:06
    set-associative caches. Now we can begin
    looking at the different types of cache
  • 15:06 - 15:09
    attacks. So for a cache attack we have two
    different processes we have an attacker
  • 15:09 - 15:14
    process and a victim process. For this
    type of attack that we're considering both
  • 15:14 - 15:17
    of them share the same underlying code so
    they're trying to access the same
  • 15:17 - 15:22
    resources which could be the case if you
    have page deduplication in virtual
  • 15:22 - 15:26
    machines or if you have copy-on-write
    mechanisms for shared code and shared
  • 15:26 - 15:32
    libraries. But the point is that they
    share the same underlying memory. Now the
  • 15:32 - 15:36
    Flush and Reload Attack works in two
    stages for the attacker. The attacker
  • 15:36 - 15:39
    first starts by flushing out the cache.
    They flush each and every address in the
  • 15:39 - 15:44
    cache so the cache is just empty. Then the
    attacker lets the victim execute for a
  • 15:44 - 15:49
    small amount of time so the victim might
    read an address from main memory,
  • 15:49 - 15:53
    loading that into the cache and then the
    second stage of the attack is the reload
  • 15:53 - 15:58
    phase. In the reload phase the attacker
    tries to load different memory addresses
  • 15:58 - 16:04
    from main memory and see if those entries
    are in the cache or not. Here the attacker
  • 16:04 - 16:09
    will first try to load address 0 and see
    that because it takes a long time to read
  • 16:09 - 16:14
    the contents of address 0 the attacker can
    infer that address 0 was not part of the
  • 16:14 - 16:17
    cache which makes sense because the
    attacker flushed it from the cache in the
  • 16:17 - 16:23
    first stage. The attacker then tries to
    read the memory at address 1 and sees that
  • 16:23 - 16:29
    this operation is fast so the attacker
    infers that the contents of address 1 are
  • 16:29 - 16:33
    in the cache and because the attacker
    flushed everything from the cache before
  • 16:33 - 16:37
    the victim executed, the attacker then
    concludes that the victim is responsible
  • 16:37 - 16:43
    for bringing address 1 into the cache.
    This Flush+Reload attack reveals which
  • 16:43 - 16:47
    memory addresses the victim accesses
    during that small slice of time. Then
  • 16:47 - 16:51
    after that reload phase, the attack
    repeats so the attacker flushes again
  • 16:51 - 16:58
    let's the victim execute, reloads again
    and so on. There's also a variant on the
  • 16:58 - 17:01
    Flush+Reload attack that's called the
    Flush+Flush attack which I'm not going to
  • 17:01 - 17:06
    go into the details of, but essentially
    it's the same idea. But instead of using
  • 17:06 - 17:09
    load instructions to determine whether or
    not a piece of memory is in the cache or
  • 17:09 - 17:14
    not, it uses flush instructions because
    flush instructions will take longer if
  • 17:14 - 17:19
    something is in the cache already. The
    important thing is that both the
  • 17:19 - 17:23
    Flush+Reload attack and the Flush+Flush
    attack rely on the attacker and the victim
  • 17:23 - 17:27
    sharing the same memory. But this isn't
    always the case so we need to consider
  • 17:27 - 17:31
    what happens when the attacker and the
    victim do not share memory. For this we
  • 17:31 - 17:36
    have the Prime+Probe attack. The
    Prime+Probe attack once again works in two
  • 17:36 - 17:40
    separate stages. In the first stage the
    attacker primes the cache by reading all
  • 17:40 - 17:44
    the attacker memory into the cache and
    then the attacker lets the victim execute
  • 17:44 - 17:50
    for a small amount of time. So no matter
    what the victim accesses from main memory
  • 17:50 - 17:54
    since the cache is full of the attacker
    data, one of those attacker entries will
  • 17:54 - 17:59
    be replaced by a victim entry. Then in the
    second phase of the attack, during the
  • 17:59 - 18:04
    probe phase, the attacker checks the
    different cache entries for particular set
  • 18:04 - 18:09
    IDs and sees if all of the attacker
    entries are still in the cache. So maybe
  • 18:09 - 18:13
    our attacker is curious about the last set
    ID, the bottom row, so the attacker first
  • 18:13 - 18:18
    tries to load the memory at address 3 and
    because this operation is fast the
  • 18:18 - 18:23
    attacker knows that address 3 is in the
    cache. The attacker tries the same thing
  • 18:23 - 18:28
    with address 7, sees that this operation
    is slow and infers that at some point
  • 18:28 - 18:33
    address 7 was evicted from the cache so
    the attacker knows that something had to
  • 18:33 - 18:37
    have evicted it from the cache, and it had to
    be the victim, so the attacker concludes
  • 18:37 - 18:43
    that the victim accessed something in that
    last set ID, that bottom row. The
  • 18:43 - 18:47
    attacker doesn't know if it was the
    contents of address 11 or the contents of
  • 18:47 - 18:51
    address 15 or even what those contents
    are, but the attacker has a good idea of
  • 18:51 - 18:57
    which set ID it was. So, the important
    things to remember about
  • 18:57 - 19:01
    cache attacks is that caches are very
    important, they're crucial for performance
  • 19:01 - 19:06
    on processors, they give a huge speed
    boost and there's a huge time difference
  • 19:06 - 19:12
    between having a cache and not having a
    cache for your executables. But the
  • 19:12 - 19:16
    downside to this is that big time
    difference also allows the attacker to
  • 19:16 - 19:22
    infer information about how the victim is
    using the cache. We're able to use these
  • 19:22 - 19:24
    cache attacks in the two different
    scenarios: where memory is shared, in
  • 19:24 - 19:28
    the case of the Flush+Reload and
    Flush+Flush attacks, and where
  • 19:28 - 19:32
    memory is not shared, in the case of the
    Prime+Probe attack. And finally the
  • 19:32 - 19:37
    important thing to keep in mind is that,
    for these cache attacks, we know where the
  • 19:37 - 19:40
    victim is looking, but we don't know what
    they see. So we don't know the contents of
  • 19:40 - 19:44
    the memory that the victim is actually
    seeing, we just know the location and the
  • 19:44 - 19:52
    addresses. So, what does an example trace
    of these attacks look like? Well, there's
  • 19:52 - 19:56
    an easy way to represent these as two-
    dimensional images. So in this image, we
  • 19:56 - 20:02
    have our horizontal axis as time, so each
    column in this image represents a
  • 20:02 - 20:07
    different time slice, a different
    iteration of the Prime, measure, and Probe cycle.
  • 20:07 - 20:11
    So, then we also have the vertical axis
    which is the different set IDs, which is
  • 20:11 - 20:18
    the location that's accessed by the victim
    process, and then here a pixel is white if
  • 20:18 - 20:24
    the victim accessed that set ID during
    that time slice. So, as you look from left
  • 20:24 - 20:28
    to right as time moves forward, you can
    sort of see the changes in the patterns of
  • 20:28 - 20:34
    the memory accesses made by the victim
    process. Now, for this particular example
  • 20:34 - 20:40
    the trace is captured on an execution of
    AES repeated several times, an AES
  • 20:40 - 20:45
    encryption repeated about 20 times. And
    you can tell that this is a repeated
  • 20:45 - 20:49
    action because you see the same repeated
    memory access patterns in the data, you
  • 20:49 - 20:55
    see the same structures repeated over and
    over. So, you know that this is reflecting
  • 20:55 - 21:01
    what's going on throughout time, but
    what does it have to do with AES itself?
  • 21:01 - 21:06
    Well, if we take the same trace with the
    same settings, but a different key, we see
  • 21:06 - 21:12
    that there is a different memory access
    pattern with different repetition within
  • 21:12 - 21:18
    the trace. So, only the key changed, the
    code didn't change. So, even though we're
  • 21:18 - 21:22
    not able to read the contents of the key
    directly using this cache attack, we know
  • 21:22 - 21:26
    that the key is changing these memory
    access patterns, and if we can see these
  • 21:26 - 21:31
    memory access patterns, then we can infer
    the key. So, that's the essential idea: we
  • 21:31 - 21:35
    want to make these images as clear as
    possible and as descriptive as possible so
  • 21:35 - 21:42
    we have the best chance of learning what
    those secrets are. And we can define the
  • 21:42 - 21:47
    metrics for what makes these cache attacks
    powerful in a few different ways. So, the
  • 21:47 - 21:52
    three ways we'll be looking at are spatial
    resolution, temporal resolution and noise.
  • 21:52 - 21:56
    So, spatial resolution refers to how
    accurately we can determine the where. If
  • 21:56 - 22:01
    we know which memory address the victim
    accessed to within 1,000 bytes, that's
  • 22:01 - 22:07
    obviously not as powerful as knowing where
    they accessed within 512 bytes. Temporal
  • 22:07 - 22:12
    resolution is similar, where we want to
    know the order of what accesses the victim
  • 22:12 - 22:18
    made. So if that time slice during our
    attack is 1 millisecond, we're going to
  • 22:18 - 22:22
    get much better ordering information on
    those memory access than we would get if
  • 22:22 - 22:27
    we only saw all the memory accesses over
    the course of one second. So the shorter
  • 22:27 - 22:32
    that time slice, the better the temporal
    resolution, the longer our picture will be
  • 22:32 - 22:38
    on the horizontal axis, and the clearer
    of an image of the cache that we'll see.
  • 22:38 - 22:41
    And the last metric to evaluate our
    attacks on is noise and that reflects how
  • 22:41 - 22:46
    accurately our measurements reflect the
    true state of the cache. So, right now
  • 22:46 - 22:50
    we've been using timing data to infer
    whether or not an item was in the cache or
  • 22:50 - 22:54
    not, but this is a little bit noisy. It's
    possible that we'll have false positives
  • 22:54 - 22:57
    or false negatives, so we want to keep
    that in mind as we look at the different
  • 22:57 - 23:03
    attacks. So, that's essentially cache
    attacks in a nutshell, and
  • 23:03 - 23:07
    that's all you really need to understand
    in order to understand these attacks as
  • 23:07 - 23:11
    they've been implemented on Trusted
    Execution Environments. And the first
  • 23:11 - 23:15
    particular attack that we're going to be
    looking at is called a Controlled-Channel
  • 23:15 - 23:20
    Attack on SGX, and this attack isn't
    necessarily a cache attack, but we can
  • 23:20 - 23:24
    analyze it in the same way that we analyze
    the cache attacks. So, it's still useful
  • 23:24 - 23:31
    to look at. Now, if you remember how
    memory management occurs with SGX, we know
  • 23:31 - 23:36
    that if a page fault occurs during SGX
    Enclave code execution, that page fault is
  • 23:36 - 23:43
    handled by the kernel. So, the kernel has
    to know which page the Enclave needs
  • 23:43 - 23:48
    paged in. The kernel already gets some
    information about what the Enclave is
  • 23:48 - 23:55
    looking at. Now, in the Controlled-Channel
    attack, what the attacker does
  • 23:55 - 24:00
    from the non-trusted OS is page almost
    every other page of the
  • 24:00 - 24:05
    Enclave out of memory. So no matter
    what page that Enclave tries to
  • 24:05 - 24:10
    access, it's very likely to cause a page
    fault, which will be redirected to the
  • 24:10 - 24:14
    non-trusted OS, where the non-trusted OS
    can record it, page out any other pages
  • 24:14 - 24:20
    and continue execution. So, the OS
    essentially gets a list of sequential page
  • 24:20 - 24:26
    accesses made by the SGX Enclave, all by
    capturing the page fault handler. This is
  • 24:26 - 24:30
    a very general attack, you don't need to
    know what's going on in the Enclave in
  • 24:30 - 24:33
    order to pull this off. You just load up
    an arbitrary Enclave and you're able to
  • 24:33 - 24:41
    see which pages that Enclave is trying to
    access.
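
    A userspace analogue of that record-and-continue loop can be sketched
    with mprotect and SIGSEGV; the real attack does the equivalent from the
    untrusted OS's page-fault handler against an enclave, and the region
    size and "victim" accesses below are made up for illustration:

        #include <signal.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define PAGE 4096
        static char *region;

        static void handler(int sig, siginfo_t *si, void *ctx) {
            (void)sig; (void)ctx;
            char *page = (char *)(((uintptr_t)si->si_addr / PAGE) * PAGE);
            fprintf(stderr, "fault: page %ld\n", (long)((page - region) / PAGE));
            /* Re-enable just this page; the real attack would page the
             * others back out so the next access faults again. */
            mprotect(page, PAGE, PROT_READ | PROT_WRITE);
        }

        int main(void) {
            struct sigaction sa = { .sa_sigaction = handler,
                                    .sa_flags = SA_SIGINFO };
            sigaction(SIGSEGV, &sa, NULL);
            region = mmap(NULL, 16 * PAGE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            region[5 * PAGE] = 1;  /* "victim" access: logged as page 5 */
            region[9 * PAGE] = 1;  /* logged as page 9 */
            return 0;
        }
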
  • 24:41 - 24:44
    So, how does it do on our metrics? First of all, the spatial resolution is
    not great. We can only see where the
  • 24:44 - 24:50
    victim is accessing within 4096 bytes or
    the size of a full page because SGX
  • 24:50 - 24:56
    obscures the offset into the page where
    the page fault occurs. The temporal
  • 24:56 - 24:59
    resolution is good but not great, because
    even though we're able to see any
  • 24:59 - 25:04
    sequential accesses to different pages
    we're not able to see sequential accesses
  • 25:04 - 25:10
    to the same page because we need to keep
    that same page paged-in while we let our
  • 25:10 - 25:15
    SGX Enclave run for that small time slice.
    So temporal resolution is good but not
  • 25:15 - 25:22
    perfect. As for noise, there is no
    noise in this attack because no matter
  • 25:22 - 25:26
    where the page fault occurs, the untrusted
    operating system is going to capture that
  • 25:26 - 25:30
    page fault and is going to handle it. So,
    it's very low noise, not great spatial
  • 25:30 - 25:37
    resolution but overall still a powerful
    attack. But we still want to improve on
  • 25:37 - 25:41
    that spatial resolution, we want to be
    able to see what the Enclave is doing at
  • 25:41 - 25:46
    a resolution finer than one page of
    four kilobytes. So that's exactly what the
  • 25:46 - 25:50
    CacheZoom paper does, and instead of
    interrupting the SGX Enclave execution
  • 25:50 - 25:55
    with page faults, it uses timer
    interrupts. Because the untrusted
  • 25:55 - 25:59
    operating system is able to schedule when
    timer interrupts occur, so it's able to
  • 25:59 - 26:03
    schedule them at very tight intervals, so
    it's able to get that small and tight
  • 26:03 - 26:09
    temporal resolution. And essentially what
    happens in between is this timer
  • 26:09 - 26:13
    interrupt fires, the untrusted operating
    system runs the Prime+Probe attack code in
  • 26:13 - 26:18
    this case, and resumes execution of the
    enclave process, and this repeats. So this
  • 26:18 - 26:25
    is a Prime+Probe attack on the L1 data
    cache. So, this attack lets you see what
  • 26:25 - 26:31
    data the Enclave is looking at. Now, this
    attack could be easily modified to use the
  • 26:31 - 26:36
    L1 instruction cache, so in that case you
    learn which instructions the Enclave is
  • 26:36 - 26:41
    executing. And overall this is an even
    more powerful attack than the Controlled-
  • 26:41 - 26:46
    Channel attack. If we look at the metrics,
    we can see that the spatial resolution is
  • 26:46 - 26:50
    a lot better, now we're looking at spatial
    resolution of 64 bytes or the size of an
  • 26:50 - 26:55
    individual cache line. The temporal resolution
    is very good, it's "almost unlimited", to
  • 26:55 - 27:00
    quote the paper, because the untrusted
    operating system has the privilege to keep
  • 27:00 - 27:05
    scheduling those time interrupts closer
    and closer together until it's able to
  • 27:05 - 27:10
    capture very small time slices of the
    victim process. And the noise itself is
  • 27:10 - 27:15
    low, we're still using a cycle counter to
    measure the time it takes to load memory
  • 27:15 - 27:21
    in and out of the cache, but it's
    useful, the chances of having a false
  • 27:21 - 27:27
    positive or false negative are low, so the
    noise is low as well. Now, we can also
  • 27:27 - 27:31
    look at TrustZone attacks, because so far
    the attacks that we've looked at, the
  • 27:31 - 27:35
    passive attacks, have been against SGX and
    those attacks on SGX have been pretty
  • 27:35 - 27:41
    powerful. So, what are the published
    attacks on TrustZone? Well, there's one
  • 27:41 - 27:45
    called TruSpy, which is kind of similar in
    concept to the CacheZoom attack that we
  • 27:45 - 27:52
    just looked at on SGX. It's once again a
    Prime+Probe attack on the L1 data cache,
  • 27:52 - 27:57
    and the difference here is that instead of
    interrupting the victim code execution
  • 27:57 - 28:04
    multiple times, the TruSpy attack does the
    prime step, does the full AES encryption,
  • 28:04 - 28:09
    and then does the probe step. And the
    reason they do this, is because as they
  • 28:09 - 28:13
    say, the secure world is protected, and is
    not interruptible in the same way that SGX
  • 28:13 - 28:21
    is interruptible. But even despite this,
    just having one measurement per execution,
  • 28:21 - 28:25
    the TruSpy authors were able to use some
    statistics to still recover the AES key
  • 28:25 - 28:30
    from that noise. And their methods were so
    powerful, they are able to do this from an
  • 28:30 - 28:35
    unprivileged application in userland, so
    they don't even need to be running within
  • 28:35 - 28:40
    the kernel in order to be able to pull off
    this attack. So, how does this attack
  • 28:40 - 28:43
    measure up? The spatial resolution is once
    again 64 bytes because that's the size of
  • 28:43 - 28:49
    a cache line on this processor, and the
    temporal resolution is pretty poor
  • 28:49 - 28:54
    here, because we only get one measurement
    per execution of the AES encryption. This
  • 28:54 - 28:59
    is also a particularly noisy attack
    because we're making the measurements from
  • 28:59 - 29:03
    userland, but even if we make the
    measurements from the kernel, we're still
  • 29:03 - 29:06
    going to have the same issues of false
    positives and false negatives associated
  • 29:06 - 29:12
    with using a cycle counter to measure
    membership in a cache. So, we'd like to
  • 29:12 - 29:16
    improve this a little bit. We'd like to
    improve the temporal resolution, so we
  • 29:16 - 29:21
    have the power of the cache attack on
    TrustZone be a little bit closer to what it is
  • 29:21 - 29:27
    on SGX. So, we want to improve that
    temporal resolution. Let's dig into that
  • 29:27 - 29:31
    statement a little bit, that the secure
    world is protected and not interruptable.
  • 29:31 - 29:36
    And to do this, we go back to this diagram
    of ARMv8 and how that TrustZone is set up.
  • 29:36 - 29:41
    So, it is true that when an interrupt
    occurs, it is directed to the monitor and,
  • 29:41 - 29:46
    because the monitor operates in the secure
    world, if we interrupt secure code that's
  • 29:46 - 29:49
    running at exception level 0, we're just
    going to end up running secure code an
  • 29:49 - 29:54
    exception level 3. So, this doesn't
    necessarily get us anything. I think,
  • 29:54 - 29:58
    that's what the authors mean by saying
    that it's protected against this. Just by
  • 29:58 - 30:03
    sending an interrupt, we don't have a
    way to redirect our flow to the non-
  • 30:03 - 30:08
    trusted code. At least that's how it works
    in theory. In practice, the Linux
  • 30:08 - 30:12
    operating system, running in exception
    level 1 in the non-secure world, kind of
  • 30:12 - 30:15
    needs interrupts in order to be able to
    work, so if an interrupt occurs and it's
  • 30:15 - 30:18
    being sent to the monitor, the monitor
    will just forward it right to the non-
  • 30:18 - 30:22
    secure operating system. So, we have
    interrupts just the same way as we did in
  • 30:22 - 30:29
    CacheZoom. And we can improve the
    TrustZone attacks by using this idea: We
  • 30:29 - 30:34
    have 2 cores, where one core is running
    the secure code, the other core is running
  • 30:34 - 30:38
    the non-secure code, and the non-secure
    code is sending interrupts to the secure-
  • 30:38 - 30:43
    world core and that will give us that
    interleaving of attacker process and
  • 30:43 - 30:47
    victim process that allow us to have a
    powerful prime-and-probe attack. So, what
  • 30:47 - 30:51
    does this look like? We have the attack
    core and the victim core. The attack core
  • 30:51 - 30:55
    sends an interrupt to the victim core.
    This interrupt is captured by the monitor,
  • 30:55 - 30:59
    which passes it to the non-secure
    operating system. The non-secure operating
  • 30:59 - 31:03
    system transfers this to our attack code,
    which runs the prime-and-probe attack.
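
    What the probe step might look like on ARMv8 is sketched below; the
    eviction-set addresses in ways[] and the hit/miss threshold are assumed
    to have been prepared elsewhere, and the generic timer is used as a
    stand-in for a cycle counter:

        #include <stdint.h>

        static inline uint64_t timestamp(void) {
            uint64_t t;
            /* CNTVCT_EL0: virtual counter, readable where EL1 permits it */
            asm volatile("isb; mrs %0, cntvct_el0" : "=r"(t));
            return t;
        }

        /* ways[] holds one attacker address per cache way, all mapping
         * to a single set ID. A slow reload means the victim displaced
         * that line during its time slice. */
        static int probe_set(volatile const char **ways, int nways,
                             uint64_t threshold) {
            int evictions = 0;
            for (int i = 0; i < nways; i++) {
                uint64_t t0 = timestamp();
                (void)*ways[i];
                if (timestamp() - t0 > threshold)
                    evictions++;
            }
            return evictions;
        }
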
  • 31:03 - 31:07
    Then we return from the interrupt, the
    execution within the victim code in the
  • 31:07 - 31:11
    secure world resumes and we just repeat
    this over and over. So, now we have that
  • 31:11 - 31:17
    interleaving of the processes
    of the attacker and the victim. So, now,
  • 31:17 - 31:23
    instead of having a temporal resolution of
    one measurement per execution, we once
  • 31:23 - 31:26
    again have almost unlimited temporal
    resolution, because we can just schedule
  • 31:26 - 31:32
    when we send those interrupts from the
    attacker core.
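
    A hypothetical fragment of such a kernel module is sketched below;
    probe_fn and poke_victim_core are illustrative names rather than the
    released tool's API, but Linux's smp_call_function_single() delivers
    exactly this kind of inter-processor interrupt:

        #include <linux/smp.h>

        /* Runs on the victim's core, right after the monitor forwards
         * the interrupt to the non-secure Linux kernel. */
        static void probe_fn(void *info)
        {
            /* prime or probe the shared cache here */
        }

        static void poke_victim_core(int victim_cpu)
        {
            /* Send an IPI and wait until probe_fn has run there. */
            smp_call_function_single(victim_cpu, probe_fn, NULL, 1);
        }
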
  • 31:32 - 31:38
    Now, we'd also like to improve the noise measurements,
    because if we can reduce the noise, we'll
  • 31:38 - 31:42
    get clearer pictures and we'll be able to
    infer those secrets more clearly. So, we
  • 31:42 - 31:46
    can get some improvement by switching the
    measurements from userland and starting to
  • 31:46 - 31:51
    do those in the kernel, but again we have
    the cycle counters. So, what if, instead
  • 31:51 - 31:54
    of using the cycle counter to measure
    whether or not something is in the cache,
  • 31:54 - 32:00
    we use the other performance counters?
    Because on ARMv8 platforms, there is a way
  • 32:00 - 32:04
    to use performance counters to measure
    different events, such as cache hits and
  • 32:04 - 32:10
    cache misses. So, these events and these
    performance monitors require privileged
  • 32:10 - 32:15
    access in order to use, which, for this
    attack, we do have. Now, in a typical
  • 32:15 - 32:19
    cache attack scenario we wouldn't have
    access to these performance monitors,
  • 32:19 - 32:22
    which is why they haven't really been
    explored before, but in this weird
  • 32:22 - 32:25
    scenario where we're attacking the less
    privileged code from the more privileged
  • 32:25 - 32:29
    code, we do have access to these
    performance monitors and we can use these
  • 32:29 - 32:34
    monitors during the probe step to get a
    very accurate count of whether or not a
  • 32:34 - 32:40
    certain memory load caused a cache miss or
    a cache hit. So, we're able to essentially
  • 32:40 - 32:46
    get rid of the different levels of noise.
    Now, one thing to point out is that maybe
  • 32:46 - 32:49
    we'd like to use these ARMv8 performance
    counters in order to count the different
  • 32:49 - 32:54
    events that are occurring in the secure
    world code. So, maybe we start the
  • 32:54 - 32:58
    performance counters from the non-secure
    world, let the secure world run and then,
  • 32:58 - 33:02
    when the secure world exits, we use the
    non-secure world to read these performance
  • 33:02 - 33:05
    counters and maybe we'd like to see how
    many instructions the secure world
  • 33:05 - 33:09
    executed or how many branch instructions
    or how many arithmetic instructions or how
  • 33:09 - 33:13
    many cache misses there were. But
    unfortunately, ARMv8 took this into
  • 33:13 - 33:17
    account and by default, performance
    counters that are started in the non-
  • 33:17 - 33:21
    secure world will not measure events that
    happen in the secure world, which is
  • 33:21 - 33:25
    smart; which is how it should be. And the
    only reason I bring this up is because
  • 33:25 - 33:29
    that's not how it is on ARMv7. We could go
    into a whole different talk with that,
  • 33:29 - 33:34
    just exploring the different implications
    of what that means, but I want to focus on
  • 33:34 - 33:39
    ARMv8, because that's the newest of
    the new. So, we'll keep looking at that.
  • 33:39 - 33:43
    So, we instrument the Prime+Probe attack
    to use these performance counters, so we
  • 33:43 - 33:47
    can get a clear picture of what is and
    what is not in the cache. And instead of
  • 33:47 - 33:52
    having noisy measurements based on time,
    we have virtually no noise at all, because
  • 33:52 - 33:56
    we get the truth straight from the
    processor itself, whether or not we
  • 33:56 - 34:02
    experience a cache miss. So, how do we
    implement these attacks, where do we go
  • 34:02 - 34:06
    from here? We have all these ideas; we
    have ways to make these TrustZone attacks
  • 34:06 - 34:12
    more powerful, but that's not worthwhile,
    unless we actually implement them. So, the
  • 34:12 - 34:17
    goal here is to implement these attacks on
    TrustZone and since typically the non-
  • 34:17 - 34:21
    secure world operating system is based on
    Linux, we'll take that into account when
  • 34:21 - 34:25
    making our implementation. So, we'll write
    a kernel module that uses these
  • 34:25 - 34:29
    performance counters and these inter-
    processor interrupts, in order to actually
  • 34:29 - 34:33
    accomplish these attacks; and we'll write
    it in such a way that it's very
  • 34:33 - 34:37
    generalizable. So you can take this kernel
    module that's was written for one device
  • 34:37 - 34:42
    -- in my case I did most of my attention
    on the Nexus 5x -- and it's very easy to
  • 34:42 - 34:47
    transfer this module to any other Linux-
    based device that has a trust zone that has
  • 34:47 - 34:52
    these shared caches, so it should be very
    easy to port this over and to perform
  • 34:52 - 34:58
    these same powerful cache attacks on
    different platforms. We can also do clever
  • 34:58 - 35:02
    things based on the Linux operating
    system, so that we limit that collection
  • 35:02 - 35:06
    window to just when we're executing within
    the secure world, so we can align our
  • 35:06 - 35:11
    traces a lot more easily that way. And the
    end result is having a synchronized trace
  • 35:11 - 35:15
    for each different attack, because, since
    we've written it in a modular way, we're able
  • 35:15 - 35:19
    to run different attacks simultaneously.
    So, maybe we're running one prime-and-
  • 35:19 - 35:23
    probe attack on the L1 data cache, to
    learn where the victim is accessing
  • 35:23 - 35:27
    memory, and we're simultaneously running
    an attack on the L1 instruction cache, so
  • 35:27 - 35:34
    we can see what instructions the victim is
    executing. And these can be aligned. So,
  • 35:34 - 35:37
    the tool that I've written is a
    combination of a kernel module which
  • 35:37 - 35:42
    actually performs this attack, a userland
    binary which schedules these processes to
  • 35:42 - 35:46
    different cores, and a GUI that will allow
    you to interact with this kernel module
  • 35:46 - 35:50
    and rapidly start doing these cache
    attacks for yourself and perform them
  • 35:50 - 35:57
    against different processes and secure-
    world code. So, the
  • 35:57 - 36:03
    intention behind this tool is to be very
    generalizable to make it very easy to use
  • 36:03 - 36:08
    this platform for different devices and to
    allow people a way to, once again, quickly
  • 36:08 - 36:12
    develop these attacks; and also to see if
    their own code is vulnerable to these
  • 36:12 - 36:18
    cache attacks, to see if their code has
    these secret dependent memory accesses.
  • 36:18 - 36:25
    So, can we get even better... spatial
    resolution? Right now, we're down to 64
  • 36:25 - 36:30
    bytes and that's the size of a cache line,
    which is the size of our shared hardware.
  • 36:30 - 36:36
    And on SGX, we actually can get better
    than 64 bytes, based on something called a
  • 36:36 - 36:39
    branch-shadowing attack. So, a branch-
    shadowing attack takes advantage of
  • 36:39 - 36:43
    something called the branch target buffer.
    And the branch target buffer is a
  • 36:43 - 36:48
    structure that's used for branch
    prediction. It's similar to a cache, but
  • 36:48 - 36:52
    there's a key difference where the branch
    target buffer doesn't compare the full
  • 36:52 - 36:55
    address, when seeing if something is
    already in the cache or not: It doesn't
  • 36:55 - 37:00
    compare all of the upper level bits. So,
    that means that it's possible that two
  • 37:00 - 37:04
    different addresses will experience a
    collision, and the same entry from that
  • 37:04 - 37:09
    BTB cache will be read out for an improper
    address. Now, since this is just for
  • 37:09 - 37:12
    branch prediction, the worst that can
    happen is, you'll get a misprediction and
  • 37:12 - 37:18
    a small time penalty, but that's about it.
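
    A sketch of why such collisions happen, using hypothetical index
    parameters chosen to match the numbers reported later in the talk for
    the Nexus 5X (2048 sets, 16-byte granularity):

        #include <stdint.h>

        #define BTB_SETS    2048
        #define BTB_GRANULE   16

        /* Only the low virtual-address bits pick the set, and the tag
         * ignores the upper bits, so two different branch addresses can
         * alias the same entry. */
        static unsigned btb_set(uintptr_t branch_va) {
            return (unsigned)((branch_va / BTB_GRANULE) % BTB_SETS);
        }
        /* Collision whenever btb_set(a) == btb_set(b) even though a != b. */
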
    The idea behind the branch-shadowing
  • 37:18 - 37:22
    attack is leveraging the small difference
    in this overlapping and this collision of
  • 37:22 - 37:29
    addresses in order to sort of execute a
    shared-code-style flush-and-reload attack
  • 37:29 - 37:35
    on the branch target buffer. So, here what
    goes on is, during the attack the attacker
  • 37:35 - 37:40
    modifies the SGX Enclave to make sure that
    the branches that are within the Enclave
  • 37:40 - 37:44
    will collide with branches that are not in
    the Enclave. The attacker executes the
  • 37:44 - 37:50
    Enclave code and then the attacker
    executes their own code and based on the
  • 37:50 - 37:55
    outcome of the victim code in that
    cache, the attacker code may or may not
  • 37:55 - 37:59
    experience a branch misprediction. So, the
    attacker is able to tell the outcome of a
  • 37:59 - 38:03
    branch, because of this overlap and this
    collision, like it would be in a flush-and-
  • 38:03 - 38:07
    reload attack, where those memories
    overlap between the attacker and the
  • 38:07 - 38:14
    victim. So here, our spatial resolution is
    fantastic: We can tell down to individual
  • 38:14 - 38:19
    branch instructions in SGX; we can tell
    exactly, which branches were executed and
  • 38:19 - 38:25
    which directions they were taken, in the
    case of conditional branches. The temporal
  • 38:25 - 38:30
    resolution is also, once again, almost
    unlimited, because we can use the same
  • 38:30 - 38:34
    timer interrupts in order to schedule our
    process, our attacker process. And the
  • 38:34 - 38:39
    noise is, once again, very low, because we
    can, once again, use the same sort of
  • 38:39 - 38:44
    branch misprediction counters, that exist
    in the Intel world, in order to measure
  • 38:44 - 38:52
    this noise. So, does anything of that
    apply to the TrustZone attacks? Well, in
  • 38:52 - 38:55
    this case the victim and attacker don't
    share entries in the branch target buffer,
  • 38:55 - 39:02
    because the attacker is not able to map
    the virtual address of the victim process.
  • 39:02 - 39:05
    But this is kind of reminiscent of our
    earlier cache attacks, so our flush-and-
  • 39:05 - 39:10
    reload attack only worked when the attacker
    and the victim shared that memory, but we
  • 39:10 - 39:14
    still have the prime-and-probe attack for
    when they don't. So, what if we use a
  • 39:14 - 39:21
    prime-and-probe-style attack on the branch
    target buffer cache in ARM processors? So,
  • 39:21 - 39:25
    essentially what we do here is, we prime
    the branch target buffer by executing many
  • 39:25 - 39:30
    attacker branches to sort of fill up this
    BTB cache with the attacker branch
  • 39:30 - 39:35
    prediction data; we let the victim execute
    a branch which will evict an attacker BTB
  • 39:35 - 39:39
    entry; and then we have the attacker re-
    execute those branches and see if there
  • 39:39 - 39:45
    have been any mispredictions. So now, the
    cool thing about this attack is, the
  • 39:45 - 39:50
    structure of the BTB cache is different
    from that of the L1 caches. So, instead of
  • 39:50 - 40:00
    having 256 different sets in the L1 cache,
    the BTB cache has 2048 different sets, so
  • 40:00 - 40:06
    we can tell which branch it was, based
    on which one of 2048 different set IDs
  • 40:06 - 40:11
    that it could fall into. And even more
    than that, on the ARM platform, at least
  • 40:11 - 40:16
    on the Nexus 5x that I was working with,
    the granularity is no longer 64 bytes,
  • 40:16 - 40:22
    which is the size of the line, it's now 16
    bytes. So, we can see which branches the
  • 40:22 - 40:28
    trusted code within TrustZone is
    executing within 16 bytes. So, what does
  • 40:28 - 40:32
    this look like? So, previously with the
    TruSpy attack, this is sort of the
  • 40:32 - 40:37
    outcome of our prime-and-probe attack: We
    get 1 measurement for those 256 different
  • 40:37 - 40:43
    set IDs. When we added those interrupts,
    we're able to get that time resolution,
  • 40:43 - 40:48
    and it looks something like this. Now,
    maybe you can see a little bit at the top
  • 40:48 - 40:53
    of the screen, how there's these repeated
    sections of little white blocks, and you
  • 40:53 - 40:57
    can sort of use that to infer, maybe
    there's the same cache line and cache
  • 40:57 - 41:01
    instructions that are called over and
    over. So, just looking at this L1-I cache
  • 41:01 - 41:07
    attack, you can tell some information
    about how the process went. Now, let's
  • 41:07 - 41:12
    compare that to the BTB attack. And I
    don't know if you can see too clearly --
  • 41:12 - 41:17
    it's a bit too high a resolution
    right now -- so let's just focus in on one
  • 41:17 - 41:23
    small part of this overall trace. And this
    is what it looks like. So, each of those
  • 41:23 - 41:28
    white pixels represents a branch that was
    taken by that secure-world code and we can
  • 41:28 - 41:31
    see repeated patterns, we can see maybe
    different functions that were called, we
  • 41:31 - 41:35
    can see different loops. And just by
    looking at this 1 trace, we can infer a
  • 41:35 - 41:40
    lot of information on how that secure
    world executed. So, it's incredibly
  • 41:40 - 41:44
    powerful and all of those secrets are just
    waiting to be uncovered using these new
  • 41:44 - 41:53
    tools. So, where do we go from here? What
    sort of countermeasures do we have? Well,
  • 41:53 - 41:57
    first of all I think, the long term
    solution is going to be moving to no more
  • 41:57 - 42:00
    shared hardware. We need to have separate
    hardware and no more shared caches in
  • 42:00 - 42:06
    order to fully get rid of these different
    cache attacks. And we've already seen this
  • 42:06 - 42:11
    trend in different cell phones. So, for
    example, in Apple SoCs for a long time now
  • 42:11 - 42:16
    -- I think since the Apple A7 -- the
    Secure Enclave, which runs the secure
  • 42:16 - 42:21
    code, has its own cache. So, these cache
    attacks can't be accomplished from code
  • 42:21 - 42:27
    outside of that secure Enclave. So, just
    by using that separate hardware, it knocks
  • 42:27 - 42:31
    out a whole class of different potential
    side-channel and microarchitectural
  • 42:31 - 42:36
    attacks. And just recently, the Pixel 2 is
    moving in the same direction. The Pixel 2
  • 42:36 - 42:41
    now includes a hardware security module
    that implements cryptographic operations;
  • 42:41 - 42:46
    and that chip also has its own memory and
    its own caches, so now we can no longer
  • 42:46 - 42:51
    use this attack to extract information
    about what's going on in this external
  • 42:51 - 42:57
    hardware security module. But even then,
    using this separate hardware, that doesn't
  • 42:57 - 43:01
    solve all of our problems. Because we
    still have the question of "What do we
  • 43:01 - 43:06
    include in this separate hardware?" On the
    one hand, we want to include more code in
  • 43:06 - 43:11
    that separate hardware, so we're less
    vulnerable to these side-channel attacks,
  • 43:11 - 43:16
    but on the other hand, we don't want to
    expand the attack surface anymore. Because
  • 43:16 - 43:19
    the more code we include in these secure
    environments, the more likely it is that a
  • 43:19 - 43:23
    vulnerability will be found and the
    attacker will be able to get a foothold
  • 43:23 - 43:26
    within the secure, trusted environment.
    So, there's going to be a balance between
  • 43:26 - 43:30
    what you choose to include in the
    separate hardware and what you don't. So,
  • 43:30 - 43:35
    do you include DRM code? Do you include
    cryptographic code? It's still an open
  • 43:35 - 43:42
    question. And that's sort of the long-term
    approach. In the short term, you just kind
  • 43:42 - 43:46
    of have to write side-channel-free
    software: Just be very careful about what
  • 43:46 - 43:51
    your process does, whether there are any
    secret-dependent memory accesses or
  • 43:51 - 43:55
    secret-dependent branches, or secret-
    dependent function calls, because any of
  • 43:55 - 44:00
    those can leak the secrets out of your
    trusted execution environment, as the
    sketch below illustrates.
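    To make that concrete, here is a minimal
    sketch -- mine, not the talk's code; the
    function names are hypothetical -- of the
    same selection written first with a
    secret-dependent branch and then in
    constant time:

        /* Minimal sketch (not from the talk): replacing a secret-dependent
         * branch with a constant-time select. */
        #include <stdint.h>

        /* Leaky: the taken/not-taken pattern mirrors the secret bit, so a
         * branch-trace or cache attack can recover it. */
        uint32_t select_leaky(uint32_t secret_bit, uint32_t a, uint32_t b) {
            if (secret_bit)
                return a;
            return b;
        }

        /* Constant time: the same instructions execute regardless of the
         * secret, so the microarchitectural footprint does not depend on it. */
        uint32_t select_ct(uint32_t secret_bit, uint32_t a, uint32_t b) {
            uint32_t mask = (uint32_t)-(int32_t)(secret_bit & 1); /* 0 or ~0 */
            return (a & mask) | (b & ~mask);
        }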
  • 44:00 - 44:03
    So, here are the things that, if you are a
    developer of trusted execution environment
  • 44:03 - 44:08
    code, that I want you to keep in mind:
    First of all, performance is very often at
  • 44:08 - 44:13
    odds with security. We've seen over and
    over that the performance enhancements to
  • 44:13 - 44:19
    these processors open up the ability for
    these microarchitectural attacks to be
  • 44:19 - 44:24
    more efficient. Additionally, these
    trusted execution environments don't
  • 44:24 - 44:27
    protect against everything; there are
    still these side-channel attacks and these
  • 44:27 - 44:32
    microarchitectural attacks that these
    systems are vulnerable to. These attacks
  • 44:32 - 44:38
    are very powerful; they can be
    accomplished simply; and with the
  • 44:38 - 44:42
    publication of the code that I've written,
    it should be very simple to get set up and
  • 44:42 - 44:46
    to analyze your own code to see "Am I
    vulnerable, do I expose information in the
  • 44:46 - 44:53
    same way?" And lastly, it only takes 1
    small error, 1 tiny leak from your trusted
  • 44:53 - 44:57
    and secure code, in order to extract the
    entire secret, in order to bring the whole
  • 44:57 - 45:04
    thing down. So, what I want to leave you
    with is: I want you to remember that you
  • 45:04 - 45:09
    are responsible for making sure that your
    program is not vulnerable to these
  • 45:09 - 45:13
    microarchitectural attacks, because if you
    do not take responsibility for this, who
  • 45:13 - 45:17
    will? Thank you!
  • 45:17 - 45:25
    Applause
  • 45:25 - 45:30
    Herald: Thank you very much. Please, if
    you want to leave the hall, please do it
  • 45:30 - 45:35
    quietly and take all your belongings with
    you and respect the speaker. We have
  • 45:35 - 45:43
    plenty of time, 16, 17 minutes for Q&A, so
    please line up on the microphones. No
  • 45:43 - 45:51
    questions from the signal angel, all
    right. So, we can start with microphone 6,
  • 45:51 - 45:55
    please.
    Mic 6: Okay. There was a symbol for secure
  • 45:55 - 46:01
    OSes in the ARM TrustZone diagram. What is
    the idea of them if the non-secure OS gets
    all the
  • 46:01 - 46:04
    interrupts? What is the secure OS for?
  • 46:04 - 46:09
    Keegan: Yeah so, in the ARMv8 there are a
    couple different kinds of interrupts. So,
  • 46:09 - 46:12
    I think -- if I'm remembering the
    terminology correctly -- there is an IRQ
  • 46:12 - 46:17
    and an FIQ interrupt. So, the non-secure
    mode handles the IRQ interrupts and the
  • 46:17 - 46:20
    secure mode handles the FIQ interrupts.
    So, which one is raised
  • 46:20 - 46:25
    determines in which direction the monitor
    will route that interrupt.
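    A minimal sketch of how that routing is
    configured -- my illustration, not code
    from the talk; the helper name is
    hypothetical -- using the SCR_EL3 system
    register, whose IRQ and FIQ routing bits
    are bits 1 and 2 in the ARMv8-A manual:

        /* Hedged sketch (AArch64, runs only at EL3): route FIQs to the
         * secure monitor and leave IRQs with the non-secure OS. */
        #include <stdint.h>

        #define SCR_IRQ (1u << 1)   /* set: IRQs are taken to EL3 */
        #define SCR_FIQ (1u << 2)   /* set: FIQs are taken to EL3 */

        void route_interrupts(void) {  /* hypothetical helper name */
            uint64_t scr;
            __asm__ volatile("mrs %0, scr_el3" : "=r"(scr));
            scr |= SCR_FIQ;                /* secure monitor handles FIQs */
            scr &= ~(uint64_t)SCR_IRQ;     /* non-secure world keeps IRQs */
            __asm__ volatile("msr scr_el3, %0" : : "r"(scr));
            __asm__ volatile("isb");       /* synchronize the register write */
        }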
  • 46:30 - 46:32
    Mic 6: Thank you.
    Herald: Okay, thank you. Microphone number
  • 46:32 - 46:38
    7, please.
    Mic 7: Does any of your present attacks on
  • 46:38 - 46:45
    TrustZone also apply to the AMD
    implementation of TrustZone or are you
  • 46:45 - 46:48
    looking into it?
    Keegan: I haven't looked into AMD too
  • 46:48 - 46:54
    much, because, as far as I can tell,
    that's not used as commonly, but there are
  • 46:54 - 46:57
    many different types of trusted execution
    environments. The 2 that I focus on were
  • 46:57 - 47:05
    SGX and TrustZone, because those are the
    most common examples that I've seen.
  • 47:05 - 47:09
    Herald: Thank you. Microphone
    number 8, please.
  • 47:09 - 47:20
    Mic 8: When TrustZone is moved to
    dedicated hardware, dedicated memory,
  • 47:20 - 47:28
    couldn't you replicate the userspace
    attacks by loading your own trusted
  • 47:28 - 47:32
    userspace app and use it as an
    oracle of some sorts?
  • 47:32 - 47:36
    Keegan: If you can load your own trusted
    code, then yes, you could do that. But in
  • 47:36 - 47:40
    many of the models I've seen today, that's
    not possible. So, that's why you have
  • 47:40 - 47:44
    things like code signing, which prevent
    the arbitrary user from running their own
  • 47:44 - 47:50
    code in the trusted OS... or in the
    trusted environment.
  • 47:50 - 47:55
    Herald: All right. Microphone number 1.
    Mic 1: So, these attacks are more powerful
  • 47:55 - 48:01
    against code that's running in trusted
    execution environments than similar
  • 48:01 - 48:07
    attacks would be against ring-3 code, or,
    in general, trusted code. Does that mean
  • 48:07 - 48:11
    that trusted execution environments are
    basically an attractive nuisance that we
  • 48:11 - 48:15
    shouldn't use?
    Keegan: There's still a large benefit to
  • 48:15 - 48:18
    using these trusted execution
    environments. The point I want to get
  • 48:18 - 48:21
    across is that, although they add a lot of
    features, they don't protect against
  • 48:21 - 48:25
    everything, so you should keep in mind
    that these side-channel attacks do still
  • 48:25 - 48:29
    exist and you still need to protect
    against them. But overall, these are
  • 48:29 - 48:36
    still beneficial and worth including.
    Herald: Thank you. Microphone number 1
  • 48:36 - 48:42
    again, please.
    Mic 1: So, AMD is doing something with
  • 48:42 - 48:48
    encrypting memory and I'm not sure if they
    encrypt addresses, too, but would that
  • 48:48 - 48:53
    be a defense against such attacks?
    Keegan: So, I'm not too familiar with AMD,
  • 48:53 - 48:58
    but SGX also encrypts memory. It encrypts
    it in between the lowest-level cache and
  • 48:58 - 49:02
    the main memory. But that doesn't really
    have an impact on the actual operation,
  • 49:02 - 49:06
    because the memory is encrypted at the cache-
    line level and as the attacker, we don't
  • 49:06 - 49:10
    care what that data is within that cache
    line, we only care which cache line is
  • 49:10 - 49:16
    being accessed.
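    As a rough sketch of why that granularity
    matters -- the line size and set count
    here are assumed example values, not from
    the talk -- the line and set index are
    derived from the address alone, so
    encrypting the data does not hide them:

        /* Sketch, assuming 64-byte cache lines and a 1024-set cache: the
         * attacker only needs the line/set index, which survives data
         * encryption because it comes from the address, not the data. */
        #include <stdint.h>

        #define LINE_BITS 6        /* 64-byte lines (assumed) */
        #define NUM_SETS  1024     /* example cache geometry (assumed) */

        static inline uint64_t cache_line(uint64_t paddr) {
            return paddr >> LINE_BITS;            /* which line is touched */
        }

        static inline uint64_t set_index(uint64_t paddr) {
            return (paddr >> LINE_BITS) & (NUM_SETS - 1);  /* which set */
        }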
    Mic 1: If you encrypt addresses, wouldn't
  • 49:16 - 49:21
    that help against that?
    Keegan: I'm not sure how you would
  • 49:21 - 49:25
    encrypt the addresses yourself. As long as
    those addresses map into the same set IDs
  • 49:25 - 49:30
    that the victim maps into, then the
    attacker could still pull off the same
    style
  • 49:30 - 49:35
    of attacks.
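    For illustration, here is a hedged sketch
    of the probe step of such a set-based
    (prime+probe-style) attack -- my code,
    not the talk's; the associativity value
    and the eviction-set construction are
    assumptions, and a real attack must first
    build addresses congruent to the victim's
    set:

        /* Hedged sketch of the probe step of prime+probe on x86. */
        #include <stdint.h>
        #include <x86intrin.h>

        #define WAYS 8  /* associativity of the targeted cache (assumed) */

        /* addrs[] holds WAYS addresses that all map to the victim's set ID. */
        uint64_t probe(volatile uint8_t **addrs) {
            unsigned aux;
            uint64_t start = __rdtscp(&aux);
            for (int i = 0; i < WAYS; i++)
                (void)*addrs[i];           /* touch our whole eviction set */
            uint64_t end = __rdtscp(&aux);
            /* A slow probe means the victim evicted one of our lines,
             * i.e., it accessed this cache set. */
            return end - start;
        }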
    Herald: Great. We have a question from the
  • 49:35 - 49:38
    internet, please.
    Signal Angel: The question is "Does the
  • 49:38 - 49:42
    secure enclave on the Samsung Exynos
    distinguish the receiver of the message, so
  • 49:42 - 49:47
    that if the user application asked to
    decode an AES message, can one sniff on
  • 49:47 - 49:52
    the value that the secure
    enclave returns?"
  • 49:52 - 49:57
    Keegan: So, that sounds like it's asking
    about the TruSpy-style attack, where
  • 49:57 - 50:01
    it's calling to the secure world to
    encrypt something with AES. I think, that
  • 50:01 - 50:05
    would all depend on the different
    implementation: As long as it's encrypting
  • 50:05 - 50:10
    for a certain key and it's able to do that
    repeatably, then the attack,
  • 50:10 - 50:16
    assuming a vulnerable AES implementation,
    would be able to extract that key out.
  • 50:16 - 50:21
    Herald: Cool. Microphone number 2, please.
    Mic 2: Do you recommend a reference to
  • 50:21 - 50:25
    understand how these cache line attacks
    and branch oracles actually lead to key
  • 50:25 - 50:30
    recovery?
    Keegan: Yeah. So, I will flip through
  • 50:30 - 50:34
    these pages which include a lot of the
    references for the attacks that I've
  • 50:34 - 50:38
    mentioned, so if you're watching the
    video, you can see these right away or
  • 50:38 - 50:43
    just access the slides. And a lot of these
    contain good starting points. So, I didn't
  • 50:43 - 50:46
    go into a lot of the details on how, for
    example, the TruSpy attack recovered
  • 50:46 - 50:53
    that AES key, but that paper does have a
    lot of good links on how those leaks can
  • 50:53 - 50:56
    lead to key recovery. Same thing with the
    CLKSCREW attack, how the different fault
  • 50:56 - 51:03
    injection can lead to key recovery.
    Herald: Microphone number 6, please.
  • 51:03 - 51:08
    Mic 6: I think my question might have been
    very, almost the same thing: How hard is
  • 51:08 - 51:12
    it actually to recover the keys? Is this
    like a massive machine learning problem or
  • 51:12 - 51:18
    is this something that you can do
    practically on a single machine?
  • 51:18 - 51:22
    Keegan: It varies entirely by the end
    implementation. So, for all these attacks
  • 51:22 - 51:26
    to work, you need to have some sort of
    vulnerable implementation and some
  • 51:26 - 51:29
    implementations leak more data than
    others. In the case of a lot of the AES
  • 51:29 - 51:34
    attacks, where you're doing the passive
    attacks, those are very easy to do on just
  • 51:34 - 51:38
    your own computer. For the AES fault
    injection attack, I think that one
  • 51:38 - 51:42
    required more brute force, in the CLKSCREW
    paper, so that one required more computing
  • 51:42 - 51:50
    resources, but still, it was entirely
    practical to do in a realistic setting.
  • 51:50 - 51:54
    Herald: Cool, thank you. So, we have one
    more: Microphone number 1, please.
  • 51:54 - 51:59
    Mic 1: So, I hope it's not a too naive
    question, but I was wondering, since all
  • 51:59 - 52:05
    these attacks are based on cache hit and
    misses, isn't it possible to forcibly
  • 52:05 - 52:11
    flush or invalidate or insert noise in
    the cache after each operation in this trusted
  • 52:11 - 52:24
    environment, in order to mess up the
    guesswork of the attacker? So, discarding
  • 52:24 - 52:29
    optimization and performance for
    additional security benefits.
  • 52:29 - 52:32
    Keegan: Yeah, and that is absolutely
    possible and you are absolutely right: It
  • 52:32 - 52:36
    does lead to a performance degradation,
    because if you always flush the entire
  • 52:36 - 52:41
    cache every time you do a context switch,
    that will be a huge performance hit. So
  • 52:41 - 52:45
    again, that comes down to the question of
    the performance and security trade-off:
  • 52:45 - 52:50
    Which one do you end up going with? And it
    seems historically the choice has been
  • 52:50 - 52:54
    more in the direction of performance.
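    A minimal sketch of that mitigation -- my
    illustration, not the talk's code; the
    function name and line size are
    assumptions, and x86 clflush is shown
    only for concreteness (TrustZone code
    would use DC CIVAC instead):

        /* Sketch: flush a sensitive buffer's cache lines before a
         * context switch out of the secure code, trading performance
         * for a smaller cache footprint. */
        #include <stdint.h>
        #include <stddef.h>
        #include <emmintrin.h>

        #define LINE 64  /* cache-line size in bytes (assumed) */

        void flush_buffer(const void *buf, size_t len) {  /* hypothetical */
            const char *p = (const char *)buf;
            for (size_t off = 0; off < len; off += LINE)
                _mm_clflush(p + off);  /* evict each line from all levels */
            _mm_mfence();              /* order the flushes before returning */
        }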
    Mic 1: Thank you.
  • 52:54 - 52:57
    Herald: But we have one more: Microphone
    number 1, please.
  • 52:57 - 53:02
    Mic 1: So, I have more of a moral
    question: So, how well should we really
  • 53:02 - 53:08
    protect from attacks which need some
    ring-0 cooperation? Because, basically,
  • 53:08 - 53:14
    when we use TrustZone for a purpose we
    would see as clear, like protecting the
  • 53:14 - 53:20
    browser from the outside world, then we
    are basically using the
  • 53:20 - 53:27
    secure execution environment for sandboxing
    the process. But once an attack needs some
  • 53:27 - 53:32
    cooperation from the kernel, some of these
    attacks, in fact, empower the user
  • 53:32 - 53:36
    instead of the hardware producer.
    Keegan: Yeah, and you're right. It
  • 53:36 - 53:39
    depends entirely on what your application
    is and what your threat model is that
  • 53:39 - 53:43
    you're looking at. So, if you're using
    these trusted execution environments to do
  • 53:43 - 53:48
    DRM, for example, then maybe you wouldn't
    be worried about that ring-0 attack or
  • 53:48 - 53:52
    that privileged attacker who has their
    phone rooted and is trying to recover
  • 53:52 - 53:57
    these media encryption keys from this
    execution environment. But maybe there are
  • 53:57 - 54:01
    other scenarios where you're not as
    worried about having an attack with a
  • 54:01 - 54:06
    compromised ring 0. So, it entirely
    depends on context.
  • 54:06 - 54:09
    Herald: Alright, thank you. So, we have
    one more: Microphone number 1, again.
  • 54:09 - 54:11
    Mic 1: Hey there. Great talk, thank you
    very much.
  • 54:11 - 54:13
    Keegan: Thank you.
    Mic 1: Just a short question: Do you have
  • 54:13 - 54:17
    any success stories about attacking the
    TrustZone and the different
  • 54:17 - 54:24
    implementations of TE with some vendors
    like some OEMs creating phones and stuff?
  • 54:24 - 54:30
    Keegan: Not that I'm announcing
    at this time.
  • 54:30 - 54:36
    Herald: So, thank you very much. Please,
    again a warm round of applause for Keegan!
  • 54:36 - 54:40
    Applause
  • 54:40 - 54:45
    34c3 postroll music
  • 54:45 - 55:02
    subtitles created by c3subtitles.de
    in the year 2018. Join, and help us!