-
34C3 preroll music
-
Herald: Hello fellow creatures.
Welcome and
-
I wanna start with a question.
Another one: Who do we trust?
-
Do we trust the TrustZones
on our smartphones?
-
Well, Keegan Ryan, we're really
fortunate to have him here and
-
he was inspired by another talk from the
CCC before - I think it was 29C3 and his
-
research on smartphones and the systems on a
chip used in them will answer
-
the question of whether you can trust those
trusted execution environments. Please
-
give a warm round of applause
to Keegan and enjoy!
-
Applause
-
Keegan Ryan: All right, thank you! So I'm
Keegan Ryan, I'm a consultant with NCC
-
Group and this is microarchitectural
attacks on Trusted Execution Environments.
-
So, in order to understand what a Trusted
Execution Environment is we need to go
-
back into processor security, specifically
on x86. So as many of you are probably
-
aware there are a couple different modes
which we can execute code under in x86
-
processors and that includes ring 3, which
is the user code and the applications, and
-
also ring 0 which is the kernel code. Now
there's also a ring 1 and ring 2 that are
-
supposedly used for drivers or guest
operating systems but really it just boils
-
down to ring 0 and ring 3. And in this
diagram we have here we see that privilege
-
increases as we go up the diagram, so ring
0 is the most privileged ring and ring 3
-
is the least privileged ring. So all of
our secrets, all of our sensitive
-
information, all of the attacker's goals
are in ring 0 and the attacker is trying
-
to access those from the unprivileged
world of ring 3. Now you may have a
-
question what if I want to add a processor
feature that I don't want ring 0 to be
-
able to access? Well then you add ring -1
which is often used for a hypervisor. Now
-
the hypervisor has all the secrets and the
hypervisor can manage different guest
-
operating systems and each of these guest
operating systems can execute in ring 0
-
without having any idea of the other
operating systems. So this way now the
-
secrets are all in ring -1 so now the
attacker's goals have shifted from ring 0
-
to ring -1. The attacker has to attack
ring -1 from a less privileged ring and
-
tries to access those secrets. But what if
you want to add a processor feature that
-
you don't want ring -1 to be able to
access? So you add ring -2 which is System
-
Management Mode and that's capable of
monitoring power, directly interfacing
-
with firmware and other chips on a
motherboard and it's able to access and do
-
a lot of things that the hypervisor is not
able to and now all of your secrets and
-
all of your attacker goals are in ring -2
and the attacker has to attack those from
-
a less privileged ring. Now maybe you want
to add something to your processor that
-
you don't want ring -2 to be able to access,
so you add ring -3 and I think you get the
-
picture now. And we just keep on adding
more and more privilege rings and keep
-
putting our secrets and our attackers
goals in these higher and higher
-
privileged rings but what if we're
thinking about it wrong? What if instead
-
we want to put all the secrets in the
least privileged ring? So this is sort of
-
the idea behind SGX and it's useful for
things like DRM, where you want to run
-
ring 3 code but have sensitive secrets or
other signing capabilities running in
-
ring 3. But this picture is getting a
little bit complicated, this diagram is a
-
little bit complex so let's simplify it a
little bit. We'll only be looking at ring
-
0 through ring 3 which is the kernel, the
userland and the SGX enclave which also
-
executes in ring 3. Now when you're
executing code in the SGX enclave you
-
first load the code into the enclave and
then from that point on you trust the
-
execution of whatever's going on in that
enclave. You trust that the other elements
-
the kernel, the userland, the other rings
are not going to be able to access what's
-
in that enclave so you've made your
Trusted Execution Environment. This is a
-
bit of a weird model because now your
attacker is in the ring 0 kernel and your
-
target victim here is in ring 3. So
instead of the attacker trying to move up
-
the privilege chain, the attacker is
trying to move down. Which is pretty
-
strange and you might have some questions
like "under this model who handles memory
-
management?" because traditionally that's
something that ring 0 would manage and
-
ring 0 would be responsible for paging
memory in and out for different processes
-
and different code that's executing in
ring 3. But on the other hand you don't
-
want that to happen with the SGX enclave
because what if the malicious ring 0 adds
-
a page to the enclave that the enclave
doesn't expect? So in order to solve this
-
problem, SGX does allow ring 0 to handle
page faults. But simultaneously and in
-
parallel it verifies every memory load to
make sure that no access violations are
-
made so that all the SGX memory is safe.
So it allows ring 0 to do its job but it
-
sort of watches over at the same time to
make sure that nothing is messed up. So
-
it's a bit of a weird convoluted solution
to a strange inverted problem but it works
-
and that's essentially how SGX works and
the idea behind SGX. Now we can look at
-
x86 and we can see that ARMv8 is
constructed in a similar way but it
-
improves on x86 in a couple key ways. So
first of all ARMv8 gets rid of ring 1 and
-
ring 2 so you don't have to worry about
those and it just has different privilege
-
levels for userland and the kernel. And
these different privilege levels are
-
called exception levels in the ARM
terminology. And the second thing that ARM
-
gets right compared to x86 is that instead
of starting at 3 and counting down as
-
privilege goes up, ARM starts at 0 and
counts up so we don't have to worry about
-
negative numbers anymore. Now when we add
the next privilege level the hypervisor we
-
call it exception level 2 and the next one
after that is the monitor in exception
-
level 3. So at this point we still want to
have the ability to run trusted code in
-
exception level 0 the least privileged
level of the ARMv8 processor. So in order
-
to support this we need to separate this
diagram into two different sections. In
-
ARMv8 these are called the secure world
and the non-secure world. So we have the
-
non-secure world on the left in blue that
consists of the userland, the kernel and
-
the hypervisor and we have the secure
world on the right which consists of the
-
monitor in exception level 3, a trusted
operating system in exception level 1 and
-
trusted applications in exception level 0.
So the idea is that if you run anything in
-
the secure world, it should not be
accessible or modifiable by anything in
-
the non secure world. So that's how our
attacker is trying to access it. The
-
attacker has access to the non secure
kernel, which is often Linux, and they're
-
trying to go after the trusted apps. So
once again we have this weird inversion
-
where we're trying to go from a more
privileged level to a less privileged
-
level and trying to extract secrets in
that way. So the question that arises when
-
using these Trusted Execution Environments
that are implemented in SGX and TrustZone
-
in ARM is "can we use these privilege
modes and our privileged access in order to
-
attack these Trusted Execution
Environments?". Now, to answer that question,
-
we can start looking at a few
different research papers. The first one
-
that I want to go into is one called
CLKSCREW and it's an attack on TrustZone.
-
So throughout this presentation I'm going
to go through a few different papers and
-
just to make it clear which papers have
already been published and which results are
-
new, I'll include the citations in the
upper right hand corner so that way you
-
can tell what's old and what's new. And as
far as papers go this CLKSCREW paper is
-
relatively new. It was released in 2017.
And the way CLKSCREW works is it takes
-
advantage of the energy management
features of a processor. So a non-secure
-
operating system has the ability to manage
the energy consumption of the different
-
cores. So if a certain target core doesn't
have much scheduled to do then the
-
operating system is able to scale back
that voltage or dial down the frequency on
-
that core so that core uses less energy
which is a great thing for performance: it
-
really extends battery life, it makes the
the cores last longer and it gives better
-
performance overall. But the problem here
is what if you have two separate cores and
-
one of your cores is running this non-
trusted operating system and the other
-
core is running code in the secure world?
It's running that trusted code those
-
trusted applications so that non secure
operating system can still dial down that
-
voltage and it can still change that
frequency and those changes will affect
-
the secure world code. So what the
CLKSCREW attack does is the non secure
-
operating system core will dial down the
voltage, it will overclock the frequency
-
on the target secure world core in order
to induce faults and make the
-
computation on that core fail in some way
and when that computation fails you get
-
certain cryptographic errors that the
attacker can use to infer things like secret
-
keys, secret AES keys and to bypass code
signing implemented in the secure world.
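CLKSCREW itself drives vendor-specific voltage and frequency regulators. As a rough illustration of the kind of control the non-secure kernel already hands out, the sketch below pins a core's frequency through the standard Linux cpufreq sysfs interface; the paths and the availability of the userspace governor vary by device, so treat it as an illustration of the capability rather than the attack itself.

```c
#include <stdio.h>

/* Pin a core to a chosen frequency via the standard cpufreq sysfs files.
 * Assumes the "userspace" governor is available on this device; the real
 * CLKSCREW attack instead drives vendor-specific voltage/frequency
 * regulators to reach out-of-spec operating points. */
static int pin_core_frequency(int cpu, long khz)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    if (!(f = fopen(path, "w")))
        return -1;
    fputs("userspace\n", f);
    fclose(f);

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    if (!(f = fopen(path, "w")))
        return -1;
    fprintf(f, "%ld\n", khz);
    fclose(f);
    return 0;
}
```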
-
So it's a very powerful attack that's made
possible because the non-secure operating
-
system is privileged enough in order to
use these energy management features. Now
-
CLKSCREW is an example of an active attack
where the attacker is actively changing
-
the outcome of the victim code of that
code in the secure world. But what about
-
passive attacks? So in a passive attack,
the attacker does not modify the actual
-
outcome of the process. The attacker just
tries to monitor that process and infer what's
-
going on and that is the sort of attack
that we'll be considering for the rest of
-
the presentation. So in a lot of SGX and
TrustZone implementations, the trusted and
-
the non-trusted code both share the same
hardware and this shared hardware could be
-
a shared cache, it could be a branch
predictor, it could be a TLB. The point is
-
that they share the same hardware so that
the changes made by the secure code may be
-
reflected in the behavior of the non-
secure code. So the trusted code might
-
execute, change the state of that shared
cache for example and then the untrusted
-
code may be able to go in, see the changes
in that cache and infer information about
-
the behavior of the secure code. So that's
essentially how our side channel attacks
-
are going to work. If the non-secure code
is going to monitor these shared hardware
-
resources for state changes that reflect
the behavior of the secure code. Now we've
-
already talked about how Intel and SGX address
the problem of memory management and who's
-
responsible for making sure that those
attacks don't work on SGX. So what do they
-
have to say on how they protect against
these side channel attacks and attacks on
-
this shared cache hardware? They don't...
at all. They essentially say "we do not
-
consider this part of our threat model. It
is up to the developer to implement the
-
protections needed to protect against
these side-channel attacks". Which is
-
great news for us because these side
channel attacks can be very powerful and
-
if there aren't any hardware features that
are necessarily stopping us from being
-
able to accomplish our goal it makes us
that much more likely to succeed. So with that
-
we can sort of take a step back from
TrustZone and SGX and just take a look at
-
cache attacks to make sure that we all
have the same understanding of how the
-
cache attacks will be applied to these
Trusted Execution Environments. To start
-
that let's go over a brief recap of how a
cache works. So caches are necessary in
-
processors because accessing the main
memory is slow. When you try to access
-
something from the main memory it takes a
while to be read into the processor. So the
-
cache exists as sort of a layer to
remember what that information is so if
-
the processor ever needs information from
that same address it just reloads it from
-
the cache and that access is going to be
fast. So it really speeds up the memory
-
access for repeated accesses to the same
address. And then if we try to access a
-
different address then that will also be
read into the cache, slowly at first but
-
then quickly for repeated accesses and so
on and so forth. Now as you can probably
-
tell from all of these examples the memory
blocks have been moving horizontally
-
they've always been staying in the same
row. And that is reflective of the idea of
-
sets in a cache. So there are a number of
different set IDs and that corresponds to
-
the different rows in this diagram. So for
our example there are four different set
-
IDs and each address in the main memory
maps to a different set ID. So that
-
address in main memory will only go into
that location in the cache with the same
-
set ID so it will only travel along those
rows. So that means if you have two
-
different blocks of memory that map to
different set IDs they're not going to
-
interfere with each other in the cache.
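To make that mapping concrete, here is the usual address split for a set-associative cache; the 64-byte line size and four sets below match the toy cache in the diagram, not any particular processor.

```c
#include <stdint.h>

#define LINE_SIZE 64   /* bytes per cache line */
#define NUM_SETS   4   /* four rows, as in the diagram */

/* The lowest bits pick a byte inside the line, the next bits pick the set
 * (the row the block can live in), and everything above that is the tag
 * stored next to the data so the cache can tell entries apart. */
static unsigned set_id(uintptr_t addr)
{
    return (addr / LINE_SIZE) % NUM_SETS;
}

static uintptr_t tag_of(uintptr_t addr)
{
    return addr / ((uintptr_t)LINE_SIZE * NUM_SETS);
}
```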
But that raises the question "what about
-
two memory blocks that do map to the same
set ID?". Well if there's room in the
-
cache then the same thing will happen as
before: those memory contents will be
-
loaded into the cache and then retrieved
from the cache for future accesses. And
-
the number of possible entries for a
particular set ID within a cache is called
-
the associativity. And on this diagram
that's represented by the number of
-
columns in the cache. So we will call our
cache in this example a 2-way set-
-
associative cache. Now the next question
is "what happens if you try to read a
-
memory address that maps to the same set ID
but all of those entries within that set ID
-
within the cache are full?". Well one of
those entries is chosen, it's evicted from
-
the cache, the new memory is read in and
then that's fed to the processor. So it
-
doesn't really matter how the evicted cache
entry is chosen; for the
-
purposes of this presentation you can just
assume that it's random. But the important
-
thing is that if you try to access that
same memory that was evicted before, you're
-
now going to have to wait for that time
penalty for it to be reloaded into the
-
cache and read into the processor. So those
are caches in a nutshell, in particular
-
set-associative caches, and now we can begin
looking at the different types of cache
-
attacks. So for a cache attack we have two
different processes we have an attacker
-
process and a victim process. For this
type of attack that we're considering both
-
of them share the same underlying code so
they're trying to access the same
-
resources which could be the case if you
have page deduplication in virtual
-
machines or if you have copy-on-write
mechanisms for shared code and shared
-
libraries. But the point is that they
share the same underlying memory. Now the
-
Flush and Reload Attack works in two
stages for the attacker. The attacker
-
first starts by flushing out the cache.
They flush each and every address in the
-
cache so the cache is just empty. Then the
attacker lets the victim execute for a
-
small amount of time so the victim might
read from an address in main memory
-
loading that into the cache and then the
second stage of the attack is the reload
-
phase. In the reload phase the attacker
tries to load different memory addresses
-
from main memory and see if those entries
are in the cache or not. Here the attacker
-
will first try to load address 0 and see
that because it takes a long time to read
-
the contents of address 0 the attacker can
infer that address 0 was not part of the
-
cache which makes sense because the
attacker flushed it from the cache in the
-
first stage. The attacker then tries to
read the memory at address 1 and sees that
-
this operation is fast so the attacker
infers that the contents of address 1 are
-
in the cache and because the attacker
flushed everything from the cache before
-
the victim executed, the attacker then
concludes that the victim is responsible
-
for bringing address 1 into the cache.
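A minimal sketch of those two steps on x86, where the attacker and victim share the watched address; clflush and rdtscp are the standard instructions for this, but the hit threshold below is a made-up number that has to be calibrated per machine.

```c
#include <stdint.h>
#include <x86intrin.h>

#define HIT_THRESHOLD 120   /* cycles; hypothetical, calibrate per machine */

/* Flush step: evict the shared line from every level of the cache. */
static inline void flush_line(const void *addr)
{
    _mm_clflush(addr);
}

/* Reload step: time a single access. A fast load means the line came back
 * into the cache while we waited, i.e. the victim touched this address. */
static inline int victim_touched(const void *addr)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*(volatile const uint8_t *)addr;
    uint64_t end = __rdtscp(&aux);
    return (end - start) < HIT_THRESHOLD;
}
```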
This Flush+Reload attack reveals which
-
memory addresses the victim accesses
during that small slice of time. Then
-
after that reload phase, the attack
repeats so the attacker flushes again
-
lets the victim execute, reloads again
and so on. There's also a variant on the
-
Flush+Reload attack that's called the
Flush+Flush attack which I'm not going to
-
go into the details of, but essentially
it's the same idea. But instead of using
-
load instructions to determine whether or
not a piece of memory is in the cache or
-
not, it uses flush instructions because
flush instructions will take longer if
-
something is in the cache already. The
important thing is that both the
-
Flush+Reload attack and the Flush+Flush
attack rely on the attacker and the victim
-
sharing the same memory. But this isn't
always the case so we need to consider
-
what happens when the attacker and the
victim do not share memory. For this we
-
have the Prime+Probe attack. The
Prime+Probe attack once again works in two
-
separate stages. In the first stage the
attacker primes the cache by reading all
-
the attacker memory into the cache and
then the attacker lets the victim execute
-
for a small amount of time. So no matter
what the victim accesses from main memory
-
since the cache is full of the attacker
data, one of those attacker entries will
-
be replaced by a victim entry. Then in the
second phase of the attack, during the
-
probe phase, the attacker checks the
different cache entries for particular set
-
IDs and sees if all of the attacker
entries are still in the cache. So maybe
-
our attacker is curious about the last set
ID, the bottom row, so the attacker first
-
tries to load the memory at address 3 and
because this operation is fast the
-
attacker knows that address 3 is in the
cache. The attacker tries the same thing
-
with address 7, sees that this operation
is slow and infers that at some point
-
address 7 was evicted from the cache so
the attacker knows that something had to
-
be evicted from the cache and it had to be
from the victim so the attacker concludes
-
that the victim accessed something in that
last set ID and that bottom row. The
-
attacker doesn't know if it was the
contents of address 11 or the contents of
-
address 15 or even what those contents
are, but the attacker has a good idea of
-
which set ID it was. So, the important
things to remember about
-
cache attacks are that caches are very
important, they're crucial for performance
-
on processors, they give a huge speed
boost and there's a huge time difference
-
between having a cache and not having a
cache for your executables. But the
-
downside to this is that big time
difference also allows the attacker to
-
infer information about how the victim is
using the cache. We're able to use these
-
cache attacks in the two different
scenarios of, where memory is shared, in
-
the case of the Flush+Reload and
Flush+Flush attacks and in the case where
-
memory is not shared, in the case of the
Prime+Probe attack. And finally the
-
important thing to keep in mind is that,
for these cache attacks, we know where the
-
victim is looking, but we don't know what
they see. So we don't know the contents of
-
the memory that the victim is actually
seeing, we just know the location and the
-
addresses. So, what does an example trace
of these attacks look like? Well, there's
-
an easy way to represent these as two-
dimensional images. So in this image, we
-
have our horizontal axis as time, so each
column in this image represents a
-
different time slice, a different
iteration of the prime, measure, and probe steps.
-
So, then we also have the vertical axis
which is the different set IDs, which is
-
the location that's accessed by the victim
process, and then here a pixel is white if
-
the victim accessed that set ID during
that time slice. So, as you look from left
-
to right as time moves forward, you can
sort of see the changes in the patterns of
-
the memory accesses made by the victim
process. Now, for this particular example
-
the trace is captured on an execution of
AES repeated several times, an AES
-
encryption repeated about 20 times. And
you can tell that this is a repeated
-
action because you see the same repeated
memory access patterns in the data, you
-
see the same structures repeated over and
over. So, you know that this is reflecting
-
what's going on throughout time, but
what does it have to do with AES itself?
-
Well, if we take the same trace with the
same settings, but a different key, we see
-
that there is a different memory access
pattern with different repetition within
-
the trace. So, only the key changed, the
code didn't change. So, even though we're
-
not able to read the contents of the key
directly using this cache attack, we know
-
that the key is changing these memory
access patterns, and if we can see these
-
memory access patterns, then we can infer
the key. So, that's the essential idea: we
-
want to make these images as clear as
possible and as descriptive as possible so
-
we have the best chance of learning what
those secrets are. And we can define the
-
metrics for what makes these cache attacks
powerful in a few different ways. So, the
-
three ways we'll be looking at are spatial
resolution, temporal resolution and noise.
-
So, spatial resolution refers to how
accurately we can determine the where. If
-
we only know the memory address the victim
accessed to within 1,000 bytes, that's
-
obviously not as powerful as knowing where
they accessed within 512 bytes. Temporal
-
resolution is similar, where we want to
know the order of the accesses the victim
-
made. So if that time slice during our
attack is 1 millisecond, we're going to
-
get much better ordering information on
those memory accesses than we would get if
-
we only saw all the memory accesses over
the course of one second. So the shorter
-
that time slice, the better the temporal
resolution, the longer our picture will be
-
on the horizontal axis, and the clearer
an image of the cache we'll see.
-
And the last metric to evaluate our
attacks on is noise and that reflects how
-
accurately our measurements reflect the
true state of the cache. So, right now
-
we've been using timing data to infer
whether an item was in the cache or
-
not, but this is a little bit noisy. It's
possible that we'll have false positives
-
or false negatives, so we want to keep
that in mind as we look at the different
-
attacks. So, that's essentially cache
attacks in a nutshell, and
-
that's all you really need to understand
in order to understand these attacks as
-
they've been implemented on Trusted
Execution Environments. And the first
-
particular attack that we're going to be
looking at is called a Controlled-Channel
-
Attack on SGX, and this attack isn't
necessarily a cache attack, but we can
-
analyze it in the same way that we analyze
the cache attacks. So, it's still useful
-
to look at. Now, if you remember how
memory management occurs with SGX, we know
-
that if a page fault occurs during SGX
Enclave code execution, that page fault is
-
handled by the kernel. So, the kernel has
to know which page the Enclave needs
-
paged in. The kernel already gets some
information about what the Enclave is
-
looking at. Now, in the Controlled-Channel
attack, what the attacker does
-
from the non-trusted OS is page
almost every other page of the
-
Enclave out of memory. So no matter
what page the Enclave tries to
-
access, it's very likely to cause a page
fault, which will be redirected to the
-
non-trusted OS, where the non-trusted OS
can record it, page out any other pages
-
and continue execution. So, the OS
essentially gets a list of sequential page
-
accesses made by the SGX Enclave, all by
capturing page faults in the page fault handler. This is
-
a very general attack, you don't need to
know what's going on in the Enclave in
-
order to pull this off. You just load up
an arbitrary Enclave and you're able to
-
see which pages that Enclave is trying to
access. So, how does it do on our metrics?
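Before going through the metrics, here is a rough sketch of that loop as seen from the untrusted kernel. The helpers declared extern are hypothetical stand-ins; a real implementation hooks the kernel's page-fault path for the enclave's address range and starts with almost every enclave page marked not-present.

```c
#include <linux/mm.h>

/* Hypothetical helpers standing in for real PTE manipulation over the
 * enclave's address range. */
extern void map_enclave_page(unsigned long page);     /* set present bit   */
extern void unmap_enclave_page(unsigned long page);   /* clear present bit */
extern void log_page_access(unsigned long page);

static unsigned long last_page;

/* Invoked whenever the enclave faults: record the page, map it in so the
 * enclave can make progress, and re-restrict the previously faulted page
 * so the next page access faults again. */
static void on_enclave_fault(unsigned long fault_va)
{
    unsigned long page = fault_va & PAGE_MASK;   /* 4 KiB granularity */

    log_page_access(page);        /* the sequence of pages is the leak */
    map_enclave_page(page);
    unmap_enclave_page(last_page);
    last_page = page;
}
```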
-
First of all, this spatial resolution is
not great. We can only see where the
-
victim is accessing within 4096 bytes or
the size of a full page because SGX
-
obscures the offset into the page where
the page fault occurs. The temporal
-
resolution is good but not great, because
even though we're able to see any
-
sequential accesses to different pages
we're not able to see sequential accesses
-
to the same page because we need to keep
that same page paged-in while we let our
-
SGX Enclave run for that small time slice.
So temporal resolution is good but not
-
perfect. As for the noise, there is no
noise in this attack because no matter
-
where the page fault occurs, the untrusted
operating system is going to capture that
-
page fault and is going to handle it. So,
it's very low noise, not great spatial
-
resolution but overall still a powerful
attack. But we still want to improve on
-
that spatial resolution, we want to be
able to see what the Enclave is doing at
-
a resolution finer than one page of
four kilobytes. So that's exactly what the
-
CacheZoom paper does, and instead of
interrupting the SGX Enclave execution
-
with page faults, it uses timer
interrupts. Because the untrusted
-
operating system is able to schedule when
timer interrupts occur, so it's able to
-
schedule them at very tight intervals, so
it's able to get that small and tight
-
temporal resolution. And essentially what
happens in between is this timer
-
interrupt fires, the untrusted operating
system runs the Prime+Probe attack code in
-
this case, and resumes execution of the
enclave process, and this repeats. So this
-
is a Prime+Probe attack on the L1 data
cache. So, this attack lets you see what
-
data the Enclave is looking at. Now, this
attack could be easily modified to use the
-
L1 instruction cache, so in that case you
learn which instructions the Enclave is
-
executing. And overall this is an even
more powerful attack than the Controlled-
-
Channel attack. If we look at the metrics,
we can see that the spatial resolution is
-
a lot better, now we're looking at spatial
resolution of 64 bytes or the size of an
-
individual line. The temporal resolution
is very good, it's "almost unlimited", to
-
quote the paper, because the untrusted
operating system has the privilege to keep
-
scheduling those time interrupts closer
and closer together until it's able to
-
capture very small time slices of the
victim process. And the noise itself is
-
low, we're still using a cycle counter to
measure the time it takes to load memory
-
in and out of the cache, but it's useful:
the chances of having a false
-
positive or false negative are low, so the
noise is low as well. Now, we can also
-
look at TrustZone attacks, because so far
the attacks that we've looked at, the
-
passive attacks, have been against SGX and
those attacks on SGX have been pretty
-
powerful. So, what are the published
attacks on TrustZone? Well, there's one
-
called TruSpy, which is kind of similar in
concept to the CacheZoom attack that we
-
just looked at on SGX. It's once again a
Prime+Probe attack on the L1 data cache,
-
and the difference here is that instead of
interrupting the victim code execution
-
multiple times, the TruSpy attack does the
prime step, does the full AES encryption,
-
and then does the probe step. And the
reason they do this, is because as they
-
say, the secure world is protected, and is
not interruptible in the same way that SGX
-
is interruptible. But even despite this,
just having one measurement per execution,
-
the TruSpy authors were able to use some
statistics to still recover the AES key
-
from that noise. And their methods were so
powerful, they are able to do this from an
-
unprivileged application in userland, so
they don't even need to be running within
-
the kernel in order to be able to pull off
this attack. So, how does this attack
-
measure up? The spatial resolution is once
again 64 bytes because that's the size of
-
a cache line on this processor, and the
temporal resolution is pretty poor
-
here, because we only get one measurement
per execution of the AES encryption. This
-
is also a particularly noisy attack
because we're making the measurements from
-
userland, but even if we make the
measurements from the kernel, we're still
-
going to have the same issues of false
positives and false negatives associated
-
with using a cycle counter to measure
membership in a cache. So, we'd like to
-
improve this a little bit. We'd like to
improve the temporal resolution, so we
-
have the power of the cache attack be a
little bit closer on TrustZone to what it is
-
on SGX. So, we want to improve that
temporal resolution. Let's dig into that
-
statement a little bit, that the secure
world is protected and not interruptible.
-
And to do this, we go back to this diagram
of ARMv8 and how that TrustZone is set up.
-
So, it is true that when an interrupt
occurs, it is directed to the monitor and,
-
because the monitor operates in the secure
world, if we interrupt secure code that's
-
running an exception level 0, we're just
going to end up running secure code an
-
exception level 3. So, this doesn't
necessarily get us anything. I think,
-
that's what the authors mean by saying
that it's protected against this. Just by
-
sending an interrupt, we don't have a
way to redirect our flow to the non-
-
trusted code. At least that's how it works
in theory. In practice, the Linux
-
operating system, running in exception
level 1 in the non-secure world, kind of
-
needs interrupts in order to be able to
work, so if an interrupt occurs and it's
-
being sent to the monitor, the monitor
will just forward it right to the non-
-
secure operating system. So, we have
interrupts just the same way as we did in
-
CacheZoom. And we can improve the
TrustZone attacks by using this idea: We
-
have 2 cores, where one core is running
the secure code, the other core is running
-
the non-secure code, and the non-secure
code is sending interrupts to the secure-
-
world core and that will give us that
interleaving of attacker process and
-
victim process that allows us to have a
powerful prime-and-probe attack. So, what
-
does this look like? We have the attack
core and the victim core. The attack core
-
sends an interrupt to the victim core.
This interrupt is captured by the monitor,
-
which passes it to the non-secure
operating system. The non-secure operating
-
system transfers this to our attack code,
which runs the prime-and-probe attack.
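In Linux kernel module terms, that interleaving can be driven with one inter-processor interrupt per time slice; a minimal sketch, where the prime/probe helpers are hypothetical stand-ins for the attack code and the victim CPU number is an assumption.

```c
#include <linux/smp.h>

/* Hypothetical helpers from the attack module described above. */
extern void prime_l1_cache(void);
extern void record_probe_results(void);

#define VICTIM_CPU 1   /* assumed: the core currently running secure-world code */

/* Runs on the victim core in interrupt context: the monitor has just
 * forwarded our interrupt to the non-secure OS, pulling execution out of
 * the secure world, so probe (and re-prime) the shared L1 cache here. */
static void probe_and_prime(void *unused)
{
    record_probe_results();
    prime_l1_cache();
}

/* Called in a loop from the attacker core. wait=1 blocks until the handler
 * has finished on the victim core; the secure world then resumes until the
 * next interrupt. */
static void interrupt_victim_once(void)
{
    smp_call_function_single(VICTIM_CPU, probe_and_prime, NULL, 1);
}
```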
-
Then, we leave the interrupt, the
execution within the victim code in the
-
secure world resumes and we just repeat
this over and over. So, now we have that
-
interleaving of data... of the processes
of the attacker and the victim. So, now,
-
instead of having a temporal resolution of
one measurement per execution, we once
-
again have almost unlimited temporal
resolution, because we can just schedule
-
when we send those interrupts from the
attacker core. Now, we'd also like to
-
improve the noise of our measurements,
because if we can reduce the noise, we'll
-
get clearer pictures and we'll be able to
infer those secrets more clearly. So, we
-
can get some improvement by switching the
measurements from userland and starting to
-
do those in the kernel, but again we have
the cycle counters. So, what if, instead
-
of using the cycle counter to measure
whether or not something is in the cache,
-
we use the other performance counters?
Because on ARMv8 platforms, there is a way
-
to use performance counters to measure
different events, such as cache hits and
-
cache misses. So, these events and these
performance monitors require privileged
-
access in order to use, which, for this
attack, we do have. Now, in a typical
-
cache attack scenario we wouldn't have
access to these performance monitors,
-
which is why they haven't really been
explored before, but in this weird
-
scenario where we're attacking the less
privileged code from the more privileged
-
code, we do have access to these
performance monitors and we can use these
-
monitors during the probe step to get a
very accurate count of whether or not a
-
certain memory load caused a cache miss or
a cache hit. So, we're able to essentially
-
get rid of the different levels of noise.
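For example, from the kernel the probe step can read the architectural L1 data cache refill event around each access instead of timing it. A sketch, assuming event counter 0 was already programmed for event 0x03 (L1D cache refill) and enabled via PMCNTENSET_EL0 and PMCR_EL0, and that the probed line comes from an eviction set built elsewhere.

```c
#include <stdint.h>

/* Read event counter 0; assumes PMEVTYPER0_EL0 was set to 0x03
 * (L1D cache refill) and the counter is enabled, which needs the
 * privileged access this attack already has. */
static inline uint64_t read_l1d_refills(void)
{
    uint64_t v;
    asm volatile("mrs %0, pmevcntr0_el0" : "=r"(v) : : "memory");
    return v;
}

/* Probe one way of the eviction set: if the access caused a refill,
 * the victim evicted our line, i.e. it touched this cache set. */
static int way_was_evicted(volatile const uint8_t *line)
{
    uint64_t before = read_l1d_refills();
    (void)*line;                        /* the probed load */
    uint64_t after = read_l1d_refills();
    return after != before;             /* a counted miss, no timing noise */
}
```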
Now, one thing to point out is that maybe
-
we'd like to use these ARMv8 performance
counters in order to count the different
-
events that are occurring in the secure
world code. So, maybe we start the
-
performance counters from the non-secure
world, let the secure world run and then,
-
when the secure world exits, we use the
non-secure world to read these performance
-
counters and maybe we'd like to see how
many instructions the secure world
-
executed or how many branch instructions
or how many arithmetic instructions or how
-
many cache misses there were. But
unfortunately, ARMv8 took this into
-
account and by default, performance
counters that are started in the non-
-
secure world will not measure events that
happen in the secure world, which is
-
smart; which is how it should be. And the
only reason I bring this up is because
-
that's not how it is on ARMv7. So, we could go
into a whole different talk on that,
-
just exploring the different implications
of what that means, but I want to focus on
-
ARMv8, because that's the newest of
the new. So, we'll keep looking at that.
-
So, we instrument the Prime+Probe attack
to use these performance counters, so we
-
can get a clear picture of what is and
what is not in the cache. And instead of
-
having noisy measurements based on time,
we have virtually no noise at all, because
-
we get the truth straight from the
processor itself, whether or not we
-
experience a cache miss. So, how do we
implement these attacks, where do we go
-
from here? We have all these ideas; we
have ways to make these TrustZone attacks
-
more powerful, but that's not worthwhile,
unless we actually implement them. So, the
-
goal here is to implement these attacks on
TrustZone and since typically the non-
-
secure world operating system is based on
Linux, we'll take that into account when
-
making our implementation. So, we'll write
a kernel module that uses these
-
performance counters and these inter-
processor interrupts, in order to actually
-
accomplish these attacks; and we'll write
it in such a way that it's very
-
generalizable. So you can take this kernel
module that was written for one device
-
-- in my case I focused most of my attention
on the Nexus 5X -- and it's very easy to
-
transfer this module to any other Linux-
based device that has TrustZone and has
-
these shared caches, so it should be very
easy to port this over and to perform
-
these same powerful cache attacks on
different platforms. We can also do clever
-
things based on the Linux operating
system, so that we limit that collection
-
window to just when we're executing within
the secure world, so we can align our
-
traces a lot more easily that way. And the
end result is having a synchronized trace
-
for each of the different attacks, because, since
we've written in a modular way, we're able
-
to run different attacks simultaneously.
So, maybe we're running one prime-and-
-
probe attack on the L1 data cache, to
learn where the victim is accessing
-
memory, and we're simultaneously running
an attack on the L1 instruction cache, so
-
we can see what instructions the victim is
executing. And these can be aligned. So,
-
the tool that I've written is a
combination of a kernel module which
-
actually performs this attack, a userland
binary which schedules these processes to
-
different cores, and a GUI that will allow
you to interact with this kernel module
-
and rapidly start doing these cache
attacks for yourself and perform them
-
against different processes and secure
code and secure world code. So, the
-
intention behind this tool is to be very
generalizable to make it very easy to use
-
this platform for different devices and to
allow people a way to, once again, quickly
-
develop these attacks; and also to see if
their own code is vulnerable to these
-
cache attacks, to see if their code has
these secret-dependent memory accesses.
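To make concrete what a secret-dependent memory access looks like (an illustration, not code from the tool): the first function below leaks which cache line it loads, and therefore part of the secret, to exactly these attacks, while the second touches every line regardless of the secret.

```c
#include <stdint.h>

/* Leaky: the table index, and therefore the cache set that gets touched,
 * depends on secret data -- exactly what a Prime+Probe trace reveals. */
static uint8_t leaky_lookup(const uint8_t table[256], uint8_t secret)
{
    return table[secret];
}

/* Constant-time variant: every entry is read no matter what the secret is,
 * so the access pattern carries no information about it. */
static uint8_t constant_time_lookup(const uint8_t table[256], uint8_t secret)
{
    uint8_t result = 0;
    for (unsigned i = 0; i < 256; i++) {
        uint8_t mask = (uint8_t)-(uint8_t)(i == secret); /* 0xFF on match, else 0 */
        result |= table[i] & mask;
    }
    return result;
}
```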
-
So, can we get even better... spatial
resolution? Right now, we're down to 64
-
bytes and that's the size of a cache line,
which is the size of our shared hardware.
-
And on SGX, we actually can get better
than 64 bytes, based on something called a
-
branch-shadowing attack. So, a branch-
shadowing attack takes advantage of
-
something called the branch target buffer.
And the branch target buffer is a
-
structure that's used for branch
prediction. It's similar to a cache, but
-
there's a key difference where the branch
target buffer doesn't compare the full
-
address, when seeing if something is
already in the cache or not: It doesn't
-
compare all of the upper level bits. So,
that means that it's possible that two
-
different addresses will experience a
collision, and the same entry from that
-
BTB cache will be read out for an improper
address. Now, since this is just for
-
branch prediction, the worst that can
happen is, you'll get a misprediction and
-
a small time penalty, but that's about it.
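A toy model of why those collisions happen: the BTB indexes and tags on only part of the branch address, so two branches whose addresses agree in those bits alias to the same entry even though their full addresses differ. The set count and 16-byte granularity are the Nexus 5X numbers given later in the talk; the tag width is a made-up placeholder.

```c
#include <stdint.h>

#define BTB_SETS     2048   /* as measured later in the talk on the Nexus 5X */
#define BTB_GRANULE    16   /* bytes per set on that device */
#define BTB_TAG_BITS    8   /* hypothetical: only a truncated tag is stored */

static unsigned btb_set(uintptr_t pc)
{
    return (pc / BTB_GRANULE) % BTB_SETS;
}

static unsigned btb_tag(uintptr_t pc)
{
    /* Address bits above the truncated tag are simply not compared. */
    return (pc / ((uintptr_t)BTB_GRANULE * BTB_SETS)) & ((1u << BTB_TAG_BITS) - 1);
}

/* Two branches collide when set and truncated tag match, even though their
 * full addresses differ -- the aliasing that branch shadowing exploits. */
static int btb_collides(uintptr_t a, uintptr_t b)
{
    return btb_set(a) == btb_set(b) && btb_tag(a) == btb_tag(b);
}
```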
The idea behind the branch-shadowing
-
attack is leveraging the small difference
in this overlapping and this collision of
-
addresses in order to sort of execute a
shared-code-style flush-and-reload attack
-
on the branch target buffer. So, here what
goes on is, during the attack the attacker
-
modifies the SGX Enclave to make sure that
the branches that are within the Enclave
-
will collide with branches that are not in
the Enclave. The attacker executes the
-
Enclave code and then the attacker
executes their own code and based on the
-
outcome of the victim code in that
cache, the attacker code may or may not
-
experience a branch misprediction. So, the
attacker is able to tell the outcome of a
-
branch, because of this overlap in this
collision, like it would be in a flush-and-
-
reload attack, where those memories
overlap between the attacker and the
-
victim. So here, our spatial resolution is
fantastic: We can tell down to individual
-
branch instructions in SGX; we can tell
exactly, which branches were executed and
-
which directions they were taken, in the
case of conditional branches. The temporal
-
resolution is also, once again, almost
unlimited, because we can use the same
-
timer interrupts in order to schedule our
process, our attacker process. And the
-
noise is, once again, very low, because we
can, once again, use the same sort of
-
branch misprediction counters, that exist
in the Intel world, in order to measure
-
this noise. So, does anything of that
apply to the TrustZone attacks? Well, in
-
this case the victim and attacker don't
share entries in the branch target buffer,
-
because the attacker is not able to map
the virtual address of the victim process.
-
But this is kind of reminiscent of our
earlier cache attacks, so our flush-and-
-
reload attack only worked when the attacker
and the victim shared that memory, but we
-
still have the prime-and-probe attack for
when they don't. So, what if we use a
-
prime-and-probe-style attack on the branch
target buffer cache in ARM processors? So,
-
essentially what we do here is, we prime
the branch target buffer by executing many
-
attacker branches to sort of fill up this
BTB cache with the attacker branch
-
prediction data; we let the victim execute
a branch which will evict an attacker BTB
-
entry; and then we have the attacker re-
execute those branches and see if there
-
have been any mispredictions. So now, the
cool thing about this attack is, the
-
structure of the BTB cache is different
from that of the L1 caches. So, instead of
-
having 256 different sets in the L1 cache,
the BTB cache has 2048 different sets, so
-
we can tell which branch was taken, based
on which one of 2048 different set IDs
-
it falls into. And even more
than that, on the ARM platform, at least
-
on the Nexus 5x that I was working with,
the granularity is no longer 64 bytes,
-
which is the size of the line, it's now 16
bytes. So, we can see which branches the
-
the trusted code within TrustZone is
executing within 16 bytes. So, what does
-
this look like? So, previously with the
TruSpy attack, this is sort of the
-
outcome of our prime-and-probe attack: We
get 1 measurement for those 256 different
-
set IDs. When we added those interrupts,
we're able to get that time resolution,
-
and it looks something like this. Now,
maybe you can see a little bit at the top
-
of the screen, how there's these repeated
sections of little white blocks, and you
-
can sort of use that to infer, maybe
there's the same cache line and cache
-
instructions that are called over and
over. So, just looking at this L1-I cache
-
attack, you can tell some information
about how the process went. Now, let's
-
compare that to the BTB attack. And I
don't know if you can see too clearly --
-
it's a bit too high a resolution
right now -- so let's just focus in on one
-
small part of this overall trace. And this
is what it looks like. So, each of those
-
white pixels represents a branch that was
taken by that secure-world code and we can
-
see repeated patterns, we can see maybe
different functions that were called, we
-
can see different loops. And just by
looking at this 1 trace, we can infer a
-
lot of information on how that secure
world executed. So, it's incredibly
-
powerful and all of those secrets are just
waiting to be uncovered using these new
-
tools. So, where do we go from here? What
sort of countermeasures do we have? Well,
-
first of all I think, the long term
solution is going to be moving to no more
-
shared hardware. We need to have separate
hardware and no more shared caches in
-
order to fully get rid of these different
cache attacks. And we've already seen this
-
trend in different cell phones. So, for
example, in Apple SoCs for a long time now
-
-- I think since the Apple A7 -- the
secure Enclave, which runs the secure
-
code, has its own cache. So, these cache
attacks can't be accomplished from code
-
outside of that secure Enclave. So, just
by using that separate hardware, it knocks
-
out a whole class of different potential
side-channel and microarchitectural
-
attacks. And just recently, the Pixel 2 is
moving in the same direction. The Pixel 2
-
now includes a hardware security module
that performs cryptographic operations;
-
and that chip also has its own memory and
its own caches, so now we can no longer
-
use this attack to extract information
about what's going on in this external
-
hardware security module. But even then,
using this separate hardware, that doesn't
-
solve all of our problems. Because we
still have the question of "What do we
-
include in this separate hardware?" On the
one hand, we want to include more code in
-
that separate hardware, so we're less
vulnerable to these side-channel attacks,
-
but on the other hand, we don't want to
expand the attack surface any further. Because
-
the more code we include in these secure
environments, the more likely it is that a
-
vulnerability will be found and the
attacker will be able to get a foothold
-
within the secure, trusted environment.
So, there's going to be a balance between
-
what do you choose to include in the
separate hardware and what you don't. So,
-
do you include DRM code? Do you include
cryptographic code? It's still an open
-
question. And that's sort of the long-term
approach. In the short term, you just kind
-
of have to write side-channel-free
software: Just be very careful about what
-
your process does, if there are any
secret-dependent memory accesses or
-
secret-dependent branching or secret-
dependent function calls, because any of
-
those can leak the secrets out of your
trusted execution environment. So, here
-
are the things that, if you are a
developer of trusted execution environment
-
code, I want you to keep in mind:
First of all, performance is very often at
-
odds with security. We've seen over and
over that the performance enhancements to
-
these processors open up the ability for
these microarchitectural attacks to be
-
more efficient. Additionally, these
trusted execution environments don't
-
protect against everything; there are
still these side-channel attacks and these
-
microarchitectural attacks that these
systems are vulnerable to. These attacks
-
are very powerful; they can be
accomplished simply; and with the
-
publication of the code that I've written,
it should be very simple to get set up and
-
to analyze your own code to see "Am I
vulnerable, do I expose information in the
-
same way?" And lastly, it only takes 1
small error, 1 tiny leak from your trusted
-
and secure code, in order to extract the
entire secret, in order to bring the whole
-
thing down. So, what I want to leave you
with is: I want you to remember that you
-
are responsible for making sure that your
program is not vulnerable to these
-
microarchitectural attacks, because if you
do not take responsibility for this, who
-
will? Thank you!
-
Applause
-
Herald: Thank you very much. Please, if
you want to leave the hall, please do it
-
quietly and take all your belongings with
you and respect the speaker. We have
-
plenty of time, 16, 17 minutes for Q&A, so
please line up on the microphones. No
-
questions from the signal angel, all
right. So, we can start with microphone 6,
-
please.
Mic 6: Okay. There was a symbol for secure
-
OSes in the ARM TrustZone diagram. What is the
idea behind them if the non-secure OS gets all the
-
interrupts? What is
the secure OS for?
-
Keegan: Yeah so, in the ARMv8 there are a
couple different kinds of interrupts. So,
-
I think -- if I'm remembering the
terminology correctly -- there is an IRQ
-
and an FIQ interrupt. So, the non-secure
mode handles the IRQ interrupts and the
-
secure mode handles the FIQ interrupts.
So, depending on which one you send, it
-
will depend on which direction that
monitor will direct that interrupt.
-
Mic 6: Thank you.
Herald: Okay, thank you. Microphone number
-
7, please.
Mic 7: Does any of your present attacks on
-
TrustZone also apply to the AMD
implementation of TrustZone or are you
-
looking into it?
Keegan: I haven't looked into AMD too
-
much, because, as far as I can tell,
that's not used as commonly, but there are
-
many different types of trusted execution
environments. The 2 that I focus on were
-
SGX and TrustZone, because those are the
most common examples that I've seen.
-
Herald: Thank you. Microphone
number 8, please.
-
Mic 8: When TrustZone is moved to
dedicated hardware, dedicated memory,
-
couldn't you replicate the userspace
attacks by loading your own trusted
-
userspace app and use it as an
oracle of some sorts?
-
Keegan: If you can load your own trusted
code, then yes, you could do that. But in
-
many of the models I've seen today, that's
not possible. So, that's why you have
-
things like code signing, which prevent
the arbitrary user from running their own
-
code in the trusted OS... or in the
trusted environment.
-
Herald: All right. Microphone number 1.
Mic 1: So, these attacks are more powerful
-
against code that's running in trusted
execution environments than similar
-
attacks would be against ring-3 code, or,
in general, trusted code. Does that mean
-
that trusted execution environments are
basically an attractive nuisance that we
-
shouldn't use?
Keegan: There's still a large benefit to
-
using these trusted execution
environments. The point I want to get
-
across is that, although they add a lot of
features, they don't protect against
-
everything, so you should keep in mind
that these side-channel attacks do still
-
exist and you still need to protect
against them. But overall, these are
-
better things and worthwhile in including.
Herald: Thank you. Microphone number 1
-
again, please
Mic 1: So, AMD is doing something with
-
encrypting memory and I'm not sure if they
encrypt addresses, too, but would that
-
be a defense against such attacks?
Keegan: So, I'm not too familiar with AMD,
-
but SGX also encrypts memory. It encrypts
it in between the lowest-level cache and
-
the main memory. But that doesn't really
have an impact on the actual operation,
-
because the memories encrypt at the cache
line level and as the attacker, we don't
-
care what that data is within that cache
line, we only care which cache line is
-
being accessed.
Mic 1: If you encrypt addresses, wouldn't
-
that help against that?
Keegan: I'm not sure, how you would
-
encrypt the addresses yourself. As long as
those addresses map into the same set IDs
-
that the victim can map into, then the
victim could still pull off the same style
-
of attacks.
Herald: Great. We have a question from the
-
internet, please.
Signal Angel: The question is "Does the
-
secure enclave on the Samsung Exynos
distinguish the receiver of the message, so
-
that if the user application asked to
decode an AES message, can one sniff
-
the value that the secure
enclave returns?"
-
Keegan: So, that sounds like it's asking
about the TruSpy-style attack, where
-
it's calling to the secure world to
encrypt something with AES. I think, that
-
would all depend on the different
implementation: As long as it's encrypting
-
for a certain key and it's able to do that
repeatably, then the attack would,
-
assuming a vulnerable AES implementation,
would be able to extract that key out.
-
Herald: Cool. Microphone number 2, please.
Mic 2: Do you recommend a reference to
-
understand how these cache line attacks
and branch oracles actually lead to key
-
recovery?
Keegan: Yeah. So, I will flip through
-
these pages which include a lot of the
references for the attacks that I've
-
mentioned, so if you're watching the
video, you can see these right away or
-
just access the slides. And a lot of these
contain good starting points. So, I didn't
-
go into a lot of the details on how, for
example, the TruSpy attack recovered
-
that AES key, but that paper does have a
lot of good links on how those attacks can
-
lead to key recovery. Same thing with the
CLKSCREW attack, how the different fault
-
injection can lead to key recovery.
Herald: Microphone number 6, please.
-
Mic 6: I think my question might have been
very, almost the same thing: How hard is
-
it actually to recover the keys? Is this
like a massive machine learning problem or
-
is this something that you can do
practically on a single machine?
-
Keegan: It varies entirely by the end
implementation. So, for all these attacks
-
to work, you need to have some sort of
vulnerable implementation and some
-
implementations leak more data than
others. In the case of a lot of the AES
-
attacks, where you're doing the passive
attacks, those are very easy to do on just
-
your own computer. For the AES fault
injection attack, I think that one
-
required more brute force, in the CLKSCREW
paper, so that one required more computing
-
resources, but still, it was entirely
practical to do in a realistic setting.
-
Herald: Cool, thank you. So, we have one
more: Microphone number 1, please.
-
Mic 1: So, I hope it's not a too naive
question, but I was wondering, since all
-
these attacks are based on cache hits and
misses, isn't it possible to forcibly
-
flush or invalidate or insert noise in
the cache after each operation in this trusted
-
environment, in order to mess up the
guesswork of the attacker? So, discarding
-
optimization and performance for
additional security benefits.
-
Keegan: Yeah, and that is absolutely
possible and you are absolutely right: It
-
does lead to a performance degradation,
because if you always flush the entire
-
cache every time you do a context switch,
that will be a huge performance hit. So
-
again, that comes down to the question of
the performance and security trade-off:
-
Which one do you end up going with? And it
seems historically the choice has been
-
more in the direction of performance.
Mic 1: Thank you.
-
Herald: But we have one more: Microphone
number 1, please.
-
Mic 1: So, I have more of a moral
question: So, how well should we really
-
protect from attacks which need some
ring-0 cooperation? Because, basically,
-
when we use TrustZone for a purpose we
would see as clear, like protecting the
-
browser from interacting with the outside
world, then we are basically using the
-
safe execution environment for sandboxing
the process. But once we need some
-
cooperation from the kernel, some of these
attacks, in fact, empower the user
-
instead of the hardware producer.
Keegan: Yeah, and you're right. It
-
depends entirely on what your application
is and what your threat model is that
-
you're looking at. So, if you're using
these trusted execution environments to do
-
DRM, for example, then maybe you wouldn't
be worried about that ring-0 attack or
-
that privileged attacker who has their
phone rooted and is trying to recover
-
these media encryption keys from this
execution environment. But maybe there are
-
other scenarios where you're not as
worried about having an attack with a
-
compromised ring 0. So, it entirely
depends on context.
-
Herald: Alright, thank you. So, we have
one more: Microphone number 1, again.
-
Mic 1: Hey there. Great talk, thank you
very much.
-
Keegan: Thank you.
Mic 1: Just a short question: Do you have
-
any success stories about attacking the
TrustZone and the different
-
implementations of TE with some vendors
like some OEMs creating phones and stuff?
-
Keegan: Not that I'm announcing
at this time.
-
Herald: So, thank you very much. Please,
again a warm round of applause for Keegan!
-
Applause
-
34c3 postroll music
-
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!