Return to Video

34C3 - Microarchitectural Attacks on Trusted Execution Environments

  • 0:00 - 0:15
    34C3 preroll music
  • 0:15 - 0:23
    Herald: Hello fellow creatures.
    Welcome and
  • 0:23 - 0:30
    I wanna start with a question.
    Another one: Who do we trust?
  • 0:30 - 0:36
    Do we trust the TrustZones
    on our smartphones?
  • 0:36 - 0:42
    Well Keegan Ryan, he's really
    fortunate to be here and
  • 0:42 - 0:52
    he was inspired by another talk from the
    CCC before -- I think it was 29C3 -- and his
  • 0:52 - 0:58
    research on smartphones and systems on a
    chip used in smartphones will answer
  • 0:58 - 1:03
    these questions of whether you can trust those
    trusted execution environments. Please
  • 1:03 - 1:06
    give a warm round of applause
    to Keegan and enjoy!
  • 1:06 - 1:11
    Applause
  • 1:11 - 1:14
    Keegan Ryan: All right, thank you! So I'm
    Keegan Ryan, I'm a consultant with NCC
  • 1:14 - 1:20
    Group and this is microarchitectural
    attacks on Trusted Execution Environments.
  • 1:20 - 1:23
    So, in order to understand what a Trusted
    Execution Environment is we need to go
  • 1:23 - 1:30
    back into processor security, specifically
    on x86. So as many of you are probably
  • 1:30 - 1:34
    aware there are a couple different modes
    which we can execute code under in x86
  • 1:34 - 1:39
    processors and that includes ring 3, which
    is the user code and the applications, and
  • 1:39 - 1:46
    also ring 0 which is the kernel code. Now
    there's also a ring 1 and ring 2 that are
  • 1:46 - 1:50
    supposedly used for drivers or guest
    operating systems but really it just boils
  • 1:50 - 1:56
    down to ring 0 and ring 3. And in this
    diagram we have here we see that privilege
  • 1:56 - 2:02
    increases as we go up the diagram, so ring
    0 is the most privileged ring and ring 3
  • 2:02 - 2:05
    is the least privileged ring. So all of
    our secrets, all of our sensitive
  • 2:05 - 2:10
    information, all of the attacker's goals
    are in ring 0 and the attacker is trying
  • 2:10 - 2:16
    to access those from the unprivileged
    world of ring 3. Now you may have a
  • 2:16 - 2:20
    question what if I want to add a processor
    feature that I don't want ring 0 to be
  • 2:20 - 2:26
    able to access? Well then you add ring -1
    which is often used for a hypervisor. Now
  • 2:26 - 2:31
    the hypervisor has all the secrets and the
    hypervisor can manage different guest
  • 2:31 - 2:36
    operating systems and each of these guest
    operating systems can execute in ring 0
  • 2:36 - 2:41
    without having any idea of the other
    operating systems. So this way now the
  • 2:41 - 2:45
    secrets are all in ring -1 so now the
    attacker's goals have shifted from ring 0
  • 2:45 - 2:51
    to ring -1. The attacker has to attack
    ring -1 from a less privileged ring and
  • 2:51 - 2:55
    tries to access those secrets. But what if
    you want to add a processor feature that
  • 2:55 - 3:01
    you don't want ring -1 to be able to
    access? So you add ring -2 which is System
  • 3:01 - 3:05
    Management Mode and that's capable of
    monitoring power, directly interfacing
  • 3:05 - 3:10
    with firmware and other chips on a
    motherboard and it's able to access and do
  • 3:10 - 3:14
    a lot of things that the hypervisor is not
    able to and now all of your secrets and
  • 3:14 - 3:18
    all of your attacker goals are in ring -2
    and the attacker has to attack those from
  • 3:18 - 3:22
    a less privileged ring. Now maybe you want
    to add something to your processor that
  • 3:22 - 3:27
    you don't want ring -2 to be able to access,
    so you add ring -3 and I think you get the
  • 3:27 - 3:31
    picture now. And we just keep on adding
    more and more privilege rings and keep
  • 3:31 - 3:35
    putting our secrets and our attacker's
    goals in these higher and higher
  • 3:35 - 3:41
    privileged rings but what if we're
    thinking about it wrong? What if instead
  • 3:41 - 3:47
    we want to put all the secrets in the
    least privileged ring? So this is sort of
  • 3:47 - 3:51
    the idea behind SGX and it's useful for
    things like DRM where you want to run
  • 3:51 - 3:57
    ring 3 code but have sensitive secrets or
    other signing capabilities running in
  • 3:57 - 4:02
    ring 3. But this picture is getting a
    little bit complicated, this diagram is a
  • 4:02 - 4:06
    little bit complex so let's simplify it a
    little bit. We'll only be looking at ring
  • 4:06 - 4:12
    0 through ring 3 which is the kernel, the
    userland and the SGX enclave which also
  • 4:12 - 4:17
    executes in ring 3. Now when you're
    executing code in the SGX enclave you
  • 4:17 - 4:22
    first load the code into the enclave and
    then from that point on you trust the
  • 4:22 - 4:27
    execution of whatever's going on in that
    enclave. You trust that the other elements
  • 4:27 - 4:32
    the kernel, the userland, the other rings
    are not going to be able to access what's
  • 4:32 - 4:38
    in that enclave so you've made your
    Trusted Execution Environment. This is a
  • 4:38 - 4:45
    bit of a weird model because now your
    attacker is in the ring 0 kernel and your
  • 4:45 - 4:49
    target victim here is in ring 3. So
    instead of the attacker trying to move up
  • 4:49 - 4:54
    the privilege chain, the attacker is
    trying to move down. Which is pretty
  • 4:54 - 4:58
    strange and you might have some questions
    like "under this model who handles memory
  • 4:58 - 5:01
    management?" because traditionally that's
    something that ring 0 would manage and
  • 5:01 - 5:05
    ring 0 would be responsible for paging
    memory in and out for different processes
  • 5:05 - 5:10
    and different code that's executing in
    ring 3. But on the other hand you don't
  • 5:10 - 5:16
    want that to happen with the SGX enclave
    because what if the malicious ring 0 adds
  • 5:16 - 5:22
    a page to the enclave that the enclave
    doesn't expect? So in order to solve this
  • 5:22 - 5:29
    problem, SGX does allow ring 0 to handle
    page faults. But simultaneously and in
  • 5:29 - 5:35
    parallel it verifies every memory load to
    make sure that no access violations are
  • 5:35 - 5:40
    made so that all the SGX memory is safe.
    So it allows ring 0 to do its job but it
  • 5:40 - 5:45
    sort of watches over at the same time to
    make sure that nothing is messed up. So
  • 5:45 - 5:51
    it's a bit of a weird convoluted solution
    to a strange inverted problem but it works
  • 5:51 - 5:58
    and that's essentially how SGX works and
    the idea behind SGX. Now we can look at
  • 5:58 - 6:03
    x86 and we can see that ARMv8 is
    constructed in a similar way but it
  • 6:03 - 6:08
    improves on x86 in a couple key ways. So
    first of all ARMv8 gets rid of ring 1 and
  • 6:08 - 6:12
    ring 2 so you don't have to worry about
    those and it just has different privilege
  • 6:12 - 6:17
    levels for userland and the kernel. And
    these different privilege levels are
  • 6:17 - 6:22
    called exception levels in the ARM
    terminology. And the second thing that ARM
  • 6:22 - 6:26
    gets right compared to x86 is that instead
    of starting at 3 and counting down as
  • 6:26 - 6:31
    privilege goes up, ARM starts at 0 and
    counts up so we don't have to worry about
  • 6:31 - 6:36
    negative numbers anymore. Now when we add
    the next privilege level, the hypervisor, we
  • 6:36 - 6:41
    call it exception level 2 and the next one
    after that is the monitor in exception
  • 6:41 - 6:47
    level 3. So at this point we still want to
    have the ability to run trusted code in
  • 6:47 - 6:53
    exception level 0 the least privileged
    level of the ARMv8 processor. So in order
  • 6:53 - 6:59
    to support this we need to separate this
    diagram into two different sections. In
  • 6:59 - 7:04
    ARMv8 these are called the secure world
    and the non-secure world. So we have the
  • 7:04 - 7:08
    non-secure world on the left in blue that
    consists of the userland, the kernel and
  • 7:08 - 7:12
    the hypervisor and we have the secure
    world on the right which consists of the
  • 7:12 - 7:17
    monitor in exception level 3, a trusted
    operating system in exception level 1 and
  • 7:17 - 7:23
    trusted applications in exception level 0.
    So the idea is that if you run anything in
  • 7:23 - 7:27
    the secure world, it should not be
    accessible or modifiable by anything in
  • 7:27 - 7:32
    the non-secure world. So that's what our
    attacker is trying to access. The
  • 7:32 - 7:36
    attacker has access to the non-secure
    kernel, which is often Linux, and they're
  • 7:36 - 7:40
    trying to go after the trusted apps. So
    once again we have this weird inversion
  • 7:40 - 7:43
    where we're trying to go from a more
    privileged level to a less privileged
  • 7:43 - 7:48
    level and trying to extract secrets in
    that way. So the question that arises when
  • 7:48 - 7:53
    using these Trusted Execution Environments
    that are implemented in SGX and TrustZone
  • 7:53 - 7:58
    in ARM is "can we use these privileged
    modes and our privileged access in order to
  • 7:58 - 8:03
    attack these Trusted Execution
    Environments?". Now, to answer that question,
  • 8:03 - 8:06
    we can start looking at a few
    different research papers. The first one
  • 8:06 - 8:11
    that I want to go into is one called
    CLKSCREW and it's an attack on TrustZone.
  • 8:11 - 8:14
    So throughout this presentation I'm going
    to go through a few different papers and
  • 8:14 - 8:18
    just to make it clear which papers have
    already been published and which ones are
  • 8:18 - 8:21
    old, I'll include the citations in the
    upper right hand corner so that way you
  • 8:21 - 8:27
    can tell what's old and what's new. And as
    far as papers go this CLKSCREW paper is
  • 8:27 - 8:31
    relatively new. It was released in 2017.
    And the way CLKSCREW works is it takes
  • 8:31 - 8:38
    advantage of the energy management
    features of a processor. So a non-secure
  • 8:38 - 8:42
    operating system has the ability to manage
    the energy consumption of the different
  • 8:42 - 8:48
    cores. So if a certain target core doesn't
    have much scheduled to do then the
  • 8:48 - 8:52
    operating system is able to scale back
    that voltage or dial down the frequency on
  • 8:52 - 8:56
    that core so that core uses less energy
    which is a great thing for performance: it
  • 8:56 - 9:01
    really extends battery life, it makes
    the cores last longer and it gives better
  • 9:01 - 9:07
    performance overall. But the problem here
    is what if you have two separate cores and
  • 9:07 - 9:12
    one of your cores is running this non-
    trusted operating system and the other
  • 9:12 - 9:16
    core is running code in the secure world?
    It's running that trusted code, those
  • 9:16 - 9:21
    trusted applications. So that non-secure
    operating system can still dial down that
  • 9:21 - 9:26
    voltage and it can still change that
    frequency and those changes will affect
  • 9:26 - 9:31
    the secure world code. So what the
    CLKSCREW attack does is the non-secure
  • 9:31 - 9:36
    operating system core will dial down the
    voltage, it will overclock the frequency
  • 9:36 - 9:41
    on the target secure world core in order
    to induce faults to make sure to make the
  • 9:41 - 9:46
    computation on that core fail in some way
    and when that computation fails you get
  • 9:46 - 9:50
    certain cryptographic errors that the
    attack can use to infer things like secret
  • 9:50 - 9:56
    keys, secret AES keys and to bypass code
    signing implemented in the secure world.
  • 9:56 - 10:00
    So it's a very powerful attack that's made
    possible because the non-secure operating
  • 10:00 - 10:06
    system is privileged enough in order to
    use these energy management features. Now
  • 10:06 - 10:10
    CLKSCREW is an example of an active attack
    where the attacker is actively changing
  • 10:10 - 10:15
    the outcome of the victim code of that
    code in the secure world. But what about
  • 10:15 - 10:21
    passive attacks? So in a passive attack,
    the attacker does not modify the actual
  • 10:21 - 10:25
    outcome of the process. The attacker just
    tries to monitor that process and infer what's
  • 10:25 - 10:29
    going on and that is the sort of attack
    that we'll be considering for the rest of
  • 10:29 - 10:36
    the presentation. So in a lot of SGX and
    TrustZone implementations, the trusted and
  • 10:36 - 10:40
    the non-trusted code both share the same
    hardware and this shared hardware could be
  • 10:40 - 10:46
    a shared cache, it could be a branch
    predictor, it could be a TLB. The point is
  • 10:46 - 10:53
    that they share the same hardware so that
    the changes made by the secure code may be
  • 10:53 - 10:57
    reflected in the behavior of the non-
    secure code. So the trusted code might
  • 10:57 - 11:02
    execute, change the state of that shared
    cache for example and then the untrusted
  • 11:02 - 11:07
    code may be able to go in, see the changes
    in that cache and infer information about
  • 11:07 - 11:12
    the behavior of the secure code. So that's
    essentially how our side channel attacks
  • 11:12 - 11:16
    are going to work. If the non-secure code
    is going to monitor these shared hardware
  • 11:16 - 11:23
    resources for state changes that reflect
    the behavior of the secure code. Now we've
  • 11:23 - 11:28
    already talked about how Intel and SGX address
    the problem of memory management and who's
  • 11:28 - 11:33
    responsible for making sure that those
    attacks don't work on SGX. So what do they
  • 11:33 - 11:37
    have to say on how they protect against
    these side channel attacks and attacks on
  • 11:37 - 11:45
    this shared cache hardware? They don't...
    at all. They essentially say "we do not
  • 11:45 - 11:49
    consider this part of our threat model. It
    is up to the developer to implement the
  • 11:49 - 11:54
    protections needed to protect against
    these side-channel attacks". Which is
  • 11:54 - 11:57
    great news for us because these side
    channel attacks can be very powerful and
  • 11:57 - 12:00
    if there aren't any hardware features that
    are necessarily stopping us from being
  • 12:00 - 12:07
    able to accomplish our goal it makes us
    that much more likely to succeed. So with that
  • 12:07 - 12:11
    we can sort of take a step back from TrustZone
    and SGX and just take a look at
  • 12:11 - 12:15
    cache attacks to make sure that we all
    have the same understanding of how the
  • 12:15 - 12:20
    cache attacks will be applied to these
    Trusted Execution Environments. To start
  • 12:20 - 12:26
    that let's go over a brief recap of how a
    cache works. So caches are necessary in
  • 12:26 - 12:30
    processors because accessing the main
    memory is slow. When you try to access
  • 12:30 - 12:34
    something from the main memory it takes a
    while to be read into the processor. So the
  • 12:34 - 12:40
    cache exists as sort of a layer to
    remember what that information is so if
  • 12:40 - 12:45
    the processor ever needs information from
    that same address it just reloads it from
  • 12:45 - 12:50
    the cache and that access is going to be
    fast. So it really speeds up the memory
  • 12:50 - 12:56
    access for repeated accesses to the same
    address. And then if we try to access a
  • 12:56 - 13:00
    different address then that will also be
    read into the cache, slowly at first but
  • 13:00 - 13:07
    then quickly for repeated accesses and so
    on and so forth. Now as you can probably
  • 13:07 - 13:11
    tell from all of these examples the memory
    blocks have been moving horizontally
  • 13:11 - 13:16
    they've always been staying in the same
    row. And that is reflective of the idea of
  • 13:16 - 13:20
    sets in a cache. So there are a number of
    different set IDs and that corresponds to
  • 13:20 - 13:24
    the different rows in this diagram. So for
    our example there are four different set
  • 13:24 - 13:31
    IDs and each address in the main memory
    maps to a particular set ID. So that
  • 13:31 - 13:35
    address in main memory will only go into
    that location in the cache with the same
  • 13:35 - 13:40
    set ID so it will only travel along those
    rows. So that means if you have two
  • 13:40 - 13:43
    different blocks of memory that mapped to
    different set IDs they're not going to
  • 13:43 - 13:49
    interfere with each other in the cache.
    But that raises the question "what about
  • 13:49 - 13:53
    two memory blocks that do map to the same
    set ID?". Well if there's room in the
  • 13:53 - 13:59
    cache then the same thing will happen as
    before: those memory contents will be
  • 13:59 - 14:04
    loaded into the cache and then retrieved
    from the cache for future accesses. And
  • 14:04 - 14:08
    the number of possible entries for a
    particular set ID within a cache is called
  • 14:08 - 14:12
    the associativity. And on this diagram
    that's represented by the number of
  • 14:12 - 14:17
    columns in the cache. So we will call our
    cache in this example a 2-way set-
  • 14:17 - 14:22
    associative cache. Now the next question
    is "what happens if you try to read a
  • 14:22 - 14:27
    memory address that maps to the same set ID
    but all of those entries within that set ID
  • 14:27 - 14:33
    within the cache are full?". Well one of
    those entries is chosen, it's evicted from
  • 14:33 - 14:39
    the cache, the new memory is read in and
    then that's fed to the processor. So it
  • 14:39 - 14:44
    doesn't really matter how the cache entry
    that you're evicting is chosen; for the
  • 14:44 - 14:48
    purposes of this presentation you can just
    assume that it's random. But the important
  • 14:48 - 14:52
    thing is that if you try to access that
    same memory that was evicted before you're
  • 14:52 - 14:56
    now going to have to wait for that time
    penalty for that to be reloaded into the
  • 14:56 - 15:01
    cache and read into the processor. So those
    are caches in a nutshell, particularly
  • 15:01 - 15:06
    set-associative caches. Now we can begin
    looking at the different types of cache
  • 15:06 - 15:09
    attacks. So for a cache attack we have two
    different processes we have an attacker
  • 15:09 - 15:14
    process and a victim process. For this
    type of attack that we're considering both
  • 15:14 - 15:17
    of them share the same underlying code so
    they're trying to access the same
  • 15:17 - 15:22
    resources which could be the case if you
    have page deduplication in virtual
  • 15:22 - 15:26
    machines or if you have copy-on-write
    mechanisms for shared code and shared
  • 15:26 - 15:32
    libraries. But the point is that they
    share the same underlying memory. Now the
  • 15:32 - 15:36
    Flush and Reload Attack works in two
    stages for the attacker. The attacker
  • 15:36 - 15:39
    first starts by flushing out the cache.
    They flush each and every address in the
  • 15:39 - 15:44
    cache so the cache is just empty. Then the
    attacker lets the victim execute for a
  • 15:44 - 15:49
    small amount of time so the victim might
    read an address from main memory,
  • 15:49 - 15:53
    loading that into the cache and then the
    second stage of the attack is the reload
  • 15:53 - 15:58
    phase. In the reload phase the attacker
    tries to load different memory addresses
  • 15:58 - 16:04
    from main memory and see if those entries
    are in the cache or not. Here the attacker
  • 16:04 - 16:09
    will first try to load address 0 and see
    that because it takes a long time to read
  • 16:09 - 16:14
    the contents of address 0 the attacker can
    infer that address 0 was not part of the
  • 16:14 - 16:17
    cache which makes sense because the
    attacker flushed it from the cache in the
  • 16:17 - 16:23
    first stage. The attacker then tries to
    read the memory at address 1 and sees that
  • 16:23 - 16:29
    this operation is fast so the attacker
    infers that the contents of address 1 are
  • 16:29 - 16:33
    in the cache and because the attacker
    flushed everything from the cache before
  • 16:33 - 16:37
    the victim executed, the attacker then
    concludes that the victim is responsible
  • 16:37 - 16:43
    for bringing address 1 into the cache.
    This Flush+Reload attack reveals which
  • 16:43 - 16:47
    memory addresses the victim accesses
    during that small slice of time. Then
  • 16:47 - 16:51
    after that reload phase, the attack
    repeats so the attacker flushes again
  • 16:51 - 16:58
    let's the victim execute, reloads again
    and so on. There's also a variant on the
  • 16:58 - 17:01
    Flush+Reload attack that's called the
    Flush+Flush attack which I'm not going to
  • 17:01 - 17:06
    go into the details of, but essentially
    it's the same idea. But instead of using
  • 17:06 - 17:09
    load instructions to determine whether or
    not a piece of memory is in the cache or
  • 17:09 - 17:14
    not, it uses flush instructions because
    flush instructions will take longer if
  • 17:14 - 17:19
    something is in the cache already. The
    important thing is that both the
  • 17:19 - 17:23
    Flush+Reload attack and the Flush+Flush
    attack rely on the attacker and the victim
  • 17:23 - 17:27
    sharing the same memory. But this isn't
    always the case so we need to consider
  • 17:27 - 17:31
    what happens when the attacker and the
    victim do not share memory. For this we
  • 17:31 - 17:36
    have the Prime+Probe attack. The
    Prime+Probe attack once again works in two
  • 17:36 - 17:40
    separate stages. In the first stage the
    attacker primes the cache by reading all
  • 17:40 - 17:44
    the attacker memory into the cache and
    then the attacker lets the victim execute
  • 17:44 - 17:50
    for a small amount of time. So no matter
    what the victim accesses from main memory
  • 17:50 - 17:54
    since the cache is full of the attacker
    data, one of those attacker entries will
  • 17:54 - 17:59
    be replaced by a victim entry. Then in the
    second phase of the attack, during the
  • 17:59 - 18:04
    probe phase, the attacker checks the
    different cache entries for particular set
  • 18:04 - 18:09
    IDs and sees if all of the attacker
    entries are still in the cache. So maybe
  • 18:09 - 18:13
    our attacker is curious about the last set
    ID, the bottom row, so the attacker first
  • 18:13 - 18:18
    tries to load the memory at address 3 and
    because this operation is fast the
  • 18:18 - 18:23
    attacker knows that address 3 is in the
    cache. The attacker tries the same thing
  • 18:23 - 18:28
    with address 7, sees that this operation
    is slow and infers that at some point
  • 18:28 - 18:33
    address 7 was evicted from the cache so
    the attacker knows that something had to
  • 18:33 - 18:37
    have evicted it from the cache, and it had to
    be the victim, so the attacker concludes
  • 18:37 - 18:43
    that the victim accessed something in that
    last set ID, that bottom row. The
  • 18:43 - 18:47
    attacker doesn't know if it was the
    contents of address 11 or the contents of
  • 18:47 - 18:51
    address 15 or even what those contents
    are, but the attacker has a good idea of
  • 18:51 - 18:57
    which set ID it was. So, the important
    things to remember about
  • 18:57 - 19:01
    cache attacks is that caches are very
    important, they're crucial for performance
  • 19:01 - 19:06
    on processors, they give a huge speed
    boost and there's a huge time difference
  • 19:06 - 19:12
    between having a cache and not having a
    cache for your executables. But the
  • 19:12 - 19:16
    downside to this is that big time
    difference also allows the attacker to
  • 19:16 - 19:22
    infer information about how the victim is
    using the cache. We're able to use these
  • 19:22 - 19:24
    cache attacks in the two different
    scenarios: where memory is shared, in
  • 19:24 - 19:28
    the case of the Flush+Reload and
    Flush+Flush attacks, and where
  • 19:28 - 19:32
    memory is not shared, in the case of the
    Prime+Probe attack. And finally the
  • 19:32 - 19:37
    important thing to keep in mind is that,
    for these cache attacks, we know where the
  • 19:37 - 19:40
    victim is looking, but we don't know what
    they see. So we don't know the contents of
  • 19:40 - 19:44
    the memory that the victim is actually
    seeing, we just know the location and the
  • 19:44 - 19:52
    addresses. So, what does an example trace
    of these attacks look like? Well, there's
  • 19:52 - 19:56
    an easy way to represent these as two-
    dimensional images. So in this image, we
  • 19:56 - 20:02
    have our horizontal axis as time, so each
    column in this image represents a
  • 20:02 - 20:07
    different time slice, a different
    iteration of the Prime, measure, and Probe cycle.
  • 20:07 - 20:11
    So, then we also have the vertical axis
    which is the different set IDs, which is
  • 20:11 - 20:18
    the location that's accessed by the victim
    process, and then here a pixel is white if
  • 20:18 - 20:24
    the victim accessed that set ID during
    that time slice. So, as you look from left
  • 20:24 - 20:28
    to right as time moves forward, you can
    sort of see the changes in the patterns of
  • 20:28 - 20:34
    the memory accesses made by the victim
    process. Now, for this particular example
  • 20:34 - 20:40
    the trace is captured on an execution of
    AES repeated several times, an AES
  • 20:40 - 20:45
    encryption repeated about 20 times. And
    you can tell that this is a repeated
  • 20:45 - 20:49
    action because you see the same repeated
    memory access patterns in the data, you
  • 20:49 - 20:55
    see the same structures repeated over and
    over. So, you know that this is reflecting
  • 20:55 - 21:01
    what's going on throughout time, but
    what does it have to do with AES itself?
  • 21:01 - 21:06
    Well, if we take the same trace with the
    same settings, but a different key, we see
  • 21:06 - 21:12
    that there is a different memory access
    pattern with different repetition within
  • 21:12 - 21:18
    the trace. So, only the key changed, the
    code didn't change. So, even though we're
  • 21:18 - 21:22
    not able to read the contents of the key
    directly using this cache attack, we know
  • 21:22 - 21:26
    that the key is changing these memory
    access patterns, and if we can see these
  • 21:26 - 21:31
    memory access patterns, then we can infer
    the key. So, that's the essential idea: we
  • 21:31 - 21:35
    want to make these images as clear as
    possible and as descriptive as possible so
  • 21:35 - 21:42
    we have the best chance of learning what
    those secrets are. And we can define the
  • 21:42 - 21:47
    metrics for what makes these cache attacks
    powerful in a few different ways. So, the
  • 21:47 - 21:52
    three ways we'll be looking at are spatial
    resolution, temporal resolution and noise.
  • 21:52 - 21:56
    So, spatial resolution refers to how
    accurately we can determine the where. If
  • 21:56 - 22:01
    we know which memory address the victim
    accessed to within 1,000 bytes, that's
  • 22:01 - 22:07
    obviously not as powerful as knowing where
    they accessed within 512 bytes. Temporal
  • 22:07 - 22:12
    resolution is similar, where we want to
    know the order of what accesses the victim
  • 22:12 - 22:18
    made. So if that time slice during our
    attack is 1 millisecond, we're going to
  • 22:18 - 22:22
    get much better ordering information on
    those memory access than we would get if
  • 22:22 - 22:27
    we only saw all the memory accesses over
    the course of one second. So the shorter
  • 22:27 - 22:32
    that time slice, the better the temporal
    resolution, the longer our picture will be
  • 22:32 - 22:38
    on the horizontal axis, and the clearer
    of an image of the cache that we'll see.
  • 22:38 - 22:41
    And the last metric to evaluate our
    attacks on is noise and that reflects how
  • 22:41 - 22:46
    accurately our measurements reflect the
    true state of the cache. So, right now
  • 22:46 - 22:50
    we've been using timing data to infer
    whether or not an item was in the cache or
  • 22:50 - 22:54
    not, but this is a little bit noisy. It's
    possible that we'll have false positives
  • 22:54 - 22:57
    or false negatives, so we want to keep
    that in mind as we look at the different
  • 22:57 - 23:03
    attacks. So, that's essentially cache
    attacks in a nutshell, and
  • 23:03 - 23:07
    that's all you really need to understand
    in order to understand these attacks as
  • 23:07 - 23:11
    they've been implemented on Trusted
    Execution Environments. And the first
  • 23:11 - 23:15
    particular attack that we're going to be
    looking at is called a Controlled-Channel
  • 23:15 - 23:20
    Attack on SGX, and this attack isn't
    necessarily a cache attack, but we can
  • 23:20 - 23:24
    analyze it in the same way that we analyze
    the cache attacks. So, it's still useful
  • 23:24 - 23:31
    to look at. Now, if you remember how
    memory management occurs with SGX, we know
  • 23:31 - 23:36
    that if a page fault occurs during SGX
    Enclave code execution, that page fault is
  • 23:36 - 23:43
    handled by the kernel. So, the kernel has
    to know which page the Enclave needs
  • 23:43 - 23:48
    paged in. The kernel already gets some
    information about what the Enclave is
  • 23:48 - 23:55
    looking at. Now, in the Controlled-Channel
    attack, what the attacker does
  • 23:55 - 24:00
    from the non-trusted OS is page almost
    every other page of the
  • 24:00 - 24:05
    Enclave out of memory. So no matter
    what page that Enclave tries to
  • 24:05 - 24:10
    access, it's very likely to cause a page
    fault, which will be redirected to the
  • 24:10 - 24:14
    non-trusted OS, where the non-trusted OS
    can record it, page out any other pages
  • 24:14 - 24:20
    and continue execution. So, the OS
    essentially gets a list of sequential page
  • 24:20 - 24:26
    accesses made by the SGX Enclave, all by
    capturing the page fault handler. This is
  • 24:26 - 24:30
    a very general attack, you don't need to
    know what's going on in the Enclave in
  • 24:30 - 24:33
    order to pull this off. You just load up
    an arbitrary Enclave and you're able to
  • 24:33 - 24:41
    see which pages that Enclave is trying to
    access.
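
    A userspace analogue of that record-and-continue loop can be sketched
    with mprotect and SIGSEGV; the real attack does the equivalent from the
    untrusted OS's page-fault handler against an enclave, and the region
    size and "victim" accesses below are made up for illustration:

        #include <signal.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define PAGE 4096
        static char *region;

        static void handler(int sig, siginfo_t *si, void *ctx) {
            (void)sig; (void)ctx;
            char *page = (char *)(((uintptr_t)si->si_addr / PAGE) * PAGE);
            fprintf(stderr, "fault: page %ld\n", (long)((page - region) / PAGE));
            /* Re-enable just this page; the real attack would page the
             * others back out so the next access faults again. */
            mprotect(page, PAGE, PROT_READ | PROT_WRITE);
        }

        int main(void) {
            struct sigaction sa = { .sa_sigaction = handler,
                                    .sa_flags = SA_SIGINFO };
            sigaction(SIGSEGV, &sa, NULL);
            region = mmap(NULL, 16 * PAGE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            region[5 * PAGE] = 1;  /* "victim" access: logged as page 5 */
            region[9 * PAGE] = 1;  /* logged as page 9 */
            return 0;
        }
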
  • 24:41 - 24:44
    So, how does it do on our metrics? First of all, the spatial resolution is
    not great. We can only see where the
  • 24:44 - 24:50
    victim is accessing within 4096 bytes or
    the size of a full page because SGX
  • 24:50 - 24:56
    obscures the offset into the page where
    the page fault occurs. The temporal
  • 24:56 - 24:59
    resolution is good but not great, because
    even though we're able to see any
  • 24:59 - 25:04
    sequential accesses to different pages
    we're not able to see sequential accesses
  • 25:04 - 25:10
    to the same page because we need to keep
    that same page paged-in while we let our
  • 25:10 - 25:15
    SGX Enclave run for that small time slice.
    So temporal resolution is good but not
  • 25:15 - 25:22
    perfect. As for noise, there is no
    noise in this attack because no matter
  • 25:22 - 25:26
    where the page fault occurs, the untrusted
    operating system is going to capture that
  • 25:26 - 25:30
    page fault and is going to handle it. So,
    it's very low noise, not great spatial
  • 25:30 - 25:37
    resolution but overall still a powerful
    attack. But we still want to improve on
  • 25:37 - 25:41
    that spatial resolution, we want to be
    able to see what the Enclave is doing at
  • 25:41 - 25:46
    a resolution finer than one page of
    four kilobytes. So that's exactly what the
  • 25:46 - 25:50
    CacheZoom paper does, and instead of
    interrupting the SGX Enclave execution
  • 25:50 - 25:55
    with page faults, it uses timer
    interrupts. Because the untrusted
  • 25:55 - 25:59
    operating system is able to schedule when
    timer interrupts occur, so it's able to
  • 25:59 - 26:03
    schedule them at very tight intervals, so
    it's able to get that small and tight
  • 26:03 - 26:09
    temporal resolution. And essentially what
    happens in between is this timer
  • 26:09 - 26:13
    interrupt fires, the untrusted operating
    system runs the Prime+Probe attack code in
  • 26:13 - 26:18
    this case, and resumes execution of the
    enclave process, and this repeats. So this
  • 26:18 - 26:25
    is a Prime+Probe attack on the L1 data
    cache. So, this attack lets you see what
  • 26:25 - 26:31
    data the Enclave is looking at. Now, this
    attack could be easily modified to use the
  • 26:31 - 26:36
    L1 instruction cache, so in that case you
    learn which instructions the Enclave is
  • 26:36 - 26:41
    executing. And overall this is an even
    more powerful attack than the Controlled-
  • 26:41 - 26:46
    Channel attack. If we look at the metrics,
    we can see that the spatial resolution is
  • 26:46 - 26:50
    a lot better, now we're looking at spatial
    resolution of 64 bytes or the size of an
  • 26:50 - 26:55
    individual cache line. The temporal resolution
    is very good, it's "almost unlimited", to
  • 26:55 - 27:00
    quote the paper, because the untrusted
    operating system has the privilege to keep
  • 27:00 - 27:05
    scheduling those time interrupts closer
    and closer together until it's able to
  • 27:05 - 27:10
    capture very small time slices of the
    victim process. And the noise itself is
  • 27:10 - 27:15
    low, we're still using a cycle counter to
    measure the time it takes to load memory
  • 27:15 - 27:21
    in and out of the cache, but it's
    useful, the chances of having a false
  • 27:21 - 27:27
    positive or false negative are low, so the
    noise is low as well. Now, we can also
  • 27:27 - 27:31
    look at TrustZone attacks, because so far
    the attacks that we've looked at, the
  • 27:31 - 27:35
    passive attacks, have been against SGX and
    those attacks on SGX have been pretty
  • 27:35 - 27:41
    powerful. So, what are the published
    attacks on TrustZone? Well, there's one
  • 27:41 - 27:45
    called TruSpy, which is kind of similar in
    concept to the CacheZoom attack that we
  • 27:45 - 27:52
    just looked at on SGX. It's once again a
    Prime+Probe attack on the L1 data cache,
  • 27:52 - 27:57
    and the difference here is that instead of
    interrupting the victim code execution
  • 27:57 - 28:04
    multiple times, the TruSpy attack does the
    prime step, does the full AES encryption,
  • 28:04 - 28:09
    and then does the probe step. And the
    reason they do this, is because as they
  • 28:09 - 28:13
    say, the secure world is protected, and is
    not interruptible in the same way that SGX
  • 28:13 - 28:21
    is interruptible. But even despite this,
    just having one measurement per execution,
  • 28:21 - 28:25
    the TruSpy authors were able to use some
    statistics to still recover the AES key
  • 28:25 - 28:30
    from that noise. And their methods were so
    powerful, they are able to do this from an
  • 28:30 - 28:35
    unprivileged application in userland, so
    they don't even need to be running within
  • 28:35 - 28:40
    the kernel in order to be able to pull off
    this attack. So, how does this attack
  • 28:40 - 28:43
    measure up? The spatial resolution is once
    again 64 bytes because that's the size of
  • 28:43 - 28:49
    a cache line on this processor, and the
    temporal resolution is pretty poor
  • 28:49 - 28:54
    here, because we only get one measurement
    per execution of the AES encryption. This
  • 28:54 - 28:59
    is also a particularly noisy attack
    because we're making the measurements from
  • 28:59 - 29:03
    userland, but even if we make the
    measurements from the kernel, we're still
  • 29:03 - 29:06
    going to have the same issues of false
    positives and false negatives associated
  • 29:06 - 29:12
    with using a cycle counter to measure
    membership in a cache. So, we'd like to
  • 29:12 - 29:16
    improve this a little bit. We'd like to
    improve the temporal resolution, so we
  • 29:16 - 29:21
    have the power of the cache attack on
    TrustZone be a little bit closer to what it is
  • 29:21 - 29:27
    on SGX. So, we want to improve that
    temporal resolution. Let's dig into that
  • 29:27 - 29:31
    statement a little bit, that the secure
    world is protected and not interruptable.
  • 29:31 - 29:36
    And to do this, we go back to this diagram
    of ARMv8 and how that TrustZone is set up.
  • 29:36 - 29:41
    So, it is true that when an interrupt
    occurs, it is directed to the monitor and,
  • 29:41 - 29:46
    because the monitor operates in the secure
    world, if we interrupt secure code that's
  • 29:46 - 29:49
    running at exception level 0, we're just
    going to end up running secure code an
  • 29:49 - 29:54
    exception level 3. So, this doesn't
    necessarily get us anything. I think,
  • 29:54 - 29:58
    that's what the authors mean by saying
    that it's protected against this. Just by
  • 29:58 - 30:03
    sending an interrupt, we don't have a
    way to redirect our flow to the non-
  • 30:03 - 30:08
    trusted code. At least that's how it works
    in theory. In practice, the Linux
  • 30:08 - 30:12
    operating system, running in exception
    level 1 in the non-secure world, kind of
  • 30:12 - 30:15
    needs interrupts in order to be able to
    work, so if an interrupt occurs and it's
  • 30:15 - 30:18
    being sent to the monitor, the monitor
    will just forward it right to the non-
  • 30:18 - 30:22
    secure operating system. So, we have
    interrupts just the same way as we did in
  • 30:22 - 30:29
    CacheZoom. And we can improve the
    TrustZone attacks by using this idea: We
  • 30:29 - 30:34
    have 2 cores, where one core is running
    the secure code, the other core is running
  • 30:34 - 30:38
    the non-secure code, and the non-secure
    code is sending interrupts to the secure-
  • 30:38 - 30:43
    world core and that will give us that
    interleaving of attacker process and
  • 30:43 - 30:47
    victim process that allow us to have a
    powerful prime-and-probe attack. So, what
  • 30:47 - 30:51
    does this look like? We have the attack
    core and the victim core. The attack core
  • 30:51 - 30:55
    sends an interrupt to the victim core.
    This interrupt is captured by the monitor,
  • 30:55 - 30:59
    which passes it to the non-secure
    operating system. The non-secure operating
  • 30:59 - 31:03
    system transfers this to our attack code,
    which runs the prime-and-probe attack.
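
    What the probe step might look like on ARMv8 is sketched below; the
    eviction-set addresses in ways[] and the hit/miss threshold are assumed
    to have been prepared elsewhere, and the generic timer is used as a
    stand-in for a cycle counter:

        #include <stdint.h>

        static inline uint64_t timestamp(void) {
            uint64_t t;
            /* CNTVCT_EL0: virtual counter, readable where EL1 permits it */
            asm volatile("isb; mrs %0, cntvct_el0" : "=r"(t));
            return t;
        }

        /* ways[] holds one attacker address per cache way, all mapping
         * to a single set ID. A slow reload means the victim displaced
         * that line during its time slice. */
        static int probe_set(volatile const char **ways, int nways,
                             uint64_t threshold) {
            int evictions = 0;
            for (int i = 0; i < nways; i++) {
                uint64_t t0 = timestamp();
                (void)*ways[i];
                if (timestamp() - t0 > threshold)
                    evictions++;
            }
            return evictions;
        }
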
  • 31:03 - 31:07
    Then we return from the interrupt, the
    execution within the victim code in the
  • 31:07 - 31:11
    secure world resumes and we just repeat
    this over and over. So, now we have that
  • 31:11 - 31:17
    interleaving of the processes
    of the attacker and the victim. So, now,
  • 31:17 - 31:23
    instead of having a temporal resolution of
    one measurement per execution, we once
  • 31:23 - 31:26
    again have almost unlimited temporal
    resolution, because we can just schedule
  • 31:26 - 31:32
    when we send those interrupts from the
    attacker core.
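
    A hypothetical fragment of such a kernel module is sketched below;
    probe_fn and poke_victim_core are illustrative names rather than the
    released tool's API, but Linux's smp_call_function_single() delivers
    exactly this kind of inter-processor interrupt:

        #include <linux/smp.h>

        /* Runs on the victim's core, right after the monitor forwards
         * the interrupt to the non-secure Linux kernel. */
        static void probe_fn(void *info)
        {
            /* prime or probe the shared cache here */
        }

        static void poke_victim_core(int victim_cpu)
        {
            /* Send an IPI and wait until probe_fn has run there. */
            smp_call_function_single(victim_cpu, probe_fn, NULL, 1);
        }
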
  • 31:32 - 31:38
    Now, we'd also like to improve the noise measurements,
    because if we can reduce the noise, we'll
  • 31:38 - 31:42
    get clearer pictures and we'll be able to
    infer those secrets more clearly. So, we
  • 31:42 - 31:46
    can get some improvement by switching the
    measurements from userland and starting to
  • 31:46 - 31:51
    do those in the kernel, but again we have
    the cycle counters. So, what if, instead
  • 31:51 - 31:54
    of using the cycle counter to measure
    whether or not something is in the cache,
  • 31:54 - 32:00
    we use the other performance counters?
    Because on ARMv8 platforms, there is a way
  • 32:00 - 32:04
    to use performance counters to measure
    different events, such as cache hits and
  • 32:04 - 32:10
    cache misses. So, these events and these
    performance monitors require privileged
  • 32:10 - 32:15
    access in order to use, which, for this
    attack, we do have. Now, in a typical
  • 32:15 - 32:19
    cache attack scenario we wouldn't have
    access to these performance monitors,
  • 32:19 - 32:22
    which is why they haven't really been
    explored before, but in this weird
  • 32:22 - 32:25
    scenario where we're attacking the less
    privileged code from the more privileged
  • 32:25 - 32:29
    code, we do have access to these
    performance monitors and we can use these
  • 32:29 - 32:34
    monitors during the probe step to get a
    very accurate count of whether or not a
  • 32:34 - 32:40
    certain memory load caused a cache miss or
    a cache hit. So, we're able to essentially
  • 32:40 - 32:46
    get rid of the different levels of noise.
    Now, one thing to point out is that maybe
  • 32:46 - 32:49
    we'd like to use these ARMv8 performance
    counters in order to count the different
  • 32:49 - 32:54
    events that are occurring in the secure
    world code. So, maybe we start the
  • 32:54 - 32:58
    performance counters from the non-secure
    world, let the secure world run and then,
  • 32:58 - 33:02
    when the secure world exits, we use the
    non-secure world to read these performance
  • 33:02 - 33:05
    counters and maybe we'd like to see how
    many instructions the secure world
  • 33:05 - 33:09
    executed or how many branch instructions
    or how many arithmetic instructions or how
  • 33:09 - 33:13
    many cache misses there were. But
    unfortunately, ARMv8 took this into
  • 33:13 - 33:17
    account and by default, performance
    counters that are started in the non-
  • 33:17 - 33:21
    secure world will not measure events that
    happen in the secure world, which is
  • 33:21 - 33:25
    smart; which is how it should be. And the
    only reason I bring this up is because
  • 33:25 - 33:29
    that's not how it is on ARMv7. We could go
    into a whole different talk with that,
  • 33:29 - 33:34
    just exploring the different implications
    of what that means, but I want to focus on
  • 33:34 - 33:39
    ARMv8, because that's the newest of
    the new. So, we'll keep looking at that.
  • 33:39 - 33:43
    So, we instrument the Prime+Probe attack
    to use these performance counters, so we
  • 33:43 - 33:47
    can get a clear picture of what is and
    what is not in the cache. And instead of
  • 33:47 - 33:52
    having noisy measurements based on time,
    we have virtually no noise at all, because
  • 33:52 - 33:56
    we get the truth straight from the
    processor itself, whether or not we
  • 33:56 - 34:02
    experience a cache miss. So, how do we
    implement these attacks, where do we go
  • 34:02 - 34:06
    from here? We have all these ideas; we
    have ways to make these TrustZone attacks
  • 34:06 - 34:12
    more powerful, but that's not worthwhile,
    unless we actually implement them. So, the
  • 34:12 - 34:17
    goal here is to implement these attacks on
    TrustZone and since typically the non-
  • 34:17 - 34:21
    secure world operating system is based on
    Linux, we'll take that into account when
  • 34:21 - 34:25
    making our implementation. So, we'll write
    a kernel module that uses these
  • 34:25 - 34:29
    performance counters and these inter-
    processor interrupts, in order to actually
  • 34:29 - 34:33
    accomplish these attacks; and we'll write
    it in such a way that it's very
  • 34:33 - 34:37
    generalizable. So you can take this kernel
    module that's was written for one device
  • 34:37 - 34:42
    -- in my case I did most of my attention
    on the Nexus 5x -- and it's very easy to
  • 34:42 - 34:47
    transfer this module to any other Linux-
    based device that has a trust zone that has
  • 34:47 - 34:52
    these shared caches, so it should be very
    easy to port this over and to perform
  • 34:52 - 34:58
    these same powerful cache attacks on
    different platforms. We can also do clever
  • 34:58 - 35:02
    things based on the Linux operating
    system, so that we limit that collection
  • 35:02 - 35:06
    window to just when we're executing within
    the secure world, so we can align our
  • 35:06 - 35:11
    traces a lot more easily that way. And the
    end result is having a synchronized trace
  • 35:11 - 35:15
    for each different attack, because, since
    we've written it in a modular way, we're able
  • 35:15 - 35:19
    to run different attacks simultaneously.
    So, maybe we're running one prime-and-
  • 35:19 - 35:23
    probe attack on the L1 data cache, to
    learn where the victim is accessing
  • 35:23 - 35:27
    memory, and we're simultaneously running
    an attack on the L1 instruction cache, so
  • 35:27 - 35:34
    we can see what instructions the victim is
    executing. And these can be aligned. So,
  • 35:34 - 35:37
    the tool that I've written is a
    combination of a kernel module which
  • 35:37 - 35:42
    actually performs this attack, a userland
    binary which schedules these processes to
  • 35:42 - 35:46
    different cores, and a GUI that will allow
    you to interact with this kernel module
  • 35:46 - 35:50
    and rapidly start doing these cache
    attacks for yourself and perform them
  • 35:50 - 35:57
    against different processes and secure-
    world code. So, the
  • 35:57 - 36:03
    intention behind this tool is to be very
    generalizable to make it very easy to use
  • 36:03 - 36:08
    this platform for different devices and to
    allow people a way to, once again, quickly
  • 36:08 - 36:12
    develop these attacks; and also to see if
    their own code is vulnerable to these
  • 36:12 - 36:18
    cache attacks, to see if their code has
    these secret dependent memory accesses.
  • 36:18 - 36:25
    So, can we get even better... spatial
    resolution? Right now, we're down to 64
  • 36:25 - 36:30
    bytes and that's the size of a cache line,
    which is the size of our shared hardware.
  • 36:30 - 36:36
    And on SGX, we actually can get better
    than 64 bytes, based on something called a
  • 36:36 - 36:39
    branch-shadowing attack. So, a branch-
    shadowing attack takes advantage of
  • 36:39 - 36:43
    something called the branch target buffer.
    And the branch target buffer is a
  • 36:43 - 36:48
    structure that's used for branch
    prediction. It's similar to a cache, but
  • 36:48 - 36:52
    there's a key difference where the branch
    target buffer doesn't compare the full
  • 36:52 - 36:55
    address, when seeing if something is
    already in the cache or not: It doesn't
  • 36:55 - 37:00
    compare all of the upper level bits. So,
    that means that it's possible that two
  • 37:00 - 37:04
    different addresses will experience a
    collision, and the same entry from that
  • 37:04 - 37:09
    BTB cache will be read out for an improper
    address. Now, since this is just for
  • 37:09 - 37:12
    branch prediction, the worst that can
    happen is, you'll get a misprediction and
  • 37:12 - 37:18
    a small time penalty, but that's about it.
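
    A sketch of why such collisions happen, using hypothetical index
    parameters chosen to match the numbers reported later in the talk for
    the Nexus 5X (2048 sets, 16-byte granularity):

        #include <stdint.h>

        #define BTB_SETS    2048
        #define BTB_GRANULE   16

        /* Only the low virtual-address bits pick the set, and the tag
         * ignores the upper bits, so two different branch addresses can
         * alias the same entry. */
        static unsigned btb_set(uintptr_t branch_va) {
            return (unsigned)((branch_va / BTB_GRANULE) % BTB_SETS);
        }
        /* Collision whenever btb_set(a) == btb_set(b) even though a != b. */
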
    The idea behind the branch-shadowing
  • 37:18 - 37:22
    attack is leveraging the small difference
    in this overlapping and this collision of
  • 37:22 - 37:29
    addresses in order to sort of execute a
    shared-code-style flush-and-reload attack
  • 37:29 - 37:35
    on the branch target buffer. So, here what
    goes on is, during the attack the attacker
  • 37:35 - 37:40
    modifies the SGX Enclave to make sure that
    the branches that are within the Enclave
  • 37:40 - 37:44
    will collide with branches that are not in
    the Enclave. The attacker executes the
  • 37:44 - 37:50
    Enclave code and then the attacker
    executes their own code and based on the
  • 37:50 - 37:55
    outcome of the victim code in that
    cache, the attacker code may or may not
  • 37:55 - 37:59
    experience a branch misprediction. So, the
    attacker is able to tell the outcome of a
  • 37:59 - 38:03
    branch, because of this overlap and this
    collision, like it would be in a flush-and-
  • 38:03 - 38:07
    reload attack, where those memories
    overlap between the attacker and the
  • 38:07 - 38:14
    victim. So here, our spatial resolution is
    fantastic: We can tell down to individual
  • 38:14 - 38:19
    branch instructions in SGX; we can tell
    exactly, which branches were executed and
  • 38:19 - 38:25
    which directions they were taken, in the
    case of conditional branches. The temporal
  • 38:25 - 38:30
    resolution is also, once again, almost
    unlimited, because we can use the same
  • 38:30 - 38:34
    timer interrupts in order to schedule our
    process, our attacker process. And the
  • 38:34 - 38:39
    noise is, once again, very low, because we
    can, once again, use the same sort of
  • 38:39 - 38:44
    branch misprediction counters, that exist
    in the Intel world, in order to measure
  • 38:44 - 38:52
    this noise. So, does anything of that
    apply to the TrustZone attacks? Well, in
  • 38:52 - 38:55
    this case the victim and attacker don't
    share entries in the branch target buffer,
  • 38:55 - 39:02
    because the attacker is not able to map
    the virtual address of the victim process.
  • 39:02 - 39:05
    But this is kind of reminiscent of our
    earlier cache attacks, so our flush-and-
  • 39:05 - 39:10
    reload attack only worked when the attacker
    and the victim shared that memory, but we
  • 39:10 - 39:14
    still have the prime-and-probe attack for
    when they don't. So, what if we use a
  • 39:14 - 39:21
    prime-and-probe-style attack on the branch
    target buffer cache in ARM processors? So,
  • 39:21 - 39:25
    essentially what we do here is, we prime
    the branch target buffer by executing many
  • 39:25 - 39:30
    attacker branches to sort of fill up this
    BTB cache with the attacker branch
  • 39:30 - 39:35
    prediction data; we let the victim execute
    a branch which will evict an attacker BTB
  • 39:35 - 39:39
    entry; and then we have the attacker re-
    execute those branches and see if there
  • 39:39 - 39:45
    have been any mispredictions. So now, the
    cool thing about this attack is, the
  • 39:45 - 39:50
    structure of the BTB cache is different
    from that of the L1 caches. So, instead of
  • 39:50 - 40:00
    having 256 different sets in the L1 cache,
    the BTB cache has 2048 different sets, so
  • 40:00 - 40:06
    we can tell which branch it was, based
    on which one of 2048 different set IDs
  • 40:06 - 40:11
    that it could fall into. And even more
    than that, on the ARM platform, at least
  • 40:11 - 40:16
    on the Nexus 5x that I was working with,
    the granularity is no longer 64 bytes,
  • 40:16 - 40:22
    which is the size of the line, it's now 16
    bytes. So, we can see which branches the
  • 40:22 - 40:28
    trusted code within TrustZone is
    executing within 16 bytes. So, what does
  • 40:28 - 40:32
    this look like? So, previously with the
    TruSpy attack, this is sort of the
  • 40:32 - 40:37
    outcome of our prime-and-probe attack: We
    get 1 measurement for those 256 different
  • 40:37 - 40:43
    set IDs. When we added those interrupts,
    we're able to get that time resolution,
  • 40:43 - 40:48
    and it looks something like this. Now,
    maybe you can see a little bit at the top
  • 40:48 - 40:53
    of the screen, how there's these repeated
    sections of little white blocks, and you
  • 40:53 - 40:57
    can sort of use that to infer, maybe
    there's the same cache line and cache
  • 40:57 - 41:01
    instructions that are called over and
    over. So, just looking at this L1-I cache
  • 41:01 - 41:07
    attack, you can tell some information
    about how the process went. Now, let's
  • 41:07 - 41:12
    compare that to the BTB attack. And I
    don't know if you can see too clearly --
  • 41:12 - 41:17
    it's a bit too high a resolution
    right now -- so let's just focus in on one
  • 41:17 - 41:23
    small part of this overall trace. And this
    is what it looks like. So, each of those
  • 41:23 - 41:28
    white pixels represents a branch that was
    taken by that secure-world code and we can
  • 41:28 - 41:31
    see repeated patterns, we can see maybe
    different functions that were called, we
  • 41:31 - 41:35
    can see different loops. And just by
    looking at this 1 trace, we can infer a
  • 41:35 - 41:40
    lot of information on how that secure
    world executed. So, it's incredibly
  • 41:40 - 41:44
    powerful and all of those secrets are just
    waiting to be uncovered using these new
  • 41:44 - 41:53
    tools. So, where do we go from here? What
    sort of countermeasures do we have? Well,
  • 41:53 - 41:57
    first of all I think, the long term
    solution is going to be moving to no more
  • 41:57 - 42:00
    shared hardware. We need to have separate
    hardware and no more shared caches in
  • 42:00 - 42:06
    order to fully get rid of these different
    cache attacks. And we've already seen this
  • 42:06 - 42:11
    trend in different cell phones. So, for
    example, in Apple SoCs for a long time now
  • 42:11 - 42:16
    -- I think since the Apple A7 -- the
    Secure Enclave, which runs the secure
  • 42:16 - 42:21
    code, has its own cache. So, these cache
    attacks can't be accomplished from code
  • 42:21 - 42:27
    outside of that secure Enclave. So, just
    by using that separate hardware, it knocks
  • 42:27 - 42:31
    out a whole class of different potential
    side-channel and microarchitectural
  • 42:31 - 42:36
    attacks. And just recently, the Pixel 2 is
    moving in the same direction. The Pixel 2
  • 42:36 - 42:41
    now includes a hardware security module
    that implements cryptographic operations;
  • 42:41 - 42:46
    and that chip also has its own memory and
    its own caches, so now we can no longer
  • 42:46 - 42:51
    use this attack to extract information
    about what's going on in this external
  • 42:51 - 42:57
    hardware security module. But even then,
    using this separate hardware, that doesn't
  • 42:57 - 43:01
    solve all of our problems. Because we
    still have the question of "What do we
  • 43:01 - 43:06
    include in this separate hardware?" On the
    one hand, we want to include more code in
  • 43:06 - 43:11
    that separate hardware, so we're less
    vulnerable to these side-channel attacks,
  • 43:11 - 43:16
    but on the other hand, we don't want to
    expand the attack surface anymore. Because
  • 43:16 - 43:19
    the more code we include in these secure
    environments, the more likely it is that a
  • 43:19 - 43:23
    vulnerability will be found and the
    attacker will be able to get a foothold
  • 43:23 - 43:26
    within the secure, trusted environment.
    So, there's going to be a balance between
  • 43:26 - 43:30
    what you choose to include in the
    separate hardware and what you don't. So,
  • 43:30 - 43:35
    do you include DRM code? Do you include
    cryptographic code? It's still an open
  • 43:35 - 43:42
    question. And that's sort of the long-term
    approach. In the short term, you just kind
  • 43:42 - 43:46
    of have to write side-channel-free
    software: Just be very careful about what
  • 43:46 - 43:51
    your process does, whether there are any
    secret-dependent memory accesses or
  • 43:51 - 43:55
    secret-dependent branches, or secret-
    dependent function calls, because any of
  • 43:55 - 44:00
    those can leak the secrets out of your
    trusted execution environment, as the
    sketch below illustrates.
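    To make that concrete, here is a minimal
    sketch -- mine, not the talk's code; the
    function names are hypothetical -- of the
    same selection written first with a
    secret-dependent branch and then in
    constant time:

        /* Minimal sketch (not from the talk): replacing a secret-dependent
         * branch with a constant-time select. */
        #include <stdint.h>

        /* Leaky: the taken/not-taken pattern mirrors the secret bit, so a
         * branch-trace or cache attack can recover it. */
        uint32_t select_leaky(uint32_t secret_bit, uint32_t a, uint32_t b) {
            if (secret_bit)
                return a;
            return b;
        }

        /* Constant time: the same instructions execute regardless of the
         * secret, so the microarchitectural footprint does not depend on it. */
        uint32_t select_ct(uint32_t secret_bit, uint32_t a, uint32_t b) {
            uint32_t mask = (uint32_t)-(int32_t)(secret_bit & 1); /* 0 or ~0 */
            return (a & mask) | (b & ~mask);
        }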
  • 44:00 - 44:03
    So, here are the things that, if you are a
    developer of trusted execution environment
  • 44:03 - 44:08
    code, that I want you to keep in mind:
    First of all, performance is very often at
  • 44:08 - 44:13
    odds with security. We've seen over and
    over that the performance enhancements to
  • 44:13 - 44:19
    these processors open up the ability for
    these microarchitectural attacks to be
  • 44:19 - 44:24
    more efficient. Additionally, these
    trusted execution environments don't
  • 44:24 - 44:27
    protect against everything; there are
    still these side-channel attacks and these
  • 44:27 - 44:32
    microarchitectural attacks that these
    systems are vulnerable to. These attacks
  • 44:32 - 44:38
    are very powerful; they can be
    accomplished simply; and with the
  • 44:38 - 44:42
    publication of the code that I've written,
    it should be very simple to get set up and
  • 44:42 - 44:46
    to analyze your own code to see "Am I
    vulnerable, do I expose information in the
  • 44:46 - 44:53
    same way?" And lastly, it only takes 1
    small error, 1 tiny leak from your trusted
  • 44:53 - 44:57
    and secure code, in order to extract the
    entire secret, in order to bring the whole
  • 44:57 - 45:04
    thing down. So, what I want to leave you
    with is: I want you to remember that you
  • 45:04 - 45:09
    are responsible for making sure that your
    program is not vulnerable to these
  • 45:09 - 45:13
    microarchitectural attacks, because if you
    do not take responsibility for this, who
  • 45:13 - 45:17
    will? Thank you!
  • 45:17 - 45:25
    Applause
  • 45:25 - 45:30
    Herald: Thank you very much. Please, if
    you want to leave the hall, please do it
  • 45:30 - 45:35
    quietly and take all your belongings with
    you and respect the speaker. We have
  • 45:35 - 45:43
    plenty of time, 16, 17 minutes for Q&A, so
    please line up on the microphones. No
  • 45:43 - 45:51
    questions from the signal angel, all
    right. So, we can start with microphone 6,
  • 45:51 - 45:55
    please.
    Mic 6: Okay. There was a symbol for secure
  • 45:55 - 46:01
    OSes in the ARM TrustZone diagram. What is
    the idea of them if the non-secure OS gets
    all the
  • 46:01 - 46:04
    interrupts? What is the secure OS for?
  • 46:04 - 46:09
    Keegan: Yeah so, in the ARMv8 there are a
    couple different kinds of interrupts. So,
  • 46:09 - 46:12
    I think -- if I'm remembering the
    terminology correctly -- there is an IRQ
  • 46:12 - 46:17
    and an FIQ interrupt. So, the non-secure
    mode handles the IRQ interrupts and the
  • 46:17 - 46:20
    secure mode handles the FIQ interrupts.
    So, which one is raised
  • 46:20 - 46:25
    determines in which direction the monitor
    will route that interrupt.
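    A minimal sketch of how that routing is
    configured -- my illustration, not code
    from the talk; the helper name is
    hypothetical -- using the SCR_EL3 system
    register, whose IRQ and FIQ routing bits
    are bits 1 and 2 in the ARMv8-A manual:

        /* Hedged sketch (AArch64, runs only at EL3): route FIQs to the
         * secure monitor and leave IRQs with the non-secure OS. */
        #include <stdint.h>

        #define SCR_IRQ (1u << 1)   /* set: IRQs are taken to EL3 */
        #define SCR_FIQ (1u << 2)   /* set: FIQs are taken to EL3 */

        void route_interrupts(void) {  /* hypothetical helper name */
            uint64_t scr;
            __asm__ volatile("mrs %0, scr_el3" : "=r"(scr));
            scr |= SCR_FIQ;                /* secure monitor handles FIQs */
            scr &= ~(uint64_t)SCR_IRQ;     /* non-secure world keeps IRQs */
            __asm__ volatile("msr scr_el3, %0" : : "r"(scr));
            __asm__ volatile("isb");       /* synchronize the register write */
        }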
  • 46:30 - 46:32
    Mic 6: Thank you.
    Herald: Okay, thank you. Microphone number
  • 46:32 - 46:38
    7, please.
    Mic 7: Does any of your present attacks on
  • 46:38 - 46:45
    TrustZone also apply to the AMD
    implementation of TrustZone or are you
  • 46:45 - 46:48
    looking into it?
    Keegan: I haven't looked into AMD too
  • 46:48 - 46:54
    much, because, as far as I can tell,
    that's not used as commonly, but there are
  • 46:54 - 46:57
    many different types of trusted execution
    environments. The 2 that I focus on were
  • 46:57 - 47:05
    SGX and TrustZone, because those are the
    most common examples that I've seen.
  • 47:05 - 47:09
    Herald: Thank you. Microphone
    number 8, please.
  • 47:09 - 47:20
    Mic 8: When TrustZone is moved to
    dedicated hardware, dedicated memory,
  • 47:20 - 47:28
    couldn't you replicate the userspace
    attacks by loading your own trusted
  • 47:28 - 47:32
    userspace app and use it as an
    oracle of some sorts?
  • 47:32 - 47:36
    Keegan: If you can load your own trusted
    code, then yes, you could do that. But in
  • 47:36 - 47:40
    many of the models I've seen today, that's
    not possible. So, that's why you have
  • 47:40 - 47:44
    things like code signing, which prevent
    the arbitrary user from running their own
  • 47:44 - 47:50
    code in the trusted OS... or in the
    trusted environment.
  • 47:50 - 47:55
    Herald: All right. Microphone number 1.
    Mic 1: So, these attacks are more powerful
  • 47:55 - 48:01
    against code that's running in trusted
    execution environments than similar
  • 48:01 - 48:07
    attacks would be against ring-3 code, or,
    in general, trusted code. Does that mean
  • 48:07 - 48:11
    that trusted execution environments are
    basically an attractive nuisance that we
  • 48:11 - 48:15
    shouldn't use?
    Keegan: There's still a large benefit to
  • 48:15 - 48:18
    using these trusted execution
    environments. The point I want to get
  • 48:18 - 48:21
    across is that, although they add a lot of
    features, they don't protect against
  • 48:21 - 48:25
    everything, so you should keep in mind
    that these side-channel attacks do still
  • 48:25 - 48:29
    exist and you still need to protect
    against them. But overall, these are
  • 48:29 - 48:36
    still beneficial and worth including.
    Herald: Thank you. Microphone number 1
  • 48:36 - 48:42
    again, please.
    Mic 1: So, AMD is doing something with
  • 48:42 - 48:48
    encrypting memory and I'm not sure if they
    encrypt addresses, too, but would that
  • 48:48 - 48:53
    be a defense against such attacks?
    Keegan: So, I'm not too familiar with AMD,
  • 48:53 - 48:58
    but SGX also encrypts memory. It encrypts
    it in between the lowest-level cache and
  • 48:58 - 49:02
    the main memory. But that doesn't really
    have an impact on the actual operation,
  • 49:02 - 49:06
    because the memory is encrypted at the cache-
    line level and as the attacker, we don't
  • 49:06 - 49:10
    care what that data is within that cache
    line, we only care which cache line is
  • 49:10 - 49:16
    being accessed.
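    As a rough sketch of why that granularity
    matters -- the line size and set count
    here are assumed example values, not from
    the talk -- the line and set index are
    derived from the address alone, so
    encrypting the data does not hide them:

        /* Sketch, assuming 64-byte cache lines and a 1024-set cache: the
         * attacker only needs the line/set index, which survives data
         * encryption because it comes from the address, not the data. */
        #include <stdint.h>

        #define LINE_BITS 6        /* 64-byte lines (assumed) */
        #define NUM_SETS  1024     /* example cache geometry (assumed) */

        static inline uint64_t cache_line(uint64_t paddr) {
            return paddr >> LINE_BITS;            /* which line is touched */
        }

        static inline uint64_t set_index(uint64_t paddr) {
            return (paddr >> LINE_BITS) & (NUM_SETS - 1);  /* which set */
        }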
    Mic 1: If you encrypt addresses, wouldn't
  • 49:16 - 49:21
    that help against that?
    Keegan: I'm not sure how you would
  • 49:21 - 49:25
    encrypt the addresses yourself. As long as
    those addresses map into the same set IDs
  • 49:25 - 49:30
    that the victim maps into, then the
    attacker could still pull off the same
    style
  • 49:30 - 49:35
    of attacks.
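    For illustration, here is a hedged sketch
    of the probe step of such a set-based
    (prime+probe-style) attack -- my code,
    not the talk's; the associativity value
    and the eviction-set construction are
    assumptions, and a real attack must first
    build addresses congruent to the victim's
    set:

        /* Hedged sketch of the probe step of prime+probe on x86. */
        #include <stdint.h>
        #include <x86intrin.h>

        #define WAYS 8  /* associativity of the targeted cache (assumed) */

        /* addrs[] holds WAYS addresses that all map to the victim's set ID. */
        uint64_t probe(volatile uint8_t **addrs) {
            unsigned aux;
            uint64_t start = __rdtscp(&aux);
            for (int i = 0; i < WAYS; i++)
                (void)*addrs[i];           /* touch our whole eviction set */
            uint64_t end = __rdtscp(&aux);
            /* A slow probe means the victim evicted one of our lines,
             * i.e., it accessed this cache set. */
            return end - start;
        }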
    Herald: Great. We have a question from the
  • 49:35 - 49:38
    internet, please.
    Signal Angel: The question is "Does the
  • 49:38 - 49:42
    secure enclave on the Samsung Exynos
    distinguish the receiver of the message, so
  • 49:42 - 49:47
    that if the user application asked to
    decode an AES message, can one sniff on
  • 49:47 - 49:52
    the value that the secure
    enclave returns?"
  • 49:52 - 49:57
    Keegan: So, that sounds like it's asking
    about the TruSpy-style attack, where
  • 49:57 - 50:01
    it's calling to the secure world to
    encrypt something with AES. I think, that
  • 50:01 - 50:05
    would all depend on the different
    implementation: As long as it's encrypting
  • 50:05 - 50:10
    for a certain key and it's able to do that
    repeatably, then the attack,
  • 50:10 - 50:16
    assuming a vulnerable AES implementation,
    would be able to extract that key out.
  • 50:16 - 50:21
    Herald: Cool. Microphone number 2, please.
    Mic 2: Do you recommend a reference to
  • 50:21 - 50:25
    understand how these cache line attacks
    and branch oracles actually lead to key
  • 50:25 - 50:30
    recovery?
    Keegan: Yeah. So, I will flip through
  • 50:30 - 50:34
    these pages which include a lot of the
    references for the attacks that I've
  • 50:34 - 50:38
    mentioned, so if you're watching the
    video, you can see these right away or
  • 50:38 - 50:43
    just access the slides. And a lot of these
    contain good starting points. So, I didn't
  • 50:43 - 50:46
    go into a lot of the details on how, for
    example, the TruSpy attack recovered
  • 50:46 - 50:53
    that AES key, but that paper does have a
    lot of good links on how those leaks can
  • 50:53 - 50:56
    lead to key recovery. Same thing with the
    CLKSCREW attack, how the different fault
  • 50:56 - 51:03
    injection can lead to key recovery.
    Herald: Microphone number 6, please.
  • 51:03 - 51:08
    Mic 6: I think my question might have been
    very, almost the same thing: How hard is
  • 51:08 - 51:12
    it actually to recover the keys? Is this
    like a massive machine learning problem or
  • 51:12 - 51:18
    is this something that you can do
    practically on a single machine?
  • 51:18 - 51:22
    Keegan: It varies entirely by the end
    implementation. So, for all these attacks
  • 51:22 - 51:26
    to work, you need to have some sort of
    vulnerable implementation and some
  • 51:26 - 51:29
    implementations leak more data than
    others. In the case of a lot of the AES
  • 51:29 - 51:34
    attacks, where you're doing the passive
    attacks, those are very easy to do on just
  • 51:34 - 51:38
    your own computer. For the AES fault
    injection attack, I think that one
  • 51:38 - 51:42
    required more brute force, in the CLKSCREW
    paper, so that one required more computing
  • 51:42 - 51:50
    resources, but still, it was entirely
    practical to do in a realistic setting.
  • 51:50 - 51:54
    Herald: Cool, thank you. So, we have one
    more: Microphone number 1, please.
  • 51:54 - 51:59
    Mic 1: So, I hope it's not a too naive
    question, but I was wondering, since all
  • 51:59 - 52:05
    these attacks are based on cache hit and
    misses, isn't it possible to forcibly
  • 52:05 - 52:11
    flush or invalidate or insert noise in
    the cache after each operation in this trusted
  • 52:11 - 52:24
    environment, in order to mess up the
    guesswork of the attacker? So, discarding
  • 52:24 - 52:29
    optimization and performance for
    additional security benefits.
  • 52:29 - 52:32
    Keegan: Yeah, and that is absolutely
    possible and you are absolutely right: It
  • 52:32 - 52:36
    does lead to a performance degradation,
    because if you always flush the entire
  • 52:36 - 52:41
    cache every time you do a context switch,
    that will be a huge performance hit. So
  • 52:41 - 52:45
    again, that comes down to the question of
    the performance and security trade-off:
  • 52:45 - 52:50
    Which one do you end up going with? And it
    seems historically the choice has been
  • 52:50 - 52:54
    more in the direction of performance.
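    A minimal sketch of that mitigation -- my
    illustration, not the talk's code; the
    function name and line size are
    assumptions, and x86 clflush is shown
    only for concreteness (TrustZone code
    would use DC CIVAC instead):

        /* Sketch: flush a sensitive buffer's cache lines before a
         * context switch out of the secure code, trading performance
         * for a smaller cache footprint. */
        #include <stdint.h>
        #include <stddef.h>
        #include <emmintrin.h>

        #define LINE 64  /* cache-line size in bytes (assumed) */

        void flush_buffer(const void *buf, size_t len) {  /* hypothetical */
            const char *p = (const char *)buf;
            for (size_t off = 0; off < len; off += LINE)
                _mm_clflush(p + off);  /* evict each line from all levels */
            _mm_mfence();              /* order the flushes before returning */
        }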
    Mic 1: Thank you.
  • 52:54 - 52:57
    Herald: But we have one more: Microphone
    number 1, please.
  • 52:57 - 53:02
    Mic 1: So, I have more of a moral
    question: So, how well should we really
  • 53:02 - 53:08
    protect from attacks which need some
    ring-0 cooperation? Because, basically,
  • 53:08 - 53:14
    when we use TrustZone for a purpose we
    would see as clear, like protecting the
  • 53:14 - 53:20
    browser from the outside world, then we
    are basically using the
  • 53:20 - 53:27
    secure execution environment for sandboxing
    the process. But once an attack needs some
  • 53:27 - 53:32
    cooperation from the kernel, some of these
    attacks, in fact, empower the user
  • 53:32 - 53:36
    instead of the hardware producer.
    Keegan: Yeah, and you're right. It
  • 53:36 - 53:39
    depends entirely on what your application
    is and what your threat model is that
  • 53:39 - 53:43
    you're looking at. So, if you're using
    these trusted execution environments to do
  • 53:43 - 53:48
    DRM, for example, then maybe you wouldn't
    be worried about that ring-0 attack or
  • 53:48 - 53:52
    that privileged attacker who has their
    phone rooted and is trying to recover
  • 53:52 - 53:57
    these media encryption keys from this
    execution environment. But maybe there are
  • 53:57 - 54:01
    other scenarios where you're not as
    worried about having an attack with a
  • 54:01 - 54:06
    compromised ring 0. So, it entirely
    depends on context.
  • 54:06 - 54:09
    Herald: Alright, thank you. So, we have
    one more: Microphone number 1, again.
  • 54:09 - 54:11
    Mic 1: Hey there. Great talk, thank you
    very much.
  • 54:11 - 54:13
    Keegan: Thank you.
    Mic 1: Just a short question: Do you have
  • 54:13 - 54:17
    any success stories about attacking the
    TrustZone and the different
  • 54:17 - 54:24
    implementations of TE with some vendors
    like some OEMs creating phones and stuff?
  • 54:24 - 54:30
    Keegan: Not that I'm announcing
    at this time.
  • 54:30 - 54:36
    Herald: So, thank you very much. Please,
    again a warm round of applause for Keegan!
  • 54:36 - 54:40
    Applause
  • 54:40 - 54:45
    34c3 postroll music
  • 54:45 - 55:02
    subtitles created by c3subtitles.de
    in the year 2018. Join, and help us!