WEBVTT 00:00:00.000 --> 00:00:15.030 34C3 preroll music 00:00:15.030 --> 00:00:22.570 Herald: Hello fellow creatures. Welcome and 00:00:22.570 --> 00:00:30.140 I wanna start with a question. Another one: Who do we trust? 00:00:30.140 --> 00:00:36.500 Do we trust the TrustZones on our smartphones? 00:00:36.500 --> 00:00:41.710 Well Keegan Ryan, he's really fortunate to be here and 00:00:41.710 --> 00:00:51.710 he was inspired by another talk from the CCC before - I think it was 29C3 and his 00:00:51.710 --> 00:00:57.550 research on smartphones and systems on a chip used in smart phones will answer 00:00:57.550 --> 00:01:02.520 these questions if you can trust those trusted execution environments. Please 00:01:02.520 --> 00:01:06.330 give a warm round of applause to Keegan and enjoy! 00:01:06.330 --> 00:01:10.630 Applause 00:01:10.630 --> 00:01:14.220 Kegan Ryan: All right, thank you! So I'm Keegan Ryan, I'm a consultant with NCC 00:01:14.220 --> 00:01:19.740 group and this is micro architectural attacks on Trusted Execution Environments. 00:01:19.740 --> 00:01:23.250 So, in order to understand what a Trusted Execution Environment is we need to go 00:01:23.250 --> 00:01:29.729 back into processor security, specifically on x86. So as many of you are probably 00:01:29.729 --> 00:01:33.729 aware there are a couple different modes which we can execute code under in x86 00:01:33.729 --> 00:01:39.290 processors and that includes ring 3, which is the user code and the applications, and 00:01:39.290 --> 00:01:45.570 also ring 0 which is the kernel code. Now there's also a ring 1 and ring 2 that are 00:01:45.570 --> 00:01:50.229 supposedly used for drivers or guest operating systems but really it just boils 00:01:50.229 --> 00:01:56.159 down to ring 0 and ring 3. And in this diagram we have here we see that privilege 00:01:56.159 --> 00:02:02.149 increases as we go up the diagram, so ring 0 is the most privileged ring and ring 3 00:02:02.149 --> 00:02:05.470 is the least privileged ring. So all of our secrets, all of our sensitive 00:02:05.470 --> 00:02:10.030 information, all of the attackers goals are in ring 0 and the attacker is trying 00:02:10.030 --> 00:02:15.890 to access those from the unprivileged world of ring 3. Now you may have a 00:02:15.890 --> 00:02:20.150 question what if I want to add a processor feature that I don't want ring 0 to be 00:02:20.150 --> 00:02:26.240 able to access? Well then you add ring -1 which is often used for a hypervisor. Now 00:02:26.240 --> 00:02:30.610 the hypervisor has all the secrets and the hypervisor can manage different guest 00:02:30.610 --> 00:02:35.680 operating systems and each of these guest operating systems can execute in ring 0 00:02:35.680 --> 00:02:41.300 without having any idea of the other operating systems. So this way now the 00:02:41.300 --> 00:02:45.230 secrets are all in ring -1 so now the attackers goals have shifted from ring 0 00:02:45.230 --> 00:02:50.760 to ring -1. The attacker has to attack ring -1 from a less privileged ring and 00:02:50.760 --> 00:02:55.430 tries to access those secrets. But what if you want to add a processor feature that 00:02:55.430 --> 00:03:00.560 you don't want ring -1 to be able to access? 
So you add ring -2 which is System 00:03:00.560 --> 00:03:05.230 Management Mode and that's capable of monitoring power, directly interfacing 00:03:05.230 --> 00:03:10.350 with firmware and other chips on a motherboard and it's able to access and do 00:03:10.350 --> 00:03:13.820 a lot of things that the hypervisor is not able to and now all of your secrets and 00:03:13.820 --> 00:03:17.880 all of your attacker goals are in ring -2 and the attacker has to attack those from 00:03:17.880 --> 00:03:22.400 a less privileged ring. Now maybe you want to add something to your processor that 00:03:22.400 --> 00:03:26.900 you don't want ring -2 to be able access, so you add ring -3 and I think you get the 00:03:26.900 --> 00:03:31.450 picture now. And we just keep on adding more and more privilege rings and keep 00:03:31.450 --> 00:03:35.260 putting our secrets and our attackers goals in these higher and higher 00:03:35.260 --> 00:03:41.260 privileged rings but what if we're thinking about it wrong? What if instead 00:03:41.260 --> 00:03:46.710 we want to put all the secrets in the least privileged ring? So this is sort of 00:03:46.710 --> 00:03:51.490 the idea behind SGX and it's useful for things like DRM where you want that to run 00:03:51.490 --> 00:03:56.980 ring 3 code but have sensitive secrets or other assigning capabilities running in 00:03:56.980 --> 00:04:02.050 ring 3. But this picture is getting a little bit complicated, this diagram is a 00:04:02.050 --> 00:04:06.250 little bit complex so let's simplify it a little bit. We'll only be looking at ring 00:04:06.250 --> 00:04:12.100 0 through ring 3 which is the kernel, the userland and the SGX enclave which also 00:04:12.100 --> 00:04:16.910 executes in ring 3. Now when you're executing code in the SGX enclave you 00:04:16.910 --> 00:04:22.170 first load the code into the enclave and then from that point on you trust the 00:04:22.170 --> 00:04:26.980 execution of whatever's going on in that enclave. You trust that the other elements 00:04:26.980 --> 00:04:31.640 the kernel, the userland, the other rings are not going to be able to access what's 00:04:31.640 --> 00:04:38.020 in that enclave so you've made your Trusted Execution Environment. This is a 00:04:38.020 --> 00:04:44.750 bit of a weird model because now your attacker is in the ring 0 kernel and your 00:04:44.750 --> 00:04:48.840 target victim here is in ring 3. So instead of the attacker trying to move up 00:04:48.840 --> 00:04:54.070 the privilege chain, the attacker is trying to move down. Which is pretty 00:04:54.070 --> 00:04:57.820 strange and you might have some questions like "under this model who handles memory 00:04:57.820 --> 00:05:01.470 management?" because traditionally that's something that ring 0 would manage and 00:05:01.470 --> 00:05:05.290 ring 0 would be responsible for paging memory in and out for different processes 00:05:05.290 --> 00:05:10.460 in different code that's executing it in ring 3. But on the other hand you don't 00:05:10.460 --> 00:05:16.030 want that to happen with the SGX enclave because what if the malicious ring 0 adds 00:05:16.030 --> 00:05:22.410 a page to the enclave that the enclave doesn't expect? So in order to solve this 00:05:22.410 --> 00:05:28.950 problem, SGX does allow ring 0 to handle page faults. But simultaneously and in 00:05:28.950 --> 00:05:35.380 parallel it verifies every memory load to make sure that no access violations are 00:05:35.380 --> 00:05:40.139 made so that all the SGX memory is safe. 
So it allows ring 0 to do its job but it 00:05:40.139 --> 00:05:45.010 sort of watches over at the same time to make sure that nothing is messed up. So 00:05:45.010 --> 00:05:51.120 it's a bit of a weird convoluted solution to a strange inverted problem but it works 00:05:51.120 --> 00:05:57.580 and that's essentially how SGX works and the idea behind SGX. Now we can look at 00:05:57.580 --> 00:06:02.530 x86 and we can see that ARMv8 is constructed in a similar way but it 00:06:02.530 --> 00:06:08.450 improves on x86 in a couple key ways. So first of all ARMv8 gets rid of ring 1 and 00:06:08.450 --> 00:06:12.170 ring 2 so you don't have to worry about those and it just has different privilege 00:06:12.170 --> 00:06:17.370 levels for userland and the kernel. And these different privilege levels are 00:06:17.370 --> 00:06:21.520 called exception levels in the ARM terminology. And the second thing that ARM 00:06:21.520 --> 00:06:25.930 gets right compared to x86 is that instead of starting at 3 and counting down as 00:06:25.930 --> 00:06:30.730 privilege goes up, ARM starts at 0 and counts up so we don't have to worry about 00:06:30.730 --> 00:06:35.940 negative numbers anymore. Now when we add the next privilege level the hypervisor we 00:06:35.940 --> 00:06:40.860 call it exception level 2 and the next one after that is the monitor in exception 00:06:40.860 --> 00:06:47.210 level 3. So at this point we still want to have the ability to run trusted code in 00:06:47.210 --> 00:06:52.650 exception level 0 the least privileged level of the ARMv8 processor. So in order 00:06:52.650 --> 00:06:59.060 to support this we need to separate this diagram into two different sections. In 00:06:59.060 --> 00:07:03.510 ARMv8 these are called the secure world and the non-secure world. So we have the 00:07:03.510 --> 00:07:07.740 non-secure world on the left in blue that consists of the userland, the kernel and 00:07:07.740 --> 00:07:11.900 the hypervisor and we have the secure world on the right which consists of the 00:07:11.900 --> 00:07:17.360 monitor in exception level 3, a trusted operating system in exception level 1 and 00:07:17.360 --> 00:07:23.030 trusted applications in exception level 0. So the idea is that if you run anything in 00:07:23.030 --> 00:07:27.360 the secure world, it should not be accessible or modifiable by anything in 00:07:27.360 --> 00:07:32.320 the non secure world. So that's how our attacker is trying to access it. The 00:07:32.320 --> 00:07:36.371 attacker has access to the non secure kernel, which is often Linux, and they're 00:07:36.371 --> 00:07:40.120 trying to go after the trusted apps. So once again we have this weird inversion 00:07:40.120 --> 00:07:43.330 where we're trying to go from a more privileged level to a less privileged 00:07:43.330 --> 00:07:48.260 level and trying to extract secrets in that way. So the question that arises when 00:07:48.260 --> 00:07:53.070 using these Trusted Execution Environments that are implemented in SGX and TrustZone 00:07:53.070 --> 00:07:58.330 in ARM is "can we use these privilege modes in our privilege access in order to 00:07:58.330 --> 00:08:03.330 attack these Trusted Execution Environments?". Now transfer that question 00:08:03.330 --> 00:08:06.260 and we can start looking at a few different research papers. The first one 00:08:06.260 --> 00:08:11.360 that I want to go into is one called CLKSCREW and it's an attack on TrustZone. 
00:08:11.360 --> 00:08:14.360 So throughout this presentation I'm going to go through a few different papers and 00:08:14.360 --> 00:08:18.050 just to make it clear which papers have already been published and which ones are 00:08:18.050 --> 00:08:21.400 old I'll include the citations in the upper right hand corner so that way you 00:08:21.400 --> 00:08:26.580 can tell what's old and what's new. And as far as papers go this CLKSCREW paper is 00:08:26.580 --> 00:08:31.430 relatively new. It was released in 2017. And the way CLKSCREW works is it takes 00:08:31.430 --> 00:08:38.009 advantage of the energy management features of a processor. So a non-secure 00:08:38.009 --> 00:08:41.679 operating system has the ability to manage the energy consumption of the different 00:08:41.679 --> 00:08:47.970 cores. So if a certain target core doesn't have much scheduled to do then the 00:08:47.970 --> 00:08:52.350 operating system is able to scale back that voltage or dial down the frequency on 00:08:52.350 --> 00:08:56.449 that core so that core uses less energy which is a great thing for performance: it 00:08:56.449 --> 00:09:00.971 really extends battery life, it makes the the cores last longer and it gives better 00:09:00.971 --> 00:09:07.009 performance overall. But the problem here is what if you have two separate cores and 00:09:07.009 --> 00:09:11.740 one of your cores is running this non- trusted operating system and the other 00:09:11.740 --> 00:09:15.579 core is running code in the secure world? It's running that trusted code those 00:09:15.579 --> 00:09:21.240 trusted applications so that non secure operating system can still dial down that 00:09:21.240 --> 00:09:25.629 voltage and it can still change that frequency and those changes will affect 00:09:25.629 --> 00:09:30.740 the secure world code. So what the CLKSCREW attack does is the non secure 00:09:30.740 --> 00:09:36.470 operating system core will dial down the voltage, it will overclock the frequency 00:09:36.470 --> 00:09:40.749 on the target secure world core in order to induce faults to make sure to make the 00:09:40.749 --> 00:09:45.909 computation on that core fail in some way and when that computation fails you get 00:09:45.909 --> 00:09:50.439 certain cryptographic errors that the attack can use to infer things like secret 00:09:50.439 --> 00:09:56.040 keys, secret AES keys and to bypass code signing implemented in the secure world. 00:09:56.040 --> 00:09:59.680 So it's a very powerful attack that's made possible because the non-secure operating 00:09:59.680 --> 00:10:06.099 system is privileged enough in order to use these energy management features. Now 00:10:06.099 --> 00:10:10.189 CLKSCREW is an example of an active attack where the attacker is actively changing 00:10:10.189 --> 00:10:15.470 the outcome of the victim code of that code in the secure world. But what about 00:10:15.470 --> 00:10:20.540 passive attacks? So in a passive attack, the attacker does not modify the actual 00:10:20.540 --> 00:10:25.220 outcome of the process. The attacker just tries to monitor that process infer what's 00:10:25.220 --> 00:10:29.200 going on and that is the sort of attack that we'll be considering for the rest of 00:10:29.200 --> 00:10:35.769 the presentation. So in a lot of SGX and TrustZone implementations, the trusted and 00:10:35.769 --> 00:10:39.759 the non-trusted code both share the same hardware and this shared hardware could be 00:10:39.759 --> 00:10:45.800 a shared cache, it could be a branch predictor, it could be a TLB. 
The point is 00:10:45.800 --> 00:10:53.230 that they share the same hardware so that the changes made by the secure code may be 00:10:53.230 --> 00:10:57.209 reflected in the behavior of the non-secure code. So the trusted code might 00:10:57.209 --> 00:11:02.259 execute, change the state of that shared cache for example and then the untrusted 00:11:02.259 --> 00:11:07.179 code may be able to go in, see the changes in that cache and infer information about 00:11:07.179 --> 00:11:11.720 the behavior of the secure code. So that's essentially how our side channel attacks 00:11:11.720 --> 00:11:16.160 are going to work: the non-secure code is going to monitor these shared hardware 00:11:16.160 --> 00:11:23.050 resources for state changes that reflect the behavior of the secure code. Now we've 00:11:23.050 --> 00:11:27.899 already talked about how Intel and SGX address the problem of memory management and who's 00:11:27.899 --> 00:11:33.399 responsible for making sure that those attacks don't work on SGX. So what do they 00:11:33.399 --> 00:11:37.050 have to say on how they protect against these side channel attacks and attacks on 00:11:37.050 --> 00:11:45.490 this shared cache hardware? They don't... at all. They essentially say "we do not 00:11:45.490 --> 00:11:48.931 consider this part of our threat model. It is up to the developer to implement the 00:11:48.931 --> 00:11:53.530 protections needed to protect against these side-channel attacks". Which is 00:11:53.530 --> 00:11:56.769 great news for us because these side channel attacks can be very powerful and 00:11:56.769 --> 00:12:00.350 if there aren't any hardware features that are necessarily stopping us from being 00:12:00.350 --> 00:12:06.910 able to accomplish our goal it makes us that much more likely to succeed. So with that 00:12:06.910 --> 00:12:11.430 we can sort of take a step back from TrustZone and SGX and just take a look at 00:12:11.430 --> 00:12:14.959 cache attacks to make sure that we all have the same understanding of how the 00:12:14.959 --> 00:12:19.549 cache attacks will be applied to these Trusted Execution Environments. To start 00:12:19.549 --> 00:12:25.619 that let's go over a brief recap of how a cache works. So caches are necessary in 00:12:25.619 --> 00:12:29.949 processors because accessing the main memory is slow. When you try to access 00:12:29.949 --> 00:12:34.079 something from the main memory it takes a while to be read into the processor. So the 00:12:34.079 --> 00:12:40.389 cache exists as sort of a layer to remember what that information is so if 00:12:40.389 --> 00:12:45.040 the processor ever needs information from that same address it just reloads it from 00:12:45.040 --> 00:12:49.699 the cache and that access is going to be fast. So it really speeds up the memory 00:12:49.699 --> 00:12:55.810 access for repeated accesses to the same address. And then if we try to access a 00:12:55.810 --> 00:13:00.069 different address then that will also be read into the cache, slowly at first but 00:13:00.069 --> 00:13:06.720 then quickly for repeated accesses and so on and so forth. Now as you can probably 00:13:06.720 --> 00:13:10.970 tell from all of these examples the memory blocks have been moving horizontally 00:13:10.970 --> 00:13:15.649 they've always been staying in the same row. And that is reflective of the idea of 00:13:15.649 --> 00:13:20.360 sets in a cache. So there are a number of different set IDs and that corresponds to 00:13:20.360 --> 00:13:24.189 the different rows in this diagram.
So for our example there are four different set 00:13:24.189 --> 00:13:30.889 IDs and each address in the main memory maps to a different set ID. So that 00:13:30.889 --> 00:13:35.100 address in main memory will only go into that location in the cache with the same 00:13:35.100 --> 00:13:39.730 set ID so it will only travel along those rows. So that means if you have two 00:13:39.730 --> 00:13:43.410 different blocks of memory that map to different set IDs they're not going to 00:13:43.410 --> 00:13:48.899 interfere with each other in the cache. But that raises the question "what about 00:13:48.899 --> 00:13:53.310 two memory blocks that do map to the same set ID?". Well if there's room in the 00:13:53.310 --> 00:13:58.759 cache then the same thing will happen as before: those memory contents will be 00:13:58.759 --> 00:14:03.769 loaded into the cache and then retrieved from the cache for future accesses. And 00:14:03.769 --> 00:14:08.110 the number of possible entries for a particular set ID within a cache is called 00:14:08.110 --> 00:14:11.800 the associativity. And on this diagram that's represented by the number of 00:14:11.800 --> 00:14:16.819 columns in the cache. So we will call our cache in this example a 2-way set- 00:14:16.819 --> 00:14:22.350 associative cache. Now the next question is "what happens if you try to read a 00:14:22.350 --> 00:14:27.049 memory address that maps to the same set ID but all of those entries within that set ID 00:14:27.049 --> 00:14:32.529 within the cache are full?". Well one of those entries is chosen, it's evicted from 00:14:32.529 --> 00:14:38.729 the cache, the new memory is read in and then that's fed to the processor. So it 00:14:38.729 --> 00:14:43.779 doesn't really matter how the cache entry that you're evicting is chosen; for the 00:14:43.779 --> 00:14:47.960 purpose of the presentation you can just assume that it's random. But the important 00:14:47.960 --> 00:14:51.899 thing is that if you try to access that same memory that was evicted before you're 00:14:51.899 --> 00:14:55.689 now going to have to wait for that time penalty for that to be reloaded into the 00:14:55.689 --> 00:15:01.329 cache and read into the processor. So those are caches in a nutshell, in particular 00:15:01.329 --> 00:15:05.749 set-associative caches, and now we can begin looking at the different types of cache 00:15:05.749 --> 00:15:09.319 attacks. So for a cache attack we have two different processes: we have an attacker 00:15:09.319 --> 00:15:13.779 process and a victim process. For this type of attack that we're considering both 00:15:13.779 --> 00:15:17.290 of them share the same underlying code so they're trying to access the same 00:15:17.290 --> 00:15:21.829 resources which could be the case if you have page deduplication in virtual 00:15:21.829 --> 00:15:26.009 machines or if you have copy-on-write mechanisms for shared code and shared 00:15:26.009 --> 00:15:31.649 libraries. But the point is that they share the same underlying memory. Now the 00:15:31.649 --> 00:15:35.659 Flush+Reload attack works in two stages for the attacker. The attacker 00:15:35.659 --> 00:15:39.420 first starts by flushing out the cache. They flush each and every address in the 00:15:39.420 --> 00:15:44.309 cache so the cache is just empty.
Then the attacker lets the victim execute for a 00:15:44.309 --> 00:15:48.769 small amount of time so the victim might read an address from main memory 00:15:48.769 --> 00:15:53.489 loading that into the cache and then the second stage of the attack is the reload 00:15:53.489 --> 00:15:58.099 phase. In the reload phase the attacker tries to load different memory addresses 00:15:58.099 --> 00:16:04.171 from main memory and sees if those entries are in the cache or not. Here the attacker 00:16:04.171 --> 00:16:09.380 will first try to load address 0 and see that because it takes a long time to read 00:16:09.380 --> 00:16:14.429 the contents of address 0 the attacker can infer that address 0 was not part of the 00:16:14.429 --> 00:16:17.499 cache which makes sense because the attacker flushed it from the cache in the 00:16:17.499 --> 00:16:23.330 first stage. The attacker then tries to read the memory at address 1 and sees that 00:16:23.330 --> 00:16:29.089 this operation is fast so the attacker infers that the contents of address 1 are 00:16:29.089 --> 00:16:32.859 in the cache and because the attacker flushed everything from the cache before 00:16:32.859 --> 00:16:37.119 the victim executed, the attacker then concludes that the victim is responsible 00:16:37.119 --> 00:16:42.540 for bringing address 1 into the cache. This Flush+Reload attack reveals which 00:16:42.540 --> 00:16:47.370 memory addresses the victim accesses during that small slice of time. Then 00:16:47.370 --> 00:16:50.970 after that reload phase, the attack repeats so the attacker flushes again, 00:16:50.970 --> 00:16:57.739 lets the victim execute, reloads again and so on. There's also a variant on the 00:16:57.739 --> 00:17:01.050 Flush+Reload attack that's called the Flush+Flush attack which I'm not going to 00:17:01.050 --> 00:17:05.569 go into the details of, but essentially it's the same idea. But instead of using 00:17:05.569 --> 00:17:08.980 load instructions to determine whether or not a piece of memory is in the cache, 00:17:08.980 --> 00:17:13.720 it uses flush instructions because flush instructions will take longer if 00:17:13.720 --> 00:17:19.138 something is in the cache already. The important thing is that both the 00:17:19.138 --> 00:17:22.819 Flush+Reload attack and the Flush+Flush attack rely on the attacker and the victim 00:17:22.819 --> 00:17:27.029 sharing the same memory. But this isn't always the case so we need to consider 00:17:27.029 --> 00:17:30.810 what happens when the attacker and the victim do not share memory. For this we 00:17:30.810 --> 00:17:35.670 have the Prime+Probe attack. The Prime+Probe attack once again works in two 00:17:35.670 --> 00:17:40.380 separate stages. In the first stage the attacker primes the cache by reading all 00:17:40.380 --> 00:17:44.401 the attacker memory into the cache and then the attacker lets the victim execute 00:17:44.401 --> 00:17:49.750 for a small amount of time. So no matter what the victim accesses from main memory 00:17:49.750 --> 00:17:54.460 since the cache is full of the attacker data, one of those attacker entries will 00:17:54.460 --> 00:17:59.190 be replaced by a victim entry. Then in the second phase of the attack, during the 00:17:59.190 --> 00:18:03.529 probe phase, the attacker checks the different cache entries for particular set 00:18:03.529 --> 00:18:08.959 IDs and sees if all of the attacker entries are still in the cache.
So maybe 00:18:08.959 --> 00:18:13.440 our attacker is curious about the last set ID, the bottom row, so the attacker first 00:18:13.440 --> 00:18:18.090 tries to load the memory at address 3 and because this operation is fast the 00:18:18.090 --> 00:18:23.000 attacker knows that address 3 is in the cache. The attacker tries the same thing 00:18:23.000 --> 00:18:28.159 with address 7, sees that this operation is slow and infers that at some point 00:18:28.159 --> 00:18:33.279 address 7 was evicted from the cache so the attacker knows that something had to 00:18:33.279 --> 00:18:37.490 be evicted from the cache and it had to be from the victim so the attacker concludes 00:18:37.490 --> 00:18:42.840 that the victim accessed something in that last set ID, in that bottom row. The 00:18:42.840 --> 00:18:47.230 attacker doesn't know if it was the contents of address 11 or the contents of 00:18:47.230 --> 00:18:51.260 address 15 or even what those contents are, but the attacker has a good idea of 00:18:51.260 --> 00:18:57.090 which set ID it was. So, the good things, the important things to remember about 00:18:57.090 --> 00:19:01.179 cache attacks are that caches are very important, they're crucial for performance 00:19:01.179 --> 00:19:06.059 on processors, they give a huge speed boost and there's a huge time difference 00:19:06.059 --> 00:19:11.569 between having a cache and not having a cache for your executables. But the 00:19:11.569 --> 00:19:16.080 downside to this is that big time difference also allows the attacker to 00:19:16.080 --> 00:19:21.620 infer information about how the victim is using the cache. We're able to use these 00:19:21.620 --> 00:19:24.429 cache attacks in two different scenarios: where memory is shared, in 00:19:24.429 --> 00:19:28.230 the case of the Flush+Reload and Flush+Flush attacks, and where 00:19:28.230 --> 00:19:31.739 memory is not shared, in the case of the Prime+Probe attack. And finally the 00:19:31.739 --> 00:19:36.659 important thing to keep in mind is that, for these cache attacks, we know where the 00:19:36.659 --> 00:19:40.480 victim is looking, but we don't know what they see. So we don't know the contents of 00:19:40.480 --> 00:19:44.360 the memory that the victim is actually seeing, we just know the location and the 00:19:44.360 --> 00:19:51.549 addresses. So, what does an example trace of these attacks look like? Well, there's 00:19:51.549 --> 00:19:56.451 an easy way to represent these as two-dimensional images. So in this image, we 00:19:56.451 --> 00:20:01.760 have our horizontal axis as time, so each column in this image represents a 00:20:01.760 --> 00:20:07.159 different time slice, a different iteration of the prime, measure, and probe. 00:20:07.159 --> 00:20:11.440 So, then we also have the vertical axis, which is the different set IDs, which is 00:20:11.440 --> 00:20:18.360 the location that's accessed by the victim process, and then here a pixel is white if 00:20:18.360 --> 00:20:24.159 the victim accessed that set ID during that time slice. So, as you look from left 00:20:24.159 --> 00:20:28.139 to right as time moves forward, you can sort of see the changes in the patterns of 00:20:28.139 --> 00:20:34.070 the memory accesses made by the victim process. Now, for this particular example 00:20:34.070 --> 00:20:39.860 the trace is captured on an execution of AES repeated several times, an AES 00:20:39.860 --> 00:20:44.519 encryption repeated about 20 times.
And you can tell that this is a repeated 00:20:44.519 --> 00:20:49.070 action because you see the same repeated memory access patterns in the data, you 00:20:49.070 --> 00:20:55.320 see the same structures repeated over and over. So, you know that this is reflecting 00:20:55.320 --> 00:21:00.749 what's going on throughout time, but what does it have to do with AES itself? 00:21:00.749 --> 00:21:05.950 Well, if we take the same trace with the same settings, but a different key, we see 00:21:05.950 --> 00:21:11.590 that there is a different memory access pattern with different repetition within 00:21:11.590 --> 00:21:18.200 the trace. So, only the key changed, the code didn't change. So, even though we're 00:21:18.200 --> 00:21:22.130 not able to read the contents of the key directly using this cache attack, we know 00:21:22.130 --> 00:21:25.610 that the key is changing these memory access patterns, and if we can see these 00:21:25.610 --> 00:21:30.850 memory access patterns, then we can infer the key. So, that's the essential idea: we 00:21:30.850 --> 00:21:35.380 want to make these images as clear as possible and as descriptive as possible so 00:21:35.380 --> 00:21:42.279 we have the best chance of learning what those secrets are. And we can define the 00:21:42.279 --> 00:21:47.389 metrics for what makes these cache attacks powerful in a few different ways. So, the 00:21:47.389 --> 00:21:51.759 three ways we'll be looking at are spatial resolution, temporal resolution and noise. 00:21:51.759 --> 00:21:56.300 So, spatial resolution refers to how accurately we can determine the where. If 00:21:56.300 --> 00:22:00.510 we know that the victim accessed a memory address within 1,000 bytes, that's 00:22:00.510 --> 00:22:06.820 obviously not as powerful as knowing where they accessed within 512 bytes. Temporal 00:22:06.820 --> 00:22:12.049 resolution is similar, where we want to know the order of the accesses the victim 00:22:12.049 --> 00:22:17.769 made. So if that time slice during our attack is 1 millisecond, we're going to 00:22:17.769 --> 00:22:22.139 get much better ordering information on those memory accesses than we would get if 00:22:22.139 --> 00:22:27.350 we only saw all the memory accesses over the course of one second. So the shorter 00:22:27.350 --> 00:22:32.159 that time slice, the better the temporal resolution, the longer our picture will be 00:22:32.159 --> 00:22:37.790 on the horizontal axis, and the clearer an image of the cache we'll see. 00:22:37.790 --> 00:22:41.419 And the last metric to evaluate our attacks on is noise and that reflects how 00:22:41.419 --> 00:22:46.070 accurately our measurements reflect the true state of the cache. So, right now 00:22:46.070 --> 00:22:49.950 we've been using timing data to infer whether or not an item was in the cache, 00:22:49.950 --> 00:22:54.340 but this is a little bit noisy. It's possible that we'll have false positives 00:22:54.340 --> 00:22:57.370 or false negatives, so we want to keep that in mind as we look at the different 00:22:57.370 --> 00:23:03.081 attacks. So, that's essentially cache attacks in a nutshell, and 00:23:03.081 --> 00:23:06.519 that's all you really need to understand in order to understand these attacks as 00:23:06.519 --> 00:23:11.389 they've been implemented on Trusted Execution Environments.
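To make the timing primitive behind the reload and probe steps concrete, here is a minimal sketch in C of how an attacker can decide whether a single load hit the cache. This is not code from the talk: it assumes an x86 machine with the clflush and rdtscp instructions available, and THRESHOLD_CYCLES is a made-up placeholder that would have to be calibrated per machine.

/* Minimal sketch of the Flush+Reload timing primitive (illustrative only). */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

#define THRESHOLD_CYCLES 150   /* hypothetical hit/miss boundary, calibrate per CPU */

/* Flush phase: remove the target line from the cache hierarchy. */
static void flush(const void *addr) {
    _mm_clflush(addr);
    _mm_mfence();
}

/* Reload phase: time one load and decide hit vs. miss. */
static int was_cached(const void *addr) {
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);      /* timestamp before the load */
    *(volatile const uint8_t *)addr;      /* the probed load */
    uint64_t end = __rdtscp(&aux);        /* timestamp after the load */
    return (end - start) < THRESHOLD_CYCLES;
}

int main(void) {
    static uint8_t shared[64];            /* stands in for memory shared with the victim */
    flush(shared);
    /* ... let the victim run for a small time slice here ... */
    printf("victim touched the line: %s\n", was_cached(shared) ? "yes" : "no");
    return 0;
}

The same timed-load idea, applied to one attacker-owned address per cache set instead of a shared address, is what the probe step of Prime+Probe measures; the false positives and false negatives mentioned above come from how cleanly that threshold separates hits from misses.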
And the first 00:23:11.389 --> 00:23:14.510 particular attack that we're going to be looking at is called a Controlled-Channel 00:23:14.510 --> 00:23:19.890 Attack on SGX, and this attack isn't necessarily a cache attack, but we can 00:23:19.890 --> 00:23:23.770 analyze it in the same way that we analyze the cache attacks. So, it's still useful 00:23:23.770 --> 00:23:30.940 to look at. Now, if you remember how memory management occurs with SGX, we know 00:23:30.940 --> 00:23:36.210 that if a page fault occurs during SGX Enclave code execution, that page fault is 00:23:36.210 --> 00:23:43.019 handled by the kernel. So, the kernel has to know which page the Enclave needs to be 00:23:43.019 --> 00:23:48.050 paged in. The kernel already gets some information about what the Enclave is 00:23:48.050 --> 00:23:54.789 looking at. Now, in the Controlled-Channel attack, there's a, what the attacker does 00:23:54.789 --> 00:23:59.839 from the non-trusted OS is the attacker pages almost every other page from the 00:23:59.839 --> 00:24:05.260 Enclave out of memory. So no matter whatever page that Enclave tries to 00:24:05.260 --> 00:24:09.770 access, it's very likely to cause a page fault, which will be redirected to the 00:24:09.770 --> 00:24:14.150 non-trusted OS, where the non-trusted OS can record it, page out any other pages 00:24:14.150 --> 00:24:20.429 and continue execution. So, the OS essentially gets a list of sequential page 00:24:20.429 --> 00:24:26.259 accesses made by the SGX Enclaves, all by capturing the page fault handler. This is 00:24:26.259 --> 00:24:29.669 a very general attack, you don't need to know what's going on in the Enclave in 00:24:29.669 --> 00:24:33.460 order to pull this off. You just load up an arbitrary Enclave and you're able to 00:24:33.460 --> 00:24:40.720 see which pages that Enclave is trying to access. So, how does it do on our metrics? 00:24:40.720 --> 00:24:44.270 First of all, this spatial resolution is not great. We can only see where the 00:24:44.270 --> 00:24:50.470 victim is accessing within 4096 bytes or the size of a full page because SGX 00:24:50.470 --> 00:24:55.519 obscures the offset into the page where the page fault occurs. The temporal 00:24:55.519 --> 00:24:58.760 resolution is good but not great, because even though we're able to see any 00:24:58.760 --> 00:25:04.450 sequential accesses to different pages we're not able to see sequential accesses 00:25:04.450 --> 00:25:09.970 to the same page because we need to keep that same page paged-in while we let our 00:25:09.970 --> 00:25:15.490 SGX Enclave run for that small time slice. So temporal resolution is good but not 00:25:15.490 --> 00:25:22.440 perfect. But the noise is, there is no noise in this attack because no matter 00:25:22.440 --> 00:25:26.149 where the page fault occurs, the untrusted operating system is going to capture that 00:25:26.149 --> 00:25:30.180 page fault and is going to handle it. So, it's very low noise, not great spatial 00:25:30.180 --> 00:25:37.490 resolution but overall still a powerful attack. But we still want to improve on 00:25:37.490 --> 00:25:40.700 that spatial resolution, we want to be able to see what the Enclave is doing that 00:25:40.700 --> 00:25:45.970 greater than a resolution of a one page of four kilobytes. So that's exactly what the 00:25:45.970 --> 00:25:50.179 CacheZoom paper does, and instead of interrupting the SGX Enclave execution 00:25:50.179 --> 00:25:55.370 with page faults, it uses timer interrupts. 
Because the untrusted 00:25:55.370 --> 00:25:59.280 operating system is able to schedule when timer interrupts occur, so it's able to 00:25:59.280 --> 00:26:03.320 schedule them at very tight intervals, so it's able to get that small and tight 00:26:03.320 --> 00:26:08.549 temporal resolution. And essentially what happens in between is this timer 00:26:08.549 --> 00:26:13.410 interrupt fires, the untrusted operating system runs the Prime+Probe attack code in 00:26:13.410 --> 00:26:18.240 this case, and resumes execution of the enclave process, and this repeats. So this 00:26:18.240 --> 00:26:24.549 is a Prime+Probe attack on the L1 data cache. So, this attack lets you see what 00:26:24.549 --> 00:26:30.529 data the Enclave is looking at. Now, this attack could be easily modified to use the 00:26:30.529 --> 00:26:36.000 L1 instruction cache, so in that case you learn which instructions the Enclave is 00:26:36.000 --> 00:26:41.419 executing. And overall this is an even more powerful attack than the Controlled- 00:26:41.419 --> 00:26:46.429 Channel attack. If we look at the metrics, we can see that the spatial resolution is 00:26:46.429 --> 00:26:50.360 a lot better, now we're looking at spatial resolution of 64 bytes or the size of an 00:26:50.360 --> 00:26:55.370 individual line. The temporal resolution is very good, it's "almost unlimited", to 00:26:55.370 --> 00:27:00.250 quote the paper, because the untrusted operating system has the privilege to keep 00:27:00.250 --> 00:27:05.179 scheduling those timer interrupts closer and closer together until it's able to 00:27:05.179 --> 00:27:10.260 capture very small time slices of the victim process. And the noise itself is 00:27:10.260 --> 00:27:14.559 low, we're still using a cycle counter to measure the time it takes to load memory 00:27:14.559 --> 00:27:20.629 in and out of the cache, but it's useful; the chances of having a false 00:27:20.629 --> 00:27:26.809 positive or false negative are low, so the noise is low as well. Now, we can also 00:27:26.809 --> 00:27:31.129 look at TrustZone attacks, because so far the attacks that we've looked at, the 00:27:31.129 --> 00:27:35.130 passive attacks, have been against SGX and those attacks on SGX have been pretty 00:27:35.130 --> 00:27:40.669 powerful. So, what are the published attacks on TrustZone? Well, there's one 00:27:40.669 --> 00:27:44.990 called TruSpy, which is kind of similar in concept to the CacheZoom attack that we 00:27:44.990 --> 00:27:51.629 just looked at on SGX. It's once again a Prime+Probe attack on the L1 data cache, 00:27:51.629 --> 00:27:57.129 and the difference here is that instead of interrupting the victim code execution 00:27:57.129 --> 00:28:04.460 multiple times, the TruSpy attack does the prime step, does the full AES encryption, 00:28:04.460 --> 00:28:08.539 and then does the probe step. And the reason they do this is because, as they 00:28:08.539 --> 00:28:13.330 say, the secure world is protected, and is not interruptible in the same way that SGX 00:28:13.330 --> 00:28:20.690 is interruptible. But even despite this, just having one measurement per execution, 00:28:20.690 --> 00:28:24.940 the TruSpy authors were able to use some statistics to still recover the AES key 00:28:24.940 --> 00:28:30.460 from that noise.
And their methods were so powerful, they are able to do this from an 00:28:30.460 --> 00:28:34.539 unapproved application in user land, so they don't even need to be running within 00:28:34.539 --> 00:28:39.820 the kernel in order to be able to pull off this attack. So, how does this attack 00:28:39.820 --> 00:28:43.360 measure up? The spatial resolution is once again 64 bytes because that's the size of 00:28:43.360 --> 00:28:48.559 a cache line on this processor, and the temporal resolution is, is pretty poor 00:28:48.559 --> 00:28:54.190 here, because we only get one measurement per execution of the AES encryption. This 00:28:54.190 --> 00:28:58.700 is also a particularly noisy attack because we're making the measurements from 00:28:58.700 --> 00:29:02.659 the user land, but even if we make the measurements from the kernel, we're still 00:29:02.659 --> 00:29:05.789 going to have the same issues of false positives and false negatives associated 00:29:05.789 --> 00:29:12.470 with using a cycle counter to measure membership in a cache. So, we'd like to 00:29:12.470 --> 00:29:16.389 improve this a little bit. We'd like to improve the temporal resolution, so we 00:29:16.389 --> 00:29:20.749 have the power of the cache attack to be a little bit closer on TrustZone, as it is 00:29:20.749 --> 00:29:27.149 on SGX. So, we want to improve that temporal resolution. Let's dig into that 00:29:27.149 --> 00:29:30.549 statement a little bit, that the secure world is protected and not interruptable. 00:29:30.549 --> 00:29:36.499 And to do, this we go back to this diagram of ARMv8 and how that TrustZone is set up. 00:29:36.499 --> 00:29:41.490 So, it is true that when an interrupt occurs, it is directed to the monitor and, 00:29:41.490 --> 00:29:45.530 because the monitor operates in the secure world, if we interrupt secure code that's 00:29:45.530 --> 00:29:49.081 running an exception level 0, we're just going to end up running secure code an 00:29:49.081 --> 00:29:54.239 exception level 3. So, this doesn't necessarily get us anything. I think, 00:29:54.239 --> 00:29:57.880 that's what the author's mean by saying that it's protected against this. Just by 00:29:57.880 --> 00:30:02.780 setting an interrupt, we don't have a way to redirect our flow to the non- 00:30:02.780 --> 00:30:08.190 trusted code. At least that's how it works in theory. In practice, the Linux 00:30:08.190 --> 00:30:11.840 operating system, running in exception level 1 in the non-secure world, kind of 00:30:11.840 --> 00:30:15.299 needs interrupts in order to be able to work, so if an interrupt occurs and it's 00:30:15.299 --> 00:30:18.120 being sent to the monitor, the monitor will just forward it right to the non- 00:30:18.120 --> 00:30:22.500 secure operating system. So, we have interrupts just the same way as we did in 00:30:22.500 --> 00:30:28.930 CacheZoom. And we can improve the TrustZone attacks by using this idea: We 00:30:28.930 --> 00:30:33.549 have 2 cores, where one core is running the secure code, the other core is running 00:30:33.549 --> 00:30:38.101 the non-secure code, and the non-secure code is sending interrupts to the secure- 00:30:38.101 --> 00:30:42.809 world core and that will give us that interleaving of attacker process and 00:30:42.809 --> 00:30:47.409 victim process that allow us to have a powerful prime-and-probe attack. So, what 00:30:47.409 --> 00:30:51.139 does this look like? We have the attack core and the victim core. The attack core 00:30:51.139 --> 00:30:54.909 sends an interrupt to the victim core. 
This interrupt is captured by the monitor, 00:30:54.909 --> 00:30:58.769 which passes it to the non-secure operating system. The non-secure operating 00:30:58.769 --> 00:31:02.979 system transfers this to our attack code, which runs the prime-and-probe attack. 00:31:02.979 --> 00:31:06.529 Then, we leave the interrupt, the execution within the victim code in the 00:31:06.529 --> 00:31:10.910 secure world resumes and we just repeat this over and over. So, now we have that 00:31:10.910 --> 00:31:16.690 interleaving of data... of the processes of the attacker and the victim. So, now, 00:31:16.690 --> 00:31:22.690 instead of having a temporal resolution of one measurement per execution, we once 00:31:22.690 --> 00:31:26.320 again have almost unlimited temporal resolution, because we can just schedule 00:31:26.320 --> 00:31:32.229 when we send those interrupts from the attacker core. Now, we'd also like to 00:31:32.229 --> 00:31:37.590 improve the noise measurements. Because if we can improve the noise, we'll 00:31:37.590 --> 00:31:42.159 get clearer pictures and we'll be able to infer those secrets more clearly. So, we 00:31:42.159 --> 00:31:45.720 can get some improvement by switching the measurements from userland and starting to 00:31:45.720 --> 00:31:50.830 do those in the kernel, but again we have the cycle counters. So, what if, instead 00:31:50.830 --> 00:31:54.330 of using the cycle counter to measure whether or not something is in the cache, 00:31:54.330 --> 00:32:00.070 we use the other performance counters? Because on ARMv8 platforms, there is a way 00:32:00.070 --> 00:32:03.769 to use performance counters to measure different events, such as cache hits and 00:32:03.769 --> 00:32:09.809 cache misses. So, these events and these performance monitors require privileged 00:32:09.809 --> 00:32:15.330 access in order to use, which, for this attack, we do have. Now, in a typical 00:32:15.330 --> 00:32:18.779 cache attack scenario we wouldn't have access to these performance monitors, 00:32:18.779 --> 00:32:22.259 which is why they haven't really been explored before, but in this weird 00:32:22.259 --> 00:32:25.250 scenario where we're attacking the less privileged code from the more privileged 00:32:25.250 --> 00:32:29.340 code, we do have access to these performance monitors and we can use these 00:32:29.340 --> 00:32:33.640 monitors during the probe step to get a very accurate count of whether or not a 00:32:33.640 --> 00:32:39.519 certain memory load caused a cache miss or a cache hit. So, we're able to essentially 00:32:39.519 --> 00:32:45.720 get rid of the different levels of noise. Now, one thing to point out is that maybe 00:32:45.720 --> 00:32:49.230 we'd like to use these ARMv8 performance counters in order to count the different 00:32:49.230 --> 00:32:53.729 events that are occurring in the secure world code. So, maybe we start the 00:32:53.729 --> 00:32:57.909 performance counters from the non-secure world, let the secure world run and then, 00:32:57.909 --> 00:33:01.669 when the secure world exits, we use the non-secure world to read these performance 00:33:01.669 --> 00:33:05.440 counters and maybe we'd like to see how many instructions the secure world 00:33:05.440 --> 00:33:09.019 executed or how many branch instructions or how many arithmetic instructions or how 00:33:09.019 --> 00:33:13.179 many cache misses there were.
But 00:33:13.179 --> 00:33:17.350 unfortunately, ARMv8 took this into account and by default, performance 00:33:17.350 --> 00:33:20.769 counters that are started in the non-secure world will not measure events that 00:33:20.769 --> 00:33:24.570 happen in the secure world, which is smart; which is how it should be. And the 00:33:24.570 --> 00:33:29.320 only reason I bring this up is because that's not how it is on ARMv7. So, we could go 00:33:29.320 --> 00:33:33.909 into a whole different talk with that, just exploring the different implications 00:33:33.909 --> 00:33:39.230 of what that means, but I want to focus on ARMv8, because that's the newest of 00:33:39.230 --> 00:33:42.540 the new. So, we'll keep looking at that. So, we instrument the prime-and-probe attack 00:33:42.540 --> 00:33:46.509 to use these performance counters, so we can get a clear picture of what is and 00:33:46.509 --> 00:33:52.399 what is not in the cache. And instead of having noisy measurements based on time, 00:33:52.399 --> 00:33:55.919 we have virtually no noise at all, because we get the truth straight from the 00:33:55.919 --> 00:34:01.660 processor itself, whether or not we experience a cache miss. So, how do we 00:34:01.660 --> 00:34:05.549 implement these attacks, where do we go from here? We have all these ideas; we 00:34:05.549 --> 00:34:11.840 have ways to make these TrustZone attacks more powerful, but that's not worthwhile, 00:34:11.840 --> 00:34:16.510 unless we actually implement them. So, the goal here is to implement these attacks on 00:34:16.510 --> 00:34:20.960 TrustZone and since typically the non-secure world operating system is based on 00:34:20.960 --> 00:34:25.360 Linux, we'll take that into account when making our implementation. So, we'll write 00:34:25.360 --> 00:34:29.340 a kernel module that uses these performance counters and these inter-processor 00:34:29.340 --> 00:34:33.179 interrupts, in order to actually accomplish these attacks; and we'll write 00:34:33.179 --> 00:34:37.300 it in such a way that it's very generalizable. So you can take this kernel 00:34:37.300 --> 00:34:41.650 module that was written for one device -- in my case I focused most of my attention 00:34:41.650 --> 00:34:46.739 on the Nexus 5X -- and it's very easy to transfer this module to any other Linux-based 00:34:46.739 --> 00:34:52.139 device that has a TrustZone and has these shared caches, so it should be 00:34:52.139 --> 00:34:57.810 very easy to port this over and to perform these same powerful cache attacks on 00:34:57.810 --> 00:35:01.500 different platforms. We can also do clever things based on the Linux operating 00:35:01.500 --> 00:35:05.500 system, so that we limit that collection window to just when we're executing within 00:35:05.500 --> 00:35:10.580 the secure world, so we can align our traces a lot more easily that way. And the 00:35:10.580 --> 00:35:14.930 end result is having a synchronized trace for each different attack, because, since 00:35:14.930 --> 00:35:19.440 we've written it in a modular way, we're able to run different attacks simultaneously. 00:35:19.440 --> 00:35:23.050 So, maybe we're running one prime-and-probe attack on the L1 data cache, to 00:35:23.050 --> 00:35:27.050 learn where the victim is accessing memory, and we're simultaneously running 00:35:27.050 --> 00:35:33.910 an attack on the L1 instruction cache, so we can see what instructions the victim is 00:35:33.910 --> 00:35:37.080 executing. And these can be aligned.
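As a rough illustration of the probe step such a kernel module might perform, here is a hypothetical sketch for an ARMv8 Linux target. It is not the tool described in the talk: the names (probe_on_victim_core, PROBE_LINES, victim_cpu) are invented, it assumes PMU counter 0 is free rather than owned by the kernel's perf subsystem, and it counts event 0x03 (L1D_CACHE_REFILL in the ARM architecture manual) instead of timing with a cycle counter, with smp_call_function_single supplying the inter-processor interrupt that pauses the secure-world core.

/* Hypothetical sketch only, not the published tool. */
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/types.h>
#include <asm/barrier.h>

#define PROBE_LINES 256                    /* one address per set of a hypothetical 256-set L1D */
static u8 probe_buf[PROBE_LINES * 64] __aligned(64);
static int victim_cpu = 1;                 /* core on which the secure-world code runs */

static inline u64 read_l1d_refills(void)
{
	u64 v;
	asm volatile("mrs %0, pmevcntr0_el0" : "=r"(v));
	return v;
}

static void setup_l1d_refill_counter(void *unused)
{
	u64 pmcr;
	asm volatile("msr pmevtyper0_el0, %0" :: "r"((u64)0x03)); /* count L1D cache refills */
	asm volatile("msr pmcntenset_el0, %0" :: "r"((u64)1));    /* enable event counter 0 */
	asm volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
	asm volatile("msr pmcr_el0, %0" :: "r"(pmcr | 1));        /* PMCR_EL0.E: enable the PMU */
	isb();
}

/* Runs on the victim core, in the context of the inter-processor interrupt
 * that just paused the secure-world code: the interleaving described above. */
static void probe_on_victim_core(void *info)
{
	u64 *evictions = info, before, after;
	int i;

	before = read_l1d_refills();
	for (i = 0; i < PROBE_LINES; i++)
		(void)READ_ONCE(probe_buf[i * 64]);   /* reload one attacker line per set */
	after = read_l1d_refills();

	*evictions = after - before;   /* refills = attacker lines the secure world displaced */
}

static int __init probe_demo_init(void)
{
	u64 evictions = 0;

	on_each_cpu(setup_l1d_refill_counter, NULL, 1);
	/* The prime step would fill the cache here; then interrupt the victim core and probe. */
	smp_call_function_single(victim_cpu, probe_on_victim_core, &evictions, 1);
	pr_info("probe saw %llu L1D refills on cpu %d\n", evictions, victim_cpu);
	return 0;
}

static void __exit probe_demo_exit(void)
{
}

module_init(probe_demo_init);
module_exit(probe_demo_exit);
MODULE_LICENSE("GPL");

A real probe would also walk every way of each set and record a per-set result rather than a single total, but the overall shape (interrupt the victim core, run the probe inside that interrupt, read the miss counter) matches what is described in the talk.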
So, 00:35:33.910 --> 00:35:37.080 the tool that I've written is a combination of a kernel module which 00:35:37.080 --> 00:35:41.580 actually performs this attack, a userland binary which schedules these processes to 00:35:41.580 --> 00:35:45.860 different cores, and a GUI that will allow you to interact with this kernel module 00:35:45.860 --> 00:35:49.710 and rapidly start doing these cache attacks for yourself and perform them 00:35:49.710 --> 00:35:56.860 against different processes and secure code and secure world code. So, the 00:35:56.860 --> 00:36:02.820 intention behind this tool is to be very generalizable to make it very easy to use 00:36:02.820 --> 00:36:08.430 this platform for different devices and to allow people a way to, once again, quickly 00:36:08.430 --> 00:36:12.360 develop these attacks; and also to see if their own code is vulnerable to these 00:36:12.360 --> 00:36:18.490 cache attacks, to see if their code has these secret-dependent memory accesses. 00:36:18.490 --> 00:36:25.349 So, can we get even better... spatial resolution? Right now, we're down to 64 00:36:25.349 --> 00:36:30.320 bytes and that's the size of a cache line, which is the size of our shared hardware. 00:36:30.320 --> 00:36:35.510 And on SGX, we actually can get better than 64 bytes, based on something called a 00:36:35.510 --> 00:36:39.160 branch-shadowing attack. So, a branch-shadowing attack takes advantage of 00:36:39.160 --> 00:36:42.730 something called the branch target buffer. And the branch target buffer is a 00:36:42.730 --> 00:36:48.490 structure that's used for branch prediction. It's similar to a cache, but 00:36:48.490 --> 00:36:51.740 there's a key difference where the branch target buffer doesn't compare the full 00:36:51.740 --> 00:36:54.770 address, when seeing if something is already in the cache or not: It doesn't 00:36:54.770 --> 00:36:59.701 compare all of the upper level bits. So, that means that it's possible that two 00:36:59.701 --> 00:37:04.140 different addresses will experience a collision, and the same entry from that 00:37:04.140 --> 00:37:08.870 BTB cache will be read out for an improper address. Now, since this is just for 00:37:08.870 --> 00:37:12.090 branch prediction, the worst that can happen is, you'll get a misprediction and 00:37:12.090 --> 00:37:18.070 a small time penalty, but that's about it. The idea behind the branch-shadowing 00:37:18.070 --> 00:37:22.440 attack is leveraging the small difference in this overlapping and this collision of 00:37:22.440 --> 00:37:28.540 addresses in order to sort of execute a shared-code-style flush-and-reload attack 00:37:28.540 --> 00:37:35.330 on the branch target buffer. So, here what goes on is, during the attack the attacker 00:37:35.330 --> 00:37:39.650 modifies the SGX Enclave to make sure that the branches that are within the Enclave 00:37:39.650 --> 00:37:44.340 will collide with branches that are not in the Enclave. The attacker executes the 00:37:44.340 --> 00:37:50.440 Enclave code and then the attacker executes their own code and based on the 00:37:50.440 --> 00:37:55.460 outcome of the victim code in that cache, the attacker code may or may not 00:37:55.460 --> 00:37:59.210 experience a branch misprediction. So, the attacker is able to tell the outcome of a 00:37:59.210 --> 00:38:03.310 branch, because of this overlap and this collision, like there would be in a flush-and- 00:38:03.310 --> 00:38:06.570 reload attack, where those memories overlap between the attacker and the 00:38:06.570 --> 00:38:14.020 victim.
So here, our spatial resolution is fantastic: We can tell down to individual 00:38:14.020 --> 00:38:19.440 branch instructions in SGX; we can tell exactly which branches were executed and 00:38:19.440 --> 00:38:25.010 which directions they were taken, in the case of conditional branches. The temporal 00:38:25.010 --> 00:38:29.720 resolution is also, once again, almost unlimited, because we can use the same 00:38:29.720 --> 00:38:33.880 timer interrupts in order to schedule our process, our attacker process. And the 00:38:33.880 --> 00:38:39.120 noise is, once again, very low, because we can, once again, use the same sort of 00:38:39.120 --> 00:38:43.980 branch misprediction counters, that exist in the Intel world, in order to measure 00:38:43.980 --> 00:38:51.510 this noise. So, does any of that apply to the TrustZone attacks? Well, in 00:38:51.510 --> 00:38:55.040 this case the victim and attacker don't share entries in the branch target buffer, 00:38:55.040 --> 00:39:01.610 because the attacker is not able to map the virtual address of the victim process. 00:39:01.610 --> 00:39:05.340 But this is kind of reminiscent of our earlier cache attacks, so our flush-and- 00:39:05.340 --> 00:39:10.100 reload attack only worked when the attacker and the victim shared that memory, but we 00:39:10.100 --> 00:39:13.930 still have the prime-and-probe attack for when they don't. So, what if we use a 00:39:13.930 --> 00:39:21.380 prime-and-probe-style attack on the branch target buffer cache in ARM processors? So, 00:39:21.380 --> 00:39:25.320 essentially what we do here is, we prime the branch target buffer by executing many 00:39:25.320 --> 00:39:29.531 attacker branches to sort of fill up this BTB cache with the attacker branch 00:39:29.531 --> 00:39:34.770 prediction data; we let the victim execute a branch which will evict an attacker BTB 00:39:34.770 --> 00:39:39.120 entry; and then we have the attacker re-execute those branches and see if there 00:39:39.120 --> 00:39:45.120 have been any mispredictions. So now, the cool thing about this attack is, the 00:39:45.120 --> 00:39:50.320 structure of the BTB cache is different from that of the L1 caches. So, instead of 00:39:50.320 --> 00:39:59.750 having 256 different sets in the L1 cache, the BTB cache has 2048 different sets, so 00:39:59.750 --> 00:40:06.380 we can tell which branch it was, based on which one of 2048 different set IDs 00:40:06.380 --> 00:40:11.230 it could fall into. And even more than that, on the ARM platform, at least 00:40:11.230 --> 00:40:15.730 on the Nexus 5X that I was working with, the granularity is no longer 64 bytes, 00:40:15.730 --> 00:40:21.830 which is the size of the line, it's now 16 bytes. So, we can see which branches the 00:40:21.830 --> 00:40:27.620 trusted code within TrustZone is executing within 16 bytes. So, what does 00:40:27.620 --> 00:40:31.820 this look like? So, previously with the TruSpy attack, this is sort of the 00:40:31.820 --> 00:40:37.410 outcome of our prime-and-probe attack: We get 1 measurement for those 256 different 00:40:37.410 --> 00:40:43.420 set IDs. When we added those interrupts, we're able to get that time resolution, 00:40:43.420 --> 00:40:48.090 and it looks something like this.
Now, maybe you can see a little bit at the top 00:40:48.090 --> 00:40:52.660 of the screen, how there's these repeated sections of little white blocks, and you 00:40:52.660 --> 00:40:56.720 can sort of use that to infer, maybe there's the same cache line and cache 00:40:56.720 --> 00:41:00.870 instructions that are called over and over. So, just looking at this L1-I cache 00:41:00.870 --> 00:41:06.920 attack, you can tell some information about how the process went. Now, let's 00:41:06.920 --> 00:41:11.870 compare that to the BTB attack. And I don't know if you can see too clearly -- 00:41:11.870 --> 00:41:17.190 it's a it's a bit too high of resolution right now -- so let's just focus in on one 00:41:17.190 --> 00:41:22.580 small part of this overall trace. And this is what it looks like. So, each of those 00:41:22.580 --> 00:41:27.720 white pixels represents a branch that was taken by that secure-world code and we can 00:41:27.720 --> 00:41:31.070 see repeated patterns, we can see maybe different functions that were called, we 00:41:31.070 --> 00:41:35.310 can see different loops. And just by looking at this 1 trace, we can infer a 00:41:35.310 --> 00:41:40.110 lot of information on how that secure world executed. So, it's incredibly 00:41:40.110 --> 00:41:44.230 powerful and all of those secrets are just waiting to be uncovered using these new 00:41:44.230 --> 00:41:52.890 tools. So, where do we go from here? What sort of countermeasures do we have? Well, 00:41:52.890 --> 00:41:56.690 first of all I think, the long term solution is going to be moving to no more 00:41:56.690 --> 00:42:00.200 shared hardware. We need to have separate hardware and no more shared caches in 00:42:00.200 --> 00:42:05.750 order to fully get rid of these different cache attacks. And we've already seen this 00:42:05.750 --> 00:42:11.420 trend in different cell phones. So, for example, in Apple SSEs for a long time now 00:42:11.420 --> 00:42:15.521 -- I think since the Apple A7 -- the secure Enclave, which runs the secure 00:42:15.521 --> 00:42:21.000 code, has its own cache. So, these cache attacks can't be accomplished from code 00:42:21.000 --> 00:42:27.400 outside of that secure Enclave. So, just by using that separate hardware, it knocks 00:42:27.400 --> 00:42:30.970 out a whole class of different potential side-channel and microarchitecture 00:42:30.970 --> 00:42:35.610 attacks. And just recently, the Pixel 2 is moving in the same direction. The Pixel 2 00:42:35.610 --> 00:42:40.540 now includes a hardware security module that includes cryptographic operations; 00:42:40.540 --> 00:42:45.890 and that chip also has its own memory and its own caches, so now we can no longer 00:42:45.890 --> 00:42:51.270 use this attack to extract information about what's going on in this external 00:42:51.270 --> 00:42:56.530 hardware security module. But even then, using this separate hardware, that doesn't 00:42:56.530 --> 00:43:00.800 solve all of our problems. Because we still have the question of "What do we 00:43:00.800 --> 00:43:05.900 include in this separate hardware?" On the one hand, we want to include more code in 00:43:05.900 --> 00:43:11.370 that a separate hardware, so we're less vulnerable to these side-channel attacks, 00:43:11.370 --> 00:43:16.490 but on the other hand, we don't want to expand the attack surface anymore. 
Because 00:43:16.490 --> 00:43:19.060 the more code we include in these secure environments, the more likely it is that a 00:43:19.060 --> 00:43:22.600 vulnerability will be found and the attacker will be able to get a foothold 00:43:22.600 --> 00:43:26.470 within the secure, trusted environment. So, there's going to be a balance between 00:43:26.470 --> 00:43:30.270 what you choose to include in the separate hardware and what you don't. So, 00:43:30.270 --> 00:43:35.220 do you include DRM code? Do you include cryptographic code? It's still an open 00:43:35.220 --> 00:43:41.800 question. And that's sort of the long-term approach. In the short term, you just kind 00:43:41.800 --> 00:43:46.370 of have to write side-channel-free software: Just be very careful about what 00:43:46.370 --> 00:43:50.811 your process does, whether there are any secret-dependent memory accesses, 00:43:50.811 --> 00:43:55.310 secret-dependent branching or secret-dependent function calls, because any of 00:43:55.310 --> 00:44:00.010 those can leak the secrets out of your trusted execution environment. So, here 00:44:00.010 --> 00:44:03.460 are the things that, if you are a developer of trusted execution environment 00:44:03.460 --> 00:44:08.150 code, I want you to keep in mind: First of all, performance is very often at 00:44:08.150 --> 00:44:13.130 odds with security. We've seen over and over that the performance enhancements to 00:44:13.130 --> 00:44:18.880 these processors open up the ability for these microarchitectural attacks to be 00:44:18.880 --> 00:44:23.750 more efficient. Additionally, these trusted execution environments don't 00:44:23.750 --> 00:44:27.160 protect against everything; there are still these side-channel attacks and these 00:44:27.160 --> 00:44:32.310 microarchitectural attacks that these systems are vulnerable to. These attacks 00:44:32.310 --> 00:44:37.650 are very powerful; they can be accomplished simply; and with the 00:44:37.650 --> 00:44:41.770 publication of the code that I've written, it should be very simple to get set up and 00:44:41.770 --> 00:44:46.070 to analyze your own code to see "Am I vulnerable, do I expose information in the 00:44:46.070 --> 00:44:52.760 same way?" And lastly, it only takes 1 small error, 1 tiny leak from your trusted 00:44:52.760 --> 00:44:56.670 and secure code, in order to extract the entire secret, in order to bring the whole 00:44:56.670 --> 00:45:03.920 thing down. So, what I want to leave you with is: I want you to remember that you 00:45:03.920 --> 00:45:08.520 are responsible for making sure that your program is not vulnerable to these 00:45:08.520 --> 00:45:13.110 microarchitectural attacks, because if you do not take responsibility for this, who 00:45:13.110 --> 00:45:16.645 will? Thank you! 00:45:16.645 --> 00:45:25.040 Applause 00:45:25.040 --> 00:45:29.821 Herald: Thank you very much. Please, if you want to leave the hall, please do it 00:45:29.821 --> 00:45:35.000 quietly and take all your belongings with you and respect the speaker. We have 00:45:35.000 --> 00:45:43.230 plenty of time, 16, 17 minutes for Q&A, so please line up at the microphones. No 00:45:43.230 --> 00:45:50.650 questions from the signal angel, all right. So, we can start with microphone 6, 00:45:50.650 --> 00:45:54.770 please. Mic 6: Okay. There was a symbol for secure 00:45:54.770 --> 00:46:01.160 OSes in the ARM TrustZone. What is the idea of them if the non-secure OS gets all the 00:46:01.160 --> 00:46:04.210 interrupts? What is the secure OS for?
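As a small illustration of the side-channel-free-software advice above, the following C sketch contrasts a leaky early-exit comparison, whose branches depend on the secret, with a constant-time variant that always touches every byte and takes the same branches regardless of the secret. The function names and example data are invented for this illustration and are not taken from the talk.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Leaky: returns at the first mismatch, so the number of loop iterations,
 * branches and memory accesses depends on the secret. An attacker watching
 * the BTB or the instruction cache can learn where the comparison stopped. */
int leaky_compare(const uint8_t *secret, const uint8_t *guess, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (secret[i] != guess[i])
            return 0;
    return 1;
}

/* Constant-time: always processes every byte and takes the same branches
 * no matter where (or whether) the inputs differ. */
int ct_compare(const uint8_t *secret, const uint8_t *guess, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(secret[i] ^ guess[i]);
    /* Collapse diff to 1 (equal) or 0 (different) without branching on it. */
    return (int)(((uint32_t)diff - 1u) >> 31);
}

int main(void)
{
    /* Invented example data, purely for demonstration. */
    const uint8_t key[4]   = {0xDE, 0xAD, 0xBE, 0xEF};
    const uint8_t guess[4] = {0xDE, 0xAD, 0x00, 0x00};
    printf("leaky: %d, constant-time: %d\n",
           leaky_compare(key, guess, 4), ct_compare(key, guess, 4));
    return 0;
}

The same pattern, replacing secret-dependent branches and lookups with arithmetic over all the data, is what the "be very careful about what your process does" advice amounts to in practice.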
00:46:04.210 --> 00:46:08.880 Keegan: Yeah so, in ARMv8 there are a couple of different kinds of interrupts. So, 00:46:08.880 --> 00:46:11.760 I think -- if I'm remembering the terminology correctly -- there is an IRQ 00:46:11.760 --> 00:46:16.800 and an FIQ interrupt. So, the non-secure mode handles the IRQ interrupts and the 00:46:16.800 --> 00:46:20.440 secure mode handles the FIQ interrupts. So, which one you send 00:46:20.440 --> 00:46:24.840 will determine in which direction the monitor directs that interrupt. 00:46:29.640 --> 00:46:32.010 Mic 6: Thank you. Herald: Okay, thank you. Microphone number 00:46:32.010 --> 00:46:37.930 7, please. Mic 7: Do any of your presented attacks on 00:46:37.930 --> 00:46:45.290 TrustZone also apply to the AMD implementation of TrustZone, or are you 00:46:45.290 --> 00:46:48.380 looking into it? Keegan: I haven't looked into AMD too 00:46:48.380 --> 00:46:54.011 much, because, as far as I can tell, that's not used as commonly, but there are 00:46:54.011 --> 00:46:57.490 many different types of trusted execution environments. The 2 that I focused on were 00:46:57.490 --> 00:47:04.760 SGX and TrustZone, because those are the most common examples that I've seen. 00:47:04.760 --> 00:47:09.250 Herald: Thank you. Microphone number 8, please. 00:47:09.250 --> 00:47:20.370 Mic 8: When TrustZone is moved to dedicated hardware, dedicated memory, 00:47:20.370 --> 00:47:27.780 couldn't you replicate the userspace attacks by loading your own trusted 00:47:27.780 --> 00:47:32.210 userspace app and using it as an oracle of some sort? 00:47:32.210 --> 00:47:35.760 Keegan: If you can load your own trusted code, then yes, you could do that. But in 00:47:35.760 --> 00:47:39.650 many of the models I've seen today, that's not possible. So, that's why you have 00:47:39.650 --> 00:47:44.250 things like code signing, which prevent an arbitrary user from running their own 00:47:44.250 --> 00:47:50.310 code in the trusted OS... or in the trusted environment. 00:47:50.310 --> 00:47:55.010 Herald: All right. Microphone number 1. Mic 1: So, these attacks are more powerful 00:47:55.010 --> 00:48:00.720 against code that's running in trusted execution environments than similar 00:48:00.720 --> 00:48:07.100 attacks would be against ring-3 code or, in general, untrusted code. Does that mean 00:48:07.100 --> 00:48:10.910 that trusted execution environments are basically an attractive nuisance that we 00:48:10.910 --> 00:48:15.080 shouldn't use? Keegan: There's still a large benefit to 00:48:15.080 --> 00:48:17.600 using these trusted execution environments. The point I want to get 00:48:17.600 --> 00:48:21.390 across is that, although they add a lot of features, they don't protect against 00:48:21.390 --> 00:48:25.450 everything, so you should keep in mind that these side-channel attacks do still 00:48:25.450 --> 00:48:28.820 exist and you still need to protect against them. But overall, these are 00:48:28.820 --> 00:48:35.930 good things and worthwhile to include. Herald: Thank you. Microphone number 1 00:48:35.930 --> 00:48:41.580 again, please. Mic 1: So, AMD is doing something with 00:48:41.580 --> 00:48:47.780 encrypting memory and I'm not sure if they encrypt addresses too, but would that 00:48:47.780 --> 00:48:53.090 be a defense against such attacks? Keegan: So, I'm not too familiar with AMD, 00:48:53.090 --> 00:48:57.690 but SGX also encrypts memory.
It encrypts it in between the lowest-level cache and 00:48:57.690 --> 00:49:02.170 the main memory. But that doesn't really have an impact on the actual operation, 00:49:02.170 --> 00:49:06.220 because the memory is encrypted at the cache-line level and, as the attacker, we don't 00:49:06.220 --> 00:49:10.380 care what that data is within that cache line, we only care which cache line is 00:49:10.380 --> 00:49:16.150 being accessed. Mic 1: If you encrypt addresses, wouldn't 00:49:16.150 --> 00:49:20.551 that help against that? Keegan: I'm not sure how you would 00:49:20.551 --> 00:49:25.070 encrypt the addresses yourself. As long as those addresses map into the same set IDs 00:49:25.070 --> 00:49:30.200 that the attacker can map into, then the attacker could still pull off the same style 00:49:30.200 --> 00:49:35.030 of attacks. Herald: Great. We have a question from the 00:49:35.030 --> 00:49:38.200 internet, please. Signal Angel: The question is "Does the 00:49:38.200 --> 00:49:42.410 secure enclave on the Samsung Exynos distinguish the receiver of the message, so 00:49:42.410 --> 00:49:46.830 that if the user application asks to decode an AES message, can one sniff the 00:49:46.830 --> 00:49:52.220 value that the secure enclave returns?" 00:49:52.220 --> 00:49:56.680 Keegan: So, that sounds like it's asking about the TruSpy-style attack, where 00:49:56.680 --> 00:50:01.270 it's calling into the secure world to encrypt something with AES. I think that 00:50:01.270 --> 00:50:04.830 would all depend on the particular implementation: As long as it's encrypting 00:50:04.830 --> 00:50:09.790 with a certain key and it's able to do that repeatably, then the attack, 00:50:09.790 --> 00:50:16.290 assuming a vulnerable AES implementation, would be able to extract that key. 00:50:16.290 --> 00:50:20.750 Herald: Cool. Microphone number 2, please. Mic 2: Do you recommend a reference to 00:50:20.750 --> 00:50:25.350 understand how these cache-line attacks and branch oracles actually lead to key 00:50:25.350 --> 00:50:29.540 recovery? Keegan: Yeah. So, I will flip through 00:50:29.540 --> 00:50:33.620 these pages, which include a lot of the references for the attacks that I've 00:50:33.620 --> 00:50:38.030 mentioned, so if you're watching the video, you can see these right away or 00:50:38.030 --> 00:50:43.200 just access the slides. And a lot of these contain good starting points. So, I didn't 00:50:43.200 --> 00:50:46.340 go into a lot of the details on how, for example, the TruSpy attack recovered 00:50:46.340 --> 00:50:53.090 that AES key, but that paper does have a lot of good links on how those leaks can 00:50:53.090 --> 00:50:56.350 lead to key recovery. Same thing with the CLKSCREW attack and how the fault 00:50:56.350 --> 00:51:03.070 injection can lead to key recovery. Herald: Microphone number 6, please. 00:51:03.070 --> 00:51:07.900 Mic 6: I think my question might have been very similar, almost the same thing: How hard is 00:51:07.900 --> 00:51:11.920 it actually to recover the keys? Is this like a massive machine-learning problem or 00:51:11.920 --> 00:51:18.500 is this something that you can do practically on a single machine? 00:51:18.500 --> 00:51:21.640 Keegan: It varies entirely by the implementation. So, for all of these attacks 00:51:21.640 --> 00:51:25.750 to work, you need to have some sort of vulnerable implementation, and some 00:51:25.750 --> 00:51:29.010 implementations leak more data than others.
In the case of a lot of the AES 00:51:29.010 --> 00:51:33.880 attacks, where you're doing the passive attacks, those are very easy to do on just 00:51:33.880 --> 00:51:37.630 your own computer. For the AES fault-injection attack in the 00:51:37.630 --> 00:51:42.340 CLKSCREW paper, I think that one required more brute force, so it required more computing 00:51:42.340 --> 00:51:49.780 resources, but still, it was entirely practical to do in a realistic setting. 00:51:49.780 --> 00:51:53.770 Herald: Cool, thank you. So, we have one more: Microphone number 1, please. 00:51:53.770 --> 00:51:59.080 Mic 1: So, I hope it's not too naive a question, but I was wondering, since all 00:51:59.080 --> 00:52:04.730 these attacks are based on cache hits and misses, isn't it possible to forcibly 00:52:04.730 --> 00:52:11.280 flush or invalidate or insert noise in the cache after each operation in this trusted 00:52:11.280 --> 00:52:23.520 environment, in order to mess up the guesswork of the attacker? So, discarding 00:52:23.520 --> 00:52:29.180 optimization and performance for additional security benefits. 00:52:29.180 --> 00:52:32.420 Keegan: Yeah, and that is absolutely possible and you are absolutely right: It 00:52:32.420 --> 00:52:36.300 does lead to a performance degradation, because if you always flush the entire 00:52:36.300 --> 00:52:41.190 cache every time you do a context switch, that will be a huge performance hit. So 00:52:41.190 --> 00:52:45.190 again, that comes down to the question of the performance and security trade-off: 00:52:45.190 --> 00:52:49.540 Which one do you end up going with? And it seems historically the choice has been 00:52:49.540 --> 00:52:54.000 more in the direction of performance. Mic 1: Thank you. 00:52:54.000 --> 00:52:56.920 Herald: But we have one more: Microphone number 1, please. 00:52:56.920 --> 00:53:01.500 Mic 1: So, I have more of a moral question: How well should we really 00:53:01.500 --> 00:53:07.720 protect against attacks which need some ring-0 cooperation? Because, basically, 00:53:07.720 --> 00:53:14.350 when we use TrustZone for a purpose we would see as clear, like protecting the 00:53:14.350 --> 00:53:20.250 browser from the outside world, then we are basically using the 00:53:20.250 --> 00:53:27.280 secure execution environment for sandboxing the process. But once we need some 00:53:27.280 --> 00:53:32.281 cooperation from the kernel, some of those attacks, in fact, empower the user 00:53:32.281 --> 00:53:36.320 instead of the hardware producer. Keegan: Yeah, and you're right. It 00:53:36.320 --> 00:53:39.210 depends entirely on what your application is and what your threat model is that 00:53:39.210 --> 00:53:43.020 you're looking at. So, if you're using these trusted execution environments to do 00:53:43.020 --> 00:53:48.430 DRM, for example, then maybe you would be worried about that ring-0 attack or 00:53:48.430 --> 00:53:51.620 that privileged attacker who has their phone rooted and is trying to recover 00:53:51.620 --> 00:53:56.740 these media encryption keys from this execution environment. But maybe there are 00:53:56.740 --> 00:54:01.230 other scenarios where you're not as worried about having an attacker with a 00:54:01.230 --> 00:54:05.580 compromised ring 0. So, it entirely depends on context. 00:54:05.580 --> 00:54:09.000 Herald: All right, thank you. So, we have one more: Microphone number 1, again. 00:54:09.000 --> 00:54:10.990 Mic 1: Hey there. Great talk, thank you very much.
00:54:10.990 --> 00:54:13.040 Keegan: Thank you. Mic 1: Just a short question: Do you have 00:54:13.040 --> 00:54:16.980 any success stories about attacking TrustZone and the different 00:54:16.980 --> 00:54:24.010 implementations of TEEs with some vendors, like some OEMs creating phones and stuff? 00:54:24.010 --> 00:54:29.750 Keegan: Not that I'm announcing at this time. 00:54:29.750 --> 00:54:35.584 Herald: So, thank you very much. Please, again a warm round of applause for Keegan! 00:54:35.584 --> 00:54:39.998 Applause 00:54:39.998 --> 00:54:45.489 34c3 postroll music 00:54:45.489 --> 00:55:02.000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!