34C3 preroll music

Herald: Hello fellow creatures. Welcome, and I want to start with a question. Another one: who do we trust? Do we trust the TrustZones on our smartphones? Well, Keegan Ryan, he's really fortunate to be here, and he was inspired by another talk from the CCC before - I think it was 29C3 - and his research on smartphones and the systems on a chip used in smartphones will answer these questions, whether you can trust those Trusted Execution Environments. Please give a warm round of applause to Keegan and enjoy!

Applause

Keegan Ryan: All right, thank you! So I'm Keegan Ryan, I'm a consultant with NCC Group, and this is "Microarchitectural Attacks on Trusted Execution Environments". So, in order to understand what a Trusted Execution Environment is, we need to go back into processor security, specifically on x86. As many of you are probably aware, there are a couple of different modes under which we can execute code on x86 processors, and that includes ring 3, which is the user code and the applications, and also ring 0, which is the kernel code. Now, there's also a ring 1 and a ring 2 that are supposedly used for drivers or guest operating systems, but really it just boils down to ring 0 and ring 3. And in this diagram we have here, we see that privilege increases as we go up the diagram, so ring 0 is the most privileged ring and ring 3 is the least privileged ring.
So all of our secrets, all of our sensitive information, all of the attacker's goals are in ring 0, and the attacker is trying to access those from the unprivileged world of ring 3. Now you may have a question: what if I want to add a processor feature that I don't want ring 0 to be able to access? Well, then you add ring -1, which is often used for a hypervisor. Now the hypervisor has all the secrets, and the hypervisor can manage different guest operating systems, and each of these guest operating systems can execute in ring 0 without having any idea of the other operating systems. So this way, the secrets are all in ring -1, and the attacker's goals have shifted from ring 0 to ring -1. The attacker has to attack ring -1 from a less privileged ring and tries to access those secrets. But what if you want to add a processor feature that you don't want ring -1 to be able to access? So you add ring -2, which is System Management Mode, and that's capable of monitoring power, directly interfacing with firmware and other chips on a motherboard, and it's able to access and do a lot of things that the hypervisor is not able to. And now all of your secrets and all of your attacker goals are in ring -2, and the attacker has to attack those from a less privileged ring. Now maybe you want to add something to your processor that you don't want ring -2 to be able to access, so you add ring -3, and I think you get the picture now.
And we just keep on adding more and more privileged rings and keep putting our secrets and our attacker's goals in these higher and higher privileged rings. But what if we're thinking about it wrong? What if instead we want to put all the secrets in the least privileged ring? So this is sort of the idea behind SGX, and it's useful for things like DRM, where you want to run ring 3 code but have sensitive secrets or signing capabilities also living in ring 3. But this picture is getting a little bit complicated, this diagram is a little bit complex, so let's simplify it a little bit. We'll only be looking at ring 0 through ring 3, which is the kernel, the userland, and the SGX enclave, which also executes in ring 3. Now, when you're executing code in the SGX enclave, you first load the code into the enclave, and then from that point on you trust the execution of whatever's going on in that enclave. You trust that the other elements - the kernel, the userland, the other rings - are not going to be able to access what's in that enclave, so you've made your Trusted Execution Environment. This is a bit of a weird model, because now your attacker is in the ring 0 kernel and your target victim here is in ring 3. So instead of the attacker trying to move up the privilege chain, the attacker is trying to move down. Which is pretty strange, and you might have some questions like "under this model, who handles memory management?"
because traditionally that's something that ring 0 would manage, and ring 0 would be responsible for paging memory in and out for different processes and different code that's executing in ring 3. But on the other hand, you don't want that to happen with the SGX enclave, because what if the malicious ring 0 adds a page to the enclave that the enclave doesn't expect? So in order to solve this problem, SGX does allow ring 0 to handle page faults. But simultaneously and in parallel, it verifies every memory load to make sure that no access violations are made, so that all the SGX memory is safe. So it allows ring 0 to do its job, but it sort of watches over it at the same time to make sure that nothing is messed up. So it's a bit of a weird, convoluted solution to a strange, inverted problem, but it works, and that's essentially how SGX works and the idea behind SGX. Now we can look at x86, and we can see that ARMv8 is constructed in a similar way, but it improves on x86 in a couple of key ways. So first of all, ARMv8 gets rid of ring 1 and ring 2, so you don't have to worry about those, and it just has different privilege levels for userland and the kernel. And these different privilege levels are called exception levels in the ARM terminology. And the second thing that ARM gets right compared to x86 is that instead of starting at 3 and counting down as privilege goes up, ARM starts at 0 and counts up, so we don't have to worry about negative numbers anymore.
Now, when we add the next privilege level, the hypervisor, we call it exception level 2, and the next one after that is the monitor in exception level 3. So at this point we still want to have the ability to run trusted code in exception level 0, the least privileged level of the ARMv8 processor. So in order to support this, we need to separate this diagram into two different sections. In ARMv8 these are called the secure world and the non-secure world. So we have the non-secure world on the left in blue, which consists of the userland, the kernel, and the hypervisor, and we have the secure world on the right, which consists of the monitor in exception level 3, a trusted operating system in exception level 1, and trusted applications in exception level 0. So the idea is that if you run anything in the secure world, it should not be accessible or modifiable by anything in the non-secure world. So that's how our attacker is trying to access it. The attacker has access to the non-secure kernel, which is often Linux, and they're trying to go after the trusted apps. So once again we have this weird inversion where we're trying to go from a more privileged level to a less privileged level and trying to extract secrets in that way. So the question that arises when using these Trusted Execution Environments that are implemented in SGX, and in TrustZone on ARM, is: can we use these privileged modes and our privileged access in order to attack these Trusted Execution Environments?
So with that question, we can start looking at a few different research papers. The first one that I want to go into is one called CLKSCREW, and it's an attack on TrustZone. Throughout this presentation I'm going to go through a few different papers, and just to make it clear which papers have already been published and which ones are new, I'll include the citations in the upper right hand corner, so that way you can tell what's old and what's new. And as far as papers go, this CLKSCREW paper is relatively new; it was released in 2017. And the way CLKSCREW works is that it takes advantage of the energy management features of a processor. So a non-secure operating system has the ability to manage the energy consumption of the different cores. If a certain target core doesn't have much scheduled to do, then the operating system is able to scale back the voltage or dial down the frequency on that core, so that core uses less energy, which is a great thing for performance: it really extends battery life, it makes the cores last longer, and it gives better performance overall.
But the problem here is: what if you have two separate cores, and one of your cores is running this non-trusted operating system and the other core is running code in the secure world? It's running that trusted code, those trusted applications. That non-secure operating system can still dial down the voltage and it can still change the frequency, and those changes will affect the secure world code. So what the CLKSCREW attack does is: the non-secure operating system core will dial down the voltage and overclock the frequency on the target secure world core in order to induce faults, to make the computation on that core fail in some way. And when that computation fails, you get certain cryptographic errors that the attack can use to infer things like secret AES keys, and to bypass code signing implemented in the secure world. So it's a very powerful attack, made possible because the non-secure operating system is privileged enough to use these energy management features. Now, CLKSCREW is an example of an active attack, where the attacker is actively changing the outcome of the victim code, of that code in the secure world. But what about passive attacks? In a passive attack, the attacker does not modify the actual outcome of the process. The attacker just tries to monitor that process and infer what's going on, and that is the sort of attack that we'll be considering for the rest of the presentation.
So in a lot of SGX and TrustZone implementations, the trusted and the non-trusted code both share the same hardware, and this shared hardware could be a shared cache, it could be a branch predictor, it could be a TLB. The point is that they share the same hardware, so the changes made by the secure code may be reflected in the behavior of the non-secure code. So the trusted code might execute, change the state of that shared cache, for example, and then the untrusted code may be able to go in, see the changes in that cache, and infer information about the behavior of the secure code. So that's essentially how our side channel attacks are going to work: the non-secure code is going to monitor these shared hardware resources for state changes that reflect the behavior of the secure code. Now, we've already talked about how Intel and SGX address the problem of memory management and who's responsible for making sure that those attacks don't work on SGX. So what do they have to say on how they protect against these side channel attacks, attacks on this shared cache hardware? They don't... at all. They essentially say: "we do not consider this part of our threat model. It is up to the developer to implement the protections needed to protect against these side-channel attacks."
Which is great news for us, because these side channel attacks can be very powerful, and if there aren't any hardware features that are necessarily stopping us from being able to accomplish our goal, it makes us that much more likely to succeed. So with that, we can sort of take a step back from TrustZone and SGX and just take a look at cache attacks, to make sure that we all have the same understanding of how the cache attacks will be applied to these Trusted Execution Environments. To start, let's go over a brief recap of how a cache works. Caches are necessary in processors because accessing the main memory is slow. When you try to access something from the main memory, it takes a while to be read into the processor. So the cache exists as sort of a layer to remember what that information is, so if the processor ever needs information from that same address, it just reloads it from the cache, and that access is going to be fast. So it really speeds up the memory access for repeated accesses to the same address. And then if we try to access a different address, that will also be read into the cache, slowly at first but then quickly for repeated accesses, and so on and so forth. Now, as you can probably tell from all of these examples, the memory blocks have been moving horizontally: they've always been staying in the same row. And that is reflective of the idea of sets in a cache.
So there are a number of different set IDs, and those correspond to the different rows in this diagram. For our example there are four different set IDs, and each address in the main memory maps to a particular set ID. So that address in main memory will only go into the location in the cache with the same set ID: it will only travel along those rows. That means if you have two different blocks of memory that map to different set IDs, they're not going to interfere with each other in the cache. But that raises the question: what about two memory blocks that do map to the same set ID? Well, if there's room in the cache, then the same thing will happen as before: those memory contents will be loaded into the cache and then retrieved from the cache for future accesses. And the number of possible entries for a particular set ID within a cache is called the associativity. On this diagram that's represented by the number of columns in the cache, so we will call the cache in this example a 2-way set-associative cache. Now, the next question is: what happens if you try to read a memory address that maps to the same set ID, but all of the entries for that set ID within the cache are full? Well, one of those entries is chosen, it's evicted from the cache, the new memory is read in, and then that's fed to the processor. It doesn't really matter how the cache entry that's evicted is chosen; for the purpose of the presentation you can just assume that it's random.
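As a concrete illustration, the set-associative behavior just described can be modeled in a few lines of code. This is a toy sketch using the example's parameters - four sets, two ways, random eviction - and not a model of any real processor:

```python
import random

class SetAssociativeCache:
    """Toy 2-way set-associative cache with 4 sets and random eviction."""
    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = {s: [] for s in range(num_sets)}  # set ID -> cached addresses

    def access(self, addr):
        """Load addr; return 'hit' (fast) or 'miss' (slow, fetched from memory)."""
        set_id = addr % self.num_sets        # each address maps to exactly one set
        entries = self.sets[set_id]
        if addr in entries:
            return "hit"
        if len(entries) == self.ways:        # set is full: evict a random entry
            entries.pop(random.randrange(len(entries)))
        entries.append(addr)
        return "miss"

cache = SetAssociativeCache()
print(cache.access(1))  # miss: first access is read in slowly
print(cache.access(1))  # hit: repeated access is fast
print(cache.access(5))  # miss: same set ID as address 1, but there is still room
print(cache.access(1))  # hit: both addresses fit in the 2-way set
```

Accessing a third address in the same set (for example address 9) would force one of the two cached entries out, which is exactly the effect the attacks below rely on.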
But the important thing is that if you try to access that same memory that was evicted before, you're now going to have to pay that time penalty for it to be reloaded into the cache and read into the processor. So those are caches in a nutshell, in particular set-associative caches, and now we can begin looking at the different types of cache attacks. For a cache attack we have two different processes: an attacker process and a victim process. For the first type of attack that we're considering, both of them share the same underlying code, so they're trying to access the same resources, which could be the case if you have page deduplication in virtual machines, or if you have copy-on-write mechanisms for shared code and shared libraries. But the point is that they share the same underlying memory. Now, the Flush+Reload attack works in two stages for the attacker. The attacker first starts by flushing out the cache. They flush each and every address in the cache, so the cache is just empty. Then the attacker lets the victim execute for a small amount of time, so the victim might read an address from main memory, loading that into the cache. And then the second stage of the attack is the reload phase. In the reload phase, the attacker tries to load different memory addresses from main memory and sees whether those entries are in the cache or not.
Here the attacker will first try to load address 0 and see that, because it takes a long time to read the contents of address 0, the attacker can infer that address 0 was not in the cache, which makes sense, because the attacker flushed it from the cache in the first stage. The attacker then tries to read the memory at address 1 and sees that this operation is fast, so the attacker infers that the contents of address 1 are in the cache. And because the attacker flushed everything from the cache before the victim executed, the attacker concludes that the victim is responsible for bringing address 1 into the cache. This Flush+Reload attack reveals which memory addresses the victim accessed during that small slice of time. Then, after that reload phase, the attack repeats: the attacker flushes again, lets the victim execute, reloads again, and so on. There's also a variant of the Flush+Reload attack called the Flush+Flush attack, which I'm not going to go into the details of, but essentially it's the same idea. Instead of using load instructions to determine whether or not a piece of memory is in the cache, it uses flush instructions, because flush instructions will take longer if something is in the cache already. The important thing is that both the Flush+Reload attack and the Flush+Flush attack rely on the attacker and the victim sharing the same memory.
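The two stages of Flush+Reload can be sketched as a toy simulation. This is an assumed, simplified model: the shared cache is just a set of addresses, and membership in that set stands in for a "fast" timing measurement; no real flush instructions or cycle counters are involved:

```python
# Toy Flush+Reload: attacker and victim share the same memory (addresses 0..3).
shared_cache = set()  # addresses currently cached

def victim_step():
    """The victim touches address 1 (e.g., a line of shared library code)."""
    shared_cache.add(1)

def flush_reload(addresses):
    shared_cache.clear()              # 1. flush: empty the cache
    victim_step()                     # 2. let the victim execute briefly
    accessed = []
    for addr in addresses:            # 3. reload: "time" each access
        timing = "fast" if addr in shared_cache else "slow"
        shared_cache.add(addr)
        if timing == "fast":
            accessed.append(addr)     # a fast reload means the victim brought it in
    return accessed

print(flush_reload([0, 1, 2, 3]))  # → [1]: the victim's access is revealed
```

In a real attack the loop body would be a flush instruction, a timed load, and a threshold on the measured cycle count, repeated once per time slice.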
But this isn't always the case, so we need to consider what happens when the attacker and the victim do not share memory. For this we have the Prime+Probe attack. The Prime+Probe attack once again works in two separate stages. In the first stage, the attacker primes the cache by reading all of the attacker's memory into the cache, and then the attacker lets the victim execute for a small amount of time. So no matter what the victim accesses from main memory, since the cache is full of the attacker's data, one of those attacker entries will be replaced by a victim entry. Then, in the second phase of the attack, the probe phase, the attacker checks the different cache entries for particular set IDs and sees whether all of the attacker entries are still in the cache. So maybe our attacker is curious about the last set ID, the bottom row. The attacker first tries to load the memory at address 3, and because this operation is fast, the attacker knows that address 3 is in the cache. The attacker tries the same thing with address 7, sees that this operation is slow, and infers that at some point address 7 was evicted from the cache. So the attacker knows that something had to be evicted from the cache, and it had to be by the victim, so the attacker concludes that the victim accessed something in that last set ID, in that bottom row.
The attacker doesn't know whether it was the contents of address 11 or the contents of address 15, or even what those contents are, but the attacker has a good idea of which set ID it was. So, the important things to remember about cache attacks: caches are crucial for performance on processors, they give a huge speed boost, and there's a huge time difference between having a cache and not having a cache for your executables. But the downside is that this big time difference also allows the attacker to infer information about how the victim is using the cache. We're able to use these cache attacks in two different scenarios: where memory is shared, in the case of the Flush+Reload and Flush+Flush attacks, and where memory is not shared, in the case of the Prime+Probe attack. And finally, the important thing to keep in mind is that, for these cache attacks, we know where the victim is looking, but we don't know what they see. So we don't know the contents of the memory that the victim is actually seeing; we just know the location and the addresses. So, what does an example trace of these attacks look like? Well, there's an easy way to represent these as two-dimensional images. In this image, we have our horizontal axis as time, so each column in this image represents a different time slice, a different iteration of the prime, measure, and probe steps.
Then we also have the vertical axis, which is the different set IDs, the locations accessed by the victim process, and here a pixel is white if the victim accessed that set ID during that time slice. So, as you look from left to right, as time moves forward, you can sort of see the changes in the patterns of the memory accesses made by the victim process. Now, for this particular example, the trace is captured on an execution of AES repeated several times, an AES encryption repeated about 20 times. And you can tell that this is a repeated action because you see the same repeated memory access patterns in the data; you see the same structures repeated over and over. So you know that this is reflecting what's going on throughout time, but what does it have to do with AES itself? Well, if we take the same trace with the same settings, but a different key, we see that there is a different memory access pattern, with different repetition within the trace. So, only the key changed; the code didn't change. So even though we're not able to read the contents of the key directly using this cache attack, we know that the key is changing these memory access patterns, and if we can see these memory access patterns, then we can infer the key. So that's the essential idea: we want to make these images as clear and as descriptive as possible, so we have the best chance of learning what those secrets are.
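To make the link between the key and the access pattern concrete, here is a sketch of why a table-based AES implementation leaks in its first round: the table lookup is indexed by plaintext XOR key, so which cache line lights up in the trace depends directly on a key byte. The 4-byte table entries and 64-byte lines below are a common layout but are an assumption here, not taken from the talk:

```python
LINE_SIZE = 64   # bytes per cache line
ENTRY_SIZE = 4   # bytes per AES T-table entry (assumed layout)

def leaked_line(plaintext_byte, key_byte):
    """Cache line index touched by the first-round lookup T[p ^ k]."""
    index = plaintext_byte ^ key_byte
    return (index * ENTRY_SIZE) // LINE_SIZE

# With 16 entries per line, the observed line reveals the top 4 bits of
# p ^ k, and the attacker knows p, so each observation yields 4 key bits:
p, k = 0x37, 0xA9
line = leaked_line(p, k)
print(hex(line ^ (p >> 4)))  # → 0xa, the top nibble of the key byte
```

This is why the images change when only the key changes: a different key steers the same lookups into different cache sets.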
And we can define the metrics for what makes these cache attacks powerful in a few different ways. The three we'll be looking at are spatial resolution, temporal resolution, and noise. Spatial resolution refers to how accurately we can determine the "where". If we know that the victim accessed a memory address within 1,000 bytes, that's obviously not as powerful as knowing where they accessed within 512 bytes. Temporal resolution is similar, where we want to know the order of the accesses the victim made. If the time slice during our attack is 1 millisecond, we're going to get much better ordering information on those memory accesses than we would get if we only saw all the memory accesses over the course of one second. So the shorter that time slice, the better the temporal resolution, the longer our picture will be on the horizontal axis, and the clearer an image of the cache we'll see. And the last metric to evaluate our attacks on is noise, which reflects how accurately our measurements reflect the true state of the cache. So far we've been using timing data to infer whether or not an item was in the cache, but this is a little bit noisy. It's possible that we'll have false positives or false negatives, so we want to keep that in mind as we look at the different attacks.
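The Prime+Probe procedure and the two-dimensional trace images can be tied together in a small end-to-end simulation. Everything here is a toy stand-in for illustration: sets are just IDs, the victim is a hypothetical function, and a probed set counts as "slow" exactly when the victim touched it:

```python
NUM_SETS = 4

def collect_trace(victim_slices):
    """Run prime -> victim -> probe once per time slice; return one row per
    set ID and one column per time slice (the 2D image from the talk)."""
    columns = []
    for victim_sets in victim_slices:
        owner = {s: "attacker" for s in range(NUM_SETS)}   # prime: fill every set
        for s in victim_sets:                              # victim runs briefly,
            owner[s] = "victim"                            # evicting attacker data
        # probe: a set reads "slow" exactly when the victim displaced our entry
        columns.append([owner[s] == "victim" for s in range(NUM_SETS)])
    return [list(row) for row in zip(*columns)]            # transpose: rows = set IDs

# A victim repeating the same accesses yields a repeating pattern, as in the AES trace:
trace = collect_trace([{0}, {2, 3}, {0}, {2, 3}])
for set_id, row in enumerate(trace):
    print(set_id, "".join("#" if slow else "." for slow in row))
# 0 #.#.
# 1 ....
# 2 .#.#
# 3 .#.#
```

In terms of the metrics above, this toy has perfect spatial resolution at set granularity, one column per time slice of temporal resolution, and no noise; a real probe loop gets all three degraded by timing jitter and unrelated cache traffic.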
So, that's essentially cache attacks in a nutshell, and that's all you really need in order to understand these attacks as they've been implemented on Trusted Execution Environments. The first particular attack that we're going to be looking at is called the Controlled-Channel Attack on SGX. This attack isn't necessarily a cache attack, but we can analyze it in the same way that we analyze the cache attacks, so it's still useful to look at. Now, if you remember how memory management occurs with SGX, we know that if a page fault occurs during SGX enclave code execution, that page fault is handled by the kernel. So the kernel has to know which page the enclave needs paged in; the kernel already gets some information about what the enclave is looking at. Now, in the Controlled-Channel attack, what the attacker does from the non-trusted OS is page almost every page of the enclave out of memory. So whatever page the enclave tries to access, it's very likely to cause a page fault, which will be redirected to the non-trusted OS, where the non-trusted OS can record it, page out any other pages, and continue execution. So the OS essentially gets a list of sequential page accesses made by the SGX enclave, all by capturing the page faults in its handler. This is a very general attack: you don't need to know what's going on in the enclave in order to pull it off.
You just load up an arbitrary enclave and you're able to see which pages that enclave is trying to access. So, how does it do on our metrics? First of all, the spatial resolution is not great. We can only see where the victim is accessing within 4096 bytes, the size of a full page, because SGX obscures the offset into the page where the page fault occurs. The temporal resolution is good but not great, because even though we're able to see sequential accesses to different pages, we're not able to see sequential accesses to the same page, because we need to keep that same page paged in while we let our SGX enclave run for that small time slice. So temporal resolution is good but not perfect. But there is no noise in this attack, because no matter where the page fault occurs, the untrusted operating system is going to capture that page fault and handle it. So it's very low noise, not great spatial resolution, but overall still a powerful attack. But we still want to improve on that spatial resolution; we want to be able to see what the enclave is doing at better than the resolution of one four-kilobyte page. And that's exactly what the CacheZoom paper does: instead of interrupting the SGX enclave execution with page faults, it uses timer interrupts.
Because the untrusted operating system is able to schedule when timer interrupts occur, it's able to schedule them at very tight intervals, so it gets that small, tight temporal resolution. And essentially what happens is: the timer interrupt fires, the untrusted operating system runs the Prime+Probe attack code, in this case, and resumes execution of the Enclave process, and this repeats. So this is a Prime+Probe attack on the L1 data cache. This attack lets you see what data the Enclave is looking at. Now, this attack could be easily modified to use the L1 instruction cache, in which case you learn which instructions the Enclave is executing. And overall this is an even more powerful attack than the Controlled-Channel attack. If we look at the metrics, we can see that the spatial resolution is a lot better: now we're looking at a spatial resolution of 64 bytes, the size of an individual cache line. The temporal resolution is very good, it's "almost unlimited", to quote the paper, because the untrusted operating system has the privilege to keep scheduling those timer interrupts closer and closer together until it's able to capture very small time slices of the victim process. And the noise is low: we're still using a cycle counter to measure the time it takes to load memory in and out of the cache, but the chances of a false positive or false negative are low, so the noise is low as well.
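The prime, victim, probe loop can be sketched as a toy model (plain Python, not attack code; a real L1 is set-associative and probed with timed loads, and the set count here is shrunk to 8 to keep the example readable):

```python
class ToyCache:
    """Direct-mapped toy cache: one tag per set, 64-byte lines."""
    def __init__(self, sets=8, line=64):
        self.sets = sets
        self.line = line
        self.tags = [None] * sets

    def access(self, addr, owner):
        # An access lands in set (addr / line_size) mod num_sets and
        # evicts whatever was cached there before.
        s = (addr // self.line) % self.sets
        self.tags[s] = owner
        return s

def prime_probe(cache, victim_addr):
    # Prime: the attacker touches one line per set, filling the cache.
    for s in range(cache.sets):
        cache.access(s * cache.line, "attacker")
    # Victim runs for one time slice and touches a secret-dependent line.
    cache.access(victim_addr, "victim")
    # Probe: any set no longer holding attacker data was used by the victim.
    return [s for s in range(cache.sets) if cache.tags[s] != "attacker"]

evicted = prime_probe(ToyCache(), victim_addr=5 * 64 + 3)
print(evicted)  # prints [5]: the access is visible at set granularity only
```

Note that the probe recovers the set index but not the low 6 offset bits, which is exactly the 64-byte spatial resolution mentioned above.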
Now, we can also look at TrustZone attacks, because so far the passive attacks that we've looked at have been against SGX, and those attacks on SGX have been pretty powerful. So, what are the published attacks on TrustZone? Well, there's one called TruSpy, which is similar in concept to the CacheZoom attack that we just looked at on SGX. It's once again a Prime+Probe attack on the L1 data cache, and the difference here is that instead of interrupting the victim code execution multiple times, the TruSpy attack does the prime step, lets the full AES encryption run, and then does the probe step. And the reason they do this is because, as they say, the secure world is protected and is not interruptible in the way that SGX is interruptible. But even so, with just one measurement per execution, the TruSpy authors were able to use statistics to recover the AES key from that noise. And their methods were so powerful that they were able to do this from an unprivileged application in userland, so they don't even need to be running in the kernel to pull off this attack. So, how does this attack measure up? The spatial resolution is once again 64 bytes, because that's the size of a cache line on this processor, and the temporal resolution is pretty poor here, because we only get one measurement per execution of the AES encryption.
This is also a particularly noisy attack, because we're making the measurements from userland, but even if we make the measurements from the kernel, we still have the same issues of false positives and false negatives that come with using a cycle counter to measure membership in a cache. So, we'd like to improve this a little bit. We'd like to improve the temporal resolution, so that the power of the cache attack on TrustZone comes a little closer to what it is on SGX. Let's dig into that statement a little bit, that the secure world is protected and not interruptible. To do this, we go back to this diagram of ARMv8 and how TrustZone is set up. So, it is true that when an interrupt occurs, it is directed to the monitor, and because the monitor operates in the secure world, if we interrupt secure code that's running at exception level 0, we're just going to end up running secure code at exception level 3. So, this doesn't necessarily get us anything. I think that's what the authors mean by saying that it's protected: just by sending an interrupt, we don't have a way to redirect control flow to the non-trusted code. At least, that's how it works in theory.
In practice, the Linux operating system, running at exception level 1 in the non-secure world, kind of needs interrupts in order to work, so if an interrupt occurs and is sent to the monitor, the monitor will just forward it right to the non-secure operating system. So, we have interrupts just the same way as we did in CacheZoom. And we can improve the TrustZone attacks by using this idea: we have two cores, where one core is running the secure code and the other core is running the non-secure code, and the non-secure code is sending interrupts to the secure-world core, and that gives us the interleaving of attacker process and victim process that allows a powerful Prime+Probe attack. So, what does this look like? We have the attack core and the victim core. The attack core sends an interrupt to the victim core. This interrupt is captured by the monitor, which passes it to the non-secure operating system. The non-secure operating system transfers control to our attack code, which runs the Prime+Probe attack. Then we leave the interrupt, execution of the victim code in the secure world resumes, and we just repeat this over and over. So, now we have that interleaving of the attacker's and the victim's processes.
So, now, instead of having a temporal resolution of one measurement per execution, we once again have almost unlimited temporal resolution, because we can just schedule when we send those interrupts from the attacker core. Now, we'd also like to reduce the noise, because if we can reduce the noise, we get clearer pictures and we can infer those secrets more clearly. We can get some improvement by moving the measurements from userland into the kernel, but we still have the cycle counters. So, what if, instead of using the cycle counter to measure whether or not something is in the cache, we use the other performance counters? On ARMv8 platforms, there is a way to use performance counters to measure different events, such as cache hits and cache misses. These events and these performance monitors require privileged access to use, which, for this attack, we do have. Now, in a typical cache attack scenario we wouldn't have access to these performance monitors, which is why they haven't really been explored before, but in this unusual scenario, where we're attacking the less privileged code from the more privileged code, we do have access to them, and we can use these monitors during the probe step to get a very accurate count of whether a certain memory load caused a cache miss or a cache hit.
So, we're able to essentially get rid of the different levels of noise. Now, one thing to point out: maybe we'd also like to use these ARMv8 performance counters to count the events that occur in the secure-world code. So, maybe we start the performance counters from the non-secure world, let the secure world run, and then, when the secure world exits, read the counters from the non-secure world: maybe we'd like to see how many instructions the secure world executed, or how many branch instructions, or how many arithmetic instructions, or how many cache misses there were. But ARMv8 took this into account, and by default, performance counters that are started in the non-secure world will not count events that happen in the secure world, which is smart; which is how it should be. The only reason I bring this up is that that's not how it is on ARMv7. We could fill a whole different talk just exploring the implications of that, but I want to focus on ARMv8, because that's the newest of the new. So, we'll keep looking at that. So, we instrument the Prime+Probe attack to use these performance counters, so we can get a clear picture of what is and what is not in the cache.
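The difference between the two probe styles can be sketched like this (illustrative Python; the latency numbers are invented, and a real implementation would read privileged ARMv8 PMU event registers rather than call a function):

```python
import random

def probe_with_timing(is_hit, noise_cycles=15.0):
    """Classic probe: time a load and compare against a threshold.
    With enough measurement noise, the classification is sometimes wrong."""
    latency = (40.0 if is_hit else 100.0) + random.gauss(0.0, noise_cycles)
    return latency > 70.0  # True means "classified as a cache miss"

def probe_with_counter(misses_before, misses_after):
    """PMU-style probe: read a miss-event counter around the load and
    check whether it advanced. No threshold, and no timing noise."""
    return misses_after - misses_before > 0

# With zero noise the timing probe classifies correctly...
assert probe_with_timing(True, noise_cycles=0.0) is False
assert probe_with_timing(False, noise_cycles=0.0) is True
# ...while the counter probe needs no threshold at all.
assert probe_with_counter(5, 6) is True
assert probe_with_counter(5, 5) is False
```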
And instead of having noisy measurements based on time, we have virtually no noise at all, because we get the truth straight from the processor itself: whether or not we experienced a cache miss. So, how do we implement these attacks? Where do we go from here? We have all these ideas; we have ways to make these TrustZone attacks more powerful, but that's not worthwhile unless we actually implement them. So, the goal here is to implement these attacks on TrustZone, and since the non-secure-world operating system is typically based on Linux, we'll take that into account when making our implementation. So, we'll write a kernel module that uses these performance counters and these inter-processor interrupts to actually carry out the attacks; and we'll write it in such a way that it's very generalizable. So you can take this kernel module that was written for one device (in my case I focused most of my attention on the Nexus 5X) and transfer it to any other Linux-based device that has a TrustZone and these shared caches, so it should be very easy to port this over and perform these same powerful cache attacks on different platforms. We can also do clever things based on the Linux operating system to limit the collection window to just when we're executing within the secure world, so we can align our traces a lot more easily that way.
And the end result is a synchronized trace for each of the different attacks, because, since it's written in a modular way, we're able to run different attacks simultaneously. So, maybe we're running one Prime+Probe attack on the L1 data cache to learn where the victim is accessing memory, and we're simultaneously running an attack on the L1 instruction cache, so we can see which instructions the victim is executing. And these can be aligned. So, the tool that I've written is a combination of a kernel module, which actually performs the attack; a userland binary, which schedules these processes to different cores; and a GUI that lets you interact with the kernel module and rapidly start doing these cache attacks for yourself, and perform them against different processes and secure-world code. The intention behind this tool is to be very generalizable, to make it very easy to use this platform on different devices, and to let people, once again, quickly develop these attacks; and also to see if their own code is vulnerable to these cache attacks, to see if their code has these secret-dependent memory accesses. So, can we get even better spatial resolution? Right now, we're down to 64 bytes, and that's the size of a cache line, which is the size of our shared hardware. And on SGX, we actually can get better than 64 bytes, based on something called a branch-shadowing attack.
So, a branch-shadowing attack takes advantage of something called the branch target buffer. The branch target buffer is a structure that's used for branch prediction. It's similar to a cache, but there's a key difference: the branch target buffer doesn't compare the full address when checking whether something is already in the buffer; it doesn't compare all of the upper-level bits. That means it's possible for two different addresses to collide, and the same entry of that BTB cache will be read out for the wrong address. Now, since this is just for branch prediction, the worst that can happen is a misprediction and a small time penalty, but that's about it. The idea behind the branch-shadowing attack is to leverage this overlapping and collision of addresses in order to execute, in effect, a shared-code-style Flush+Reload attack on the branch target buffer. So, here is what goes on: during the attack, the attacker modifies the SGX Enclave to make sure that the branches within the Enclave will collide with branches that are not in the Enclave. The attacker executes the Enclave code, then the attacker executes their own code, and depending on the outcome of the victim code in that cache, the attacker code may or may not experience a branch misprediction.
So, the attacker is able to tell the outcome of a branch because of this overlap and collision, just as in a Flush+Reload attack, where memory is shared between the attacker and the victim. So here, our spatial resolution is fantastic: we can tell down to individual branch instructions in SGX; we can tell exactly which branches were executed, and which directions they were taken in the case of conditional branches. The temporal resolution is also, once again, almost unlimited, because we can use the same timer interrupts to schedule our attacker process. And the noise is, once again, very low, because we can use the branch-misprediction counters that exist in the Intel world to measure this. So, does any of that apply to the TrustZone attacks? Well, in this case the victim and attacker don't share entries in the branch target buffer, because the attacker is not able to map the virtual addresses of the victim process. But this is reminiscent of our earlier cache attacks: our Flush+Reload attack only worked when the attacker and the victim shared memory, but we still have the Prime+Probe attack for when they don't. So, what if we use a Prime+Probe-style attack on the branch target buffer cache in ARM processors?
So, essentially what we do here is: we prime the branch target buffer by executing many attacker branches to fill up the BTB cache with the attacker's branch prediction data; we let the victim execute a branch, which will evict an attacker BTB entry; and then we have the attacker re-execute those branches and see if there have been any mispredictions. Now, the cool thing about this attack is that the structure of the BTB cache is different from that of the L1 caches. Instead of the 256 different sets in the L1 cache, the BTB cache has 2048 different sets, so we can tell which branch the victim took based on which one of 2048 different set IDs it falls into. And even more than that, on the ARM platform, at least on the Nexus 5X that I was working with, the granularity is no longer 64 bytes, the size of a cache line; it's now 16 bytes. So, we can see which branches the trusted code within TrustZone is executing to within 16 bytes. So, what does this look like? Previously, with the TruSpy attack, this is the outcome of our Prime+Probe attack: we get one measurement for those 256 different set IDs. When we added those interrupts, we were able to get that time resolution, and it looks something like this.
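The resolution gain shows up directly in the set-index arithmetic (a sketch: the real BTB indexing function is microarchitecture-specific and undocumented, so this assumes a plain low-order-bits index with the set counts and granularities given in the talk):

```python
def l1_set(addr):
    return (addr >> 6) % 256   # L1: 64-byte lines, 256 sets

def btb_set(addr):
    return (addr >> 4) % 2048  # BTB: 16-byte granularity, 2048 sets

a, b = 0x4000, 0x4010  # two branches 16 bytes apart
assert l1_set(a) == l1_set(b)    # indistinguishable in an L1 attack
assert btb_set(a) != btb_set(b)  # distinguishable in the BTB attack
```

With 2048 sets at 16-byte granularity instead of 256 sets at 64 bytes, the BTB probe separates branches that an L1 instruction-cache probe would lump into one line.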
Now, maybe you can see a little bit at the top of the screen how there are these repeated sections of little white blocks, and you can use that to infer that maybe the same cache lines of instructions are called over and over. So, just looking at this L1 instruction cache attack, you can tell some information about how the process ran. Now, let's compare that to the BTB attack. And I don't know if you can see it too clearly; it's a bit too high-resolution right now; so let's focus in on one small part of this overall trace. And this is what it looks like. Each of those white pixels represents a branch that was taken by that secure-world code, and we can see repeated patterns; we can see maybe different functions that were called; we can see different loops. And just by looking at this one trace, we can infer a lot of information about how that secure world executed. So, it's incredibly powerful, and all of those secrets are just waiting to be uncovered using these new tools. So, where do we go from here? What sort of countermeasures do we have? Well, first of all, I think the long-term solution is going to be moving away from shared hardware. We need to have separate hardware and no more shared caches in order to fully get rid of these different cache attacks. And we've already seen this trend in different cell phones.
So, for example, in Apple SoCs for a long time now, I think since the Apple A7, the Secure Enclave, which runs the secure code, has its own cache. So, these cache attacks can't be mounted from code outside of that Secure Enclave. Just by using that separate hardware, it knocks out a whole class of potential side-channel and microarchitectural attacks. And just recently, the Pixel 2 has moved in the same direction. The Pixel 2 now includes a hardware security module that performs cryptographic operations; and that chip also has its own memory and its own caches, so we can no longer use this attack to extract information about what's going on inside that external hardware security module. But even then, separate hardware doesn't solve all of our problems, because we still have the question: what do we include in this separate hardware? On the one hand, we want to include more code in that separate hardware, so we're less vulnerable to these side-channel attacks; but on the other hand, we don't want to expand the attack surface any more, because the more code we include in these secure environments, the more likely it is that a vulnerability will be found and the attacker will be able to get a foothold within the secure, trusted environment. So, there's going to be a balance between what you choose to include in the separate hardware and what you don't. Do you include DRM code? Do you include cryptographic code?
It's still an open question. And that's the long-term approach. In the short term, you just kind of have to write side-channel-free software: be very careful about what your process does, whether there are any secret-dependent memory accesses, secret-dependent branching, or secret-dependent function calls, because any of those can leak the secrets out of your trusted execution environment. So, if you are a developer of trusted-execution-environment code, here are the things I want you to keep in mind. First of all, performance is very often at odds with security. We've seen over and over that performance enhancements to these processors open up the ability for these microarchitectural attacks to be more efficient. Additionally, these trusted execution environments don't protect against everything; there are still side-channel attacks and microarchitectural attacks that these systems are vulnerable to. These attacks are very powerful; they can be accomplished simply; and with the publication of the code that I've written, it should be very simple to get set up and analyze your own code to see: am I vulnerable, do I expose information in the same way? And lastly, it only takes one small error, one tiny leak from your trusted and secure code, to extract the entire secret, to bring the whole thing down.
So, what I want to leave you with is: I want you to remember that you are responsible for making sure that your program is not vulnerable to these microarchitectural attacks, because if you do not take responsibility for this, who will? Thank you!

Applause

Herald: Thank you very much. Please, if you want to leave the hall, please do it quietly and take all your belongings with you, and respect the speaker. We have plenty of time, 16, 17 minutes for Q&A, so please line up at the microphones. No questions from the signal angel, all right. So, we can start with microphone 6, please.
Mic 6: Okay. There was a symbol for secure OSes in the ARM TrustZone diagram. What is the idea of them if the non-secure OS gets all the interrupts? What is the secure OS for?
Keegan: Yeah, so, in ARMv8 there are a couple of different kinds of interrupts. I think, if I'm remembering the terminology correctly, there are IRQ and FIQ interrupts. The non-secure mode handles the IRQ interrupts and the secure mode handles the FIQ interrupts. So, depending on which one you send, the monitor will direct the interrupt one way or the other.
Mic 6: Thank you.
Herald: Okay, thank you. Microphone number 7, please.
Mic 7: Do any of your presented attacks on TrustZone also apply to the AMD implementation of TrustZone, or are you looking into it?
Keegan: I haven't looked into AMD too much, because, as far as I can tell, it's not used as commonly, but there are many different types of trusted execution environments. The two that I focused on were SGX and TrustZone, because those are the most common examples that I've seen.
Herald: Thank you. Microphone number 8, please.
Mic 8: When TrustZone is moved to dedicated hardware, dedicated memory, couldn't you replicate the userspace attacks by loading your own trusted userspace app and using it as an oracle of some sort?
Keegan: If you can load your own trusted code, then yes, you could do that. But in many of the models I've seen today, that's not possible. That's why you have things like code signing, which prevent an arbitrary user from running their own code in the trusted OS, or in the trusted environment.
Herald: All right. Microphone number 1.
Mic 1: So, these attacks are more powerful against code that's running in trusted execution environments than similar attacks would be against ring-3 code, or code in general. Does that mean that trusted execution environments are basically an attractive nuisance that we shouldn't use?
Keegan: There's still a large benefit to using these trusted execution environments.
The point I want to get across is that, although they add a lot of features, they don't protect against everything, so you should keep in mind that these side-channel attacks do still exist and you still need to protect against them. But overall, these are beneficial things and worthwhile to include.
Herald: Thank you. Microphone number 1 again, please.
Mic 1: So, AMD is doing something with encrypting memory, and I'm not sure if they encrypt addresses too, but would that be a defense against such attacks?
Keegan: So, I'm not too familiar with AMD, but SGX also encrypts memory. It encrypts it between the lowest-level cache and main memory. But that doesn't really have an impact on the actual attack, because the memory is encrypted at the cache-line level, and as the attacker, we don't care what the data within that cache line is; we only care which cache line is being accessed.
Mic 1: If you encrypted the addresses, wouldn't that help against that?
Keegan: I'm not sure how you would encrypt the addresses yourself. As long as those addresses map into the same set IDs that the victim can map into, the attacker could still pull off the same style of attacks.
Herald: Great.
We have a question from the internet, please.
Signal Angel: The question is: "Does the secure enclave on the Samsung Exynos distinguish the receiver of the message, so that if a user application asks it to decrypt an AES message, can one sniff the value that the secure enclave returns?"
Keegan: So, that sounds like it's asking about the TruSpy-style attack, where you call into the secure world to encrypt something with AES. I think that would all depend on the implementation: as long as it's encrypting with a certain key and it's able to do that repeatably, then the attack, assuming a vulnerable AES implementation, would be able to extract that key.
Herald: Cool. Microphone number 2, please.
Mic 2: Do you recommend a reference to understand how these cache-line attacks and branch oracles actually lead to key recovery?
Keegan: Yeah. So, I will flip through these pages, which include a lot of the references for the attacks that I've mentioned, so if you're watching the video, you can see these right away, or just access the slides. A lot of these contain good starting points. So, I didn't go into a lot of the details on how, for example, the TruSpy attack recovered that AES key, but that paper does have a lot of good links on how those leaks can lead to key recovery. Same thing with the CLKSCREW attack: how the fault injection can lead to key recovery.
Herald: Microphone number 6, please.
Mic 6: I think my question might be almost the same thing: how hard is it actually to recover the keys? Is this a massive machine-learning problem, or is it something you can do practically on a single machine?
Keegan: It varies entirely by the implementation. For any of these attacks to work, you need some sort of vulnerable implementation, and some implementations leak more data than others. In the case of a lot of the AES attacks, where you're doing the passive attacks, those are very easy to do on just your own computer. The AES fault-injection attack in the CLKSCREW paper required more brute force, so that one required more computing resources, but it was still entirely practical in a realistic setting.
Herald: Cool, thank you. So, we have one more: microphone number 1, please.
Mic 1: So, I hope it's not too naive a question, but since all these attacks are based on cache hits and misses, isn't it possible to forcibly flush or invalidate the cache, or insert noise into it, after each operation in this trusted environment, in order to mess up the attacker's guesswork? Discarding optimization and performance for additional security benefits, that is.
Keegan: Yeah, that is absolutely possible, and you are absolutely right: it does lead to performance degradation, because if you always flush the entire cache every time you do a context switch, that will be a huge performance hit.
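The flush-on-context-switch mitigation discussed here can be illustrated with a toy Prime+Probe model. Nothing below models a real cache or timer; it is a minimal sketch in which the attacker fills every set, lets the victim run, and then checks which set was evicted, with and without a full flush on the switch back to the attacker.

```python
# Toy Prime+Probe: the attacker primes every cache set with their own
# line, the victim's secret-dependent access evicts one of them, and
# the probe step reveals which set that was. Flushing the whole cache
# on the context switch destroys that signal. Purely illustrative.

NUM_SETS = 16  # assumed toy cache size

def prime() -> list[str]:
    # One attacker-owned line per set.
    return ["attacker"] * NUM_SETS

def victim_access(cache: list[str], secret_set: int) -> None:
    # The victim's access evicts the attacker's line from one set.
    cache[secret_set] = "victim"

def probe(cache: list[str]) -> list[int]:
    # Sets where the attacker's line is gone would probe "slow".
    return [s for s in range(NUM_SETS) if cache[s] != "attacker"]

def flush(cache: list[str]) -> None:
    # Mitigation: invalidate everything on the context switch.
    for s in range(NUM_SETS):
        cache[s] = "empty"

# Without the mitigation, the attacker learns the secret set:
cache = prime()
victim_access(cache, secret_set=7)
assert probe(cache) == [7]

# With a full flush on the context switch, every set probes "slow",
# so the secret-dependent set is indistinguishable from the rest:
cache = prime()
victim_access(cache, secret_set=7)
flush(cache)
assert probe(cache) == list(range(NUM_SETS))
```

The second run also shows where the performance cost comes from: after the flush, every subsequent access by anyone misses, which is exactly the trade-off Keegan describes next.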
So again, that comes down to the performance-versus-security trade-off: which one do you end up going with? And it seems that, historically, the choice has leaned more toward performance.
Mic 1: Thank you.
Herald: But we have one more: microphone number 1, please.
Mic 1: So, I have more of a moral question: how well should we really protect against attacks which need some ring-0 cooperation? Because when we use TrustZone for a purpose we would see as clearly good, like protecting the browser from the outside world, we are basically using the secure execution environment for sandboxing a process. But once an attack needs some cooperation from the kernel, that attack in fact empowers the user rather than the hardware producer.
Keegan: Yeah, and you're right. It depends entirely on what your application is and what threat model you're looking at. If you're using these trusted execution environments for DRM, for example, then maybe you would be worried about that ring-0 attacker, the privileged attacker who has their phone rooted and is trying to recover media encryption keys from the execution environment. But maybe there are other scenarios where you're not as worried about an attack from a compromised ring 0. So, it entirely depends on context.
Herald: Alright, thank you. So, we have one more: microphone number 1, again.
Mic 1: Hey there.
Great talk, thank you very much.
Keegan: Thank you.
Mic 1: Just a short question: do you have any success stories about attacking TrustZone and the different TEE implementations from some vendors, like OEMs building phones and such?
Keegan: Not that I'm announcing at this time.
Herald: So, thank you very much. Please, again, a warm round of applause for Keegan!
Applause
34c3 postroll music
Subtitles created by c3subtitles.de in the year 2018. Join, and help us!