[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:03.13,0:00:07.36,Default,,0000,0000,0000,,{\i1}35C3 preroll music{\i0} Dialogue: 0,0:00:18.78,0:00:23.87,Default,,0000,0000,0000,,Herald: So, for the next talk, Benjamin Kollenda\Nand Philipp Koppe will refresh our Dialogue: 0,0:00:23.87,0:00:30.53,Default,,0000,0000,0000,,memories, because they already gave a talk\Nat 34C3 where they talked about the microcode Dialogue: 0,0:00:30.53,0:00:37.58,Default,,0000,0000,0000,,ROM, and today they're going to give us\Nmore insights into how microcode works and Dialogue: 0,0:00:37.58,0:00:44.32,Default,,0000,0000,0000,,more details on the ROM itself. Benjamin\Nis a PhD student with a focus on Dialogue: 0,0:00:44.32,0:00:51.28,Default,,0000,0000,0000,,software attacks and defenses, and together\Nwith Philipp they will now abuse AMD Dialogue: 0,0:00:51.28,0:00:55.19,Default,,0000,0000,0000,,microcode for fun and security. Please\Nenjoy. Dialogue: 0,0:00:55.19,0:00:58.73,Default,,0000,0000,0000,,{\i1}Applause{\i0} Dialogue: 0,0:01:01.32,0:01:06.26,Default,,0000,0000,0000,,Benjamin: Thank you. So, as mentioned, we\Nwere able to reverse engineer the AMD Dialogue: 0,0:01:06.26,0:01:11.60,Default,,0000,0000,0000,,microcode and the AMD microcode ROM, and\NI'm going to talk about our journey: what Dialogue: 0,0:01:11.60,0:01:16.37,Default,,0000,0000,0000,,we learned along the way and how we did it.\NThis is joint work with my colleagues at Dialogue: 0,0:01:16.37,0:01:20.80,Default,,0000,0000,0000,,Ruhr-Universität Bochum. Here is a quick outline\Nof how we are going to do it. We're going to Dialogue: 0,0:01:20.80,0:01:25.38,Default,,0000,0000,0000,,start with a quick crash course on\Nmicroarchitectural basics and what microcode Dialogue: 0,0:01:25.38,0:01:28.35,Default,,0000,0000,0000,,actually is.
Then I'll talk about how we\Nreconstructed the Dialogue: 0,0:01:28.35,0:01:30.33,Default,,0000,0000,0000,,microcode ROM and what we learned Dialogue: 0,0:01:30.33,0:01:35.39,Default,,0000,0000,0000,,along the way. Then I'll quickly give some\Nexamples of the applications we Dialogue: 0,0:01:35.39,0:01:41.43,Default,,0000,0000,0000,,implemented with the knowledge we gained\Nfrom the second step. And lastly I'll talk about Dialogue: 0,0:01:41.43,0:01:47.65,Default,,0000,0000,0000,,a framework we used: how it works and what\Nwe can do with it. This framework Dialogue: 0,0:01:47.65,0:01:51.90,Default,,0000,0000,0000,,is available on GitHub along with some\Nother tools, so you're free to continue our Dialogue: 0,0:01:51.90,0:01:57.19,Default,,0000,0000,0000,,work. OK. So when I'm talking about\Nmicrocode, you can think of it essentially Dialogue: 0,0:01:57.19,0:02:02.33,Default,,0000,0000,0000,,as firmware for your processor. It\Nserves multiple purposes. For example, Dialogue: 0,0:02:02.33,0:02:06.44,Default,,0000,0000,0000,,you can use it to fix CPU bugs that are\Nalready in the silicon and that you want to fix after Dialogue: 0,0:02:06.44,0:02:11.97,Default,,0000,0000,0000,,the design phase. It is used for\Ninstruction decoding; I'll cover this one in a Dialogue: 0,0:02:11.97,0:02:17.97,Default,,0000,0000,0000,,bit more detail. It is also used for exception\Nhandling. For example, if an exception or Dialogue: 0,0:02:17.97,0:02:22.20,Default,,0000,0000,0000,,interrupt is raised, microcode has the first\Nchance to modify this interrupt, Dialogue: 0,0:02:22.20,0:02:27.11,Default,,0000,0000,0000,,ignore it, or just pass it along to\Nthe operating system. It's also used for Dialogue: 0,0:02:27.11,0:02:31.79,Default,,0000,0000,0000,,power management and some other complex\Nfeatures like Intel SGX. And most Dialogue: 0,0:02:31.79,0:02:37.32,Default,,0000,0000,0000,,importantly for us, microcode is updatable.\NThis is used to patch errors in the field.
Dialogue: 0,0:02:37.32,0:02:40.98,Default,,0000,0000,0000,,Everyone remembers the Spectre/Meltdown\Npatches, and part of those was Dialogue: 0,0:02:40.98,0:02:44.21,Default,,0000,0000,0000,,a microcode update. So your Dialogue: 0,0:02:44.21,0:02:50.83,Default,,0000,0000,0000,,x86 CPU takes multiple steps to execute an\Ninstruction. The first step is decoding Dialogue: 0,0:02:50.83,0:02:55.02,Default,,0000,0000,0000,,an x86 instruction into multiple smaller\Nmicro-ops. Dialogue: 0,0:02:55.02,0:02:57.15,Default,,0000,0000,0000,,These are then scheduled into the pipeline. Dialogue: 0,0:02:57.15,0:03:01.63,Default,,0000,0000,0000,,From there, they are dispatched to\Nthe different functional units, Dialogue: 0,0:03:01.63,0:03:03.53,Default,,0000,0000,0000,,like your ALU, AGU, Dialogue: 0,0:03:03.53,0:03:06.39,Default,,0000,0000,0000,,and multiplication and division units. Dialogue: 0,0:03:06.39,0:03:08.36,Default,,0000,0000,0000,,For our purposes, the decode step is the Dialogue: 0,0:03:08.36,0:03:12.19,Default,,0000,0000,0000,,most interesting one. In the decode step,\Nyou have an instruction buffer that feeds Dialogue: 0,0:03:12.19,0:03:17.03,Default,,0000,0000,0000,,instructions to several decoders. You have\Nshort decoders that handle really simple Dialogue: 0,0:03:17.03,0:03:21.10,Default,,0000,0000,0000,,instructions. There are long decoders that\Ncan handle somewhat more advanced instructions. Dialogue: 0,0:03:21.10,0:03:25.26,Default,,0000,0000,0000,,And finally, there is the vector decoder. The\Nvector decoder handles the most complex Dialogue: 0,0:03:25.26,0:03:29.69,Default,,0000,0000,0000,,instructions with the help of microcode.\NSo the microcode engine is essentially the Dialogue: 0,0:03:29.69,0:03:31.25,Default,,0000,0000,0000,,vector decoder. Dialogue: 0,0:03:32.46,0:03:36.57,Default,,0000,0000,0000,,The microcode engine is, in essence,\Ncomposed of a microcode Dialogue: 0,0:03:36.57,0:03:40.77,Default,,0000,0000,0000,,ROM that stores the instructions for the\Nmicrocode engine.
Think of it as storing your Dialogue: 0,0:03:40.77,0:03:48.19,Default,,0000,0000,0000,,standard instructions. Then there is also\Na writable memory, the microcode RAM. This Dialogue: 0,0:03:48.19,0:03:52.52,Default,,0000,0000,0000,,is where the microcode updates end up when\Nyou apply them. And of course, Dialogue: 0,0:03:52.52,0:03:57.31,Default,,0000,0000,0000,,around this storage there is a whole lot of\Nlogic that makes it actually run. For this Dialogue: 0,0:03:57.31,0:04:00.86,Default,,0000,0000,0000,,talk, you only need to know what\Nmatch registers are. Match registers are Dialogue: 0,0:04:00.86,0:04:05.65,Default,,0000,0000,0000,,essentially breakpoint registers. So if we\Nwrite an address from inside the microcode Dialogue: 0,0:04:05.65,0:04:10.67,Default,,0000,0000,0000,,ROM into a match register, then whenever this\Naddress is fetched for execution, control is Dialogue: 0,0:04:10.67,0:04:17.57,Default,,0000,0000,0000,,transferred to the microcode RAM, so our\Npatch gets executed. The microcode Dialogue: 0,0:04:17.57,0:04:23.06,Default,,0000,0000,0000,,updates are usually loaded by the BIOS or\Nby the kernel. Linux has an update driver, Dialogue: 0,0:04:23.06,0:04:28.34,Default,,0000,0000,0000,,and sometimes the BIOS applies a\Npre-installed version. The updates have a Dialogue: 0,0:04:28.34,0:04:32.12,Default,,0000,0000,0000,,pretty simple structure: a partially\Ndocumented header, followed by the Dialogue: 0,0:04:32.12,0:04:37.73,Default,,0000,0000,0000,,actual microcode that is loaded into the\NCPU. Microcode is organized in Dialogue: 0,0:04:37.73,0:04:42.65,Default,,0000,0000,0000,,something called triads. Each triad has\Nthree operations, which are essentially x86 Dialogue: 0,0:04:42.65,0:04:48.23,Default,,0000,0000,0000,,instructions, but with some differences.\NAnd lastly, each triad has a sequence word. The Dialogue: 0,0:04:48.23,0:04:52.02,Default,,0000,0000,0000,,sequence word indicates which microcode\Ninstructions should be executed next.
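The following minimal Python sketch makes the match-register idea concrete. It models only the fetch logic as described in the talk: a match register holds a ROM address, and a fetch of that address is redirected to the microcode RAM. The dictionaries, the addresses, and the function name `fetch` are hypothetical illustrations. They are not AMD's actual hardware interface.

```python
from typing import Dict, Tuple

def fetch(addr: int,
          match_regs: Dict[int, int],
          rom: Dict[int, str],
          ram: Dict[int, str]) -> Tuple[str, str]:
    """Return (source, triad) for a microcode fetch at ROM address `addr`.

    A match register works like a breakpoint: if `addr` equals the ROM
    address stored in a match register, control is transferred to the
    associated microcode RAM slot instead of the ROM triad.
    """
    if addr in match_regs:
        return "ram", ram[match_regs[addr]]
    return "rom", rom[addr]

# Hypothetical contents: one ROM triad is patched, one is not.
rom = {0x100: "original triad", 0x104: "untouched triad"}
ram = {0: "patched triad from the update"}
match_regs = {0x100: 0}  # ROM address 0x100 -> RAM slot 0
```

With `match_regs` emptied (as after a reboot), every fetch falls back to the ROM again.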
We Dialogue: 0,0:04:52.02,0:04:57.95,Default,,0000,0000,0000,,have the option of executing just the next\Ntriad, executing another one by branching Dialogue: 0,0:04:57.95,0:05:01.94,Default,,0000,0000,0000,,to it, or just saying "OK, I'm done with\Ndecoding this instruction, continue with Dialogue: 0,0:05:01.94,0:05:07.49,Default,,0000,0000,0000,,x86 code." These updates are protected by\Nsome weak authentication, which we were Dialogue: 0,0:05:07.49,0:05:13.26,Default,,0000,0000,0000,,able to break, so we can create our own. We\Ncan analyze existing ones, and we can apply Dialogue: 0,0:05:13.26,0:05:20.62,Default,,0000,0000,0000,,these to standard laptops and desktops.\NHowever, there can only ever be one update Dialogue: 0,0:05:20.62,0:05:26.53,Default,,0000,0000,0000,,loaded at a time, and when you reboot\Nyour machine, this update will be gone. Dialogue: 0,0:05:28.49,0:05:32.99,Default,,0000,0000,0000,,Also, in this talk we are going to look at\Nsome microcode, and we will present this Dialogue: 0,0:05:32.99,0:05:38.15,Default,,0000,0000,0000,,microcode using a register transfer\Nlanguage. It is heavily based on x86, so I'm Dialogue: 0,0:05:38.15,0:05:43.29,Default,,0000,0000,0000,,just going to cover the differences\Nbetween the two. Most importantly, a Dialogue: 0,0:05:43.29,0:05:48.65,Default,,0000,0000,0000,,microcode instruction can have three operands,\Nin comparison to x86, which Dialogue: 0,0:05:48.65,0:05:53.64,Default,,0000,0000,0000,,usually only has two. So you can specify a\Ndestination and two source operands. Dialogue: 0,0:05:55.62,0:05:56.45,Default,,0000,0000,0000,,Also, Dialogue: 0,0:05:57.21,0:06:02.24,Default,,0000,0000,0000,,microcode instructions have certain bit flags that\Nneed to be set, and we show these with Dialogue: 0,0:06:02.24,0:06:07.45,Default,,0000,0000,0000,,annotations. For example, ".C" means\Nthis instruction also updates the carry flag Dialogue: 0,0:06:07.45,0:06:14.05,Default,,0000,0000,0000,,based on the result.
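This sketch shows how the three sequence-word options ("next", "branch", "complete") could drive execution from one triad to the next. For illustration only, it assumes that triad addresses are consecutive integers and that a sequence word can be modeled as a `(kind, target)` tuple. The real sequence word is an encoded part of each triad, not a Python tuple.

```python
from typing import Dict, List, Optional, Tuple

# (kind, target): kind is "next", "branch" or "complete" (hypothetical encoding).
SeqWord = Tuple[str, Optional[int]]
# A triad: three operations (as strings) plus its sequence word.
Triad = Tuple[List[str], SeqWord]

def trace(start: int, triads: Dict[int, Triad], max_steps: int = 100) -> List[int]:
    """Follow sequence words from `start`; return the visited triad addresses."""
    visited: List[int] = []
    addr = start
    for _ in range(max_steps):
        visited.append(addr)
        _ops, (kind, target) = triads[addr]
        if kind == "complete":      # done with this x86 instruction
            return visited
        if kind == "next":          # fall through to the following triad
            addr += 1
        elif kind == "branch" and target is not None:
            addr = target           # continue at an explicit triad address
        else:
            raise ValueError(f"bad sequence word at {addr:#x}: {kind!r}")
    raise RuntimeError("step limit reached (possible loop)")

triads: Dict[int, Triad] = {
    0: (["op0", "op1", "op2"], ("next", None)),
    1: (["op3", "op4", "op5"], ("branch", 5)),
    5: (["op6", "op7", "op8"], ("complete", None)),
}
```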
Then you have the\Ninstruction "jcc", which is a conditional Dialogue: 0,0:06:14.05,0:06:19.57,Default,,0000,0000,0000,,branch, and the first operand denotes the\Ncondition upon which this branch is Dialogue: 0,0:06:19.57,0:06:24.10,Default,,0000,0000,0000,,taken: in this case, branch if the carry\Nflag is one. The second operand Dialogue: 0,0:06:24.10,0:06:30.30,Default,,0000,0000,0000,,indicates the offset to add to the\Ninstruction pointer. Then we also have Dialogue: 0,0:06:30.30,0:06:35.76,Default,,0000,0000,0000,,some sequence word annotations: "next",\N"complete", and "branch". It should also Dialogue: 0,0:06:35.76,0:06:39.96,Default,,0000,0000,0000,,be noted that the internal microcode\Narchitecture is a load-store architecture. Dialogue: 0,0:06:39.96,0:06:45.35,Default,,0000,0000,0000,,You can't use memory operands in other\Ninstructions like you can on x86; you Dialogue: 0,0:06:45.35,0:06:48.31,Default,,0000,0000,0000,,always need to load and store memory\Nexplicitly. Dialogue: 0,0:06:49.19,0:06:51.71,Default,,0000,0000,0000,,Now we are going to talk about Dialogue: 0,0:06:51.71,0:06:58.71,Default,,0000,0000,0000,,how we managed to recover the microcode\NROM. The microcode ROM is baked into your Dialogue: 0,0:06:58.71,0:07:06.86,Default,,0000,0000,0000,,CPU; you can't change it anymore. It is\Ndefined in the silicon during the Dialogue: 0,0:07:06.86,0:07:12.93,Default,,0000,0000,0000,,fabrication process. In this picture\Nyou can see a die shot taken with an Dialogue: 0,0:07:12.93,0:07:16.84,Default,,0000,0000,0000,,electron microscope, and this is one of\Nthree regions that contain the bits for Dialogue: 0,0:07:16.84,0:07:23.24,Default,,0000,0000,0000,,the microcode operations. If you zoom\Nin a bit more, each of these regions Dialogue: 0,0:07:23.24,0:07:30.05,Default,,0000,0000,0000,,consists of four arrays, and these are\Nfurther subdivided into blocks.
Really Dialogue: 0,0:07:30.05,0:07:34.66,Default,,0000,0000,0000,,interesting is "Array 2", which is a bit\Nsmaller than the other ones, but it has Dialogue: 0,0:07:34.66,0:07:42.16,Default,,0000,0000,0000,,some structures above it which have a\Ndifferent visual layout. This is SRAM, Dialogue: 0,0:07:42.16,0:07:47.05,Default,,0000,0000,0000,,which stores the microcode update. So this\Nis runtime-reprogrammable memory that is Dialogue: 0,0:07:47.05,0:07:53.86,Default,,0000,0000,0000,,still pretty fast. So the microcode RAM is\Nlocated right next to the microcode ROM, Dialogue: 0,0:07:53.86,0:07:57.64,Default,,0000,0000,0000,,which also makes sense from a design\Nstandpoint. Dialogue: 0,0:08:00.44,0:08:02.01,Default,,0000,0000,0000,,Here is just an overview of how we Dialogue: 0,0:08:02.01,0:08:06.93,Default,,0000,0000,0000,,went about it. We\Nstarted with pictures, and then we used Dialogue: 0,0:08:06.93,0:08:11.46,Default,,0000,0000,0000,,an OCR-like process to transform them\Ninto bit strings, which we could then further Dialogue: 0,0:08:11.46,0:08:17.17,Default,,0000,0000,0000,,process. These bit strings were then\Narranged into triads.
We could already Dialogue: 0,0:08:17.17,0:08:22.05,Default,,0000,0000,0000,,tell that we got the individual triads\Nright, because there were data dependencies Dialogue: 0,0:08:22.05,0:08:27.55,Default,,0000,0000,0000,,all over the place within them. But between triads,\Nthere were no or very few data Dialogue: 0,0:08:27.55,0:08:33.70,Default,,0000,0000,0000,,dependencies, so the ordering of the\Ntriads was still wrong. This was a Dialogue: 0,0:08:33.70,0:08:38.86,Default,,0000,0000,0000,,major problem, and what\Nwe had to reverse engineer was the Dialogue: 0,0:08:38.86,0:08:43.87,Default,,0000,0000,0000,,mapping from the physical address of a\Ntriad, as gathered from the ROM Dialogue: 0,0:08:43.87,0:08:48.05,Default,,0000,0000,0000,,readout, to the virtual address that is used\Ninside the microcode update or the Dialogue: 0,0:08:48.05,0:08:53.69,Default,,0000,0000,0000,,microcode ROM. After reverse engineering\Nthis, you can just do a linear sweep Dialogue: 0,0:08:53.69,0:08:59.02,Default,,0000,0000,0000,,disassembly of the microcode ROM and\Narrive at human-readable output. But this Dialogue: 0,0:08:59.02,0:09:04.87,Default,,0000,0000,0000,,recovery was a bit tricky, because we\Nrequired physical-virtual address pairs, Dialogue: 0,0:09:04.87,0:09:09.52,Default,,0000,0000,0000,,and gathering these is harder:\Nwe worked through the Dialogue: 0,0:09:09.52,0:09:14.04,Default,,0000,0000,0000,,available updates, but we could only find\Ntwo such pairs. These pairs were Dialogue: 0,0:09:14.04,0:09:18.52,Default,,0000,0000,0000,,actually easy to find, because every update\Nreplaces a certain triad inside your Dialogue: 0,0:09:18.52,0:09:24.58,Default,,0000,0000,0000,,microcode ROM, and this triad is usually\Nalso placed in the microcode update. So we Dialogue: 0,0:09:24.58,0:09:31.26,Default,,0000,0000,0000,,matched the address this update replaces\Nagainst the microcode ROM readout.
You can just Dialogue: 0,0:09:31.26,0:09:38.00,Default,,0000,0000,0000,,get your two data points. But we had to\Nget more data points, so we generated these Dialogue: 0,0:09:38.00,0:09:42.63,Default,,0000,0000,0000,,mappings by matching the semantics of triads\Nin the microcode ROM readout against the Dialogue: 0,0:09:42.63,0:09:47.78,Default,,0000,0000,0000,,semantics we observe when we force execution of a\Ncertain microcode address. To gather Dialogue: 0,0:09:47.78,0:09:52.33,Default,,0000,0000,0000,,the semantics of the read-out microcode,\Nwe implemented a simple microcode Dialogue: 0,0:09:52.33,0:09:58.82,Default,,0000,0000,0000,,simulator. Essentially, it works on the triad\Nlevel: you give it an input state and a Dialogue: 0,0:09:58.82,0:10:03.43,Default,,0000,0000,0000,,triad, and it calculates the output state\Nof it. Input and output state consist Dialogue: 0,0:10:03.43,0:10:08.46,Default,,0000,0000,0000,,of the x86 state, which is\Nyour standard registers, and also the Dialogue: 0,0:10:08.46,0:10:12.32,Default,,0000,0000,0000,,internal microcode registers. There are\Nmultiple temporary registers that get Dialogue: 0,0:10:12.32,0:10:18.35,Default,,0000,0000,0000,,reset for every new x86 instruction that\Nis executed, but they can also be modified Dialogue: 0,0:10:18.35,0:10:24.13,Default,,0000,0000,0000,,by microcode, of course. Our emulator\Nsupports all known arithmetic operations, Dialogue: 0,0:10:24.13,0:10:29.23,Default,,0000,0000,0000,,and we have a whitelist of operations\Nthat do not produce any observable Dialogue: 0,0:10:29.23,0:10:32.95,Default,,0000,0000,0000,,change in state, just so that we could\Nprocess more triads and gain more Dialogue: 0,0:10:32.95,0:10:41.31,Default,,0000,0000,0000,,data points. In total, we gathered 54\Nadditional address pairs, which turned Dialogue: 0,0:10:41.31,0:10:46.65,Default,,0000,0000,0000,,out to be enough to recover the whole\Nmapping.
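Here is a toy version of the semantic-matching idea. It assumes that each triad from the ROM readout has already been turned into a Python function from an input register state to an output state, which is a heavy simplification of the simulator described above. A readout triad counts as a candidate for a forced microcode address only if it reproduces every observed output. Using more than one test input is what narrows the candidates down. All addresses and observations here are made up.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Semantics = Callable[[State], State]

def matching_triads(readout: Dict[int, Semantics],
                    inputs: List[State],
                    observed: List[State]) -> List[int]:
    """Return the physical readout addresses whose simulated output
    matches what the CPU produced for every test input."""
    return [phys for phys, sem in readout.items()
            if all(sem(dict(inp)) == out for inp, out in zip(inputs, observed))]

MASK32 = 0xFFFFFFFF

# Toy "simulator": semantics of two triads from the ROM readout.
readout: Dict[int, Semantics] = {
    0x10: lambda s: {**s, "eax": (s["eax"] + 1) & MASK32},
    0x11: lambda s: {**s, "eax": s["eax"] ^ s["ebx"]},
}

# Hypothetical hardware observations after forcing one virtual address.
inputs = [{"eax": 1, "ebx": 3}, {"eax": 5, "ebx": 5}]
observed = [{"eax": 2, "ebx": 3}, {"eax": 0, "ebx": 5}]
```

With only the first input, both triads match. The second input rules out the increment, leaving a single physical address to pair with the forced virtual one.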
In this mapping, essentially, Dialogue: 0,0:10:46.65,0:10:50.82,Default,,0000,0000,0000,,the four different arrays map to\Nindividual blocks, the blocks in Dialogue: 0,0:10:50.82,0:10:56.75,Default,,0000,0000,0000,,these arrays are then again permuted a bit,\Nand the triads inside these blocks Dialogue: 0,0:10:56.75,0:11:02.33,Default,,0000,0000,0000,,have some table-based permutations. So\Nthis is not obfuscation. It's just that, Dialogue: 0,0:11:02.33,0:11:07.68,Default,,0000,0000,0000,,from a hardware design standpoint, it can\Nmake sense to route it a bit differently. Dialogue: 0,0:11:09.33,0:11:14.63,Default,,0000,0000,0000,,Now that we can actually\Nmap a certain address to the microcode ROM Dialogue: 0,0:11:14.63,0:11:19.09,Default,,0000,0000,0000,,readout, and we know the addresses of\Ndifferent x86 instructions from our Dialogue: 0,0:11:19.09,0:11:24.24,Default,,0000,0000,0000,,earlier experiments, we can look at the\Nimplementation of instructions. So let's Dialogue: 0,0:11:24.24,0:11:29.13,Default,,0000,0000,0000,,start with a pretty simple one: Shift\NRight Double, which essentially takes a Dialogue: 0,0:11:29.13,0:11:33.25,Default,,0000,0000,0000,,register, shifts it by a given amount, and\Nshifts in bits from another register. So Dialogue: 0,0:11:33.25,0:11:38.18,Default,,0000,0000,0000,,of course you would expect a lot of shifts\Nand rotates in its implementation, and this Dialogue: 0,0:11:38.18,0:11:45.34,Default,,0000,0000,0000,,is exactly what we're seeing here. You\Nhave two shift-right operations, and you can Dialogue: 0,0:11:45.34,0:11:50.83,Default,,0000,0000,0000,,see regmd6 and regmd4. These are\Nplaceholders: the microcode engine can Dialogue: 0,0:11:50.83,0:11:55.63,Default,,0000,0000,0000,,replace certain bit combinations with the\Nregisters that are used in the x86 Dialogue: 0,0:11:55.63,0:12:01.56,Default,,0000,0000,0000,,operation.
For example, this one would be\Nreplaced by ECX or EAX, depending on what Dialogue: 0,0:12:01.56,0:12:08.34,Default,,0000,0000,0000,,you wrote in x86. At this point we can\Nalso already gather more information about Dialogue: 0,0:12:08.34,0:12:13.60,Default,,0000,0000,0000,,microcode than we previously knew, because\Nwe know "OK, so this is a source, this is Dialogue: 0,0:12:13.60,0:12:18.53,Default,,0000,0000,0000,,also a source, and this is a destination".\NBut this source, which indicates the shift Dialogue: 0,0:12:18.53,0:12:22.75,Default,,0000,0000,0000,,amount, was previously unknown,\Nbecause it is a high temporary microcode Dialogue: 0,0:12:22.75,0:12:28.28,Default,,0000,0000,0000,,register, and we found out that these\Nusually serve specific, different Dialogue: 0,0:12:28.28,0:12:31.80,Default,,0000,0000,0000,,purposes. If you write to\Nthem, sometimes the CPU behaves Dialogue: 0,0:12:31.80,0:12:35.89,Default,,0000,0000,0000,,erratically, sometimes it crashes,\Nsometimes nothing happens. But in this Dialogue: 0,0:12:35.89,0:12:40.30,Default,,0000,0000,0000,,case, this seems to be the shift count,\Nand the shift count is given by the third Dialogue: 0,0:12:40.30,0:12:45.28,Default,,0000,0000,0000,,operand of the instruction. So in this\Ncase, we already learned "OK, if you want Dialogue: 0,0:12:45.28,0:12:51.38,Default,,0000,0000,0000,,to read the third operand of an\Ninstruction, we need to read t41". And Dialogue: 0,0:12:51.38,0:12:56.24,Default,,0000,0000,0000,,this is how we went about recovering more\Nand more information about microcode. The Dialogue: 0,0:12:56.24,0:13:00.16,Default,,0000,0000,0000,,rest of the implementation is essentially\Nconcerned with implementing the rest of Dialogue: 0,0:13:00.16,0:13:05.72,Default,,0000,0000,0000,,the semantics of the x86 instruction and\Nupdating the flags correctly.
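For reference, here are the x86-level semantics that the shrd microcode has to implement, written as a short Python model for 32-bit operands. The model follows the architectural definition: the count is masked to 5 bits, and the vacated high bits are filled from the source register. It deliberately leaves out the flag updates. It is not a transcription of the microcode shown on the slide.

```python
MASK32 = 0xFFFFFFFF

def shrd32(dst: int, src: int, count: int) -> int:
    """Shift Right Double for 32-bit operands (flags not modelled).

    `dst` is shifted right by `count`, and the vacated high bits are
    filled with the low bits of `src`. The count is masked to 5 bits.
    """
    count &= 31
    if count == 0:
        return dst & MASK32
    return ((dst >> count) | (src << (32 - count))) & MASK32
```

For example, `shrd32(0x12345678, 0xABCDEF01, 8)` moves the low byte `0x01` of the source into the top byte of the result.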
OK, so now Dialogue: 0,0:13:05.72,0:13:11.98,Default,,0000,0000,0000,,let's look at an instruction that is a\Nbit more complicated. Let's check out Dialogue: 0,0:13:11.98,0:13:19.62,Default,,0000,0000,0000,,rdtsc. rdtsc returns an internal cycle\Ncounter in EDX and EAX: the upper part Dialogue: 0,0:13:19.62,0:13:25.52,Default,,0000,0000,0000,,ends up in EDX, the lower part in EAX. So in\Nthe end we want to see writes to these Dialogue: 0,0:13:25.52,0:13:30.76,Default,,0000,0000,0000,,registers, potentially with a shift\Nsomewhere in there. But somewhere the CPU Dialogue: 0,0:13:30.76,0:13:37.57,Default,,0000,0000,0000,,needs to gather the cycle counter. So in\Nthe beginning we have two load-style Dialogue: 0,0:13:37.57,0:13:41.41,Default,,0000,0000,0000,,operations. This one is a proper load,\Nwhich we identified, and this one is Dialogue: 0,0:13:41.41,0:13:48.57,Default,,0000,0000,0000,,unknown. But even though we do not know\Nthe instruction, we know the target, Dialogue: 0,0:13:48.57,0:13:52.72,Default,,0000,0000,0000,,because the result of this instruction\Nwill end up in t9 and the result of this Dialogue: 0,0:13:52.72,0:13:58.06,Default,,0000,0000,0000,,instruction will end up in t10, so we can\Nfollow the uses of these two registers. So Dialogue: 0,0:13:58.06,0:14:04.45,Default,,0000,0000,0000,,for simplicity I'm going to start with t10.\NAs we later found out, t10 is loaded from Dialogue: 0,0:14:04.45,0:14:09.73,Default,,0000,0000,0000,,an operand which essentially denotes\Na specific internal register. And if you Dialogue: 0,0:14:09.73,0:14:15.45,Default,,0000,0000,0000,,play around with these bits, you notice\Nthat this combination encodes cr4, so t10 Dialogue: 0,0:14:15.45,0:14:22.99,Default,,0000,0000,0000,,will just hold cr4. You can also address\Ncr1 and cr2.
And if you look further, t10 Dialogue: 0,0:14:22.99,0:14:29.16,Default,,0000,0000,0000,,is then ANDed with this bit mask, and if\Nyou look in the manual, you find out that Dialogue: 0,0:14:29.16,0:14:34.93,Default,,0000,0000,0000,,this bit in cr4 is the bit that\Ndetermines whether rdtsc is Dialogue: 0,0:14:34.93,0:14:40.02,Default,,0000,0000,0000,,available from user space or not. So this\Nis the check whether this instruction may be Dialogue: 0,0:14:40.02,0:14:48.17,Default,,0000,0000,0000,,executed. Now let's just keep in mind\Nthat t9 holds some other value loaded from Dialogue: 0,0:14:48.17,0:14:53.93,Default,,0000,0000,0000,,some other internal register; we will\Ncome back to this one a bit later. For Dialogue: 0,0:14:53.93,0:14:58.85,Default,,0000,0000,0000,,now, let's follow execution. This triad is\Nessentially a padding triad; it is a Dialogue: 0,0:14:58.85,0:15:04.88,Default,,0000,0000,0000,,common pattern we see. So let's look at\Nwhere this branch takes us. Dialogue: 0,0:15:05.90,0:15:07.18,Default,,0000,0000,0000,,This branch Dialogue: 0,0:15:07.18,0:15:15.96,Default,,0000,0000,0000,,takes us to a conditional branch\Ntriad. If you look a bit further up, this AND Dialogue: 0,0:15:15.96,0:15:21.74,Default,,0000,0000,0000,,instruction actually updated this flag. So\Nthis is a conditional branch that Dialogue: 0,0:15:21.74,0:15:26.36,Default,,0000,0000,0000,,determines whether this check was\Nsuccessful or not, and it branches to either Dialogue: 0,0:15:26.36,0:15:32.57,Default,,0000,0000,0000,,the error triad or the success triad. And\Nhere we already see the exit. We see a Dialogue: 0,0:15:32.57,0:15:41.17,Default,,0000,0000,0000,,write to RDX, or EDX in this case, with a\Nshift of t9 by 32 bits, which is exactly Dialogue: 0,0:15:41.17,0:15:45.91,Default,,0000,0000,0000,,what you would expect for writing the\Nupper 32 bits of the Dialogue: 0,0:15:45.91,0:15:50.83,Default,,0000,0000,0000,,time stamp counter to EDX.
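The logic traced through the rdtsc microcode fits in a few lines of Python. This is a behavioral sketch, not the microcode itself. It assumes the checked cr4 bit is CR4.TSD (bit 2), which the x86 manuals define as restricting rdtsc to privilege level 0. The exception class is a hypothetical stand-in for the error path, which loads the general protection fault code 0xD (vector 13) and branches to the exception handler.

```python
from typing import Tuple

CR4_TSD = 1 << 2   # CR4.TSD: if set, rdtsc is only allowed at CPL 0
GP_VECTOR = 0xD    # #GP, general protection fault (vector 13)

class GeneralProtectionFault(Exception):
    """Raised where the microcode would take the error path."""

def rdtsc(tsc: int, cr4: int, cpl: int) -> Tuple[int, int]:
    """Return (edx, eax) for a 64-bit time stamp counter, or raise #GP."""
    if (cr4 & CR4_TSD) and cpl != 0:
        # Error path: load the #GP code and branch to the exception handler.
        raise GeneralProtectionFault(GP_VECTOR)
    edx = (tsc >> 32) & 0xFFFFFFFF   # upper half (the shift of t9 by 32)
    eax = tsc & 0xFFFFFFFF           # lower half
    return edx, eax
```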
And you have an\Nunknown instruction, but we know, okay, we Dialogue: 0,0:15:50.83,0:15:57.88,Default,,0000,0000,0000,,move something from t9 to EAX, which is\Nthe lower 32 bits. But we're not done Dialogue: 0,0:15:57.88,0:16:02.69,Default,,0000,0000,0000,,here, because we can still look at the\Nerror path that is taken if the access is Dialogue: 0,0:16:02.69,0:16:09.21,Default,,0000,0000,0000,,denied. If you scroll down a bit, we can\Nsee a move of an immediate into a certain Dialogue: 0,0:16:09.21,0:16:14.53,Default,,0000,0000,0000,,internal register. This immediate\Nactually encodes the general protection Dialogue: 0,0:16:14.53,0:16:21.79,Default,,0000,0000,0000,,fault interrupt code: D tells the\Nexception handler that this was a general Dialogue: 0,0:16:21.79,0:16:28.68,Default,,0000,0000,0000,,protection fault. Later this triad\Nbranches to this address, and if you look Dialogue: 0,0:16:28.68,0:16:34.01,Default,,0000,0000,0000,,at the uses of this address, we can find\Nother immediates that also correspond Dialogue: 0,0:16:34.01,0:16:36.96,Default,,0000,0000,0000,,to x86 interrupts. So now we learned Dialogue: 0,0:16:36.96,0:16:39.95,Default,,0000,0000,0000,,how we can actually raise our\Nown interrupts. We Dialogue: 0,0:16:39.95,0:16:46.10,Default,,0000,0000,0000,,just need to load the code we want into\Nthis specific register and branch to this Dialogue: 0,0:16:46.10,0:16:52.82,Default,,0000,0000,0000,,address. Now we have learned a lot about\Nhow we can actually write microcode, but Dialogue: 0,0:16:52.82,0:16:57.00,Default,,0000,0000,0000,,it's also interesting to see how certain\Ninstructions are implemented. So let's Dialogue: 0,0:16:57.00,0:17:03.67,Default,,0000,0000,0000,,look at a pretty complicated one: wrmsr\N(Write MSR). wrmsr essentially writes the Dialogue: 0,0:17:03.67,0:17:08.45,Default,,0000,0000,0000,,data it is given to a model-specific\Nregister.
The set of model-specific registers Dialogue: 0,0:17:08.45,0:17:12.98,Default,,0000,0000,0000,,differs between CPUs, between vendors, and\Nsometimes between revisions. These registers Dialogue: 0,0:17:12.98,0:17:17.91,Default,,0000,0000,0000,,implement non-standard extensions or\Npretty complex features. For example, you Dialogue: 0,0:17:17.91,0:17:23.95,Default,,0000,0000,0000,,trigger a microcode update by writing to a\Nmodel-specific register. The address of the register Dialogue: 0,0:17:23.95,0:17:30.57,Default,,0000,0000,0000,,you want to write to is given in\NECX. Now we can see that ECX is read and Dialogue: 0,0:17:30.57,0:17:39.68,Default,,0000,0000,0000,,shifted right by sixteen bits into t10. So\Nagain, we follow the uses of t10, and we see Dialogue: 0,0:17:39.68,0:17:46.07,Default,,0000,0000,0000,,it XOR'd with a certain bitmask. And\Nthis bitmask is C000, which actually Dialogue: 0,0:17:46.07,0:17:52.43,Default,,0000,0000,0000,,denotes a namespace of the model-specific\Nregisters; in this case, this should be an Dialogue: 0,0:17:52.43,0:17:58.45,Default,,0000,0000,0000,,AMD-specific namespace. And, of course,\Nthis one again sets some flags, and you Dialogue: 0,0:17:58.45,0:18:04.24,Default,,0000,0000,0000,,can see a conditional branch, depending\Non these flags, to what should be the Dialogue: 0,0:18:04.24,0:18:06.24,Default,,0000,0000,0000,,handler for this namespace. Dialogue: 0,0:18:06.70,0:18:10.77,Default,,0000,0000,0000,,Next one: we have another XOR\Nthat uses a different bit Dialogue: 0,0:18:10.77,0:18:16.89,Default,,0000,0000,0000,,mask, in this case C001. C001 is the\Nnamespace where the microcode update Dialogue: 0,0:18:16.89,0:18:25.05,Default,,0000,0000,0000,,routine is actually located. So again,\Nwe branch to this handler.
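The first dispatch steps of wrmsr, as described here, amount to comparing the upper 16 bits of ECX against namespace prefixes. The minimal Python sketch below covers only that classification. It mirrors the shift-by-16 and XOR comparisons but stops where the real microcode continues with further branches. For orientation, Linux's AMD microcode loader writes to MSR 0xC0010020, which falls into the C001 namespace.

```python
def msr_namespace(ecx: int) -> str:
    """Classify an MSR address by its upper 16 bits."""
    t10 = (ecx & 0xFFFFFFFF) >> 16      # like "shift ECX right by 16 into t10"
    if (t10 ^ 0xC000) == 0:             # XOR result is zero only on a match
        return "C000"                   # AMD-specific namespace
    if (t10 ^ 0xC001) == 0:
        return "C001"                   # namespace holding the microcode update routine
    return "other"                      # the real code keeps dispatching here
```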
And if you just Dialogue: 0,0:18:25.05,0:18:31.01,Default,,0000,0000,0000,,continue on, there are more operations on\NRCX, followed by more branches, and this Dialogue: 0,0:18:31.01,0:18:35.79,Default,,0000,0000,0000,,continues until everything is dispatched\Nto the correct handler. This is how, Dialogue: 0,0:18:35.79,0:18:40.34,Default,,0000,0000,0000,,internally, wrmsr is implemented, and\Nrdmsr is probably implemented pretty Dialogue: 0,0:18:40.34,0:18:43.64,Default,,0000,0000,0000,,similarly, because it does a\Nsimilar kind of dispatch. Dialogue: 0,0:18:47.75,0:18:49.19,Default,,0000,0000,0000,,OK, so now I have shown you Dialogue: 0,0:18:49.19,0:18:52.47,Default,,0000,0000,0000,,how we actually went about\Nreconstructing the knowledge we Dialogue: 0,0:18:52.47,0:18:57.94,Default,,0000,0000,0000,,currently have. Now I'm going to show\Nyou what we can actually do with it. For Dialogue: 0,0:18:57.94,0:19:02.44,Default,,0000,0000,0000,,this, I am going to quickly cover the\Napplications we wrote in microcode. We Dialogue: 0,0:19:02.44,0:19:04.94,Default,,0000,0000,0000,,wrote a simple configurable\Nrdtsc precision reduction. Dialogue: 0,0:19:04.94,0:19:07.71,Default,,0000,0000,0000,,This means a certain bit mask is ANDed with Dialogue: 0,0:19:07.71,0:19:11.89,Default,,0000,0000,0000,,the result of rdtsc, so you can\Nreduce its accuracy, which can Dialogue: 0,0:19:11.89,0:19:18.28,Default,,0000,0000,0000,,sometimes prevent timing attacks. We also\Nimplemented a microcode-assisted address Dialogue: 0,0:19:18.28,0:19:23.26,Default,,0000,0000,0000,,sanitizer, which I'll cover in a\Nsecond. We also have some basic microcode Dialogue: 0,0:19:23.26,0:19:29.07,Default,,0000,0000,0000,,instruction set randomization and some\Nmicrocode-assisted instrumentation. What Dialogue: 0,0:19:29.07,0:19:33.52,Default,,0000,0000,0000,,this means is that you can write a filter for\Nyour instrumentation in microcode itself.
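Here is a sketch of the configurable rdtsc precision reduction, in which the counter value is ANDed with a mask before it is returned. The talk only says that a configurable bit mask is applied. Parameterizing the mask by the number of low bits to clear (`drop_bits`) is an assumption made here for simplicity.

```python
MASK64 = (1 << 64) - 1

def reduce_precision(tsc: int, drop_bits: int) -> int:
    """Clear the low `drop_bits` bits of a 64-bit counter value."""
    mask = ~((1 << drop_bits) - 1) & MASK64   # e.g. drop_bits=8 -> 0xFF...FF00
    return tsc & mask
```

Clearing more bits makes consecutive readings collapse to the same value more often, which is what makes fine-grained timing harder.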
Dialogue: 0,0:19:33.52,0:19:37.58,Default,,0000,0000,0000,,So instead of hooking an instruction,\Ninstead of debugging your code or Dialogue: 0,0:19:37.58,0:19:42.16,Default,,0000,0000,0000,,emulating it, you can just say: whenever\Nthe instruction is executed, filter whether this Dialogue: 0,0:19:42.16,0:19:47.18,Default,,0000,0000,0000,,is relevant for me, and if it is, call my\Nx86 handler. This happens entirely in microcode, Dialogue: 0,0:19:47.18,0:19:52.47,Default,,0000,0000,0000,,without changing the instruction in\NRAM. We also implemented some basic Dialogue: 0,0:19:52.47,0:20:00.00,Default,,0000,0000,0000,,authenticated microcode updates. The usual\Nupdate mechanism is weak; that's how we Dialogue: 0,0:20:00.00,0:20:05.43,Default,,0000,0000,0000,,got our foot in the door in the first\Nplace. So we improved upon it a bit. We also Dialogue: 0,0:20:05.43,0:20:09.80,Default,,0000,0000,0000,,found out that microcode actually has\Nsome enclave-like features, because once Dialogue: 0,0:20:09.80,0:20:13.73,Default,,0000,0000,0000,,we're executing in microcode, your kernel\Ncan't interrupt you, your hypervisor can't Dialogue: 0,0:20:13.73,0:20:18.61,Default,,0000,0000,0000,,interrupt you, and any state you want to be\Nvisible to the outside world, you actually Dialogue: 0,0:20:18.61,0:20:22.84,Default,,0000,0000,0000,,need to write explicitly. All these\Nmicrocode-internal registers are not Dialogue: 0,0:20:22.84,0:20:26.60,Default,,0000,0000,0000,,accessible from the outside world, so any\Ncomputation you perform in microcode Dialogue: 0,0:20:26.60,0:20:30.36,Default,,0000,0000,0000,,cannot be interfered with. So you can\Nimplement a simple enclave on top of this Dialogue: 0,0:20:30.36,0:20:37.04,Default,,0000,0000,0000,,one.
So our hardware-assisted address\Nsanitizer variant is based on the work of Dialogue: 0,0:20:37.04,0:20:41.97,Default,,0000,0000,0000,,the original authors. AddressSanitizer\Nis a software instrumentation that detects Dialogue: 0,0:20:41.97,0:20:47.07,Default,,0000,0000,0000,,invalid memory accesses by using\Nshadow memory to record which memory Dialogue: 0,0:20:47.07,0:20:50.75,Default,,0000,0000,0000,,is valid to be read and written. Dialogue: 0,0:20:50.75,0:20:53.84,Default,,0000,0000,0000,,The authors proposed a hardware\Naddress sanitizer Dialogue: 0,0:20:53.84,0:20:59.01,Default,,0000,0000,0000,,which essentially does the same checks,\Nbut using a new instruction. The Dialogue: 0,0:20:59.01,0:21:03.94,Default,,0000,0000,0000,,instruction should raise a fault if an\Ninvalid access is detected. The details of the algorithm Dialogue: 0,0:21:03.94,0:21:07.67,Default,,0000,0000,0000,,they proposed are not\Nimportant. What is important is that, in Dialogue: 0,0:21:07.67,0:21:12.08,Default,,0000,0000,0000,,essence, it's pretty simple: you load from\Na certain address, perform some operations Dialogue: 0,0:21:12.08,0:21:18.82,Default,,0000,0000,0000,,on the loaded value, and if the shadow indicates an invalid access after\Nthese operations, you just report a bug. Dialogue: 0,0:21:18.82,0:21:24.91,Default,,0000,0000,0000,,Advantages of hardware address sanitizer\Nare, for example, that you get better performance Dialogue: 0,0:21:24.91,0:21:29.17,Default,,0000,0000,0000,,out of it: because you only have a single\Ninstruction, maybe you can do some fancy Dialogue: 0,0:21:29.17,0:21:34.45,Default,,0000,0000,0000,,tricks inside your CPU that are faster\Nthan using x86 instructions. You also get more Dialogue: 0,0:21:34.45,0:21:38.88,Default,,0000,0000,0000,,compact code, and you have the possibility\Nof runtime configuration, which is a bit Dialogue: 0,0:21:38.88,0:21:45.21,Default,,0000,0000,0000,,hard with software address sanitizer.
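For reference, this is the classic AddressSanitizer shadow check from the original software design, written in Python. One shadow byte describes each 8-byte granule. A shadow value of 0 means the whole granule is addressable, a value k in 1..7 means only the first k bytes are addressable, and negative values mark poisoned memory. Using a dict to stand in for shadow memory, and treating unlisted granules as valid, are simplifications. The talk does not spell out exactly which operations the bound-based microcode performs.

```python
from typing import Dict

SHADOW_SCALE = 3  # one shadow byte describes an 8-byte granule

def is_invalid_access(addr: int, size: int, shadow: Dict[int, int]) -> bool:
    """Return True if a `size`-byte access (1, 2, 4 or 8) at `addr` should be reported.

    `shadow` maps a granule index (addr >> 3) to its signed shadow value.
    Granules missing from the dict are treated as fully addressable.
    """
    k = shadow.get(addr >> SHADOW_SCALE, 0)
    if k == 0:
        return False               # whole granule is addressable
    if size == 8:
        return True                # an 8-byte access needs the full granule
    # Partially addressable or poisoned granule: check the last byte touched.
    return (addr & 7) + size - 1 >= k

# Hypothetical shadow state: first 4 bytes at 0x1000 valid, 0x2000 poisoned.
shadow = {0x1000 >> 3: 4, 0x2000 >> 3: -5}
```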
We\Nimplemented our variant of hardware address sanitizer Dialogue: 0,0:21:45.21,0:21:49.27,Default,,0000,0000,0000,,by replacing the bound instruction.\NBound is an old instruction that is no Dialogue: 0,0:21:49.27,0:21:54.87,Default,,0000,0000,0000,,longer used by compilers because, in fact,\Nit is slower to use bound than Dialogue: 0,0:21:54.87,0:21:58.90,Default,,0000,0000,0000,,to perform the checks with multiple x86\Ninstructions. We changed the interface: Dialogue: 0,0:21:58.90,0:22:04.09,Default,,0000,0000,0000,,the first argument is the register that\Nholds the address you want to access, and Dialogue: 0,0:22:04.09,0:22:07.84,Default,,0000,0000,0000,,the second argument holds the size you\Nwant this access to be, Dialogue: 0,0:22:07.84,0:22:11.05,Default,,0000,0000,0000,,so 1 byte, 2 bytes and so on. Dialogue: 0,0:22:11.05,0:22:14.95,Default,,0000,0000,0000,,This instruction is a no-op if the\Ncheck succeeds. So if there is no bug, it Dialogue: 0,0:22:14.95,0:22:19.98,Default,,0000,0000,0000,,just continues on like nothing happened.\NHowever, if we detect an invalid access, we Dialogue: 0,0:22:19.98,0:22:25.36,Default,,0000,0000,0000,,can take a configurable action: we can, for\Nexample, just raise your normal page fault, Dialogue: 0,0:22:25.36,0:22:29.63,Default,,0000,0000,0000,,or we can raise a bound interrupt, which\Nis a dedicated interrupt that only denotes Dialogue: 0,0:22:29.63,0:22:34.30,Default,,0000,0000,0000,,this condition, or we can branch to an x86\Nhandler that either performs additional Dialogue: 0,0:22:34.30,0:22:39.76,Default,,0000,0000,0000,,checking, for example whitelisting, or\Ngenerates a pretty error report for you. Dialogue: 0,0:22:41.34,0:22:47.48,Default,,0000,0000,0000,,Most importantly, this is a single\Ninstruction. We also do not dirty any x86 Dialogue: 0,0:22:47.48,0:22:52.69,Default,,0000,0000,0000,,registers, even though there are some\Nintermediate results.
You need to store Dialogue: 0,0:22:52.69,0:22:56.36,Default,,0000,0000,0000,,these somewhere, and you usually do this in\Nthe x86 registers. So you increase Dialogue: 0,0:22:56.36,0:23:00.01,Default,,0000,0000,0000,,register pressure; maybe you cause\Nspilling. So overall your performance gets Dialogue: 0,0:23:00.01,0:23:07.23,Default,,0000,0000,0000,,worse. We also found out that we are\Nactually faster than doing the checking Dialogue: 0,0:23:07.23,0:23:12.39,Default,,0000,0000,0000,,using x86 instructions. So just by moving\Nthe implementation from the x86 level to Dialogue: 0,0:23:12.39,0:23:16.80,Default,,0000,0000,0000,,microcode, which in some way is still kind\Nof like software, we already improved the Dialogue: 0,0:23:16.80,0:23:22.16,Default,,0000,0000,0000,,performance. On top of this, you get\Nbetter cache utilization: because you have Dialogue: 0,0:23:22.16,0:23:27.02,Default,,0000,0000,0000,,fewer instructions, there are fewer bytes in\Nthe cache, so you get fuller cache lines. Dialogue: 0,0:23:27.02,0:23:31.63,Default,,0000,0000,0000,,And it is also really easy to tell which\Nis checking code and which is your actual Dialogue: 0,0:23:31.63,0:23:40.08,Default,,0000,0000,0000,,program code. Lastly, I'm going to show you\Njust a rough overview of our framework, Dialogue: 0,0:23:40.08,0:23:45.92,Default,,0000,0000,0000,,which we used during our development and\Nwhich you can also find on GitHub. Early Dialogue: 0,0:23:45.92,0:23:50.08,Default,,0000,0000,0000,,on, we found out that we were probably going\Nto need to test a lot of microcode Dialogue: 0,0:23:50.08,0:23:55.64,Default,,0000,0000,0000,,updates, because in the beginning you just\Nthrow everything at the CPU and see how it Dialogue: 0,0:23:55.64,0:24:01.40,Default,,0000,0000,0000,,behaves, and we wanted to do this in\Nparallel. So we developed a small custom Dialogue: 0,0:24:01.40,0:24:07.18,Default,,0000,0000,0000,,OS called "Angry OS" and deployed it to\Nmainboards.
These mainboards are just old Dialogue: 0,0:24:07.18,0:24:13.27,Default,,0000,0000,0000,,AMD mainboards. All these mainboards were\Nhooked up via serial for communication and via Dialogue: 0,0:24:13.27,0:24:19.40,Default,,0000,0000,0000,,GPIO to a Raspberry Pi. With the GPIO you\Ncan reset, power on, power down Dialogue: 0,0:24:19.40,0:24:23.89,Default,,0000,0000,0000,,and just have remote control of the\Nmainboard, and then you can connect to that Dialogue: 0,0:24:23.89,0:24:28.72,Default,,0000,0000,0000,,Raspberry Pi from anywhere on earth and\Njust deploy and play around with it. Dialogue: 0,0:24:28.72,0:24:30.64,Default,,0000,0000,0000,,This was the first version. Dialogue: 0,0:24:30.64,0:24:34.49,Default,,0000,0000,0000,,In the beginning we\Ndidn't really know much about electronics, Dialogue: 0,0:24:34.49,0:24:38.52,Default,,0000,0000,0000,,so we used one Raspberry Pi per mainboard.\NAnd it turns out Raspberry Pis are more Dialogue: 0,0:24:38.52,0:24:43.97,Default,,0000,0000,0000,,expensive than these old mainboards, but\Nwe improved upon this and now we're down Dialogue: 0,0:24:43.97,0:24:48.01,Default,,0000,0000,0000,,to one Raspberry Pi for\Nfour or five setups. Dialogue: 0,0:24:48.01,0:24:51.59,Default,,0000,0000,0000,,You only need 3 GPIO ports per Dialogue: 0,0:24:51.59,0:24:57.36,Default,,0000,0000,0000,,mainboard. You connect each of these to\Nan optocoupler, just to separate the voltage Dialogue: 0,0:24:57.36,0:25:01.86,Default,,0000,0000,0000,,levels, and then you connect one side of\Nthe optocoupler to the GPIO and the other side Dialogue: 0,0:25:01.86,0:25:05.91,Default,,0000,0000,0000,,to your reset pin or your power pin; and\Nfor input, to know whether your board is up Dialogue: 0,0:25:05.91,0:25:11.23,Default,,0000,0000,0000,,or down, you connect the power LED. And\Nthat way you can save a lot of space and a Dialogue: 0,0:25:11.23,0:25:17.20,Default,,0000,0000,0000,,lot of money.
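The three-pins-per-board wiring can be modeled as follows. This is a hypothetical Python sketch: `FakeGPIO` stands in for whatever GPIO library actually runs on the Pi, and the pin numbers, the 0.2 s pulse length and all helper names are assumptions, not details of the authors' setup.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardPins:
    """GPIO lines for one mainboard (hypothetical assignment)."""
    power: int  # output -> optocoupler -> front-panel power switch header
    reset: int  # output -> optocoupler -> front-panel reset switch header
    led: int    # input  <- optocoupler <- power LED header


class FakeGPIO:
    """In-memory stand-in for a real GPIO library, so the sketch runs anywhere."""

    def __init__(self) -> None:
        self.levels: dict[int, bool] = {}
        self.log: list[tuple[int, bool]] = []

    def write(self, pin: int, level: bool) -> None:
        self.levels[pin] = level
        self.log.append((pin, level))

    def read(self, pin: int) -> bool:
        return self.levels.get(pin, False)


def press(gpio: FakeGPIO, pin: int, seconds: float = 0.2) -> None:
    """'Press' a front-panel button by closing the optocoupler briefly."""
    gpio.write(pin, True)
    time.sleep(seconds)
    gpio.write(pin, False)


def power_on(gpio: FakeGPIO, board: BoardPins) -> None:
    """Press power only if the power LED says the board is currently off."""
    if not gpio.read(board.led):
        press(gpio, board.power)


def hard_reset(gpio: FakeGPIO, board: BoardPins) -> None:
    """Pulse the reset line regardless of the current board state."""
    press(gpio, board.reset)


def allocate_boards(free_pins: list[int]) -> list[BoardPins]:
    """Group free GPIO numbers into triples: one triple per mainboard."""
    return [BoardPins(*free_pins[i:i + 3])
            for i in range(0, len(free_pins) - 2, 3)]
```

If you drop power-LED sensing to save pins, `power_on` would press unconditionally and the caller would have to track each board's state itself.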
And also, if you're really\Nconstrained, you can just remove the power Dialogue: 0,0:25:17.20,0:25:23.53,Default,,0000,0000,0000,,LED sensing, because usually you know\Nwhich state your setup is in. As I Dialogue: 0,0:25:23.53,0:25:28.23,Default,,0000,0000,0000,,already said, we wrote our custom operating\Nsystem, and it is intentionally really Dialogue: 0,0:25:28.23,0:25:32.66,Default,,0000,0000,0000,,really minimal, because the major feature\Nwe wanted is control over every Dialogue: 0,0:25:32.66,0:25:36.74,Default,,0000,0000,0000,,instruction that's going to be executed\Nfrom a certain point on. We're Dialogue: 0,0:25:36.74,0:25:40.78,Default,,0000,0000,0000,,playing around with instruction encoding,\Nand if we execute an instruction that we Dialogue: 0,0:25:40.78,0:25:45.53,Default,,0000,0000,0000,,did not intend, we might crash the CPU or\Ngo into an invalid state, and we would Dialogue: 0,0:25:45.53,0:25:50.85,Default,,0000,0000,0000,,not even know which instruction caused it.\NAngry OS essentially only listens on Dialogue: 0,0:25:50.85,0:26:00.15,Default,,0000,0000,0000,,the serial port for something to do. One thing\Nit can do is apply an update. These Dialogue: 0,0:26:00.15,0:26:04.82,Default,,0000,0000,0000,,updates are just microcode updates; they\Nare streamed via serial. We can also Dialogue: 0,0:26:04.82,0:26:10.04,Default,,0000,0000,0000,,stream x86 code, which is then run by Angry\NOS. This is just so that we do not need Dialogue: 0,0:26:10.04,0:26:14.41,Default,,0000,0000,0000,,to reflash the USB stick every time we\Nwant to update our testing code. The Dialogue: 0,0:26:14.41,0:26:19.28,Default,,0000,0000,0000,,results and all the errors are reported back\Nto the Raspberry Pi, and from there they are Dialogue: 0,0:26:19.28,0:26:26.85,Default,,0000,0000,0000,,forwarded to us. Most importantly, the framework\Nwe use has the microcode assembler Dialogue: 0,0:26:26.85,0:26:30.71,Default,,0000,0000,0000,,and a pretty verbose disassembler.
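The talk does not describe Angry OS's wire format, so the following is a purely hypothetical framing for the two jobs it mentions (applying a streamed microcode update and running streamed x86 code): a 1-byte command, a 4-byte little-endian length, then the payload. The command codes and layout are invented for illustration.

```python
import struct
from typing import Iterator

# Hypothetical command codes; not Angry OS's real protocol.
CMD_APPLY_UPDATE = 0x01  # payload: microcode update blob
CMD_RUN_X86 = 0x02       # payload: raw x86 machine code to execute

# 1-byte command + 4-byte little-endian payload length ('<' means no padding).
HEADER = struct.Struct("<BI")


def encode_frame(cmd: int, payload: bytes) -> bytes:
    """Build one frame to send over the serial line."""
    return HEADER.pack(cmd, len(payload)) + payload


def decode_frames(stream: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (cmd, payload) for each complete frame in `stream`.

    Stops at the first truncated frame, which a real receiver would keep
    buffering until the rest arrives over serial.
    """
    pos = 0
    while pos + HEADER.size <= len(stream):
        cmd, length = HEADER.unpack_from(stream, pos)
        start = pos + HEADER.size
        if start + length > len(stream):
            break
        yield cmd, stream[start:start + length]
        pos = start + length


stream = (encode_frame(CMD_APPLY_UPDATE, bytes(8))
          + encode_frame(CMD_RUN_X86, b"\x90\xc3"))  # x86: nop; ret
for cmd, payload in decode_frames(stream):
    print(cmd, payload.hex())
# 1 0000000000000000
# 2 90c3
```

A length prefix lets the receiver tell where one update or code blob ends without reserving any byte value as a delimiter, which matters because both payload kinds are arbitrary binary data.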
This\Ndisassembler generates the output I showed Dialogue: 0,0:26:30.71,0:26:36.92,Default,,0000,0000,0000,,you earlier, and using this you can\Nquickly write your own microcode. We also Dialogue: 0,0:26:36.92,0:26:42.24,Default,,0000,0000,0000,,included an x86 assembler because we\Nwanted to rapidly test different x86 Dialogue: 0,0:26:42.24,0:26:47.73,Default,,0000,0000,0000,,testing code. Using this framework, we\Nwere able to disassemble the existing Dialogue: 0,0:26:47.73,0:26:53.50,Default,,0000,0000,0000,,updates, and we also used it to disassemble\Nour ROM after we reordered it, and also Dialogue: 0,0:26:53.50,0:27:01.17,Default,,0000,0000,0000,,during the process when we fed it to our\Nemulator. We can also create the Dialogue: 0,0:27:01.17,0:27:07.91,Default,,0000,0000,0000,,proper binary files that can be loaded by\Nthe Linux kernel driver. We modified the Dialogue: 0,0:27:07.91,0:27:12.78,Default,,0000,0000,0000,,stock driver to just load any update you give\Nit without checking whether it's for the correct Dialogue: 0,0:27:12.78,0:27:20.06,Default,,0000,0000,0000,,CPU ID and all these things, just for\Ntesting purposes. It's also available. And Dialogue: 0,0:27:20.06,0:27:25.74,Default,,0000,0000,0000,,of course the framework can also control\NAngry OS to make your testing easier. And Dialogue: 0,0:27:25.74,0:27:29.65,Default,,0000,0000,0000,,we implemented a pretty basic remote\Nexecution wrapper, so you can work on a Dialogue: 0,0:27:29.65,0:27:33.39,Default,,0000,0000,0000,,remote Raspberry Pi as if you were using\Nit locally. Dialogue: 0,0:27:34.81,0:27:36.80,Default,,0000,0000,0000,,And this brings me to the end Dialogue: 0,0:27:36.80,0:27:40.80,Default,,0000,0000,0000,,of the talk. In conclusion, we can say\Nreversing the ROM opened up a lot of new Dialogue: 0,0:27:40.80,0:27:44.81,Default,,0000,0000,0000,,possibilities. We learned a lot about how\Nmicrocode works.
We learned about how to Dialogue: 0,0:27:44.81,0:27:49.72,Default,,0000,0000,0000,,actually use it properly, instead of just\Ninferring from the really small dataset Dialogue: 0,0:27:49.72,0:27:55.06,Default,,0000,0000,0000,,that we have from the updates, or from the\Nrandom bits we send to the CPU and then Dialogue: 0,0:27:55.06,0:27:59.53,Default,,0000,0000,0000,,observing what happened. But there's a lot\Nleft to do. So if you really want to hack Dialogue: 0,0:27:59.53,0:28:04.09,Default,,0000,0000,0000,,on it, just get in contact; we are happy\Nto share our findings with you. And as I Dialogue: 0,0:28:04.09,0:28:09.01,Default,,0000,0000,0000,,said, the framework, Angry OS, the example\Nprograms that we implemented, and some Dialogue: 0,0:28:09.01,0:28:13.85,Default,,0000,0000,0000,,other stuff like the wiring are available\Non GitHub. So that's that. And we are Dialogue: 0,0:28:13.85,0:28:16.81,Default,,0000,0000,0000,,happy to answer any questions you might\Nhave. Dialogue: 0,0:28:16.81,0:28:22.23,Default,,0000,0000,0000,,{\i1}applause{\i0} Dialogue: 0,0:28:24.91,0:28:28.44,Default,,0000,0000,0000,,Herald Angel: Thank you very much. So we Dialogue: 0,0:28:28.44,0:28:34.26,Default,,0000,0000,0000,,have 10 minutes for questions; please line\Nup at the microphones. We start with this Dialogue: 0,0:28:34.26,0:28:39.22,Default,,0000,0000,0000,,one: microphone number 2.\NM2: Hi. Thanks for a nice talk. A few Dialogue: 0,0:28:39.22,0:28:42.78,Default,,0000,0000,0000,,questions about your hardware address\Nsanitizer. Dialogue: 0,0:28:42.78,0:28:49.83,Default,,0000,0000,0000,,Benjamin: Mhm.\NM2: As I understand it, you don't need the Dialogue: 0,0:28:49.83,0:28:56.01,Default,,0000,0000,0000,,source code instrumentation because the\Nmicrocode is responsible for checking the Dialogue: 0,0:28:56.01,0:29:02.93,Default,,0000,0000,0000,,shadow memory, right?\NBenjamin: No...
The original hardware address Dialogue: 0,0:29:02.93,0:29:07.95,Default,,0000,0000,0000,,sanitizer implementation is also based on\Na compiler extension that inserts a new Dialogue: 0,0:29:07.95,0:29:12.20,Default,,0000,0000,0000,,instruction, because it doesn't usually\Nexist. It also inserts bootstrap Dialogue: 0,0:29:12.20,0:29:18.05,Default,,0000,0000,0000,,code that initializes your shadow map and\Nalso instruments your allocators to update Dialogue: 0,0:29:18.05,0:29:23.02,Default,,0000,0000,0000,,the shadow map during runtime. We\Nessentially need the same component, but Dialogue: 0,0:29:23.02,0:29:26.85,Default,,0000,0000,0000,,we do not need the software address\Nsanitizer component that essentially Dialogue: 0,0:29:26.85,0:29:33.74,Default,,0000,0000,0000,,inserts 10 or 20 x86 instructions before\Nevery memory access. So yes, we still need Dialogue: 0,0:29:33.74,0:29:37.65,Default,,0000,0000,0000,,a compile-time component, and we are still\Nsource-code based in a sense. Dialogue: 0,0:29:39.39,0:29:45.60,Default,,0000,0000,0000,,Herald: And, so...\NM2: And I didn't see, maybe I missed the Dialogue: 0,0:29:45.60,0:29:51.30,Default,,0000,0000,0000,,numbers: how much faster is it than this\Ninitial version? Dialogue: 0,0:29:51.30,0:29:56.42,Default,,0000,0000,0000,,Benjamin: You mean the initial hardware\Nsanitizer version or the software address Dialogue: 0,0:29:56.42,0:29:59.90,Default,,0000,0000,0000,,sanitizer?\NM2: I mean, let's say, the kernel address Dialogue: 0,0:29:59.90,0:30:05.18,Default,,0000,0000,0000,,sanitizer for the Linux kernel, which is\Nthe usual one, versus your approach.
Dialogue: 0,0:30:05.18,0:30:10.27,Default,,0000,0000,0000,,Benjamin: We only performed a micro\Nbenchmark on Angry OS, and we essentially Dialogue: 0,0:30:10.27,0:30:16.06,Default,,0000,0000,0000,,took the instrumentation as emitted by the\Ncompiler for some memory access, which is Dialogue: 0,0:30:16.06,0:30:20.59,Default,,0000,0000,0000,,your standard software address sanitizer,\Nand compared it to our version using only Dialogue: 0,0:30:20.59,0:30:24.64,Default,,0000,0000,0000,,the modified bound instruction. So I\Nreally can't talk about how it compares to Dialogue: 0,0:30:24.64,0:30:28.82,Default,,0000,0000,0000,,KASAN or some other real-world\Nimplementation, because we only have the Dialogue: 0,0:30:28.82,0:30:34.07,Default,,0000,0000,0000,,prototype and the basic instrumentation.\NM2: Thank you very much. Dialogue: 0,0:30:34.07,0:30:36.49,Default,,0000,0000,0000,,Herald Angel: OK. Microphone number 4,\Nplease. Dialogue: 0,0:30:36.49,0:30:51.14,Default,,0000,0000,0000,,M4: Hey, thanks for the talk. Did you\Nfind any weird microcode Dialogue: 0,0:30:51.14,0:31:00.53,Default,,0000,0000,0000,,implementations? I don't mean security-wise,\Njust something you didn't expect to Dialogue: 0,0:31:00.53,0:31:07.33,Default,,0000,0000,0000,,see implemented that way. Dialogue: 0,0:31:09.04,0:31:11.70,Default,,0000,0000,0000,,Benjamin: The problem is there's a lot of Dialogue: 0,0:31:11.70,0:31:20.27,Default,,0000,0000,0000,,microcode to begin with. You have f000\Ntriads, each of which has 3 opcodes. So Dialogue: 0,0:31:20.27,0:31:25.00,Default,,0000,0000,0000,,you have a lot of ground to cover, and\Nwe also have read-out errors.
Sometimes you are Dialogue: 0,0:31:25.00,0:31:29.17,Default,,0000,0000,0000,,seeing bit flips, which kind of slows you\Ndown, because you then always need to Dialogue: 0,0:31:29.17,0:31:32.82,Default,,0000,0000,0000,,consider: OK, maybe this register is\Nsomething else, maybe this address is Dialogue: 0,0:31:32.82,0:31:37.42,Default,,0000,0000,0000,,wrong. And sometimes you also have a dust\Nparticle that kind of knocks out an Dialogue: 0,0:31:37.42,0:31:42.55,Default,,0000,0000,0000,,entire region. So we only looked at the\Ncomponents that we were pretty sure we Dialogue: 0,0:31:42.55,0:31:46.52,Default,,0000,0000,0000,,recovered correctly, and we only looked\Nat a really tiny subset compared to all of Dialogue: 0,0:31:46.52,0:31:52.94,Default,,0000,0000,0000,,the microcode ROM. It's just not feasible\Nto go through it and look at Dialogue: 0,0:31:52.94,0:31:57.33,Default,,0000,0000,0000,,everything. So no, we didn't find anything\Nfunny, but we also wouldn't know what funny Dialogue: 0,0:31:57.33,0:32:00.79,Default,,0000,0000,0000,,looks like, because we don't know what the\Nofficial spec for microcode is. Dialogue: 0,0:32:01.18,0:32:03.99,Default,,0000,0000,0000,,M4: Thanks.\NHerald Angel: Interesting. We have one Dialogue: 0,0:32:04.03,0:32:05.81,Default,,0000,0000,0000,,question from the Internet, from the Dialogue: 0,0:32:05.81,0:32:09.79,Default,,0000,0000,0000,,Signal Angel, please.\NSignal Angel: Yes. Which AMD CPU Dialogue: 0,0:32:09.79,0:32:15.51,Default,,0000,0000,0000,,generations does this apply to?\NBenjamin: Yeah, this is still based on the Dialogue: 0,0:32:15.51,0:32:21.29,Default,,0000,0000,0000,,work of our first talk, and this only works\Non pretty old ones: K8, K10. So Dialogue: 0,0:32:21.29,0:32:26.94,Default,,0000,0000,0000,,CPUs produced until 2013. Yeah, this was\Nthe last year AMD produced anything like Dialogue: 0,0:32:26.94,0:32:32.52,Default,,0000,0000,0000,,that.
Newer ones use some public-key based\Ncryptography, from what we can tell, and we Dialogue: 0,0:32:32.52,0:32:36.56,Default,,0000,0000,0000,,haven't yet managed to break it. The same goes\Nfor Intel: they seem to be using public Dialogue: 0,0:32:36.56,0:32:39.92,Default,,0000,0000,0000,,key cryptography, and we haven't gotten a\Nfoot in the door yet. Dialogue: 0,0:32:40.99,0:32:44.79,Default,,0000,0000,0000,,Herald Angel: Thank you. We go one around.\NMicrophone number 3, please. Dialogue: 0,0:32:44.79,0:32:51.29,Default,,0000,0000,0000,,M3: Yeah. Thank you. I would like to know\Nhow complex the microcode programs Dialogue: 0,0:32:51.29,0:32:59.16,Default,,0000,0000,0000,,that you could write can be. So what's the\Ncomplexity of new operations you could Dialogue: 0,0:32:59.16,0:33:03.30,Default,,0000,0000,0000,,implement?\NBenjamin: The only limiting factor is the Dialogue: 0,0:33:03.30,0:33:07.92,Default,,0000,0000,0000,,size of your microcode update RAM. But\Nthis one is really, really limited. Dialogue: 0,0:33:07.92,0:33:12.68,Default,,0000,0000,0000,,For example, on K8, where we performed the\Nmajority of our experiments, we are Dialogue: 0,0:33:12.68,0:33:19.05,Default,,0000,0000,0000,,limited to 32 triads, which comes down to\N96 instructions, and you also Dialogue: 0,0:33:19.05,0:33:22.44,Default,,0000,0000,0000,,have some constraints on these\Ninstructions: for example, the next triad Dialogue: 0,0:33:22.44,0:33:27.81,Default,,0000,0000,0000,,will always be executed no matter what.\NSome operations can only go in the second Dialogue: 0,0:33:27.81,0:33:33.86,Default,,0000,0000,0000,,slot, some can only go in another slot, so\Nit's really, really hard. And you're also, Dialogue: 0,0:33:33.86,0:33:38.93,Default,,0000,0000,0000,,from our knowledge, limited to loading 16-bit\Nimmediates instead of 32-bit or even Dialogue: 0,0:33:38.93,0:33:44.47,Default,,0000,0000,0000,,64-bit immediates.
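The space budget can be made concrete with a little arithmetic: 32 triads of 3 micro-ops give 96 slots, and with only 16-bit immediates a 64-bit constant has to be assembled from four pieces. The cost model below is an assumption for illustration (one load per chunk, plus a shift and an OR to merge each additional chunk); the real K8 micro-op sequences may differ.

```python
TRIADS = 32          # K8 microcode patch RAM, per the talk
OPS_PER_TRIAD = 3
IMM_BITS = 16        # widest immediate the authors could load, per the talk

SLOTS = TRIADS * OPS_PER_TRIAD


def imm16_chunks(value: int, width: int = 64) -> list[int]:
    """Split a constant into 16-bit immediates, least significant first."""
    return [(value >> shift) & 0xFFFF for shift in range(0, width, IMM_BITS)]


def ops_to_load(width: int) -> int:
    """Assumed cost: one load per chunk, plus shift + OR to merge each extra chunk."""
    chunks = width // IMM_BITS
    return chunks + 2 * (chunks - 1)


print(SLOTS)                                          # 96
print([hex(c) for c in imm16_chunks(0x1122334455667788)])
print(ops_to_load(32), ops_to_load(64))               # 4 10
```

Under this model, materializing a single 64-bit constant already uses roughly a tenth of the 96 slots, which illustrates why anything relying on large constant tables quickly runs out of space.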
So your whole program\Ngrows really fast if you're trying to do Dialogue: 0,0:33:44.47,0:33:49.40,Default,,0000,0000,0000,,something complex. For example, our\Nauthenticated microcode update mechanism Dialogue: 0,0:33:49.40,0:33:54.44,Default,,0000,0000,0000,,is the most complex one we wrote; it nearly\Nfills the RAM, and we used TEA (Tiny Dialogue: 0,0:33:54.44,0:33:58.70,Default,,0000,0000,0000,,Encryption Algorithm) because that was\Nthe only cipher we managed to fit, mostly due Dialogue: 0,0:33:58.70,0:34:04.51,Default,,0000,0000,0000,,to the S-boxes and other constants we would need\Nto load for other ciphers. So it's really small. Dialogue: 0,0:34:04.51,0:34:08.54,Default,,0000,0000,0000,,Herald Angel: Thank you. Microphone number\N1. Dialogue: 0,0:34:08.54,0:34:14.71,Default,,0000,0000,0000,,M1: So you said the microcode is used for\Ninstruction decoding and it needs to emit Dialogue: 0,0:34:14.71,0:34:19.43,Default,,0000,0000,0000,,the micro-ops to the scheduler and micro-op\Nqueue in some way. Did you find out how Dialogue: 0,0:34:19.43,0:34:27.52,Default,,0000,0000,0000,,that works?\NBenjamin: In essence, we are not actually Dialogue: 0,0:34:27.52,0:34:33.54,Default,,0000,0000,0000,,executing code inside the microcode engine.\NFrom what we understand, the Dialogue: 0,0:34:33.54,0:34:38.57,Default,,0000,0000,0000,,microcode engine is just some kind of\Nsoftware-based recipe that describes how Dialogue: 0,0:34:38.57,0:34:43.48,Default,,0000,0000,0000,,to decode an instruction, so you don't\Nactually get execution; you just commit Dialogue: 0,0:34:43.48,0:34:47.27,Default,,0000,0000,0000,,instructions into the pipelines that do\Nwhat you want. And because we have some Dialogue: 0,0:34:47.27,0:34:51.27,Default,,0000,0000,0000,,control-flow capability that is actually\Ninside the microcode engine, because you Dialogue: 0,0:34:51.27,0:34:55.27,Default,,0000,0000,0000,,can branch to different addresses, you can\Nconditionally branch and loop.
You kind of Dialogue: 0,0:34:55.27,0:34:59.09,Default,,0000,0000,0000,,get execution, but in essence you just\Ncommit stuff into the pipeline and the CPU Dialogue: 0,0:34:59.09,0:35:01.44,Default,,0000,0000,0000,,does what you tell it to. Dialogue: 0,0:35:04.24,0:35:07.16,Default,,0000,0000,0000,,Herald Angel: One more question.\NMicrophone number 2, please. Dialogue: 0,0:35:07.16,0:35:11.93,Default,,0000,0000,0000,,M2: How did you take the picture of the\Ninside of the CPU? Did you open it? Dialogue: 0,0:35:11.93,0:35:14.97,Default,,0000,0000,0000,,Benjamin: Yeah. We worked together with Dialogue: 0,0:35:14.97,0:35:19.68,Default,,0000,0000,0000,,Chris. He's our hardware guy. He has\Naccess to the equipment to delayer it and Dialogue: 0,0:35:19.68,0:35:24.29,Default,,0000,0000,0000,,to take high-resolution optical shots, and\Nhe also takes shots with a scanning Dialogue: 0,0:35:24.29,0:35:29.28,Default,,0000,0000,0000,,electron microscope. So I think about five\Nor six CPUs were harmed in the making of Dialogue: 0,0:35:29.28,0:35:30.36,Default,,0000,0000,0000,,this paper. Dialogue: 0,0:35:33.81,0:35:37.82,Default,,0000,0000,0000,,Herald Angel: So we have one last\Nquestion. Microphone number 2, please. Dialogue: 0,0:35:39.25,0:35:41.39,Default,,0000,0000,0000,,M2: Are you aware of research done by Dialogue: 0,0:35:41.39,0:35:49.40,Default,,0000,0000,0000,,Christopher Domas, where he mapped out the\Ninstruction set for x86 processors? Dialogue: 0,0:35:49.40,0:35:57.12,Default,,0000,0000,0000,,Benjamin: You mean sandsifter?
We\Nactually talked with him, and yeah, we are Dialogue: 0,0:35:57.12,0:36:02.91,Default,,0000,0000,0000,,aware that there's essentially a map of\Nthe instruction set, and maybe you can Dialogue: 0,0:36:02.91,0:36:07.28,Default,,0000,0000,0000,,combine the two, because in the beginning we\Nreverse engineered where certain x86 Dialogue: 0,0:36:07.28,0:36:11.34,Default,,0000,0000,0000,,instructions are implemented in microcode.\NSo if you plug these two together, you kind Dialogue: 0,0:36:11.34,0:36:15.17,Default,,0000,0000,0000,,of map out the whole microcode ROM at the\Nsame time as you map out the whole Dialogue: 0,0:36:15.17,0:36:18.99,Default,,0000,0000,0000,,instruction set. However, there are some\Ncomponents of the microcode ROM that are Dialogue: 0,0:36:18.99,0:36:23.47,Default,,0000,0000,0000,,most likely not triggered by instructions.\NFor example, this seems to apply to power management Dialogue: 0,0:36:23.47,0:36:27.37,Default,,0000,0000,0000,,and everything that is behind a write MSR\N[wrmsr] or read MSR [rdmsr]. wrmsr is a Dialogue: 0,0:36:27.37,0:36:31.25,Default,,0000,0000,0000,,single instruction, but depending on the\Narguments you give it, it just branches to Dialogue: 0,0:36:31.25,0:36:36.44,Default,,0000,0000,0000,,totally different triads, and the microcode update mechanism\Nitself is implemented in microcode. And Dialogue: 0,0:36:36.44,0:36:40.19,Default,,0000,0000,0000,,this one is a huge chunk you wouldn't even\Nfind without brute-forcing all Dialogue: 0,0:36:40.19,0:36:44.16,Default,,0000,0000,0000,,combinations for all instructions, which is\Nnot really feasible. Dialogue: 0,0:36:46.48,0:36:51.28,Default,,0000,0000,0000,,Herald Angel: Thank you. Thank you,\NBenjamin. Dialogue: 0,0:36:51.28,0:36:57.21,Default,,0000,0000,0000,,{\i1}applause{\i0} Dialogue: 0,0:36:57.21,0:37:01.81,Default,,0000,0000,0000,,{\i1}35C3 postroll music{\i0} Dialogue: 0,0:37:01.81,0:37:21.00,Default,,0000,0000,0000,,subtitles created by c3subtitles.de\Nin the years 2019-2020. Join, and help us!