35C3 preroll music

Herald: So, the next talk: Benjamin Kollenda and Philipp Koppe. They will refresh our memories, because they already gave a talk at 34C3 where they talked about the microcode ROM, and today they're going to give us more insights into how microcode works and more details on the ROM itself. Benjamin is a PhD student with a focus on software attacks and defenses, and together with Philipp he will now abuse AMD microcode for fun and security. Please enjoy.

Applause

Benjamin: Thank you. So, as mentioned, we were able to reverse engineer the AMD microcode and the AMD microcode ROM, and I'm going to talk about our journey: what we learned on the way and how we did it. This is joint work with my colleagues at Ruhr-Universität Bochum. A quick outline: we're going to start with a quick crash course on microarchitectural basics and what microcode actually is. Then I'll talk about how we reconstructed the microcode ROM and what we learned along the way. Then I'll quickly give some examples of the applications we implemented with the knowledge we gained in the second step. And lastly, I'll talk about the framework we used: how it works and what we can do with it. This framework is available on GitHub along with some other tools, so you're free to continue our work.

OK. So when I'm talking about microcode, you can think of it essentially as firmware for your processor. It serves multiple purposes. For example, you can use it to fix CPU bugs that are already in the silicon and that you want to fix after the design phase. It is used for instruction decoding; I'll cover this one a bit more. It is also used for exception handling: for example, if an exception or interrupt is raised, microcode gets the first chance to modify this interrupt, ignore it, or just pass it along to the operating system. It's also used for power management and some other complex features like Intel SGX. And most importantly for us, microcode is updatable.
This is used to patch errors in the field. Everyone remembers the Spectre / Meltdown patches, and among them there's a microcode update.

So your x86 CPU takes multiple steps to execute an instruction. The first step is decoding an x86 instruction into multiple smaller micro-ops. These are then scheduled into the pipeline. From there, they are dispatched to the different functional units, like your ALU / AGU and the multiplication and division units.

For our purposes, the decode step is the most interesting one. In the decode step you have an instruction buffer that feeds instructions to several decoders. There are short decoders that handle really simple instructions. There are long decoders that can handle some more advanced instructions. And finally, there is the vector decoder. The vector decoder handles the most complex instructions with the help of microcode. So the microcode engine is essentially the vector decoder.

The microcode engine is essentially comprised of a microcode ROM that stores the instructions for the microcode engine; think of it as your standard instructions. Then there is also a writable memory, the microcode RAM. This is where the microcode updates end up when you apply them. And of course, around this storage there's a whole lot of logic that actually makes it run. For this talk, you only need to know what match registers are. Match registers are essentially breakpoint registers: if we write an address inside the microcode ROM into a match register, then whenever this address is fetched, control is transferred to the microcode RAM, so our patch gets executed.

Microcode updates are usually loaded by the BIOS or by the kernel. Linux has an update driver, and sometimes the BIOS applies a pre-installed version. They have a pretty simple structure: a partially documented header, followed by the actual microcode that is loaded into the CPU.

Microcode is organized in something called triads. Each triad has three operations, essentially x86-like instructions but with some differences, and lastly a sequence word. The sequence word indicates which microcode instruction should be executed next. The options are: execute just the next triad, execute another one by branching to it, or say "OK, I'm done with decoding this instruction, continue with x86 code".

These updates are protected by some weak authentication, which we were able to break, so we can create our own. We can analyze existing ones, and we can apply them to your standard laptop and desktop. However, there can only ever be one update loaded at a time, and when you reboot your machine this update will be gone.
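
To make the interplay of match registers and sequence words concrete, here is a toy Python model of the engine's fetch loop. It is a sketch under loud assumptions, not the real hardware: ROM and patch-RAM triads share one address space, a match register is modelled as a dictionary from a ROM address to a patch-RAM address, and micro-ops are opaque labels. The class and field names are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class Seq(Enum):
    """Sequence-word actions as described in the talk."""
    NEXT = "next"          # continue with the following triad
    BRANCH = "branch"      # continue at an explicit target address
    COMPLETE = "complete"  # decoding done, return to x86 code

@dataclass
class Triad:
    ops: Tuple[str, str, str]  # three micro-ops, kept opaque here
    seq: Seq
    target: int = 0            # only used when seq is Seq.BRANCH

@dataclass
class Engine:
    # Hypothetical model: ROM and patch-RAM triads share one address space.
    triads: Dict[int, Triad]
    # Match registers: ROM address -> patch-RAM address to run instead.
    match: Dict[int, int] = field(default_factory=dict)

    def run(self, entry: int, limit: int = 64) -> List[str]:
        """Return the micro-ops emitted while decoding one x86 instruction."""
        emitted: List[str] = []
        addr = entry
        for _ in range(limit):
            addr = self.match.get(addr, addr)  # breakpoint-like redirect
            triad = self.triads[addr]
            emitted.extend(triad.ops)
            if triad.seq is Seq.COMPLETE:
                return emitted
            addr = triad.target if triad.seq is Seq.BRANCH else addr + 1
        raise RuntimeError("sequence word never signalled completion")
```

In this model, arming a match register for a ROM address diverts that single fetch to the patch; if the patch ends with a branch back into the ROM, the rest of the original implementation runs unchanged.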

Also, for this talk we are going to look at some microcode, and we will present it using a register transfer language. It is heavily based on x86, so I'm just going to cover the differences between the two. Most importantly, a microcode instruction can have three operands, in comparison to x86, which usually only has two. So you can specify a destination and two source operands.

Also, microcode has certain bit flags that need to be set, and we show these with annotations. For example, ".C" means this instruction also updates the carry flag based on the result. Then there is the instruction "jcc", which is a conditional branch: the first operand denotes the condition under which the branch is taken, in this case "branch if the carry flag is one", and the second operand indicates the offset to add to the instruction pointer. We also have some sequence word annotations: "next", "complete", and "branch". It should also be noted that the internal microcode architecture is a load-store architecture. You can't use memory operands in other instructions like you can on x86; you always need to load and store memory explicitly.
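
The notation can be mimicked in a few lines of Python. This is only an illustrative sketch: the names t1 to t3 stand for microcode temporaries, the 64-bit register width is assumed, and treating a not-taken jcc as falling through to the next address is an assumption of the model. The last function shows why a load-store design turns a single x86 "add [rbx], rax" into an explicit load, add and store.

```python
MASK64 = (1 << 64) - 1  # assumed register width

def add(regs, dst, src1, src2, set_carry=False):
    """add[.C] dst, src1, src2: three operands, dst need not equal a source."""
    total = regs[src1] + regs[src2]
    regs[dst] = total & MASK64
    if set_carry:  # the ".C" annotation: also update the carry flag
        regs["CF"] = int(total > MASK64)

def jcc(regs, uip, flag, offset):
    """jcc flag, offset: add offset to the micro-IP if the flag is set.
    Falling through to uip + 1 when not taken is an assumption of this model."""
    return uip + offset if regs[flag] else uip + 1

def ld(regs, mem, dst, addr_reg):
    """Load-store design: memory is only read by explicit loads ..."""
    regs[dst] = mem[regs[addr_reg]]

def st(regs, mem, src, addr_reg):
    """... and only written by explicit stores."""
    mem[regs[addr_reg]] = regs[src]

def x86_add_mem_reg(regs, mem):
    """x86 'add [rbx], rax' lowered to load, add.C, store."""
    ld(regs, mem, "t1", "rbx")
    add(regs, "t1", "t1", "rax", set_carry=True)
    st(regs, mem, "t1", "rbx")
```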

Now we are going to talk about how we managed to recover the microcode ROM. The microcode ROM is baked into your CPU; you can't change it anymore. It is defined in the silicon during the fabrication process. In this picture you can see a die shot taken with an electron microscope, and this is one of three regions that contain the bits for the microcode operations. If you zoom in a bit more, each of these regions consists of four arrays, and these are further subdivided into blocks. Really interesting is "Array 2", which is a bit smaller than the other ones but has some structures above it with a different visual layout. This is SRAM, which stores the microcode update. So this is memory that can be reprogrammed at runtime and is still pretty fast. The microcode RAM is located right next to the microcode ROM, which also makes sense from a design standpoint.

Just an overview of how we went about it. We started with pictures and then used an OCR-like process to transform them into bit strings, which we could then process further. These bit strings were then arranged into triads. We could already tell that we got the individual triads right, because within them there were data dependencies all over the place. Between triads, however, there were no or very few data dependencies, so the ordering of the triads was still wrong. This was a major problem, and what we had to reverse engineer was the mapping from the physical address of a triad, as gathered from the ROM readout, to the virtual address that is used inside the microcode updates or the microcode ROM. After reverse engineering this, you can just do a linear sweep disassembly of the microcode ROM and arrive at human-readable output. But this recovery was a bit tricky, because we required physical/virtual address pairs.
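
Once such a mapping is known, reordering the readout and sweeping over it are mechanical. The following Python sketch assumes a hypothetical phys_to_virt function (the real mapping is exactly what had to be reverse engineered) and a placeholder disasm callback; it also checks that the mapping is a bijection, which is a cheap sanity test for any candidate mapping.

```python
from typing import Callable, List, Optional

def reorder(readout: List[int], phys_to_virt: Callable[[int], int]) -> List[Optional[int]]:
    """Place every physically read-out triad at its virtual address."""
    rom: List[Optional[int]] = [None] * len(readout)
    for phys, bits in enumerate(readout):
        virt = phys_to_virt(phys)
        if rom[virt] is not None:
            raise ValueError(f"two triads map to virtual address {virt:#x}")
        rom[virt] = bits
    return rom  # every slot is filled if the mapping is a bijection

def linear_sweep(rom: List[Optional[int]], disasm: Callable[[int], str]) -> List[str]:
    """Disassemble the reordered ROM triad by triad, in address order."""
    return [f"{addr:04x}: {disasm(bits)}" for addr, bits in enumerate(rom)]
```

With a toy mapping such as lambda p: p ^ 1 (swapping neighbouring triads), a readout [0xA, 0xB, 0xC, 0xD] becomes [0xB, 0xA, 0xD, 0xC] before the sweep.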

But gathering these pairs is a bit harder. We went through the available updates, but we could only find two pairs. These were actually easy to find, because every update replaces a certain triad inside your microcode ROM, and this triad is usually also placed in the microcode update. So by matching the address this update replaces with the microcode ROM readout, you get your two data points. But we needed more data points, so we generated these mappings by matching the semantics of triads in the microcode ROM readout against the semantics we observe when we force execution of a certain microcode address.

To gather the semantics of the read-out microcode, we implemented a simple microcode simulator. Essentially it works at the triad level: you give it an input state and a triad, and it calculates the output state. Input and output state consist of the x86 state, which is your standard registers, and also the internal microcode registers. There are multiple temporary registers that get reset for every new x86 instruction that is executed, but they can of course also be modified by microcode. Our emulator supports all known arithmetic operations, and we have a whitelist of operations that do not produce any observable change in state, just so that we could process more triads and gather more data points. In total we gathered 54 additional address pairs, which turned out to be enough to recover the whole mapping. In this mapping, you essentially have the four different arrays that map to individual blocks, these blocks within the arrays are then permuted a bit, and then the triads inside these blocks have some table-based permutations. So this is not obfuscation; from a hardware design standpoint it can just make sense to route things a bit differently.
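
The matching step can be sketched as follows. This is a simplified Python illustration, not the authors' tool: states are reduced to dictionaries, triads to opaque strings, simulate stands in for the triad-level simulator and returns None for unsupported triads, and only matches that are unambiguous across several test inputs are kept as (physical, virtual) pairs.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]

def collect_pairs(
    readout: Dict[int, str],
    observed: Dict[int, List[State]],
    simulate: Callable[[str, State], Optional[State]],
    test_inputs: List[State],
) -> Dict[int, int]:
    """Match read-out triads (by physical address) to forced executions
    (by virtual address) and return unambiguous phys -> virt pairs."""
    # Simulate every read-out triad once per test input; skip unsupported ones.
    predicted: Dict[int, List[State]] = {}
    for phys, triad in readout.items():
        outs = [simulate(triad, dict(s)) for s in test_inputs]
        if all(o is not None for o in outs):
            predicted[phys] = outs
    pairs: Dict[int, int] = {}
    for virt, outputs in observed.items():
        candidates = [p for p, outs in predicted.items() if outs == outputs]
        if len(candidates) == 1:  # ambiguous matches are not trusted
            pairs[candidates[0]] = virt
    return pairs
```

Using more than one test input matters: two different triads can agree on a single input (incrementing and doubling both map 1 to 2) but rarely on several.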

Now that we can actually map a certain address to the microcode ROM readout, and we know the addresses of different x86 instructions from our earlier experiments, we can look at the implementation of instructions. So let's start with a pretty simple one: shift right double (shrd), which essentially takes a register, shifts it by a given amount, and shifts in bits from another register. So of course you would expect a lot of shifts and rotates in its implementation, and this is exactly what we're seeing here. You have two shift-right operations, and you can see regmd6 and regmd4. These are placeholders: the microcode engine can replace certain bit combinations with the registers that are used in the x86 operation. For example, this one would be replaced by ECX or EAX, depending on what you wrote in x86. At this point we can already gather more information about microcode than we previously knew, because we know "OK, this is a source, this is also a source, and this is a destination".
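
For reference, this is the x86-level behaviour the listing has to reproduce, written as a small Python function. It is a sketch of the documented 32-bit SHRD semantics only; the flag updates, which the microcode also handles, are left out.

```python
def shrd32(dest: int, src: int, count: int) -> int:
    """x86 SHRD r/m32, r32, count: shift dest right, filling from src."""
    count &= 0x1F  # for 32-bit operands the count is masked to 5 bits
    if count == 0:
        return dest  # dest (and the flags) stay unchanged
    # Bits shifted out of dest at the bottom are replaced at the top
    # by the low-order bits of src.
    return ((dest >> count) | (src << (32 - count))) & 0xFFFFFFFF
```

For example, shrd32(0x12345678, 0xABCDEF01, 8) yields 0x01123456: the low byte of the source lands in the top byte of the destination.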
But the source that indicates the shift amount was previously unknown, because it is a high temporary microcode register, and we found out that these usually serve specific, different purposes. If you write to them, sometimes the CPU behaves erratically, sometimes it crashes, sometimes nothing happens. But in this case it seems to be the shift count, and the shift count is given by the third operand of the instruction. So here we already learned: "OK, if you want to read the third operand of an instruction, you need to read t41". And this is how we went about recovering more and more information about microcode. The rest of the implementation is essentially concerned with implementing the remaining semantics of the x86 instruction and updating the flags correctly.

OK, so now let's look at an instruction that is a bit more complicated: rdtsc. rdtsc returns an internal cycle counter in EDX and EAX: the upper part ends up in EDX, the lower part in EAX. So in the end we expect to see writes to these registers, potentially with a shift somewhere in there. But somewhere the CPU needs to fetch the cycle counter. So in the beginning we have two load-style operations. This one is a proper load, which we identified, and this one is unknown. But even though we do not know the instruction, we know its target: the result of this instruction ends up in t9 and the result of this one ends up in t10, so we can follow the uses of these two registers.

For simplicity I'm going to start with t10. As we later found out, t10 is loaded from another register that essentially denotes a specific internal register, and if you play around with these bits you notice that this combination encodes cr4, the same cr4 that x86 code sees. You can also address cr1 and cr2. If you look further, t10 is then AND'd with this bit mask, and if you look in the manual you find out that this bit in cr4 determines whether rdtsc is available from user space or not. So this is the check whether this instruction should be executed. Now let's just keep in mind that t9 holds some other value loaded from some other internal register; we will come back to it a bit later. For now, let's follow execution. This triad is essentially a padding triad; it is a common pattern we see. So let's look at where this branch takes us.
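
The x86-visible behaviour that this access check, together with the success and error exits, has to produce can be modelled in a few lines of Python. This is a sketch of the architectural semantics, not of the microcode itself: CR4.TSD is bit 2 of CR4, and when it is set, rdtsc outside ring 0 raises a general protection fault (#GP, vector 13, i.e. 0xD). The Fault class and the register dictionary are inventions for the model.

```python
CR4_TSD = 1 << 2  # CR4.TSD: when set, RDTSC is restricted to CPL 0
GP_VECTOR = 0xD   # general protection fault (#GP), vector 13

class Fault(Exception):
    """Stands in for the microcode error path raising an x86 exception."""
    def __init__(self, vector: int):
        super().__init__(f"exception vector {vector:#x}")
        self.vector = vector

def rdtsc(regs: dict, cr4: int, cpl: int, tsc: int) -> dict:
    # Access check: AND cr4 with the TSD mask, then branch on the result.
    if cr4 & CR4_TSD and cpl != 0:
        # Error path: load the #GP code into a register, jump to the handler.
        raise Fault(GP_VECTOR)
    # Success path: upper 32 bits to EDX (the shift by 32), lower 32 to EAX.
    regs["edx"] = (tsc >> 32) & 0xFFFFFFFF
    regs["eax"] = tsc & 0xFFFFFFFF
    return regs
```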

This branch takes us to a conditional branch triad. If you look a bit further up, the and instruction actually updated this flag, so this is a conditional branch that determines whether the check was successful or not: it branches either to the error triad or to the success triad. Here we already see the exit. We see a write to RDX, or EDX in this case, with a shift of t9 by 32 bits, which is exactly what you would expect to write the upper 32 bits of the time stamp counter to EDX. And there is an unknown instruction, but we know, OK, it moves something from t9 to EAX, which is the lower 32 bits.

But we're not done here, because we can still look at the error path that is taken if the access is denied. If you scroll a bit down, we can see a move of an immediate into a certain internal register, and this immediate actually encodes the general protection fault interrupt code: 0xD denotes to the exception handler that this was a general protection fault. Later, this triad branches to this address, and if you look at the other uses of this address, you can find other immediates that also correspond to x86 interrupts. So now we learned how we can actually raise our own interrupts: we just need to load the code we want into the specific register and branch to this address.

Now we have learned a lot about how we can actually write microcode, but it's also interesting to see how certain instructions are implemented. So let's look at a pretty complicated one: wrmsr (write MSR). wrmsr essentially writes the data it is given to a model-specific register. These model-specific registers differ between CPUs, between vendors, and sometimes between revisions, and they implement non-standard extensions or pretty complex features. For example, you trigger a microcode update by writing to a model-specific register. The address of the register you want to write to is given in ECX. Now we can see ECX is read and shifted by sixteen bits into t10. Again, we follow the uses of t10, and we see it is XOR'd with a certain bitmask. This bitmask is C000, which actually denotes a namespace of the model-specific registers; in this case this should be an AMD-specific namespace. And of course, this again sets some flags, and you can see the conditional branch depending on these flags to what should be the handler for this namespace.

Next one: we have another XOR that uses a different bitmask, in this case C001. C001 is the namespace where the microcode update routine is actually located. So again, we branch to this handler. If you just continue on, there are more operations on RCX, followed by more branches, and this continues until everything is dispatched to the correct handler. This is how wrmsr is implemented internally, and rdmsr is implemented very similarly, because it has to perform the same kind of dispatch.
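
The first level of this dispatch can be sketched in Python. This is an illustrative model, not the real handler table: the namespace is taken from ECX bits 31:16, mirroring the shift into t10; equality is tested by XOR against zero, as in the listing; and the handlers are made-up callbacks. As a concrete data point, 0xC0010020 in the C001 namespace is the MSR the Linux AMD microcode driver writes to load a patch.

```python
from typing import Callable, List, Tuple

Handler = Callable[[int], object]

def dispatch_msr(ecx: int, handlers: List[Tuple[int, Handler]]):
    """First-level wrmsr/rdmsr dispatch on the MSR namespace (ECX bits 31:16)."""
    t10 = (ecx >> 16) & 0xFFFF  # "shift ecx by 16 into t10"
    for namespace, handler in handlers:
        if (t10 ^ namespace) == 0:  # XOR sets the zero flag on a match
            return handler(ecx & 0xFFFF)  # deeper dispatch happens in the handler
    raise ValueError(f"no handler for MSR {ecx:#010x}")  # a real CPU raises #GP
```

Checking the namespaces one after another, each with its own compare-and-branch, matches the chain of XORs and conditional branches seen in the microcode, at the cost of a linear walk over the namespaces.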

OK, so now I've shown you how we went about reconstructing the knowledge we currently have, and now I'm going to show you what we can actually do with it. For this, I'm going to quickly cover the applications we wrote in microcode. We wrote a simple configurable rdtsc precision: a certain bit mask is AND'd to the result of rdtsc, so you can reduce its accuracy, which can sometimes prevent timing attacks. We also implemented a microcode-assisted address sanitizer, which I'll cover in a second. We also have some basic microcode instruction set randomization, and some microcode-assisted instrumentation. What this means is that you can write a filter for your instrumentation in microcode itself. So instead of hooking an instruction, debugging your code, or emulating it, you can just say: whenever this instruction is executed, check whether it is relevant for me, and if it is, call my x86 handler, entirely in microcode and without changing the instruction in RAM. We also implemented some basic authenticated microcode updates. The usual update mechanism is weak (that's how we got our foot in the door in the first place), so we improved upon it a bit. We also found out that microcode actually has some enclave-like features: once you're executing in microcode, your kernel can't interrupt you and your hypervisor can't interrupt you, and any state you want visible to the outside world you need to write out explicitly. All these microcode-internal registers are not accessible from the outside world, so any computation you perform in microcode cannot be interfered with, and you can implement a simple enclave on top of this.

Our hardware-assisted address sanitizer variant is based on the work of the original authors. AddressSanitizer is a software instrumentation that detects invalid memory accesses by using a shadow map, a shadow memory that records which memory is valid to be read and written to.

The authors proposed hardware address sanitizer, which essentially performs the same checks but using a new instruction, and this instruction should raise a fault if an invalid access is detected. The details of the algorithm they proposed are not important; what is important is that, in essence, it's pretty simple: you load the shadow value for an address, perform some operations on it, and if the result indicates an invalid access, you just report a bug.
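
For readers unfamiliar with the shadow check, this is the classic AddressSanitizer fast path in Python form, as a sketch: one shadow byte covers an 8-byte granule, 0 means the whole granule is addressable, a value k from 1 to 7 means only the first k bytes are, and negative values mark poisoned memory such as redzones. The shadow offset 0x7FFF8000 is the usual x86-64 Linux default and is only an assumption here; the shadow memory itself is modelled as a dictionary.

```python
SHADOW_SCALE = 3            # one shadow byte per 8-byte granule
SHADOW_OFFSET = 0x7FFF8000  # usual x86-64 Linux value, assumed here

def shadow_addr(addr: int) -> int:
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET

def is_invalid_access(shadow: dict, addr: int, size: int) -> bool:
    """ASan fast-path check for an access of 1, 2, 4 or 8 bytes that does
    not cross an 8-byte granule. shadow maps shadow address -> signed byte."""
    k = shadow.get(shadow_addr(addr), 0)
    if k == 0:
        return False  # whole granule addressable
    last = (addr & 7) + size - 1  # offset of the last accessed byte
    # k in 1..7: only the first k bytes are valid; k < 0: poisoned (redzone)
    return last >= k
```

For a 13-byte object at 0x1000, the second granule has shadow value 5, so a 1-byte read at 0x100C passes while a read at 0x100D is reported.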

The advantages of hardware address sanitizer are, for example, that you get better performance out of it: because you only have a single instruction, you can maybe do some fancy tricks inside your CPU that are faster than using x86 instructions. You get more compact code, and you have the possibility of runtime configuration, which is a bit hard with software address sanitizer. We implemented our variant of hardware address sanitizer by replacing the bound instruction. bound is an old instruction that is no longer used by compilers, because in fact it is slower to use bound than to perform the checks with multiple x86 instructions. We changed the interface: the first argument is the register that holds the address you want to access, and the second argument holds the size of the access, so 1 byte, 2 bytes, and so on. This instruction is a no-op if the check succeeds, so if there is no bug it just continues like nothing happened. However, if we detect an invalid access, we can take a configurable action: we can, for example, just raise your normal page fault, or we can raise a bound interrupt, which is a custom interrupt that only denotes this case, or we can branch to an x86 handler that either performs additional checking, for example whitelisting, or generates a pretty error report for you.
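
The configurable behaviour of the repurposed bound can be modelled as a small Python function. Everything here is hypothetical: the Action names, the AccessViolation exception standing in for a raised fault, the restriction to sizes 1, 2, 4 and 8, and the is_valid callback (which could be a shadow-memory check like AddressSanitizer's) are illustrations, not the interface of the authors' microcode.

```python
from enum import Enum
from typing import Callable, Optional

class Action(Enum):
    PAGE_FAULT = "page fault"
    BOUND_INTERRUPT = "bound interrupt"
    X86_HANDLER = "x86 handler"

class AccessViolation(Exception):
    """Stands in for the fault the microcode would raise."""
    def __init__(self, action: Action, addr: int, size: int):
        super().__init__(f"{action.value} on {size}-byte access at {addr:#x}")
        self.action = action

def checked_bound(addr: int, size: int,
                  is_valid: Callable[[int, int], bool],
                  action: Action,
                  handler: Optional[Callable[[int, int], object]] = None):
    """Hypothetical semantics of the repurposed bound: 'bound addr, size'."""
    if size not in (1, 2, 4, 8):
        raise ValueError("unsupported access size")
    if is_valid(addr, size):
        return None  # check passed: behaves like a no-op
    if action is Action.X86_HANDLER:
        if handler is None:
            raise ValueError("X86_HANDLER needs a handler")
        return handler(addr, size)  # e.g. whitelist or pretty report
    raise AccessViolation(action, addr, size)
```

Keeping the action a run-time parameter is what lets the same instrumented binary either trap hard or fall back to a friendlier x86 handler.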

Most importantly, this is a single instruction. We also do not dirty any x86 registers. The checks produce intermediate results that you need to store somewhere, and with x86 instructions you usually store them in x86 registers, so you increase register pressure, maybe you cause spilling, and overall your performance gets worse. We also found out that we are actually faster than doing the checks with x86 instructions. So just by moving the implementation from the x86 level to microcode, which in some ways is still kind of like software, we already improved the performance. On top of this, you get better cache utilization: you have fewer instructions, so there are fewer bytes in the cache and we get fuller cache lines. And it is also really easy to tell which code is testing code and which is your actual program code.

Lastly, I'm going to show you a rough overview of the framework we used during our development, which you can also find on GitHub. Early on we found out that we were probably going to need to test a lot of microcode updates, because in the beginning you just throw everything at the CPU and see how it behaves, and we wanted to do this in parallel. So we developed a small custom OS called "Angry OS" and deployed it to mainboards. These mainboards are just old AMD mainboards. All of them were hooked up via serial for communication, and via GPIO to a Raspberry Pi. With the GPIO you can reset, power on, power down, and generally have remote control of the mainboard, and then you can connect to that Raspberry Pi from anywhere on earth and just deploy and play around with it.

This was the first version. In the beginning we didn't really know much about electronics, so we used one Raspberry Pi per mainboard. And it turns out Raspberry Pis are more expensive than these old mainboards, but we improved upon this, and now we're down to one Raspberry Pi for four or five setups.

For example, you only need three GPIO ports per mainboard. You connect each of them to an optocoupler, just to separate the voltage levels: one side of the optocoupler goes to the GPIO, the other side to your reset pin or your power pin, and as an input, to know whether your board is up or down, you connect the power LED. That way you can save a lot of space and a lot of money. And if you're really constrained, you can just remove the power LED sensing, because usually you know what state your setup is in.

As I already said, we wrote our own operating system, and it is intentionally really, really minimal, because the major feature we wanted is control over every instruction that is executed from a certain point on. We're playing around with instruction encodings, and if we execute an instruction we did not intend, we might crash the CPU or go into an invalid state, and then we do not even know which instruction caused it.
Angry OS essentially only listens on the serial port for something to do. What it can do is apply an update; these updates are just microcode updates, and they are streamed via serial. We can also stream x86 code, which is then run by Angry OS, just so that we do not need to reflash the USB stick every time we want to update our testing code. The results and all errors are reported back to the Raspberry Pi and from there forwarded to us.

Most importantly, the framework we use has the microcode assembler and a pretty verbose disassembler. This disassembler generates the output I showed you earlier, and using the framework you can quickly write your own microcode. We also included an x86 assembler, because we wanted to rapidly test different x86 testing code. Using this framework we were able to disassemble the existing updates, and we also used it to disassemble our ROM after we reordered it, and during the process, when we fed it to our emulator. We can also create proper binary files that can be loaded by the Linux kernel driver. We modified the stock driver to just load any update you give it, without checking whether it's the correct CPU ID and all these things, just for testing purposes; it's also available. And of course the framework can control Angry OS to make your testing easier. We also implemented a pretty basic remote execution wrapper, so you can work on a remote Raspberry Pi as if you were using it locally.
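
The talk does not describe how commands are framed on the serial line, so the following is a purely hypothetical Python sketch of such a command loop, not Angry OS's real protocol: a one-byte command ('U' for a microcode update, 'X' for x86 code), a 32-bit little-endian length, then the payload. The callbacks apply_update, run_x86 and report are placeholders; catching the exception models reporting an error back to the host instead of dying silently.

```python
import io
import struct

CMD_UPDATE = ord("U")  # hypothetical: stream a microcode update
CMD_RUN = ord("X")     # hypothetical: stream x86 test code to execute
HEADER = struct.Struct("<BI")  # command byte + u32 payload length, no padding

def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("serial stream ended mid-frame")
    return data

def serve(stream, apply_update, run_x86, report):
    """Process frames until the host closes the stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) != HEADER.size:
            raise EOFError("truncated header")
        cmd, length = HEADER.unpack(header)
        payload = read_exact(stream, length)
        try:
            if cmd == CMD_UPDATE:
                report(("update", apply_update(payload)))
            elif cmd == CMD_RUN:
                report(("run", run_x86(payload)))
            else:
                report(("error", f"unknown command {cmd:#x}"))
        except Exception as exc:  # report failures back instead of stopping
            report(("error", repr(exc)))

def demo():
    """Feed two frames through an in-memory stream instead of a serial port."""
    frames = (HEADER.pack(CMD_UPDATE, 3) + b"abc"
              + HEADER.pack(CMD_RUN, 2) + b"\x90\xc3")
    out = []
    serve(io.BytesIO(frames), len, bytes.hex, out.append)
    return out
```

Length-prefixed frames let the receiver read a binary payload of any content without escaping, which matters when the payload is raw microcode or machine code.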

And this brings me to the end of the talk. In conclusion, we can say that reversing the ROM opened up a lot of new possibilities. We learned a lot about how microcode works, and about how to actually use it properly, instead of just inferring from the really small dataset we have from the updates, or from sending random bits to the CPU and observing what happens. But there's a lot left to do, so if you really want to hack on it, just get in contact; we are happy to share our findings with you. And as I said, the framework, Angry OS, the example programs we implemented, and some other stuff like the wiring are available on GitHub. So that's that, and we are happy to answer any questions you might have.

applause

Herald Angel: Thank you very much. So we have 10 minutes for questions; please line up at the microphones. We start with this one: microphone number 2.

M2: Hi. Thanks for a nice talk. A few questions about your hardware address sanitizer.

Benjamin: Mhm.

M2: As I understand it, you don't need source code instrumentation, because the microcode is responsible for checking the shadow memory, right?

Benjamin: No... The original hardware address sanitizer implementation is also based on a compiler extension that inserts the new instruction, because it usually doesn't exist. It also inserts bootstrap code that initializes your shadow map, and it instruments your allocators to update the shadow map during runtime. We essentially need the same component, but we do not need the software address sanitizer component that inserts 10 or 20 x86 instructions before every memory access. So yes, we still need a compile-time component, and we are still source code based in a sense.

Herald: And, so...

M2: And I didn't see, maybe I missed the numbers: how much faster is it than the initial version?

Benjamin: You mean the initial hardware address sanitizer version, or the software address sanitizer?

M2: I mean, let's say, the kernel address sanitizer for the Linux kernel, which is the usual one, versus your approach.

Benjamin: We only performed a micro-benchmark on Angry OS. We essentially took the instrumentation emitted by the compiler for some memory access, which is your standard software address sanitizer, and compared it to our version using only the modified bound instruction. So I really can't say how it compares to KASAN or some other real-world implementation, because we only have the prototype and the basic instrumentation.

M2: Thank you very much.

Herald Angel: OK. Microphone number 4, please.

M4: Hey, thanks for the talk. Did you find any weird microcode implementations? I don't mean security-wise, just something you didn't expect to see implemented that way.

Benjamin: The problem is there's a lot of microcode to begin with. You have f000 triads, each of which has 3 opcodes, so you have a lot of ground to cover. We also have readout errors. Sometimes you see bit flips, which kind of slows you down, because you then always need to consider: OK, maybe this register is something else, maybe this address is wrong. And sometimes a dust particle knocks out an entire region. So we only looked at the components we were pretty sure we recovered correctly, and we only looked at a really tiny subset of the whole microcode ROM. It's just not feasible to go through it and look at everything. So no, we didn't find anything funny, but we also wouldn't know what funny looks like, because we don't know what the official spec for microcode is.

M4: Thanks.
Herald Angel: Interesting. We have one question from the Internet, from the Signal Angel, please.

Signal Angel: Yes. Which AMD CPU generations does this apply to?

Benjamin: Yeah, this is still based on the work from our first talk, and it only works on pretty old ones: K8, K10. So CPUs produced until 2013; that was the last year AMD produced anything like that. Newer ones use some public-key-based cryptography, from what we can tell, and we haven't yet managed to break it. Same goes for Intel: they seem to be using public key cryptography, and we haven't gotten a foot in the door yet.

Herald Angel: Thank you. Next, microphone number 3, please.

M3: Yeah, thank you. I would like to know how complex the microcode programs you could write can be. What's the complexity of the new operations you could implement?

Benjamin: The only limiting factor is the size of your microcode update RAM, but it is really, really limited. For example, on K8, where we performed the majority of our experiments, we are limited to 32 triads, which comes down to 96 instructions, and you also have some constraints on these instructions. For example, the next triad will always be executed no matter what. Some operations can only go in the second slot, some can only go in another slot, so it's really, really hard. And from what we know, you're also limited to loading 16-bit immediates instead of 32-bit or even 64-bit immediates, so your whole program grows really fast if you're trying to do something complex. For example, our authenticated microcode update mechanism is the most complex one we wrote. It nearly fills the RAM, and we used TEA (Tiny Encryption Algorithm), because that was the only one we managed to fit, mostly due to the S-boxes and other constants we would need to load for other ciphers. So it's really small.
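
To illustrate why TEA fits where table-heavy ciphers do not, here is plain TEA in Python: its only constant is delta = 0x9E3779B9, and there are no S-boxes. The hypothetical helper imm32_from_16bit shows how a 32-bit constant would have to be assembled from two 16-bit immediates with a shift and an OR. This is the standard cipher, not the authors' microcode; how they turned it into an authentication check for updates is not described here, and nothing below should be read as their construction.

```python
MASK32 = 0xFFFFFFFF

def imm32_from_16bit(hi: int, lo: int) -> int:
    """Assemble a 32-bit constant from two 16-bit immediates (shift, then OR)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

DELTA = imm32_from_16bit(0x9E37, 0x79B9)  # TEA's key schedule constant

# Python ints are unbounded; masking each updated half to 32 bits gives the
# same results as C's uint32_t arithmetic, because the low 32 bits of +, ^,
# << and >> (on 32-bit inputs) depend only on the low 32 bits of the operands.

def tea_encrypt(v0: int, v1: int, key: tuple) -> tuple:
    """Standard TEA: 64-bit block as two 32-bit halves, 128-bit key, 32 cycles."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK32
        v0 = (v0 + ((((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1)))) & MASK32
        v1 = (v1 + ((((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3)))) & MASK32
    return v0, v1

def tea_decrypt(v0: int, v1: int, key: tuple) -> tuple:
    """Inverse of tea_encrypt: undo the cycles in reverse order."""
    k0, k1, k2, k3 = key
    s = (DELTA * 32) & MASK32
    for _ in range(32):
        v1 = (v1 - ((((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3)))) & MASK32
        v0 = (v0 - ((((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1)))) & MASK32
        s = (s - DELTA) & MASK32
    return v0, v1
```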

Herald Angel: Thank you. Microphone number 1.

M1: So you said microcode is used for instruction decoding, and it needs to emit the micro-ops to the scheduler and micro-op queue in some way. Did you find out how that works?

Benjamin: In essence, we are not actually executing code inside the microcode engine. From what we understand, the microcode engine is just a kind of software-based recipe that describes how to decode an instruction, so you don't actually get execution; you just commit instructions into the pipeline that do what you want. Because there is some control flow capability inside the microcode engine (you can branch to different addresses, you can branch conditionally, and you can loop), you kind of get execution, but in essence you just commit things into the pipeline and the CPU does what you tell it to.

Herald Angel: One more question. Microphone number 2, please.

M2: How did you take the pictures of the CPU internals? Did you open it?

Benjamin: Yeah. We worked together with Chris, he's our hardware guy. He has access to the equipment to delayer the chips and take high-resolution optical shots, and he also takes shots with a scanning electron microscope. So I think about five or six CPUs were harmed in the making of this paper.

Herald Angel: So we have one last question. Microphone number 2, please.

M2: Are you aware of the research done by Christopher Domas, where he mapped out the instruction set of x86 processors?

Benjamin: You mean sandsifter? We actually talked with him, and yes, we are aware that there's essentially a map of the instruction set. Maybe you can also combine the two: in the beginning we reverse engineered where certain x86 instructions are implemented in microcode, so if you plug these two together, you kind of map out the whole microcode ROM at the same time as you map out the whole instruction set. However, there are some components of the microcode ROM that are most likely not triggered by instructions, for example what seems to be power management, or everything that is behind write MSR [wrmsr] or read MSR [rdmsr]. wrmsr is a single instruction, but depending on the arguments you give it, it branches to totally different triads, and the microcode update routine itself is implemented in microcode. This is a huge chunk you wouldn't even find without brute-forcing all argument combinations for all instructions, which is not really feasible.

Herald Angel: Thank you. Thank you, Benjamin.

applause

35c3 postroll music

subtitles created by c3subtitles.de in the years 2019-2020. Join, and help us!