36C3 preroll music
Herald: The next talk is an Intel
Management Engine deep dive:
Understanding the ME at the OS and
hardware level, and it is by Peter Bosch.
Please welcome him with a great round of
applause!
Applause
Peter Bosch: Right. So, everybody: hooray!
Nice. OK. So welcome. Well, this is me.
I'm a student at Leiden University. Yeah,
I've always been really interested in how
stuff works. And when I got a new laptop,
I was like, you know, how does this thing
really boot? I knew everything from the
reset vector onwards; I wanted to know what
happened before it. So first I started
looking at the Boot Guard ACM. While
looking through it, I realized that not
everything was as it was supposed to be.
That led me to a vulnerability in a later
part of the boot process. And I found out
here last year that I wasn't the only one
to find it: Trammell Hudson also found it,
and we reported it together and presented
it at Hack in the Box. At the same time, I
was also already looking at the Management
Engine. There had been a lot of research
done on it before, but the public info was
mostly on the file system and on specific
vulnerabilities, which still made it
pretty hard to get started on
reverse-engineering it. So that's why I
thought it might be useful to present this
work here. It's broken up into three
parts. The first bit is a quick
introduction to the operating system it
runs, so that if you want to work on this
yourself, you can more easily understand
what's in front of you in your
disassembler. After that, I'll go over its
role in the boot process, and then how
this information can be used to start
developing new firmware for it or to do
more security research on it. So first of
all, what exactly is the Management
Engine? There's been a lot of fuss about
it being a backdoor and everything; in
reality, whether it is or not depends on
the software that it runs. It's basically
a processor with its own RAM, its own I/O,
MMUs and everything, sitting inside your
southbridge. It's not in the CPU, it's in
the southbridge. So when I say this is
going to be about the sixth and seventh
generation of Intel chips, I mostly mean
motherboards from those generations. If
you run a newer CPU on one of them, it
also applies. So yeah. A bit more detail.
The CPU it runs is based on the 80486,
which, you know, is funny: it's quite an
old CPU, and it's still being used in
almost every computer nowadays. It has a
little bit of its own RAM, quite a bit of
built-in ROM, a hardware-accelerated
cryptographic unit, and fuses, which are
write-once memory used to store security
settings and keys and everything. Some of
the more scary features it has: bus
bridges to all of the buses inside the
southbridge, it can access the host CPU's
RAM, and it can access the network, which
makes it really quite dangerous if there
is a vulnerability or if it runs anything
nefarious. Its tasks nowadays include
starting the computer as well as adding
management features. This is mostly used
in servers, where it can serve as a board
management controller, doing things like
remote keyboard and video. And it does
security: Boot Guard, which is the signing
of firmware and verification of
signatures.
It implements a firmware TPM, and there is
also an SDK to use it as a general-purpose
secure enclave. On the software side, it
runs a custom operating system, parts of
which are taken from MINIX, the teaching
operating system by Andrew Tanenbaum. It's
a microkernel operating system. It runs
binaries that are in a completely custom
format. It's actually a really quite
high-level system. In terms of the
operating system it runs, it's mostly like
Unix, which makes it kind of familiar, but
it also has large custom parts. Like I
said, in this talk I'm going to be
speaking about sixth and seventh
generation Intel Core chipsets, so that's
Sunrise Point; Lewisburg, which is the
server version of this; and also the
laptop systems-on-a-chip, which are just
called Intel Core Low Power. They also
include the chipset as a separate die, so
this applies to them too. In fact, I've
been testing most of the stuff I'm going
to tell you about on the laptop that's
sitting right here, which is a Lenovo
T460. The firmware version I've been
looking at is 11.0.0.1205. Right. So I do
need to put this up there: I'm not a part
of Intel, nor have I signed any contracts
with them. I've found everything in ways
that you could too. I didn't have any
leaked NDA material or anything that you
couldn't get your hands on. It's also a
very wide subject area, so there might be
some mistakes here or there, but
generally it should be right. Well, if you
want to get started working on ME
firmware, to reverse-engineer it or modify
it in some way, first you've got to deal
with the image file. You've got your SPI
flash; most of the ME firmware lives in
the same flash chip as your BIOS. So
you've got that image, and then how do you
get the code out? Well, there are tools
for that. It's already been extensively
documented by other people, and you can
basically just download a tool and run it
against the image, which makes this really
easy. This is also the reason why there
hadn't been a lot of research done before
these tools were around: you couldn't get
to all of the code. The kernel was
compressed using Huffman tables, which
were stored in ROM, and you couldn't get
to the ROM without getting code execution
on the thing. So there was basically no
way of getting access to the kernel code,
and I think also the C library. But that's
not a problem anymore.
You can just download a tool and unpack
it. Also, the Intel tool to generate
firmware images, which you can find in
some open directories on the internet, has
Qt resources: XML files which basically
contain the descriptions of all the file
formats used by these ME versions,
including names and comments to go with
those structure definitions. So that's
really useful. So let's look at one of
these images. It has a couple of
partitions; some of them overlap, some of
them are storage, and some are code. There
are the main partitions, FTPR and NFTP,
which contain the programs it runs.
There's MFS, which is the read-write file
system it uses for persistent storage. And
then there is a log-to-flash option, and
the possibility to embed a token that will
tell the system to unlock all debug
access, which has to be signed by Intel,
so it's not really of any use to us. And
then there is something interesting: ROM
bypass. Like I said, you can't get access
to the ROM without running code on it. The
ROM is mask ROM, so it's internal to the
chip, but Intel has to develop new ROM
code and test it without respinning the
die every time. So on an unlocked
pre-production chipset, they have the
possibility to completely bypass the
internal ROM and load even the early boot
code from the flash chip. Some of these
images have leaked, and you can use them
to get a look at the ROM code, even
without being able to dump it. That's
going to be really useful later on. Then
you've got these code partitions, and they
contain a whole lot of files. There are
the binaries themselves, which don't have
any extension, and there are the metadata
files. The binary format they use has no
headers, nothing included; all of that
data is in the metadata file.
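Since the code files carry no headers, everything the loader needs comes from the metadata file, which is a type-length-value stream. The exact record layout is not given here, so this is only a sketch under a hypothetical layout: a 4-byte little-endian type followed by a 4-byte little-endian length that counts the whole record including its 8-byte header. Check the real layout against unME11 or Intel's XML definitions before relying on it.

```python
import struct
from typing import Iterator, Tuple

HEADER = struct.Struct("<II")  # assumed layout: u32 type, u32 total record length

def iter_tlv(blob: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) pairs from a TLV blob in the assumed layout."""
    offset = 0
    # Trailing bytes shorter than a header are ignored in this sketch.
    while offset + HEADER.size <= len(blob):
        rec_type, rec_len = HEADER.unpack_from(blob, offset)
        if rec_len < HEADER.size or offset + rec_len > len(blob):
            raise ValueError(f"malformed record at offset {offset:#x}")
        yield rec_type, blob[offset + HEADER.size : offset + rec_len]
        offset += rec_len

def make_record(rec_type: int, payload: bytes) -> bytes:
    """Build one record in the same assumed layout (handy for testing)."""
    return HEADER.pack(rec_type, HEADER.size + len(payload)) + payload

# Usage: two made-up records, e.g. a "name" entry and a "load address" entry.
blob = make_record(0x0A, b"module") + make_record(0x0B, struct.pack("<I", 0x10000))
records = list(iter_tlv(blob))
```

The type numbers 0x0A and 0x0B are placeholders; in practice you would map each type to a structure definition taken from the XML files mentioned above.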
And when you use the unME11 tool, it'll
actually convert those to text files for
you, so you can get started without really
understanding how they work. So, the
metadata: it's a type-length-value
structure, which contains a whole lot of
information the operating system needs. It
has the info on the module: whether it's
data or code, where it should be loaded,
what the privileges of the process should
be, a SHA checksum for validating it, and
also some higher-level stuff such as
device file definitions if it's a device
driver or any other kind of server. I've
actually written some code that uses this;
it's on GitHub, so if you want a closer
look at it, some of the slides have a link
to a file in there which contains the full
definitions. Right. So all the code on the
ME is signed and verified by Intel, so you
can't just put in a new binary and say,
hey, let's run this. The way they do this:
in the manufacture-time fuses, Intel has a
hash of the public key that they use to
sign it. On each flash partition there is
a manifest, which is signed by that key
and contains the SHA hashes of all the
metadata files, which in turn contain a
SHA hash of the code files. There don't
seem to be any major problems in verifying
this, so it's useful to know, but you're
not really going to use this. And then the
modules themselves, as I've said, are flat
binaries. Mostly. The metadata contains
all the info the kernel uses to
reconstruct the actual program image in
memory. A curious thing here is that the
base address for all the programs is the
same across an image. If you have a
different version, it's going to be
different, but if you have two programs
from the same firmware, they're going to
be loaded at the same virtual address.
So when you want to look at it, you're
going to load it in some disassembler,
for example IDA, and you'll see that it
disassembles fine, but it references all
kinds of memory that you don't have access
to. Usually you'd think maybe you've
loaded it at the wrong address, or you're
missing some library. Well, here you've
loaded it correctly if you used the
address from the metadata file, but you
are in fact missing a lot of memory
segments. So let's take a look at each of
these. It's calling code, and it's pushing
a pointer to data there. So what's all
that? Well, it has shared libraries: even
though the programs are flat binaries,
they do use shared libraries, because you
only have 1.5 megabytes of RAM and you
don't want to link your C library into
everything and waste what little memory
you have. There is the main system
library, which is like libc on a Linux
system. It's in a flash partition, so you
can just load it and take a look at it
easily, and it starts out with a jump
table. There are no symbols in the
metadata file or anything. It doesn't do
dynamic linking: it loads the pages of the
shared library at a fixed address, which
is also in the shared library's metadata,
and then the library is just there in
memory, and the program jumps there when
it needs a function. The functions
themselves just use the normal System V
x86 calling convention, so it's pretty
easy to look at them using your normal
tools.
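Because shared libraries are mapped at fixed addresses taken from their own metadata rather than being dynamically linked, a disassembly helper only needs a table of load ranges to tell which image an unresolved call or pointer belongs to. The bases and sizes below are made-up placeholders, not real ME 11 addresses. Fill them in from the metadata of the module, the system library and the ROM library.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional

class Region(NamedTuple):
    name: str
    base: int
    size: int

# Hypothetical layout; take the real bases and sizes from each image's metadata.
REGIONS: List[Region] = sorted(
    [
        Region("rom_lib", 0x0001_0000, 0x0002_0000),
        Region("syslib", 0x0004_0000, 0x0001_0000),
        Region("module", 0x0006_0000, 0x0003_0000),
    ],
    key=lambda r: r.base,
)
_BASES = [r.base for r in REGIONS]

def owner_of(addr: int) -> Optional[str]:
    """Return the name of the region containing addr, or None if unmapped."""
    i = bisect_right(_BASES, addr) - 1  # last region starting at or below addr
    if i >= 0 and addr < REGIONS[i].base + REGIONS[i].size:
        return REGIONS[i].name
    return None
```

An address that maps to no region is a hint that another segment (for example the initialized data area) still has to be loaded.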
There's no weird register argument passing
going on here. So, right, now shared
libraries: there are two of them, and
this is where it gets annoying. The system
library, you've got access to, so you can
take your time going through it and try to
figure out, you know, oh, hey, is this
open or is this read, or what's this
function doing? But then there's also a
second, really large library, which is in
ROM. The C library functions and some of
their custom helper routines that don't
interact with the kernel directly, such as
string functions, live in ROM. So when
you've got your code, and this is
basically where I was when I was here last
year, you're looking through it and you're
seeing calls all over the place to
functions you don't have the code for, and
you have to figure out from their
signatures what they're doing. That works
for some of the functions and it's really
difficult for others. That really had me
stopped for a while. Then I managed to
find one of these ROM bypass images, and I
had the code for a very early development
build of the ROM. This is where I got
lucky: the actual entry point addresses
are fixed across an entire chipset family.
So whether you have an image for the
server version of, say, the 100-series
chipset, or for the client version, or for
a desktop or laptop version, it's all
going to be the same ROM addresses. Even
though the code might be different,
there's a jump table, which means the
addresses can stay fixed.
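Because the ROM library is reached through a jump table whose entry addresses stay fixed across the chipset family, naming ROM calls in any module reduces to a dictionary lookup once the table has been identified in a ROM bypass image. The base address, the entry stride (a 5-byte `jmp rel32` per entry) and the function names below are hypothetical placeholders, not the real Sunrise Point values.

```python
from typing import Dict, List

JUMP_TABLE_BASE = 0x0001_0000  # hypothetical address of the first jump-table entry
ENTRY_SIZE = 5                 # assumed: each entry is a 5-byte `jmp rel32`

# Hypothetical names recovered (e.g. by hand) from a ROM bypass image.
ROM_NAMES: List[str] = ["memcpy", "memset", "strlen", "strcmp"]

def build_symbol_map(base: int, stride: int, names: List[str]) -> Dict[int, str]:
    """Map each jump-table entry address to the function it dispatches to."""
    return {base + i * stride: name for i, name in enumerate(names)}

SYMBOLS = build_symbol_map(JUMP_TABLE_BASE, ENTRY_SIZE, ROM_NAMES)

def label_call(target: int) -> str:
    """Name a call target, falling back to an IDA-style sub_XXXX label."""
    return SYMBOLS.get(target, f"sub_{target:X}")
```

Since the entry addresses do not change between images of the same chipset family, the same `SYMBOLS` map can be reused across firmware versions.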
So this only needs to be done once, and in
fact, when I upload my slides later, there
is a slide at the end that has the
addresses of the most used functions, so
you're not going to have to repeat that
work, at least not for this chipset. So
say you want to look at a simple module:
you've loaded it, you've applied the
things I just said, and you still don't
have the data sections. I don't know what
that function there is doing, but it's not
very important. It actually returns a
value that, I think, isn't used anywhere,
but it must have a purpose because it's
there. Right. So then you look at the
entry point, and this is a lot of stuff.
The main thing that matters here: on the
right half of the screen there is a
listing from a MINIX repository, and on
the left half there is a disassembly from
an ME module. They're mostly the same.
There is one key difference, though: the
ME module has a little bit of code that
runs before its C library startup
function. That code does all the
ME-specific initialization, including a
lot of stuff related to how C library data
is kept, because the kernel doesn't
allocate any data segments for the C
library either. So each process reserves a
part of its own memory and tells the C
library: you can store any global
variables in there. But when you look at
that function, one of the most important
things it calls is this function. It's
very simple; it just copies a bunch of
RAM. They don't have support for
initialized data sections, since it's a
flat binary. What they do is use the .bss
segment, the zeroed segment at the end of
the address space, and copy over a bunch
of data from the program. The program
itself is not aware of this; it's really
in the initialization code and in the
linker script. This is also something
that's very important, because you're
going to need to load the last bit of the
binary at that address in the data
section. Otherwise you're missing
constants, or at least initialization
values. Right. Then there is the full
memory map of the processes themselves.
It's a flat 32-bit address space with
everything you'd expect in there: a stack,
a heap, and everything. There's a little
bit of heap allocated right on
initialization.
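Putting the last few points together, reconstructing a module's memory image for analysis means placing the flat file at its base address and then copying the tail of the binary (the initialization values) to the start of the zeroed data area, as the pre-startup code does at run time. The parameter names, and the assumption that the initial values are exactly the last `init_size` bytes of the file, are illustrative. The real addresses and sizes come from the metadata.

```python
from typing import Dict

def build_image(binary: bytes, base: int, data_addr: int,
                init_size: int, bss_size: int) -> Dict[int, bytes]:
    """Return {virtual address: bytes} segments to load into a disassembler.

    Hypothetical assumptions (take real values from the metadata):
      - the file is loaded as-is at `base`;
      - its last `init_size` bytes are initial values for the data area;
      - the data area at `data_addr` is `bss_size` bytes, starts zeroed,
        and has the initial values copied to its beginning at startup.
    """
    if init_size > len(binary) or init_size > bss_size:
        raise ValueError("init_size does not fit")
    data = bytearray(bss_size)                            # zero-filled .bss
    data[:init_size] = binary[len(binary) - init_size:]   # the startup copy
    return {base: binary, data_addr: bytes(data)}
```

Each returned segment can then be added to the disassembler database at its address, which resolves most of the "missing" data references.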
This is basically how you derive the
address space layout from the metadata,
especially the data segment. The location
of the stack itself varies a lot, because
of the number of threads in use or the
size of the data sections. And those stack
guards aren't really stack guards; there is
also per-thread metadata in there, but
that's not relevant to the process itself,
only to the kernel. Well, if you then skip
forward a bit, and you've done all this,
you look at a simple driver like this one.
It's taken from a driver used to talk to
the CPU. When I say CPU or host, by the
way, I mean your big Skylake, or Kaby
Lake, or Coffee Lake, whatever big CPU
runs your own operating system. Right. So
this driver is used to send messages
there. But if you look at what's going on
here (OK, I think I had a problem with the
animation here), it sets up some stuff and
then it calls a library function in the
main syslib library, which actually
contains the main loop for the program.
That's because Intel was smart and added a
nice framework for programs that implement
device drivers: it's a microkernel, so
device drivers are just normal programs
calling specific APIs. Then there's normal
POSIX file I/O. No standard I/O, but it has
all the normal open, read, ioctl and other
functions. And then there's more
initialization for the srv library.
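The driver framework described here amounts to a structure of callbacks plus a main loop in the system library that dispatches incoming messages to them. The model below only illustrates why a callback's offset in that structure tells you which call it implements. The slot order (open, close, read, write, ioctl), the 4-byte pointer size, the message format and the `-1` error reply are all assumptions, not the real ME layout.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical slot order of the callback structure; 4-byte pointers (32-bit ME).
SLOTS: List[str] = ["open", "close", "read", "write", "ioctl"]
PTR_SIZE = 4

def slot_for_offset(offset: int) -> Optional[str]:
    """Which call does a function pointer stored at `offset` implement?"""
    if offset % PTR_SIZE:
        return None
    index = offset // PTR_SIZE
    return SLOTS[index] if 0 <= index < len(SLOTS) else None

def main_loop(callbacks: Dict[str, Callable[[dict], int]],
              messages: List[dict]) -> List[int]:
    """Dispatch each message to the matching callback, like a framework loop."""
    replies = []
    for msg in messages:
        handler = callbacks.get(msg["type"])
        # Calls the driver didn't implement get an error reply (-1 is arbitrary).
        replies.append(handler(msg) if handler else -1)
    return replies
```

When reverse engineering, `slot_for_offset` is the useful half: once the structure layout is confirmed, every function pointer stored into it can be named by its offset alone.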
And this is basically what all the simple
drivers in it look like. And then there's
this. Because they're so low on memory,
they don't use standard I/O, or even
printf itself, for most of the debugging.
They use a thing called "sven"; I'll touch
on that later. So there are the familiar
APIs that I talked about. It even has
POSIX threads, or at least a subset of
them, and all the functions that you'd
expect to find on some generic Unix
machine. So that shouldn't be too much of
a problem to deal with, but then there's
also their own tracing solution, sven.
That's what Intel calls it; the name
appears in all the development tools that
you can download from their site.
Basically, they don't include format
strings for a lot of the messages. They
just have a 32-bit identifier that is sent
over the debug port, and it refers to a
format string in a dictionary that you
don't have. One of the dictionaries, for a
server chip, is floating around the
internet, but even that one is incomplete.
And the normal non-NDA version of the
Intel developer tools has some 50 format
strings for really common status messages
it might output. So if you see these
functions, just realize it's doing some
debug print. It might be dumping some
state or just announcing that it's going
to do something else.
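Because the trace calls carry only a 32-bit identifier, a log decoder can do little more than look the ID up in whatever partial dictionary is available and fall back to printing the raw ID and arguments. In this sketch the dictionary contents, the ID values and the use of printf-style `%` formatting are all made up for illustration.

```python
from typing import Dict, Sequence

# Hypothetical partial dictionary: catalog ID -> printf-style format string.
CATALOG: Dict[int, str] = {
    0x00010001: "example: status=%d",
    0x00010002: "example: length %u",
}

def decode(catalog_id: int, args: Sequence[int]) -> str:
    """Render one trace record, or a raw placeholder if the ID is unknown."""
    fmt = CATALOG.get(catalog_id)
    if fmt is None:
        raw = ", ".join(f"{a:#x}" for a in args)
        return f"<unknown {catalog_id:#010x}: {raw}>"
    return fmt % tuple(args)
```

Keeping unknown records in raw form means they can be re-decoded later if a more complete dictionary turns up.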
No important logic actually happens in
these calls. Right. So then, device files.
They're actually defined in a manifest.
When the kernel loads a program, and that
program wants to expose some kind of
interface to other programs, its metadata
file will contain a special file producer
entry, and that says, you know, you have
these device files, with a name, an access
mode, the user and group ID and
everything, and the minor numbers. The
program loader (not the kernel) sends this
to the virtual file system server, and the
program automatically gets a device file
pointing to the right major and minor
number. And then there's also a library,
as I said, to provide a framework for a
driver, and that looks like this. It's
really easy to use: if you were an ME
developer, you'd just write some callbacks
for open, close and everything, and it
automatically calls them for you when a
message comes in telling you that that
happened. That also makes it really easy
to reverse engineer, because if you look
at a driver, it just loads some callbacks,
and you can tell by their offset in a
structure which actual call they're
implementing. Right, so then there is one
of the weirder things that's going on
here: how the actual userland programs get
access to memory-mapped registers. There's
a lot of this going on: calls to a couple
of functions that have some magic
arguments. The second one you can easily
tell is the offset, because it increases
in very nice power-of-two steps, so it's
probably the register offset, and what
comes after it looks like a value. And the
first one seems to be a magic number.
Well, it's not. There is also an extension
in the metadata saying: these are the
memory-mapped I/O ranges. Those ranges
each list a physical base address, a size,
and permissions. The index in that list
does not directly correspond to the magic
value; to get the magic value, you need to
do a little computation on the index, and
then you can access the range through
those functions. The computation itself
might be familiar. Yeah, so these are the
functions. The value is a segment
selector. They actually don't use paging
for inter-process isolation; they use
segments, like x86 protected-mode
segments. And for each memory-mapped I/O
range there is a separate segment, and you
manually specify that, which is just weird
to me. Why would you use x86 segmentation
on a modern system? MINIX does it, but to
the extent of using it even for this?
Luckily, the normal address space is flat,
at least to the process, not to the
kernel. Right, so now we can access
memory-mapped I/O. That's all the really
high-level stuff.
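Since the magic first argument is a segment selector, it follows the standard x86 encoding: bits 3 and up hold the descriptor-table index, bit 2 selects the GDT (0) or LDT (1), and bits 0-1 are the requested privilege level (RPL). The talk does not say which table and RPL the ME kernel uses for its MMIO segments, or how a range's position in the metadata list maps to a descriptor index. So the LDT/RPL 3 defaults and the hypothetical `first_index` offset below are assumptions to check against real selector values.

```python
from typing import Tuple

def make_selector(index: int, use_ldt: bool = True, rpl: int = 3) -> int:
    """Encode an x86 segment selector: index << 3 | TI << 2 | RPL."""
    if not 0 <= index < 8192 or not 0 <= rpl <= 3:
        raise ValueError("index must fit in 13 bits and RPL in 2 bits")
    return (index << 3) | (int(use_ldt) << 2) | rpl

def split_selector(selector: int) -> Tuple[int, bool, int]:
    """Decode a selector into (index, uses_ldt, rpl)."""
    return selector >> 3, bool(selector & 0b100), selector & 0b11

def mmio_selector(range_number: int, first_index: int = 1) -> int:
    """Selector for the Nth MMIO range, assuming (hypothetically) that the
    ranges occupy consecutive descriptors starting at `first_index`."""
    return make_selector(first_index + range_number)
```

Running `split_selector` over a handful of magic values taken from real calls is a quick way to confirm or reject these assumptions.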
So what's going on under there? It's got
all the basic microkernel stuff, so
message passing, plus some optimizations
to make it perform well on a really slow
CPU. The basics are: you can send a
message, you can receive a message, and
you can send and receive a message, where
you basically say "send a message, wait
until a response comes in, then continue",
which is used to wrap function calls. This
is mostly the same as in MINIX; there are
some subtle changes, which I'll get to
later. And then there are memory grants,
which only appeared in MINIX really
recently. A grant is a way for a process
to create a new name for a piece of memory
it has, and give a different process
access to it just by sharing the number.
Grants are referred to by the process ID
plus a number for that range. The grant
numbers are local per process, so to
uniquely identify one you need to say
process ID plus that number, and each one
is only granted to a single process. So
when a process creates one of these, it
can't even access it itself unless it
creates a grant for itself, which is
usually not that useful. These grants are
used to avoid having to copy all the data
into the IPC message used to implement a
system call. Yeah, these are the basic
operations on them: you can create one,
and you can copy into and from it. You
can't actually map it. A process that
receives one of these has to ask the
kernel, using a system call, "please write
this data into that area of memory that
belongs to a different process." And then
there are also indirect grants. MINIX has
these too, but also only since recently.
Usually, in a microkernel system, you
would have to copy your buffer for a read
call first to the file system server and
then again to either the hard disk driver
or the device driver that's implementing a
device file.
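A toy model of the grant semantics described in this part of the talk. Grant numbers are local to the creating process, so a grant is identified by (owner PID, grant number). Each grant names exactly one grantee, and all access goes through kernel-mediated copies. An indirect grant can only keep the rights its creator holds on the grant it points to. The class and method names are invented for illustration, do not correspond to the real ME system calls, and the model grants a whole buffer rather than a sub-range.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

READ, WRITE = 1, 2
GrantId = Tuple[int, int]  # (owner process ID, per-process grant number)

@dataclass
class Grant:
    grantee: int                        # the one process allowed to use it
    rights: int                         # READ | WRITE bitmask
    memory: Optional[bytearray] = None  # direct grant: the owner's buffer
    target: Optional[GrantId] = None    # indirect grant: grant it points to

class GrantTable:
    """Toy kernel-side grant bookkeeping (illustrative, not the ME ABI)."""

    def __init__(self) -> None:
        self.grants: Dict[GrantId, Grant] = {}
        self.next_number: Dict[int, int] = {}

    def _add(self, owner: int, grant: Grant) -> GrantId:
        number = self.next_number.get(owner, 0)  # numbering is per process
        self.next_number[owner] = number + 1
        self.grants[(owner, number)] = grant
        return (owner, number)

    def create(self, owner: int, grantee: int, buf: bytearray, rights: int) -> GrantId:
        return self._add(owner, Grant(grantee, rights, memory=buf))

    def create_indirect(self, owner: int, source: GrantId,
                        grantee: int, rights: int) -> GrantId:
        src = self.grants[source]
        if src.grantee != owner:
            raise PermissionError("source grant was not given to this process")
        # Rights are intersected: a read-only grant cannot be widened to write.
        return self._add(owner, Grant(grantee, rights & src.rights, target=source))

    def _resolve(self, caller: int, gid: GrantId, need: int) -> bytearray:
        grant = self.grants[gid]
        if grant.grantee != caller or (grant.rights & need) != need:
            raise PermissionError("access denied")
        while grant.memory is None:  # follow the chain of indirect grants
            grant = self.grants[grant.target]
        return grant.memory

    def copy_to(self, caller: int, gid: GrantId, offset: int, data: bytes) -> None:
        """Kernel-mediated copy into memory owned by another process."""
        mem = self._resolve(caller, gid, WRITE)
        if offset < 0 or offset + len(data) > len(mem):
            raise ValueError("copy outside granted range")
        mem[offset:offset + len(data)] = data

    def copy_from(self, caller: int, gid: GrantId, offset: int, size: int) -> bytes:
        mem = self._resolve(caller, gid, READ)
        if offset < 0 or offset + size > len(mem):
            raise ValueError("copy outside granted range")
        return bytes(mem[offset:offset + size])
```

For example, a client (PID 10) grants its read buffer to the file system server (PID 20), which forwards a write-only indirect grant to a driver (PID 30). The driver's `copy_to` then lands directly in the client's buffer with no intermediate copy.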
So the ME actually allows you to create a
grant pointing to a grant that was given
to you by someone else. That new grant
inherits the rights the creating process
has on the original grant, combined with
the rights it assigns to it. So if the
process has a read/write grant, it can
create a read-only or write-only grant,
but if it only has a read grant, it
obviously cannot add write rights to it
for a different process. Then there are
also some big differences from MINIX. In
MINIX you address a process by its process
ID or thread ID with a generation number
attached to it. In the ME you can actually
address IPC to a file descriptor. The
kernel doesn't actually know a lot about
file descriptors; it just implements the
basic thing where you have a list of
files, and each process has a list of file
descriptors assigning integer numbers to
those files to refer to them by. This is
used so that, as a process, you can talk
directly to a device driver without
knowing what its process ID is. You don't
send the message to the file system
server; you send it to the file
descriptor, and the kernel just magically
redirects it for you. And they moved
select into the kernel, so you can tell
the kernel: "Hey, I want to wait until the
file system server tells me that it has
data available, or until a message comes
in." This is one of the most complicated
system calls the ME offers that's used in
a normal program. You can mostly ignore it
and just think: "Hey, those arguments sort
of define a file descriptor set as a bit
field." And then there's the message that
might have been received. And there are
DMA locks, because you don't just want to
write to registers; you might actually
want the hardware to do direct memory
access. So you can tell the kernel to lock
one of these memory grants in RAM for you,
so it won't be swapped out anymore, and it
will even tell you the physical address,
so you can just load that into a register.
It's not really that complicated: lock it,
get the physical address, write it into
the register and continue. Well, that's
the most important stuff about the
operating system. The hardware is a lot
more complicated: the operating system,
once you have the code, you can just
reverse engineer and get to know.
The hardware? Well, let's just say it's a
real pain to reverse engineer a piece of
hardware together with its driver: you've
got the driver code, but you don't know
what the registers do, so you don't know
what a lot of the logic does, and you're
trying to figure out both what the logic
is and what the actual registers do.
Right. So first, you want to know which
physical address goes where. The metadata
listings I showed you actually have names
in there, but those are not in the
metadata files themselves; I annotated
them. In the files you just see the
physical address and size. But there is
one module, the bus driver. The bus driver
is a normal user process, but it
implements things like PCI configuration
space accesses, and it has a nice table in
it with names for devices. So if you just
run strings on it, you'll see these names.
When I saw this, I was pretty glad,
because at least I could make sense of
which device was being talked to in a
certain program. So the bus driver does
all these things: it manages power to
devices, it manages configuration space
access, and it manages the different kinds
of buses and IOMMUs that are on the
system. It makes sure that a normal driver
never has to know any of these details.
The driver just asks it for a device by a
number assigned at build time, and then
the bus driver says: OK, here's a range of
physical address space you can now write
to. That's a really nice abstraction, and
it also gives us a lot of information,
because the really old builds for Sunrise
Point actually have a hell of a lot of
debug strings in there as printf format
strings, not as catalog IDs. It's one of
the only pieces of ME code that does this,
so that already tells you a lot. And then
there's also the table I just talked
about, which has the actual info on the
devices and their names. I generated some
DokuWiki content from this that I use
myself, and this is part of what's in the
table. It tells you at what address the
device's PCI configuration space lives,
which gives you its bus, device and
function. It tells you on which chipset
SKUs the device is present, using a
bitfield, and it tells you its names in
different fields. It also contains the
values that are used to write the PCI base
address registers, so also the device's
normal memory ranges. And there are even
more devices. The ME has access to a lot
of stuff. A lot of it is private to it; a
lot of it is components that also exist in
the rest of the computer. And there's not
a lot of information. These are basically
all the sources that are out there,
together with conference slides published
by other people who have done research on
the ME. I didn't have time to add links to
those, but they're easy to find on Google.
I'll get to this later, but I actually
wrote an emulator for the ME, a partial
emulator to be able to run ME code and
analyze it, which obviously needs to know
a bit about the hardware. There are some
files in specific versions of Intel's
debugger package that have really
detailed info on some of the devices,
though not all of them. I wrote a tool to
parse some of those files. It's really
rough code; I published it because people
wanted to see what I was doing, and it
doesn't work out of the box. And there is
a nice talk on this by Mark Ermolov and
Maxim Goryachy. Actually, I don't know if
I'm pronouncing that correctly, but
they've done a lot of work on the ME, and
this particular talk by them is really
useful.
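Returning to the bus driver's device table: each entry records where the device's PCI configuration space lives, plus a bitfield of the chipset SKUs it is present on. If the table uses the standard PCIe ECAM layout, recovering bus/device/function is just bit slicing. In that layout the offset from the configuration base encodes the bus in bits 20-27, the device in bits 15-19 and the function in bits 12-14. Whether the ME table actually uses this encoding, and the SKU bit assignments shown, are assumptions.

```python
from typing import Dict, NamedTuple

class BDF(NamedTuple):
    bus: int
    device: int
    function: int

def ecam_to_bdf(config_addr: int, ecam_base: int = 0) -> BDF:
    """Decode an ECAM-style configuration address into bus/device/function."""
    off = config_addr - ecam_base
    return BDF((off >> 20) & 0xFF, (off >> 15) & 0x1F, (off >> 12) & 0x7)

def bdf_to_ecam(bdf: BDF, ecam_base: int = 0) -> int:
    """Inverse of ecam_to_bdf: start of the 4 KiB config space for a function."""
    return ecam_base + (bdf.bus << 20) + (bdf.device << 15) + (bdf.function << 12)

# Hypothetical SKU bit assignments for the "present on" bitfield.
SKU_BITS: Dict[str, int] = {"desktop": 0, "mobile": 1, "server": 2}

def present_on(sku_mask: int, sku: str) -> bool:
    """Is a device with this SKU bitfield present on the given chipset SKU?"""
    return bool((sku_mask >> SKU_BITS[sku]) & 1)
```

With the SKU bits filled in from real table entries, filtering the table by `present_on` gives the device list for one particular chipset.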
And then there's also something else.
There is a second ME on server chipsets:
the Innovation Engine. It's basically a
copy-paste of the ME, to provide an ME
that the vendor can write code for. I
don't think it's used a lot; I've only
been able to find HP software that
actually targets it. That software has
some more debug strings, but also not a
lot; it mostly has a table containing
register names, but they're really
abbreviated. And for a really small subset
of the devices, there is documentation out
there in a Pentium N and J series
datasheet. It seems like they compiled a
lot of it with the wrong defines, because
it doesn't actually fit into the manual
that well: it's just a section with some
20 tables that shouldn't be in there. So
this is from that talk I just referenced,
and it's an overview of the Innovation
Engine and the bus bridges and everything
in there. This isn't very precise, so
based on some of those files from System
Studio, I tried to get a better
understanding of this, which resulted in
this. This is the entire chipset. The
little DMA block in the top left corner is
what connects to your CPU, and all of the
big blocks with a lot of ports are bus
bridges or switches for a PCI
Express-like fabric. So there's a lot
going on. The highlighted area is the
Management Engine memory space, and the
rest of it is the global chipset. The
things I've highlighted in green here are
on the primary PCI bus. There's this weird
thing going on where there seem to be two
PCI hierarchies, at least logically. In
reality it's not even PCI, but on Intel
systems there's a lot of stuff that
behaves as if it is PCI: it has bus,
device and function numbers and PCI
configuration space registers. And they
have two different roots for the
configuration space. So even though the
configuration space address includes a bus
number, there are two completely different
hierarchies, each of which has its own bus
zero. That's weird, also because it
doesn't make sense when you look at how
the hardware is laid out. So
this is the stuff on the primary PCI
configuration space, which is accessed
directly by the ME through the northbridge
of the ME CPU. So that's the Minute IA
system agent. System agent is what Intel
calls a northbridge nowadays, now that
it's not a separate chip anymore. It's
basically just a northbridge, a crypto
unit that's on there, and the stuff that's
directly attached to the northbridge,
being the ROM and the RAM. The processor
itself is, as I said, derived from a 486,
but it does actually have some more modern
features: it supports CPUID, at least on
my systems; some other researchers said
theirs didn't. It's basically the core
that's in the Quark MCU, which is really
great, because it's one of the only cores
made by Intel that has public
documentation on how to do run control, so
breakpoints, accessing registers and
everything over JTAG. Intel doesn't
publish this stuff except for the Quark
MCU, because it was targeted at makers.
But they reused that core in here, which
is really useful. It even has an official
port to the OpenOCD debugger, which I have
not gotten to test, because I don't have a
JTAG probe that is compatible with Intel
voltage levels and supported by OpenOCD.
The core also has CPUID and MSRs, and some
really fancy features like branch tracing
and some stricter paging permission
enforcement. It's an IP block, and there
are some files out there (that's where
this screenshot is from) that are used by
a built-in logic analyzer Intel has on the
chipset. You can select different signals
on the chip to watch, which makes those
files a really great source of information
on how the IP blocks are laid out and what
signals are in there, because you
basically get a tree view of the IP blocks
on the chip and some of their signals.
They don't use the interrupt pins on this
core, or the legacy interrupt system at
all; they only use message-based
interrupts, where a device writes a value
into a register on the interrupt
controller instead of asserting a pin.
is the Northbridge. It's partially
documented in that datasheet I mentioned.
It does support the x86 I/O address space,
but it's never used. Everything in the ME
is in memory space, or exposed as memory
space through bridges. The Northbridge
implements access to the ROM and RAM. It
has an IOMMU, which is only used for
transactions coming from the rest of the
system, and, at least in the firmware I
looked at, it's always initialized to the
inverse of the page table, so linear
addresses can be used for DMA. It also
does PCI configuration space access to the
primary PCI bus. And it has a firewall
that allows the operating system to deny
any IP block in the chipset from sending a
completion on a bus request. So it can
actually say: "Hey, I want to read some
register, and only these devices are
allowed to send me a value for it." So
they've actually thought about security
here, which is great. Then there is one of
the most important blocks in the ME,
which is the crypto engine. It does some
of the more well-known crypto algorithms:
AES, SHA hashes, RSA, and it has a secure
key store, which I'm not gonna
[audio dropped] ... all about it in their
ME talk at Black Hat. And a lot of these
things have DMA engines, which all seem to
be the same. And there are no other DMA
engines in the ME, so these are also used
for memory-to-memory copies or DMA into
other devices. So they're used in a lot of
things. This is actually a diagram which I
don't have the vector for anymore, so
that's why the LibreOffice background is
in there. I'm sorry. So this is basically
what that crypto engine looks like when
you look at that signal tree that I was
talking about earlier. The DMA engines are
able to do both memory-to-memory copies
and copies that directly target the crypto
unit they're part of. Basically, I don't
know exactly what the control bits that go
with this mean, but when you set the
target address to zero and the right
control bits, it will copy into the buffer
that's used for the encryption. So that is
how it accelerates memory access for
crypto. And these are the actual register
offsets. They're the same for all of the
DMA engines in there, relative to the base
address of the subunit they're in. And
then there's the second PCI bus, or bus
hierarchy, which in some places is called
the PCI fixed bus. I'm actually not
entirely sure whether this is really
implemented as a PCI bus as I've drawn it
here, but this is how it behaves. It has
all the ME-private stuff that's not a part
of the normal chipset: timers for the ME,
and the implementation of the secure
enclave stuff, the firmware TPM registers.
And it has the gen device, which I've
mostly ignored because it's only used at
boot time, mostly by the actual boot ROM
for the ME. It is what the ME uses to get
the fuses Intel burns, so that's the Intel
public key, whether it's a production or
pre-production part, but it's pretty much
a black box. It's not used that much,
fortunately. There is the IPC block, which
allows the ME to talk to the sensor hub,
which is a different CPU in the chipset.
It allows it to talk to the power
management controller and all kinds of
other embedded CPUs. So it's
inter-processor communication, not
interprocess. That confused me for a bit.
And here's the host
embedded controller interface, which is
how the ME talks to the rest of the
computer when it wants the computer to
know that it's talking. It can directly
access a lot of stuff, but when it wants
to send a message to the EFI or to Windows
or Linux, it'll use this. It also has
status registers, which are really simple
things where the ME writes in a value, and
even if the ME crashes, the host can still
read the value. That's how you can see
whether the ME is running, whether it's
disabled, whether it fully booted, or
whether it crashed halfway through, but at
a point where it could still get the rest
of the computer running. There is some
coreboot code to read it. I've also
implemented some decoding for it in the
emulator, because it's useful to see what
those values mean. So then there's
something really interesting, the primary
address translation table, which is the
bus bridge that allows the ME to actually
access the PCI Express fabric of the
computer. For a lot of what this table
calls ME peripherals, which are actually
outside the ME domain, in the chipset, it
uses this to access them. It also uses it
to access the UMA, which is an area of
host RAM that's used as a swap device for
the ME, and the Trace Hub, which is the
debug port. But it also has a couple of
windows which allow the ME to access any
random area of host RAM, which is the
scariest bit: the UMA is specified by the
host, but the host DRAM window is one you
can just point anywhere. You can read or
write any value that Windows or Linux or
whatever you're running has sitting there.
So that's scary to me. And then there's
the rest of it, the rest of the devices,
which are behind the primary ATT. And
that's a lot of stuff: that's debug,
that's also the older, normal peripherals
that your PC has, but it also includes
things like the power management
controller, which actually turns on and
off all the different parts of your
computer. It controls clocks and resets,
so this is really important. There
is a concept that you'll come across when
you're reading Intel manuals or
ME-related stuff, and that's root spaces.
Besides your normal addressing information
for a PCI device, it also has a root space
number, which is basically how a single
PCI device can expose two completely
different address spaces. It's 0 for the
host and 1 for the ME. Some devices expose
the same information on both. Other ones
behave completely differently. That's
something you don't usually see. And then
there's the sideband fabric. So besides
all this stuff we just covered, which is
at least PCI-like, there is also something
completely different, the sideband fabric,
which is a completely packet-switched
network where you don't use any memory
mapping by default. You just have a
one-byte address for a device and some
other addressing fields, and you're just
sending a message saying: "Hey, I want to
read configuration or data or memory." And
there is actually a lot of information out
there on this, because Intel, it seems,
just copy-pasted their internal
specification into a patent. This is how
you address it. These are all the devices
on there, which is quite a lot. If any of
you are kernel developers and you've had
to deal with GPIO on Intel SoCs, there's
this P2SB device that you have to use.
That's what the host uses to access this.
Their documentation on it is really,
really bad.
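To make the P2SB route a bit more concrete, here is a small Python sketch of how a host-side MMIO address maps onto a sideband endpoint. It assumes the layout used by coreboot's PCR helpers on Skylake-era PCHs, where a private register sits at the P2SB BAR plus the one-byte port ID shifted left by 16 plus the register offset; the base value 0xFD000000 is only a commonly seen default, and the `SidebandMessage` fields, `Opcode` names and port numbers are illustrative assumptions, not Intel's actual packet format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

# Assumption: a commonly used default for the P2SB (SBREG) BAR; firmware sets the real value.
SBREG_BASE = 0xFD000000
# Assumption: each one-byte port ID gets a 64 KiB window (port ID in bits 23:16).
PORT_ID_SHIFT = 16


def p2sb_address(port_id: int, offset: int, base: int = SBREG_BASE) -> int:
    """Host address of a private register of sideband endpoint `port_id`."""
    if not 0 <= port_id <= 0xFF:
        raise ValueError("sideband port IDs are one byte")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in the per-port 64 KiB window")
    return base + (port_id << PORT_ID_SHIFT) + offset


def p2sb_decode(addr: int, base: int = SBREG_BASE) -> Tuple[int, int]:
    """Inverse of p2sb_address: recover (port_id, offset)."""
    rel = addr - base
    if not 0 <= rel < (1 << 24):
        raise ValueError("address is outside the P2SB window")
    return rel >> PORT_ID_SHIFT, rel & 0xFFFF


class Opcode(Enum):
    # Illustrative request kinds from the talk; real opcode encodings are not modeled.
    CONFIG_READ = "configuration read"
    DATA_READ = "data/private register read"
    MEMORY_READ = "memory read"


@dataclass(frozen=True)
class SidebandMessage:
    """A packet on the fabric: addressing fields instead of a memory map."""
    dest_port: int  # one-byte address of the target endpoint
    src_port: int   # one-byte address of the requester
    opcode: Opcode
    offset: int     # register offset inside the endpoint

    def __post_init__(self) -> None:
        for name in ("dest_port", "src_port"):
            if not 0 <= getattr(self, name) <= 0xFF:
                raise ValueError(f"{name} must be one byte")


def message_for_mmio(addr: int, requester: int) -> SidebandMessage:
    """What a host MMIO read through P2SB conceptually turns into."""
    port, offset = p2sb_decode(addr)
    return SidebandMessage(port, requester, Opcode.DATA_READ, offset)
```

For a hypothetical endpoint with port ID 0x12, register 0x10 lands at 0xFD120010, and decoding that address gives back the port and offset that a sideband message would carry.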
This was all done using static analysis.
But then I wanted to figure out how some
of the logic actually works, and it was
really complicated to play around with the
ME. There was this nice talk by Ermolov
and Goryachy, where they said: "You know,
we found an exploit that gives you code
execution, and you can get JTAG access
too." It sounds really nice. It's actually
not that easy. So, arbitrary code
execution in the BUP module: they actually
describe their exploit and how you should
use it, but they didn't describe anything
that's needed to actually implement it.
If you want to do that, you need to figure
out where the stack lives, and you need to
know where to write a payload that will
actually get you from a buffer overflow on
a stack that, by the way, uses stack
cookies, to an arbitrary write, because
you can't just overwrite the return
address. Then you need to find out what
the return pointer address is, so you can
overwrite it, and find ROP gadgets,
because the stack is not executable. And
when you've done that, you can just turn
on debug access or change to custom
firmware or whatever. Now, I had a bit of
trouble getting that running. In order to
test your payload, you have to flash it
into the system, which takes a while, and
then the system just doesn't power on if
the ME's not working, if you're crashing
it instead of getting code execution. So
it's not really viable to develop it that
way, I think. Some people did. I respect
that, because it's really, really hard.
And then I wrote this ME Loader. It's
called Loader because at first I started
out writing it as a sort of Wine thing,
where you would just mmap the right
ranges at the right place and jump into
it, execute it, patch some system calls.
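The "map the right ranges at the right place" idea can be modeled without touching real memory. Below is a minimal Python sketch, assuming the firmware is described as segments with fixed load addresses; `Region` and `AddressSpace` are hypothetical names for illustration, not code from ME Loader, which maps real memory inside a Linux process rather than copying into Python buffers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    base: int          # load address the firmware expects
    data: bytearray    # backing storage for this range

    @property
    def end(self) -> int:
        return self.base + len(self.data)


@dataclass
class AddressSpace:
    """Flat model of 'map each range at the address it was linked for'."""
    regions: List[Region] = field(default_factory=list)

    def map(self, base: int, contents: bytes) -> Region:
        new = Region(base, bytearray(contents))
        for r in self.regions:
            if new.base < r.end and r.base < new.end:
                raise ValueError(f"{base:#x} overlaps region at {r.base:#x}")
        self.regions.append(new)
        return new

    def _find(self, addr: int, size: int) -> Region:
        for r in self.regions:
            if r.base <= addr and addr + size <= r.end:
                return r
        # In a real loader this is where a hardware stub or fault handler would kick in.
        raise LookupError(f"unmapped access at {addr:#x}")

    def read(self, addr: int, size: int) -> bytes:
        r = self._find(addr, size)
        off = addr - r.base
        return bytes(r.data[off:off + size])

    def write(self, addr: int, payload: bytes) -> None:
        r = self._find(addr, len(payload))
        off = addr - r.base
        r.data[off:off + len(payload)] = payload
```

Overlapping mappings are rejected up front, and any access outside a mapped range raises, which is roughly the point where a loader like this has to start providing chipset behavior.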
But because the ME is a microkernel
system and almost every userspace program
accesses hardware directly, it ended up
implementing a good part of the chipset,
at least as stubs, or with enough logic to
get the code running. Later on I added
some features that actually allowed it to
talk to the hardware. I can also use it as
a debugger: because it's actually running
the ME firmware, or parts of it, inside a
normal Linux process, I can just use gdb
to debug it. And back in April last year,
I got that working to the point where I
could run the bootstrap process, which is
where the vulnerability is. And then you
just develop the exploit against it, which
I did. And then I made a mistake cleaning
up some old chroot environments for
closed-source software, and I nuked my
home dir. Yeah. I hadn't yet pushed
everything to GitHub, so I was stuck with
an old version, and I decided, you know,
let's refactor this and turn it into
something that might actually at some
point be published, which, by the way, I
did last summer. This is all public code,
the ME Loader thing. It's on GitHub. And
someone else beat me to replicating that
exploit by the Russian guys. Up to then,
they had only produced a proof of concept
for Apollo Lake chipsets, which was
completely different from what you had to
do for the normal ME. I was a bit
disappointed by that one, not being the
first one to actually replicate this. But
then, about a week later, I got my loader
back to the point where I could actually
get to the vulnerable code and develop
that exploit, and got it working not too
long after. And here's the great thing.
Then I went to the hackerspace and
flashed it into my laptop, the image that
I had just been using only on the
emulator. I didn't change it. I flashed
it. I was like, this is never gonna work.
It works. some laughter And I've still got
an image on a flash chip with me, because
that's what I used to actually turn on
the debugger. And then you need a debug
probe, because that USB-based debugging
stuff that's mentioned here only works
pretty late in boot, which is also why you
only really see Apollo Lake stuff: on
those chipsets you can actually use it for
the ME. And then you need this thing,
because there's a second channel that uses
the USB plug, but it's a completely
different physical layer, and you need an
adapter for it, which I don't think was
intended to be publicly available. If you
go to Intel's site and say, I want to buy
this, they say, here's the CNDA, please
sign it. But it appeared on Mouser.
And luckily I knew some people who had
done some other stuff, got a nice bounty
for it, and bought it, and they let me use
it. Thanks to them. It's expensive, but
you can buy it if it's still up there.
Haven't checked. That's the link. I'm a
bit late, so I'm gonna use the time for
questions as well. So the main thing the
ME does that you cannot replace is the
boot process. If you don't turn it on,
it's not just breaking the system; it
actually does stuff that has to be done.
So you're gonna have to use the ME anyway
if you want to boot a computer. You don't
necessarily have to use Intel's firmware,
though. The ME itself boots like a
microkernel system, so it has a process
which implements a lot of the servers that
will allow it to get to a point where it
can start those servers. This process has
very high privileges in older versions,
which is what is being used on these
chipsets. And if you exploit that, you're
still ring 3, but you can turn on the
debugger, and you can use the debugger to
become ring 0. So this is what the normal
boot process for a computer looks like.
And this is what happens when you use Boot
Guard: there's a bit of code that runs
even before the reset vector, and that's
started by microcode initialization, of
course. And this is what actually happens.
The ME loads a new firmware into the power
management controller, it then readies
some stuff in the chipset, and it tells
the power management controller, like,
please stop pulling that CPU reset pin
low, and the CPU will start.
The power management controller is a
completely independent thing, I'd say an
8051-derived microcontroller that runs a
real-time operating system from the 90s.
This is the only string in the firmware,
by the way, that's quoted there. And
depending on the chipset that you have,
it's either loaded with a patch or with a
complete binary from the ME, and it does a
lot of important stuff. There's no
documentation on it besides the ACPI
interface, which isn't really useful. The
ME has to do these things: it needs to
load the keys for the Boot Guard process,
it needs to set up clock controllers and
then tell the PMC to turn on the power to
the CPU, and it needs to configure the PCI
Express fabric and get the CPU to come out
of reset.
There's a lot of code involved in this,
so I really didn't want to do this all
statically. What I did is I added hardware
passthrough support to the emulator and
booted my laptop that way. I actually had
a video of this, but I don't have the time
to show it, which is a pity. But this is
the bring-up process from the ME running
in a Linux process, sending whatever
hardware accesses it was trying to do that
are important for boot to the debugger.
That was using an ME in real hardware that
was halted to actually do the register
accesses, and it works. I'm not going to
show this, but it actually booted the
computer reliably.
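The passthrough setup can be pictured as an address decoder that sends each register access either to a local stub or to the real hardware, recording the real accesses along the way. This is a Python sketch under assumptions: `StubDevice`, `MmioBus`, and a hardware object exposing `read32`/`write32` are hypothetical stand-ins for whatever the emulator and the debug probe actually provide.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class Device(Protocol):
    def read32(self, addr: int) -> int: ...
    def write32(self, addr: int, value: int) -> None: ...


class StubDevice:
    """Chipset stub: remembers writes and returns them (or a default) on reads."""

    def __init__(self, defaults: Optional[Dict[int, int]] = None) -> None:
        self.regs: Dict[int, int] = dict(defaults or {})

    def read32(self, addr: int) -> int:
        return self.regs.get(addr, 0)

    def write32(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFFFFFF


class MmioBus:
    """Routes 32-bit accesses by address range and logs passthrough traffic."""

    def __init__(self) -> None:
        self._ranges: List[Tuple[int, int, Device, bool]] = []
        self.trace: List[Tuple[str, int, int]] = []

    def attach(self, start: int, size: int, device: Device,
               passthrough: bool = False) -> None:
        self._ranges.append((start, start + size, device, passthrough))

    def _route(self, addr: int) -> Tuple[Device, bool]:
        for start, end, device, passthrough in self._ranges:
            if start <= addr < end:
                return device, passthrough
        raise LookupError(f"no device decodes {addr:#x}")

    def read32(self, addr: int) -> int:
        device, passthrough = self._route(addr)
        value = device.read32(addr)
        if passthrough:
            self.trace.append(("r", addr, value))  # what the real chip answered
        return value

    def write32(self, addr: int, value: int) -> None:
        device, passthrough = self._route(addr)
        if passthrough:
            self.trace.append(("w", addr, value))  # what the firmware asked for
        device.write32(addr, value)
```

In the setup described in the talk, the passthrough device would be the halted ME driven over the debugger; here any object with those two methods will do, and the recorded `trace` is the kind of log a replay script could later work from.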
Then, Boot Guard configuration is fun,
because you know how they say they fuse in
the keys? Well, yeah. But the ME loads
them from fuses and then manually loads
them into registers. So if you have code
execution on the ME before it does this,
you can just load your own values, and you
can run coreboot even on a machine that
has Boot Guard. Yeah. So I'm gonna go
through this really quickly. These, by the
way, are the registers that configure what
security model the CPU is gonna enforce
for the firmware. I'm going to release
this code after my talk. It's part of a
Python script that I wrote that uses the
debugger to start the CPU without ME
firmware. I traced everything the ME
firmware did, and I now have a Python
script that can just start a computer
without Intel's code. If you translate
this into a rough sequence, or even into a
binary for the ME, you can start a
computer without the ME itself, or at
least without it running the operating
system.
applause
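The replay idea can be sketched as a list of register writes and status polls played back in order against some target. This is not the speaker's script; the `Write`/`Poll` trace format and the `Target` interface below are assumptions made for illustration, and no real register addresses or values are included.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol, Union


class Target(Protocol):
    def read32(self, addr: int) -> int: ...
    def write32(self, addr: int, value: int) -> None: ...


@dataclass(frozen=True)
class Write:
    addr: int
    value: int


@dataclass(frozen=True)
class Poll:
    """Wait until (register & mask) == expected, e.g. a 'ready' bit."""
    addr: int
    mask: int
    expected: int
    timeout_s: float = 1.0


Step = Union[Write, Poll]


def replay(trace: Iterable[Step], target: Target,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Play a recorded bring-up trace in order; returns the number of steps run."""
    count = 0
    for step in trace:
        if isinstance(step, Write):
            target.write32(step.addr, step.value)
        elif isinstance(step, Poll):
            deadline = time.monotonic() + step.timeout_s
            while (target.read32(step.addr) & step.mask) != step.expected:
                if time.monotonic() > deadline:
                    raise TimeoutError(f"register {step.addr:#x} never matched")
                sleep(0.001)
        else:
            raise TypeError(f"unknown trace step: {step!r}")
        count += 1
    return count
```

Polls matter because a raw write log is not enough on real hardware: some steps only make sense once a previous block reports that it is ready, so the trace has to wait for that instead of racing ahead.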
So, yeah, future goals. I really do want
to share this, because if there is a way
to escalate to ring 0 through a ROP chain,
then you could just start your own kernel
in the ME and have custom firmware, at
least from the vulnerability on. But you
could also build a modchip that uses the
debugger interface to load new firmware.
There's lots of stuff that still needs to
be discovered, but I'm gonna hang out at
the Open Source Firmware Village later, at
least part of the week here, because I
really want to get started on open source
ME firmware using this. Right. And there
are a lot of people who've played a role
in getting me to this point. I'd also like
to thank the guy from the Hague
hackerspace, BinoAlpha, who basically
allowed me to use his laptop to prepare
the demo, which I ended up not being able
to show, but... Right. I was gonna ask for
questions, but I don't think there's
really any time for any more.
Herald: Peter, thank you so much. Applause
Unfortunately, we don't have any more time
left.
Peter: I'll be around. I'll be around.
Herald: I think it's very, very
interesting, and I hope that your talk
will inspire many people to keep looking
into how the management engine works and
hopefully uncover even more stuff. I think
we have time for just one single question.
I don't know, do we? How about one from
the Internet. Thank you so much.
Signal Angel: OK. First off, I have to
tell you: your shirt is nice. Chat wanted
me to say this. And they asked how
reliable this exploit is, and does it work
on every boot?
Peter: Right, yeah. That's actually
something really important that I forgot
to mention. So they patched the
vulnerability, but they didn't provide
downgrade protection. If you flash a
vulnerable image with an exploit in it,
it'll just boot every time on these chips,
so sixth- or seventh-generation chips, if
you put in that image, and it will
reliably turn on the debugger every time
you turn on the computer. applause
Herald: Thank you so much for the
question. And Peter Bosch, thank you so
much. Please give him a great round of
applause.
applause
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!