-
36C3 preroll music
-
Herald: The next talk is an Intel
Management Engine deep dive:
-
understanding the ME at the OS and
hardware level. It is by Peter Bosch.
-
Please welcome him with a great round of
applause!
-
Applause
-
Peter Bosch: Right. So, everybody hear me?
Nice. OK. So welcome. Well, this is me.
-
I'm a student at Leiden University. Yeah,
I've always been really interested in how
-
stuff works. And when I got a new laptop,
I was like, you know, how does this thing
-
really boot? I knew everything from reset
vector onwards. I wanted to know what
-
happened before it. So first I started
looking at the boot guard ACM. While
-
looking through it, I realized that not
everything was as it was supposed to be.
-
That led to a later part in the boot
process being vulnerable, which ended up
-
being discovered by me. And I found out
here last year that I wasn't the only one
-
to find it. Trammell Hudson also found it,
and we reported it together, presented it
-
at Hack in the Box. And then at the same
time, I was already also looking at the
-
management engine. Well, there had been a
lot of research done on that before. The
-
public info was mostly on the file system
and on specific vulnerabilities, which
-
still made it pretty hard to get started
on reverse-engineering it. So that's why I
-
thought it might be useful for me to
present this work here. It's basically
-
broken up into three parts. The first bit
is just a quick introduction into the
-
operating system it runs. So if you want
to work on this yourself, you're more
-
easily able to understand what's in front
of you in your disassembler. And then
-
after that, I'll go over its role in the
boot process and then also how this
-
information can be used to start
developing a new firmware for it or do
-
more security research on it. So first of
all, what exactly is the management
-
engine? There's been a lot of fuss about
it being a backdoor and everything. In
-
reality, whether it is or not depends on the
software that it runs. It's basically a
-
processor with its own RAM and its own I/O
and MMUs and everything, sitting inside
-
your southbridge. It's not in the CPU,
it's in the southbridge. So when I say this
-
is gonna be about the sixth and seventh
generation of Intel chips, I mean, mostly
-
motherboards from those generations. If
you run a newer CPU on it, it will also
-
work for that. So yeah. Bit more detail.
The CPU it runs on is based on the 80486, which,
-
you know, is funny. It's quite an old CPU,
and it's still being used in almost
-
every computer nowadays. So it has a
little bit of its own RAM. It has quite a
-
bit of built in ROM, has a hardware
accelerated cryptographic unit and it has
-
fuses, which are write-once memory used
to store security settings and keys and
-
everything. Some of the more scary
features it has: Bus bridges to all of the
-
buses inside the southbridge, it can
access the RAM on the CPU and it can
-
access the network, which makes it really
quite dangerous if there is a
-
vulnerability or if it runs anything
nefarious. Its tasks nowadays include
-
starting the computer as well as adding
management features. This is mostly used
-
in servers, where it can serve as a board
management controller, doing things like remote
-
keyboard and video. It also does security:
Boot Guard, which is the signing of the
-
firmware and verification of signatures.
It implements a firmware TPM and there is
-
also an SDK to use it as a general purpose
secure enclave. So on the software side of
-
it, it runs a custom operating system,
parts of which are taken from MINIX, the
-
teaching operating system by Andrew
Tanenbaum. It's a micro kernel operating
-
system. It runs binaries that are in a
completely custom format. It's really
-
quite a high-level system, actually. If you
look at it in terms of the operating
-
system it runs, it's mostly like Unix,
which makes it kind of familiar, but it
-
also has large custom parts. Like I said
before, in this talk I'm going to be
-
speaking about sixth and seventh
generation Intel Core chipsets, so that's
-
Sunrise Point; Lewisburg, which is the
server version of this; and also the laptop
-
systems on a chip, which are just called Intel
Core low power. They also include the
-
chipset, as a separate die, so it also
applies to them. In fact, I've been
-
testing most of the stuff I'm going to
tell you about on the laptop that's
-
sitting right here, which is a Lenovo
T460. The version of the firmware I've been
-
looking at is 11.0.0.1205. Right. So I do
need to put this up there. I'm not a part
-
of Intel, nor have I signed any contracts
with them. I've found everything in ways
-
that you could also do. I didn't have any
leaked NDA stuff or anything that you
-
couldn't get your hands on. It's also a
very wide subject area, so there might be
-
some mistakes here or there, but generally
it should be right. Well, if you want to
-
get started working on an ME firmware,
want to reverse-engineer it or modify it
-
in some way, first you've got to deal with
the image file. You've got your SPI flash.
-
That's where most of its firmware lives, in
the same flash chip as your BIOS. So
-
you've got that image. And then how do you
get the code out? Well, there's tools for
-
that. It's already been extensively
documented by other people.
-
And you can basically just download a tool
and run it against it. Which makes this
-
really easy. This is also the reason why
there hasn't been a lot of research done
-
yet: before these tools were around, you
couldn't get to all of the code. The
-
kernel was compressed using Huffman
tables, which were stored in ROM. You
-
couldn't get to the ROM without getting
code execution on the thing. So there was
-
basically no way of getting access to the
kernel code. And I think also to the syslib
-
library. But that's not a problem anymore.
You can just download a tool and unpack
-
it. Also, the Intel tool to generate
firmware images, which you can find in
-
some open directories on the internet, has
Qt resources, XML-files which basically have the
-
description for all of the file formats
used by these ME versions, including names
-
and comments to go with those structure
definitions. So that's really useful. So
-
we look at one of these images. It has a
couple of partitions, some of them overlap
-
and some of them are storage, some are
code. So there are the main partitions,
-
FTPR and NFTP, which contain the programs
it runs. There's MFS, which is the read-write
-
file system it uses for persistent
storage. And then there is a log-to-flash
-
option, and the possibility to embed a token
that will tell the system to unlock all
-
debug access which has to be signed by
Intel so it's not really of any use to us.
-
And then there is something interesting,
ROM bypass. Like I said, you can't get
-
access to the ROM without running code on
it. And ROM is mask ROM. So it's internal
-
to the chip, but Intel has to develop new
ROM code and has to test it without
-
respinning the die every time. So they
have a possibility on an unlocked
-
preproduction chipset to completely bypass
the internal ROM and load even the early
-
boot code from the flash chip. Some of
these images have leaked and you can use
-
them to get a look at the ROM code, even
without being able to dump it. That's
-
going to be really useful later on. So
then you've got these code partitions and
-
they contain a whole lot of files. So
there are the binaries themselves, which
-
don't have any extension. There are the
metadata files. So the binary format they
-
use has no headers, nothing included. And
all of that data is in the metadata file.
-
And when you use the unME11 tool, it'll
actually convert those to text
-
files for you so you can just get started
without really understanding how they
-
work. Yes. So the metadata. It's a
type-length-value structure, which contains a
-
whole lot of information the operating
system needs. It has the info on the
-
module, whether it's data or code, where
it should be loaded, what the privileges
-
of the process should be, a SHA
checksum for validating it and also some
-
higher level stuff such as device file
definitions if it's a device driver or any
-
other kind of server. I've actually
written some code that uses this, that's
-
on GitHub, so if you want a closer look at
it, some of the slides have a link to a
-
GitHub file in there which contains the
full definitions. Right. So all the code
-
on the ME is signed and verified by Intel.
So you can't just go and put in a new
-
binary and say, hey, let's run this. The
way they do this is in Intel's
-
manufacture-time fuses, they have a hash
of the public key that they use to sign
-
it. And then on each flash partition,
there is a manifest which is signed by the
-
key and it contains the SHA hashes for all
the metadata files, which then contain a
-
SHA hash for the code files. There don't
seem to be any major problems in verifying
-
this, so it's useful to know, but
you're not really gonna use this. And then
-
the modules themselves, as I've said,
they're flat binaries. Mostly. The
-
metadata contains all the info the kernel
uses to reconstruct the actual program
-
image in memory. And a curious thing here
is that the actual base address for all
-
the modules, for all programs, is the same
across an image. So if you have a
-
different version, it's going to be
different. But if you have two programs
-
from the same firmware it's gonna be
loaded at the same virtual address. Right.
-
So when you want to look at it, you're
gonna load it in some disassembler, like
-
for example IDA, and you'll see this, it
disassembles fine, but it's gonna
-
reference all kinds of memory that you
don't have access to. So usually you'd
-
think maybe I've loaded it at the wrong address,
or am I missing some library? Well,
-
here you've loaded it correctly if you use
the address from the metadata file.
-
But you are in fact missing a lot of
memory segments. And let's just take a
-
look at each of these. It's calling into
something, which is code. It's pushing a pointer
-
there, which is data. And what's that? So
it has shared libraries, even though it's
-
flat binaries. It actually does use shared
libraries because you only have 1.5
-
megabytes of RAM. You don't want to
link your C library into everything and
-
waste what little memory you have. So
there is the main system library which is
-
like libc on a Linux system. It's in a
flash partition, so you can actually just
-
load it and take a look at it easily and
it starts out with a jump table. So
-
there's no symbols in the metadata file or
anything. It doesn't do dynamic linking.
-
It loads the pages for the shared library
at a fixed address, which is also in the
-
shared library's metadata. And then it's
just there in the processor's memory and
-
it's gonna jump there if it needs a
function. And the functions themselves are
-
just using the normal System V, x86
calling conventions. So it's pretty easy
-
to look at that using your normal tools.
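To make the jump table idea a bit more concrete, here is a minimal Python sketch of how a script could resolve such a table into a list of function addresses. It assumes, which the talk does not state, that every table slot is a 5-byte x86 `jmp rel32` instruction (opcode `0xE9`). The base address and the image bytes are made up for the example; the real load address has to come from the library's metadata.

```python
import struct

JMP_REL32 = 0xE9   # x86 opcode for a near jump with a 32-bit relative offset
SLOT_SIZE = 5      # assumed slot layout: 1 opcode byte + 4 offset bytes


def parse_jump_table(image: bytes, base: int, max_slots: int = 1024) -> list[int]:
    """Return the absolute target of every leading jmp rel32 slot in image."""
    targets = []
    for slot in range(max_slots):
        off = slot * SLOT_SIZE
        if off + SLOT_SIZE > len(image) or image[off] != JMP_REL32:
            break  # the first thing that is not a full jump slot ends the table
        (rel,) = struct.unpack_from("<i", image, off + 1)
        # rel32 is relative to the address of the instruction after the jump
        targets.append((base + off + SLOT_SIZE + rel) & 0xFFFFFFFF)
    return targets


# Made-up library image: two jump slots followed by a single NOP byte.
HYPOTHETICAL_BASE = 0x10000
image = (bytes([JMP_REL32]) + struct.pack("<i", 0x100)
         + bytes([JMP_REL32]) + struct.pack("<i", 0x200)
         + b"\x90")
table = parse_jump_table(image, HYPOTHETICAL_BASE)
```

The slot number then acts as the stable name of a function: once a slot has been identified as, say, `open`, that label can be applied to every call that goes through it.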
There's no weird register argument passing
-
going on here. So, right. Now, shared
libraries. There's two of them. And this
-
is where it gets annoying. The system
library, you've got access to that so you
-
can just take your time and go through it
and try to figure out, you know, oh, hey,
-
is this open or is this read or what's
this function doing? But then there's also
-
a second, really large library, which
is in ROM. They have all the C library
-
functions and some of their custom helper
routines that don't interact with the
-
kernel directly, such as string
functions. They live in ROM. So when
-
you've got your code and this is basically
where I was when I was here last year,
-
you're looking through it and you're
seeing calls to a function you don't have
-
the code for all over the place. And you
have to figure out by its signature what
-
is it doing. And that works for some of
the functions and it's really difficult
-
for other ones. That really had me stopped
for a while. Then I managed to find one of
-
these ROM bypass images and I had the code
for a very early development build of the
-
ROM. This is where I got lucky. So the
actual entry point addresses are fixed
-
across an entire chipset family. So if you
have an image for the server version of
-
like 100 series chipset or for client
version or for a desktop or laptop
-
version, it's all gonna be the same ROM
addresses. So even though the code might
-
be different, you'll have the jump table,
which means the addresses can stay fixed.
-
So this only needs to be done once. And in
fact when I upload my slides later, there
-
is a slide in there at the end that has
the addresses for the most used functions.
-
So you're not going to have to repeat that
work, at least not for this chipset. So if
-
you want to look at a simple module,
you've loaded it, now you've applied the
-
things I just said, and you still don't
have the data sections. I don't know
-
what that function there is doing, but
it's not very important. It actually
-
returns a value, I think, that's not used
anywhere, but it must have a purpose
-
because it's there. Right. So then you
look at the entry point and this is a lot
-
of stuff. And the main thing that matters
here is on the right half of the screen,
-
there is a listing from a MINIX repository
and on the left half there is a
-
disassembly from an ME module. So it's
mostly the same. There is one key
-
difference, though. The ME module actually
has a little bit of code that runs before
-
its C library startup function. And that
function actually does all the ME specific
-
initialization, does a lot of stuff
related to how C library data is kept
-
because there is also no data segments for
the C library being allocated by the
-
kernel. So each process actually reserves
a part of its own memory and tells the C
-
library, like, any global variables you
can store in there. But when you look at
-
that function, one of the most important
things that it calls is this function.
-
It's very simple, it just copies a bunch
of RAM. So they don't have support for
-
initialized data sections. It's a flat
binary. What they do is they actually
-
use the .bss segment, the zeroed segment
at the end of the address space, and copy
-
over a bunch of data in the program. The
program itself is not aware of this. It's
-
really in the initialization code and in
the linker script. So this is also something
-
that's very important, because at that
address in the data section you're also
-
going to need to load
the last bit of the binary.
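In a loader script for your disassembler, that step could look roughly like the following Python sketch. All parameter names and numbers here are hypothetical: the real load address, data address and sizes have to be taken from the module's metadata and from its startup code.

```python
def build_image(binary: bytes, load_base: int, data_base: int,
                init_size: int, data_size: int) -> dict[int, bytes]:
    """Lay out a flat module the way its startup code would leave it in memory.

    The last init_size bytes of the file are treated as the initial contents
    of the data area; the remainder of that area stays zeroed, like .bss.
    """
    if not 0 <= init_size <= min(len(binary), data_size):
        raise ValueError("init data must fit in both the file and the data area")
    init = binary[len(binary) - init_size:]        # tail of the flat binary
    data = init + bytes(data_size - init_size)     # zero-fill the rest
    return {load_base: binary, data_base: data}


# Made-up module: 12 bytes of "code" followed by one initialized 32-bit value.
binary = b"\x90" * 12 + b"\x2a\x00\x00\x00"
segments = build_image(binary, load_base=0x1000, data_base=0x8000,
                       init_size=4, data_size=16)
```

Each key in the returned dictionary is an address to create a segment at, so references into the data area resolve to the constants instead of to unmapped memory.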
-
Otherwise you're missing constants or at
least initialization values. Right. Then
-
there is the full memory map of the
processes themselves. It's a flat 32 bit
-
address space. It's got everything you
expect in there. It's got a stack and a
-
heap and everything. There's a little bit
of heap allocated right on initialization.
-
This is basically how you derive
the address space layout from the
-
metadata, especially the data
segment. And for the stack itself,
-
the address varies a lot
because of the number of threads that are
-
in use or the size of data sections. And
also those stack guards, they're not
-
really stack guards. There is also
metadata for each thread in there. But
-
that's nothing that's relevant to the
process itself, only to the kernel. And
-
well, if you then skip forward a bit and
you've done all these - you look at your
-
simple driver like this. This is taken
from a driver used to talk to the CPU,
-
like, OK. So when I say CPU or host, by
the way, I mean the CPU, like your big
-
SkyLake, or KabyLake, or CoffeeLake,
whatever your big CPU that runs your own
-
operating system. Right. So this is used
to send messages there. But if you look
-
at what's going on here, OK - think I had
a problem with the animation here - it
-
sets up some stuff and then it calls a
library function that's in the main syslib
-
library, which actually has a main loop
for the program. That's because Intel was
-
smart and they added a nice framework for
device driver implementing programs,
-
because it's a micro kernel, so device
drivers are just usual programs, calling
-
specific APIs. Then there's normal POSIX
file I/O. No standard I/O, but it has all
-
the normal open, and read, and ioctl and
everything functions. And then there's
-
more initialization for the srv library.
And this is basically what all the simple
-
drivers look like in it. And then there's
this. Because they're so low on memory,
-
they don't actually use standard I/O, or
even printf itself to do most of the
-
debugging. It uses a thing that's called
"sven", I'll touch on that later. So there
-
is the familiar APIs that I talked about.
It even has POSIX threads, or at least a
-
subset of it, and there is all the
functions that you'd expect to find on
-
some generic Unix machine. So that
shouldn't be too much of a problem to deal
-
with, but then there's also their own
tracing solution, sven. That's what Intel
-
calls it. The name is in all the development
tools that you can download
-
from their site, and basically, they don't
include format strings for a lot of the
-
stuff. They just have a 32-bit identifier
that is sent over debug port, and it
-
refers to a format string in a dictionary
that you don't have. There is one of the
-
dictionaries for a server chip that's
floating around the internet, but even
-
that is incomplete. And the normal non-NDA
version of the Intel developer tools has
-
some 50 format strings for really common
status messages it might output, but yeah,
-
like, if you see these functions, just
realize it's doing some debug print. It
-
might be dumping some state or just
telling you it's gonna do something else.
-
No important logic actually happens
in here. Right. So then for device files.
-
They're actually defined in a manifest.
When the kernel loads a program, and that
-
program wants to expose some kind of
interface to other programs, its manifest
-
will contain, or rather its metadata file will
contain, a special file producer entry, and
-
that says, you know, you have these device
files, with a name, and an access mode and
-
the user, and group ID, and everything,
and the minor numbers, and the kernel
-
sends this to the- or not kernel- the
program loader sends this to the virtual
-
file system server and it automatically
gets a device file, pointing to the right
-
major and minor number. And then there's
also a library, as I said, to provide a
-
framework for a driver. And that looks
like this. It's really easy to use. If you
-
were an ME developer you just write some
callbacks for open, and close, and
-
everything, and it automatically calls
them for you, when a message comes in,
-
telling you that that happened, which also
makes it really easy to reverse engineer,
-
'cause if you look at a driver, it just
loads some callbacks, and you can know, by
-
their offset in a structure, what actual
call they're implementing. Right, so then
-
there is one of the more weird things
that's going on here: How the actual
-
userland programs get access to memory-mapped
registers. There's a lot of this going on.
-
Calls to a couple of functions that have
some magic arguments. The second one you
-
can easily tell is the offset, because it
increases in very nice power-of-two
-
steps, so it's probably the register
offsets, and then what comes after it
-
looks like a value. And then the first bit
seems to be a magic number. Well, it's
-
not. There is also an extension in the
metadata, saying these are the memory
-
mapped I/O ranges, and those ranges
each list a physical base address,
-
and a size, and permissions. Then
the index in that list does not directly
-
correspond to the magic value. To get the magic
value you actually need to do a little
-
computation on that index, and then you can
access the range through those functions. The
-
computation itself might be familiar.
Yeah, so these are the functions. The
-
value is a segment selector. So they
actually don't use paging for
-
inter-process isolation, they use segments, like
x86 protected mode segments. And for each
-
memory mapped I/O range there is a
separate segment, and you manually specify
-
that, which is just weird to me, like, why
would you use x86 segmentation on a modern
-
system? Minix does it, but, yeah, to
extend that even to this? Luckily, the normal
-
address space is flat, like, to the
process, not to the kernel. Right, so now
-
we can access memory mapped I/O. That's
all the, like the really high level stuff.
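For reference, this is how a standard x86 protected mode segment selector breaks down, as a small Python sketch. The bit layout is the architectural one: a 13-bit descriptor table index, a table indicator bit and a 2-bit requested privilege level. How the ME firmware maps the position of an MMIO range in the metadata list onto a descriptor table slot is not covered here, and the numbers below are just example values.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int      # slot in the descriptor table
    use_ldt: bool   # table indicator: False = GDT, True = LDT
    rpl: int        # requested privilege level, 0..3


def decode_selector(value: int) -> Selector:
    """Split a 16-bit x86 segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("selectors are 16 bits wide")
    return Selector(index=value >> 3, use_ldt=bool(value & 0x4), rpl=value & 0x3)


def encode_selector(index: int, use_ldt: bool, rpl: int) -> int:
    """Build a selector from a table index, table indicator and privilege level."""
    if not (0 <= index < 8192 and 0 <= rpl <= 3):
        raise ValueError("index or RPL out of range")
    return (index << 3) | (int(use_ldt) << 2) | rpl


example = decode_selector(0x1F)  # arbitrary example value, not taken from firmware
```

Decoding the "magic" first argument of those register access functions this way gives you a table index that you can then try to match against the MMIO range list.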
-
So what's going on under there? It's got
all the basic microkernel stuff, so
-
message passing, and then some
optimizations to actually make it perform
-
well on a really slow CPU. The basics are,
you can send a message, you can receive a
-
message, and you can send and receive a
message, where you basically say "Send a
-
message, wait till a response comes in,
then continue", which is used to wrap
-
function calls. This is mostly the same as
in Minix. There's some subtle changes,
-
which I'll get to later. And then memory
grants are something that only appeared in
-
Minix really recently. It's a way for a
process to basically create a new name for
-
a piece of memory it has, and give a
different process access to it, just by
-
sharing the number. These are referred to
by the process ID and a number for that
-
range. So the numbers are actually
local per process, so to uniquely identify
-
one you need to say process ID plus that
number, and they're only granted to a
-
single process. So when a process creates
one of these, it can't even access it
-
itself, unless it creates a grant for
itself, which is not really that useful,
-
usually. These grants are used to prevent
having to copy over all the data inside
-
the IPC message used to implement a system
call. Yeah, these are the basic operations
-
on it. You can create one, you can copy
into and from it. So, you can't actually
-
map it. A process that receives one of
these has to say to the kernel, using a
-
system call, "please write this data into
that area of memory that belongs to a
-
different process." And then there's also
indirect grants, because, you know, in
-
Minix they do have this, but also only
recently, and usually if you have a
-
microkernel system, you would have to copy
your buffer for a read call first to the
-
file system server and then back to, like,
either the hard disk driver, or the device
-
driver that's implementing a device file.
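Before getting to the indirect case, the basic grant operations described above (create one, copy into it, copy from it) can be modelled with a toy Python sketch. This is a simplified model and not the ME kernel's actual interface: the class and method names are invented, permissions are reduced to a single writable flag, and it assumes grant numbers are counted per owning process.

```python
class GrantError(Exception):
    pass


class ToyKernel:
    """Toy model: a grant names a slice of its owner's memory for one grantee."""

    def __init__(self):
        self.memory = {}       # pid -> bytearray
        self.grants = {}       # (owner pid, number) -> (grantee, offset, size, writable)
        self.next_number = {}  # pid -> next free grant number

    def add_process(self, pid, size):
        self.memory[pid] = bytearray(size)
        self.next_number[pid] = 0

    def create_grant(self, owner, grantee, offset, size, writable):
        if offset < 0 or size < 0 or offset + size > len(self.memory[owner]):
            raise GrantError("range outside owner's memory")
        number = self.next_number[owner]
        self.next_number[owner] += 1
        self.grants[(owner, number)] = (grantee, offset, size, writable)
        return number

    def _lookup(self, caller, owner, number, length, at):
        grantee, offset, size, writable = self.grants[(owner, number)]
        if caller != grantee:
            raise GrantError("grant belongs to a different process")
        if at < 0 or length < 0 or at + length > size:
            raise GrantError("access outside the granted range")
        return offset + at, writable

    def copy_from(self, caller, owner, number, length, at=0):
        start, _ = self._lookup(caller, owner, number, length, at)
        return bytes(self.memory[owner][start:start + length])

    def copy_into(self, caller, owner, number, data, at=0):
        start, writable = self._lookup(caller, owner, number, len(data), at)
        if not writable:
            raise GrantError("grant is read-only")
        self.memory[owner][start:start + len(data)] = data


k = ToyKernel()
k.add_process(1, 64)  # e.g. a client
k.add_process(2, 64)  # e.g. a driver
g = k.create_grant(owner=1, grantee=2, offset=16, size=8, writable=True)
k.copy_into(2, 1, g, b"hello")  # the driver writes into the client's buffer
```

Note that the grantee never gets a pointer: every access goes through a copy call that the kernel checks against the grant, and the owner itself is rejected unless it is also the grantee.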
So the ME actually allows you to create a
-
grant, pointing to a grant, that was given
to you by someone else. And then that
-
grant will inherit the privileges of the
process that creates it, combined with
-
those that it assigns to it. So if the
process has a read/write grant it can
-
create a read-only or write-only grant,
but it cannot, if it only has a read
-
grant, it cannot add write rights to it
for a different process, obviously. So
-
then there is also some big differences
from MINIX. In MINIX you address a process
-
by its process ID or thread ID with a
generation number attached to it. In the
-
ME you can actually address IPC to a file
descriptor. Kernel doesn't actually know a
-
lot about file descriptors, it just
implements the basic thing where you have
-
a list of files and each process has a
list of file descriptors assigning integer
-
numbers to those files to refer to them
by. And this is used so that, as a
-
process, you can actually directly talk to
a device driver without knowing what its
-
process ID is. So you don't send it to the
file system server, you send it to the
-
file descriptor and the kernel just
magically directs it for you. And they
-
moved select into the kernel so you can
tell the kernel: "Hey, I want to wait till
-
the file system server tells me that it
has data available or till a message comes
-
in." This is one of the most complicated
system calls the ME offers that's used in
-
a normal program. You can mostly ignore it
and just look like: "Hey, those arguments
-
sort of define a file descriptor set as a
bit field." And then there's the message
-
that might have been received and there's
DMA locks because you don't just want to
-
write to registers. You actually might
want to do the direct memory access from
-
hardware, so you can actually tell the
kernel to lock one of these memory grants
-
in RAM for you, so it won't be swapped out
anymore. And yeah, it will even tell you
-
the physical address so you can just load
that into a register and it's not really
-
that complicated. Just lock it, get a
physical address, write it into the register
-
and continue. Well, that's the most
important stuff about the operating
-
system. The hardware itself is a lot more
complicated because the operating system,
-
once you have the code, you can just
reverse engineer it and get to know it.
-
The hardware. Well, let's just say it's a
real pain to have to reverse engineer a
-
piece of hardware together with its
driver. Like if you've got the driver
-
code, but you don't know what the
registers do, you don't know what a lot
-
of the logic does. And you're trying to both
figure out what the logic is and what the
-
actual registers do. Right. So first you
want to know which physical address goes
-
where? The metadata listings I showed you
actually have names in there. Those are
-
not in the metadata files themselves, I
annotated those. So you just see the
-
physical address and size. But there is
one module, the bus driver module, and the
-
bus driver is a normal user process, but it
implements stuff like PCI configuration
-
space accesses and those things. And it
has a nice table in it with names for
-
devices. So if you just run strings on it,
you'll see these things. When I saw this,
-
I was pretty glad because at least I
could make sense of what device was being
-
talked to in a certain program. So
the bus driver does all these things. It
-
manages power getting to devices, it
manages configuration space access, it
-
manages the different kinds of buses and
IOMMUs that are on the system. And it makes
-
sure that the normal driver never has to
know any of these details. It just asks
-
it for a device by a number assigned to it
at build time. And then the bus driver
-
says, OK, here's a range of physical
address space you can now write to. So
-
that's a really nice abstraction and also
gives us a lot of information because the
-
really old builds for Sunrise Point
actually have a hell of a lot of debug
-
strings in there as printf format strings,
not as catalog IDs. It's
-
one of the only pieces of code for the ME
that does this, so that already tells you
-
a lot. And then there's also the table
that I just talked about that has the
-
actual info on the devices and names. So I
generated some DokuWiki content from this
-
that I use myself and this is what's in
the table, part of it. So it tells you
-
what address the PCI configuration space lives
at. It tells you the bus/device/function
-
for it. It tells you
on what chipset SKU they're present using
-
a bitfield. And it tells you their names
in different fields. It also contains the
-
values that are used to write the base
address registers for PCI. So also their
-
normal memory ranges. And there's even
more devices. So the ME has access to a
-
lot of stuff. A lot of it is private to
it. A lot of it is components that also
-
exist in the rest of the computer. And
there's not a lot of information. These
-
are basically all the things that
are out there together with conference
-
slides published by other people who have
done research on the ME. I didn't have
-
time to add links to those, but they're
easy to find on Google. I'll get later to
-
this, I actually wrote an emulator for the
ME, a partial emulator to be able to run
-
ME code and analyze it, which obviously
needs to know a bit about the hardware, so
-
you can look at that. There are some
files in Intel's debugger package,
-
specific versions of that that have really
detailed info on some of the devices, also
-
not all of it. And I wrote some tool to
parse some of the files. It's really rough
-
code. I published it because people wanted
to see what I was doing. It doesn't work
-
out of the box. And there is a nice talk
on this by Mark Ermolov and Maxim
-
Goryachy. Actually I don't know if I'm
pronouncing that correctly, but they've
-
done a lot of work on the ME and this
particular talk by them is really useful.
-
And then there's also something else.
There is a second ME on server chipsets,
-
the Innovation Engine. It's basically a
copy-paste of the ME to provide an ME that
-
the vendor can write code for. I don't think
it's used a lot. I've only been able to
-
find HP software that actually targets it
and that has some more debug strings, but
-
also not a lot, it mostly has a table
containing register names, but they're
-
really abbreviated and for a really small
subset of the devices, there is
-
documentation out there in a Pentium N and
J series datasheet. It seems like they
-
compiled their LaTeX code or whatever
with the wrong defines because it doesn't
-
actually fit into the manual that well,
it's just a section that has like some 20
-
tables that shouldn't be in there. So this
is from that talk I just referenced and
-
it's an overview of the innovation engine
and the bus bridges and everything in
-
there. This isn't very precise. So based
on some of those files from System Studio,
-
I tried to get a better understanding of
this, which is this. This is the entire
-
chipset. The little DMA block in the top
left corner is what connects to your CPU.
-
And all of the big blocks with a lot of
ports are bus bridges or switches for
-
a PCI Express-like fabric. So there's a lot
going on. The highlighted area is the
-
management engine memory space and the
rest of it is like the global chipset. The
-
things I've highlighted in green here are
on the primary PCI bus. So there's this
-
weird thing going on where there seems to
be two PCI hierarchies, at least
-
logically. So in reality it's not even
PCI, but on Intel systems, there's a lot
-
of stuff that behaves as if it is PCI. So
it has bus/device/function
-
numbers and PCI configuration space registers,
and they have two different roots for the
-
configuration space. So even though the
configuration space address includes a bus
-
number, they have two completely different
spaces, each of which has its
-
own bus zero. So that's weird, also
because they don't make sense when you
-
look at how the hardware is laid out. So
this is stuff that's on the primary PCI
-
configuration space that's directly
accessed by the ME, by the northbridge on
-
the ME CPU. So that's the Minute IA
system agent. System agent is what Intel
-
calls a Northbridge nowadays, now that
it's not a separate chip anymore. It's
-
basically just a Northbridge and a crypto
unit that's on there and the stuff that's
-
directly attached to Northbridge being the
ROM and the RAM. So the processor itself
-
is, as I said, derived from a 486, but it
does actually have some more modern
-
features: it does CPUID, at least on
my systems. Some other researchers said
-
theirs didn't. It's basically the core
that's in the Quark MCU, which is really
-
great because it's one of the only cores
made by Intel that has public
-
documentation on how to do run control. So
breakpoints and accessing registers and
-
everything over JTAG. Intel doesn't
publish this stuff except for the Quark
-
MCU, because it was targeted at makers.
But they reused that in here, which is
-
really useful. It even has an official
port to the OpenOCD debugger, which I have
-
not gotten to test because I don't have a
JTAG probe which is compatible with Intel
-
voltage levels and supported by OpenOCD.
And it also has, like I said, CPUID and MSRs.
-
It has some really fancy features like
branch tracing and some more strict paging
-
permission enforcement stuff. They don't
use the interrupt pins on this. So it's an
-
IP block, and there are some files out
there, that's where this screenshot
-
is from, that are actually used by a
built in logic analyzer Intel has on the
-
chipset and you can select different
signals on the chip to watch, which is
-
a really great source of information on
how the IP blocks are laid out and what
-
signals are in there, because you
basically get a tree view of the IP blocks
-
in the chip and some of their signals. They
don't use the legacy interrupt system,
-
they only use message based interrupts, by
which a device writes a value into a
-
register on the interrupt controller
instead of asserting a pin. And then there
-
is the Northbridge. It's partially
documented in that data sheet I mentioned,
-
it does support x86 IO address space, but
it's never used. Everything in the ME is
-
in memory space or exposed as memory space
through bridges. The Northbridge
-
implements access to the ROM and RAM, it has an
IOMMU which is only used for transactions
-
coming from the rest of the system and
it's always initialized to, at least in
-
the firmware I looked at, it's always
initialized to the inverse of the page
-
table, so linear addresses can be used for
memory maps, sorry, for DMA. It also does
-
PCI configuration space access to the
primary PCI bus. And it has a firewall
-
that allows the operating system to deny
any IP block in the chipset from sending a
-
completion on the bus request. So it can
actually say: "Hey, I want to read some
-
register and only these devices are
allowed to send me a value for it." So
-
they've actually thought about security
here, which is great. Then there is one of
-
the most important blocks in the ME, which
is the crypto engine. It does some of the
-
more well-known crypto algorithms: AES,
SHA hashes, RSA and it has a secure key
-
store, which I'm not gonna [audio dropped]
... all about it in their ME talk at
-
Blackhat. And a lot of these things have
DMA engines, which all seem to be the
-
same. And there are no other DMA agents ...
engines in the ME, so this is also used for
-
memory to memory copy or DMA into other
devices. So that's used in a lot of
-
things. This is actually a diagram which I
don't have the vector for anymore. So
-
that's why the LibreOffice background is
in there. I'm sorry. So this is basically
-
what that crypto engine looks like when
you look at that signal tree that I was
-
talking about earlier. The DMA engines are
both able to do memory to memory copies
-
and to directly target the crypto unit
they're part of. Basically, when you, I
-
don't know about the control bits that go
with this, but when you set the target
-
address to zero and the right control
bits, it will copy into the buffer that's
-
used for the encryption. So that is how it
accelerates memory access for crypto. And
-
these are the actual register offsets.
They're the same for all of the DMA
-
engines in there relative to the base
address of the subunit they're in. And
-
then there's the second PCI bus or bus
hierarchy, which is like in some places
-
called the PCI fixed bus. I'm actually not
entirely sure whether this is actually
-
implemented as a PCI bus as I've drawn it
here, but this is what it behaves like. So
-
it has all the ME private stuff, that's
not a part of the normal chipset. So it's
-
timers for the ME, it has the
implementation of the secure enclave
-
stuff, the firmware TPM registers.
And it has the gen device which I've
-
mostly ignored because it's only used at
boot time. It's only used by the actual
-
boot ROM for the ME mostly. It is what the
ME uses to get the fuses Intel burns. So
-
that's the Intel public key, whether it's
a production or pre-production part, but
-
it's pretty much a black box. It's not
used that much, fortunately. There is the
-
IPC block which allows the ME to talk to
the sensor hub, which is a different CPU
-
in the chipset. It allows it to talk to
the power management controller and all kinds
-
of other embedded CPUs. So it's inter
processor communication not interprocess.
-
Confused me for a bit. And here's the host
embedded controller interface, which is
-
how the ME talks to the rest of the
computer when it wants the computer to
-
know that it's talking so it can directly
access a lot of stuff. But when it wants
-
to send a message to the EFI or to Windows
or Linux, it'll use this. And it also has
-
status registers, which are really simple
things where the ME writes in a value. And
-
even if the ME crashes, the host can still
read the value, which is how you can see
-
whether the ME is running, whether it's
disabled, whether it fully booted, or
-
whether it crashed halfway through. But at
a point where it could still get the rest
-
of the computer running. And there is some
coreboot code to read it. I've also
-
implemented some decoding for it on the
emulator because it's useful to see what
-
those values mean. So then there's
something really interesting, the primary
-
address translation table, which is the
bus bridge that allows the ME to actually
-
access the PCI Express fabric of the
computer. For a lot of what I in this
-
table call ME peripherals, that are
actually outside the ME domain and the
-
chipset, it uses this to access it. It
also uses it to access the UMA, which is
-
an area of host RAM that's used as a swap
device for the ME, and the Trace Hub, which is
-
the debug port, but also has a couple of
windows which allow the ME to access any
-
random area of host RAM, which is the most
scary bit because UMA is specified by
-
the host, but the host DRAM area is where you
can just point it anywhere. You can read
-
or write any value that Windows or
Linux or whatever you're running has
-
sitting there. So that's scary to me. So
and then there's the rest of it, the rest
-
of the devices which are behind the
primary ATT. And that's a lot of stuff,
-
that's debug, that's also the older normal
peripherals that your PC has, but it
-
also includes things like the power
management controller, which actually
-
turns on and off all the different parts
of your computer. It controls clocks and
-
resets. So this is really important. There
is a concept that you'll come across when
-
you're reading Intel manuals or ME related
stuff: that's root spaces. Besides your
-
normal addressing information, a PCI
device also has a root space number,
-
which is basically how you have a single
PCI device exposing two completely
-
different address spaces. And it's 0 for
the host, it's 1 for the ME. Some
-
devices expose the same information on
there. Other ones behave completely
-
differently. That's something you don't
usually see. And then there's the side
-
band fabric. So besides all this stuff
I just covered, which is PCI like at
-
least. There is also something completely
different, side band fabric, which is a
-
completely packet switched network, where
you don't use any memory mapping by
-
default. You just have a one byte address
for a device and some other addressing
-
fields and you're just sending a message
saying: "Hey, I want to read configuration
-
or data or memory." And there is actually
a lot of information out there on this,
-
because Intel, it seems like, just copy
pasted their internal specification into a
-
patent. This is how you address it. These
are all the devices on there, which is quite a
-
lot. It's also what you, if any of you are
kernel developers, and you've had to deal
-
with GPIO on Intel SoCs. There's this P2SB
device that you have to use. That's what
-
the host uses to access this. Their
documentation on it is really, really bad.
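To make the addressing idea concrete, here is a minimal Python sketch of a read request on a packet switched network with one-byte device addresses. It only illustrates the concept from the talk: the class name `SidebandRead`, the field names and the 8-byte layout are assumptions made up for this example, not Intel's actual message format.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SidebandRead:
    """Hypothetical read request; every field and its width is an assumption."""
    dest_port: int  # one-byte address of the target device
    src_port: int   # one-byte address of the requester
    opcode: int     # assumed selector: configuration, data or memory read
    offset: int     # register offset inside the target device

    def pack(self) -> bytes:
        # Assumed layout: three single bytes, one pad byte, then a
        # little-endian 32-bit offset, 8 bytes in total.
        for name in ("dest_port", "src_port", "opcode"):
            value = getattr(self, name)
            if not 0 <= value <= 0xFF:
                raise ValueError(f"{name} must fit in one byte")
        return struct.pack("<BBBxI", self.dest_port, self.src_port,
                           self.opcode, self.offset)

    @classmethod
    def unpack(cls, raw: bytes) -> "SidebandRead":
        # The pad byte ("x") is skipped by struct and yields no value.
        dest, src, opcode, offset = struct.unpack("<BBBxI", raw)
        return cls(dest, src, opcode, offset)
```

The point of the sketch is that nothing here is memory mapped: the requester names a device by its short address and says what it wants to read, and the network routes the message.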
-
This was all done using static analysis.
But then I wanted to figure out how some
-
of the logic actually works and it was
really complicated to play around with the
-
ME. There was this nice talk by Ermolov
and Goryachy, where they said: "You know,
-
we found an exploit that gives you code
execution and you can get JTAG
-
access too." It sounds really nice. It's
actually not that easy. So arbitrary code
-
execution in the BUP module, they actually
describe their exploit and how you should
-
use it. But they didn't describe anything
that's needed to actually implement that.
-
So if you want to do that, what you need
to do is figure out where the stack lives,
-
and you need to write a payload that will
actually turn a
-
buffer overflow on a stack that, by the
way, uses stack cookies, so you can't just
-
overwrite the return address, into
an arbitrary write. And you need to
-
find out what the return pointer address
is so you can overwrite it and find ROP
-
gadgets because the stack is not
executable. And then when you've done
-
that, you can just turn on debug access or
change to custom firmware or whatever. So
-
what I did is I had a bit of trouble
getting that running and in order to test
-
your payload, you have to flash it into
the system and it takes a while and then
-
the system just doesn't power on if the
ME's not working, if you're crashing it
-
instead of getting code execution. So it's
not really viable to develop it that
-
way, I think. Some people did. I respect
that because it's really, really hard. And
-
then I wrote this ME Loader, it's called
Loader because at first I started out like
-
writing it as a sort of a Wine thing where
you would just mmap the right
-
ranges at the right place and jump into
it, execute it, patch some system calls.
-
But because the ME is a micro kernel
system and almost every user space program
-
accesses hardware directly, I ended up
implementing like a good part of the
-
chipset, at least as stubs or enough logic
to get the code running. And I later on
-
added some features that actually allowed
it to talk to the hardware. I can use it as a
-
debugger, but just because it's actually
running the ME firmware or parts of it
-
inside a normal Linux process, I can just
use gdb to debug it. And back in April
-
last year, I got that working to the point
where I could run the bootstrap process,
-
which is where the vulnerability is. And
then you just develop the exploit against
-
it, which I did. And then I made a mistake
cleaning up some old chroot
-
environments for closed source software.
And I nuked my home dir. Yeah. I hadn't
-
yet pushed everything to GitHub. So I was
stuck with an old version and I decided,
-
you know, let's refactor this and turn it
into something that might actually at some
-
point be published, which by the way I
did last summer. This is all public code. The
-
ME Loader thing. It's on GitHub. And
someone else beat me to it and replicated
-
that exploit by the Russian guys. Up to
then they had produced a proof of concept
-
thing for Apollo Lake chipsets, which was
completely different from what you had
-
to do for normal ME. I was a bit
disappointed by that one, not being the
-
first one to actually replicate this. But
then, about a week later, I
-
got my loader back to the point where I
could actually get to the vulnerable code
-
and develop that exploit and got it
working not too long after. And here's the
-
great thing. Then I went to the hacker
space. I flashed it into my laptop. The
-
image that I had just been using only on
the emulator. I didn't change it. I flashed it.
-
I was like, this is never gonna work on
it. It works. some laughter And I've still got an image
-
on a flash chip with me because that's
what I used to actually turn on the
-
debugger. And then you need a debug probe
because that USB based debugging stuff
-
that's mentioned here only works pretty
late in boot. Which is also why I only
-
really see Apollo Lake stuff because on
those chipsets you can actually use this
-
for the ME. And then you need this thing
because there's a second channel, that is
-
using the USB plug, but it's a completely
different physical layer and you need an
-
adapter for it, which I don't think was
intended to be publicly available. Because
-
if you go to Intel's site and say, I want to
buy this, they say, here's the CNDA,
-
please sign it. But it appeared on Mouser.
And luckily I knew some people, who had
-
done some other stuff, got a nice bounty
for it and bought it and let me use it.
-
Thanks to them. It's expensive, but you
can buy it if it's still up there. Haven't
-
checked. That's the link. So I'm a bit
late, so I'm gonna use the time for
-
questions as well. So the main thing the
ME does that you cannot replace is the
-
boot process. It's not just breaking the
system. If you don't turn it on, it
-
actually does stuff that has to be done.
So you're gonna have to use the ME anyway if
-
you want to boot a computer. You don't
necessarily have to use Intel's firmware.
-
The ME itself boots like a micro kernel
system, so it has a process which
-
implements a lot of the servers that will
allow it to get to a point where it can
-
start those servers. This process has very
high privileges in older versions, which
-
is what is being used on these chipsets.
And if you exploit that, you're still ring
-
3, but you can turn on the debugger and you
can use the debugger to become ring 0. So
-
this is what the normal boot process for a
computer looks like. And this is what
-
happens when you use Boot Guard. There's a
bit of code that runs even before the
-
reset vector, and that's started by micro
code initialization, of course. And this
-
is what actually happens. The ME loads a
new firmware into a power management
-
controller, it then readies some stuff in the
chipset and it tells the power management
-
controller like please stop pulling that
CPU reset pin low and the CPU will start.
-
The power management controller is a completely
independent thing, an 8051 derived
-
microcontroller that runs a real time
operating system from the 90s. This is the
-
only string in the firmware by the way,
that's quoted there. And depending on the
-
chipset that you have, it's either loaded
with a patch or with a complete binary
-
from the ME, and it does a lot of
important stuff. No documentation on it
-
besides the ACPI interface, which is not
really useful. The ME has to do these
-
things. It needs to load the keys for the
Boot Guard process, needs to set up clock
-
controllers and then tell the PMC to turn
on the power to the CPU. It needs to
-
configure PCI express fabric and reset -
like get the CPU to come out of reset.
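The duties just listed can be written down as an ordered sequence. The Python sketch below only records the steps in the order they are described in the talk; the step names, the `RecordingBackend` class and the `bring_up` function are hypothetical placeholders, and the exact ordering and granularity on real hardware are assumptions.

```python
from typing import List

# Step names paraphrase the talk. They are not real register or
# command names, and the ordering is an assumption.
BRINGUP_STEPS: List[str] = [
    "load PMC firmware or patch",
    "load Boot Guard keys",
    "set up clock controllers",
    "ask PMC to power on the CPU",
    "configure PCI Express fabric",
    "ask PMC to release CPU reset",
]


class RecordingBackend:
    """Hypothetical stand-in for a hardware backend; it only logs steps."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def run(self, step: str) -> None:
        self.log.append(step)


def bring_up(backend: RecordingBackend) -> List[str]:
    """Run every bring-up step in order and return the recorded log."""
    for step in BRINGUP_STEPS:
        backend.run(step)
    return backend.log
```

A real backend would replace `run` with actual register accesses; keeping the sequence as data makes it easy to compare against a trace of what the original firmware does.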
-
There's a lot of code involved in this, so
I really didn't want to do this all
-
statically. What I did is I added hardware
support, hardware passthrough support to
-
the emulator and booted my laptop that
way. Actually had a video of this, but I
-
don't have the time to show it, which is a
pity. But this is the bring up
-
process from the ME running in a Linux
process, sending whatever hardware accesses
-
it was trying to do that are important
for boot to the debugger. And then that
-
was using an ME in real hardware that was
halted to actually do the register accesses
-
and it works. I'm not going to show this.
It actually booted the computer reliably.
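The passthrough idea can be sketched as a small router that decides, per address range, whether an emulated register read is answered by a local stub or forwarded to real hardware. This is a minimal Python illustration under stated assumptions: `AccessRouter`, its methods and the address ranges in the test are invented for this example and are not taken from the ME Loader code.

```python
from typing import Callable, Dict, List, Tuple


class AccessRouter:
    """Routes emulated register reads to a stub or to a passthrough backend."""

    def __init__(self, passthrough_read: Callable[[int], int]) -> None:
        # passthrough_read is a hypothetical callback, for example a
        # function that performs the read through a debug probe.
        self._passthrough_read = passthrough_read
        self._ranges: List[Tuple[int, int, bool]] = []  # (start, end, forward)
        self._stub_values: Dict[int, int] = {}

    def add_range(self, start: int, size: int, forward: bool) -> None:
        """Register an address range; forward=True means use real hardware."""
        self._ranges.append((start, start + size, forward))

    def set_stub(self, addr: int, value: int) -> None:
        """Set the value a stubbed register returns."""
        self._stub_values[addr] = value

    def read(self, addr: int) -> int:
        for start, end, forward in self._ranges:
            if start <= addr < end:
                if forward:
                    return self._passthrough_read(addr)
                # Stubbed registers that were never set read as zero.
                return self._stub_values.get(addr, 0)
        raise LookupError(f"unmapped address {addr:#x}")
```

Only the ranges that matter for boot need to be forwarded; everything else can stay a stub, which keeps the number of slow accesses through the debugger small.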
-
Then Boot Guard configuration is fun
because you know where they say they fuse
-
in the keys. Well yeah. But the ME loads
them from fuses and then manually loads
-
them into registers. So if you have code
execution on the ME before it does this,
-
you can just load your own values and you
can run coreboot even on a machine that
-
has Boot Guard. Yeah. So I'm gonna go
through this really quickly. This is, by
-
the way, these are the registers that
configure what security model the CPU is
-
gonna enforce for the firmware. I'm going
to release this code after my talk. It's
-
part of a Python script that I wrote that
uses the debugger to start the CPU without
-
ME firmware. I traced all of what the ME
firmware did. And I now have a Python
-
script that can just start a computer
without Intel's code. If you translate
-
this into a ROP sequence or even into
binary for the ME, you can start a
-
computer without the ME itself or at least
without it running the operating system.
-
applause
So, yeah, future goals. I really do want
-
to share this because if there is a way to
escalate to ring 0 through a ROP chain,
-
then you could just start your own kernel
in the ME and have custom firmware, at
-
least from the vulnerability on. But you
could also build a mod chip that uses the
-
debugger interface to load a new firmware.
There's lots of stuff that still needs to be
-
discovered, but I'm gonna hang out at the
open source firmware village later, at
-
least part of the week here. So because I
really want to get started on open source
-
ME firmware using this. Right. And there's
a lot of people that played a role in
-
getting me to this point. Also would like
to thank the guy from Hague hacker space,
-
BinoAlpha, who basically allowed me to use
his laptop to prepare the demo, which I
-
ended up not being able to show, but.
Right. I was gonna ask whether there are
-
any questions. But I don't think
there's really any time for any more.
-
Herald: Peter, thank you so much. Applause
Unfortunately, we don't have any more time
-
left.
Peter: I'll be around. I'll be around.
-
Herald: I think it's very, very
interesting because I hope that your talk
-
will inspire many people to keep looking
into how the management engine works and
-
hopefully uncover even more stuff. I think
we have time for just one single question.
-
I don't know, do we? How about one from the
Internet. Thank you so much.
-
Signal Angel: OK. First off, I have to
tell you. Your shirt is nice. Chat wanted
-
me to say this. And they asked how
reliable this exploit is and does it work
-
on every boot?
Peter: Right, Yeah. That's actually
-
something really important that I forgot
to mention. So they patched the vulnerability,
-
but they didn't provide downgrade
protection. If you can flash a
-
vulnerable image with an exploit in it,
it'll just boot every time on these chips,
-
so sixth or seventh generation chips.
You put in that image and it will
-
reliably turn on the debugger every time
you turn on the computer. applause
-
Herald: Thank you so much for the
question. And Peter Bosch thank you so
-
much. Please give him a great round of
applause.
-
applause
-
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!