1
00:00:00,000 --> 00:00:19,152
36C3 preroll music
2
00:00:19,152 --> 00:00:22,520
Herald: The next talk is an Intel
Management Engine deep dive.
3
00:00:22,520 --> 00:00:27,230
Understanding the ME at the OS and
hardware level, and it is by Peter Bosch.
4
00:00:27,230 --> 00:00:31,089
Please welcome him with a great round of
applause!
5
00:00:31,089 --> 00:00:38,780
Applause
6
00:00:38,780 --> 00:00:49,409
Peter Bosch: Right. So, can everybody hear me?
Nice. OK. So welcome. Well, this is me.
7
00:00:49,409 --> 00:00:59,510
I'm a student at Leiden University. Yeah,
I've always been really interested in how
8
00:00:59,510 --> 00:01:04,610
stuff works. And when I got a new laptop,
I was like, you know, how does this thing
9
00:01:04,610 --> 00:01:08,410
really boot? I knew everything from reset
vector onwards. I wanted to know what
10
00:01:08,410 --> 00:01:15,221
happened before it. So first I started
looking at the boot guard ACM. While
11
00:01:15,221 --> 00:01:21,420
looking through it, I realized that not
everything was as it was supposed to be.
12
00:01:21,420 --> 00:01:26,280
That led to a later part in the boot
process being vulnerable, which ended up
13
00:01:26,280 --> 00:01:34,249
being discovered by me. And I found out
here last year that I wasn't the only one
14
00:01:34,249 --> 00:01:38,310
to find it. Trammell Hudson also found it,
and we reported it together, presented it
15
00:01:38,310 --> 00:01:43,399
at Hack in the Box. And then at the same
time, I was already also looking at the
16
00:01:43,399 --> 00:01:49,350
management engine. Well, there had been a
lot of research done on that before. The
17
00:01:49,350 --> 00:01:58,140
public info was mostly on the file system
and on specific vulnerabilities, which
18
00:01:58,140 --> 00:02:04,400
still made it pretty hard to get started
on reverse-engineering it. So that's why I
19
00:02:04,400 --> 00:02:10,340
thought it might be useful for me to
present this work here. It's basically
20
00:02:10,340 --> 00:02:16,910
broken up into three parts. The first bit
is just a quick introduction into the
21
00:02:16,910 --> 00:02:22,250
operating system it runs. So if you want
to work on this yourself, you're more
22
00:02:22,250 --> 00:02:28,690
easily able to understand what's in your
face in your disassembler. And then
23
00:02:28,690 --> 00:02:37,950
after that, I'll go over its role in the
boot process and then also how this
24
00:02:37,950 --> 00:02:45,780
information can be used to start
developing a new firmware for it or do
25
00:02:45,780 --> 00:02:49,730
more security research on it. So first of
all, what exactly is the management
26
00:02:49,730 --> 00:02:57,280
engine? There's been a lot of fuss about
it being a backdoor and everything, in
27
00:02:57,280 --> 00:03:05,000
reality, if it is or not depends on the
software that it runs. It's basically a
28
00:03:05,000 --> 00:03:09,110
processor with its own RAM and its own I/O
and MMUs and everything, sitting inside
29
00:03:09,110 --> 00:03:16,049
your southbridge. It's not in the CPU,
it's in the southbridge. So when I say this
30
00:03:16,049 --> 00:03:24,010
is gonna be about the sixth and seventh
generation of Intel chips, I mean, mostly
31
00:03:24,010 --> 00:03:28,489
motherboards from those generations. If
you run a newer CPU on it, it will also
32
00:03:28,489 --> 00:03:39,584
work for that. So yeah. Bit more detail.
The CPU it runs on is based on the 80486, which,
33
00:03:39,584 --> 00:03:43,510
you know, is funny. It's quite an old CPU
and it's still being used in almost
34
00:03:43,510 --> 00:03:51,079
every computer nowadays. So it has a
little bit of its own RAM. It has quite a
35
00:03:51,079 --> 00:03:58,150
bit of built in ROM, has a hardware
accelerated cryptographic unit and it has
36
00:03:58,150 --> 00:04:05,450
fuses, which are write-once memory used
to store security settings and keys and
37
00:04:05,450 --> 00:04:11,079
everything. Some of the more scary
features it has: Bus bridges to all of the
38
00:04:11,079 --> 00:04:16,419
buses inside the southbridge, it can
access the RAM on the CPU and it can
39
00:04:16,419 --> 00:04:21,359
access the network, which makes it really
quite dangerous if there is a
40
00:04:21,359 --> 00:04:28,409
vulnerability or if it runs anything
nefarious. And its tasks nowadays include
41
00:04:28,409 --> 00:04:35,860
starting the computer as well as adding
management features. This is mostly used
42
00:04:35,860 --> 00:04:41,190
in servers where it can serve as a board
management controller, do like a remote
43
00:04:41,190 --> 00:04:49,001
keyboard and video. And it does security:
Boot Guard, which is the signing of
44
00:04:49,001 --> 00:04:54,830
firmware and verification of signatures.
It implements a firmware TPM and there is
45
00:04:54,830 --> 00:05:02,590
also an SDK to use it as a general purpose
secure enclave. So on the software side of
46
00:05:02,630 --> 00:05:12,650
it, it runs a custom operating system,
parts of which are taken from MINIX, the
47
00:05:12,650 --> 00:05:17,250
teaching operating system by Andrew
Tanenbaum. It's a microkernel operating
48
00:05:17,250 --> 00:05:32,930
system. It runs binaries that are in a
completely custom format. It's really
49
00:05:32,930 --> 00:05:36,030
quite a high-level system, actually. If you
look at it in terms of the operating
50
00:05:36,030 --> 00:05:40,681
system it runs, it's mostly like Unix,
which makes it kind of familiar, but it
51
00:05:40,681 --> 00:05:46,819
also has large custom parts. Like I said
before, in this talk I'm going to be
52
00:05:46,819 --> 00:05:52,740
speaking about sixth and seventh
generation Intel Core chipsets, so that's
53
00:05:52,740 --> 00:05:58,949
Sunrise Point, Lewisburg, which is the
server version of this and also the laptop
54
00:05:58,949 --> 00:06:04,410
system-on-a-chip parts, which are just called Intel
Core low power. They also include the
55
00:06:04,410 --> 00:06:08,360
chipset as a separate die. So it also
applies to them. In fact, I've been
56
00:06:08,360 --> 00:06:11,979
testing most of this stuff I'm going to
tell you about on the laptop that's
57
00:06:11,979 --> 00:06:19,430
sitting right here, which is a Lenovo
T460. The version of the firmware I've been
58
00:06:19,430 --> 00:06:30,820
looking at is 11.0.0.1205. Right. So I do
need to put this up there. I'm not a part
59
00:06:30,820 --> 00:06:38,520
of Intel, nor have I signed any contracts
with them. I've found everything in ways
60
00:06:38,520 --> 00:06:43,500
that you could also do. I didn't have any
leaked NDA stuff or anything that you
61
00:06:43,500 --> 00:06:53,099
couldn't get your hands on. It's also a
very wide subject area, so there might be
62
00:06:53,099 --> 00:07:00,580
some mistakes here or there, but generally
it should be right. Well, if you want to
63
00:07:00,580 --> 00:07:04,220
get started working on an ME firmware,
want to reverse-engineer it or modify it
64
00:07:04,220 --> 00:07:08,580
in some way, first you've got to deal with
the image file. You've got your SPI flash.
65
00:07:08,580 --> 00:07:12,009
That's where most of its firmware lives, in
the same flash chip as your BIOS. So
66
00:07:12,009 --> 00:07:17,410
you've got that image. And then how do you
get the code out? Well, there's tools for
67
00:07:17,410 --> 00:07:22,949
that. It's already been extensively
documented by other people.
68
00:07:22,949 --> 00:07:28,681
And you can basically just download a tool
and run it against it. Which makes this
69
00:07:28,681 --> 00:07:31,690
really easy. This is also the reason why
there hasn't been a lot of research done
70
00:07:31,690 --> 00:07:35,940
yet. Before these tools were around, you
couldn't get to all of the code. The
71
00:07:35,940 --> 00:07:41,349
kernel was compressed using Huffman
tables, which were stored in ROM. You
72
00:07:41,349 --> 00:07:45,360
couldn't get to the ROM without getting
code execution on the thing. So there was
73
00:07:45,360 --> 00:07:52,639
basically no way of getting access to the
kernel code. And I think also to the syslib
74
00:07:52,639 --> 00:07:55,800
library. But that's not a problem anymore.
You can just download a tool and unpack
75
00:07:55,800 --> 00:08:02,520
it. Also, the Intel tool to generate
firmware images, which you can find in
76
00:08:02,520 --> 00:08:11,979
some open directories on the internet, has
Qt resources: XML files which basically have the
77
00:08:11,979 --> 00:08:18,330
description for all of the file formats
used by these ME versions, including names
78
00:08:18,330 --> 00:08:26,050
and comments to go with those structure
definitions. So that's really useful. So
79
00:08:26,050 --> 00:08:30,430
we look at one of these images. It has a
couple of partitions, some of them overlap
80
00:08:30,430 --> 00:08:38,150
and some of them are storage, some are
code. So there are the main partitions,
81
00:08:38,150 --> 00:08:45,709
FTPR and NFTP, which contain the programs
it runs. There's MFS, which is the read-write
82
00:08:45,709 --> 00:08:51,980
file system it uses for persistent
storage. And then there is a log to flash
83
00:08:51,980 --> 00:08:57,320
option, and the possibility to embed a token
that will tell the system to unlock all
84
00:08:57,320 --> 00:09:02,850
debug access which has to be signed by
Intel so it's not really of any use to us.
85
00:09:02,850 --> 00:09:07,439
And then there is something interesting,
ROM bypass. Like I said, you can't get
86
00:09:07,439 --> 00:09:13,160
access to the ROM without running code on
it. And ROM is mask ROM. So it's internal
87
00:09:13,160 --> 00:09:17,540
to the chip, but Intel has to develop new
ROM code and has to test it without
88
00:09:17,540 --> 00:09:23,270
respinning the die every time. So they
have a possibility on an unlocked
89
00:09:23,270 --> 00:09:28,170
preproduction chipset to completely bypass
the internal ROM and load even the early
90
00:09:28,170 --> 00:09:33,670
boot code from the flash chip. Some of
these images have leaked and you can use
91
00:09:33,670 --> 00:09:39,250
them to get a look at the ROM code, even
without being able to dump it. That's
92
00:09:39,250 --> 00:09:45,610
going to be really useful later on. So
then you've got these code partitions and
93
00:09:45,610 --> 00:09:51,230
they contain a whole lot of files. So
there are the binaries themselves, which
94
00:09:51,230 --> 00:09:57,569
don't have any extension. There are the
metadata files. So the binary format they
95
00:09:57,569 --> 00:10:05,350
use has no headers, nothing included. And
all of that data is in the metadata file.
96
00:10:05,350 --> 00:10:12,000
And when you use the unME11 tool, it'll
actually convert those to text
97
00:10:12,000 --> 00:10:16,069
files for you so you can just get started
without really understanding how they
98
00:10:16,069 --> 00:10:26,640
work. Yes. So the metadata. It's a
type-length-value structure, which contains a
99
00:10:26,640 --> 00:10:31,180
whole lot of information the operating
system needs. It has the info on the
100
00:10:31,180 --> 00:10:35,820
module, whether it's data or code, where
it should be loaded, what the privileges
101
00:10:35,820 --> 00:10:43,390
of the process should be, a SHA
checksum for validating it and also some
102
00:10:43,390 --> 00:10:49,000
higher level stuff such as device file
definitions if it's a device driver or any
103
00:10:49,000 --> 00:10:55,430
other kind of server. I've actually
written some code that uses this, that's
104
00:10:55,430 --> 00:11:01,460
on GitHub, so if you want a closer look at
it, some of the slides have a link to a
105
00:11:01,460 --> 00:11:09,780
file in there which contains the
full definitions. Right. So all the code
106
00:11:09,780 --> 00:11:16,801
on the ME is signed and verified by Intel.
So you can't just go and put in a new
107
00:11:16,801 --> 00:11:24,689
binary and say, hey, let's run this. The
way they do this is in Intel's
108
00:11:24,689 --> 00:11:30,300
manufacture-time fuses, they have a hash
of the public key that they use to sign
109
00:11:30,300 --> 00:11:36,070
it. And then on each flash partition,
there is a manifest which is signed by the
110
00:11:36,070 --> 00:11:40,820
key and it contains the SHA hashes for all
the metadata files, which then contain a
111
00:11:40,820 --> 00:11:47,150
SHA hash for the code files. There don't
seem to be any major problems in verifying
112
00:11:47,150 --> 00:11:52,530
this, so it's useful to know, but
you're not really gonna use this. And then
113
00:11:52,530 --> 00:12:00,300
the modules themselves, as I've said,
they're flat binaries. Mostly. The
114
00:12:00,300 --> 00:12:05,560
metadata contains all the info the kernel
uses to reconstruct the actual program
115
00:12:05,560 --> 00:12:13,530
image in memory. And a curious thing here
is that the actual base address for all
116
00:12:13,530 --> 00:12:17,459
the modules, for all programs, is the same
across an image. So if you have a
117
00:12:17,459 --> 00:12:19,930
different version, it's going to be
different. But if you have two programs
118
00:12:19,930 --> 00:12:25,949
from the same firmware it's gonna be
loaded at the same virtual address. Right.
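Editor's sketch: the load address discussed here comes out of the type-length-value metadata file mentioned earlier. The loop below shows one way such a file could be walked. The header layout (a 32-bit type followed by a 32-bit length that counts the 8-byte header, both little-endian) and the record types in the demo are assumptions made for illustration, not something stated in the talk.

```python
import struct

HEADER_SIZE = 8  # assumed: u32 type + u32 length, little-endian


def iter_tlv(blob):
    """Yield (type, payload) pairs from a type-length-value blob.

    Assumption: the length field counts the header as well as the payload.
    """
    offset = 0
    while offset + HEADER_SIZE <= len(blob):
        ext_type, length = struct.unpack_from("<II", blob, offset)
        if length < HEADER_SIZE or offset + length > len(blob):
            raise ValueError("bad length %d at offset %d" % (length, offset))
        yield ext_type, blob[offset + HEADER_SIZE:offset + length]
        offset += length


# Hypothetical demo data: record type 1 carries a 4-byte value,
# record type 2 has an empty payload.
demo = (
    struct.pack("<II", 1, 12) + struct.pack("<I", 0x2D000)
    + struct.pack("<II", 2, 8)
)
records = list(iter_tlv(demo))
```

With real metadata you would start from the text output of unME11 rather than guess field widths.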
119
00:12:25,949 --> 00:12:32,820
So when you want to look at it, you're
gonna load it in some disassembler, like
120
00:12:32,820 --> 00:12:39,540
for example IDA, and you'll see this, it
disassembles fine, but it's gonna
121
00:12:39,540 --> 00:12:44,270
reference all kinds of memory that you
don't have access to. So usually you'd
122
00:12:44,270 --> 00:12:49,459
think maybe I've loaded it at a wrong address,
or am I missing some library? Well,
123
00:12:49,459 --> 00:12:55,150
here you've loaded it correctly if you use
the address from the metadata file.
124
00:12:55,150 --> 00:13:02,310
But you are in fact missing a lot of
memory segments. And let's just take a
125
00:13:02,310 --> 00:13:09,829
look at each of these. It's calling
something which is code. It's pushing a pointer
126
00:13:09,829 --> 00:13:15,890
there, which is data. And what's that? So
it has shared libraries, even though it's
127
00:13:15,890 --> 00:13:19,920
flat binaries. It actually does use shared
libraries because you only have 1.5
128
00:13:19,920 --> 00:13:24,319
megabytes of RAM. You don't want to
link your C library into everything and
129
00:13:24,319 --> 00:13:32,800
waste what little memory you have. So
there is the main system library which is
130
00:13:32,800 --> 00:13:39,270
like libc on a Linux system. It's in a
flash partition, so you can actually just
131
00:13:39,270 --> 00:13:45,689
load it and take a look at it easily and
it starts out with a jump table. So
132
00:13:45,689 --> 00:13:48,770
there's no symbols in the metadata file or
anything. It doesn't do dynamic linking.
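Editor's sketch: because the library begins with a jump table and there are no symbols, a first pass in a disassembler script is usually to turn table slots into target addresses. The code below assumes each slot is a 5-byte x86 `jmp rel32` instruction (opcode 0xE9); that encoding, the base address, and the demo bytes are assumptions, so check them against the actual image.

```python
import struct

JMP_REL32 = 0xE9   # x86 near jump with a 32-bit relative displacement
ENTRY_SIZE = 5     # 1 opcode byte + 4 displacement bytes (assumed slot size)


def resolve_jump_table(image, base, count):
    """Return the absolute target of each of the first `count` slots."""
    targets = []
    for i in range(count):
        off = i * ENTRY_SIZE
        if image[off] != JMP_REL32:
            raise ValueError("slot %d is not a jmp rel32" % i)
        (rel,) = struct.unpack_from("<i", image, off + 1)
        next_ip = base + off + ENTRY_SIZE  # displacement is relative to the next instruction
        targets.append((next_ip + rel) & 0xFFFFFFFF)
    return targets


# Hypothetical two-entry table loaded at 0x1000.
demo = bytes([JMP_REL32]) + struct.pack("<i", 0x100) \
     + bytes([JMP_REL32]) + struct.pack("<i", -5)
targets = resolve_jump_table(demo, 0x1000, 2)
```

The slot address, not the target, is what callers reference, so names belong on the slots.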
133
00:13:48,770 --> 00:13:56,549
It loads the pages for the shared library
at a fixed address, which is also in the
134
00:13:56,549 --> 00:14:01,620
shared library's metadata. And then it's
just there in the processor's memory and
135
00:14:01,620 --> 00:14:06,130
it's gonna jump there if it needs a
function. And the functions themselves are
136
00:14:06,130 --> 00:14:12,890
just using the normal System V x86
calling conventions. So it's pretty easy
137
00:14:12,890 --> 00:14:17,980
to look at that using your normal tools.
There's no weird register argument passing
138
00:14:17,980 --> 00:14:24,559
going on here. So, right. Now, shared
libraries. There's two of them. And this
139
00:14:24,559 --> 00:14:28,160
is where it gets annoying. The system
library, you've got access to that so you
140
00:14:28,160 --> 00:14:32,850
can just take your time and go through it
and try to figure out, you know, oh, hey,
141
00:14:32,850 --> 00:14:39,880
is this open or is this read or what's
this function doing? But then there's also
142
00:14:39,880 --> 00:14:49,150
another second really large library, which
is in ROM. They have all the C library
143
00:14:49,150 --> 00:14:54,300
functions and some of their custom helper
routines that don't interact with the
144
00:14:54,300 --> 00:15:00,920
kernel directly, such as string
functions. They live in ROM. So when
145
00:15:00,920 --> 00:15:04,700
you've got your code and this is basically
where I was when I was here last year,
146
00:15:04,700 --> 00:15:07,040
you're looking through it and you're
seeing calls to a function you don't have
147
00:15:07,040 --> 00:15:11,010
the code for all over the place. And you
have to figure out by its signature what
148
00:15:11,010 --> 00:15:14,870
it is doing. And that works for some of
the functions and it's really difficult
149
00:15:14,870 --> 00:15:20,610
for other ones. That really had me stopped
for a while. Then I managed to find one of
150
00:15:20,610 --> 00:15:25,070
these ROM bypass images and I had the code
for a very early development build of the
151
00:15:25,070 --> 00:15:29,370
ROM. This is where I got lucky. So the
actual entry point addresses are fixed
152
00:15:29,370 --> 00:15:33,939
across an entire chipset family. So if you
have an image for the server version of
153
00:15:33,939 --> 00:15:39,310
like 100 series chipset or for client
version or for a desktop or laptop
154
00:15:39,310 --> 00:15:47,540
version, it's all gonna be the same ROM
addresses. So even though the code might
155
00:15:47,540 --> 00:15:51,930
be different, you'll have the jump table,
which means the addresses can stay fixed.
156
00:15:51,930 --> 00:15:56,760
So this only needs to be done once. And in
fact when I upload my slides later, there
157
00:15:56,760 --> 00:16:02,919
is a slide in there at the end that has
the addresses for the most used functions.
158
00:16:02,919 --> 00:16:07,350
So you're not going to have to repeat that
work, at least not for this chipset. So if
159
00:16:07,350 --> 00:16:15,160
you want to look at a simple module,
you've loaded it, now you've applied the
160
00:16:15,160 --> 00:16:21,860
things I just said, and you still don't
have the data sections. I don't know
161
00:16:21,860 --> 00:16:26,669
what that function there is doing, but
it's not very important. It actually
162
00:16:26,669 --> 00:16:33,230
returns a value, I think, that's not used
anywhere, but it must have a purpose
163
00:16:33,230 --> 00:16:40,220
because it's there. Right. So then you
look at the entry point and this is a lot
164
00:16:40,220 --> 00:16:44,660
of stuff. And the main thing that matters
here is on the right half of the screen,
165
00:16:44,660 --> 00:16:50,189
there is a listing from a MINIX repository
and on the left half there is a
166
00:16:50,189 --> 00:16:54,809
disassembly from an ME module. So it's
mostly the same. There is one key
167
00:16:54,809 --> 00:16:58,419
difference, though. The ME module actually
has a little bit of code that runs before
168
00:16:58,419 --> 00:17:06,230
its C library startup function. And that
function actually does all the ME specific
169
00:17:06,230 --> 00:17:13,980
initialization, does a lot of stuff
related to how C library data is kept
170
00:17:13,980 --> 00:17:21,520
because there are also no data segments for
the C library being allocated by the
171
00:17:21,520 --> 00:17:25,820
kernel. So each process actually reserves
a part of its own memory and tells the C
172
00:17:25,820 --> 00:17:31,290
library, like, any global variables you
can store in there. But when you look at
173
00:17:31,290 --> 00:17:37,610
that function, one of the most important
things that it calls is this function.
174
00:17:37,610 --> 00:17:41,510
It's very simple, it just copies a bunch
of RAM. So they don't have support for
175
00:17:41,510 --> 00:17:46,650
initialized data sections. It's a flat
binary. What they do is they actually
176
00:17:46,650 --> 00:17:51,520
use the .bss segment, the zeroed segment
at the end of the address space, and copy
177
00:17:51,520 --> 00:17:57,070
over a bunch of data in the program. The
program itself is not aware of this. It's
178
00:17:57,070 --> 00:18:04,180
really in the initialization code and in
the linker script. So this is also something
179
00:18:04,180 --> 00:18:09,170
that's very important, because at that
address in the data section you're
180
00:18:09,170 --> 00:18:13,310
also going to need to load
the last bit of the binary.
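Editor's sketch: when preparing a module for a disassembler, the copy that the startup code performs can be reproduced by hand. The function below builds two segments: the flat binary at its base address, and a zero-filled data area whose start is overwritten with the tail of the binary. All parameter names and the demo values are hypothetical; the real addresses and sizes come from the module's metadata.

```python
def build_image(binary, base, data_addr, data_size, init_size):
    """Return {address: bytes} approximating the process image.

    binary    -- the flat module as stored in flash
    base      -- load address of the module (from the metadata)
    data_addr -- start of the zeroed data/.bss area (assumed known)
    data_size -- size of that area
    init_size -- how many bytes at the end of `binary` are initial values
    """
    if init_size > data_size or init_size > len(binary):
        raise ValueError("initialized data does not fit")
    init_data = binary[len(binary) - init_size:]
    # The rest of the data area stays zeroed, like ordinary .bss.
    data = init_data + bytes(data_size - init_size)
    return {base: binary, data_addr: data}


# Hypothetical module: 8 bytes of code followed by 4 bytes of initial data.
module = b"\x90" * 8 + b"\x01\x02\x03\x04"
image = build_image(module, 0x1000, 0x8000, 16, 4)
```

Without the second segment, constants referenced through the data area show up as zeros.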
181
00:18:13,310 --> 00:18:20,520
Otherwise you're missing constants or at
least initialization values. Right. Then
182
00:18:20,520 --> 00:18:26,150
there is the full memory map of the
processes themselves. It's a flat 32 bit
183
00:18:26,150 --> 00:18:31,970
address space. It's got everything you
expect in there. It's got a stack and a
184
00:18:31,970 --> 00:18:39,500
heap and everything. There's a little bit
of heap allocated right on initialization.
185
00:18:39,500 --> 00:18:44,690
This is basically how you derive
the address space layout from the
186
00:18:44,690 --> 00:18:51,100
metadata, especially the data
segment, and for the stack itself
187
00:18:51,100 --> 00:18:56,180
the address location varies a lot
because of the number of threads that are
188
00:18:56,180 --> 00:19:03,380
in use or the size of data sections. And
also those stack guards, they're not
189
00:19:03,380 --> 00:19:07,960
really stack guards. There is also
metadata for each thread in there. But
190
00:19:07,960 --> 00:19:13,640
that's nothing that's relevant to the
process itself, only to the kernel. And
191
00:19:13,640 --> 00:19:21,890
well, if you then skip forward a bit and
you've done all these - you look at your
192
00:19:21,890 --> 00:19:28,790
simple driver like this. This is taken
from a driver used to talk to the CPU,
193
00:19:28,790 --> 00:19:34,630
like, OK. So when I say CPU or host, by
the way, I mean the CPU, like your big
194
00:19:34,630 --> 00:19:39,370
Skylake, or Kaby Lake, or Coffee Lake,
whatever your big CPU that runs your own
195
00:19:39,370 --> 00:19:46,070
operating system. Right. So this is used
to send messages there. But if you look
196
00:19:46,070 --> 00:19:51,680
at what's going on here, OK - think I had
a problem with the animation here - it
197
00:19:51,680 --> 00:19:57,000
sets up some stuff and then it calls a
library function that's in the main syslib
198
00:19:57,000 --> 00:20:01,270
library, which actually has a main loop
for the program. That's because Intel was
199
00:20:01,270 --> 00:20:06,440
smart and they added a nice framework for
device driver implementing programs,
200
00:20:06,440 --> 00:20:10,130
because it's a micro kernel, so device
drivers are just usual programs, calling
201
00:20:10,130 --> 00:20:20,060
specific APIs. Then there's normal POSIX
file I/O. No standard I/O, but it has all
202
00:20:20,060 --> 00:20:26,530
the normal open, and read, and ioctl and
everything functions. And then there's
203
00:20:26,530 --> 00:20:30,170
more initialization for the srv library.
And this is basically what all the simple
204
00:20:30,170 --> 00:20:38,890
drivers look like in it. And then there's
this. Because they're so low on memory,
205
00:20:38,890 --> 00:20:50,040
they don't actually use standard I/O, or
even printf itself to do most of the
206
00:20:50,040 --> 00:20:54,820
debugging. It uses a thing that's called
"sven", I'll touch on that later. So there
207
00:20:54,820 --> 00:20:59,150
is the familiar APIs that I talked about.
It even has POSIX threads, or at least a
208
00:20:59,150 --> 00:21:04,510
subset of it, and there is all the
functions that you'd expect to find on
209
00:21:04,510 --> 00:21:08,700
some generic Unix machine. So that
shouldn't be too much of a problem to deal
210
00:21:08,700 --> 00:21:14,570
with, but then there's also their own
tracing solution, sven. That's what Intel
211
00:21:14,570 --> 00:21:17,350
calls it. The name is in all the development
tools that you can download
212
00:21:17,350 --> 00:21:23,370
from their site, and basically, they don't
include format strings for a lot of the
213
00:21:23,370 --> 00:21:28,390
stuff. They just have a 32-bit identifier
that is sent over the debug port, and it
214
00:21:28,390 --> 00:21:34,270
refers to a format string in a dictionary
that you don't have. There is one of the
215
00:21:34,270 --> 00:21:38,820
dictionaries for a server chip that's
floating around the internet, but even
216
00:21:38,820 --> 00:21:45,940
that is incomplete. And the normal non-NDA
version of the Intel developer tools has
217
00:21:45,940 --> 00:21:53,810
some 50 format strings for really common
status messages it might output, but yeah,
218
00:21:53,810 --> 00:21:57,391
like, if you see these functions, just
realize it's doing some debug print. It
219
00:21:57,391 --> 00:22:00,550
might be dumping some state or just
telling you it's gonna do something else.
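Editor's sketch: the tracing scheme described here amounts to looking up a 32-bit identifier in a dictionary of format strings. The example below shows that idea only. The identifiers, the format strings, and the argument handling are all hypothetical, since the talk does not describe the wire format or the dictionary file format.

```python
# Hypothetical dictionary: message ID -> printf-style format string.
SVEN_DICTIONARY = {
    0x00010001: "driver init done",
    0x00010002: "state changed to %d",
}


def render_trace(message_id, args=()):
    """Turn a traced (ID, arguments) pair back into readable text."""
    fmt = SVEN_DICTIONARY.get(message_id)
    if fmt is None:
        # Without a dictionary entry, only the raw ID and arguments remain.
        return "<unknown sven message 0x%08X, args=%s>" % (message_id, list(args))
    return fmt % tuple(args)


known = render_trace(0x00010002, (3,))
unknown = render_trace(0xDEADBEEF, (1, 2))
```

For reverse engineering, the unknown case is the common one, which is why these calls can usually be skipped.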
220
00:22:00,550 --> 00:22:12,020
No important logic actually happens
in here. Right. So then for device files.
221
00:22:12,020 --> 00:22:16,190
They're actually defined in a manifest.
When the kernel loads a program, and that
222
00:22:16,190 --> 00:22:20,830
program wants to expose some kind of
interface to other programs its manifest
223
00:22:20,830 --> 00:22:27,780
will contain, or its metadata file will
contain a special file producer entry, and
224
00:22:27,780 --> 00:22:33,120
that says, you know, you have these device
files, with a name, and an access mode and
225
00:22:33,120 --> 00:22:39,210
the user, and group ID, and everything,
and the minor numbers, and the kernel
226
00:22:39,210 --> 00:22:42,830
sends this to the- or not kernel- the
program loader sends this to the virtual
227
00:22:42,830 --> 00:22:47,720
file system server and it automatically
gets a device file, pointing to the right
228
00:22:47,720 --> 00:22:51,800
major or minor number. And then there's
also a library, as I said, to provide a
229
00:22:51,800 --> 00:23:03,680
framework for a driver. And that looks
like this. It's really easy to use. If you
230
00:23:03,680 --> 00:23:08,070
were an ME developer you just write some
callbacks for open, and close, and
231
00:23:08,070 --> 00:23:11,000
everything, and it automatically calls
them for you, when a message comes in,
232
00:23:11,000 --> 00:23:15,400
telling you that that happened, which also
makes it really easy to reverse engineer,
233
00:23:15,400 --> 00:23:21,100
'cause if you look at a driver, it just
loads some callbacks, and you can know, by
234
00:23:21,100 --> 00:23:27,510
their offset in a structure, what actual
call they're implementing. Right, so then
235
00:23:27,510 --> 00:23:31,950
there is one of the more weird things
that's going on here: How the actual
236
00:23:31,950 --> 00:23:37,470
userland programs get access to memory-mapped
registers. There's a lot of this going on.
237
00:23:37,470 --> 00:23:42,830
Calls to a couple of functions that have
some magic arguments. The second one you
238
00:23:42,830 --> 00:23:50,640
can easily tell is the offset, because it
has- it increases in very nice power-of-
239
00:23:50,640 --> 00:23:54,670
two steps, so it's probably the register
offsets, and then what comes after it
240
00:23:54,670 --> 00:24:00,160
looks like a value. And then the first bit
seems to be a magic number. Well, it's
241
00:24:00,160 --> 00:24:05,479
not. There is also an extension in the
metadata, saying these are the memory
242
00:24:05,479 --> 00:24:12,170
mapped I/O ranges, and those ranges,
they each list a physical base address,
243
00:24:12,170 --> 00:24:19,360
and a size, and permissions for them. Then
the index in that list does not directly
244
00:24:19,360 --> 00:24:23,150
correspond to the magic value. The magic
value actually you need to do a little
245
00:24:23,150 --> 00:24:27,680
computation on the offset, and you can
access it through those functions. The
246
00:24:27,680 --> 00:24:38,600
computation itself might be familiar.
Yeah, so these are the functions. The
247
00:24:38,600 --> 00:24:44,610
value is a segment selector. So they
actually don't use paging for
248
00:24:44,610 --> 00:24:51,820
inter-process isolation, they use segments, like
x86 Protected Mode segments. And for each
249
00:24:51,820 --> 00:24:56,610
memory mapped I/O range there is a
separate segment, and you manually specify
250
00:24:56,610 --> 00:25:04,280
that, which is just weird to me, like, why
would you use x86 segmenting on a modern
251
00:25:04,280 --> 00:25:10,610
system? Minix does it, but, yeah, to
extend that even to this? Luckily, normal
252
00:25:10,610 --> 00:25:16,130
address space is flat, like, to the
process, not to the kernel. Right, so now
253
00:25:16,130 --> 00:25:24,870
we can access memory mapped I/O. That's
all the, like the really high level stuff.
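Editor's sketch: the "magic" first argument to the MMIO helpers is an x86 segment selector, and the standard selector layout is an index in bits 15..3, a table indicator in bit 2, and a requested privilege level in bits 1..0. The helpers below encode and decode that layout. How the ME maps an entry in the metadata's MMIO range list to a descriptor index, and which descriptor table it uses, is not stated in the talk, so the sample values are illustrative only.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # descriptor table index (bits 15..3)
    table: str   # "GDT" or "LDT" (bit 2, the table indicator)
    rpl: int     # requested privilege level (bits 1..0)


def decode_selector(value):
    """Split a 16-bit x86 segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return Selector(value >> 3, "LDT" if value & 0x4 else "GDT", value & 0x3)


def encode_selector(index, use_ldt, rpl):
    """Build a selector from a descriptor index, table choice, and RPL."""
    return (index << 3) | (0x4 if use_ldt else 0) | (rpl & 0x3)


sample = decode_selector(0x1B)
rebuilt = encode_selector(sample.index, sample.table == "LDT", sample.rpl)
```

Decoding the constants passed to the MMIO helpers this way gives a descriptor index that can then be compared against the order of the MMIO ranges in the metadata.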
254
00:25:24,870 --> 00:25:28,700
So what's going on under there? It's got
all the basic microkernel stuff, so
255
00:25:28,700 --> 00:25:33,020
message passing, and then some
optimizations to actually make it perform
256
00:25:33,020 --> 00:25:40,140
well on a really slow CPU. The basics are,
you can send a message, you can receive a
257
00:25:40,140 --> 00:25:46,160
message, and you can send and receive a
message, where you basically say "Send a
258
00:25:46,160 --> 00:25:50,930
message, wait till a response comes in,
then continue", which is used to wrap
259
00:25:50,930 --> 00:25:58,400
function calls. This is mostly the same as
in Minix. There's some subtle changes,
260
00:25:58,400 --> 00:26:08,230
which I'll get to later. And then memory
grants are something that only appeared in
261
00:26:08,230 --> 00:26:13,080
Minix really recently. It's a way for a
process to basically create a new name for
262
00:26:13,080 --> 00:26:16,690
a piece of memory it has, and give a
different process access to it, just by
263
00:26:16,690 --> 00:26:21,630
sharing the number. These are referred to
by the process ID and a number of that
264
00:26:21,630 --> 00:26:28,470
range. So the process IDs are actually
local per process, so to uniquely identify
265
00:26:28,470 --> 00:26:35,461
one you need to say process ID plus that
number, and they're only granted to a
266
00:26:35,461 --> 00:26:38,300
single process. So when a process creates
one of these, it can't even access it
267
00:26:38,300 --> 00:26:42,490
itself, unless it creates a grant for
itself, which is not really that useful,
268
00:26:42,490 --> 00:26:51,880
usually. These grants are used to prevent
having to copy over all the data inside
269
00:26:51,880 --> 00:26:57,500
the IPC message used to implement a system
call. Yeah, these are the basic operations
270
00:26:57,500 --> 00:27:03,190
on it. You can create one, you can copy
into and from it. So, you can't actually
271
00:27:03,190 --> 00:27:07,010
map it. A process that receives one of
these has to say to the kernel, using a
272
00:27:07,010 --> 00:27:12,721
system call, "please write this data into
that area of memory that belongs to a
273
00:27:12,721 --> 00:27:17,930
different process." And then there's also
indirect grants, because, you know, in
274
00:27:17,930 --> 00:27:25,309
Minix they do have this, but also only
recently, and usually if you have a
275
00:27:25,309 --> 00:27:30,360
microkernel system, you would have to copy
your buffer for a read call first to the
276
00:27:30,360 --> 00:27:36,540
file system server and then back to, like,
either the hard disk driver, or the device
277
00:27:36,540 --> 00:27:40,620
driver that's implementing a device file.
So the ME actually allows you to create a
278
00:27:40,620 --> 00:27:45,860
grant, pointing to a grant, that was given
to you by someone else. And then that
279
00:27:45,860 --> 00:27:52,820
grant will inherit the privileges of the
process that creates it, combined with
280
00:27:52,820 --> 00:27:57,530
those that it assigns to it. So if the
process has a read/write grant it can
281
00:27:57,530 --> 00:28:01,340
create a read-only or write-only grant,
but it cannot, if it only has a read
282
00:28:01,340 --> 00:28:08,860
grant, it cannot add write rights to it
for a different process, obviously. So
283
00:28:08,860 --> 00:28:12,880
then there are also some big differences
from MINIX. In MINIX you address a process
284
00:28:12,880 --> 00:28:18,080
by its process ID or thread ID with a
generation number attached to it. In the
285
00:28:18,080 --> 00:28:25,440
ME you can actually address IPC to a file
descriptor. The kernel doesn't actually know a
286
00:28:25,440 --> 00:28:28,610
lot about file descriptors, it just
implements the basic thing where you have
287
00:28:28,610 --> 00:28:32,350
a list of files and each process has a
list of file descriptors assigning integer
288
00:28:32,350 --> 00:28:39,320
numbers to those files to refer to them
by. And this is used so you can as a
289
00:28:39,320 --> 00:28:43,040
process, you can actually directly talk to
a device driver without knowing what its
290
00:28:43,040 --> 00:28:47,110
process ID is. So you don't send it to the
file system server, you send it to the
291
00:28:47,110 --> 00:28:51,740
file descriptor and the kernel just
magically corrects it for you. And they
292
00:28:51,740 --> 00:28:55,550
moved select into the kernel so you can
tell the kernel: "Hey, I want to wait till
293
00:28:55,550 --> 00:28:59,720
the file system server tells me that it
has data available or till a message comes
294
00:28:59,720 --> 00:29:05,440
in." This is one of the most complicated
system calls the ME offers that's used in
295
00:29:05,440 --> 00:29:12,010
a normal program. You can mostly ignore it
and just look like: "Hey, those arguments
296
00:29:12,010 --> 00:29:16,760
sort of define a file descriptor set as a
bit field." And then there's the message
297
00:29:16,760 --> 00:29:21,040
that might have been received and there's
DMA locks because you don't just want to
298
00:29:21,040 --> 00:29:24,790
write to registers. You actually might
want to do the direct memory access from
299
00:29:24,790 --> 00:29:30,720
hardware, so you can actually tell the
kernel to lock one of these memory grants
300
00:29:30,720 --> 00:29:38,260
in RAM for you, it won't be swapped out
anymore. And yeah, it will even tell you
301
00:29:38,260 --> 00:29:42,020
the physical address so you can just load
that into a register and it's not really
302
00:29:42,020 --> 00:29:46,760
that complicated. Just lock it, get a
physical address, write it into the register
303
00:29:46,760 --> 00:29:53,580
and continue. Well, that's the most
important stuff about the operating
304
00:29:53,580 --> 00:29:58,929
system. The hardware itself is a lot more
complicated because the operating system,
305
00:29:58,929 --> 00:30:03,300
once you have the code, you can just
reverse engineer it and get to know it.
306
00:30:03,300 --> 00:30:11,010
The hardware. Well, let's just say it's a
real pain to have to reverse engineer a
307
00:30:11,010 --> 00:30:16,179
piece of hardware together with its
driver. Like if you've got the driver
308
00:30:16,179 --> 00:30:18,450
code, but you don't know what the
registers do. So you don't know what a lot
309
00:30:18,450 --> 00:30:24,440
of logic does. And you're trying to both
figure out what the logic is and what the
310
00:30:24,440 --> 00:30:30,050
actual registers do. Right. So first you
want to know which physical address goes
311
00:30:30,050 --> 00:30:39,881
where? The metadata listings I showed you
actually have names in there. Those are
312
00:30:39,881 --> 00:30:47,940
not in the metadata files themselves, I
annotated those. So you just see the
313
00:30:47,940 --> 00:30:56,680
physical address and size. But there is
one module, the bus driver module and the
314
00:30:56,680 --> 00:31:04,230
bus driver is a normal user process, but it
implements stuff like PCI configuration
315
00:31:04,230 --> 00:31:09,550
space accesses and those things. And it
has a nice table in it with names for
316
00:31:09,550 --> 00:31:17,049
devices. So if you just run strings on it,
you'll see these things. When I saw this,
317
00:31:17,049 --> 00:31:20,960
I was pretty glad because at least I
could make sense of what device was being
318
00:31:20,960 --> 00:31:26,680
talked to in a certain program. So
the bus driver does all these things. It
319
00:31:26,680 --> 00:31:30,990
manages power getting to devices, it
manages configuration space access, it
320
00:31:30,990 --> 00:31:35,960
manages the different kinds of buses and
IOMMUs that are on the system. And it makes
321
00:31:35,960 --> 00:31:39,500
sure that the normal driver never has to
know any of these details. It just asks
322
00:31:39,500 --> 00:31:45,520
it for a device by a number assigned to it
at build time. And then the bus driver
323
00:31:45,520 --> 00:31:50,360
says, OK, here's a range of physical
address space you can now write to. So
324
00:31:50,360 --> 00:31:56,640
that's a really nice abstraction and also
gives us a lot of information because the
325
00:31:56,640 --> 00:32:01,640
really old builds for Sunrise Point
actually have a hell of a lot of debug
326
00:32:01,640 --> 00:32:07,021
strings in there as printf format strings,
not as catalog IDs. It's
327
00:32:07,021 --> 00:32:11,910
one of the only pieces of code for the ME
that does this, so that already tells you
328
00:32:11,910 --> 00:32:15,480
a lot. And then there's also the table
that I just talked about that has the
329
00:32:15,480 --> 00:32:23,760
actual info on the devices and names. So I
generated some DokuWiki content from this
330
00:32:23,760 --> 00:32:28,570
that I use myself and this is what's in
the table, part of it. So it tells you
331
00:32:28,570 --> 00:32:33,070
what address PCI configuration space lives
at. That tells you the bus, device and
332
00:32:33,070 --> 00:32:38,130
function for it through that. It tells you
on what chipset SKU they're present using
333
00:32:38,130 --> 00:32:44,640
a bitfield. And it tells you their names
in different fields. It also contains the
334
00:32:44,640 --> 00:32:48,540
values that are used to write the base
address registers for PCI. So also their
335
00:32:48,540 --> 00:32:54,190
normal memory ranges. And there's even
more devices. So the ME has access to a
336
00:32:54,190 --> 00:32:58,860
lot of stuff. A lot of it is private to
it. A lot of it is components that also
337
00:32:58,860 --> 00:33:06,110
exist in the rest of the computer. And
there's not a lot of information. A lot of
338
00:33:06,110 --> 00:33:11,410
these are basically all the things that
are out there together with conference
339
00:33:11,410 --> 00:33:15,140
slides published by other people who have
done research on the ME. I didn't have
340
00:33:15,140 --> 00:33:21,980
time to add links to those, but they're
easy to find on Google. I'll get later to
341
00:33:21,980 --> 00:33:28,230
this, I actually wrote an emulator for the
ME, a partial emulator to be able to run
342
00:33:28,230 --> 00:33:34,230
ME code and analyze it, which obviously
needs to know a bit about the hardware so
343
00:33:34,230 --> 00:33:41,030
you can look at that. There are some
files in Intel's debugger package,
344
00:33:41,030 --> 00:33:46,150
specific versions of that that have really
detailed info on some of the devices, also
345
00:33:46,150 --> 00:33:51,460
not all of it. And I wrote a tool to
parse some of the files. It's really rough
346
00:33:51,460 --> 00:33:57,040
code. I published it because people wanted
to see what I was doing. It doesn't work
347
00:33:57,040 --> 00:34:04,080
out of the box. And there is a nice talk
on this by Mark Ermolov and Maxim
348
00:34:04,080 --> 00:34:06,870
Goryachy. Actually I don't know if I'm
pronouncing that correctly, but they've
349
00:34:06,870 --> 00:34:12,049
done a lot of work on the ME and this
particular talk by them is really useful.
350
00:34:12,049 --> 00:34:16,339
And then there's also something else.
There is a second ME on server chipsets,
351
00:34:16,339 --> 00:34:21,299
the Innovation Engine. It's basically a
copy paste of the ME to provide an ME that
352
00:34:21,299 --> 00:34:24,760
the vendor can write code for. Don't think
it's used a lot. I've only been able to
353
00:34:24,760 --> 00:34:31,639
find HP software that actually targets it
and that has some more debug strings, but
354
00:34:31,639 --> 00:34:36,639
also not a lot, it mostly has a table
containing register names, but they're
355
00:34:36,639 --> 00:34:41,869
really abbreviated and for a really small
subset of the devices, there is
356
00:34:41,869 --> 00:34:48,280
documentation out there in a Pentium N and
J series datasheet. It seems like they
357
00:34:48,280 --> 00:34:52,409
compiled their LaTeX code or whatever
with the wrong defines because it doesn't
358
00:34:52,409 --> 00:35:00,350
actually fit into the manual that well,
it's just a section that has like some 20
359
00:35:00,350 --> 00:35:08,640
tables that shouldn't be in there. So this
is from that talk I just referenced and
360
00:35:08,640 --> 00:35:12,609
it's an overview of the Innovation Engine
and the bus bridges and everything in
361
00:35:12,609 --> 00:35:20,070
there. This isn't very precise. So based
on some of those files from System Studio,
362
00:35:20,070 --> 00:35:24,500
I tried to get a better understanding of
this, which is this. This is the entire
363
00:35:24,500 --> 00:35:29,760
chipset. The little DMA block in the top
left corner is what connects to your CPU.
364
00:35:29,760 --> 00:35:36,570
And all of the big blocks with a lot of
ports are bus bridges or switches for
365
00:35:36,570 --> 00:35:45,470
PCI Express-like fabric. So there's a lot
going on. The highlighted area is the
366
00:35:45,470 --> 00:35:59,081
management engine memory space and the
rest of it is like the global chipset. The
367
00:35:59,081 --> 00:36:02,840
things I've highlighted in green here are
on the primary PCI bus. So there's this
368
00:36:02,840 --> 00:36:08,210
weird thing going on where there seems to
be two PCI hierarchies, at least
369
00:36:08,210 --> 00:36:13,741
logically. So in reality it's not even
PCI, but on Intel systems, there's a lot
370
00:36:13,741 --> 00:36:19,600
of stuff that behaves as if it is PCI. So
it has, like, bus, device and function
371
00:36:19,600 --> 00:36:28,650
numbers, PCI configuration space registers
and they have two different roots for the
372
00:36:28,650 --> 00:36:32,310
configuration space. So even though the
configuration space address includes a bus
373
00:36:32,310 --> 00:36:36,480
number, they have two completely different
things, each of which has its
374
00:36:36,480 --> 00:36:41,290
own bus zero. So that's weird also
because they don't make sense when you
375
00:36:41,290 --> 00:36:45,680
look at how the hardware is laid out. So
this is stuff that's on the primary PCI
376
00:36:45,680 --> 00:36:50,780
configuration space that's directly
accessed by the ME, by the Northbridge on
377
00:36:50,780 --> 00:36:55,260
the ME CPU. So that's the Minute IA
system agent. System agent is what Intel
378
00:36:55,260 --> 00:37:00,619
calls a Northbridge nowadays, now that
it's not a separate chip anymore. It's
379
00:37:00,619 --> 00:37:07,530
basically just a Northbridge and a crypto
unit that's on there and the stuff that's
380
00:37:07,530 --> 00:37:12,530
directly attached to Northbridge being the
ROM and the RAM. So the processor itself
381
00:37:12,530 --> 00:37:16,960
is, as I said, derived from a 486, but it
does actually have some more modern
382
00:37:16,960 --> 00:37:21,830
features: it does CPUID, at least on
my systems. Some other researchers said
383
00:37:21,830 --> 00:37:29,369
theirs didn't. It's basically the core
that's in the Quark MCU, which is really
384
00:37:29,369 --> 00:37:33,260
great because it's one of the only cores
made by Intel that has public
385
00:37:33,260 --> 00:37:39,800
documentation on how to do run control. So
breakpoints and accessing registers and
386
00:37:39,800 --> 00:37:44,420
everything over JTAG. Intel doesn't
publish this stuff except for the Quark
387
00:37:44,420 --> 00:37:50,920
MCU, because they were targeting makers.
But they reused that in here, which is
388
00:37:50,920 --> 00:37:58,200
really useful. It even has an official
port to the OpenOCD debugger, which I have
389
00:37:58,200 --> 00:38:03,100
not gotten to test because I don't have a
JTAG probe, which is compatible with Intel
390
00:38:03,100 --> 00:38:11,000
voltage levels and supported by OpenOCD.
It also has, like, CPUID and MSRs.
391
00:38:11,000 --> 00:38:21,170
It has some really fancy features like
branch tracing and some more strict paging
392
00:38:21,170 --> 00:38:30,480
permission enforcement stuff. They don't
use the interrupt pins on this. So it's an
393
00:38:30,480 --> 00:38:34,710
IP block. But there are some files out
there, which is where this screenshot
394
00:38:34,710 --> 00:38:40,601
is from, that actually are used by a
built in logic analyzer Intel has on the
395
00:38:40,601 --> 00:38:46,680
chipset and you can select different
signals on the chip to watch, which is
396
00:38:46,680 --> 00:38:50,900
a really great source of information on
how the IP blocks are laid out and what
397
00:38:50,900 --> 00:38:54,200
signals are in there, because you
basically get a tree view of the IP blocks
398
00:38:54,200 --> 00:39:00,800
in the chip and some of their signals. They
don't use the legacy interrupt system,
399
00:39:00,800 --> 00:39:07,920
they only use message-based interrupts,
where a device writes a value into a
400
00:39:07,920 --> 00:39:13,050
register on the interrupt controller
instead of asserting a pin. And then there
401
00:39:13,050 --> 00:39:21,700
is the Northbridge. It's partially
documented in that data sheet I mentioned,
402
00:39:21,700 --> 00:39:29,020
it does support x86 IO address space, but
it's never used. Everything in the ME is
403
00:39:29,020 --> 00:39:36,600
in memory space or exposed as memory space
through bridges. The Northbridge
404
00:39:36,600 --> 00:39:43,070
implements access to the ROM and RAM, it has an
IOMMU which is only used for transactions
405
00:39:43,070 --> 00:39:48,750
coming from the rest of the system and
it's always initialized to, at least in
406
00:39:48,750 --> 00:39:51,660
the firmware I looked at, it's always
initialized to the inverse of the page
407
00:39:51,660 --> 00:40:00,200
table, so linear addresses can be used for
memory maps, sorry, for DMA. It also does
408
00:40:00,200 --> 00:40:06,270
PCI configuration space access to the
primary PCI bus. And it has a firewall
409
00:40:06,270 --> 00:40:15,080
that allows the operating system to deny
any IP block in the chipset from sending a
410
00:40:15,080 --> 00:40:18,890
completion on the bus request. So it can
actually say: "Hey, I want to read some
411
00:40:18,890 --> 00:40:25,040
register and only these devices are
allowed to send me a value for it." So
412
00:40:25,040 --> 00:40:29,570
they've actually thought about security
here, which is great. Then there is one of
413
00:40:29,570 --> 00:40:38,190
the most important blocks in the ME, which
is the crypto engine. It does some of the
414
00:40:38,190 --> 00:40:47,100
more well-known crypto algorithms. AES,
SHA hashes, RSA and it has a secure key
415
00:40:47,100 --> 00:40:56,330
store, which I'm not gonna [audio dropped]
... all about it in their ME talk at
416
00:40:56,330 --> 00:41:04,250
Black Hat. And a lot of these things have
DMA engines, which all seem to be the
417
00:41:04,250 --> 00:41:09,500
same. And there are no other DMA
engines in the ME, so this is also used for
418
00:41:09,500 --> 00:41:23,170
memory to memory copy or DMA into other
devices. So that's used in a lot of
419
00:41:23,170 --> 00:41:27,400
things. This is actually a diagram which I
don't have the vector for anymore. So
420
00:41:27,400 --> 00:41:35,260
that's why the LibreOffice background is
in there. I'm sorry. So this is basically
421
00:41:35,260 --> 00:41:39,020
what that crypto engine looks like when
you look at that signal tree that I was
422
00:41:39,020 --> 00:41:44,910
talking about earlier. The DMA engines are
both able to do memory to memory copies
423
00:41:44,910 --> 00:41:52,570
and to directly target the crypto unit
they're part of. Basically, when you, I
424
00:41:52,570 --> 00:41:57,490
don't know about the control bits that go
with this, but when you set the target
425
00:41:57,490 --> 00:42:02,150
address to zero and the right control
bits, it will copy into the buffer that's
426
00:42:02,150 --> 00:42:11,960
used for the encryption. So that is how it
accelerates memory access for crypto. And
427
00:42:11,960 --> 00:42:15,590
these are the actual register offsets.
They're the same for all of the DMA
428
00:42:15,590 --> 00:42:21,580
engines in there relative to the base
address of the subunit they're in. And
429
00:42:21,580 --> 00:42:27,290
then there's the second PCI bus or bus
hierarchy, which is like in some places
430
00:42:27,290 --> 00:42:33,540
called the PCI fixed bus. I'm actually not
entirely sure whether this is actually
431
00:42:33,540 --> 00:42:38,840
implemented as a PCI bus as I've drawn it
here, but this is what it behaves like. So
432
00:42:38,840 --> 00:42:43,920
it has all the ME private stuff, that's
not a part of the normal chipset. So it's
433
00:42:43,920 --> 00:42:51,310
timers for the ME, it has the
implementation of the secure enclave
434
00:42:51,310 --> 00:42:58,010
stuff, the firmware TPM registers.
And it has the gen device which I've
435
00:42:58,010 --> 00:43:01,780
mostly ignored because it's only used at
boot time. It's only used by the actual
436
00:43:01,780 --> 00:43:10,869
boot ROM for the ME mostly. It is what the
ME uses to get the fuses Intel burns. So
437
00:43:10,869 --> 00:43:15,420
that's the Intel public key, whether it's
a production or pre-production part, but
438
00:43:15,420 --> 00:43:20,260
it's pretty much a black box. It's not
used that much, fortunately. There is the
439
00:43:20,260 --> 00:43:24,340
IPC block which allows the ME to talk to
the sensor hub, which is a different CPU
440
00:43:24,340 --> 00:43:28,190
in the chipset. It allows it to talk to
power management controller and all kinds
441
00:43:28,190 --> 00:43:34,180
of other embedded CPUs. So it's
inter-processor communication, not inter-process.
442
00:43:34,180 --> 00:43:39,090
Confused me for a bit. And here's the host
embedded controller interface, which is
443
00:43:39,090 --> 00:43:44,320
how the ME talks to the rest of the
computer when it wants the computer to
444
00:43:44,320 --> 00:43:47,960
know that it's talking so it can directly
access a lot of stuff. But when it wants
445
00:43:47,960 --> 00:43:54,250
to send a message to the EFI or to Windows
or Linux, it'll use this. And it also has
446
00:43:54,250 --> 00:43:59,080
status registers, which are really simple
things where the ME writes in a value. And
447
00:43:59,080 --> 00:44:05,290
even if the ME crashes, the host can still
read the value, which is how you can see
448
00:44:05,290 --> 00:44:11,160
whether the ME is running, whether it's
disabled, whether it fully booted, or
449
00:44:11,160 --> 00:44:15,400
whether it crashed halfway through. But at
a point where it could still get the rest
450
00:44:15,400 --> 00:44:21,230
of the computer running and there is some
coreboot code to read it. I've also
451
00:44:21,230 --> 00:44:27,080
implemented some decoding for it on the
emulator because it's useful to see what
452
00:44:27,080 --> 00:44:33,210
those values mean. So then there's
something really interesting, the primary
453
00:44:33,210 --> 00:44:37,240
address translation table, which is the
bus bridge that allows the ME to actually
454
00:44:37,240 --> 00:44:44,200
access the PCI Express fabric of the
computer. For a lot of the, what in this
455
00:44:44,200 --> 00:44:50,010
table are called ME peripherals, that are
actually outside the ME domain in the
456
00:44:50,010 --> 00:45:00,320
chipset, it uses this to access it. It
also uses it to access the UMA, which is
457
00:45:00,320 --> 00:45:04,960
an area of host RAM that's used as a swap
device for the ME and the Trace Hub, which is
458
00:45:04,960 --> 00:45:11,190
the debug port, but also has a couple of
windows which allow the ME to access any
459
00:45:11,190 --> 00:45:19,060
random area of host RAM, which is the most
scary bit because UMA is specified by
460
00:45:19,060 --> 00:45:24,650
host, but the host DRAM area is where you
can just point it anywhere. You can read
461
00:45:24,650 --> 00:45:28,750
or write any value that Windows or
Linux or whatever you're running has
462
00:45:28,750 --> 00:45:37,460
sitting there. So that's scary to me. So
and then there's the rest of it, the rest
463
00:45:37,460 --> 00:45:46,490
of the devices which are behind the
primary ATT. And that's a lot of stuff,
464
00:45:46,490 --> 00:45:53,450
that's debug, that's also all the normal
peripherals that your PC has, but it
465
00:45:53,450 --> 00:45:56,200
also includes things like the power
management controller, which actually
466
00:45:56,200 --> 00:45:59,789
turns on and off all the different parts
of your computer. It controls clocks and
467
00:45:59,789 --> 00:46:07,680
resets. So this is really important. There
is a concept that you'll come across when
468
00:46:07,680 --> 00:46:14,261
you're reading Intel manuals or ME related
stuff, that's root spaces. Besides your
469
00:46:14,261 --> 00:46:20,320
normal addressing information for a PCI
device, it also has a root space number,
470
00:46:20,320 --> 00:46:24,980
which is basically how you have a single
PCI device exposing two completely
471
00:46:24,980 --> 00:46:31,151
different address spaces. And it's 0 for
the host, it's 1 for the ME. Some
472
00:46:31,151 --> 00:46:34,940
devices expose the same information on
there. Other ones behave completely
473
00:46:34,940 --> 00:46:43,370
differently. That's something you don't
usually see. And then there's the side
474
00:46:43,370 --> 00:46:48,560
band fabric. So besides all this stuff
I just covered, which is PCI-like at
475
00:46:48,560 --> 00:46:52,880
least. There is also something completely
different, side band fabric, which is a
476
00:46:52,880 --> 00:47:00,990
completely packet switched network, where
you don't use any memory mapping by
477
00:47:00,990 --> 00:47:06,370
default. You just have a one byte address
for a device and some other addressing
478
00:47:06,370 --> 00:47:09,590
fields and you're just sending a message
saying: "Hey, I want to read configuration
479
00:47:09,590 --> 00:47:14,320
or data or memory." And there is actually
a lot of information out there on this,
480
00:47:14,320 --> 00:47:18,480
because Intel, it seems like, just copy
pasted their internal specification into a
481
00:47:18,480 --> 00:47:26,860
patent. This is how you address it. This
is all devices on there, which is quite a
482
00:47:26,860 --> 00:47:32,590
lot. It's also what you, if any of you are
kernel developers, and you've had to deal
483
00:47:32,590 --> 00:47:40,110
with GPIO on Intel SoCs. There's this P2SB
device that you have to use. That's what
484
00:47:40,110 --> 00:47:48,240
the host uses to access this. Their
documentation on it is really, really bad.
485
00:47:48,240 --> 00:47:52,420
This was all done using static analysis.
But then I wanted to figure out how some
486
00:47:52,420 --> 00:47:57,410
of the logic actually works and it was
really complicated to play around with the
487
00:47:57,410 --> 00:48:07,310
ME. There was this nice talk by Ermolov
and Goryachy, where they said: "You know,
488
00:48:07,310 --> 00:48:11,790
we found an exploit that gives you code
execution and you can get JTAG
489
00:48:11,790 --> 00:48:18,813
access, too." It sounds really nice. It's
actually not that easy. So arbitrary code
490
00:48:18,813 --> 00:48:23,359
execution in the BUP module, they actually
describe their exploit and how you should
491
00:48:23,359 --> 00:48:30,270
use it. But they didn't describe anything
that's needed to actually implement that.
492
00:48:30,270 --> 00:48:35,690
So if you want to do that, what you need
to do is figure out where the stack lives,
493
00:48:35,690 --> 00:48:40,230
you need to know where you need to write a
payload that will actually get it from a
494
00:48:40,230 --> 00:48:44,640
buffer overflow on a stack that, by the
way, uses stack cookies. So you can't just
495
00:48:44,640 --> 00:48:51,369
overwrite the return address to turn that
into an arbitrary write. And you need to
496
00:48:51,369 --> 00:48:56,369
find out what the return pointer address
is so you can overwrite it and find ROP
497
00:48:56,369 --> 00:49:03,320
gadgets because the stack is not
executable. And then when you've done
498
00:49:03,320 --> 00:49:09,920
that, you can just turn on debug access or
change to custom firmware or whatever. So
499
00:49:09,920 --> 00:49:13,660
what I did is I had a bit of trouble
getting that running and in order to test
500
00:49:13,660 --> 00:49:17,720
your payload, you have to flash it into
the system and it takes a while and then
501
00:49:17,720 --> 00:49:20,880
the system just doesn't power on if the
ME's not working, if you're crashing it
502
00:49:20,880 --> 00:49:24,580
instead of getting code execution. So it's
not really viable to develop it that
503
00:49:24,580 --> 00:49:32,910
way, I think. Some people did. I respect
that because it's really, really hard. And
504
00:49:32,910 --> 00:49:38,790
then I wrote this ME Loader, it's called
Loader because at first I started out like
505
00:49:38,790 --> 00:49:42,849
writing it as a sort of a Wine thing where
you would just mmap the right
506
00:49:42,849 --> 00:49:47,380
ranges at the right place and jump into
it, execute it, patch some system calls.
507
00:49:47,380 --> 00:49:51,849
But because the ME is a micro kernel
system and almost every user space program
508
00:49:51,849 --> 00:49:57,480
accesses hardware directly, it ended up
implementing like a good part of the
509
00:49:57,480 --> 00:50:08,080
chipset, at least as stubs or enough logic
to get the code running. And I later on
510
00:50:08,080 --> 00:50:14,510
added some features that actually allowed
it to talk to the hardware. I can use it as a
511
00:50:14,510 --> 00:50:18,530
debugger, but just because it's actually
running the ME firmware or parts of it
512
00:50:18,530 --> 00:50:26,200
inside a normal Linux process, I can just
use gdb to debug it. And back in April
513
00:50:26,200 --> 00:50:30,320
last year, I got that working to the point
where I could run the bootstrap process,
514
00:50:30,320 --> 00:50:38,580
which is where the vulnerability is. And
then you just develop the exploit against
515
00:50:38,580 --> 00:50:43,960
it, which I did. And then I made a mistake
cleaning up some old chroot
516
00:50:43,960 --> 00:50:52,010
environments for closed-source software.
And I nuked my home dir. Yeah. I hadn't
517
00:50:52,010 --> 00:50:56,599
yet pushed everything to GitHub. So I
was stuck with an old version and I decided,
518
00:50:56,599 --> 00:51:00,160
you know, let's refactor this and turn it
into something that might actually at some
519
00:51:00,160 --> 00:51:03,930
point be published, which by the way I
did last summer. This is all public code. The
520
00:51:03,930 --> 00:51:09,790
ME Loader thing. It's on GitHub. And
someone else beat me to it and replicated
521
00:51:09,790 --> 00:51:15,250
that exploit by the Russian guys. Up to
then they had only produced a proof of concept
522
00:51:15,250 --> 00:51:22,760
thing for Apollo Lake chipsets, which was
completely different from what you had
523
00:51:22,760 --> 00:51:33,690
to do for normal ME. I was a bit
disappointed by that one, not being the
524
00:51:33,690 --> 00:51:38,580
first one to actually replicate this. But
then, about a week later, I
525
00:51:38,580 --> 00:51:44,270
got my loader back to the point where I
could actually get to the vulnerable code
526
00:51:44,270 --> 00:51:51,120
and develop that exploit and got it
working not too long after. And here's the
527
00:51:51,120 --> 00:51:54,720
great thing. Then I went to the hacker
space. I flashed it into my laptop. The
528
00:51:54,720 --> 00:51:59,040
image that I had just been using only on
the emulator. I didn't change it. I flashed it.
529
00:51:59,040 --> 00:52:05,280
I was like, this is never gonna work on
it. It works. some laughter And I've still got an image
530
00:52:05,280 --> 00:52:08,480
on a flash chip with me because that's
what I used to actually turn on the
531
00:52:08,480 --> 00:52:14,490
debugger. And then you need a debug probe
because that USB based debugging stuff
532
00:52:14,490 --> 00:52:18,810
that's mentioned here only works pretty
late in boot. Which is also why I only
533
00:52:18,810 --> 00:52:21,880
really see Apollo Lake stuff because on
those chipsets you can actually use this
534
00:52:21,880 --> 00:52:33,010
for the ME. And then you need this thing
because there's a second channel, that is
535
00:52:33,010 --> 00:52:36,360
using the USB plug, but it's a completely
different physical layer and you need an
536
00:52:36,360 --> 00:52:40,911
adapter for it, which I don't think was
intended to be publicly available. Because
537
00:52:40,911 --> 00:52:44,859
if you go to Intel's site and say, I want to
buy this, they say, here's the C-NDA,
538
00:52:44,859 --> 00:52:54,460
please sign it. But it appeared on Mouser.
And luckily I knew some people, who had
539
00:52:54,460 --> 00:52:59,120
done some other stuff, got a nice bounty
for it and bought it and let me use it.
540
00:52:59,120 --> 00:53:05,430
Thanks to them. It's expensive, but you
can buy it if it's still up there. Haven't
541
00:53:05,430 --> 00:53:11,520
checked. That's the link. So I'm a bit
late, so I'm gonna use the time for
542
00:53:11,520 --> 00:53:15,760
questions as well. So the main thing the
ME does that you cannot replace is the
543
00:53:15,760 --> 00:53:21,250
boot process. It's not just breaking the
system if you don't turn it on. It
544
00:53:21,250 --> 00:53:25,240
actually does stuff that has to be done.
So you're gonna have to use the ME anyway if
545
00:53:25,240 --> 00:53:30,730
you want to boot a computer. You don't
necessarily have to use Intel's firmware.
546
00:53:30,730 --> 00:53:35,810
The ME itself boots like a microkernel
system, so it has a process which
547
00:53:35,810 --> 00:53:39,859
implements a lot of the servers that will
allow it to get to a point where it can
548
00:53:39,859 --> 00:53:44,710
start those servers. This process has very
high privileges in older versions, which
549
00:53:44,710 --> 00:53:49,160
is what is being used on these chipsets.
And if you exploit that, you're still ring
550
00:53:49,160 --> 00:53:55,680
3, but you can turn on the debugger and you
can use the debugger to become ring 0. So
551
00:53:55,680 --> 00:53:59,171
this is what the normal boot process for a
computer looks like. And this is what
552
00:53:59,171 --> 00:54:02,050
happens when you use Boot Guard. There's a
bit of code that runs even before the
553
00:54:02,050 --> 00:54:07,170
reset vector, and that's started by
microcode initialization, of course. And this
554
00:54:07,170 --> 00:54:12,120
is what actually happens. The ME loads a
new firmware into a power management
555
00:54:12,120 --> 00:54:16,390
controller, it then readies some stuff in the
chipset and it tells the power management
556
00:54:16,390 --> 00:54:23,660
controller like please stop pulling that
CPU reset pin low and the CPU will start.
557
00:54:23,660 --> 00:54:28,160
The power management controller is a completely
independent thing, an 8051-derived
558
00:54:28,160 --> 00:54:32,690
microcontroller that runs a real time
operating system from the 90s. This is the
559
00:54:32,690 --> 00:54:38,690
only string in the firmware by the way,
that's quoted there. And depending on the
560
00:54:38,690 --> 00:54:42,410
chipset that you have, it's either loaded
with a patch or with a complete binary
561
00:54:42,410 --> 00:54:46,690
from the ME, and it does a lot of
important stuff. No documentation on it
562
00:54:46,690 --> 00:54:52,120
besides the ACPI interface, which is not
really that useful. The ME has to do these
563
00:54:52,120 --> 00:54:58,710
things. It needs to load the keys for the
Boot Guard process, needs to set up clock
564
00:54:58,710 --> 00:55:06,550
controllers and then tell the PMC to turn
on the power to the CPU. It needs to
565
00:55:06,550 --> 00:55:15,240
configure the PCI Express fabric and reset -
like get the CPU to come out of reset.
566
00:55:15,240 --> 00:55:18,290
There's a lot of code involved in this, so
I really didn't want to do this all
567
00:55:18,290 --> 00:55:22,150
statically. What I did is I added hardware
support, hardware passthrough support to
568
00:55:22,150 --> 00:55:28,500
the emulator and booted my laptop that
way. I actually had a video of this, but I
569
00:55:28,500 --> 00:55:33,970
don't have the time to show it, which is a
pity. But this is what I - the bring-up
570
00:55:33,970 --> 00:55:38,030
process from the ME running in a Linux
process, sending whatever hardware accesses
571
00:55:38,030 --> 00:55:43,340
it was trying to do that are important
for boot to the debugger. And then that
572
00:55:43,340 --> 00:55:49,880
was using an ME in real hardware that was
halted to actually do the register accesses
573
00:55:49,880 --> 00:55:56,520
and it works. I'm not going to show this.
It actually booted the computer reliably.
574
00:55:56,520 --> 00:56:02,410
Then Boot Guard configuration is fun
because you know where they say they fuse
575
00:56:02,410 --> 00:56:10,990
in the keys. Well yeah. But the ME loads
them from fuses and then manually loads
576
00:56:10,990 --> 00:56:14,530
them into registers. So if you have code
execution on the ME before it does this,
577
00:56:14,530 --> 00:56:18,000
you can just load your own values and you
can run coreboot even on a machine that
578
00:56:18,000 --> 00:56:24,190
has Boot Guard. Yeah. So I'm gonna go
through this really quickly. This is, by
579
00:56:24,190 --> 00:56:29,570
the way, these are the registers that
configure what security model the CPU is
580
00:56:29,570 --> 00:56:34,579
gonna enforce for the firmware. I'm going
to release this code after my talk. It's
581
00:56:34,579 --> 00:56:39,810
part of a Python script that I wrote that
uses the debugger to start the CPU without
582
00:56:39,810 --> 00:56:45,670
ME firmware. I traced all of what the ME
firmware did. And I now have a Python
583
00:56:45,670 --> 00:56:51,470
script that can just start a computer
without Intel's code. If you translate
584
00:56:51,470 --> 00:56:55,920
this into a ROP sequence or even into a
binary for the ME, you can start a
585
00:56:55,920 --> 00:57:02,850
computer without the ME itself or at least
without it running the operating system.
586
00:57:02,850 --> 00:57:12,710
applause
So, yeah, future goals. I really do want
587
00:57:12,710 --> 00:57:20,420
to share this because if there is a way to
escalate to ring 0 through a ROP chain,
588
00:57:20,420 --> 00:57:24,359
then you could just start your own kernel
in the ME and have custom firmware, at
589
00:57:24,359 --> 00:57:29,600
least from the vulnerability on. But you
could also build a mod chip that uses the
590
00:57:29,600 --> 00:57:34,829
debugger interface to load a new firmware.
There's lots of stuff that still needs to be
591
00:57:34,829 --> 00:57:41,210
discovered, but I'm gonna hang out at the
open source firmware village later, at
592
00:57:41,210 --> 00:57:46,690
least part of the week here, because I
really want to get started on open source
593
00:57:46,690 --> 00:57:55,250
ME firmware using this. Right. And there's
a lot of people that played a role in
594
00:57:55,250 --> 00:58:00,700
getting me to this point. I would also like
to thank the guy from the Hague hackerspace,
595
00:58:00,700 --> 00:58:07,680
BinoAlpha, who basically allowed me to use
his laptop to prepare the demo, which I
596
00:58:07,680 --> 00:58:14,660
ended up not being able to show, but.
Right. I was gonna ask whether there are
597
00:58:14,660 --> 00:58:17,380
any questions, but I don't think
there's really any time for any more.
598
00:58:17,380 --> 00:58:22,570
Herald: Peter, thank you so much. Applause
Unfortunately, we don't have any more time
599
00:58:22,570 --> 00:58:30,720
left.
Peter: I'll be around. I'll be around.
600
00:58:30,720 --> 00:58:35,660
Herald: I think it's very, very
interesting because I hope that your talk
601
00:58:35,660 --> 00:58:41,119
will inspire many people to keep looking
into how the management engine works and
602
00:58:41,119 --> 00:58:46,930
hopefully uncover even more stuff. I think
we have time for just one single question.
603
00:58:46,930 --> 00:58:51,040
I don't know, do we? Ah, one from the
Internet. Thank you so much.
604
00:58:51,040 --> 00:58:56,790
Signal Angel: OK. First off, I have to
tell you. Your shirt is nice. Chat wanted
605
00:58:56,790 --> 00:59:05,000
me to say this. And they asked how
reliable this exploit is and does it work
606
00:59:05,000 --> 00:59:09,160
on every boot?
Peter: Right, Yeah. That's actually
607
00:59:09,160 --> 00:59:14,960
something really important that I forgot
to mention. So they patched the vulnerability,
608
00:59:14,960 --> 00:59:17,339
but they didn't provide downgrade
protection. If you could flash a
609
00:59:17,339 --> 00:59:24,170
vulnerable image with an exploit in it,
it'll just boot every time on these chips
610
00:59:24,170 --> 00:59:27,850
that's, so, sixth or seventh generation chips.
Just put in that image and it will
611
00:59:27,850 --> 00:59:31,230
reliably turn on the debugger every time
you turn on the computer. applause
612
00:59:31,230 --> 00:59:36,650
Herald: Thank you so much for the
question. And Peter Bosch, thank you so
613
00:59:36,650 --> 00:59:39,160
much. Please give him a great round of
applause.
614
00:59:39,160 --> 00:59:43,625
applause
615
00:59:43,625 --> 01:00:08,000
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!