Herald Angel: So, most cloud services rely on closed-source, proprietary server firmware, with known security implications for tenants. That's where LinuxBoot comes to the rescue, because it wants to replace this closed-source firmware with an open, Linux-based version. Our next speaker, Trammell Hudson, is an integral part of that project, and he's here to provide you an overview of the LinuxBoot project. Thank you very much, and please give a warm round of applause to Trammell Hudson!

applause

Trammell: Thank you! Securing the boot process is really fundamental to having secure systems, because vulnerabilities in firmware can undermine any security that the operating system tries to provide. For that reason I think it's really important that we replace the proprietary vendor firmwares with open source, like Linux. And this is not a new idea. My collaborator Ron Minnich started a project called LinuxBIOS back in the 90s when he was at Los Alamos National Labs. They built the world's third-fastest supercomputer out of a Linux cluster that used LinuxBIOS in the ROM to make it more reliable. LinuxBIOS turned into coreboot in 2005; the Linux part was removed and it became a generic bootloader, and it now powers the Chromebooks as well as projects like Heads, the slightly more secure laptop firmware that I presented last year at CCC. Unfortunately it doesn't support any server mainboards anymore.

Most servers are running a variant of Intel's UEFI firmware, a project that Intel started to replace the somewhat aging 16-bit real-mode BIOS of the 80s and 90s. And, like a lot of second systems, it's pretty complicated. If you've been to any talks on firmware security you've probably seen this slide before. The system goes through multiple phases as it boots. The first phase, SEC, does a cryptographic verification of the pre-EFI phase; this PEI phase is responsible for bringing up the memory controller, the CPU interconnect, and a few other critical devices. It also enables paging and long mode, and then jumps into the Driver Execution Environment, or DXE phase. This is where UEFI option ROMs are executed, and where all of the remaining devices are initialized. Once the PCI and USB buses have been walked and enumerated, it transfers to the Boot Device Selection phase, which figures out which disk or USB stick or network to boot from. That loads a bootloader from the chosen device, which eventually loads the real operating system that ends up running on the machine.

What we're proposing is that we replace all of this with the LinuxBoot kernel and runtime. We can do all of the device enumeration in Linux, which already has support for doing this, and then we can use more sophisticated protocols and tools to locate the real kernel that we want to run, and use the kexec system call to start that new kernel. The reason we want to use Linux here is that it gives us the ability to have a more secure system, it gives us a lot more flexibility, and hopefully it lets us create a more resilient system. On the security front, one of the big areas where we get a benefit is reduced attack surface: the drivers in the DXE phase are an enormous amount of code, and on the Intel S2600 there are over 400 modules that get loaded.
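The handoff described above, locating the real kernel and starting it with the kexec system call, can be sketched in a few lines of Go. This is a minimal illustration only, not the actual NERF or u-root code; the kernel and initramfs paths and the command line are hypothetical placeholders, and a real runtime would discover them first.

    // Minimal sketch of a LinuxBoot-style handoff: stage the "real" kernel
    // and jump into it with kexec. Paths and command line are placeholders.
    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        kernel, err := os.Open("/boot/vmlinuz") // hypothetical path
        if err != nil {
            log.Fatalf("open kernel: %v", err)
        }
        initrd, err := os.Open("/boot/initramfs.cpio.gz") // hypothetical path
        if err != nil {
            log.Fatalf("open initrd: %v", err)
        }

        cmdline := "console=ttyS0 root=/dev/sda2 ro" // placeholder command line

        // Stage the new kernel; requires kexec_file_load support in the
        // LinuxBoot kernel (CONFIG_KEXEC_FILE).
        if err := unix.KexecFileLoad(int(kernel.Fd()), int(initrd.Fd()), cmdline, 0); err != nil {
            log.Fatalf("kexec_file_load: %v", err)
        }

        // Jump into the staged kernel, replacing the LinuxBoot kernel.
        if err := unix.Reboot(unix.LINUX_REBOOT_CMD_KEXEC); err != nil {
            log.Fatalf("reboot(kexec): %v", err)
        }
    }

The kexec_file_load call stages the new kernel in memory, and the subsequent reboot call jumps straight into it without going back through the firmware.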
These DXE modules do things like run the option ROMs that I mentioned, and if you want an example of how dangerous option ROMs can be, you can look at my Thunderstrike talks from a few years ago. They also do things like display the boot splash, the vendor logo, and this has been a place where quite a few buffer overflows have been found in vendor firmwares in the past. They have a complete network stack, IPv4 and v6 as well as HTTP and HTTPS. They have legacy device drivers for things like floppy drives, and again, these sorts of dusty corners are where vulnerabilities in Xen have been found that allowed a hypervisor break. There are also modules like the Microsoft OEM activation that we just don't know what they do, or things like a y2k rollover module that probably hasn't been tested in two decades.

The final OS bootloader phase is actually not part of UEFI; on a typical Linux system it's GRUB, the grand unified bootloader. Many of you are probably familiar with its interface, but did you know that it has its own file system, video, and network drivers? Almost 250 thousand lines of code make up GRUB. I don't bring up its size to complain about the space it takes, but because of how much it increases our attack surface. You might think that having three different operating systems involved in this boot process gives us defense in depth, but I would argue that we are subject to the weakest link in this chain: if you can compromise UEFI, you can compromise GRUB, and if you can compromise GRUB you can compromise the Linux kernel that you want to run on the machine.

There are lots of ways these attacks could be launched. As I mentioned, UEFI has a network device driver, GRUB has a network device driver, and of course Linux has a network device driver. This means that a remote attacker could potentially get code execution during the boot process. UEFI has a USB driver, GRUB has a USB driver, and of course Linux has a USB driver. There have been bugs found in USB stacks, which unfortunately are very complex, and a buffer overflow in a USB descriptor handler could allow a local attacker to plug in a rogue device and take control of the firmware during the boot. UEFI has a FAT driver, GRUB has a FAT driver, Linux has a FAT driver. This gives an attacker a place to gain persistence and perhaps leverage code execution during the initial file system or partition walk. So what we argue is that we should use the operating system with the most contributors, the most code review, and the most frequent update schedule for these roles. Linux has a lot more eyes on it, and it undergoes a much more rapid update schedule than pretty much any vendor firmware.

You might ask: why do we keep the PEI and SEC phases from the UEFI firmware? Couldn't we use coreboot in their place? The problem is that vendors are not documenting the memory controller or the CPU interconnect. Instead they're providing an opaque binary blob called the Firmware Support Package, or FSP, that does the memory controller and CPU initialization. On most modern coreboot systems, coreboot actually calls into the FSP to do this initialization. And on a lot of devices the FSP has grown in scope, so it now includes video device drivers and power management, and it's actually larger than the PEI phase on some of the servers that we're dealing with.
The other wrinkle is that most modern CPUs don't come out of reset into the legacy reset vector anymore. Instead, they execute an authenticated code module, called Boot Guard, that's signed by Intel, and the CPU will not start up if it's not present. The good news is that this Boot Guard ACM measures the PEI phase into the TPM, which allows us to detect malicious attempts to modify it. The bad news is that we are not able to change it on many of these systems.

But even with that in place, we still have a much, much more flexible system. If you've ever worked with the UEFI shell or with GRUB's menu configuration, it's really not as flexible, and the tooling is not anywhere near as mature, as being able to write things with shell scripts, or with Go, or with real languages. Additionally, we can configure the LinuxBoot kernel with standard Linux config tools. UEFI supports booting from FAT file systems, but with LinuxBoot we can boot from any of the hundreds of file systems that Linux supports. We can boot from encrypted filesystems, since we have LUKS and cryptsetup. Most UEFI firmwares can only boot from the network device that is installed on the server motherboard. We can boot from any network device that Linux supports, and we can use proper protocols; we're not limited to PXE and TFTP. We can use SSL, and we can do cryptographic measurements of the kernels that we receive.

The runtime that makes up LinuxBoot is also very flexible. Last year I presented the Heads runtime for laptops. This is a very security-focused initial ramdisk that attempts to provide a slightly more secure, measured, and attested firmware, and it works really well with LinuxBoot. My collaborator Ron Minnich is working on a Go-based firmware called NERF, written entirely in just-in-time compiled Go, which is really nice because it gives you memory safety, and it's very popular inside of Google.

Being able to tailor the device drivers that are included also allows the system to boot much faster. UEFI on the Open Compute Winterfell takes about eight minutes to start up. With LinuxBoot and NERF it starts up in about 20 seconds. I found similar results on the Intel mainboard that I'm working on, and hopefully we'll get the video here in action. This is from power-on: it executes the PEI phase out of the ROM and then jumps into a small wrapper around the Linux kernel, which then prints to the serial port. We now have the Linux printk output, and we have an interactive shell in about 20 seconds, which is quite a bit better than the four minutes that the system used to take. It scrolled by pretty fast, but you might have noticed in the printk output that the Linux kernel thinks it's running under EFI. That's because we have a small wrapper around the kernel, but for the most part the kernel is able to do all of the PCI and device enumeration that it needs to do, because it already does that anyway, since it doesn't trust the vendor BIOSes in a lot of cases.

I'm really glad that the Congress has added a track on technical resiliency, and I would encourage Congress to also add a track on resiliency of our social systems, because it's really vital that we deal with both online and offline harassment, and I think that will help us make a safer and more secure Congress as well.
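As a sketch of the network-boot flexibility described above, the following Go fragment fetches a kernel over HTTPS and checks its SHA-256 digest before it would be handed to kexec as in the earlier sketch. The URL, the local path, and the expected digest are hypothetical placeholders; a real deployment would pin them in the firmware image or obtain them from a trusted source.

    // Minimal sketch: download a kernel over TLS and verify its digest.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "io"
        "log"
        "net/http"
        "os"
    )

    const (
        kernelURL    = "https://boot.example.com/vmlinuz" // hypothetical URL
        expectedHash = "9f86d081884c7d659a2feaa0c55ad015" +
            "a3bf4f1b2b0b822cd15d6c15b0f00a08" // hypothetical pinned digest
    )

    func main() {
        resp, err := http.Get(kernelURL) // TLS verification via Go's default transport
        if err != nil {
            log.Fatalf("fetch kernel: %v", err)
        }
        defer resp.Body.Close()

        out, err := os.Create("/tmp/vmlinuz")
        if err != nil {
            log.Fatalf("create: %v", err)
        }
        defer out.Close()

        // Hash the kernel while writing it to disk.
        h := sha256.New()
        if _, err := io.Copy(io.MultiWriter(out, h), resp.Body); err != nil {
            log.Fatalf("download: %v", err)
        }

        got := hex.EncodeToString(h.Sum(nil))
        if got != expectedHash {
            log.Fatalf("kernel hash mismatch: got %s", got)
        }
        log.Printf("kernel verified, ready to kexec")
    }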
applause

So last year, when I presented Heads, I proposed three criteria for a resilient technical system: it needs to be built with open-source software, it needs to be reproducibly built, and it needs to be measured into some sort of cryptographic hardware.

Open source is, I think, not controversial in this crowd. But the reason that we need it is that a lot of the server vendors don't actually control their own firmware; they license it from independent BIOS vendors who then tailor it for whatever current model of machine the manufacturer is making. This means that they typically don't support older hardware and, if there are vulnerabilities, we need to be able to make patches on our own schedule; we need to be able to help ourselves when it comes to our own security. The other problem is that closed-source systems can hide vulnerabilities for decades. This is especially true for very privileged devices like the Management Engine; there have been several talks here at Congress about the concerns that we have with the Management Engine. Some vendors are even violating our trust entirely and using their place in the firmware to install malware or adware onto the systems. So for these reasons we really need our own control over this firmware.

Reproducibility is becoming much more of an issue, and the goal here is to ensure that everyone who builds the LinuxBoot firmware gets exactly the same result as everyone else. This is a requirement for ensuring that we're not introducing accidental vulnerabilities by picking up the wrong library, or intentional ones through compiler supply-chain attacks, such as the one in Ken Thompson's Trusting Trust article. With the LinuxBoot firmware, our kernel and initial ramdisk are reproducibly built, so we get exactly the same hashes on the firmware. Unfortunately we don't control the UEFI portions that we're using, the PEI and SEC phases, so those aren't included in our reproducibility right now.

"Measured" is another place where we need to take into account the runtime security of the system. Reproducible builds handle compile time, but measuring what's running into a cryptographic coprocessor, like the TPM, gives us the ability to make attestations about what is actually running on the system. In the Heads firmware we use this so that the firmware can prove to the user that it has not been tampered with, by producing a one-time secret that you can compare against your phone. In the server case we use remote attestation to prove to the user that the code that is running is what they expect. This is a collaboration with the Mass Open Cloud project, out of Boston University and MIT, which is attempting to provide a hardware root of trust for the servers, so that you can know that a cloud provider has not tampered with your system. The TPM is not invulnerable, as Christopher Tarnovsky showed at DEFCON, but the level of effort that it takes to break into a TPM, to decap it, and to read out the bits with a microscope raises the bar really significantly. And part of resiliency is making honest trade-offs about security threats versus the difficulty of launching the attacks, and if the TPM prevents remote attacks or software-only attacks, that is a sufficiently high bar for a lot of these applications.

We have quite a bit of ongoing research with this.
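To illustrate the measurement step, here is a small Go sketch that hashes a received kernel and extends the digest into a TPM PCR by invoking tpm2_pcrextend from the tpm2-tools package. The choice of PCR 8, the file path, and the reliance on tpm2-tools are assumptions for illustration only; they are not how any particular LinuxBoot runtime necessarily does it, and a runtime could just as well speak to the TPM directly.

    // Minimal sketch: measure a downloaded kernel into a TPM PCR so that a
    // remote verifier can later check the value. Assumes tpm2-tools is in
    // the runtime image; PCR index and path are arbitrary choices here.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "log"
        "os"
        "os/exec"
    )

    func main() {
        kernel, err := os.ReadFile("/tmp/vmlinuz") // hypothetical path from the fetch step
        if err != nil {
            log.Fatalf("read kernel: %v", err)
        }

        digest := sha256.Sum256(kernel)
        measurement := hex.EncodeToString(digest[:])

        // Extend PCR 8 (an arbitrary choice) with the kernel's digest.
        arg := fmt.Sprintf("8:sha256=%s", measurement)
        if out, err := exec.Command("tpm2_pcrextend", arg).CombinedOutput(); err != nil {
            log.Fatalf("tpm2_pcrextend failed: %v (%s)", err, out)
        }

        log.Printf("extended PCR 8 with kernel measurement %s", measurement)
    }

A remote verifier can later request a signed quote over that PCR to confirm which kernel was booted, which is in line with the remote attestation approach described above.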
As I mentioned, the Management Engine is an area of great concern, and we are working on figuring out how to remove most of its capabilities, so that it's not able to interfere with the running system. There's another device on most server motherboards called the baseboard management controller, or BMC, that has a similar level of access to memory and devices. We're concerned about what's running on there, and there's a project out of Facebook called OpenBMC, an open-source Linux distribution that runs on that coprocessor. What Facebook has done through the Open Compute Initiative is have their OEMs pre-install it on the new Open Compute nodes, switches, and storage systems. And this is really where we need to get with LinuxBoot as well. Right now it requires physical access to the SPI flash and a hardware programmer to install. That's not a hurdle for everyone, but it's not something that we want people to be doing in their server rooms. We want OEMs to provide these systems secure by default, so that it's not necessary to break out your chip clip to make this happen.

But if you do want to contribute, right now we support three different mainboards: the Intel S2600, which is a modern Wolf Pass; the Dell R630, which the Mass Open Cloud is working with and which is a Haswell, I believe; and the Open Compute hardware that Ron Minnich and John Murrie are working on, which, in conjunction with OpenBMC, is a real potential for having free software in our firmware again. So, if you'd like more info, we have a website. There are some install instructions, and we'd love to help you build more secure, more flexible, and more resilient systems. I really want to thank everyone for coming here today, and I'd love to answer any questions that you might have!

applause

Herald: Thank you very much, Trammell Hudson, for this talk. We have 10 minutes for Q&A, so please line up at the microphones if you have any questions. There are no questions from the Signal Angel and the internet yet, so please, microphone number one.

Q: One quick question, in two parts: is Two Sigma using this for any of their internal systems? And how much vendor outreach is there to get this adopted beyond just Open Compute, by the vendors that were on your slides?

A: Currently we don't have any deployed systems taking advantage of it; it's still very much at the research stage. I've been spending quite a bit of time visiting OEMs, and one of my goals for 2018 is to have a mainstream OEM shipping it. The Heads project is shipping firmware on some laptops from Librem, and I'm hoping we can get LinuxBoot on servers as well.

Herald: Microphone number 2, please.

Q: The question I have is about the size of Linux. You mention that there are problems with UEFI, that it's not open source, and so on. But the main part of UEFI is EDK, which is open source, and I have to guess that the HTTP client and such, in the Apple boot I assume it was, is for downloading their firmware. But how is replacing something that's huge with something that's even bigger going to make the thing more secure? Because I think the whole point of having a security kernel is to have it really small, so it's verifiable, and I don't see that happening with Linux, because at the same time people are coming up with other things.
I don't remember the name of the other hypervisor, which is supposed to be better than KVM, because KVM is not really verifiable.

A: That's a great question. The concern is that Linux is a huge TCB, a Trusted Computing Base, and that is a big concern. Since we're already running Linux on the server, it essentially is inside our TCB already; yes, it is large, and it is difficult to verify. However, the lessons that we've learned in porting Linux to run in this environment make it very conceivable that we could bring in other systems. If you want to use a verified microkernel, that would be a great thing to bring into the firmware, and I'd love to figure out some way to make that happen. On the second point, just to point out: even though EDK 2, the open-source component of UEFI, is open source, there's a huge amount of closed source that goes into building a UEFI firmware, and we can't verify the closed-source part. And even the open-source parts don't have the level of inspection and correctness that the Linux kernel has gone through, with Linux systems exposed on the internet; most UEFI development is not focused on the level of defense that Linux has to deal with every day.

H: Microphone number 2, please.

Q: Thank you for your talk. Would it be possible to also support laptops, apart from servers? Especially the ones locked down by Boot Guard?

A: The issue with Boot Guard on laptops is that the CPU fuses are typically set to what's called Verified Boot mode, and the Boot Guard ACM will not exit if the firmware does not match the manufacturer's hash. So this doesn't give us any way to circumvent that. Most server chipsets are set to what's called Measured Boot mode, where the Boot Guard ACM just measures the next stage into the TPM and then jumps into it. So if an attacker has modified the firmware, you will be able to detect it during the attestation phase.

H: Microphone number one, please. Just one question.

Q: Thank you. On ARM it's much faster to boot something. It's also much simpler: you have an address, you load the bin file, and it boots. On x86 it's much more complex, and the amount of code you showed for GRUB relates to that. So my question: I've seen Allwinner boards, Cortex-A8, booting in four seconds just to get a shell, and six seconds to get a Qt app, a Linux kernel plus a Qt app to do a dashboard for a car; so five to six seconds. I'm wondering why there is such a big difference for a server, taking 20 or 22 seconds. Is it the peripherals that need to be initialized, or what's the reason for it?

A: There are several things that contribute to the 20 seconds, and one of the things that we're looking into is trying to profile that. We're able to swap out the PEI core and turn on a lot of debugging. What I've seen on the Dell system is that a lot of that time is spent waiting for the Management Engine to come online, and then there also appears to be a one-second timeout for every CPU in the system: they bring the CPUs online one at a time, and it takes almost precisely one million microseconds for each one. So there are things in the vendor firmware that we currently don't have the ability to change, and those appear to be the long pole in the tent in the boot process.

H: Microphone 3 in the back, please.
Q: You addressed a lot about security, but my question is more about the fact that there are a lot of settings, for example BIOS settings, UEFI settings, and things like remote booting, which is a whole bunch of weird, proprietary protocols that are really hard to handle. If you have a large installation, for example, you can't just say: okay, deploy all my boot orders and BIOS settings. Are you going to address that in some unified, nice way, where I can say, okay, I have this one protocol that runs on my Linux firmware and does that nicely?

A: That's exactly how most sites will deploy it: they will write their own boot scripts that use normal protocols. In the Mass Open Cloud they are doing a wget over SSL, then measuring the received kernel into the TPM, and then kexec'ing it. And that's done without requiring changes to NVRAM variables, or all the sort of setup that you have to go through to configure a UEFI system. All of that can be replaced with a very small shell script.

H: We have time for one last question, and this is from the Signal Angel, because the internet has a question.

Q: Yes, the internet has two very simple technical questions. Do you know of any progress, or any ETAs, on the Talos II project? And are there any size concerns when writing firmware in Go?

A: The Talos II project is a POWER-based system, and right now we're mostly focused on the x86 servers, since those are the mainstream, widely available boards. And the Go firmware is actually quite small. I've mostly been working on the Heads side, which is based on shell scripts, but my understanding is that the just-in-time compiled Go does not add more than a few hundred kilobytes to the ROM image and only a few hundred milliseconds to the boot time. The advantage of Go is that it is memory safe, and it's an actual programming language, so it allows the initialization scripts to be verified in a way that can be very difficult with shell scripts.

H: So thank you very much for answering all these questions. Please give a warm round of applause to Trammell Hudson. Thank you very much!

applause

postroll music