Herald Angel: So, most cloud services rely on closed-source, proprietary server firmware, with known security implications for tenants. That's where LinuxBoot comes to the rescue, because it wants to replace this closed-source firmware with an open, Linux-based version. Our next speaker, Trammell Hudson, is an integral part of that project, and he's here to provide you an overview of the LinuxBoot project. Thank you very much, and please give a warm round of applause to Trammell Hudson!

applause

Trammell: Thank you! Securing the boot process is really fundamental to having secure systems, because vulnerabilities in firmware can undermine any security that the operating system tries to provide. For that reason I think it's really important that we replace the proprietary vendor firmwares with open source, like Linux. And this is not a new idea. My collaborator Ron Minnich started a project called LinuxBIOS back in the 90s when he was at Los Alamos National Labs. They built the world's third-fastest supercomputer out of a Linux cluster that used LinuxBIOS in the ROM to make it more reliable. LinuxBIOS turned into coreboot in 2005; the Linux part was removed and it became a generic bootloader, and it now powers the Chromebooks as well as projects like Heads, the slightly more secure laptop firmware that I presented last year at CCC. Unfortunately it doesn't support any server mainboards anymore.

Most servers are running a variant of Intel's UEFI firmware, a project that Intel started to replace the somewhat aging 16-bit real-mode BIOS of the 80s and 90s. And, like a lot of second systems, it's pretty complicated. If you've been to any talks on firmware security you've probably seen this slide before. The system goes through multiple phases as it boots. The first phase, SEC, does a cryptographic verification of the pre-EFI phase; this PEI phase is responsible for bringing up the memory controller, the CPU interconnect, and a few other critical devices. It also enables paging and long mode, and then jumps into the Driver Execution Environment, or DXE phase. This is where UEFI option ROMs are executed, and where all of the remaining devices are initialized. Once the PCI and USB buses have been walked and enumerated, it transfers to the Boot Device Selection phase, which figures out which disk or USB stick or network to boot from. That loads a bootloader from the chosen device, which eventually loads the real operating system that ends up running on the machine.

What we're proposing is that we replace all of this with the LinuxBoot kernel and runtime. We can do all of the device enumeration in Linux, which already has support for doing this, and then we can use more sophisticated protocols and tools to locate the real kernel that we want to run, and use the kexec system call to start that new kernel. The reason we want to use Linux here is that it gives us the ability to have a more secure system, it gives us a lot more flexibility, and hopefully it lets us create a more resilient system. On the security front, one of the big areas where we get a benefit is reduced attack surface: the drivers in the DXE phase are an enormous amount of code, and on the Intel S2600 there are over 400 modules that get loaded.
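The handoff described above, locating the real kernel and starting it with the kexec system call, can be sketched in a few lines of Go. This is a minimal illustration only, not the actual NERF or u-root code; the kernel and initramfs paths and the command line are hypothetical placeholders, and a real runtime would discover them first.

    // Minimal sketch of a LinuxBoot-style handoff: stage the "real" kernel
    // and jump into it with kexec. Paths and command line are placeholders.
    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        kernel, err := os.Open("/boot/vmlinuz") // hypothetical path
        if err != nil {
            log.Fatalf("open kernel: %v", err)
        }
        initrd, err := os.Open("/boot/initramfs.cpio.gz") // hypothetical path
        if err != nil {
            log.Fatalf("open initrd: %v", err)
        }

        cmdline := "console=ttyS0 root=/dev/sda2 ro" // placeholder command line

        // Stage the new kernel; requires kexec_file_load support in the
        // LinuxBoot kernel (CONFIG_KEXEC_FILE).
        if err := unix.KexecFileLoad(int(kernel.Fd()), int(initrd.Fd()), cmdline, 0); err != nil {
            log.Fatalf("kexec_file_load: %v", err)
        }

        // Jump into the staged kernel, replacing the LinuxBoot kernel.
        if err := unix.Reboot(unix.LINUX_REBOOT_CMD_KEXEC); err != nil {
            log.Fatalf("reboot(kexec): %v", err)
        }
    }

The kexec_file_load call stages the new kernel in memory, and the subsequent reboot call jumps straight into it without going back through the firmware.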
These DXE modules do things like run the option ROMs that I mentioned, and if you want an example of how dangerous option ROMs can be, you can look at my Thunderstrike talks from a few years ago. They also do things like display the boot splash, the vendor logo, and this has been a place where quite a few buffer overflows have been found in vendor firmwares in the past. They have a complete network stack, IPv4 and v6 as well as HTTP and HTTPS. They have legacy device drivers for things like floppy drives, and again, these sorts of dusty corners are where vulnerabilities in Xen have been found that allowed a hypervisor break. There are also modules like the Microsoft OEM activation that we just don't know what they do, or things like a y2k rollover module that probably hasn't been tested in two decades.

The final OS bootloader phase is actually not part of UEFI; on a typical Linux system it's GRUB, the grand unified bootloader. Many of you are probably familiar with its interface, but did you know that it has its own file system, video, and network drivers? Almost 250 thousand lines of code make up GRUB. I don't bring up its size to complain about the space it takes, but because of how much it increases our attack surface. You might think that having three different operating systems involved in this boot process gives us defense in depth, but I would argue that we are subject to the weakest link in this chain: if you can compromise UEFI, you can compromise GRUB, and if you can compromise GRUB you can compromise the Linux kernel that you want to run on the machine.

There are lots of ways these attacks could be launched. As I mentioned, UEFI has a network device driver, GRUB has a network device driver, and of course Linux has a network device driver. This means that a remote attacker could potentially get code execution during the boot process. UEFI has a USB driver, GRUB has a USB driver, and of course Linux has a USB driver. There have been bugs found in USB stacks, which unfortunately are very complex, and a buffer overflow in a USB descriptor handler could allow a local attacker to plug in a rogue device and take control of the firmware during the boot. UEFI has a FAT driver, GRUB has a FAT driver, Linux has a FAT driver. This gives an attacker a place to gain persistence and perhaps leverage code execution during the initial file system or partition walk. So what we argue is that we should use the operating system with the most contributors, the most code review, and the most frequent update schedule for these roles. Linux has a lot more eyes on it, and it undergoes a much more rapid update schedule than pretty much any vendor firmware.

You might ask: why do we keep the PEI and SEC phases from the UEFI firmware? Couldn't we use coreboot in their place? The problem is that vendors are not documenting the memory controller or the CPU interconnect. Instead they're providing an opaque binary blob called the Firmware Support Package, or FSP, that does the memory controller and CPU initialization. On most modern coreboot systems, coreboot actually calls into the FSP to do this initialization. And on a lot of devices the FSP has grown in scope, so it now includes video device drivers and power management, and it's actually larger than the PEI phase on some of the servers that we're dealing with.
The other wrinkle is that most modern CPUs don't come out of reset into the legacy reset vector anymore. Instead, they execute an authenticated code module, called Boot Guard, that's signed by Intel, and the CPU will not start up if it's not present. The good news is that this Boot Guard ACM measures the PEI phase into the TPM, which allows us to detect malicious attempts to modify it. The bad news is that we are not able to change it on many of these systems.

But even with that in place, we still have a much, much more flexible system. If you've ever worked with the UEFI shell or with GRUB's menu configuration, it's really not as flexible, and the tooling is not anywhere near as mature, as being able to write things with shell scripts, or with Go, or with real languages. Additionally, we can configure the LinuxBoot kernel with standard Linux config tools. UEFI supports booting from FAT file systems, but with LinuxBoot we can boot from any of the hundreds of file systems that Linux supports. We can boot from encrypted filesystems, since we have LUKS and cryptsetup. Most UEFI firmwares can only boot from the network device that is installed on the server motherboard. We can boot from any network device that Linux supports, and we can use proper protocols; we're not limited to PXE and TFTP. We can use SSL, and we can do cryptographic measurements of the kernels that we receive.

The runtime that makes up LinuxBoot is also very flexible. Last year I presented the Heads runtime for laptops. This is a very security-focused initial ramdisk that attempts to provide a slightly more secure, measured, and attested firmware, and it works really well with LinuxBoot. My collaborator Ron Minnich is working on a Go-based firmware called NERF, written entirely in just-in-time compiled Go, which is really nice because it gives you memory safety, and it's very popular inside of Google.

Being able to tailor the device drivers that are included also allows the system to boot much faster. UEFI on the Open Compute Winterfell takes about eight minutes to start up. With LinuxBoot and NERF it starts up in about 20 seconds. I found similar results on the Intel mainboard that I'm working on, and hopefully we'll get the video here in action. This is from power-on: it executes the PEI phase out of the ROM and then jumps into a small wrapper around the Linux kernel, which then prints to the serial port. We now have the Linux printk output, and we have an interactive shell in about 20 seconds, which is quite a bit better than the four minutes that the system used to take. It scrolled by pretty fast, but you might have noticed in the printk output that the Linux kernel thinks it's running under EFI. That's because we have a small wrapper around the kernel, but for the most part the kernel is able to do all of the PCI and device enumeration that it needs to do, because it already does that anyway, since it doesn't trust the vendor BIOSes in a lot of cases.

I'm really glad that the Congress has added a track on technical resiliency, and I would encourage Congress to also add a track on resiliency of our social systems, because it's really vital that we deal with both online and offline harassment, and I think that will help us make a safer and more secure Congress as well.
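As a sketch of the network-boot flexibility described above, the following Go fragment fetches a kernel over HTTPS and checks its SHA-256 digest before it would be handed to kexec as in the earlier sketch. The URL, the local path, and the expected digest are hypothetical placeholders; a real deployment would pin them in the firmware image or obtain them from a trusted source.

    // Minimal sketch: download a kernel over TLS and verify its digest.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "io"
        "log"
        "net/http"
        "os"
    )

    const (
        kernelURL    = "https://boot.example.com/vmlinuz" // hypothetical URL
        expectedHash = "9f86d081884c7d659a2feaa0c55ad015" +
            "a3bf4f1b2b0b822cd15d6c15b0f00a08" // hypothetical pinned digest
    )

    func main() {
        resp, err := http.Get(kernelURL) // TLS verification via Go's default transport
        if err != nil {
            log.Fatalf("fetch kernel: %v", err)
        }
        defer resp.Body.Close()

        out, err := os.Create("/tmp/vmlinuz")
        if err != nil {
            log.Fatalf("create: %v", err)
        }
        defer out.Close()

        // Hash the kernel while writing it to disk.
        h := sha256.New()
        if _, err := io.Copy(io.MultiWriter(out, h), resp.Body); err != nil {
            log.Fatalf("download: %v", err)
        }

        got := hex.EncodeToString(h.Sum(nil))
        if got != expectedHash {
            log.Fatalf("kernel hash mismatch: got %s", got)
        }
        log.Printf("kernel verified, ready to kexec")
    }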
applause

So last year, when I presented Heads, I proposed three criteria for a resilient technical system: it needs to be built with open-source software, it needs to be reproducibly built, and it needs to be measured into some sort of cryptographic hardware.

Open source is, I think, not controversial in this crowd. But the reason that we need it is that a lot of the server vendors don't actually control their own firmware; they license it from independent BIOS vendors who then tailor it for whatever current model of machine the manufacturer is making. This means that they typically don't support older hardware and, if there are vulnerabilities, we need to be able to make patches on our own schedule; we need to be able to help ourselves when it comes to our own security. The other problem is that closed-source systems can hide vulnerabilities for decades. This is especially true for very privileged devices like the Management Engine; there have been several talks here at Congress about the concerns that we have with the Management Engine. Some vendors are even violating our trust entirely and using their place in the firmware to install malware or adware onto the systems. So for these reasons we really need our own control over this firmware.

Reproducibility is becoming much more of an issue, and the goal here is to ensure that everyone who builds the LinuxBoot firmware gets exactly the same result as everyone else. This is a requirement for ensuring that we're not introducing accidental vulnerabilities by picking up the wrong library, or intentional ones through compiler supply-chain attacks, such as the one in Ken Thompson's Trusting Trust article. With the LinuxBoot firmware, our kernel and initial ramdisk are reproducibly built, so we get exactly the same hashes on the firmware. Unfortunately we don't control the UEFI portions that we're using, the PEI and SEC phases, so those aren't included in our reproducibility right now.

"Measured" is another place where we need to take into account the runtime security of the system. Reproducible builds handle compile time, but measuring what's running into a cryptographic coprocessor, like the TPM, gives us the ability to make attestations about what is actually running on the system. In the Heads firmware we use this so that the firmware can prove to the user that it has not been tampered with, by producing a one-time secret that you can compare against your phone. In the server case we use remote attestation to prove to the user that the code that is running is what they expect. This is a collaboration with the Mass Open Cloud project, out of Boston University and MIT, which is attempting to provide a hardware root of trust for the servers, so that you can know that a cloud provider has not tampered with your system. The TPM is not invulnerable, as Christopher Tarnovsky showed at DEFCON, but the level of effort that it takes to break into a TPM, to decap it, and to read out the bits with a microscope raises the bar really significantly. And part of resiliency is making honest trade-offs about security threats versus the difficulty of launching the attacks, and if the TPM prevents remote attacks or software-only attacks, that is a sufficiently high bar for a lot of these applications.

We have quite a bit of ongoing research with this.
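To illustrate the measurement step, here is a small Go sketch that hashes a received kernel and extends the digest into a TPM PCR by invoking tpm2_pcrextend from the tpm2-tools package. The choice of PCR 8, the file path, and the reliance on tpm2-tools are assumptions for illustration only; they are not how any particular LinuxBoot runtime necessarily does it, and a runtime could just as well speak to the TPM directly.

    // Minimal sketch: measure a downloaded kernel into a TPM PCR so that a
    // remote verifier can later check the value. Assumes tpm2-tools is in
    // the runtime image; PCR index and path are arbitrary choices here.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "log"
        "os"
        "os/exec"
    )

    func main() {
        kernel, err := os.ReadFile("/tmp/vmlinuz") // hypothetical path from the fetch step
        if err != nil {
            log.Fatalf("read kernel: %v", err)
        }

        digest := sha256.Sum256(kernel)
        measurement := hex.EncodeToString(digest[:])

        // Extend PCR 8 (an arbitrary choice) with the kernel's digest.
        arg := fmt.Sprintf("8:sha256=%s", measurement)
        if out, err := exec.Command("tpm2_pcrextend", arg).CombinedOutput(); err != nil {
            log.Fatalf("tpm2_pcrextend failed: %v (%s)", err, out)
        }

        log.Printf("extended PCR 8 with kernel measurement %s", measurement)
    }

A remote verifier can later request a signed quote over that PCR to confirm which kernel was booted, which is in line with the remote attestation approach described above.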
As I mentioned, the Management Engine is an area of great concern, and we are working on figuring out how to remove most of its capabilities, so that it's not able to interfere with the running system. There's another device on most server motherboards called the baseboard management controller, or BMC, that has a similar level of access to memory and devices. We're concerned about what's running on there, and there's a project out of Facebook called OpenBMC, an open-source Linux distribution that runs on that coprocessor. What Facebook has done through the Open Compute Initiative is have their OEMs pre-install it on the new Open Compute nodes, switches, and storage systems. And this is really where we need to get with LinuxBoot as well. Right now it requires physical access to the SPI flash and a hardware programmer to install. That's not a hurdle for everyone, but it's not something that we want people to be doing in their server rooms. We want OEMs to provide these systems secure by default, so that it's not necessary to break out your chip clip to make this happen.

But if you do want to contribute, right now we support three different mainboards: the Intel S2600, which is a modern Wolf Pass; the Dell R630, which the Mass Open Cloud is working with and which is a Haswell, I believe; and the Open Compute hardware that Ron Minnich and John Murrie are working on, which, in conjunction with OpenBMC, is a real potential for having free software in our firmware again. So, if you'd like more info, we have a website. There are some install instructions, and we'd love to help you build more secure, more flexible, and more resilient systems. I really want to thank everyone for coming here today, and I'd love to answer any questions that you might have!

applause

Herald: Thank you very much, Trammell Hudson, for this talk. We have 10 minutes for Q&A, so please line up at the microphones if you have any questions. There are no questions from the Signal Angel and the internet yet, so please, microphone number one.

Q: One quick question, in two parts: is Two Sigma using this for any of their internal systems? And how much vendor outreach is there to get this adopted beyond just Open Compute, by the vendors that were on your slides?

A: Currently we don't have any deployed systems taking advantage of it; it's still very much at the research stage. I've been spending quite a bit of time visiting OEMs, and one of my goals for 2018 is to have a mainstream OEM shipping it. The Heads project is shipping firmware on some laptops from Librem, and I'm hoping we can get LinuxBoot on servers as well.

Herald: Microphone number 2, please.

Q: The question I have is about the size of Linux. You mention that there are problems with UEFI, that it's not open source, and so on. But the main part of UEFI is EDK, which is open source, and I have to guess that the HTTP client and such, in the Apple boot I assume it was, is for downloading their firmware. But how is replacing something that's huge with something that's even bigger going to make the thing more secure? Because I think the whole point of having a security kernel is to have it really small, so it's verifiable, and I don't see that happening with Linux, because at the same time people are coming up with other things.
I don't remember the name of the other hypervisor, which is supposed to be better than KVM, because KVM is not really verifiable.

A: That's a great question. The concern is that Linux is a huge TCB, a Trusted Computing Base, and that is a big concern. Since we're already running Linux on the server, it essentially is inside our TCB already; yes, it is large, and it is difficult to verify. However, the lessons that we've learned in porting Linux to run in this environment make it very conceivable that we could bring in other systems. If you want to use a verified microkernel, that would be a great thing to bring into the firmware, and I'd love to figure out some way to make that happen. On the second point, just to point out: even though EDK 2, the open-source component of UEFI, is open source, there's a huge amount of closed source that goes into building a UEFI firmware, and we can't verify the closed-source part. And even the open-source parts don't have the level of inspection and correctness that the Linux kernel has gone through, with Linux systems exposed on the internet; most UEFI development is not focused on the level of defense that Linux has to deal with every day.

H: Microphone number 2, please.

Q: Thank you for your talk. Would it be possible to also support laptops, apart from servers? Especially the ones locked down by Boot Guard?

A: The issue with Boot Guard on laptops is that the CPU fuses are typically set to what's called Verified Boot mode, and the Boot Guard ACM will not exit if the firmware does not match the manufacturer's hash. So this doesn't give us any way to circumvent that. Most server chipsets are set to what's called Measured Boot mode, where the Boot Guard ACM just measures the next stage into the TPM and then jumps into it. So if an attacker has modified the firmware, you will be able to detect it during the attestation phase.

H: Microphone number one, please. Just one question.

Q: Thank you. On ARM it's much faster to boot something. It's also much simpler: you have an address, you load the bin file, and it boots. On x86 it's much more complex, and the amount of code you showed for GRUB relates to that. So my question: I've seen Allwinner boards, Cortex-A8, booting in four seconds just to get a shell, and six seconds to get a Qt app, a Linux kernel plus a Qt app to do a dashboard for a car; so five to six seconds. I'm wondering why there is such a big difference for a server, taking 20 or 22 seconds. Is it the peripherals that need to be initialized, or what's the reason for it?

A: There are several things that contribute to the 20 seconds, and one of the things that we're looking into is trying to profile that. We're able to swap out the PEI core and turn on a lot of debugging. What I've seen on the Dell system is that a lot of that time is spent waiting for the Management Engine to come online, and then there also appears to be a one-second timeout for every CPU in the system: they bring the CPUs online one at a time, and it takes almost precisely one million microseconds for each one. So there are things in the vendor firmware that we currently don't have the ability to change, and those appear to be the long pole in the tent in the boot process.

H: Microphone 3 in the back, please.
Q: You addressed a lot about security, but my question is more about the fact that there are a lot of settings, for example BIOS settings, UEFI settings, and things like remote booting, which is a whole bunch of weird, proprietary protocols that are really hard to handle. If you have a large installation, for example, you can't just say: okay, deploy all my boot orders and BIOS settings. Are you going to address that in some unified, nice way, where I can say, okay, I have this one protocol that runs on my Linux firmware and does that nicely?

A: That's exactly how most sites will deploy it: they will write their own boot scripts that use normal protocols. In the Mass Open Cloud they are doing a wget over SSL, then measuring the received kernel into the TPM, and then kexec'ing it. And that's done without requiring changes to NVRAM variables, or all the sort of setup that you have to go through to configure a UEFI system. All of that can be replaced with a very small shell script.

H: We have time for one last question, and this is from the Signal Angel, because the internet has a question.

Q: Yes, the internet has two very simple technical questions. Do you know of any progress, or any ETAs, on the Talos II project? And are there any size concerns when writing firmware in Go?

A: The Talos II project is a POWER-based system, and right now we're mostly focused on the x86 servers, since those are the mainstream, widely available boards. And the Go firmware is actually quite small. I've mostly been working on the Heads side, which is based on shell scripts, but my understanding is that the just-in-time compiled Go does not add more than a few hundred kilobytes to the ROM image and only a few hundred milliseconds to the boot time. The advantage of Go is that it is memory safe, and it's an actual programming language, so it allows the initialization scripts to be verified in a way that can be very difficult with shell scripts.

H: So thank you very much for answering all these questions. Please give a warm round of applause to Trammell Hudson. Thank you very much!

applause

postroll music