36C3 - Leaving legacy behind

  • 0:00 - 0:19
    36c3 intro
  • 0:19 - 0:24
    Herald: Good morning again. Thanks. The
    first talk for today is by Hannes Mehnert. It's
  • 0:24 - 0:29
    titled "Leaving Legacy Behind". It's about
    the reduction of carbon footprint through
  • 0:29 - 0:33
    microkernels in MirageOS. Give a warm
    welcome to Hannes.
  • 0:33 - 0:39
    Applause
  • 0:39 - 0:45
    Hannes Mehnert: Thank you. So let's talk a
    bit about legacy, the legacy we have.
  • 0:45 - 0:50
    Nowadays we usually run services on a Unix-
    based operating system, whose
  • 0:50 - 0:55
    layering is illustrated here on the left.
    So at the lowest layer we have
  • 0:55 - 1:01
    the hardware: some physical CPU, some
    block devices, maybe a network interface
  • 1:01 - 1:07
    card, and maybe some memory, some non-
    persistent memory. On top of that, we
  • 1:07 - 1:14
    usually run the Unix kernel, so to say,
    which is marked here in brown and
  • 1:14 - 1:20
    consists of a filesystem. Then it
    has a scheduler, it has some process
  • 1:20 - 1:25
    management, it has network stacks, so the
    TCP/IP stack, and it also has some user
  • 1:25 - 1:32
    management and hardware drivers. So it
    has drivers for the physical hard drive,
  • 1:32 - 1:38
    for the network interface and so on.
    That's the brown stuff. So the kernel runs in
  • 1:38 - 1:46
    privileged mode. It exposes a system call
    API and/or a socket API to the
  • 1:46 - 1:52
    actual applications we want to
    run, which are here in orange. So the
  • 1:52 - 1:56
    actual application is on top, which is the
    application binary and may depend on some
  • 1:56 - 2:03
    configuration files distributed randomly
    across the filesystem with some file
  • 2:03 - 2:08
    permissions set. Then the application
    itself likely also depends on a programming
  • 2:08 - 2:14
    language runtime: that may be a Java
    virtual machine if you run Java, a Python
  • 2:14 - 2:20
    interpreter if you run Python, or a Ruby
    interpreter if you run Ruby, and so on.
  • 2:20 - 2:25
    Then additionally we usually have a system
    library, libc, which is basically the runtime
  • 2:25 - 2:31
    library of the C programming
    language, and it exposes a much nicer
  • 2:31 - 2:38
    interface than the system calls. We may as
    well have OpenSSL or another crypto
  • 2:38 - 2:45
    library as part of the application binary,
    which is also here in orange. So what's the
  • 2:45 - 2:50
    job of the kernel? So the brown stuff
    actually has a virtual memory subsystem,
  • 2:50 - 2:55
    and it should separate the orange stuff
    from each other. So you have multiple
  • 2:55 - 3:02
    applications running there, and the brown
    stuff is responsible for ensuring that
  • 3:02 - 3:07
    the different pieces of orange
    stuff don't interfere with each other, so
  • 3:07 - 3:13
    that they are not randomly writing into
    each other's memory and so on. Now if the
  • 3:13 - 3:17
    orange stuff is compromised. So if you
    have some attacker from the network or
  • 3:17 - 3:27
    from wherever else who's able to find a
    flaw in the orange stuff, the kernel is still
  • 3:27 - 3:32
    responsible for strict isolation between
    the orange stuff. So as long as the
  • 3:32 - 3:38
    attacker only gets access to the orange
    stuff, it should be very well contained.
  • 3:38 - 3:43
    But then we look at the bridge between the
    brown and orange stuff, so between kernel
  • 3:43 - 3:49
    and user space, and there we have an API
    which is roughly 600 system calls, at
  • 3:49 - 3:56
    least on my FreeBSD machine, in its
    syscall table. So it's 600 different functions, or
  • 3:56 - 4:05
    the width of this API is 600 different
    functions, which is quite big. And it's
  • 4:05 - 4:12
    quite easy to hide some flaws in there.
    And as soon as you're able to find a flaw
  • 4:12 - 4:17
    in any of those system calls, you can
    escalate your privileges and then you
  • 4:17 - 4:22
    basically run in brown mode, in kernel
    mode, and you have access to the raw
  • 4:22 - 4:26
    physical hardware. And you can also read
    arbitrary memory from any process
  • 4:26 - 4:34
    running there. So over the years this
    actually evolved and we added some more
  • 4:34 - 4:39
    layers, which is hypervisors. So at the
    lowest layer, we still have the hardware
  • 4:39 - 4:46
    stack, but on top of the hardware we now
    have a hypervisor, whose responsibility it
  • 4:46 - 4:51
    is to slice the physical hardware into
    pieces and run different
  • 4:51 - 4:57
    virtual machines. So now we have the blue
    stuff, which is the hypervisor. And on top
  • 4:57 - 5:04
    of that, we have multiple brown things and
    multiple orange things as well. So now the
  • 5:04 - 5:12
    hypervisor is responsible for distributing
    the CPUs to the virtual machines, and the
  • 5:12 - 5:17
    memory to virtual machines and so on. It
    is also responsible for selecting which
  • 5:17 - 5:22
    virtual machine to run on which physical
    CPU. So it actually includes the scheduler
  • 5:22 - 5:29
    as well. And the hypervisor's
    responsibility is again to isolate the
  • 5:29 - 5:34
    different virtual machines from each
    other. Initially, hypervisors were done
  • 5:34 - 5:40
    mostly in software. Nowadays, there are a
    lot of CPU features available, which
  • 5:40 - 5:47
    allow you to have some CPU support, which
    makes them fast, and you don't have to
  • 5:47 - 5:52
    trust so much software anymore, but you
    have to trust the hardware instead. That's
  • 5:52 - 6:00
    extended page tables and VT-d and VT-x
    stuff. OK, so that's the legacy we have
  • 6:00 - 6:08
    right now. So when you ship a binary, you
    actually care about the tip of the
  • 6:08 - 6:12
    iceberg: that is the code you actually
    write and care about. You care about it
  • 6:12 - 6:19
    deeply because it should work well and you
    want to run it. But at the bottom you have
  • 6:19 - 6:24
    the whole operating system, and that is
    code the operating system insists that you
  • 6:24 - 6:30
    need. So you can't get the tip without the
    bottom of the iceberg. So you will always
  • 6:30 - 6:35
    have process management and user
    management and likely the
  • 6:35 - 6:41
    filesystem around on a UNIX system. Then
    in addition, back in May, I think there
  • 6:41 - 6:49
    was a blog entry from someone who analyzed
    reports from Google Project Zero, which is a
  • 6:49 - 6:55
    security research team and red team which
    tries to find flaws in widely
  • 6:55 - 7:02
    used applications. And they found in a
    year maybe 110 different vulnerabilities,
  • 7:02 - 7:08
    which they reported and so on. And someone
    analyzed what these 110 vulnerabilities
  • 7:08 - 7:14
    were about, and it turned out that for more
    than two thirds of them the root
  • 7:14 - 7:19
    cause of the flaw was memory corruption.
    And memory corruption means arbitrary
  • 7:19 - 7:23
    reads or writes of arbitrary
    memory which a process is not
  • 7:23 - 7:30
    supposed to access. So why does that
    happen? That happens because on the
  • 7:30 - 7:36
    Unix system we mainly use programming
    languages where we have tight control over
  • 7:36 - 7:40
    the memory management. So we do it
    ourselves. So we allocate the memory
  • 7:40 - 7:45
    ourselves and we free it ourselves. There
    is a lot of boilerplate we need to write
  • 7:45 - 7:53
    down and that is also a lot of boilerplate
    which you can get wrong. So now we talked
  • 7:53 - 7:58
    a bit about legacy. Let's talk about the
    goals of this talk. The goal is on the
  • 7:58 - 8:07
    one side to be more secure, so to reduce
    the attack vectors, because C and languages
  • 8:07 - 8:12
    like that are from the 70s, and we have
    some languages from the 80s or even from
  • 8:12 - 8:18
    the 90s which offer you automated memory
    management and memory safety, languages
  • 8:18 - 8:25
    such as Java or Rust or Python or
    something like that. But it turns out not
  • 8:25 - 8:30
    many people are writing operating systems
    in those languages. Another point here is
  • 8:30 - 8:37
    I want to reduce the attack surface. So we
    have seen this huge stack here and I want
  • 8:37 - 8:46
    to minimize the orange and the brown parts.
    Then, as an implication of that, I also
  • 8:46 - 8:50
    want to reduce the runtime complexity,
    because it is actually pretty cumbersome
  • 8:50 - 8:56
    to figure out what is now wrong. Why does
    your application not start? And if the
  • 8:56 - 9:02
    whole reason is that some file on your
    hard disk has the wrong filesystem
  • 9:02 - 9:10
    permissions, then that's pretty hard to
    figure out if you're not a Unix expert
  • 9:10 - 9:17
    who has lived in the system for years or
    at least months. And then the final goal,
  • 9:17 - 9:22
    thanks to the topic of this conference and
    to some analysis I did, is to actually
  • 9:22 - 9:30
    reduce the carbon footprint. So if you run
    a service, that service does
  • 9:30 - 9:38
    some computation, and this computation
    takes some CPU
  • 9:38 - 9:45
    time in order to be evaluated. Now if
    we condense down
  • 9:45 - 9:50
    the complexity and the code size, we also
    reduce the amount of computation which
  • 9:50 - 9:58
    needs to be done. These are the goals. So
    what are MirageOS unikernels? That is
  • 9:58 - 10:07
    basically the project I have been involved
    in for six years or so. The general idea
  • 10:07 - 10:14
    is that each service is isolated in a
    separate MirageOS unikernel. So your DNS
  • 10:14 - 10:20
    resolver or your web server doesn't run on
    this general purpose UNIX system as a
  • 10:20 - 10:26
    process, but you have a separate virtual
    machine for each of them. So you have one
  • 10:26 - 10:31
    unikernel which only does DNS resolution
    and in that unikernel you don't even need
  • 10:31 - 10:36
    a user management. You don't even need
    process management because there's only a
  • 10:36 - 10:42
    single process: the DNS resolver.
    Actually, a DNS resolver also doesn't
  • 10:42 - 10:47
    really need a file system. So we got rid
    of that. We also don't really need virtual
  • 10:47 - 10:52
    memory because we only have one process.
    So we don't need virtual memory and we
  • 10:52 - 10:57
    just use a single address space. So
    everything is mapped in a single address
  • 10:57 - 11:03
    space. We use a programming language called
    OCaml, which is a functional programming
  • 11:03 - 11:08
    language which provides us with memory
    safety. So it has automated memory
  • 11:08 - 11:17
    management, and we use this memory
    management and the isolation which the
  • 11:17 - 11:24
    programming language guarantees us via its type
    system. We use that to say, okay, we can
  • 11:24 - 11:28
    all live in a single address space and
    it'll still be safe as long as the
  • 11:28 - 11:35
    components are safe. And as long as we
    minimize the components which are by
  • 11:35 - 11:43
    definition unsafe, because we need to run
    some C code there as well. Now, if we have
  • 11:43 - 11:48
    a single service, we
    only put in the libraries and the stuff we
  • 11:48 - 11:52
    actually need for that service. So as I
    mentioned, the DNS resolver won't need
  • 11:52 - 11:57
    user management, and it doesn't need a
    shell. Why would I need a shell? What
  • 11:57 - 12:03
    would I need to do there? And so on. So
    we have a lot of libraries, a lot of OCaml
  • 12:03 - 12:10
    libraries which are picked by the individual
    services, or which are mixed and matched for
  • 12:10 - 12:14
    the different services. So libraries are
    developed independently of the whole
  • 12:14 - 12:20
    system or of the unikernel, and are reused
    across the different components or across
    the different services.
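    To make this concrete, here is a minimal sketch of how a unikernel picks only its needed dependencies, in the style of the Mirage 3 configuration DSL (the module name "Unikernel.Main" and the service name are illustrative assumptions):

    ```ocaml
    (* config.ml: evaluated by the mirage tool at build time. Only the
       devices declared here (a network stack, nothing else) get linked
       into the resulting unikernel image. *)
    open Mirage

    let stack = generic_stackv4 default_network

    let main =
      (* "Unikernel.Main" is a functor over the network stack signature *)
      foreign "Unikernel.Main" (stackv4 @-> job)

    let () = register "service" [ main $ stack ]
    ```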
  • 12:20 - 12:27
    Some further
    limitations, which I take as freedom and
  • 12:27 - 12:33
    simplicity: not only do we have a single
    address space, we are also focusing only
  • 12:33 - 12:38
    on a single core and have a single process.
    So we don't even know
  • 12:38 - 12:47
    the concept of a process. We also don't
    work in a preemptive way. Preemptive
  • 12:47 - 12:53
    means that if you run on a CPU as a
    function or as a program, you can at any
  • 12:53 - 12:58
    time be interrupted because something
    which is much more important than you
  • 12:58 - 13:04
    should now get access to the CPU. We don't do
    that. We do cooperative tasks, so we are
  • 13:04 - 13:09
    never interrupted. We don't even have
    interrupts. So there are no interrupts.
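    As a hedged illustration of this cooperative model, here is a small Lwt program (Lwt is the promise library MirageOS builds on; the Unix driver below stands in for the unikernel's event loop):

    ```ocaml
    (* Cooperative concurrency with Lwt: nothing preempts a task; control
       is handed over only at explicit binds (>>=) on pending promises. *)
    let rec ticker name delay =
      let open Lwt.Infix in
      Lwt_unix.sleep delay >>= fun () ->   (* yields to the scheduler *)
      Lwt_io.printlf "tick from %s" name >>= fun () ->
      ticker name delay

    let () =
      (* On Unix, Lwt_main.run drives the event loop; in a MirageOS
         unikernel the runtime plays this role. *)
      Lwt_main.run (Lwt.pick [ ticker "a" 0.5; Lwt_unix.sleep 3.0 ])
    ```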
  • 13:09 - 13:13
    And as I mentioned, it's executed as a
    virtual machine. So what does that look
  • 13:13 - 13:18
    like? Now we have the same picture as
    previously. We have at the bottom the
  • 13:18 - 13:23
    hypervisor. Then we have the host system,
    which is the brownish stuff. And on top of
  • 13:23 - 13:30
    that we maybe have some virtual machines.
    Some of them run a UNIX
  • 13:30 - 13:35
    system via KVM and qemu, using some
    virtio; that is on the right and on the left. And in the middle
  • 13:35 - 13:42
    we have this MirageOS unikernel, where
    we don't run any qemu,
  • 13:42 - 13:50
    but we run a minimized so-called tender,
    which is this solo5-hvt monitor process.
  • 13:50 - 13:55
    So that's something which will just
    allocate some host system
  • 13:55 - 14:02
    resources for the virtual machine and then
    interacts with the virtual machine.
  • 14:02 - 14:07
    What this solo5-hvt does in this
    case is set up the memory, load the
  • 14:07 - 14:12
    unikernel image, which is a statically
    linked ELF binary, and set up the
  • 14:12 - 14:18
    virtual CPU. So the CPU needs some
    initialization, and then booting is just a jump
  • 14:18 - 14:25
    to an address. It's already in 64-bit mode;
    there's no need to boot via 16 or 32-bit
  • 14:25 - 14:34
    modes. Now, solo5-hvt and the MirageOS
    unikernel also have an interface, and that interface
  • 14:34 - 14:39
    is called hypercalls, and it
    is rather small: it only contains in
  • 14:39 - 14:46
    total 14 different functions. The main
    functions are yield, a way to get the argument
  • 14:46 - 14:53
    vector, and clocks. Actually, two clocks: one is
    a POSIX clock, which takes care of this
  • 14:53 - 14:58
    whole timestamping and timezone business,
    and another one is a monotonic clock, which
  • 14:58 - 15:07
    by its name guarantees that time will pass
    monotonically. Then there's the console
  • 15:07 - 15:13
    interface. The console interface is one-way
    only: we only output data, we never
  • 15:13 - 15:18
    read from the console. Then block
    devices and network interfaces, and
  • 15:18 - 15:26
    that's all the hypercalls we have.
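    To give a feel for how narrow this boundary is, here is an illustrative OCaml signature sketching the hypercall surface; the names and types are simplified assumptions, not the verbatim solo5 C API:

    ```ocaml
    (* A sketch of the ~14-function hypercall surface described above. *)
    module type HYPERCALLS = sig
      type time = int64                      (* nanoseconds *)
      val clock_wall : unit -> time          (* POSIX wall clock *)
      val clock_monotonic : unit -> time     (* monotonically increasing *)
      val yield : deadline:time -> unit      (* block until deadline/I/O *)
      val console_write : bytes -> unit      (* output only, never read *)
      val block_read : off:int64 -> bytes -> unit
      val block_write : off:int64 -> bytes -> unit
      val net_read : bytes -> int option     (* None if nothing pending *)
      val net_write : bytes -> unit
      val exit : int -> 'a
    end
    ```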
  • 15:26 - 15:35
    To look a bit further into the details of
    how a MirageOS unikernel is put together: here I
    pictured on the left again the tender at
  • 15:35 - 15:41
    the bottom, and then the hypercalls. And
    then in pink I have the pieces of code
  • 15:41 - 15:47
    in the MirageOS unikernel which still
    contain some C code. And in green I have
  • 15:47 - 15:55
    the pieces of code which do not include
    any C code, only OCaml code. So
  • 15:55 - 16:00
    looking at the C code, which is dangerous
    because in C we have to deal with memory
  • 16:00 - 16:06
    management on our own, which means it's a
    bit brittle. We need to carefully review
  • 16:06 - 16:11
    that code. One piece is definitely the OCaml
    runtime which we have here, which is around
  • 16:11 - 16:19
    25,000 lines of code. Then we have a
    library called nolibc, which is
  • 16:19 - 16:24
    basically a C library which implements
    malloc and string compare and some
  • 16:24 - 16:29
    basic functions which are needed by the
    OCaml runtime. That's roughly 8,000 lines
  • 16:29 - 16:37
    of code. That nolibc also provides a lot
    of stubs which just exit or return
  • 16:37 - 16:47
    NULL for the OCaml runtime, because we use
    an unmodified OCaml runtime to be able to
  • 16:47 - 16:51
    upgrade our software more easily. We don't
    have any patches for the OCaml runtime.
  • 16:51 - 16:57
    Then we have a library called
    solo5-bindings, which is basically
  • 16:57 - 17:03
    something which translates into hypercalls,
    or which can access the hypercalls
  • 17:03 - 17:08
    and which communicates with the host
    system via hypercalls. That is roughly
  • 17:08 - 17:15
    2,000 lines of code. Then we have a math
    library for sine, cosine, tangent
  • 17:15 - 17:21
    and so on, and that is just openlibm,
    which originally comes from the FreeBSD
  • 17:21 - 17:27
    project and has roughly 20,000 lines of
    code. So that's it. I talked a bit
  • 17:27 - 17:32
    about solo5, about the bottom layer and I
    will go a bit more into detail about the
  • 17:32 - 17:40
    solo5 stuff, which is really the stuff
    which you run at the bottom
  • 17:40 - 17:46
    of MirageOS. There's another choice:
    you can also run Xen or Qubes OS at
  • 17:46 - 17:51
    the bottom of the MirageOS unikernel. But
    I'm focusing here mainly on solo5. So
  • 17:51 - 17:57
    solo5 is a sandboxed execution environment
    for unikernels. It handles resources from
  • 17:57 - 18:04
    the host system, but only statically.
    So you say at startup time how much memory
  • 18:04 - 18:09
    it will take. How many network interfaces
    and which ones are taken and how many
  • 18:09 - 18:14
    block devices and which ones are taken by
    the virtual machine. You don't have any
  • 18:14 - 18:19
    dynamic resource management, so you can't
    add at a later point in time a new network
  • 18:19 - 18:28
    interface. That's just not supported. And it
    makes the code much easier. We don't even
  • 18:28 - 18:36
    have dynamic allocation inside of
    solo5. We have the hypercall interface; as I
  • 18:36 - 18:42
    mentioned, it's only 14 functions. We have
    bindings for different targets. So we can
  • 18:42 - 18:50
    run on KVM, which is the hypervisor developed
    in the Linux project, but also on
  • 18:50 - 18:57
    bhyve, which is the FreeBSD hypervisor, or
    VMM, which is the OpenBSD hypervisor. We also
  • 18:57 - 19:02
    target other systems such as Genode,
    which is an operating system based on a
  • 19:02 - 19:09
    microkernel, written mainly in C++, and
    virtio, which is a protocol usually spoken
  • 19:09 - 19:15
    between the host system and the guest
    system, and virtio is used in a lot of
  • 19:15 - 19:23
    cloud deployments. qemu, for
    example, provides you with a virtio
  • 19:23 - 19:29
    protocol implementation. And the last
    implementation of solo5, or bindings for
  • 19:29 - 19:39
    solo5, is seccomp. Linux seccomp is a
    filter in the Linux kernel with which you can
  • 19:39 - 19:47
    restrict your process to only use a
    certain set of
  • 19:47 - 19:54
    system calls, and we use seccomp so you can
    deploy a unikernel without a virtual machine in the
  • 19:54 - 20:02
    seccomp case, but you are restricted in
    which system calls you can use. So solo5
  • 20:02 - 20:06
    also provides you with the host system
    tender where applicable. So in the virtio
  • 20:06 - 20:12
    case it is not applicable; in the Genode case
    it is also not applicable. For KVM we
  • 20:12 - 20:19
    already saw solo5-hvt, which is a
    hardware-virtualized tender. It is just
  • 20:19 - 20:26
    a small binary: where qemu is at
    least hundreds of thousands of lines of
  • 20:26 - 20:36
    code, in the solo5-hvt case it's more like
    thousands of lines of code. So here we
  • 20:36 - 20:43
    have a comparison from left to right of
    solo5 and how the host system or the host
  • 20:43 - 20:49
    system kernel and the guest system works.
    In the middle we have a virtual machine, a
  • 20:49 - 20:54
    common Linux qemu KVM based virtual
    machine for example, and on the right hand
  • 20:54 - 21:00
    we have the host system and the container.
    A container is also a technology where you
  • 21:00 - 21:08
    try to restrict as much access as you can
    from a process. So it is contained, and a
  • 21:08 - 21:15
    potential compromise is also very isolated
    and contained. On the left hand side we
  • 21:15 - 21:21
    see that solo5 is basically some bits and
    pieces in the host system, that is the
  • 21:21 - 21:27
    solo5-hvt, and then some bits and pieces in
    the unikernel, that is the solo5-bindings I
  • 21:27 - 21:31
    mentioned earlier. And those
    communicate between the host and the guest
  • 21:31 - 21:37
    system. In the middle we see the API
    between the host system and the virtual
  • 21:37 - 21:41
    machine; it's much bigger than this,
    commonly using virtio. And virtio is really
  • 21:41 - 21:49
    a huge protocol which does feature
    negotiation and all sorts of things where
  • 21:49 - 21:54
    you can always do something wrong, like
    in a floppy
  • 21:54 - 21:59
    disk driver. And that has led to an
    exploitable vulnerability, although
  • 21:59 - 22:04
    nowadays most operating systems don't
    really need a floppy disk drive anymore.
  • 22:04 - 22:08
    And on the right hand side, you can see
    that the whole system interface for a
  • 22:08 - 22:13
    container is much bigger than for a
    virtual machine because the whole system
  • 22:13 - 22:18
    interface for a container is exactly those
    system calls you saw earlier. So it's around
  • 22:18 - 22:24
    600 different calls. And in order to
    evaluate the security, you need basically
  • 22:24 - 22:33
    to audit all of them. So that's just a
    brief comparison between those. If we look
  • 22:33 - 22:38
    in more detail at what shapes solo5
    can have: here on the left side, we can
  • 22:38 - 22:43
    see it running in a hardware-virtualized
    tender, where you have Linux,
  • 22:43 - 22:50
    FreeBSD, or OpenBSD at the bottom and
    you have the solo5 blob, which is the blue thing
  • 22:50 - 22:55
    here in the middle. And then on top you
    have the unikernel. On the right hand side
  • 22:55 - 23:03
    you can see the Linux seccomp process, where
    you have a much smaller solo5 blob because
  • 23:03 - 23:07
    it doesn't need to do that much anymore,
    because all the hypercalls are basically
  • 23:07 - 23:12
    translated to system calls. So you
    actually get rid of them and you don't
  • 23:12 - 23:17
    need to communicate between the host and
    the guest system, because with seccomp you
  • 23:17 - 23:23
    run as a host system process, so you don't
    have the virtualization. That is the advantage of
  • 23:23 - 23:29
    using seccomp as well: you can deploy
    it without having access to the virtualization
  • 23:29 - 23:38
    features of the CPU. Now, to get an even
    smaller shape, there's another backend I
  • 23:38 - 23:43
    haven't talked about yet. It's called
    Muen. It's a separation kernel
  • 23:43 - 23:51
    developed in Ada. So now we
    try to get rid of this huge Unix
  • 23:51 - 23:58
    system below, which is this big kernel
    thingy here. Muen is an open source
  • 23:58 - 24:03
    project developed in Switzerland in Ada,
    as I mentioned, and that uses SPARK, which
  • 24:03 - 24:13
    is a proof system, which guarantees the
    memory isolation between the different
  • 24:13 - 24:20
    components. And Muen now goes a step
    further and says: you as
  • 24:20 - 24:24
    a guest system do static
    allocations and no dynamic
  • 24:24 - 24:28
    resource management, and we as the host system,
    we as the hypervisor, don't do any
  • 24:28 - 24:34
    dynamic resource allocation either. So it
    only does static resource management. So
  • 24:34 - 24:39
    at compile time of your Muen separation
    kernel you decide how many virtual
  • 24:39 - 24:44
    machines or how many unikernels you are
    running and which resources are given to
  • 24:44 - 24:50
    them. You even specify which communication
    channels are there. So if one of your
  • 24:50 - 24:56
    virtual machines needs to talk to another
    one, you need to specify that at
  • 24:56 - 25:01
    compile time and at runtime you don't have
    any dynamic resource management. So that
  • 25:01 - 25:09
    again makes the code much simpler,
    much less complex, and you get to much
  • 25:09 - 25:19
    fewer lines of code. So to conclude on
    MirageOS, and also Muen
  • 25:19 - 25:26
    and solo5, I like to cite Antoine de
    Saint-Exupéry: "Perfection is achieved, not when
  • 25:26 - 25:32
    there is nothing more to add, but when
    there is nothing left to take away." I
  • 25:32 - 25:37
    mean obviously the most secure system is a
    system which doesn't exist.
  • 25:37 - 25:40
    Laughter
  • 25:40 - 25:42
    Let's look a bit further
  • 25:42 - 25:46
    into the design decisions of MirageOS.
    Why do we use this strange
  • 25:46 - 25:51
    programming language called OCaml and
    what's it all about? And what are the case
  • 25:51 - 25:59
    studies? So OCaml has been around for
    more than 20 years. It's a multi-paradigm
  • 25:59 - 26:06
    programming language. The goal for us and
    for OCaml is usually to have declarative
  • 26:06 - 26:14
    code. To achieve declarative code you need
    to provide the developers with some
  • 26:14 - 26:21
    orthogonal abstraction facilities. Here we
    have variables, then functions, which you
  • 26:21 - 26:25
    likely know if you're a software
    developer. Also higher-order functions:
  • 26:25 - 26:32
    that just means a function is able
    to take a function as input.
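    For instance (a tiny OCaml illustration, not from the talk):

    ```ocaml
    (* List.map is a higher-order function: it takes a function argument. *)
    let doubled = List.map (fun x -> 2 * x) [ 1; 2; 3 ]   (* [2; 4; 6] *)

    (* and we can define our own: apply a function twice *)
    let twice f x = f (f x)
    let () = assert (twice (fun x -> x + 1) 0 = 2 && doubled = [ 2; 4; 6 ])
    ```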
  • 26:32 - 26:37
    In OCaml we try to always focus on the problem
    and not get distracted by boilerplate. So
  • 26:37 - 26:44
    some running example again would be this
    memory management. We don't manually deal
  • 26:44 - 26:53
    with that, but we have computers to
    actually deal with that. In OCaml you have
  • 26:53 - 27:00
    a very expressive static type system,
    which can spot a lot of invariant
  • 27:00 - 27:07
    violations at build time.
    So the program won't compile if you don't
  • 27:07 - 27:14
    handle all the potential
    return values of your function. Now, a
  • 27:14 - 27:20
    type system, as you may know it
    from Java, is a bit painful if you have to
  • 27:20 - 27:24
    state at every location where you declare
    a variable which type this
  • 27:24 - 27:32
    variable is. What OCaml provides is type
    inference similar to Scala and other
  • 27:32 - 27:38
    languages, so you don't need to write all
    the types manually. And types are also,
  • 27:38 - 27:44
    unlike in Java, erased during
    compilation. So types are only information
  • 27:44 - 27:49
    about values the compiler has at compile
    time. But at runtime these are all erased
  • 27:49 - 27:55
    so they don't exist. You don't see them.
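    A small sketch of what that buys you (illustrative, not from the talk):

    ```ocaml
    (* Types are inferred, checked at build time, then erased. *)
    type response = Ok of string | Redirect of string | Error of int

    let describe = function        (* inferred as: response -> string *)
      | Ok body -> "200: " ^ body
      | Redirect url -> "301 -> " ^ url
      | Error code -> "error " ^ string_of_int code
    (* Dropping one of the three cases makes the compiler complain about
       a non-exhaustive pattern match at compile time, not at runtime. *)
    ```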
  • 27:55 - 28:02
    And OCaml compiles to native machine code,
    which I think is important for security
    and performance, because otherwise you run
  • 28:02 - 28:07
    an interpreter or an abstract machine and
    you have to emulate something else and
  • 28:07 - 28:15
    that is never as fast as it could be. OCaml
    has one distinctive feature, which is its
  • 28:15 - 28:21
    module system. So you have all your
    values, your types and functions. And now
  • 28:21 - 28:27
    each of those values is defined inside of
    a so-called module. And the simplest
  • 28:27 - 28:33
    module is just the file name. But you can
    nest modules, so you can explicitly say: oh
  • 28:33 - 28:40
    yeah, this value or this binding now
    lives in a submodule. Each
  • 28:40 - 28:45
    module can also be given a type, so it
    has a set of types and a set of functions,
  • 28:45 - 28:53
    and that is called its signature, which is
    the interface of the module.
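    A minimal sketch of a module with a signature (an illustrative example, not from the talk):

    ```ocaml
    (* The signature hides the representation: outside this module, t is
       abstract and can only be manipulated through zero/incr/to_int. *)
    module Counter : sig
      type t
      val zero : t
      val incr : t -> t
      val to_int : t -> int
    end = struct
      type t = int
      let zero = 0
      let incr n = n + 1
      let to_int n = n
    end
    ```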
  • 28:53 - 29:00
    Now you have another abstraction mechanism
    in OCaml, which is functors. Functors are
  • 29:00 - 29:04
    basically compile-time functions from
    module to module. So they allow
  • 29:04 - 29:10
    parameterization. Like, you can implement
    your generic map structure, where the
  • 29:10 - 29:19
    implementation is maybe a binary tree, and all
  • 29:19 - 29:26
    you need to have is some comparison for
    the keys, and that is modeled in OCaml by a
  • 29:26 - 29:32
    module. So you have a module called Map
    and a functor called Make. And
  • 29:32 - 29:38
    Make takes some module which implements
    this comparison method and then provides
  • 29:38 - 29:46
    you with a map data structure for that
    key type.
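    This is the stdlib Map functor, so a concrete example can be given directly:

    ```ocaml
    (* Map.Make takes a module with a type t and a compare function;
       the stdlib String module provides both. *)
    module StringMap = Map.Make (String)

    let population =
      StringMap.(empty |> add "Leipzig" 600_000 |> add "Berlin" 3_700_000)

    let () =
      match StringMap.find_opt "Leipzig" population with
      | Some n -> Printf.printf "Leipzig: %d\n" n
      | None -> print_endline "unknown"
    ```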
  • 29:46 - 29:52
    In MirageOS we actually use the module system
    quite a bit more, because we
    have all these resources which are
  • 29:52 - 29:58
    different between Xen and KVM and so on.
    So each of the different resources like a
  • 29:58 - 30:07
    network interface has a signature, and a
    target-specific implementation. So we have
  • 30:07 - 30:11
    the TCP/IP stack, which sits much higher
    than the network card, and it doesn't
  • 30:11 - 30:17
    really care if you run on Xen or if you
    run on KVM. You just program against this
  • 30:17 - 30:22
    abstract interface, against the interface
    of the network device. You don't need
  • 30:22 - 30:28
    to write in
    your TCP/IP stack any code to run on Xen
  • 30:28 - 30:38
    or to run on KVM.
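    A simplified sketch of this pattern (the signature below is an assumption for illustration, not the real mirage-types NETWORK interface):

    ```ocaml
    (* The stack is a functor over an abstract device signature; it never
       mentions Xen or KVM. The concrete module satisfying NETWORK is
       chosen at build time by the mirage tool. *)
    module type NETWORK = sig
      type t
      val read  : t -> bytes Lwt.t
      val write : t -> bytes -> unit Lwt.t
    end

    module Make_echo (N : NETWORK) = struct
      let run (net : N.t) =
        let open Lwt.Infix in
        N.read net >>= fun frame ->   (* works on any backend *)
        N.write net frame
    end
    ```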
  • 30:38 - 30:44
    So MirageOS also doesn't really use the complete
    OCaml programming language. OCaml also provides
    you with an object system, and we barely
  • 30:44 - 30:50
    use that. OCaml also allows you
    mutable
  • 30:50 - 30:58
    state, and we barely use that mutable
    state; we use mostly immutable data
  • 30:58 - 31:05
    whenever sensible. We also have a value-
    passing style, so we put state and data in as
  • 31:05 - 31:12
    inputs. So state is just some abstract
    state, and data is just a byte vector
  • 31:12 - 31:17
    in a protocol implementation. And then the
    output is also a new state which may be
  • 31:17 - 31:22
    modified, and maybe some reply, so some
    other byte vector or some application
  • 31:22 - 31:32
    data. Or the output data may as well be an
    error because the incoming data and state
  • 31:32 - 31:38
    may be invalid or may violate
    some constraints. And errors are also
  • 31:38 - 31:44
    explicitly typed, so they are declared in
    the API, and the caller of a function needs
  • 31:44 - 31:52
    to handle all these errors explicitly.
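    A hedged sketch of this value-passing style (the state machine and error names here are made up for illustration):

    ```ocaml
    (* A pure transition function: old state and input bytes go in, a new
       state plus optional reply (or an explicitly typed error) comes out. *)
    type state = Handshake | Established
    type error = [ `Unexpected_data | `Violates_constraints ]

    let handle (st : state) (data : bytes)
        : (state * bytes option, error) result =
      match st, Bytes.length data with
      | Handshake, 0 -> Error `Unexpected_data
      | Handshake, _ -> Ok (Established, Some (Bytes.of_string "hello"))
      | Established, n when n > 1500 -> Error `Violates_constraints
      | Established, _ -> Ok (Established, None)
    (* The caller must pattern-match on Ok/Error: forgetting the error
       case is a compile-time failure, not a runtime surprise. *)
    ```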
  • 31:52 - 32:01
    As I said, we are single-core, but we have
    some promise-based, event-based
    concurrent programming. And yeah, we
  • 32:01 - 32:04
    have the ability to express really
    strong invariants, like: this is a read-
  • 32:04 - 32:08
    only buffer in the type system. And the
    type system is, as I mentioned, only
  • 32:08 - 32:15
    compile time, no runtime overhead. So it's
    all pretty nice and good. So let's take a
  • 32:15 - 32:21
    look at some of the case studies. The
    first one is a unikernel called the
  • 32:21 - 32:30
    Bitcoin Pinata. It started in 2015, when we
    were happy with our from-scratch developed
  • 32:30 - 32:35
    TLS stack. TLS is Transport Layer
    Security, which is what you use if you browse to
  • 32:35 - 32:42
    an HTTPS site. So we have a TLS stack in OCaml
    and we wanted to do some marketing for
  • 32:42 - 32:51
    that. The Bitcoin Pinata is basically a
    unikernel which uses TLS and provides you
  • 32:51 - 32:58
    with TLS endpoints, and it contains the
    private key for a bitcoin wallet which is
  • 32:58 - 33:06
    filled with (well, used to be filled with)
    10 bitcoins. And this means it's a
  • 33:06 - 33:11
    security bait. So if you can compromise
    the system itself, you get the private key
  • 33:11 - 33:16
    and you can do whatever you want with it.
    And being on this bitcoin block chain, it
  • 33:16 - 33:23
    also means it's transparent so everyone
    can see that that has been hacked or not.
  • 33:23 - 33:30
    Yeah, and it has been online for three years
    now and it was not hacked. But the bitcoins we
  • 33:30 - 33:36
    had were only borrowed from friends of ours,
    and they have since been reused in other
  • 33:36 - 33:40
    projects. It's still online. And you can
    see here on the right that we had some
  • 33:40 - 33:50
    HTTP traffic, like an aggregate of maybe
    600,000 hits there. Now I have a size
  • 33:50 - 33:55
    comparison of the Bitcoin Pinata on the
    left. You can see the unikernel, which is
  • 33:55 - 34:00
    less than 10 megabytes in size or in
    source code it's maybe a hundred thousand
  • 34:00 - 34:06
    lines of code. On the right hand side you
    have a very similar thing, but running as
  • 34:06 - 34:16
    a Linux service, so it runs openssl
    s_server, which is a minimal TLS server you
  • 34:16 - 34:23
    can basically get on a Linux system using
    OpenSSL. And there we have maybe a
  • 34:23 - 34:29
    size of 200 megabytes and maybe two
    million lines of code. So that's
  • 34:29 - 34:36
    roughly a factor of 25. In other examples,
    we even got a bit less code and a much bigger
  • 34:36 - 34:45
    effect. Performance analysis:
    well, in 2015 we did some evaluation
  • 34:45 - 34:51
    of our TLS stack and it turns out we're in
    the same ballpark as other
  • 34:51 - 34:57
    implementations. Another case study is a
    CalDAV server, which we developed last
  • 34:57 - 35:05
    year with a grant from Prototypefund, which
    is German government funding. It is
  • 35:05 - 35:09
    interoperable with other clients. It stores
    data in a remote git repository. So we
  • 35:09 - 35:14
    don't use any block device or persistent
    storage, but we store it in a git
  • 35:14 - 35:19
    repository, so whenever you add a
    calendar event, it actually does a git
  • 35:19 - 35:25
    push. And we also recently got some
    integration with CalDAV web, which is a
  • 35:25 - 35:31
    user interface written in JavaScript. And we
  • 35:31 - 35:37
    just bundle that with the thing. It's
    online, open source, there is a demo
  • 35:37 - 35:42
    server and the data repository online.
    Yes, some statistics; I zoom in
  • 35:42 - 35:48
    directly on the CPU usage. We had the
    luck that for half of the month we used
  • 35:48 - 35:56
    it as a process on a FreeBSD system.
    That was roughly the first half, until
  • 35:56 - 36:01
    here. And then at some point we thought:
    oh yeah, let's migrate it to a MirageOS
  • 36:01 - 36:06
    unikernel and not run the FreeBSD system
    below it. And you can see here on the x
  • 36:06 - 36:11
    axis the time. So that was the month of
    June, starting with the first of June on
  • 36:11 - 36:17
    the left and the last of June on the
    right. And on the y axis, you have the
  • 36:17 - 36:23
    number of CPU seconds here on the left or
    the number of CPU ticks here on the right.
  • 36:23 - 36:29
    The CPU ticks are virtual CPU ticks,
    which are debug counters from the hypervisor,
  • 36:29 - 36:33
    so from bhyve and FreeBSD in that
    system. And what you can see here is this
  • 36:33 - 36:39
    massive drop by a factor of roughly 10.
    And that is when we switched from a Unix
  • 36:39 - 36:46
    virtual machine with the process to a
    freestanding Unikernel. So we actually use
  • 36:46 - 36:51
    much less resources. And if we look into
    the bigger picture here, we also see that
  • 36:51 - 36:58
    the memory dropped by a factor of 10 or
    even more. This is now a logarithmic scale
  • 36:58 - 37:03
    here on the y axis, the network bandwidth
    increased quite a bit because now we do
  • 37:03 - 37:10
    all the monitoring traffic via the network
    interface as well, and so on. Okay, that's CalDAV.
  • 37:10 - 37:17
    Another case study is authoritative DNS
    servers. And I just recently wrote a
  • 37:17 - 37:22
    tutorial on that. Which I will skip
    because I'm a bit short on time. Another
  • 37:22 - 37:27
    case study is a firewall for Qubes OS.
    Qubes OS is a reasonably secure operating
  • 37:27 - 37:33
    system which uses Xen for isolation of
    workspaces and applications such as PDF
  • 37:33 - 37:39
    reader. So whenever you receive a PDF, you
    start a virtual machine, which is
  • 37:39 - 37:48
    run just once to
    open and read your PDF. And the Qubes Mirage
  • 37:48 - 37:54
    firewall is now a tiny
    replacement for the Linux-based firewall,
  • 37:54 - 38:02
    written in OCaml. And instead of
    roughly 300 MB, it only uses 32 MB
  • 38:02 - 38:09
    of memory. Recently there's now also
    some support for dynamic firewall rules,
  • 38:09 - 38:17
    as defined by Qubes 4.0. And that is not
    yet merged into master, but it's under
  • 38:17 - 38:23
    review. Libraries in MirageOS: since we
    write everything from scratch and
  • 38:23 - 38:30
    in OCaml, we don't have
    every protocol, but we have quite a few
  • 38:30 - 38:35
    protocols. There are also more unikernels
    right now, which you can see here in the
  • 38:35 - 38:42
    slides, also online in the Fahrplan, so you
    can click on the links later. Reproducible
  • 38:42 - 38:48
    builds: for security purposes we
    don't yet ship binaries. But I plan to
  • 38:48 - 38:52
    ship binaries, and in order to ship
    binaries, I don't want to ship non-
  • 38:52 - 38:57
    reproducible binaries. What are reproducible
    builds? Well, it means that if you have the
  • 38:57 - 39:06
    same source code, you should get
    binary identical output. And issues are
  • 39:06 - 39:15
    temporary filenames and timestamps and so
    on. In December we managed in MirageOS to
  • 39:15 - 39:21
    get some tooling on track to actually test
    the reproducibility of unikernels and we
  • 39:21 - 39:28
    fixed some issues, and now all the tested
    MirageOS unikernels are reproducible, which
  • 39:28 - 39:34
    are basically most of them from this list.
    Another topic is supply chain security,
  • 39:34 - 39:42
    which is important, I think, and this
    is still a work in progress. We still
  • 39:42 - 39:49
    haven't deployed that widely. But there
    are some test repositories out there to
  • 39:49 - 39:57
    provide signatures signed
    by the actual authors of a library, and
  • 39:57 - 40:03
    a way that the user of the
    library can verify them. And some
  • 40:03 - 40:09
    decentralized authorization and delegation
    of that. What about deployment? Well, in
  • 40:09 - 40:16
    conventional orchestration systems such as
    Kubernetes and so on, we don't yet have
  • 40:16 - 40:24
    a proper integration of MirageOS, but we
    would like to get some proper integration
  • 40:24 - 40:32
    there. We already generate some
    libvirt.xml files from Mirage. So for each
  • 40:32 - 40:38
    unikernel you get a libvirt.xml, and you
    can run that in your libvirt-
  • 40:38 - 40:45
    based orchestration system. For Xen, we
    also generate those .xl and .xe files,
  • 40:45 - 40:50
    which I personally don't really
    know much about, but that's it. On the
  • 40:50 - 40:56
    other side, I developed an orchestration
    system called Albatross because I was a
  • 40:56 - 41:03
    bit worried if I now have those tiny
    unikernels which are megabytes in size
  • 41:03 - 41:09
    and now I should trust the big Kubernetes,
    which is maybe a million lines of code
  • 41:09 - 41:16
    running on the host system with
    privileges. So I thought, oh well let's
  • 41:16 - 41:21
    try to come up with a minimal
    orchestration system which allows me some
  • 41:21 - 41:27
    console access. So I want to see the debug
    messages or whenever it fails to boot I
  • 41:27 - 41:32
    want to see the output of the console. I
    want to get some metrics, like the Grafana
  • 41:32 - 41:39
    screenshot you just saw. And that's
    basically it. Then, since I also developed
  • 41:39 - 41:45
    a TLS stack, I thought, oh yeah, well why
    not just use it for remote deployment? So
  • 41:45 - 41:51
    in TLS you have mutual authentication, you
    can have client certificates, and a
  • 41:51 - 41:57
    certificate itself is more or less an
    authenticated key-value store, because you
  • 41:57 - 42:04
    have those extensions in X.509 version 3,
    and you can put arbitrary data in there,
  • 42:04 - 42:09
    with keys being so-called object
    identifiers and values being whatever
  • 42:09 - 42:17
    else. TLS certificates, or X.509
    certificates, have
  • 42:17 - 42:24
    the great advantage that during a TLS handshake
    they are transferred on the wire not in
  • 42:24 - 42:34
    base64 or PEM encoding, as you usually see
    them, but in binary encoding, which is much
  • 42:34 - 42:41
    nicer with respect to the amount of bits you transfer.
    So it's not transferred in base64, but
  • 42:41 - 42:46
    basically directly in raw form. And with
    Albatross you can basically do a TLS
  • 42:46 - 42:51
    handshake and in that client certificate
    you present, you already have the
  • 42:51 - 42:58
    unikernel image and the name and the boot
    arguments, and you just deploy it directly.
  • 42:58 - 43:04
    Also, in X.509 you have a chain
    of certificate authorities which you send
  • 43:04 - 43:09
    along, and this chain of certificate
    authorities also contains some extensions
  • 43:09 - 43:15
    in order to specify which policies are
    active. So how many virtual machines are
  • 43:15 - 43:22
    you able to deploy on my system? How much
    memory do you have access to, and which
  • 43:22 - 43:27
    bridges or which network interfaces you
    have access to? So Albatross is really a
  • 43:27 - 43:34
    minimal orchestration system running as a
    family of Unix processes. It's maybe 3,000
  • 43:34 - 43:41
    lines of OCaml code or so, plus
    the TLS stack and so on. But yeah, it
  • 43:41 - 43:47
    seems to work pretty well. I at least use
    it for more than two dozen unikernels at
  • 43:47 - 43:52
    any point in time. What about the
    community? Well the whole MirageOS project
  • 43:52 - 43:58
    started around 2008 at the University of
    Cambridge, so it used to be a research
  • 43:58 - 44:04
    project, and it still has a lot of
    ongoing student projects at the University of
  • 44:04 - 44:11
    Cambridge. But now it's an open source,
    permissively licensed, mostly BSD-licensed
  • 44:11 - 44:21
    thing, where we have a community event every
    half year and a retreat in Morocco, where
  • 44:21 - 44:26
    we also use our own unikernels, like the
    DHCP server and the DNS resolver and so on.
  • 44:26 - 44:32
    We just use them to test them and to see
    how they behave and whether they work for
  • 44:32 - 44:40
    us. We have quite a lot of open source
    contributors from all over, and
  • 44:40 - 44:46
    some of the MirageOS libraries have also
    been used or are still used in this Docker
  • 44:46 - 44:52
    technology, Docker for Mac and Docker for
    Windows, which emulates the guest system
  • 44:52 - 45:02
    or which needs some wrappers, and where
    a lot of OCaml code is used. So to finish
  • 45:02 - 45:07
    my talk, I would like to show another
    slide, which is that Rome wasn't built in a
  • 45:07 - 45:15
    day. So where we are, to conclude: here
    we have a radical approach to operating
  • 45:15 - 45:22
    systems development. We have security
    from the ground up, with much less code
  • 45:22 - 45:30
    and we also have much fewer attack vectors
    because we use a memory safe
  • 45:30 - 45:39
    language. So we have reduced the carbon
    footprint, as I mentioned in the start of
  • 45:39 - 45:46
    the talk, because we use much less CPU
    time, but also much less memory. So we use
  • 45:46 - 45:53
    fewer resources. MirageOS itself and OCaml
    have reasonable performance. We have
  • 45:53 - 45:57
    seen some statistics about the TLS stack
    that it was in the same ballpark as
  • 45:57 - 46:06
    OpenSSL and PolarSSL, which is nowadays
    mbed TLS. And MirageOS unikernels, since
  • 46:06 - 46:11
    they don't really need to negotiate
    features and wait for the SCSI bus and
  • 46:11 - 46:15
    so on, actually boot in
    milliseconds, not in seconds. They do
  • 46:15 - 46:22
    no hardware probing and so on; they
    know at startup time what to expect. I
  • 46:22 - 46:27
    would like to thank everybody who is and
    was involved in this whole technology
  • 46:27 - 46:33
    stack, because I myself program quite a
    bit of OCaml, but I wouldn't have been
  • 46:33 - 46:39
    able to do that on my own; it is just a
    bit too big. MirageOS currently spans
  • 46:39 - 46:45
    around maybe 200 different git
    repositories with the libraries, mostly
  • 46:45 - 46:52
    developed on GitHub and open source. I
    am at the moment working in a nonprofit
  • 46:52 - 46:57
    company in Germany, which is called the
    Center for the Cultivation of Technology
  • 46:57 - 47:03
    with a project called robur. So we work in
    a collective way to develop full-stack
  • 47:03 - 47:08
    MirageOS unikernels. That's why I'm happy
    to do that from Dublin. And if you're
  • 47:08 - 47:14
    interested, please talk to us. I have some
    selected related talks here; there are many
  • 47:14 - 47:21
    more talks about MirageOS, but here is
    just a short list. If you're
  • 47:21 - 47:30
    interested in certain aspects, please
    help yourself and view them.
  • 47:30 - 47:32
    That's all from me.
  • 47:32 - 47:37
    Applause
  • 47:37 - 47:46
    Herald: Thank you very much. There's a bit
    over 10 minutes of time for questions. If
  • 47:46 - 47:50
    you have any questions go to the
    microphone. There's several microphones
  • 47:50 - 47:54
    around the room. Go ahead.
    Question: Thank you very much for the talk
  • 47:54 - 47:57
    -
    Herald: Point of order. Thanking the
  • 47:57 - 48:01
    speaker can be done afterwards. Questions
    are questions, so short sentences ending
  • 48:01 - 48:06
    with a question mark. Sorry, do go ahead.
    Question: If I want to try this at home,
  • 48:06 - 48:09
    what do I need? Is a raspi sufficient? No,
    it isn't.
  • 48:09 - 48:15
    Hannes: That is an excellent question. So
    I usually develop it on such a thinkpad
  • 48:15 - 48:23
    machine, but we actually support also
    ARM64 mode. So if you have a Raspberry Pi
  • 48:23 - 48:29
    3+, which I think has the virtualization
    bits, and a Linux kernel which is recent
  • 48:29 - 48:35
    enough to support KVM on that Raspberry Pi
    3+, then you can try it out there.
  • 48:35 - 48:42
    Herald: Next question.
    Question: Well, currently most MirageOS
  • 48:42 - 48:52
    unikernels are used for running server
    applications. And so obviously all this
  • 48:52 - 48:58
    static preconfiguration of OCaml and
    maybe Ada SPARK is fine for that. But what
  • 48:58 - 49:04
    do you think about... Will it ever be
    possible to use the same approach with all
  • 49:04 - 49:10
    this static preconfiguration for these very
    dynamic end-user desktop systems, for
  • 49:10 - 49:15
    example, which at least currently use
    quite a lot of plug-and-play?
  • 49:15 - 49:19
    Hannes: Do you have an example? What are
    you thinking about?
  • 49:19 - 49:26
    Question: Well, I'm not that much into
    the topic of the SPARK stuff, but you said
  • 49:26 - 49:32
    that all the communication paths have to
    be defined in advance. So especially with
  • 49:32 - 49:38
    plug-and-play devices like all this USB
    stuff, we either have to allow everything
  • 49:38 - 49:47
    in advance or we may have to reboot parts
    of the unikernels in between to allow
  • 49:47 - 49:55
    rerouting stuff.
    Hannes: Yes. Yes. So I mean if you want to
  • 49:55 - 50:01
    design a USB plug-and-play system, you can
    think of it as you plug in somewhere the
  • 50:01 - 50:08
    USB stick and then you start the unikernel
    which only has access to that USB stick.
  • 50:08 - 50:15
    But having a unikernel... Well I wouldn't
    design a unikernel which randomly does
  • 50:15 - 50:24
    plug-and-play with the outer world,
    basically. And one of the applications
  • 50:24 - 50:31
    I've listed here at the top is a
    picture viewer, which is a unikernel that
  • 50:31 - 50:37
    at the moment, I think, has static
    data embedded in it, but is able on Qubes
  • 50:37 - 50:44
    OS, or on Unix with SDL, to display the
    images. And you can think of some way, via
  • 50:44 - 50:49
    a network or so, to access the images,
    so you don't need to compile
  • 50:49 - 50:54
    the images in, but you can have a git
    repository or TCP server or whatever in
  • 50:54 - 51:01
    order to receive the images. So what
    I didn't mention is that
  • 51:01 - 51:06
    MirageOS, instead of being general purpose,
    where you have a shell and can do
  • 51:06 - 51:11
    everything with it, is such that each
    service, each unikernel, is a single-
  • 51:11 - 51:17
    service thing. So you can't do everything
    with it. And I think that is an advantage
  • 51:17 - 51:23
    from a lot of points of view. I agree
    that if you have a highly dynamic system,
  • 51:23 - 51:28
    you may have some trouble
    integrating that.
  • 51:28 - 51:39
    Herald: Are there any other questions?
    No, it appears not. In which case,
  • 51:39 - 51:41
    thank you again, Hannes.
    Warm applause for Hannes.
  • 51:41 - 51:45
    Applause
  • 51:45 - 51:49
    Outro music
  • 51:49 - 52:12
    subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!