36C3 - Leaving legacy behind

  • 0:00 - 0:19
    36c3 intro
  • 0:19 - 0:24
    Herald: Good morning again. Thanks. The
    first talk for today is by Hannes Mehnert. It's
  • 0:24 - 0:29
    titled "Leaving Legacy Behind". It's about
    the reduction of carbon footprint through
  • 0:29 - 0:33
    microkernels in MirageOS. Give a warm
    welcome to Hannes.
  • 0:33 - 0:39
    Applause
  • 0:39 - 0:45
    Hannes Mehnert: Thank you. So let's talk a
    bit about legacy, the legacy we have.
  • 0:45 - 0:50
    Nowadays we usually run services on a Unix-
    based operating system, whose
  • 0:50 - 0:55
    layering is illustrated here on the left.
    So at the lowest layer we have
  • 0:55 - 1:01
    the hardware: some physical CPU, some
    block devices, maybe a network interface
  • 1:01 - 1:07
    card, and maybe some memory, some non-
    persistent memory. On top of that, we
  • 1:07 - 1:14
    usually run the Unix kernel, so to say,
    which is marked here in brown and
  • 1:14 - 1:20
    consists of a filesystem. Then it
    has a scheduler, it has some process
  • 1:20 - 1:25
    management, it has network stacks, so the
    TCP/IP stack, and it also has some user
  • 1:25 - 1:32
    management and hardware drivers. So it
    has drivers for the physical hard drive,
  • 1:32 - 1:38
    for the network interface and so on.
    That's the brown stuff. So the kernel runs in
  • 1:38 - 1:46
    privileged mode. It exposes a system call
    API and/or a socket API to the
  • 1:46 - 1:52
    actual applications we want to
    run, which are here in orange. So the
  • 1:52 - 1:56
    actual application is on top, which is the
    application binary and may depend on some
  • 1:56 - 2:03
    configuration files distributed randomly
    across the filesystem with some file
  • 2:03 - 2:08
    permissions set. Then the application
    itself likely also depends on a programming
  • 2:08 - 2:14
    language runtime: that may be a Java
    virtual machine if you run Java, a Python
  • 2:14 - 2:20
    interpreter if you run Python, or a Ruby
    interpreter if you run Ruby, and so on.
  • 2:20 - 2:25
    Then additionally we usually have a system
    library, libc, which is basically the runtime
  • 2:25 - 2:31
    library of the C programming
    language, and it exposes a much nicer
  • 2:31 - 2:38
    interface than the system calls. We may as
    well have OpenSSL or another crypto
  • 2:38 - 2:45
    library as part of the application binary,
    which is also here in orange. So what's the
  • 2:45 - 2:50
    job of the kernel? So the brown stuff
    actually has a virtual memory subsystem,
  • 2:50 - 2:55
    and it should separate the orange stuff
    from each other. So you have multiple
  • 2:55 - 3:02
    applications running there, and the brown
    stuff is responsible for ensuring that
  • 3:02 - 3:07
    the different pieces of orange
    stuff don't interfere with each other, so
  • 3:07 - 3:13
    that they are not randomly writing into
    each other's memory and so on. Now if the
  • 3:13 - 3:17
    orange stuff is compromised. So if you
    have some attacker from the network or
  • 3:17 - 3:27
    from wherever else who's able to find a
    flaw in the orange stuff, the kernel is still
  • 3:27 - 3:32
    responsible for strict isolation between
    the orange stuff. So as long as the
  • 3:32 - 3:38
    attacker only gets access to the orange
    stuff, it should be very well contained.
  • 3:38 - 3:43
    But then we look at the bridge between the
    brown and orange stuff, so between kernel
  • 3:43 - 3:49
    and user space, and there we have an API
    which is roughly 600 system calls, at
  • 3:49 - 3:56
    least on my FreeBSD machine, in its
    syscall table. So it's 600 different functions, or
  • 3:56 - 4:05
    the width of this API is 600 different
    functions, which is quite big. And it's
  • 4:05 - 4:12
    quite easy to hide some flaws in there.
    And as soon as you're able to find a flaw
  • 4:12 - 4:17
    in any of those system calls, you can
    escalate your privileges and then you
  • 4:17 - 4:22
    basically run in brown mode, in kernel
    mode, and you have access to the raw
  • 4:22 - 4:26
    physical hardware. And you can also read
    arbitrary memory from any process
  • 4:26 - 4:34
    running there. So over the years this
    actually evolved and we added some more
  • 4:34 - 4:39
    layers, which is hypervisors. So at the
    lowest layer, we still have the hardware
  • 4:39 - 4:46
    stack, but on top of the hardware we now
    have a hypervisor, whose responsibility it
  • 4:46 - 4:51
    is to slice the physical hardware into
    pieces and run different
  • 4:51 - 4:57
    virtual machines. So now we have the blue
    stuff, which is the hypervisor. And on top
  • 4:57 - 5:04
    of that, we have multiple brown things and
    multiple orange things as well. So now the
  • 5:04 - 5:12
    hypervisor is responsible for distributing
    the CPUs to the virtual machines, and the
  • 5:12 - 5:17
    memory to virtual machines and so on. It
    is also responsible for selecting which
  • 5:17 - 5:22
    virtual machine to run on which physical
    CPU. So it actually includes the scheduler
  • 5:22 - 5:29
    as well. And the hypervisor's
    responsibility is again to isolate the
  • 5:29 - 5:34
    different virtual machines from each
    other. Initially, hypervisors were done
  • 5:34 - 5:40
    mostly in software. Nowadays, there are a
    lot of CPU features available, which
  • 5:40 - 5:47
    allow you to have some CPU support, which
    makes them fast, and you don't have to
  • 5:47 - 5:52
    trust so much software anymore, but you
    have to trust the hardware instead. That's
  • 5:52 - 6:00
    extended page tables and VT-d and VT-x
    stuff. OK, so that's the legacy we have
  • 6:00 - 6:08
    right now. So when you ship a binary, you
    actually care about the tip of the
  • 6:08 - 6:12
    iceberg: that is the code you actually
    write and care about. You care about it
  • 6:12 - 6:19
    deeply because it should work well and you
    want to run it. But at the bottom you have
  • 6:19 - 6:24
    the whole operating system, and that is
    code the operating system insists that you
  • 6:24 - 6:30
    need. So you can't get the tip without the
    bottom of the iceberg. So you will always
  • 6:30 - 6:35
    have process management and user
    management and likely the
  • 6:35 - 6:41
    filesystem around on a UNIX system. Then
    in addition, back in May, I think there
  • 6:41 - 6:49
    was a blog entry from someone who analyzed
    reports from Google Project Zero, which is a
  • 6:49 - 6:55
    security research team and red team which
    tries to find flaws in widely
  • 6:55 - 7:02
    used applications. And they found in a
    year maybe 110 different vulnerabilities,
  • 7:02 - 7:08
    which they reported and so on. And someone
    analyzed what these 110 vulnerabilities
  • 7:08 - 7:14
    were about, and it turned out that for more
    than two thirds of them the root
  • 7:14 - 7:19
    cause of the flaw was memory corruption.
    And memory corruption means arbitrary
  • 7:19 - 7:23
    reads or writes of arbitrary
    memory which a process is not
  • 7:23 - 7:30
    supposed to access. So why does that
    happen? That happens because on the
  • 7:30 - 7:36
    Unix system we mainly use programming
    languages where we have tight control over
  • 7:36 - 7:40
    the memory management. So we do it
    ourselves. So we allocate the memory
  • 7:40 - 7:45
    ourselves and we free it ourselves. There
    is a lot of boilerplate we need to write
  • 7:45 - 7:53
    down and that is also a lot of boilerplate
    which you can get wrong. So now we talked
  • 7:53 - 7:58
    a bit about legacy. Let's talk about the
    goals of this talk. The goal is on the
  • 7:58 - 8:07
    one side to be more secure, so to reduce
    the attack vectors, because C and languages
  • 8:07 - 8:12
    like that are from the 70s, and we have
    some languages from the 80s or even from
  • 8:12 - 8:18
    the 90s which offer you automated memory
    management and memory safety, languages
  • 8:18 - 8:25
    such as Java or Rust or Python or
    something like that. But it turns out not
  • 8:25 - 8:30
    many people are writing operating systems
    in those languages. Another point here is
  • 8:30 - 8:37
    I want to reduce the attack surface. So we
    have seen this huge stack here and I want
  • 8:37 - 8:46
    to minimize the orange and the brown parts.
    Then, as an implication of that, I also
  • 8:46 - 8:50
    want to reduce the runtime complexity,
    because it is actually pretty cumbersome
  • 8:50 - 8:56
    to figure out what is now wrong. Why does
    your application not start? And if the
  • 8:56 - 9:02
    whole reason is that some file on your
    hard disk has the wrong filesystem
  • 9:02 - 9:10
    permissions, then that's pretty hard to
    figure out if you're not a Unix expert
  • 9:10 - 9:17
    who has lived in the system for years or
    at least months. And then the final goal,
  • 9:17 - 9:22
    thanks to the topic of this conference and
    to some analysis I did, is to actually
  • 9:22 - 9:30
    reduce the carbon footprint. So if you run
    a service, that service does
  • 9:30 - 9:38
    some computation, and this computation
    takes some CPU
  • 9:38 - 9:45
    time in order to be evaluated. Now if
    we condense down
  • 9:45 - 9:50
    the complexity and the code size, we also
    reduce the amount of computation which
  • 9:50 - 9:58
    needs to be done. These are the goals. So
    what are MirageOS unikernels? That is
  • 9:58 - 10:07
    basically the project I have been involved
    in for six years or so. The general idea
  • 10:07 - 10:14
    is that each service is isolated in a
    separate MirageOS unikernel. So your DNS
  • 10:14 - 10:20
    resolver or your web server doesn't run on
    this general purpose UNIX system as a
  • 10:20 - 10:26
    process, but you have a separate virtual
    machine for each of them. So you have one
  • 10:26 - 10:31
    unikernel which only does DNS resolution
    and in that unikernel you don't even need
  • 10:31 - 10:36
    a user management. You don't even need
    process management because there's only a
  • 10:36 - 10:42
    single process: the DNS resolver.
    Actually, a DNS resolver also doesn't
  • 10:42 - 10:47
    really need a file system. So we got rid
    of that. We also don't really need virtual
  • 10:47 - 10:52
    memory because we only have one process.
    So we don't need virtual memory and we
  • 10:52 - 10:57
    just use a single address space. So
    everything is mapped in a single address
  • 10:57 - 11:03
    space. We use a programming language called
    OCaml, which is a functional programming
  • 11:03 - 11:08
    language which provides us with memory
    safety. So it has automated memory
  • 11:08 - 11:17
    management, and we use this memory
    management and the isolation which the
  • 11:17 - 11:24
    programming language guarantees us via its type
    system. We use that to say, okay, we can
  • 11:24 - 11:28
    all live in a single address space and
    it'll still be safe as long as the
  • 11:28 - 11:35
    components are safe. And as long as we
    minimize the components which are by
  • 11:35 - 11:43
    definition unsafe, because we need to run
    some C code there as well. Now, if we have
  • 11:43 - 11:48
    a single service, we
    only put in the libraries and the stuff we
  • 11:48 - 11:52
    actually need for that service. So as I
    mentioned, the DNS resolver won't need
  • 11:52 - 11:57
    user management, and it doesn't need a
    shell. Why would I need a shell? What
  • 11:57 - 12:03
    would I need to do there? And so on. So
    we have a lot of libraries, a lot of OCaml
  • 12:03 - 12:10
    libraries which are picked by the individual
    services, or which are mixed and matched for
  • 12:10 - 12:14
    the different services. So libraries are
    developed independently of the whole
  • 12:14 - 12:20
    system or of the unikernel, and are reused
    across the different components or across
    the different services.
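    To make this concrete, here is a minimal sketch of how a unikernel picks only its needed dependencies, in the style of the Mirage 3 configuration DSL (the module name "Unikernel.Main" and the service name are illustrative assumptions):

    ```ocaml
    (* config.ml: evaluated by the mirage tool at build time. Only the
       devices declared here (a network stack, nothing else) get linked
       into the resulting unikernel image. *)
    open Mirage

    let stack = generic_stackv4 default_network

    let main =
      (* "Unikernel.Main" is a functor over the network stack signature *)
      foreign "Unikernel.Main" (stackv4 @-> job)

    let () = register "service" [ main $ stack ]
    ```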
  • 12:20 - 12:27
    Some further
    limitations, which I take as freedom and
  • 12:27 - 12:33
    simplicity: not only do we have a single
    address space, we are also focusing only
  • 12:33 - 12:38
    on a single core and have a single process.
    So we don't even know
  • 12:38 - 12:47
    the concept of a process. We also don't
    work in a preemptive way. Preemptive
  • 12:47 - 12:53
    means that if you run on a CPU as a
    function or as a program, you can at any
  • 12:53 - 12:58
    time be interrupted because something
    which is much more important than you
  • 12:58 - 13:04
    should now get access to the CPU. We don't do
    that. We do cooperative tasks, so we are
  • 13:04 - 13:09
    never interrupted. We don't even have
    interrupts. So there are no interrupts.
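    As a hedged illustration of this cooperative model, here is a small Lwt program (Lwt is the promise library MirageOS builds on; the Unix driver below stands in for the unikernel's event loop):

    ```ocaml
    (* Cooperative concurrency with Lwt: nothing preempts a task; control
       is handed over only at explicit binds (>>=) on pending promises. *)
    let rec ticker name delay =
      let open Lwt.Infix in
      Lwt_unix.sleep delay >>= fun () ->   (* yields to the scheduler *)
      Lwt_io.printlf "tick from %s" name >>= fun () ->
      ticker name delay

    let () =
      (* On Unix, Lwt_main.run drives the event loop; in a MirageOS
         unikernel the runtime plays this role. *)
      Lwt_main.run (Lwt.pick [ ticker "a" 0.5; Lwt_unix.sleep 3.0 ])
    ```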
  • 13:09 - 13:13
    And as I mentioned, it's executed as a
    virtual machine. So what does that look
  • 13:13 - 13:18
    like? Now we have the same picture as
    previously. We have at the bottom the
  • 13:18 - 13:23
    hypervisor. Then we have the host system,
    which is the brownish stuff. And on top of
  • 13:23 - 13:30
    that we maybe have some virtual machines.
    Some of them run a UNIX
  • 13:30 - 13:35
    system via KVM and qemu, using some
    virtio; that is on the right and on the left. And in the middle
  • 13:35 - 13:42
    we have this MirageOS unikernel, where
    we don't run any qemu,
  • 13:42 - 13:50
    but we run a minimized so-called tender,
    which is this solo5-hvt monitor process.
  • 13:50 - 13:55
    So that's something which will just
    allocate some host system
  • 13:55 - 14:02
    resources for the virtual machine and then
    interacts with the virtual machine.
  • 14:02 - 14:07
    What this solo5-hvt does in this
    case is set up the memory, load the
  • 14:07 - 14:12
    unikernel image, which is a statically
    linked ELF binary, and set up the
  • 14:12 - 14:18
    virtual CPU. So the CPU needs some
    initialization, and then booting is just a jump
  • 14:18 - 14:25
    to an address. It's already in 64-bit mode;
    there's no need to boot via 16 or 32-bit
  • 14:25 - 14:34
    modes. Now, solo5-hvt and the MirageOS
    unikernel also have an interface, and that interface
  • 14:34 - 14:39
    is called hypercalls, and it
    is rather small: it only contains in
  • 14:39 - 14:46
    total 14 different functions. The main
    functions are yield, a way to get the argument
  • 14:46 - 14:53
    vector, and clocks. Actually, two clocks: one is
    a POSIX clock, which takes care of this
  • 14:53 - 14:58
    whole timestamping and timezone business,
    and another one is a monotonic clock, which
  • 14:58 - 15:07
    by its name guarantees that time will pass
    monotonically. Then there's the console
  • 15:07 - 15:13
    interface. The console interface is one-way
    only: we only output data, we never
  • 15:13 - 15:18
    read from the console. Then block
    devices and network interfaces, and
  • 15:18 - 15:26
    that's all the hypercalls we have.
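    To give a feel for how narrow this boundary is, here is an illustrative OCaml signature sketching the hypercall surface; the names and types are simplified assumptions, not the verbatim solo5 C API:

    ```ocaml
    (* A sketch of the ~14-function hypercall surface described above. *)
    module type HYPERCALLS = sig
      type time = int64                      (* nanoseconds *)
      val clock_wall : unit -> time          (* POSIX wall clock *)
      val clock_monotonic : unit -> time     (* monotonically increasing *)
      val yield : deadline:time -> unit      (* block until deadline/I/O *)
      val console_write : bytes -> unit      (* output only, never read *)
      val block_read : off:int64 -> bytes -> unit
      val block_write : off:int64 -> bytes -> unit
      val net_read : bytes -> int option     (* None if nothing pending *)
      val net_write : bytes -> unit
      val exit : int -> 'a
    end
    ```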
  • 15:26 - 15:35
    To look a bit further into the details of
    how a MirageOS unikernel is put together: here I
    pictured on the left again the tender at
  • 15:35 - 15:41
    the bottom, and then the hypercalls. And
    then in pink I have the pieces of code
  • 15:41 - 15:47
    in the MirageOS unikernel which still
    contain some C code. And in green I have
  • 15:47 - 15:55
    the pieces of code which do not include
    any C code, only OCaml code. So
  • 15:55 - 16:00
    looking at the C code, which is dangerous
    because in C we have to deal with memory
  • 16:00 - 16:06
    management on our own, which means it's a
    bit brittle. We need to carefully review
  • 16:06 - 16:11
    that code. One piece is definitely the OCaml
    runtime which we have here, which is around
  • 16:11 - 16:19
    25,000 lines of code. Then we have a
    library called nolibc, which is
  • 16:19 - 16:24
    basically a C library which implements
    malloc and string compare and some
  • 16:24 - 16:29
    basic functions which are needed by the
    OCaml runtime. That's roughly 8,000 lines
  • 16:29 - 16:37
    of code. That nolibc also provides a lot
    of stubs which just exit or return
  • 16:37 - 16:47
    NULL for the OCaml runtime, because we use
    an unmodified OCaml runtime to be able to
  • 16:47 - 16:51
    upgrade our software more easily. We don't
    have any patches for the OCaml runtime.
  • 16:51 - 16:57
    Then we have a library called
    solo5-bindings, which is basically
  • 16:57 - 17:03
    something which translates into hypercalls,
    or which can access the hypercalls
  • 17:03 - 17:08
    and which communicates with the host
    system via hypercalls. That is roughly
  • 17:08 - 17:15
    2,000 lines of code. Then we have a math
    library for sine, cosine, tangent
  • 17:15 - 17:21
    and so on, and that is just openlibm,
    which originally comes from the FreeBSD
  • 17:21 - 17:27
    project and has roughly 20,000 lines of
    code. So that's it. I talked a bit
  • 17:27 - 17:32
    about solo5, about the bottom layer and I
    will go a bit more into detail about the
  • 17:32 - 17:40
    solo5 stuff, which is really the stuff
    which you run at the bottom
  • 17:40 - 17:46
    of MirageOS. There's another choice:
    you can also run Xen or Qubes OS at
  • 17:46 - 17:51
    the bottom of the MirageOS unikernel. But
    I'm focusing here mainly on solo5. So
  • 17:51 - 17:57
    solo5 is a sandboxed execution environment
    for unikernels. It handles resources from
  • 17:57 - 18:04
    the host system, but only statically.
    So you say at startup time how much memory
  • 18:04 - 18:09
    it will take. How many network interfaces
    and which ones are taken and how many
  • 18:09 - 18:14
    block devices and which ones are taken by
    the virtual machine. You don't have any
  • 18:14 - 18:19
    dynamic resource management, so you can't
    add at a later point in time a new network
  • 18:19 - 18:28
    interface. That's just not supported. And it
    makes the code much easier. We don't even
  • 18:28 - 18:36
    have dynamic allocation inside of
    solo5. We have the hypercall interface; as I
  • 18:36 - 18:42
    mentioned, it's only 14 functions. We have
    bindings for different targets. So we can
  • 18:42 - 18:50
    run on KVM, which is the hypervisor developed
    in the Linux project, but also on
  • 18:50 - 18:57
    bhyve, which is the FreeBSD hypervisor, or
    VMM, which is the OpenBSD hypervisor. We also
  • 18:57 - 19:02
    target other systems such as Genode,
    which is an operating system based on a
  • 19:02 - 19:09
    microkernel, written mainly in C++, and
    virtio, which is a protocol usually spoken
  • 19:09 - 19:15
    between the host system and the guest
    system, and virtio is used in a lot of
  • 19:15 - 19:23
    cloud deployments. qemu, for
    example, provides you with a virtio
  • 19:23 - 19:29
    protocol implementation. And the last
    implementation of solo5, or bindings for
  • 19:29 - 19:39
    solo5, is seccomp. Linux seccomp is a
    filter in the Linux kernel with which you can
  • 19:39 - 19:47
    restrict your process to only use a
    certain set of
  • 19:47 - 19:54
    system calls, and we use seccomp so you can
    deploy a unikernel without a virtual machine in the
  • 19:54 - 20:02
    seccomp case, but you are restricted in
    which system calls you can use. So solo5
  • 20:02 - 20:06
    also provides you with the host system
    tender where applicable. So in the virtio
  • 20:06 - 20:12
    case it is not applicable; in the Genode case
    it is also not applicable. For KVM we
  • 20:12 - 20:19
    already saw solo5-hvt, which is a
    hardware-virtualized tender. It is just
  • 20:19 - 20:26
    a small binary: where qemu is at
    least hundreds of thousands of lines of
  • 20:26 - 20:36
    code, in the solo5-hvt case it's more like
    thousands of lines of code. So here we
  • 20:36 - 20:43
    have a comparison from left to right of
    solo5 and how the host system or the host
  • 20:43 - 20:49
    system kernel and the guest system works.
    In the middle we have a virtual machine, a
  • 20:49 - 20:54
    common Linux qemu KVM based virtual
    machine for example, and on the right hand
  • 20:54 - 21:00
    we have the host system and the container.
    A container is also a technology where you
  • 21:00 - 21:08
    try to restrict as much access as you can
    from a process. So it is contained, and a
  • 21:08 - 21:15
    potential compromise is also very isolated
    and contained. On the left hand side we
  • 21:15 - 21:21
    see that solo5 is basically some bits and
    pieces in the host system, that is the
  • 21:21 - 21:27
    solo5-hvt, and then some bits and pieces in
    the unikernel, that is the solo5-bindings I
  • 21:27 - 21:31
    mentioned earlier. And those
    communicate between the host and the guest
  • 21:31 - 21:37
    system. In the middle we see the API
    between the host system and the virtual
  • 21:37 - 21:41
    machine; it's much bigger than this,
    commonly using virtio. And virtio is really
  • 21:41 - 21:49
    a huge protocol which does feature
    negotiation and all sorts of things where
  • 21:49 - 21:54
    you can always do something wrong, like
    in a floppy
  • 21:54 - 21:59
    disk driver. And that has led to an
    exploitable vulnerability, although
  • 21:59 - 22:04
    nowadays most operating systems don't
    really need a floppy disk drive anymore.
  • 22:04 - 22:08
    And on the right hand side, you can see
    that the whole system interface for a
  • 22:08 - 22:13
    container is much bigger than for a
    virtual machine because the whole system
  • 22:13 - 22:18
    interface for a container is exactly those
    system calls you saw earlier. So it's around
  • 22:18 - 22:24
    600 different calls. And in order to
    evaluate the security, you need basically
  • 22:24 - 22:33
    to audit all of them. So that's just a
    brief comparison between those. If we look
  • 22:33 - 22:38
    in more detail at what shapes solo5
    can have: here on the left side, we can
  • 22:38 - 22:43
    see it running in a hardware-virtualized
    tender, where you have Linux,
  • 22:43 - 22:50
    FreeBSD, or OpenBSD at the bottom and
    you have the solo5 blob, which is the blue thing
  • 22:50 - 22:55
    here in the middle. And then on top you
    have the unikernel. On the right hand side
  • 22:55 - 23:03
    you can see the Linux seccomp process, where
    you have a much smaller solo5 blob because
  • 23:03 - 23:07
    it doesn't need to do that much anymore,
    because all the hypercalls are basically
  • 23:07 - 23:12
    translated to system calls. So you
    actually get rid of them and you don't
  • 23:12 - 23:17
    need to communicate between the host and
    the guest system, because with seccomp you
  • 23:17 - 23:23
    run as a host system process, so you don't
    have the virtualization. That is the advantage of
  • 23:23 - 23:29
    using seccomp as well: you can deploy
    it without having access to the virtualization
  • 23:29 - 23:38
    features of the CPU. Now, to get an even
    smaller shape, there's another backend I
  • 23:38 - 23:43
    haven't talked about yet. It's called
    Muen. It's a separation kernel
  • 23:43 - 23:51
    developed in Ada. So now we
    try to get rid of this huge Unix
  • 23:51 - 23:58
    system below, which is this big kernel
    thingy here. Muen is an open source
  • 23:58 - 24:03
    project developed in Switzerland in Ada,
    as I mentioned, and that uses SPARK, which
  • 24:03 - 24:13
    is a proof system, which guarantees the
    memory isolation between the different
  • 24:13 - 24:20
    components. And Muen now goes a step
    further and says: you as
  • 24:20 - 24:24
    a guest system do static
    allocations and no dynamic
  • 24:24 - 24:28
    resource management, and we as the host system,
    we as the hypervisor, don't do any
  • 24:28 - 24:34
    dynamic resource allocation either. So it
    only does static resource management. So
  • 24:34 - 24:39
    at compile time of your Muen separation
    kernel you decide how many virtual
  • 24:39 - 24:44
    machines or how many unikernels you are
    running and which resources are given to
  • 24:44 - 24:50
    them. You even specify which communication
    channels are there. So if one of your
  • 24:50 - 24:56
    virtual machines needs to talk to another
    one, you need to specify that at
  • 24:56 - 25:01
    compile time and at runtime you don't have
    any dynamic resource management. So that
  • 25:01 - 25:09
    again makes the code much simpler,
    much less complex, and you get to much
  • 25:09 - 25:19
    fewer lines of code. So to conclude on
    MirageOS, and also Muen
  • 25:19 - 25:26
    and solo5, I like to cite Antoine de
    Saint-Exupéry: "Perfection is achieved, not when
  • 25:26 - 25:32
    there is nothing more to add, but when
    there is nothing left to take away." I
  • 25:32 - 25:37
    mean obviously the most secure system is a
    system which doesn't exist.
  • 25:37 - 25:40
    Laughter
  • 25:40 - 25:42
    Let's look a bit further
  • 25:42 - 25:46
    into the design decisions of MirageOS.
    Why do we use this strange
  • 25:46 - 25:51
    programming language called OCaml and
    what's it all about? And what are the case
  • 25:51 - 25:59
    studies? So OCaml has been around for
    more than 20 years. It's a multi-paradigm
  • 25:59 - 26:06
    programming language. The goal for us and
    for OCaml is usually to have declarative
  • 26:06 - 26:14
    code. To achieve declarative code you need
    to provide the developers with some
  • 26:14 - 26:21
    orthogonal abstraction facilities. Here we
    have variables, then functions, which you
  • 26:21 - 26:25
    likely know if you're a software
    developer. Also higher-order functions:
  • 26:25 - 26:32
    that just means a function is able
    to take a function as input.
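    For instance (a tiny OCaml illustration, not from the talk):

    ```ocaml
    (* List.map is a higher-order function: it takes a function argument. *)
    let doubled = List.map (fun x -> 2 * x) [ 1; 2; 3 ]   (* [2; 4; 6] *)

    (* and we can define our own: apply a function twice *)
    let twice f x = f (f x)
    let () = assert (twice (fun x -> x + 1) 0 = 2 && doubled = [ 2; 4; 6 ])
    ```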
  • 26:32 - 26:37
    In OCaml we try to always focus on the problem
    and not get distracted by boilerplate. So
  • 26:37 - 26:44
    some running example again would be this
    memory management. We don't manually deal
  • 26:44 - 26:53
    with that, but we have computers to
    actually deal with that. In OCaml you have
  • 26:53 - 27:00
    a very expressive static type system,
    which can spot a lot of invariant
  • 27:00 - 27:07
    violations at build time.
    So the program won't compile if you don't
  • 27:07 - 27:14
    handle all the potential
    return values of your function. Now, a
  • 27:14 - 27:20
    type system, as you may know it
    from Java, is a bit painful if you have to
  • 27:20 - 27:24
    state at every location where you declare
    a variable which type this
  • 27:24 - 27:32
    variable is. What OCaml provides is type
    inference similar to Scala and other
  • 27:32 - 27:38
    languages, so you don't need to write all
    the types manually. And types are also,
  • 27:38 - 27:44
    unlike in Java, erased during
    compilation. So types are only information
  • 27:44 - 27:49
    about values the compiler has at compile
    time. But at runtime these are all erased
  • 27:49 - 27:55
    so they don't exist. You don't see them.
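    A small sketch of what that buys you (illustrative, not from the talk):

    ```ocaml
    (* Types are inferred, checked at build time, then erased. *)
    type response = Ok of string | Redirect of string | Error of int

    let describe = function        (* inferred as: response -> string *)
      | Ok body -> "200: " ^ body
      | Redirect url -> "301 -> " ^ url
      | Error code -> "error " ^ string_of_int code
    (* Dropping one of the three cases makes the compiler complain about
       a non-exhaustive pattern match at compile time, not at runtime. *)
    ```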
  • 27:55 - 28:02
    And OCaml compiles to native machine code,
    which I think is important for security
    and performance, because otherwise you run
  • 28:02 - 28:07
    an interpreter or an abstract machine and
    you have to emulate something else and
  • 28:07 - 28:15
    that is never as fast as it could be. OCaml
    has one distinctive feature, which is its
  • 28:15 - 28:21
    module system. So you have all your
    values, your types and functions. And now
  • 28:21 - 28:27
    each of those values is defined inside of
    a so-called module. And the simplest
  • 28:27 - 28:33
    module is just the file name. But you can
    nest modules, so you can explicitly say: oh
  • 28:33 - 28:40
    yeah, this value or this binding now
    lives in a submodule. Each
  • 28:40 - 28:45
    module can also be given a type, so it
    has a set of types and a set of functions,
  • 28:45 - 28:53
    and that is called its signature, which is
    the interface of the module.
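    A minimal sketch of a module with a signature (an illustrative example, not from the talk):

    ```ocaml
    (* The signature hides the representation: outside this module, t is
       abstract and can only be manipulated through zero/incr/to_int. *)
    module Counter : sig
      type t
      val zero : t
      val incr : t -> t
      val to_int : t -> int
    end = struct
      type t = int
      let zero = 0
      let incr n = n + 1
      let to_int n = n
    end
    ```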
  • 28:53 - 29:00
    Now you have another abstraction mechanism
    in OCaml, which is functors. Functors are
  • 29:00 - 29:04
    basically compile-time functions from
    module to module. So they allow
  • 29:04 - 29:10
    parameterization. Like, you can implement
    your generic map structure, where the
  • 29:10 - 29:19
    implementation is maybe a binary tree, and all
  • 29:19 - 29:26
    you need to have is some comparison for
    the keys, and that is modeled in OCaml by a
  • 29:26 - 29:32
    module. So you have a module called Map
    and a functor called Make. And
  • 29:32 - 29:38
    Make takes some module which implements
    this comparison method and then provides
  • 29:38 - 29:46
    you with a map data structure for that
    key type.
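    This is the stdlib Map functor, so a concrete example can be given directly:

    ```ocaml
    (* Map.Make takes a module with a type t and a compare function;
       the stdlib String module provides both. *)
    module StringMap = Map.Make (String)

    let population =
      StringMap.(empty |> add "Leipzig" 600_000 |> add "Berlin" 3_700_000)

    let () =
      match StringMap.find_opt "Leipzig" population with
      | Some n -> Printf.printf "Leipzig: %d\n" n
      | None -> print_endline "unknown"
    ```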
  • 29:46 - 29:52
    In MirageOS we actually use the module system
    quite a bit more, because we
    have all these resources which are
  • 29:52 - 29:58
    different between Xen and KVM and so on.
    So each of the different resources like a
  • 29:58 - 30:07
    network interface has a signature, and a
    target-specific implementation. So we have
  • 30:07 - 30:11
    the TCP/IP stack, which sits much higher
    than the network card, and it doesn't
  • 30:11 - 30:17
    really care if you run on Xen or if you
    run on KVM. You just program against this
  • 30:17 - 30:22
    abstract interface, against the interface
    of the network device. You don't need
  • 30:22 - 30:28
    to write in
    your TCP/IP stack any code to run on Xen
  • 30:28 - 30:38
    or to run on KVM.
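    A simplified sketch of this pattern (the signature below is an assumption for illustration, not the real mirage-types NETWORK interface):

    ```ocaml
    (* The stack is a functor over an abstract device signature; it never
       mentions Xen or KVM. The concrete module satisfying NETWORK is
       chosen at build time by the mirage tool. *)
    module type NETWORK = sig
      type t
      val read  : t -> bytes Lwt.t
      val write : t -> bytes -> unit Lwt.t
    end

    module Make_echo (N : NETWORK) = struct
      let run (net : N.t) =
        let open Lwt.Infix in
        N.read net >>= fun frame ->   (* works on any backend *)
        N.write net frame
    end
    ```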
  • 30:38 - 30:44
    So MirageOS also doesn't really use the complete
    OCaml programming language. OCaml also provides
    you with an object system, and we barely
  • 30:44 - 30:50
    use that. OCaml also allows you
    mutable
  • 30:50 - 30:58
    state, and we barely use that mutable
    state; we use mostly immutable data
  • 30:58 - 31:05
    whenever sensible. We also have a value-
    passing style, so we put state and data in as
  • 31:05 - 31:12
    inputs. So state is just some abstract
    state, and data is just a byte vector
  • 31:12 - 31:17
    in a protocol implementation. And then the
    output is also a new state which may be
  • 31:17 - 31:22
    modified, and maybe some reply, so some
    other byte vector or some application
  • 31:22 - 31:32
    data. Or the output data may as well be an
    error because the incoming data and state
  • 31:32 - 31:38
    may be invalid or may violate
    some constraints. And errors are also
  • 31:38 - 31:44
    explicitly typed, so they are declared in
    the API, and the caller of a function needs
  • 31:44 - 31:52
    to handle all these errors explicitly.
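    A hedged sketch of this value-passing style (the state machine and error names here are made up for illustration):

    ```ocaml
    (* A pure transition function: old state and input bytes go in, a new
       state plus optional reply (or an explicitly typed error) comes out. *)
    type state = Handshake | Established
    type error = [ `Unexpected_data | `Violates_constraints ]

    let handle (st : state) (data : bytes)
        : (state * bytes option, error) result =
      match st, Bytes.length data with
      | Handshake, 0 -> Error `Unexpected_data
      | Handshake, _ -> Ok (Established, Some (Bytes.of_string "hello"))
      | Established, n when n > 1500 -> Error `Violates_constraints
      | Established, _ -> Ok (Established, None)
    (* The caller must pattern-match on Ok/Error: forgetting the error
       case is a compile-time failure, not a runtime surprise. *)
    ```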
  • 31:52 - 32:01
    As I said, we are single-core, but we have
    some promise-based, event-based
    concurrent programming. And yeah, we
  • 32:01 - 32:04
    have the ability to express really
    strong invariants, like: this is a read-
  • 32:04 - 32:08
    only buffer in the type system. And the
    type system is, as I mentioned, only
  • 32:08 - 32:15
    compile time, no runtime overhead. So it's
    all pretty nice and good. So let's take a
  • 32:15 - 32:21
    look at some of the case studies. The
    first one is a unikernel called the
  • 32:21 - 32:30
    Bitcoin Pinata. It started in 2015, when we
    were happy with our from-scratch developed
  • 32:30 - 32:35
    TLS stack. TLS is Transport Layer
    Security, which is what you use if you browse to
  • 32:35 - 32:42
    an HTTPS site. So we have a TLS stack in OCaml
    and we wanted to do some marketing for
  • 32:42 - 32:51
    that. The Bitcoin Pinata is basically a
    unikernel which uses TLS and provides you
  • 32:51 - 32:58
    with TLS endpoints, and it contains the
    private key for a bitcoin wallet which is
  • 32:58 - 33:06
    filled with (well, used to be filled with)
    10 bitcoins. And this means it's a
  • 33:06 - 33:11
    security bait. So if you can compromise
    the system itself, you get the private key
  • 33:11 - 33:16
    and you can do whatever you want with it.
    And being on this bitcoin block chain, it
  • 33:16 - 33:23
    also means it's transparent so everyone
    can see that that has been hacked or not.
  • 33:23 - 33:30
    Yeah, and it has been online for three years
    now and it was not hacked. But the bitcoins we
  • 33:30 - 33:36
    had were only borrowed from friends of ours,
    and they have since been reused in other
  • 33:36 - 33:40
    projects. It's still online. And you can
    see here on the right that we had some
  • 33:40 - 33:50
    HTTP traffic, like an aggregate of maybe
    600,000 hits there. Now I have a size
  • 33:50 - 33:55
    comparison of the Bitcoin Pinata on the
    left. You can see the unikernel, which is
  • 33:55 - 34:00
    less than 10 megabytes in size or in
    source code it's maybe a hundred thousand
  • 34:00 - 34:06
    lines of code. On the right hand side you
    have a very similar thing, but running as
  • 34:06 - 34:16
    a Linux service, so it runs openssl
    s_server, which is a minimal TLS server you
  • 34:16 - 34:23
    can basically get on a Linux system using
    OpenSSL. And there we have maybe a
  • 34:23 - 34:29
    size of 200 megabytes and maybe two
    million lines of code. So that's
  • 34:29 - 34:36
    roughly a factor of 25. In other examples,
    we even got a bit less code and a much bigger
  • 34:36 - 34:45
    effect. Performance analysis:
    well, in 2015 we did some evaluation
  • 34:45 - 34:51
    of our TLS stack and it turns out we're in
    the same ballpark as other
  • 34:51 - 34:57
    implementations. Another case study is a
    CalDAV server, which we developed last
  • 34:57 - 35:05
    year with a grant from Prototypefund, which
    is German government funding. It is
  • 35:05 - 35:09
    interoperable with other clients. It stores
    data in a remote git repository. So we
  • 35:09 - 35:14
    don't use any block device or persistent
    storage, but we store it in a git
  • 35:14 - 35:19
    repository, so whenever you add a
    calendar event, it actually does a git
  • 35:19 - 35:25
    push. And we also recently got some
    integration with CalDAV web, which is a
  • 35:25 - 35:31
    user interface written in JavaScript. And we
  • 35:31 - 35:37
    just bundle that with the thing. It's
    online, open source, there is a demo
  • 35:37 - 35:42
    server and the data repository online.
    Yes, some statistics; I zoom in
  • 35:42 - 35:48
    directly on the CPU usage. We had the
    luck that for half of the month we used
  • 35:48 - 35:56
    it as a process on a FreeBSD system.
    That was roughly the first half, until
  • 35:56 - 36:01
    here. And then at some point we thought:
    oh yeah, let's migrate it to a MirageOS
  • 36:01 - 36:06
    unikernel and not run the FreeBSD system
    below it. And you can see here on the x
  • 36:06 - 36:11
    axis the time. So that was the month of
    June, starting with the first of June on
  • 36:11 - 36:17
    the left and the last of June on the
    right. And on the y axis, you have the
  • 36:17 - 36:23
    number of CPU seconds here on the left or
    the number of CPU ticks here on the right.
  • 36:23 - 36:29
    The CPU ticks are virtual CPU ticks,
    which are debug counters from the hypervisor,
  • 36:29 - 36:33
    so from bhyve and FreeBSD in that
    system. And what you can see here is this
  • 36:33 - 36:39
    massive drop by a factor of roughly 10.
    And that is when we switched from a Unix
  • 36:39 - 36:46
    virtual machine with the process to a
    freestanding Unikernel. So we actually use
  • 36:46 - 36:51
    much less resources. And if we look into
    the bigger picture here, we also see that
  • 36:51 - 36:58
    the memory dropped by a factor of 10 or
    even more. This is now a logarithmic scale
  • 36:58 - 37:03
    here on the y axis, the network bandwidth
    increased quite a bit because now we do
  • 37:03 - 37:10
    all the monitoring traffic via the network
    interface as well, and so on. Okay, that's CalDAV.
  • 37:10 - 37:17
    Another case study is authoritative DNS
    servers. And I just recently wrote a
  • 37:17 - 37:22
    tutorial on that. Which I will skip
    because I'm a bit short on time. Another
  • 37:22 - 37:27
    case study is a firewall for Qubes OS.
    Qubes OS is a reasonably secure operating
  • 37:27 - 37:33
    system which uses Xen for isolation of
    workspaces and applications such as PDF
  • 37:33 - 37:39
    reader. So whenever you receive a PDF, you
    start a virtual machine, which is
  • 37:39 - 37:48
    run just once to
    open and read your PDF. And the Qubes Mirage
  • 37:48 - 37:54
    firewall is now a tiny
    replacement for the Linux-based firewall,
  • 37:54 - 38:02
    written in OCaml. And instead of
    roughly 300 MB, it only uses 32 MB
  • 38:02 - 38:09
    of memory. Recently there's now also
    some support for dynamic firewall rules,
  • 38:09 - 38:17
    as defined by Qubes 4.0. And that is not
    yet merged into master, but it's under
  • 38:17 - 38:23
    review. Libraries in MirageOS: since we
    write everything from scratch and
  • 38:23 - 38:30
    in OCaml, we don't have
    every protocol, but we have quite a few
  • 38:30 - 38:35
    protocols. There are also more unikernels
    right now, which you can see here in the
  • 38:35 - 38:42
    slides, also online in the Fahrplan, so you
    can click on the links later. Reproducible
  • 38:42 - 38:48
    builds: for security purposes we
    don't yet ship binaries. But I plan to
  • 38:48 - 38:52
    ship binaries, and in order to ship
    binaries, I don't want to ship non-
  • 38:52 - 38:57
    reproducible binaries. What are reproducible
    builds? Well, it means that if you have the
  • 38:57 - 39:06
    same source code, you should get
    binary identical output. And issues are
  • 39:06 - 39:15
    temporary filenames and timestamps and so
    on. In December we managed in MirageOS to
  • 39:15 - 39:21
    get some tooling on track to actually test
    the reproducibility of unikernels and we
  • 39:21 - 39:28
    fixed some issues, and now all the tested
    MirageOS unikernels are reproducible, which
  • 39:28 - 39:34
    are basically most of them from this list.
    Another topic is supply chain security,
  • 39:34 - 39:42
    which is important, I think, and this
    is still a work in progress. We still
  • 39:42 - 39:49
    haven't deployed that widely. But there
    are some test repositories out there to
  • 39:49 - 39:57
    provide signatures signed
    by the actual authors of a library, and
  • 39:57 - 40:03
    a way that the user of the
    library can verify them. And some
  • 40:03 - 40:09
    decentralized authorization and delegation
    of that. What about deployment? Well, in
  • 40:09 - 40:16
    conventional orchestration systems such as
    Kubernetes and so on, we don't yet have
  • 40:16 - 40:24
    a proper integration of MirageOS, but we
    would like to get some proper integration
  • 40:24 - 40:32
    there. We already generate some
    libvirt.xml files from Mirage. So for each
  • 40:32 - 40:38
    unikernel you get a libvirt.xml, and you
    can run that in your libvirt-
  • 40:38 - 40:45
    based orchestration system. For Xen, we
    also generate those .xl and .xe files,
  • 40:45 - 40:50
    which I personally don't really
    know much about, but that's it. On the
  • 40:50 - 40:56
    other side, I developed an orchestration
    system called Albatross because I was a
  • 40:56 - 41:03
    bit worried if I now have those tiny
    unikernels which are megabytes in size
  • 41:03 - 41:09
    and now I should trust the big Kubernetes,
    which is maybe a million lines of code
  • 41:09 - 41:16
    running on the host system with
    privileges. So I thought, oh well let's
  • 41:16 - 41:21
    try to come up with a minimal
    orchestration system which allows me some
  • 41:21 - 41:27
    console access. So I want to see the debug
    messages or whenever it fails to boot I
  • 41:27 - 41:32
    want to see the output of the console. I
    want to get some metrics, like the Grafana
  • 41:32 - 41:39
    screenshot you just saw. And that's
    basically it. Then, since I also developed
  • 41:39 - 41:45
    a TLS stack, I thought, oh yeah, well why
    not just use it for remote deployment? So
  • 41:45 - 41:51
    in TLS you have mutual authentication, you
    can have client certificates, and a
  • 41:51 - 41:57
    certificate itself is more or less an
    authenticated key-value store, because you
  • 41:57 - 42:04
    have those extensions in X.509 version 3,
    and you can put arbitrary data in there,
  • 42:04 - 42:09
    with keys being so-called object
    identifiers and values being whatever
  • 42:09 - 42:17
    else. TLS certificates, or X.509
    certificates, have
  • 42:17 - 42:24
    the great advantage that during a TLS handshake
    they are transferred on the wire not in
  • 42:24 - 42:34
    base64 or PEM encoding, as you usually see
    them, but in binary encoding, which is much
  • 42:34 - 42:41
    nicer with respect to the amount of bits you transfer.
    So it's not transferred in base64, but
  • 42:41 - 42:46
    basically directly in raw form. And with
    Albatross you can basically do a TLS
  • 42:46 - 42:51
    handshake and in that client certificate
    you present, you already have the
  • 42:51 - 42:58
    unikernel image and the name and the boot
    arguments, and you just deploy it directly.
  • 42:58 - 43:04
    Also, in X.509 you have a chain
    of certificate authorities which you send
  • 43:04 - 43:09
    along, and this chain of certificate
    authorities also contains some extensions
  • 43:09 - 43:15
    in order to specify which policies are
    active. So how many virtual machines are
  • 43:15 - 43:22
    you able to deploy on my system? How much
    memory do you have access to, and which
  • 43:22 - 43:27
    bridges or which network interfaces you
    have access to? So Albatross is really a
  • 43:27 - 43:34
    minimal orchestration system running as a
    family of Unix processes. It's maybe 3,000
  • 43:34 - 43:41
    lines of OCaml code or so, plus
    the TLS stack and so on. But yeah, it
  • 43:41 - 43:47
    seems to work pretty well. I at least use
    it for more than two dozen unikernels at
  • 43:47 - 43:52
    any point in time. What about the
    community? Well the whole MirageOS project
  • 43:52 - 43:58
    started around 2008 at the University of
    Cambridge, so it used to be a research
  • 43:58 - 44:04
    project, and it still has a lot of
    ongoing student projects at the University of
  • 44:04 - 44:11
    Cambridge. But now it's an open source,
    permissively licensed, mostly BSD-licensed
  • 44:11 - 44:21
    thing, where we have a community event every
    half year and a retreat in Morocco, where
  • 44:21 - 44:26
    we also use our own unikernels, like the
    DHCP server and the DNS resolver and so on.
  • 44:26 - 44:32
    We just use them to test them and to see
    how they behave and whether they work for
  • 44:32 - 44:40
    us. We have quite a lot of open source
    contributors from all over, and
  • 44:40 - 44:46
    some of the MirageOS libraries have also
    been used or are still used in this Docker
  • 44:46 - 44:52
    technology, Docker for Mac and Docker for
    Windows, which emulates the guest system
  • 44:52 - 45:02
    or which needs some wrappers, and where
    a lot of OCaml code is used. So to finish
  • 45:02 - 45:07
    my talk, I would like to show another
    slide, which is that Rome wasn't built in a
  • 45:07 - 45:15
    day. So where we are, to conclude: here
    we have a radical approach to operating
  • 45:15 - 45:22
    systems development. We have security
    from the ground up, with much less code
  • 45:22 - 45:30
    and we also have much fewer attack vectors
    because we use a memory safe
  • 45:30 - 45:39
    language. So we have reduced the carbon
    footprint, as I mentioned in the start of
  • 45:39 - 45:46
    the talk, because we use much less CPU
    time, but also much less memory. So we use
  • 45:46 - 45:53
    fewer resources. MirageOS itself and OCaml
    have reasonable performance. We have
  • 45:53 - 45:57
    seen some statistics about the TLS stack
    that it was in the same ballpark as
  • 45:57 - 46:06
    OpenSSL and PolarSSL, which is nowadays
    mbed TLS. And MirageOS unikernels, since
  • 46:06 - 46:11
    they don't really need to negotiate
    features and wait for the SCSI bus and
  • 46:11 - 46:15
    so on, actually boot in
    milliseconds, not in seconds. They do
  • 46:15 - 46:22
    no hardware probing and so on; they
    know at startup time what to expect. I
  • 46:22 - 46:27
    would like to thank everybody who is and
    was involved in this whole technology
  • 46:27 - 46:33
    stack, because I myself program quite a
    bit of OCaml, but I wouldn't have been
  • 46:33 - 46:39
    able to do that on my own; it is just a
    bit too big. MirageOS currently spans
  • 46:39 - 46:45
    around maybe 200 different git
    repositories with the libraries, mostly
  • 46:45 - 46:52
    developed on GitHub and open source. I
    am at the moment working in a nonprofit
  • 46:52 - 46:57
    company in Germany, which is called the
    Center for the Cultivation of Technology
  • 46:57 - 47:03
    with a project called robur. So we work in
    a collective way to develop full-stack
  • 47:03 - 47:08
    MirageOS unikernels. That's why I'm happy
    to do that from Dublin. And if you're
  • 47:08 - 47:14
    interested, please talk to us. I have some
    selected related talks here; there are many
  • 47:14 - 47:21
    more talks about MirageOS, but here is
    just a short list. If you're
  • 47:21 - 47:30
    interested in certain aspects, please
    help yourself and view them.
  • 47:30 - 47:32
    That's all from me.
  • 47:32 - 47:37
    Applause
  • 47:37 - 47:46
    Herald: Thank you very much. There's a bit
    over 10 minutes of time for questions. If
  • 47:46 - 47:50
    you have any questions go to the
    microphone. There's several microphones
  • 47:50 - 47:54
    around the room. Go ahead.
    Question: Thank you very much for the talk
  • 47:54 - 47:57
    -
    Herald: Point of order. Thanking the
  • 47:57 - 48:01
    speaker can be done afterwards. Questions
    are questions, so short sentences ending
  • 48:01 - 48:06
    with a question mark. Sorry, do go ahead.
    Question: If I want to try this at home,
  • 48:06 - 48:09
    what do I need? Is a raspi sufficient? No,
    it isn't.
  • 48:09 - 48:15
    Hannes: That is an excellent question. So
    I usually develop it on such a thinkpad
  • 48:15 - 48:23
    machine, but we actually support also
    ARM64 mode. So if you have a Raspberry Pi
  • 48:23 - 48:29
    3+, which I think has the virtualization
    bits, and a Linux kernel which is recent
  • 48:29 - 48:35
    enough to support KVM on that Raspberry Pi
    3+, then you can try it out there.
  • 48:35 - 48:42
    Herald: Next question.
    Question: Well, currently most MirageOS
  • 48:42 - 48:52
    unikernels are used for running server
    applications. And so obviously all this
  • 48:52 - 48:58
    static preconfiguration of OCaml and
    maybe Ada SPARK is fine for that. But what
  • 48:58 - 49:04
    do you think about... Will it ever be
    possible to use the same approach with all
  • 49:04 - 49:10
    this static preconfiguration for these very
    dynamic end-user desktop systems, for
  • 49:10 - 49:15
    example, which at least currently use
    quite a lot of plug-and-play?
  • 49:15 - 49:19
    Hannes: Do you have an example? What are
    you thinking about?
  • 49:19 - 49:26
    Question: Well, I'm not that much into
    the topic of the SPARK stuff, but you said
  • 49:26 - 49:32
    that all the communication paths have to
    be defined in advance. So especially with
  • 49:32 - 49:38
    plug-and-play devices like all this USB
    stuff, we either have to allow everything
  • 49:38 - 49:47
    in advance or we may have to reboot parts
    of the unikernels in between to allow
  • 49:47 - 49:55
    rerouting stuff.
    Hannes: Yes. Yes. So I mean if you want to
  • 49:55 - 50:01
    design a USB plug-and-play system, you can
    think of it as you plug in somewhere the
  • 50:01 - 50:08
    USB stick and then you start the unikernel
    which only has access to that USB stick.
  • 50:08 - 50:15
    But having a unikernel... Well I wouldn't
    design a unikernel which randomly does
  • 50:15 - 50:24
    plug-and-play with the outer world,
    basically. And one of the applications
  • 50:24 - 50:31
    I've listed here at the top is a
    picture viewer, which is a unikernel that
  • 50:31 - 50:37
    at the moment, I think, has static
    data embedded in it, but is able on Qubes
  • 50:37 - 50:44
    OS, or on Unix with SDL, to display the
    images. And you can think of some way, via
  • 50:44 - 50:49
    a network or so, to access the images,
    so you don't need to compile
  • 50:49 - 50:54
    the images in, but you can have a git
    repository or TCP server or whatever in
  • 50:54 - 51:01
    order to receive the images. So what
    I didn't mention is that
  • 51:01 - 51:06
    MirageOS, instead of being general purpose,
    where you have a shell and can do
  • 51:06 - 51:11
    everything with it, is such that each
    service, each unikernel, is a single-
  • 51:11 - 51:17
    service thing. So you can't do everything
    with it. And I think that is an advantage
  • 51:17 - 51:23
    from a lot of points of view. I agree
    that if you have a highly dynamic system,
  • 51:23 - 51:28
    you may have some trouble
    integrating that.
  • 51:28 - 51:39
    Herald: Are there any other questions?
    No, it appears not. In which case,
  • 51:39 - 51:41
    thank you again, Hannes.
    Warm applause for Hannes.
  • 51:41 - 51:45
    Applause
  • 51:45 - 51:49
    Outro music
  • 51:49 - 52:12
    subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!