
36C3 - No source, no problem! High speed binary fuzzing

  • 0:00 - 0:10
    36c3 preroll music
  • 0:18 - 0:23
    Herald: So, the next talk for this
    afternoon is about high speed binary
  • 0:23 - 0:28
    fuzzing. We have two researchers that will
    be presenting the product of their latest
  • 0:28 - 0:34
    work, which is a framework for static
    binary rewriting. Our speakers are—the
  • 0:34 - 0:39
    first one is a computer science master's
    student at EPFL and the second one is a
  • 0:39 - 0:43
    security researcher and assistant
    professor at EPFL. Please give a big round
  • 0:43 - 0:45
    of applause to Nspace and gannimo.
  • 0:45 - 0:50
    Applause
  • 0:50 - 0:53
    gannimo (Mathias Payer): Thanks for the
    introduction. It's a pleasure to be here,
  • 0:53 - 0:58
    as always. We're going to talk about
    different ways to speed up your fuzzing
  • 0:58 - 1:02
    and to find different kinds of
    vulnerabilities or to tweak your binaries
  • 1:02 - 1:08
    in somewhat unintended ways. I'm Mathias
    Payer or I go by gannimo on Twitter and I
  • 1:08 - 1:14
    am an assistant professor at EPFL working
    on different forms of software security:
  • 1:14 - 1:19
    fuzzing sanitization, but also different
    kinds of mitigations. And Matteo over
  • 1:19 - 1:24
    there is working on his master's thesis on
    different forms of binary rewriting for
  • 1:24 - 1:28
    the kernel. And today we're going to take
    you on a journey on how to actually
  • 1:28 - 1:32
    develop very fast and very efficient
    binary rewriting mechanisms that allow you
  • 1:32 - 1:38
    to do unintended modifications to the
    binaries and allow you to explore
  • 1:38 - 1:46
    different kinds of unintended features in
    binaries. So about this talk. What we
  • 1:46 - 1:50
    discovered or the reason why we set out on
    this journey was that fuzzing binaries is
  • 1:50 - 1:56
    really, really hard. There's very few
    tools in user space. There's—it's
  • 1:56 - 2:00
    extremely hard to set it up and it's
    extremely hard to set it up in a
  • 2:00 - 2:04
    performant way. The setup is complex. You
    have to compile different tools. You have
  • 2:04 - 2:09
    to modify it. And the results are not
    really that satisfactory. As soon as you
  • 2:09 - 2:13
    move to the kernel, fuzzing binaries in a
    kernel is even harder. There's no tooling
  • 2:13 - 2:17
    whatsoever, there's very few users
    actually working with binary code in the
  • 2:17 - 2:23
    kernel or modifying binary code, and it's
    just a nightmare to work with. So what we
  • 2:23 - 2:27
    are presenting today is a new approach
    that allows you to instrument any form of
  • 2:27 - 2:32
    binary code or modern binary code based on
    static rewriting, which gives you full
  • 2:32 - 2:37
    native performance. You only pay for the
    instrumentation that you add, and you can
  • 2:37 - 2:42
    do very heavyweight transformations on top
    of it. The picture, if you look at the
  • 2:42 - 2:47
    modern system, let's say we are looking at
    a modern setup. Let's say you're looking
  • 2:47 - 2:53
    at cat pictures in your browser: Chrome
    plus the kernel plus the libc plus the
  • 2:53 - 2:58
    graphical user interface together clock in
    at about 100 million lines of code.
  • 2:58 - 3:03
    Instrumenting all of this for some form of
    security analysis is a nightmare,
  • 3:03 - 3:07
    especially along this large stack of
    software. There's quite a bit of different
  • 3:07 - 3:11
    compilers involved. There's different
    linkers. It may be compiled on a different
  • 3:11 - 3:15
    system, with different settings and so on.
    And then getting your instrumentation
  • 3:15 - 3:19
    across all of this is pretty much
    impossible and extremely hard to work
  • 3:19 - 3:24
    with. And we want to enable you to select
    those different parts that you're actually
  • 3:24 - 3:30
    interested in. Modify those and then focus
    your fuzzing or analysis approaches on
  • 3:30 - 3:35
    those small subsets of the code, giving
    you a much better and stronger capability
  • 3:35 - 3:39
    to test the systems, or those
    parts of the system that you're really,
  • 3:39 - 3:46
    really interested in. Who's worked on
    fuzzing before? Quick show of hands. Wow,
  • 3:46 - 3:54
    that's a bunch of you. Do you use AFL?
    Yeah, most of you, AFL. Libfuzzer? Cool,
  • 3:54 - 4:00
    about 10, 15 percent libfuzzer, 30 percent
    fuzzing, and AFL. There's quite good
  • 4:00 - 4:04
    knowledge of fuzzing, so I'm not going to
    spend too much time on fuzzing, but for
  • 4:04 - 4:08
    those that haven't really run their
    fuzzing campaigns yet, it's a very simple
  • 4:08 - 4:12
    software testing technique. You're
    effectively taking a binary, let's say
  • 4:12 - 4:16
    Chrome, as a target and you're running
    this in some form of execution
  • 4:16 - 4:21
    environment. And fuzzing then consists of
    some form of input generation that creates
  • 4:21 - 4:27
    new test cases, throws them at your
    program and sees—and checks what is
  • 4:27 - 4:31
    happening with your program. And either
    everything is OK, and your code is being
  • 4:31 - 4:36
    executed, and your input—the program
    terminates, everything is fine, or you
  • 4:36 - 4:40
    have a bug report. If you have a bug
    report, you can use this. Find the
  • 4:40 - 4:45
    vulnerability, maybe develop a PoC and
    then come up with some form of either
  • 4:45 - 4:49
    exploit or patch or anything else. Right.
    So this is pretty much fuzzing in a
  • 4:49 - 4:56
    nutshell. How do you get fuzzing to be
    effective? How can you cover large source
  • 4:56 - 5:00
    bases, complex code, and complex
    environment? Well, there's a couple of
  • 5:00 - 5:05
    simple steps that you can take. And let's
    walk quickly through effective fuzzing
  • 5:05 - 5:13
    101. Well, first, you want to be able to
    create test cases that actually trigger
  • 5:13 - 5:18
    bugs. And this is a very, very
    complicated part. And we need
  • 5:18 - 5:23
    to have some notion of the inputs that a
    program accepts. And we need to have some
  • 5:23 - 5:28
    notion of how we can explore different
    parts of the program, right? Different
  • 5:28 - 5:31
    parts of functionality. Well, on one hand,
    we could have a developer write all the
  • 5:31 - 5:34
    test cases by hand, but this would be kind
    of boring. It would also require a lot of
  • 5:34 - 5:40
    human effort in creating these different
    inputs and so on. So coverage guided
  • 5:40 - 5:47
    fuzzing has evolved as a very simple way
    to guide the fuzzing process, leveraging
  • 5:47 - 5:51
    the information on which parts of the code
    have been executed by simply tracing the
  • 5:51 - 5:58
    individual path through the program based
    on the execution flow. So we can—the
  • 5:58 - 6:03
    fuzzer can use this feedback to then
    modify the inputs that are being thrown at
  • 6:03 - 6:10
    the fuzzing process. The second step is
    the fuzzer must be able to detect bugs. If
  • 6:10 - 6:13
    you've ever looked at a memory corruption,
    if you're just writing one byte after the
  • 6:13 - 6:18
    end of a buffer, it's highly likely that
    your software is not going to crash. But
  • 6:18 - 6:21
    it's still a bug, and it may still be
    exploitable based on the underlying
  • 6:21 - 6:27
    conditions. So we want to be able to
    detect violations as soon as they happen,
  • 6:27 - 6:32
    for example, based on some form of
    sanitization that we add, some form of
  • 6:32 - 6:35
    instrumentation that we add to the
    binary, that then tells us, hey, there's a
  • 6:35 - 6:40
    violation of the memory safety property,
    and we terminate the application right
  • 6:40 - 6:45
    away as a feedback to the fuzzer. Third,
    and last but not least: Speed is
  • 6:45 - 6:50
    key, right? If you're running a
    fuzzing campaign, you have a fixed
  • 6:50 - 6:55
    resource budget. You have a couple of
    cores, and you want to run for 24 hours,
  • 6:55 - 6:59
    48 hours, a couple of days. But in any
    way, whatever your constraints are, you
  • 6:59 - 7:04
    have a fixed amount of instructions that
    you can actually execute. And you have to
  • 7:04 - 7:09
    decide, am I spending my instructions on
    generating new inputs, tracking
  • 7:09 - 7:14
    constraints, finding bugs, running
    sanitization or executing the program? And
  • 7:14 - 7:18
    you need to find a balance between all of
    them, as it is a zero sum game. You have a
  • 7:18 - 7:21
    fixed amount of resources and you're
    trying to make the best with these
  • 7:21 - 7:27
    resources. So any overhead is slowing you
    down. And again, this becomes an
  • 7:27 - 7:31
    optimization problem. How can you most
    effectively use the resources that you
  • 7:31 - 7:38
    have available? When we are fuzzing with
    source code, it's quite easy to actually
  • 7:38 - 7:42
    leverage existing mechanisms, and we add
    all that instrumentation at compile time.
  • 7:42 - 7:46
    We take source code, we pipe it through
    the compiler, and modern compiler
  • 7:46 - 7:51
    platforms allow you to instrument and add
    little code snippets during the
  • 7:51 - 7:55
    compilation process that then carry out
    all these tasks that are useful for
  • 7:55 - 8:00
    fuzzing. For example, modern compilers can
    add short snippets of code for coverage
  • 8:00 - 8:04
    tracking that will record which parts of
    the code that you have executed, or for
  • 8:04 - 8:09
    sanitization which record and check every
    single memory access if it is safe or not.
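
    A minimal C sketch of what such compiler-inserted snippets boil down to: a coverage
    callback at the start of each basic block and a red-zone check in front of a memory
    access. The helper names and the inline red zone are hypothetical simplifications,
    not the actual SanitizerCoverage or AddressSanitizer passes.

      /* Hypothetical sketch of compiler-inserted instrumentation: a coverage
         callback per basic block and a red-zone check before a memory access.
         Helper names and layout are simplified, not the real ASan design. */
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      static uint8_t covered[64];                    /* one flag per basic block   */
      static void coverage_hit(unsigned block_id) {  /* inserted at block entries  */
          covered[block_id] = 1;
      }

      static struct {
          uint8_t data[4];
          uint8_t redzone[4];   /* "poisoned" bytes placed right after the buffer  */
      } g;

      static void check_access(const uint8_t *addr) {   /* inserted before loads   */
          if (addr >= g.redzone && addr < g.redzone + sizeof g.redzone) {
              fprintf(stderr, "red-zone access at %p\n", (void *)addr);
              abort();                           /* fail fast so the fuzzer sees it */
          }
      }

      int main(void) {
          coverage_hit(0);                           /* entry basic block          */
          int total = 0;
          for (int i = 0; i <= 4; i++) {             /* off-by-one: i == 4 is bad  */
              coverage_hit(1);                       /* loop-body basic block      */
              check_access(&g.data[i]);
              total += g.data[i];
          }
          printf("%d\n", total);
          return 0;
      }
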
  • 8:09 - 8:12
    And then when you're running the
    instrumented binary, everything is fine
  • 8:12 - 8:17
    and you can detect the policy violations
    as you go along. Now if you would have
  • 8:17 - 8:21
    source code for everything, this would be
    amazing. But it's often not the case,
  • 8:21 - 8:28
    right? We may be able on Linux to cover a
    large part of the protocol stack by
  • 8:28 - 8:34
    focusing only on source-code-based
    approaches. But there may be applications
  • 8:34 - 8:39
    where no source code is available. If we
    move to Android or other mobile systems,
  • 8:39 - 8:43
    there's many drivers that are not
    available as open source or just available
  • 8:43 - 8:49
    as binary blobs, or the full software
    stack may be closed-source and we only get
  • 8:49 - 8:52
    the binaries. And we still want to find
    vulnerabilities in these complex software
  • 8:52 - 9:00
    stacks that span hundreds of millions of
    lines of code in a very efficient way. The
  • 9:00 - 9:05
    only solution to cover this part of the
    massive code base is to actually rewrite
  • 9:05 - 9:09
    and focus on binaries. A very simple
    approach could be black box fuzzing, but
  • 9:09 - 9:12
    this is—this doesn't really get you
    anywhere because you don't get any
  • 9:12 - 9:16
    feedback; you don't get any information if
    you're triggering bugs. So one simple
  • 9:16 - 9:20
    approach, and this is the approach that is
    most dominantly used today, is to rewrite
  • 9:20 - 9:26
    the program or the binary dynamically. So
    you're taking the binary and during
  • 9:26 - 9:32
    execution you use some form of dynamic
    binary instrumentation based on Pin, angr,
  • 9:32 - 9:37
    or some other binary rewriting tool and
    translate the target at runtime, adding
  • 9:37 - 9:43
    this binary instrumentation on top of it
    as you're executing it. It's simple, it's
  • 9:43 - 9:47
    straightforward, but it comes at a
    terrible performance cost of ten to a
  • 9:47 - 9:52
    hundred x slow down, which is not really
    effective. And you're spending all your
  • 9:52 - 9:58
    cores and your cycles on just executing
    the binary instrumentation. So we don't
  • 9:58 - 10:02
    really want to do this and we want to have
    something that's more effective than that.
  • 10:02 - 10:07
    So what we are focusing on is to do static
    rewriting. It involves a much more complex
  • 10:07 - 10:12
    analysis as we are rewriting the binary
    before it is being executed, and we have
  • 10:12 - 10:18
    to recover all of the control flow, all of
    the different mechanisms, but it results
  • 10:18 - 10:25
    in a much better performance. And we can
    get more bang for our buck. So why is
  • 10:25 - 10:31
    static rewriting so challenging? Well,
    first, simply adding code will break the
  • 10:31 - 10:35
    target. So if you are disassembling this
    piece of code here, which is a simple loop
  • 10:35 - 10:41
    that loads data, decrements the registers,
    and then jumps if you're not at the end of
  • 10:41 - 10:46
    the array and keeps iterating through this
    array. Now, as you look at the jump-not-
  • 10:46 - 10:52
    zero instruction, the last instruction of
    the snippet, it is a relative offset. So
  • 10:52 - 10:58
    it jumps backward seven bytes. Which is
    nice if you just execute the code as is.
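
    As a rough illustration, hypothetical instruction bytes for such a loop might look
    as follows (not the exact snippet from the slide); the backward jump exists only as
    a signed displacement baked into the final byte, and that byte looks like any other
    byte.

      /* Hypothetical encoding of a small loop (not the exact slide snippet).
         The jnz target exists only as the signed 8-bit displacement in the
         last byte; inserting code before it silently makes -10 point to the
         wrong place. */
      #include <stdint.h>

      static const uint8_t loop_bytes[] = {
          0x8b, 0x06,              /* mov eax, [rsi]   ; load an element        */
          0x48, 0x83, 0xc6, 0x04,  /* add rsi, 4       ; advance the pointer    */
          0xff, 0xc9,              /* dec ecx          ; decrement the counter  */
          0x75, 0xf6,              /* jnz -10          ; back to the mov        */
      };

      int main(void) {
          /* To a rewriter this is just data: 0xf6 could equally be a scalar. */
          return loop_bytes[9] == 0xf6 ? 0 : 1;
      }
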
  • 10:58 - 11:02
    But as soon as you want to insert new
    code, you change the offsets in the
  • 11:02 - 11:07
    program, and you're modifying all these
    different offsets. And simply adding new
  • 11:07 - 11:13
    code somewhere in between will break the
    target. So a core feature that we need to
  • 11:13 - 11:18
    enforce, or core property that we need to
    enforce, is that we must find all the
  • 11:18 - 11:24
    references and properly adjust them, both
    relative offsets and absolute offsets as
  • 11:24 - 11:30
    well. Getting a single one wrong will
    break everything. What makes this problem
  • 11:30 - 11:35
    really, really hard is that if you're
    looking at the binary, a byte is a byte,
  • 11:35 - 11:38
    right? There's no way for us to
    distinguish between scalars and
  • 11:38 - 11:44
    references, and in fact they are
    indistinguishable. Getting a single
  • 11:44 - 11:50
    reference wrong breaks the target and
    would introduce arbitrary crashes. So we
  • 11:50 - 11:54
    have to come up with ways that allow us to
    distinguish between the two. So for
  • 11:54 - 12:00
    example, if you have this code here, it
    takes a value and stores it somewhere on
  • 12:00 - 12:07
    the stack. This could come from two
    different kind of high-level constructs.
  • 12:07 - 12:12
    On one hand, it could be taking the
    address of a function and storing this
  • 12:12 - 12:17
    function address somewhere in a stack
    variable. Or it could be just storing a
  • 12:17 - 12:22
    scalar in a stack variable. And these two
    are indistinguishable, and rewriting them,
  • 12:22 - 12:25
    as soon as we add new code, the offsets
    will change. If it is a function, we would
  • 12:25 - 12:32
    have to modify the value; if it is a
    scalar, we have to keep the same value. So
  • 12:32 - 12:36
    how can we come up with a way that allows
    us to distinguish between the two and
  • 12:36 - 12:45
    rewrite binaries by recovering this
    missing information? So let us take—let me
  • 12:45 - 12:48
    take you or let us take you on a journey
    towards instrumenting binaries in the
  • 12:48 - 12:53
    kernel. This is what we aim for. We'll
    start with the simple case of
  • 12:53 - 12:57
    instrumenting binaries in user land, talk
    about different kinds of coverage guided
  • 12:57 - 13:02
    fuzzing and what kind of instrumentation
    we can add, what kind of sanitization we
  • 13:02 - 13:06
    can add, and then focusing on taking it
    all together and applying it to kernel
  • 13:06 - 13:11
    binaries to see what what will fall out of
    it. Let's start with instrumenting
  • 13:11 - 13:17
    binaries first. I will now talk a little
    bit about RetroWrite, our mechanism and
  • 13:17 - 13:25
    our tool that enables static binary
    instrumentation by symbolizing existing
  • 13:25 - 13:31
    binaries. So we recover the information
    and we translate relative offsets and
  • 13:31 - 13:40
    absolute offsets into actual labels that
    are added to the assembly file. The
  • 13:40 - 13:43
    instrumentation can then work on the
    recovered assembly file, which can then be
  • 13:43 - 13:48
    reassembled into a binary that can then be
    executed for fuzzing. We implement
  • 13:48 - 13:52
    coverage tracking and binary address
    sanitizer on top of this, leveraging
  • 13:52 - 13:58
    abstraction as we go forward. The key to
    enabling this kind of binary rewriting is
  • 13:58 - 14:02
    position-independent code. And position-
    independent code has become the de-facto
  • 14:02 - 14:07
    standard for any code that is being
    executed on a modern system. And it
  • 14:07 - 14:12
    effectively says that it is code that can
    be loaded at any arbitrary address in your
  • 14:12 - 14:16
    address space as you are executing
    binaries. It is essential and a
  • 14:16 - 14:19
    requirement if you want to have address
    space layout randomization or if you want
  • 14:19 - 14:22
    to use shared libraries, which de facto
    you want to use in all these different
  • 14:22 - 14:26
    systems. So since a couple of years, all
    the code that you're executing on your
  • 14:26 - 14:33
    phones, on your desktops, on your laptops
    is position-independent code. And the idea
  • 14:33 - 14:37
    behind position-independent code is
    that you can load it anywhere in your
  • 14:37 - 14:41
    address space and you can therefore not
    use any hard-coded static addresses and
  • 14:41 - 14:44
    you have to inform the system of
    relocations or pick relative
  • 14:44 - 14:53
    addresses so that the system can
    relocate these different mechanisms. On
  • 14:53 - 14:59
    x86_64, position-independent code
    leverages addressing that is relative to
  • 14:59 - 15:03
    the instruction pointer. So for example,
    it uses the current instruction pointer
  • 15:03 - 15:08
    and then a relative offset to that
    instruction pointer to reference global
  • 15:08 - 15:12
    variables, other functions and so on. And
    this is a very easy way for us to
  • 15:12 - 15:18
    distinguish references from constants,
    especially in PIE binaries. If it is RIP-
  • 15:18 - 15:21
    relative, it is a reference; everything
    else is a constant. And we can build our
  • 15:21 - 15:26
    translation algorithm and our translation
    mechanism based on this fundamental
  • 15:26 - 15:30
    finding to remove any form of heuristic
    that is needed by focusing especially on
  • 15:30 - 15:35
    position-independent code. So we're
    supporting position-independent code; we
  • 15:35 - 15:39
    don't support non-position-
    independent code, but we give you the
  • 15:39 - 15:43
    guarantee that we can rewrite all the
    different code that is out there. So
  • 15:43 - 15:48
    symbolization works as follows: If you
    have the little bit of code on the lower
  • 15:48 - 15:54
    right, symbolization replaces first all
    the references with assembler labels. So
  • 15:54 - 15:58
    look at the call instruction and the jump-
    not-zero instruction; the call instruction
  • 15:58 - 16:02
    references an absolute address and the
    jump-not-zero instruction jumps backward
  • 16:02 - 16:08
    relative 15 bytes. So by focusing on these
    relative jumps and calls, we can replace
  • 16:08 - 16:12
    them with actual labels and rewrite the
    binary as follows: so we're calling a
  • 16:12 - 16:16
    function, replacing it with the actual
    label, and for the jump-not-zero we are
  • 16:16 - 16:21
    inserting an actual label in the assembly
    code and adding a backward reference. For
  • 16:21 - 16:26
    PC-relative addresses, for example the
    data load, we can then replace it with the
  • 16:26 - 16:30
    name of the actual data that we have
    recovered, and we can then add all the
  • 16:30 - 16:36
    different relocations and use that as
    auxiliary information on top of it. After
  • 16:36 - 16:43
    these three steps, we can insert any new
    code in between, and can therefore add
  • 16:43 - 16:47
    different forms of instrumentation or run
    some higher-level analysis on top of
  • 16:47 - 16:54
    it, and then reassemble the file for
    fuzzing or coverage-guided tracking,
  • 16:54 - 16:59
    address sanitization or whatever else you
    want to do. I will now hand over to
  • 16:59 - 17:04
    Matteo, who will cover coverage-guided
    fuzzing and sanitization and then
  • 17:04 - 17:07
    instrumenting the binaries in the kernel.
    Go ahead.
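
    The scalar-versus-reference ambiguity from earlier can be seen in two lines of C.
    Compiled as non-position-independent code, both stores below end up as a plain
    immediate written to the stack, and nothing in the bytes says which one has to be
    adjusted when the code moves; the constant here is an arbitrary made-up value. In a
    position-independent binary the function address is instead materialized through a
    RIP-relative lea, which is what makes the distinction mechanical.

      /* The ambiguous case in C: in non-position-independent code both stores
         become a plain 64-bit immediate, and a rewriter cannot tell from the
         bytes which one is a code address that must be fixed up. The scalar
         value below is made up. */
      #include <stdint.h>

      static void handler(void) { /* some callback */ }

      void ambiguous(void) {
          void (*fp)(void) = handler;   /* a reference: must change if handler
                                           moves in the rewritten binary       */
          uint64_t magic = 0x4011a0;    /* a scalar that merely looks like an
                                           address: must stay exactly as it is */
          (void)fp;
          (void)magic;
      }

      int main(void) {
          ambiguous();
          return 0;
      }
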
  • 17:07 - 17:11
    Nspace (Matteo Rizzo): So, now that we
    have this really nice framework to rewrite
  • 17:11 - 17:16
    binaries, one of the things that we want
    to add to actually get to fuzzing is this
  • 17:16 - 17:23
    coverage-tracking instrumentation. So
    coverage-guided fuzzing is a way, a
  • 17:23 - 17:28
    method, to let the fuzzer discover
    interesting inputs, interesting paths through
  • 17:28 - 17:36
    the target by itself. So the basic idea is
    that the fuzzer will track coverage—the
  • 17:36 - 17:39
    parts of the program that are covered by
    different inputs by inserting some kind of
  • 17:39 - 17:43
    instrumentation. So, for example, here we
    have this target program that checks if
  • 17:43 - 17:49
    the input contains the string "PNG" at the
    beginning, and if it does, then it does
  • 17:49 - 17:54
    something interesting, otherwise it just
    bails out and fails. So if we track the
  • 17:54 - 17:58
    parts of the program that each input
    executes, the fuzzer can figure out that
  • 17:58 - 18:03
    an input that contains "P" will have
    discovered a different path through the
  • 18:03 - 18:08
    program than an input that doesn't contain
    it. And then so on it can, one byte at a
  • 18:08 - 18:13
    time, discover that this program expects
    this magic sequence "PNG" at the start of
  • 18:13 - 18:19
    the input. So the way that the fuzzer does
    this is that every time a new input
  • 18:19 - 18:24
    discovers a new path through the target, it
    is considered interesting and added to a
  • 18:24 - 18:29
    corpus of interesting inputs. And every
    time the fuzzer needs to generate a new
  • 18:29 - 18:36
    input, it will select something from the
    corpus, mutate it randomly, and then use
  • 18:36 - 18:40
    it as the new input. So this is like
    a—this is, like, conceptually pretty
  • 18:40 - 18:43
    simple, but in practice it works really
    well and it really lets the fuzzer
  • 18:43 - 18:48
    discover the format that the target
    expects in an unsupervised way. So as an
  • 18:48 - 18:53
    example, this is an experiment that was
    run by the author of AFL—AFL is the fuzzer
  • 18:53 - 18:58
    that sort of popularized this
    technique—where he was fuzzing a JPEG-
  • 18:58 - 19:02
    parsing library, starting from a corpus
    that only contained the string "hello". So
  • 19:02 - 19:08
    now clearly "hello" is not a valid JPEG
    image and so—but still, like, the fuzzer
  • 19:08 - 19:12
    was still able to find—to discover the
    correct format. So after a while it
  • 19:12 - 19:18
    started generating some grayscale images,
    on the top left, and as it generated more
  • 19:18 - 19:21
    and more inputs, it started generating
    more interesting images, such as some
  • 19:21 - 19:25
    grayscale gradients, and later on even
    some color images. So as you can see, this
  • 19:25 - 19:31
    really works, and it allows us to fuzz a
    program without really teaching the fuzzer
  • 19:31 - 19:35
    what the input should look like. So that's
    it for coverage-guided fuzzing. Now we'll
  • 19:35 - 19:38
    talk a bit about sanitizations. As a
    reminder, the core idea behind
  • 19:38 - 19:42
    sanitization is that just looking for
    crashes is likely to miss some of the
  • 19:42 - 19:46
    bugs. So, for example, if you have this
    out-of-bounds one-byte read, that will
  • 19:46 - 19:50
    probably not crash the target, but you
    would still like to catch it because it
  • 19:50 - 19:53
    could be used for an info leak, for
    example. So one of the most popular
  • 19:53 - 19:59
    sanitizers is Address Sanitizer. So
    Address Sanitizer will instrument all the
  • 19:59 - 20:05
    memory accesses in your program and check
    for memory corruption, which—so, memory
  • 20:05 - 20:09
    corruption is a pretty dangerous class of
    bugs that unfortunately still plagues C
  • 20:09 - 20:17
    and C++ programs and unsafe languages in
    general. And ASan tries to catch it by
  • 20:17 - 20:21
    instrumenting the target. It is very
    popular; it has been used to find
  • 20:21 - 20:27
    thousands of bugs in complex software like
    Chrome and Linux, and even though it has,
  • 20:27 - 20:32
    like, a bit of a slowdown—like about 2x—it
    is still really popular because it lets
  • 20:32 - 20:37
    you find many, many more bugs. So how does
    it work? The basic idea is that ASan will
  • 20:37 - 20:42
    insert some special regions of memory
    called 'red zones' around every object in
  • 20:42 - 20:47
    memory. So we have a small example here
    where we declare a 4-byte array on the
  • 20:47 - 20:54
    stack. So ASan will allocate the array
    "buf" and then add a red zone before it
  • 20:54 - 20:59
    and a red zone after it. Whenever the
    program accesses the red zones, it is
  • 20:59 - 21:03
    terminated with a security violation. So
    the instrumentation just prints a bug
  • 21:03 - 21:07
    report and then crashes the target. This
    is very useful for detecting, for example,
  • 21:07 - 21:11
    buffer overflows or underflows and many
    other kinds of bugs such as use-after-free
  • 21:11 - 21:16
    and so on. So, as an example here, we are
    trying to copy 5 bytes into a 4-byte
  • 21:16 - 21:23
    buffer, and ASan will check each of the
    accesses one by one. And when it sees that
  • 21:23 - 21:27
    the last byte writes to a red zone, it
    detects the violation and crashes the
  • 21:27 - 21:32
    program. So this is good for us because
    this bug might have not been found by
  • 21:32 - 21:36
    simply looking for crashes, but it's
    definitely found if we use ASan. So this
  • 21:36 - 21:41
    is something we want for fuzzing. So now
    that we've covered—briefly covered ASan we
  • 21:41 - 21:46
    can talk about instrumenting binaries in
    the kernel. So Mathias left us with
  • 21:46 - 21:53
    RetroWrite, and with RetroWrite we can add
    both coverage tracking and ASan to
  • 21:53 - 21:57
    binaries. So the simple—it's a really
    simple idea: now that we can rewrite this
  • 21:57 - 22:03
    binary and add instructions wherever we
    want, we can implement both coverage
  • 22:03 - 22:07
    tracking and ASan. In order to implement
    coverage tracking, we simply have to
  • 22:07 - 22:12
    identify the start of every basic block
    and add a little piece of instrumentation
  • 22:12 - 22:16
    at the start of the basic block that tells
    the fuzzer 'hey, we've reached this part
  • 22:16 - 22:19
    of the program'—'hey, we've reached this
    other part of the program'. Then the
  • 22:19 - 22:25
    fuzzer can figure out whether that's a new
    part or not. ASan is also, like, you know,
  • 22:25 - 22:29
    it's also somewhat—it can also be
    implemented in this way by finding all
  • 22:29 - 22:34
    memory accesses, and then linking with
    libASan. libASan is a sort of runtime for
  • 22:34 - 22:39
    ASan that takes care of inserting the red
    zones and instrument—and adding, you know,
  • 22:39 - 22:43
    like, keeping around all the metadata that
    ASan needs to know where the red zones
  • 22:43 - 22:48
    are, and detecting whether a memory access
    is invalid. So, how can we apply all of
  • 22:48 - 22:52
    this in the kernel? Well, first of all,
    fuzzing the kernel is not as easy as
  • 22:52 - 22:58
    fuzzing some userspace program. There's
    some issues here. So first of all, there's
  • 22:58 - 23:02
    crash handling. So whenever you're fuzzing
    a userspace program, you expect crashes,
  • 23:02 - 23:06
    well, because that's what we're after. And
    if a userspace program crashes, then the
  • 23:06 - 23:11
    OS simply handles the crash gracefully.
    And so the fuzzer can detect this, and
  • 23:11 - 23:16
    save the input as a crashing input, and so
    on. And this is all fine. But when you're
  • 23:16 - 23:19
    fuzzing the kernel, so—if you were fuzzing
    the kernel of the machine that you were
  • 23:19 - 23:23
    using for fuzzing, after a while, the
    machine would just go down. Because, after
  • 23:23 - 23:27
    all, the kernel runs the machine, and if
    it starts misbehaving, then all of it can
  • 23:27 - 23:32
    go wrong. And more importantly, you can
    lose your crashes, because if the
  • 23:32 - 23:35
    machine crashes, then the state of the
    fuzzer is lost and you have no idea what
  • 23:35 - 23:40
    your crashing input was. So what most
    kernel fuzzers have to do is that they
  • 23:40 - 23:43
    resort to some kind of VM to keep the
    system stable. So they fuzz the kernel in
  • 23:43 - 23:48
    a VM and then run the fuzzing agent
    outside the VM. On top of that is tooling.
  • 23:48 - 23:53
    So, if you want to fuzz a user space
    program, you can just download AFL or use
  • 23:53 - 23:58
    libfuzzer; there's plenty of tutorials
    online, it's really easy to set up and
  • 23:58 - 24:01
    just, like—compile your program, you start
    fuzzing and you're good to go. If you want
  • 24:01 - 24:05
    to fuzz the kernel, it's already much more
    complicated. So, for example, if you want
  • 24:05 - 24:09
    to fuzz Linux with, say, syzkaller, which
    is a popular kernel fuzzer, you have to
  • 24:09 - 24:14
    compile the kernel, you have to use a
    special config that supports syzkaller,
  • 24:14 - 24:20
    you have way less guides available than
    for userspace fuzzing, and in general it's
  • 24:20 - 24:25
    just much more complex and less intuitive
    than just fuzzing userspace. And lastly,
  • 24:25 - 24:29
    we have the issue of determinism. So in
    general, if you have a single threaded
  • 24:29 - 24:33
    userspace program, unless it uses some
    kind of random number generator, it is
  • 24:33 - 24:38
    more or less deterministic. There's
    nothing that affects the execution of the
  • 24:38 - 24:42
    program. But—and this is really nice if
    you want to try to reproduce a test case,
  • 24:42 - 24:46
    because if you have a non-deterministic
    test case, then it's really hard to know
  • 24:46 - 24:51
    whether this is really a crash or if it's
    just something that you should ignore, and
  • 24:51 - 24:56
    in the kernel this is even harder, because
    you don't only have concurrency, like
  • 24:56 - 25:01
    multi-processing, you also have interrupts.
    So interrupts can happen at any time, and
  • 25:01 - 25:06
    if one time you got an interrupt while
    executing your test case and the second
  • 25:06 - 25:10
    time you didn't, then maybe it only
    crashes one time - you don't really know,
  • 25:10 - 25:16
    it's not pretty. And so again, we
    have several approaches to fuzzing
  • 25:16 - 25:21
    binaries in the kernel. First one is to do
    black box fuzzing. We don't really
  • 25:21 - 25:24
    like this because it doesn't find much,
    especially in something complex
  • 25:24 - 25:27
    like a kernel. Approach 1 is to
    use dynamic translation,
  • 25:27 - 25:33
    so, use something
    like QEMU or—you name it. This works, and
  • 25:33 - 25:35
    people have used it successfully; the
    problem is that it is really, really,
  • 25:35 - 25:42
    really slow. Like, we're talking about
    10x-plus overhead. And as we said before,
  • 25:42 - 25:46
    the more iterations, the more test cases
    you can execute in the same amount of
  • 25:46 - 25:51
    time, the better, because you find more
    bugs. And on top of that, there's no
  • 25:51 - 25:58
    currently available sanitizer for
    kernel binaries that is based on
  • 25:58 - 26:01
    this approach. So in userspace you have
    something like valgrind; in the kernel,
  • 26:01 - 26:05
    you don't have anything, at least that we
    know of. There is another approach, which
  • 26:05 - 26:10
    is to use Intel Processor Trace. This has
    been, like—there's been some research
  • 26:10 - 26:14
    papers on this recently, and this is nice
    because it allows you to collect coverage
  • 26:14 - 26:18
    at nearly zero overhead. It's, like,
    really fast, but the problem is that it
  • 26:18 - 26:23
    requires hardware support, so it requires
    a fairly new x86 CPU, and if you want to
  • 26:23 - 26:27
    fuzz something on ARM, say, like, your
    Android driver, or if you want to use an
  • 26:27 - 26:32
    older CPU, then you're out of luck. And
    what's worse, you cannot really use it for
  • 26:32 - 26:36
    sanitization, or at least not the kind of
    sanitization that ASan does, because it
  • 26:36 - 26:42
    just traces the execution; it doesn't
    allow you to do checks on memory accesses.
  • 26:42 - 26:47
    So Approach 3, which is what we will use,
    is static rewriting. So, we had this very
  • 26:47 - 26:51
    nice framework for rewriting userspace
    binaries, and then we asked ourselves, can
  • 26:51 - 26:57
    we make this work in the kernel? So we
    took the system, the original RetroWrite,
  • 26:57 - 27:03
    we modified it, we implemented support for
    Linux modules, and... it works! So we have
  • 27:03 - 27:08
    implemented it—we have used it to fuzz
    some kernel modules, and it really shows
  • 27:08 - 27:12
    that this approach doesn't only work for
    userspace; it can also be applied to the
  • 27:12 - 27:19
    kernel. So as for some implementation, the
    nice thing about kernel modules is that
  • 27:19 - 27:22
    they're always position independent. So
    you cannot have position—like, fixed-
  • 27:22 - 27:26
    position kernel modules because Linux just
    doesn't allow it. So we sort of get that
  • 27:26 - 27:32
    for free, which is nice. And Linux modules
    are also a special class of ELF files,
  • 27:32 - 27:36
    which means that the format is—even though
    it's not the same as userspace binaries,
  • 27:36 - 27:40
    it's still somewhat similar, so we didn't
    have to change the symbolizer that much,
  • 27:40 - 27:47
    which is also nice. And we implemented
    symbolization with this, and we used it to
  • 27:47 - 27:54
    implement both code coverage and binary
    ASan for kernel binary modules. So for
  • 27:54 - 27:59
    coverage: The idea behind the whole
    RetroWrite project was that we wanted to
  • 27:59 - 28:04
    integrate with existing tools. So existing
    fuzzing tools. We didn't want to force our
  • 28:04 - 28:09
    users to write their own fuzzer that is
    compatible with RetroWrite. So for—in
  • 28:09 - 28:13
    userspace we had AFL-style coverage
    tracking, and binary ASan which is
  • 28:13 - 28:16
    compatible with source-based ASan, and we
    wanted to follow the same principle in the
  • 28:16 - 28:22
    kernel. So it turns out that Linux has
    this built-in coverage-tracking framework
  • 28:22 - 28:27
    called kCov that is used by several
    popular kernel fuzzers like syzkaller, and
  • 28:27 - 28:31
    we wanted to use it ourselves. So we
    designed our coverage instrumentation so
  • 28:31 - 28:37
    that it integrates with kCov. The downside
    is that you need to compile the kernel
  • 28:37 - 28:41
    with kCov, but then again, Linux is open
    source, so you can sort of always do that;
  • 28:41 - 28:44
    it's usually not the
    kernel that is a binary blob, it's
  • 28:44 - 28:49
    usually only the modules. So that's just
    still fine. And the way you do this is—the
  • 28:49 - 28:53
    way you implement kCov for binary modules
    is that you just have to find the start of
  • 28:53 - 28:59
    every basic block, and add a call to some
    function that then stores the collected
  • 28:59 - 29:03
    coverage. So here's an example: we have a
    short snippet of code with three basic
  • 29:03 - 29:08
    blocks, and all we have to do is add a
    call to "trace_pc" to the start of the
  • 29:08 - 29:12
    basic block. "trace_pc" is a function that
    is part of the main kernel image that then
  • 29:12 - 29:17
    collects this coverage and makes it
    available to a userspace fuzzing agent. So
  • 29:17 - 29:21
    this is all really easy and it works. And
    let's now see how we implemented binary
  • 29:21 - 29:26
    ASan. So as I mentioned before, when we
    instrument the program with binary ASan in
  • 29:26 - 29:30
    userspace we link with libASan, which
    takes care of setting up the metadata,
  • 29:30 - 29:34
    takes care of putting the red zones around
    our allocations, and so on. So we had to
  • 29:34 - 29:37
    do something similar in the kernel; of
    course, you cannot link with libASan in
  • 29:37 - 29:43
    the kernel, because that doesn't work, but
    what we can do instead is, again, compile
  • 29:43 - 29:47
    the kernel with kASan support. So this
    instruments the allocator, kmalloc, to add
  • 29:47 - 29:52
    the red zones; it allocates space for the
    metadata, it keeps this metadata around,
  • 29:52 - 29:56
    does this all for us, which is really
    nice. And again, the big advantage of
  • 29:56 - 30:01
    using this approach is that we can
    integrate seamlessly with a kASan-
  • 30:01 - 30:06
    instrumented kernel and with fuzzers that
    rely on kASan such as syzkaller. So we see
  • 30:06 - 30:12
    this as more of a plus than, like, a
    limitation. And how do you implement ASan?
  • 30:12 - 30:17
    Well, you have to find every memory access
    and instrument it to check the—to check
  • 30:17 - 30:22
    whether this is accessing a red zone. And
    if it does then you just call this bug
  • 30:22 - 30:26
    report function that produces a stack
    trace, a bug report, and crashes the
  • 30:26 - 30:30
    kernel, so that the fuzzer can detect it.
    Again, this is compatible with source-
  • 30:30 - 30:37
    based kASan, so we're happy. We can simply
    load the rewritten module with added
  • 30:37 - 30:40
    instrumentation into a kernel, as long as
    you have compiled the kernel with the
  • 30:40 - 30:44
    right flags, and we can use a standard
    kernel fuzzer. Here, for our
  • 30:44 - 30:50
    evaluation, we used syzkaller, a popular
    kernel fuzzer by some folks at Google, and
  • 30:50 - 30:55
    it worked really well. So we've finally
    reached the end of our journey, and now we
  • 30:55 - 31:00
    wanted to present some experiments we did
    to see if this really works. So for
  • 31:00 - 31:05
    userspace, we wanted to compare the
    performance of our binary ASan with
  • 31:05 - 31:10
    source-based ASan and with existing
    solutions that also work on binaries. So
  • 31:10 - 31:16
    for userspace, you can use valgrind
    memcheck. It's a memory sanitizer that is
  • 31:16 - 31:21
    based on binary translation and dynamic
    binary translation and works on binaries.
  • 31:21 - 31:25
    We compared it with source ASan and
    RetroWrite ASan on the SPEC CPU benchmark
  • 31:25 - 31:31
    and saw how fast it was. And for the
    kernel we decided to fuzz some file
  • 31:31 - 31:38
    systems and some drivers with syzkaller
    using both source-based KASan and kCov and
  • 31:38 - 31:45
    kRetroWrite-based KASan and kCov. So these
    are our results for userspace. So the red
  • 31:45 - 31:49
    bar is valgrind. We can see that the
    execution time of valgrind is the highest.
  • 31:49 - 31:56
    It is really, really slow—like, 3, 10, 30x
    overhead, way too slow for fuzzing. Then
  • 31:56 - 32:03
    in green, we have our binary ASan, which
    is, like, already a large improvement. In
  • 32:03 - 32:07
    orange we have source-based ASan. And then
    finally in blue we have the original code
  • 32:07 - 32:11
    without any instrumentation whatsoever. So
    we can see that source-based ASan has,
  • 32:11 - 32:17
    like, 2x or 3x overhead, and binary ASan
    is a bit higher, like, a bit less
  • 32:17 - 32:21
    efficient, but still somewhat close. So
    that's for userspace, and for the kernel,
  • 32:21 - 32:25
    we—these are some preliminary results, so,
    this is, like—I'm doing this work as part
  • 32:25 - 32:30
    of my master's thesis, and so I'm still,
    like, running the evaluation. Here we can
  • 32:30 - 32:33
    see that the overhead is already, like, a
    bit lower. So the reason for this is that
  • 32:33 - 32:40
    SPEC is a pure CPU benchmark; it doesn't
    interact with the system that much. And so
  • 32:40 - 32:44
    any instrumentation that you add is going
    to massively slow down, or, like,
  • 32:44 - 32:49
    considerably slow down the execution. By
    contrast, when you fuzz a file system with
  • 32:49 - 32:56
    syzkaller, not only every test case has to
    go from the host to the guest and
  • 32:56 - 33:02
    then do multiple syscalls and so on, but
    also every system call has to go through
  • 33:02 - 33:05
    several layers of abstraction before it
    gets to the actual file system. And all
  • 33:05 - 33:10
    these—like, all of this takes a lot of
    time, and so in practice the overhead of
  • 33:10 - 33:16
    our instrumentation seems to be pretty
    reasonable. So, since we know that you
  • 33:16 - 33:33
    like demos, we've prepared a small demo of
    kRetroWrite. So. Let's see. Yep. Okay. All
  • 33:33 - 33:40
    right, so we've prepared a small kernel
    module. And this module is just, like,
  • 33:40 - 33:46
    really simple; it contains a
    vulnerability, and what it does is that it
  • 33:46 - 33:50
    creates a character device. So if you're
    not familiar with this, a character device
  • 33:50 - 33:55
    is like a fake file that is exposed by a
    kernel driver and that you can write to
  • 33:55 - 34:02
    and read from. And instead of going to a
    file, the data that you read—that you, in
  • 34:02 - 34:06
    this case, write to the fake file—goes to
    the driver and is handled by this demo
  • 34:06 - 34:10
    write function. So as we can see, this
    function allocates a buffer, a 16-byte
  • 34:10 - 34:15
    buffer on the heap, and then copies some
    data into it, and then it checks if the
  • 34:15 - 34:20
    data contains the string "1337". If it
    does, then it accesses the buffer out of
  • 34:20 - 34:23
    bounds; you can see "alloc[16]" and the
    buffer is sixteen bytes; this is an out-
  • 34:23 - 34:28
    of-bounds read by one byte. And if it
    doesn't then it just accesses the buffer
  • 34:28 - 34:33
    in bounds, which is fine, and it's not a
    vulnerability. So we can compile this
  • 34:33 - 34:47
    driver. OK, um... OK, and then so we have
    our module, and then we will instrument it
  • 34:47 - 35:01
    using kRetroWrite. So, instrument... Yes,
    please. OK. Right. So kRetroWrite did some
  • 35:01 - 35:07
    processing, and it produced an
    instrumented module with ASan or kASan and
  • 35:07 - 35:10
    a symbolized assembly file. We can
    actually have a look at the symbolized
  • 35:10 - 35:18
    assembly file to see what it looks like.
    Yes. Yes. OK. So, is this big enough?
  • 35:18 - 35:23
    Yeah... As you can see, so—we can actually
    see here the ASan instrumentation. Ah,
  • 35:23 - 35:29
    shouldn't—yeah. So, we—this is the ASan
    instrumentation. The original code loads
  • 35:29 - 35:33
    some data from this address. And as you
    can see, the ASan instrumentation first
  • 35:33 - 35:38
    computes the actual address, and then does
    some checking—basically, this is checking
  • 35:38 - 35:44
    some metadata that ASan stores to check if
    the address is in a red zone or not, and
  • 35:44 - 35:49
    then if the check fails, it
    calls this ASan report which produces a
  • 35:49 - 35:55
    stack trace and crashes the kernel. So
    this is fine. We can actually even look at
  • 35:55 - 36:18
    the disassembly of both modules, so...
    object dump and then demo... Ah, nope. OK,
  • 36:18 - 36:22
    so on the left, we have the original
    module without any instrumentation; on the
  • 36:22 - 36:27
    right, we have the module instrumented
    with ASan. So as you can see, the original
  • 36:27 - 36:33
    module has "push r13" and then has this
    memory load here; on the right in the
  • 36:33 - 36:39
    instrumented module, kRetroWrite inserted
    the ASan instrumentation. So the original
  • 36:39 - 36:44
    load is still down here, but between that,
    between the first instruction and this
  • 36:44 - 36:48
    instruction, we have—now have the kASan
    instrumentation that does our check. So
  • 36:48 - 36:57
    this is all fine. Now we can actually test
    it and see what it does. So we can—we will
  • 36:57 - 37:02
    boot a very simple, a very minimal Linux
    system, and try to target the
  • 37:02 - 37:06
    vulnerability first with the non-
    instrumented module and then with the
  • 37:06 - 37:10
    instrumented module. And we can—we will
    see that in the—with the non-instrumented
  • 37:10 - 37:15
    module, the kernel will not crash, but
    with the instrumented module it will crash
  • 37:15 - 37:22
    and produce a bug report. So. Let's see.
    Yeah, this is a QEMU VM, I have no idea
  • 37:22 - 37:27
    why it's taking so long to boot. I'll
    blame the demo gods for not being kind to
  • 37:27 - 37:40
    us. Yeah, I guess we just have to wait.
    OK. So. All right, so we loaded the
  • 37:40 - 37:47
    module. We will see that it has created a
    fake file character device in /dev/demo.
  • 37:47 - 37:59
    Yep. We can write to this file. Yep. So this
    will—this accesses the array in bounds,
  • 37:59 - 38:04
    and so this is fine. Then what we can also
    do is write "1337" to it so it will access
  • 38:04 - 38:09
    the array out of bounds. So this is the
    non instrumented module, so this will not
  • 38:09 - 38:14
    crash. It will just print some garbage
    value. Okay, that's it. Now we can load
  • 38:14 - 38:26
    the instrumented module instead... and do
    the same experiment again. All right. We
  • 38:26 - 38:32
    can see that /dev/demo is still here. So
    the module still works. Let's try to write
  • 38:32 - 38:39
    "1234" into it. This, again, doesn't
    crash. But when we try to write "1337",
  • 38:39 - 38:48
    this will produce a bug report.
    applause
  • 38:48 - 38:51
    So this has quite a lot of information. We
  • 38:51 - 38:56
    can see, like, the—where the memory was
    allocated, there's a stack trace for that;
  • 38:56 - 39:02
    it wasn't freed, so there's no stack trace
    for the free. And we see that the cache
  • 39:02 - 39:07
    size of the memory, like, it was a 16-byte
    allocation. We can see the shape of the
  • 39:07 - 39:11
    memory. We see that these two zeros means
    that there's two 8-byte chunks of valid
  • 39:11 - 39:16
    memory. And then these "fc fc fc" is
    the—are the red zones that I was talking
  • 39:16 - 39:20
    about before. All right, so that's it for
    the demo. We will switch back to our
  • 39:20 - 39:25
    presentation now. So... hope you enjoyed
    it.
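
    For reference, a stripped-down sketch of a vulnerable character-device module like
    the one in the demo could look as follows. The names, the trigger handling, and the
    error handling are simplified guesses for illustration, not the actual demo code.

      /* Simplified guess at a demo-style vulnerable module, not the original.
         Writing data containing "1337" to /dev/demo triggers a one-byte
         out-of-bounds read on a 16-byte heap buffer. */
      #include <linux/module.h>
      #include <linux/miscdevice.h>
      #include <linux/fs.h>
      #include <linux/slab.h>
      #include <linux/string.h>
      #include <linux/uaccess.h>
      #include <linux/kernel.h>

      static ssize_t demo_write(struct file *file, const char __user *ubuf,
                                size_t count, loff_t *ppos)
      {
          char *buf = kmalloc(16, GFP_KERNEL);     /* 16-byte heap allocation   */
          size_t n = min_t(size_t, count, 15);
          ssize_t ret = count;

          if (!buf)
              return -ENOMEM;
          if (copy_from_user(buf, ubuf, n)) {
              ret = -EFAULT;
              goto out;
          }
          buf[n] = '\0';

          if (strstr(buf, "1337"))
              pr_info("demo: %d\n", buf[16]);      /* out of bounds by one byte */
          else
              pr_info("demo: %d\n", buf[15]);      /* in bounds, fine           */
      out:
          kfree(buf);
          return ret;
      }

      static const struct file_operations demo_fops = {
          .owner = THIS_MODULE,
          .write = demo_write,
      };

      static struct miscdevice demo_dev = {
          .minor = MISC_DYNAMIC_MINOR,
          .name  = "demo",                         /* appears as /dev/demo      */
          .fops  = &demo_fops,
      };

      module_misc_device(demo_dev);
      MODULE_LICENSE("GPL");

    Writing "1337" to /dev/demo then exercises the out-of-bounds path; on a kASan-enabled
    kernel, loading the kRetroWrite-instrumented build of such a module turns that read
    into the bug report shown above.
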
  • 39:25 - 39:31
    gannimo: Cool. So after applying this to a
    demo module, we also wanted to see what
  • 39:31 - 39:35
    happens if we apply this to a real file
    system. After a couple of hours we
  • 39:35 - 39:41
    were—when we came back and checked on the
    results, we saw a couple of issues popping
  • 39:41 - 39:49
    up, including a nice set of use-after-free
    reads, a set of use-after-free writes, and
  • 39:49 - 39:56
    we checked the bug reports and we saw a
    whole bunch of Linux kernel issues popping
  • 39:56 - 40:03
    up one after the other in this nondescript
    module that we fuzzed. We're in the
  • 40:03 - 40:07
    process of reporting it. This will take
    some time until it is fixed; that's why
  • 40:07 - 40:13
    you see the blurry lines. But as you see,
    there's still quite a bit of opportunity
  • 40:13 - 40:19
    in the Linux kernel where you can apply
    different forms of targeted fuzzing into
  • 40:19 - 40:26
    different modules, leverage these modules
    on top of a kASan instrumented kernel and
  • 40:26 - 40:32
    then leverage this as part of your
    fuzzing toolchain to find interesting
  • 40:32 - 40:39
    kernel 0days that... yeah. You can then
    develop further, or report, or do whatever
  • 40:39 - 40:45
    you want with them. Now, we've shown you
    how you can take existing binary-only
  • 40:45 - 40:51
    modules, think different binary-only
    drivers, or even existing modules where
  • 40:51 - 40:56
    you don't want to instrument a full set of
    the Linux kernel, but only focus fuzzing
  • 40:56 - 41:02
    and exploration on a small,
    limited piece of code and then do security
  • 41:02 - 41:09
    tests on those. We've shown you how we can
    do coverage-based tracking and address
  • 41:09 - 41:14
    sanitization. But this is also up to you
    on what kind of other instrumentation you
  • 41:14 - 41:18
    want. Like this is just a tool, a
    framework that allows you to do arbitrary
  • 41:18 - 41:24
    forms of instrumentation. So we've taken
    you on a journey from instrumenting
  • 41:24 - 41:29
    binaries over coverage-guided fuzzing and
    sanitization to instrumenting modules in
  • 41:29 - 41:37
    the kernel and then finding crashes in the
    kernel. Let me wrap up the talk. So, this
  • 41:37 - 41:42
    is one of the fun pieces of work that
    we do in the hexhive lab at EPFL. So if
  • 41:42 - 41:46
    you're looking for postdoc opportunities
    or if you're thinking about a PhD, come
  • 41:46 - 41:52
    talk to us. We're always hiring. The tools
    will be released as open source. A large
  • 41:52 - 41:57
    chunk of the userspace work is already
    open source. We're working on a set of
  • 41:57 - 42:02
    additional demos and so on so that you can
    get started faster, leveraging the
  • 42:02 - 42:08
    different existing instrumentation that is
    already out there. The userspace work is
  • 42:08 - 42:12
    already available. The kernel work will be
    available in a couple of weeks. This
  • 42:12 - 42:17
    allows you to instrument real-world
    binaries for fuzzing, leveraging existing
  • 42:17 - 42:21
    transformations for coverage tracking to
    enable fast and effective fuzzing and
  • 42:21 - 42:26
    memory checking to detect the actual bugs
    that exist there. The key takeaway from
  • 42:26 - 42:32
    this talk is that RetroWrite and
    kRetroWrite enable static binary
  • 42:32 - 42:38
    rewriting at zero instrumentation cost. We
    take the limitation of focusing only on
  • 42:38 - 42:43
    position-independent code, which is not a
    real limitation, but we get the
  • 42:43 - 42:48
    advantage of being able to symbolize
    without actually relying on heuristics, so
  • 42:48 - 42:55
    we can even symbolize large, complex
    applications and
  • 42:55 - 43:01
    effectively rewrite those aspects and then
    you can focus fuzzing on these parts.
  • 43:01 - 43:06
    Another point I want to mention is that
    this enables you to reuse existing tooling
  • 43:06 - 43:11
    so you can take a binary blob, instrument
    it, and then reuse, for example, Address
  • 43:11 - 43:16
    Sanitizer or existing fuzzing tools, as it
    integrates really, really nicely. As I said,
  • 43:16 - 43:23
    all the code is open source. Check it out.
    Try it. Let us know if it breaks. We're
  • 43:23 - 43:28
    happy to fix. We are committed to open
    source. And let us know if there are any
  • 43:28 - 43:37
    questions. Thank you.
    applause
  • 43:37 - 43:42
    Herald: So, thanks, guys, for an
    interesting talk. We have some time for
  • 43:42 - 43:47
    questions, so we have microphones along
    the aisles. We'll start from question from
  • 43:47 - 43:51
    microphone number two.
    Q: Hi. Thanks for your talk and for the
  • 43:51 - 43:59
    demo. I'm not sure about the use-case you
    showed for the kernel RetroWrite. 'Cause
  • 43:59 - 44:06
    you're usually interested in fuzzing
    binary in kernelspace when you don't have
  • 44:06 - 44:14
    source code for the kernel. For example,
    for IoT or Android and so on. But you just
  • 44:14 - 44:22
    reuse the kCov and kASan in the kernel,
    and you never have the kernel in IoT or
  • 44:22 - 44:29
    Android which is compiled with that. So
    are you—do you have any plans to binary
  • 44:29 - 44:32
    instrument the kernel itself, not the
    modules?
  • 44:32 - 44:39
    Nspace: So we thought about that. I think
    that there's some additional problems that
  • 44:39 - 44:44
    we would have to solve in order to be able
    to instrument the full kernel. So other
  • 44:44 - 44:48
    than the fact that it gives us
    compatibility with, like, existing tools,
  • 44:48 - 44:52
    the reason why we decided to go with
    compiling the kernel with kASan and kCov
  • 44:52 - 44:57
    is that building the, like—you would
    have to, like, think about it. You
  • 44:57 - 45:02
    have to instrument the memory allocator to
    add red zones, which is, like, already
  • 45:02 - 45:07
    somewhat complex. You have to instrument
    the exception handlers to catch, like, any
  • 45:07 - 45:12
    faults that the instrumentation detects.
    You would have to, like, set up some
  • 45:12 - 45:17
    memory for the ASan shadow. So this is,
    like—I think you should be able to do it,
  • 45:17 - 45:22
    but it would require a lot of additional
    work. So this was, like, a four-
  • 45:22 - 45:26
    month thesis. So we decided to start
    small and prove that it works in
  • 45:26 - 45:30
    the kernel for modules, and then leave it
    to future work to actually extend it to
  • 45:30 - 45:38
    the full kernel. Also, like, I think for
    Android—so in the case of Linux, the
  • 45:38 - 45:42
    kernel is GPL, right, so if the
    manufacturer ships a custom kernel, they
  • 45:42 - 45:45
    have to release the source code, right?
    Q: They never do.
  • 45:45 - 45:47
    Nspace: They never—well, that's a
    different issue. Right?
  • 45:47 - 45:49
    gannimo: Right.
    Q: So that's why I ask, because I don't
  • 45:49 - 45:52
    see how it just can be used in the real
    world.
  • 45:52 - 45:57
    gannimo: Well, let me try to put this into
    perspective a little bit as well. Right.
  • 45:57 - 46:02
    So there's the—what we did so far is we
    leveraged existing tools, like kASan or
  • 46:02 - 46:09
    kCov, and integrated into these existing
    tools. Now, doing heap-based allocation is
  • 46:09 - 46:14
    fairly simple and replacing those with
    additional red zones—that instrumentation
  • 46:14 - 46:20
    you can carry out fairly well by focusing
    on the different allocators. Second to
  • 46:20 - 46:25
    that, simply oopsing the kernel and
    printing the stack trace is also fairly
  • 46:25 - 46:29
    straightforward. So it's not a lot of
    additional effort. So it is—it involves
  • 46:29 - 46:38
    some engineering effort to port this to
    non-kASan-compiled kernels. But we think
  • 46:38 - 46:45
    it is very feasible. In the interest of
    time, we focused on kASan-enabled kernels,
  • 46:45 - 46:51
    so that some form of ASan is already
    enabled. But yeah, this is additional
  • 46:51 - 46:56
    engineering effort. But there is also a
    community out there that can help us with
  • 46:56 - 47:01
    these kinds of changes. So kRetroWrite and
    RetroWrite themselves are the binary
  • 47:01 - 47:07
    rewriting platform that allows you to turn
    a binary into an assembly file that you
  • 47:07 - 47:12
    can then instrument and run different
    passes on top of it. So another pass would
  • 47:12 - 47:16
    be a full ASan pass or kASan pass that
    somebody could add and then contribute
  • 47:16 - 47:19
    back to the community.
    Q: Yeah, it would be really useful.
  • 47:19 - 47:20
    Thanks.
    gannimo: Cool.
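
(For readers following along: the red zones and the ASan shadow discussed in this exchange follow the usual ASan/kASan scheme. Below is a minimal Python model of that scheme for illustration only; the shadow offset and helper names are assumptions for the example, not values taken from the kernel or from kRetroWrite.)

```python
# Minimal model of the ASan/kASan shadow encoding: every 8 bytes of memory
# map to one shadow byte.
#   0        -> all 8 bytes of the granule are addressable
#   1..7     -> only the first k bytes are addressable
#   >= 0x80  -> poisoned, e.g. a red zone placed around an allocation
# SHADOW_OFFSET is a made-up placement; real kernels choose their own.

SHADOW_SCALE = 3
SHADOW_OFFSET = 0x7FFF8000

shadow = {}  # stand-in for the real shadow memory region


def shadow_addr(addr: int) -> int:
    """Shadow byte describing the 8-byte granule containing `addr`."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


def poison_redzone(start: int, size: int) -> None:
    """What an instrumented allocator does for the red zone it adds."""
    for a in range(start, start + size, 8):
        shadow[shadow_addr(a)] = 0xFA  # heap red-zone marker


def check_access(addr: int, size: int) -> bool:
    """The check inserted before every instrumented load/store."""
    s = shadow.get(shadow_addr(addr), 0)
    if s == 0:
        return True                    # whole granule addressable
    if s >= 0x80:
        return False                   # red zone or otherwise poisoned
    return (addr & 7) + size <= s      # partially addressable granule


# A 16-byte allocation at 0x1000 followed by an 8-byte red zone.
poison_redzone(0x1000 + 16, 8)
assert check_access(0x1000, 8)           # inside the allocation: fine
assert not check_access(0x1000 + 16, 1)  # one byte past the end: caught
```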
  • 47:20 - 47:24
    Angel: Next question from the Internet.
    Q: Yes, there is a question regarding the
  • 47:24 - 47:31
    slide on the SPEC CPU benchmark. The
    second or third graph from the right had
  • 47:31 - 47:37
    an instrumented version that was faster
    than the original program. Why is that?
  • 47:37 - 47:42
    gannimo: Cache effect. Thank you.
    Angel: Microphone number one.
  • 47:42 - 47:47
    Q: Thank you. Thank you for the presentation.
    I have a question: how many architectures do
  • 47:47 - 47:51
    you support, and do you have plans to
    support more?
  • 47:51 - 47:56
    gannimo: x86_64.
    Q: Okay. So no plans for ARM or MIPS,
  • 47:56 - 47:58
    or...?
    gannimo: Oh, there are plans.
  • 47:58 - 48:01
    Q: Okay.
    Nspace: Right, so—
  • 48:01 - 48:06
    gannimo: Right. Again, there's a finite
    amount of time. We focused on the
  • 48:06 - 48:12
    technology. ARM is high up on the list. If
    somebody is interested in working on it
  • 48:12 - 48:18
    and contributing, we're happy to hear from
    them. Our list of targets is ARM first and
  • 48:18 - 48:23
    then maybe something else. But I think
    with x86_64 and ARM we've covered a
  • 48:23 - 48:33
    majority of the interesting platforms.
    Q: And second question, did you try to
  • 48:33 - 48:38
    fuzz any real closed-source program?
    Because as I understand from the presentation,
  • 48:38 - 48:45
    you fuzzed, like, just file systems that we
    can compile and fuzz with syzkaller, like
  • 48:45 - 48:49
    in the past.
    Nspace: So for the evaluation, we wanted
  • 48:49 - 48:52
    to be able to compare between the source-
    based instrumentation and the binary-based
  • 48:52 - 48:57
    instrumentation, so we focused mostly on
    open-source filesystems and drivers because
  • 48:57 - 49:02
    then we could instrument them with a
    compiler. We haven't yet tried, but this
  • 49:02 - 49:06
    is, like, also pretty high up on the list.
    We wanted to try to find some closed-
  • 49:06 - 49:11
    source drivers—there's lots of them, like
    for GPUs or anything—and we'll give it a
  • 49:11 - 49:15
    try and find some 0days, perhaps.
    Q: Yes, but with syzkaller, you still have
  • 49:15 - 49:23
    a problem. You have to write rules, like,
    dictionaries. I mean, you have to
  • 49:23 - 49:25
    understand the format, have to communicate
    with the driver.
  • 49:25 - 49:29
    Nspace: Yeah, right. But there are, for
    example, closed-source file systems that
  • 49:29 - 49:33
    we are looking at.
    Q: Okay. Thinking.
  • 49:33 - 49:39
    Herald: Number two.
    Q: Hi. Thank you for your talk. So I don't
  • 49:39 - 49:45
    know if there are any kCov- or kASan-
    equivalent solutions for Windows, but I was
  • 49:45 - 49:50
    wondering if you tried, or are you
    planning to port it to Windows—the
  • 49:50 - 49:53
    framework? Because I know it might be
    challenging because of the driver
  • 49:53 - 49:57
    signature enforcement and PatchGuard, but
    I wondered if you tried or thought about
  • 49:57 - 49:59
    it.
    gannimo: Yes, we thought about it and we
  • 49:59 - 50:06
    decided against it. Windows is incredibly
    hard and we are academics. The research I
  • 50:06 - 50:12
    do in my lab, or we do in my research lab,
    focuses predominantly on open-source
  • 50:12 - 50:17
    software and empowers open-source
    software. Doing full support for Microsoft
  • 50:17 - 50:21
    Windows is somewhat out of scope. If
    somebody wants to port these tools, we are
  • 50:21 - 50:24
    happy to hear it and work with these
    people. But it's a lot of additional
  • 50:24 - 50:29
    engineering effort versus very low
    additional research
  • 50:29 - 50:33
    value, so we'll have to find some form of
    compromise. And, like, if you would be
  • 50:33 - 50:39
    willing to fund us, we would go ahead. But
    it's—yeah, it's a cost question.
  • 50:39 - 50:42
    Q: And you're referring both to kernel and
    user space, right?
  • 50:42 - 50:45
    gannimo: Yeah.
    Q: Okay. Thank you.
  • 50:45 - 50:48
    Herald: Number five.
    Q: Hi, thanks for the talk. This seems
  • 50:48 - 50:52
    most interesting if you're looking for
    vulnerabilities in closed-source kernel
  • 50:52 - 50:58
    modules, but without giving it too much
    thought, it seems it's really trivial to
  • 50:58 - 51:02
    prevent this if you're writing a closed-
    source module.
  • 51:02 - 51:07
    gannimo: Well, how would you prevent this?
    Q: Well, for starters, you would just take
  • 51:07 - 51:11
    the difference between the addresses of two
    functions. That's not gonna be IP-
  • 51:11 - 51:16
    relative, so...
    Nspace: Right. So we explicitly—like, even
  • 51:16 - 51:22
    in the original RetroWrite paper—we
    explicitly decided to not try to deal with
  • 51:22 - 51:26
    obfuscated code, or code that is
    purposefully trying to defeat this kind of
  • 51:26 - 51:31
    rewriting. Because, like, the assumption
    is that first of all, there are techniques
  • 51:31 - 51:34
    to, like, deobfuscate code or remove
    these, like, checks in some way, but this
  • 51:34 - 51:40
    is, like, sort of orthogonal work. And at
    the same time, I guess most drivers are
  • 51:40 - 51:44
    not really compiled with that sort of
    obfuscation; they're just, like, you know,
  • 51:44 - 51:48
    they're compiled with a regular compiler.
    But yeah, of course, this is, like, a
  • 51:48 - 51:50
    limitation.
    gannimo: They're likely stripped, but not
  • 51:50 - 51:54
    necessarily obfuscated. At least from what
    we've seen when we looked at binary-only
  • 51:54 - 51:59
    drivers.
    Herald: Microphone number two.
  • 51:59 - 52:04
    Q: How do you decide where to place the
    red zones? From what I heard, you talked
  • 52:04 - 52:10
    about instrumenting the allocators, but,
    well, there are a lot of variables on the
  • 52:10 - 52:13
    stack, so how do you deal with those?
    gannimo: Oh, yeah, that's actually super
  • 52:13 - 52:20
    cool. I refer to some extent to the paper
    that is on the GitHub repo as well. If you
  • 52:20 - 52:27
    think about it, modern compilers use
    canaries for buffers. Are you aware of
  • 52:27 - 52:31
    stack canaries—how stack canaries work?
    So, stack canaries—like, if the compiler
  • 52:31 - 52:34
    sees there's a buffer that may be
    overflown, it places a stack canary
  • 52:34 - 52:40
    between the buffer and any other data.
    What we do is—as part of our analysis
  • 52:40 - 52:45
    tool, we find these stack canaries, remove
    the code that does the stack canary, and
  • 52:45 - 52:49
    use this space to place our red zones. So
    we actually hack the stack canaries,
  • 52:49 - 52:55
    remove that code, and add ASan red zones
    into the empty canary slots that are now
  • 52:55 - 52:59
    there. It's actually a super cool
    optimization because we piggyback on the
  • 52:59 - 53:03
    work that the compiler already did for
    us before, and we can then leverage that
  • 53:03 - 53:07
    to gain additional benefits and protect
    the stack as well.
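
(As a rough illustration of this canary trick: the sketch below is not the actual kRetroWrite pass; the regular expressions and the __asan_poison_canary_slot helper are hypothetical stand-ins. The idea is that a function built with the stack protector loads the canary from %fs:0x28 in its prologue and checks it in its epilogue; a rewriter that recognizes this pattern can drop the canary code and instead poison the 8-byte canary slot in the shadow, so an overflow that would have clobbered the canary triggers an ASan report instead.)

```python
import re

CANARY_LOAD  = re.compile(r"mov\s+%fs:0x28,")    # prologue: fetch the canary
CANARY_CHECK = re.compile(r"__stack_chk_fail")   # epilogue: called on mismatch


def rewrite_function(asm_lines):
    """Hypothetical pass over reassembleable AT&T assembly, one line per entry."""
    out = []
    for line in asm_lines:
        if CANARY_LOAD.search(line):
            # Reuse the canary slot as a red zone: poison it instead of
            # filling it. (A real pass would also drop the store into the
            # slot and the compare in the epilogue.)
            out.append("\tcall\t__asan_poison_canary_slot")
            continue
        if CANARY_CHECK.search(line):
            continue  # the explicit canary check is no longer needed
        out.append(line)
    return out
```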
  • 53:07 - 53:11
    Q: Thanks.
    Angel: Another question from the Internet.
  • 53:16 - 53:21
    Q: Yes. Did you consider lifting the
    binary code to LLVM IR instead of
  • 53:21 - 53:28
    generating assembler source?
    gannimo: Yes. laughter But, so—a little
  • 53:28 - 53:32
    bit longer answer. Yes, we did consider
    that. Yes, it would be super nice to lift
  • 53:32 - 53:39
    to LLVM IR. We've actually looked into
    this. It's incredibly hard. It's
  • 53:39 - 53:42
    incredibly complex. There's no direct
    mapping between the machine code
  • 53:42 - 53:48
    and the LLVM IR. You would
    still need to recover all the types. So
  • 53:48 - 53:52
    it's like this magic dream that you
    recover full LLVM IR, then do heavyweight
  • 53:52 - 53:57
    transformations on top of it. But this is
    incredibly hard because if you compile
  • 53:57 - 54:04
    down from LLVM IR to machine code, you
    lose a massive amount of information. You
  • 54:04 - 54:07
    would have to find a way to recover all of
    that information, which is pretty much
  • 54:07 - 54:15
    impossible and undecidable for many cases.
    So for example, just as a note, we only
  • 54:15 - 54:19
    recover control flow and we only
    symbolize control flow. For data
  • 54:19 - 54:23
    references—we don't support
    instrumentation of data references yet
  • 54:23 - 54:29
    because there's still an undecidable
    problem that we are facing with. I can
  • 54:29 - 54:33
    talk more about this offline, or there is
    a note in the paper as well. So this is
  • 54:33 - 54:37
    just a small problem. Only if you're
    lifting to assembly files. If you're
  • 54:37 - 54:42
    lifting to LLVM IR, you would have to do
    full end-to-end type recovery, which is
  • 54:42 - 54:46
    massively more complicated. Yes, it would
    be super nice. Unfortunately, it is
  • 54:46 - 54:51
    undecidable and really, really hard. So
    you can come up with some heuristics, but
  • 54:51 - 54:55
    there is no solution that will be
    correct 100 percent of the
  • 54:55 - 54:57
    time.
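
(A toy example of the distinction drawn here, simplified and not RetroWrite's actual symbolization logic: a direct branch target is unambiguously a code address and can safely become a label, whereas an immediate that merely looks like an address may be a pointer or a plain integer constant, which is what makes general data-reference symbolization undecidable.)

```python
def symbolize(mnemonic: str, op_str: str) -> str:
    """Turn control-flow targets into labels; leave data references alone."""
    if mnemonic in ("call", "jmp") and op_str.startswith("0x"):
        # A direct call/jump target must be code, so a label is always safe.
        return f"{mnemonic}\t.Laddr_{op_str[2:]}"
    if mnemonic == "mov" and op_str.startswith("$0x"):
        # Is $0x401080 the address of a global or just the number 4198528?
        # There is no general way to decide, so the immediate stays as-is.
        return f"{mnemonic}\t{op_str}"
    return f"{mnemonic}\t{op_str}"


print(symbolize("call", "0x401080"))        # call  .Laddr_401080
print(symbolize("mov", "$0x401080, %rdi"))  # left untouched
```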
    Angel: We'll take one more question from
  • 54:57 - 55:03
    microphone number six.
    Q: Thank you for your talk. What kind of
  • 55:03 - 55:07
    disassemblers did you use for RetroWrite,
    and did you have problems with incorrect
  • 55:07 - 55:13
    disassembly? And if so, how did you handle
    it?
  • 55:13 - 55:19
    Nspace: So, RetroWrite—so we used
    Capstone for the disassembly.
  • 55:19 - 55:24
    gannimo: An amazing tool, by the way.
    Nspace: Yeah. So the idea is that, like,
  • 55:24 - 55:30
    we need some kind of—some information
    about where the functions are. So for the
  • 55:30 - 55:34
    kernel modules, this is actually fine
    because kernel modules come with this sort
  • 55:34 - 55:38
    of information because the kernel needs
    it, to build stack traces, for example.
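
(For reference, a minimal sketch of that workflow: once a function's start address and size are known, for example from a kernel module's symbol table, the whole body can be disassembled linearly with Capstone. The address and bytes below are made up for the example.)

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

md = Cs(CS_ARCH_X86, CS_MODE_64)

func_start = 0x1000  # hypothetical symbol value from the symbol table
func_bytes = bytes.fromhex(
    "554889e5"      # push rbp; mov rbp, rsp
    "be2a000000"    # mov esi, 42
    "e800000000"    # call +0 (placeholder target)
    "5dc3"          # pop rbp; ret
)

for insn in md.disasm(func_bytes, func_start):
    print(f"0x{insn.address:x}\t{insn.mnemonic}\t{insn.op_str}")
```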
  • 55:38 - 55:42
    For userspace binaries, this is somewhat
    less common, but you can use another tool
  • 55:42 - 55:46
    to try to do function identification. And
    we do, like—sort of, like, disassemble the
  • 55:46 - 55:54
    entire function. So we have run into some
    issues with, like—in AT&T syntax, because
  • 55:54 - 56:00
    like we wanted to use gas, GNU's
    assembler, for, for...
  • 56:00 - 56:04
    gannimo: Reassembling.
    Nspace: Reassembly, yeah. And some
  • 56:04 - 56:10
    instructions—you can express, like, two
    different instructions,
  • 56:10 - 56:16
    like a five-byte NOP and a six-byte NOP, using
    the same string of, like, text—a mnemonic and
  • 56:16 - 56:20
    an operand string. But the problem is
    that, like, the kernel doesn't like it and
  • 56:20 - 56:22
    crashes. This took me like two days to
    debug.
  • 56:22 - 56:28
    gannimo: So the kernel uses dynamic binary
    patching at runtime, and it
  • 56:28 - 56:33
    uses fixed offsets, so if you replace a
    five-byte NOP with a six-byte NOP or vice
  • 56:33 - 56:38
    versa, your offsets change and your kernel
    just blows up in your face.
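
(A concrete instance of this ambiguity, for illustration: the talk mentions a five- and a six-byte NOP; a pair that a disassembler renders with the same AT&T mnemonic and operand text is the recommended five- and eight-byte NOP forms shown below, and the effect is the same. Once both are printed as text and fed back through gas, the assembler is free to pick one encoding, so the instruction length, and every offset after it, can change.)

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_OPT_SYNTAX_ATT

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.syntax = CS_OPT_SYNTAX_ATT

# Two recommended multi-byte NOP encodings of different lengths that
# disassemble to the same mnemonic and operand text.
for code in (bytes.fromhex("0f1f440000"),          # 5-byte NOP
             bytes.fromhex("0f1f840000000000")):   # 8-byte NOP
    insn = next(md.disasm(code, 0))
    print(insn.size, insn.mnemonic, insn.op_str)
```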
  • 56:38 - 56:43
    Q: So it was kind of a case-by-case basis
    where you saw the errors coming out of the
  • 56:43 - 56:48
    disassembly and you had to fix it?
    Nspace: So sorry, can you repeat the
  • 56:48 - 56:51
    question?
    Q: Like, for example, if you—if some
  • 56:51 - 56:55
    instruction is not supported by the
    disassembler, so you saw that it crashed,
  • 56:55 - 56:58
    that there's something wrong, and then you
    fixed it by hand?
  • 56:58 - 57:03
    Nspace: Yeah, well, if we saw that there
    was a problem with it, this—like, I don't
  • 57:03 - 57:07
    recall having any unknown instructions in
    the disassembler. I don't think I've ever
  • 57:07 - 57:11
    had a problem with that. But yeah, this
    was a lot of, like, you know, engineering
  • 57:11 - 57:14
    work.
    gannimo: So let me repeat. The problem was
  • 57:14 - 57:19
    not a bug in the disassembler, but an
    issue with the instruction format—that the
  • 57:19 - 57:25
    same mnemonic can be translated into two
    different instructions, one of which was
  • 57:25 - 57:29
    five bytes long, the other one was six
    bytes long. Both used the exact same
  • 57:29 - 57:33
    mnemonic. Right, so this was an issue with
    assembly encoding.
  • 57:33 - 57:38
    Q: But you had no problems with
    unsupported instructions which couldn't be
  • 57:38 - 57:41
    disassembled?
    Nspace: No, no. Not as far as I know, at
  • 57:41 - 57:43
    least.
    Angel: We have one more minute, so a very
  • 57:43 - 57:52
    short question from microphone number two.
    Q: Does it work? Ah. Is your binary
  • 57:52 - 58:02
    instrumentation as powerful as kernel
    address space... I mean, kASan? So, does
  • 58:02 - 58:06
    it detect all the memory corruptions on
    stack, heap and globals?
  • 58:06 - 58:13
    gannimo: No globals. But heap—it does all
    of them on the heap. There's some slight
  • 58:13 - 58:20
    variation on the stack because we have to
    piggyback on the canary stuff. As I
  • 58:20 - 58:24
    mentioned quickly before, there is no
    reflowing and full recovery of data
  • 58:24 - 58:29
    layouts. So to get anything on the stack,
    we have to piggyback on existing compiler
  • 58:29 - 58:37
    extensions like stack canaries. But—so we
    don't support intra-object overflows on
  • 58:37 - 58:41
    the stack. But we do leverage the stack
    canaries to get some stack benefits, which
  • 58:41 - 58:45
    is, I don't know, 90, 95 percent there
    because the stack canaries are pretty
  • 58:45 - 58:51
    good. For heap, we get the same precision.
    For globals, we have very limited support.
  • 58:51 - 58:54
    Q: Thanks.
    Angel: So that's all the time we have for
  • 58:54 - 58:58
    this talk. You can find the speakers, I
    think, afterwards offline. Please give
  • 58:58 - 59:00
    them a big round of applause for an
    interesting talk.
  • 59:00 - 59:03
    Applause
  • 59:03 - 59:07
    36c3 postrol music
  • 59:07 - 59:29
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!