-
36c3 preroll music
-
Herald: So, the next talk for this
afternoon is about high speed binary
-
fuzzing. We have two researchers that will
be presenting the product of their latest
-
work, which is a framework for static
binary rewriting. Our speakers are—the
-
first one is a computer science master's
student at EPFL and the second one is a
-
security researcher and assistant
professor at EPFL. Please give a big round
-
of applause to Nspace and gannimo.
-
Applause
-
gannimo (Mathias Payer): Thanks for the
introduction. It's a pleasure to be here,
-
as always. We're going to talk about
different ways to speed up your fuzzing
-
and to find different kinds of
vulnerabilities or to tweak your binaries
-
in somewhat unintended ways. I'm Mathias
Payer or I go by gannimo on Twitter and I
-
am an assistant professor at EPFL working
on different forms of software security:
-
fuzzing, sanitization, but also different
kinds of mitigations. And Matteo over
-
there is working on his master's thesis on
different forms of binary rewriting for
-
the kernel. And today we're going to take
you on a journey on how to actually
-
develop very fast and very efficient
binary rewriting mechanisms that allow you
-
to do unintended modifications to the
binaries and allow you to explore
-
different kinds of unintended features in
binaries. So about this talk. What we
-
discovered or the reason why we set out on
this journey was that fuzzing binaries is
-
really, really hard. There's very few
tools in user space. There's—it's
-
extremely hard to set it up and it's
extremely hard to set it up in a
-
performant way. The setup is complex. You
have to compile different tools. You have
-
to modify it. And the results are not
really that satisfactory. As soon as you
-
move to the kernel, fuzzing binaries in a
kernel is even harder. There's no tooling
-
whatsoever, there's very few users
actually working with binary code in the
-
kernel or modifying binary code, and it's
just a nightmare to work with. So what we
-
are presenting today is a new approach
that allows you to instrument any form of
-
binary code or modern binary code based on
static rewriting, which gives you full
-
native performance. You only pay for the
instrumentation that you add, and you can
-
do very heavyweight transformations on top
of it. The picture, if you look at the
-
modern system, let's say we are looking at
a modern setup. Let's say you're looking
-
at cat pictures in your browser: Chrome
plus the kernel plus the libc plus the
-
graphical user interface together clock in
at about 100 million lines of code.
-
Instrumenting all of this for some form of
security analysis is a nightmare,
-
especially along this large stack of
software. There's quite a bit of different
-
compilers involved. There's different
linkers. It may be compiled on a different
-
system, with different settings and so on.
And then getting your instrumentation
-
across all of this is pretty much
impossible and extremely hard to work
-
with. And we want to enable you to select
those different parts that you're actually
-
interested in. Modify those and then focus
your fuzzing or analysis approaches on
-
those small subsets of the code, giving
you a much better and stronger capability
-
to test the systems that you're, or those
parts of the system that you're really,
-
really interested in. Who's worked on
fuzzing before? Quick show of hands. Wow,
-
that's a bunch of you. Do you use AFL?
Yeah, most of you, AFL. Libfuzzer? Cool,
-
about 10, 15 percent libfuzzer, 30 percent
fuzzing, and AFL. There's quite good
-
knowledge of fuzzing, so I'm not going to
spend too much time on fuzzing, but for
-
those that haven't really run their
fuzzing campaigns yet, it's a very simple
-
software testing technique. You're
effectively taking a binary, let's say
-
Chrome, as a target and you're running
this in some form of execution
-
environment. And fuzzing then consists of
some form of input generation that creates
-
new test cases, throws them at your
program and sees—and checks what is
-
happening with your program. And either
everything is OK, and your code is being
-
executed, and your input—the program
terminates, everything is fine, or you
-
have a bug report. If you have a bug
report, you can use this. Find the
-
vulnerability, maybe develop a PoC and
then come up with some form of either
-
exploit or patch or anything else. Right.
So this is pretty much fuzzing in a
-
nutshell. How do you get fuzzing to be
effective? How can you cover large source
-
bases, complex code, and complex
environment? Well, there's a couple of
-
simple steps that you can take. And let's
walk quickly through effective fuzzing
-
101. Well, first, you want to be able to
create test cases that actually trigger
-
bugs. And this is a very, very
complicated, complicated part. And we need
-
to have some notion of the inputs that a
program accepts. And we need to have some
-
notion of how we can explore different
parts of the program, right? Different
-
parts of functionality. Well, on one hand,
we could have a developer write all the
-
test cases by hand, but this would be kind
of boring. It would also require a lot of
-
human effort in creating these different
inputs and so on. So coverage guided
-
fuzzing has evolved as a very simple way
to guide the fuzzing process, leveraging
-
the information on which parts of the code
have been executed by simply tracing the
-
individual path through the program based
on the execution flow. So we can—the
-
fuzzer can use this feedback to then
modify the inputs that are being thrown at
-
the fuzzing process. The second step is
the fuzzer must be able to detect bugs. If
-
you've ever looked at a memory corruption,
if you're just writing one byte after the
-
end of a buffer, it's highly likely that
your software is not going to crash. But
-
it's still a bug, and it may still be
exploitable based on the underlying
-
conditions. So we want to be able to
detect violations as soon as they happen,
-
for example, based on some form of
sanitization that we add, some form of
-
instrumentation that we add to the to the
binary, that then tells us, hey, there's a
-
violation of the memory safety property,
and we terminate the application right
-
away as a feedback to the fuzzer. Third,
but the—and last but not least: Speed is
-
key, right? For if you're running a
fuzzing campaign, you have a fixed
-
resource budget. You have a couple of
cores, and you want to run for 24 hours,
-
48 hours, a couple of days. But in any
way, whatever your constraints are, you
-
have a fixed amount of instructions that
you can actually execute. And you have to
-
decide, am I spending my instructions on
generating new inputs, tracking
-
constraints, finding bugs, running
sanitization or executing the program? And
-
you need to find a balance between all of
them, as it is a zero sum game. You have a
-
fixed amount of resources and you're
trying to make the best with these
-
resources. So any overhead is slowing you
down. And again, this becomes an
-
optimization problem. How can you most
effectively use the resources that you
-
have available? As we are fuzzing with
source code, it's quite easy to actually
-
leverage existing mechanisms, and we add
all that instrumentation at compile time.
-
We take source code, we pipe it through
the compiler and modern compiler
-
platforms, allow you to instrument and add
little code snippets during the
-
compilation process that then carry out
all these tasks that are useful for
-
fuzzing. For example, modern compilers can
add short snippets of code for coverage
-
tracking that will record which parts of
the code that you have executed, or for
-
sanitization, which records and checks every
single memory access if it is safe or not.
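To make that concrete, here is a minimal sketch of what such compiler-inserted
coverage tracking boils down to (the hook, the coverage map, and the magic-byte
check below are toy stand-ins for illustration, not what a real compiler pass
such as LLVM's SanitizerCoverage actually emits):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint8_t coverage_map[256];            /* toy coverage map */

    static void trace_block(uint8_t block_id) {  /* "this basic block ran" */
        coverage_map[block_id] = 1;
    }

    /* what a simple magic-byte check looks like after instrumentation */
    static int parse(const char *buf) {
        trace_block(0);                          /* function entry */
        if (strncmp(buf, "PNG", 3) == 0) {
            trace_block(1);                      /* "magic matched" block */
            return 1;
        }
        trace_block(2);                          /* "bailed out" block */
        return 0;
    }

    int main(void) {
        parse("PNG data");
        for (int i = 0; i < 3; i++)
            printf("block %d: %s\n", i, coverage_map[i] ? "hit" : "not hit");
        return 0;
    }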
-
And then when you're running the
instrumented binary, everything is fine
-
and you can detect the policy violations
as you go along. Now if you would have
-
source code for everything, this would be
amazing. But it's often not the case,
-
right? We may be able on Linux to cover a
large part of the protocol stack by
-
focusing only on source-code-based
approaches. But there may be applications
-
where no source code is available. If we
move to Android or other mobile systems,
-
there's many drivers that are not
available as open source or just available
-
as binary blobs, or the full software
stack may be closed-source and we only get
-
the binaries. And we still want to find
vulnerabilities in these complex software
-
stacks that span hundreds of millions of
lines of code in a very efficient way. The
-
only solution to cover this part of
massive code base is to actually rewrite
-
and focus on binaries. A very simple
approach could be black box fuzzing, but
-
this is—this doesn't really get you
anywhere because you don't get any
-
feedback; you don't get any information if
you're triggering bugs. So one simple
-
approach, and this is the approach that is
most dominantly used today, is to rewrite
-
the program or the binary dynamically. So
you're taking the binary and during
-
execution you use some form of dynamic
binary instrumentation based on Pin, angr,
-
or some other binary rewriting tool and
translate the target at runtime, adding
-
this binary instrumentation on top of it
as you're executing it. It's simple, it's
-
straightforward, but it comes at a
terrible performance cost of ten to a
-
hundred x slow down, which is not really
effective. And you're spending all your
-
cores and your cycles on just executing
the binary instrumentation. So we don't
-
really want to do this and we want to have
something that's more effective than that.
-
So what we are focusing on is to do static
rewriting. It involves a much more complex
-
analysis as we are rewriting the binary
before it is being executed, and we have
-
to recover all of the control flow, all of
the different mechanisms, but it results
-
in a much better performance. And we can
get more bang for our buck. So why is
-
static rewriting so challenging? Well,
first, simply adding code will break the
-
target. So if you are disassembling this
piece of code here, which is a simple loop
-
that loads data, decrements the registers,
and then jumps if you're not at the end of
-
the array and keeps iterating through this
array. Now, as you look at the jump-not-
-
zero instruction, the last instruction of
the snippet, it is a relative offset. So
-
it jumps backward seven bytes. Which is
nice if you just execute the code as is.
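A minimal sketch of how such a relative branch resolves, and why inserting
bytes in front of it silently changes where it lands (the addresses and sizes
below are made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* x86 relative branches land at: address of next instruction + displacement */
    static uint64_t branch_target(uint64_t insn_addr, unsigned insn_len,
                                  int32_t disp) {
        return insn_addr + insn_len + disp;
    }

    int main(void) {
        uint64_t jnz_at = 0x1007;  /* a 2-byte jnz with an 8-bit displacement of -7 */

        /* originally the branch jumps back to the loop head at 0x1002 */
        printf("before rewriting:        0x%llx\n",
               (unsigned long long)branch_target(jnz_at, 2, -7));

        /* insert 5 bytes of instrumentation between the loop head and the jnz:
         * the jnz moves down by 5 bytes, but its displacement is still -7,
         * so it now lands in the middle of the inserted code instead */
        printf("after inserting 5 bytes: 0x%llx\n",
               (unsigned long long)branch_target(jnz_at + 5, 2, -7));
        return 0;
    }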
-
But as soon as you want to insert new
code, you change the offsets in the
-
program, and you're modifying all these
different offsets. And simply adding new
-
code somewhere in between will break the
target. So a core feature that we need to
-
enforce, or core property that we need to
enforce, is that we must find all the
-
references and properly adjust them, both
relative offsets and absolute offsets as
-
well. Getting a single one wrong will
break everything. What makes this problem
-
really, really hard is that if you're
looking at the binary, a byte is a byte,
-
right? There's no way for us to
distinguish between scalars and
-
references, and in fact they are
indistinguishable. Getting a single
-
reference wrong breaks the target and
would introduce arbitrary crashes. So we
-
have to come up with ways that allow us to
distinguish between the two. So for
-
example, if you have this code here, it
takes a value and stores it somewhere on
-
the stack. This could come from two
different kind of high-level constructs.
-
On one hand, it could be taking the
address of a function and storing this
-
function address somewhere and in a stack
variable. Or it could be just storing a
-
scalar in a stack variable. And these two
are indistinguishable, and rewriting them,
-
as soon as we add new code, the offsets
will change. If it is a function, we would
-
have to modify the value; if it is a
scalar, we have to keep the same value. So
-
how can we come up with a way that allows
us to distinguish between the two and
-
rewrite binaries by recovering this
missing information? So let us take—let me
-
take you or let us take you on a journey
towards instrumenting binaries in the
-
kernel. This is what we aim for. We'll
start with the simple case of
-
instrumenting binaries in user land, talk
about different kinds of coverage guided
-
fuzzing and what kind of instrumentation
we can add, what kind of sanitization we
-
can add, and then focusing on taking it
all together and applying it to kernel
-
binaries to see what will fall out of
it. Let's start with instrumenting
-
binaries first. I will now talk a little
bit about RetroWrite, our mechanism and
-
our tool that enables static binary
instrumentation by symbolizing existing
-
binaries. So we recover the information
and we translate relative offsets and
-
absolute offsets into actual labels that
are added to the assembly file. The
-
instrumentation can then work on the
recovered assembly file, which can then be
-
reassembled into a binary that can then be
executed for fuzzing. We implement
-
coverage tracking and binary address
sanitizer on top of this, leveraging
-
abstraction as we go forward. The key to
enabling this kind of binary rewriting is
-
position-independent code. And position-
independent code has become the de-facto
-
standard for any code that is being
executed on a modern system. And it
-
effectively says that it is code that can
be loaded at any arbitrary address in your
-
address space as you are executing
binaries. It is essential and a
-
requirement if you want to have address
space layout randomization or if you want
-
to use shared libraries, which de facto
you want to use in all these different
-
systems. So since a couple of years, all
the code that you're executing on your
-
phones, on your desktops, on your laptops
is position-independent code. And the idea
-
behind position-independent code is
that you can load it anywhere in your
-
address space and you can therefore not
use any hard-coded static addresses and
-
you have to inform the system of
relocations or pick relative
-
addresses—to—on how the system can
relocate these different mechanisms. On
-
x86_64, position-independent code
leverages addressing that is relative to
-
the instruction pointer. So for example,
it uses the current instruction pointer
-
and then a relative offset to that
instruction pointer to reference global
-
variables, other functions and so on. And
this is a very easy way for us to
-
distinguish references from constants,
especially in PIE binaries. If it is RIP-
-
relative, it is a reference; everything
else is a constant. And we can build our
-
translation algorithm and our translation
mechanism based on this fundamental
-
finding to remove any form of heuristic
that is needed by focusing especially on
-
position-independent code. So we're
supporting position-independent code; we
-
are—we don't support non-position-
independent code, but we give you the
-
guarantee that we can rewrite all the
different code that is out there. So
-
symbolization works as follows: If you
have the little bit of code on the lower
-
right, symbolization replaces first all
the references with assembler labels. So
-
look at the call instruction and the jump-
not-zero instruction; the call instruction
-
references an absolute address and the
jump-not-zero instruction jumps backward
-
relative 15 bytes. So by focusing on these
relative jumps and calls, we can replace
-
them with actual labels and rewrite the
binary as follows: so we're calling a
-
function, replacing it with the actual
label, and for the jump-not-zero we are
-
inserting an actual label in the assembly
code and adding a backward reference. For
-
PC-relative addresses, for example the
data load, we can then replace it with the
-
name of the actual data that we have
recovered, and we can then add all the
-
different relocations and use that as
auxiliary information on top of it. After
-
these three steps, we can insert any new
code in between, and can therefore add
-
different forms of instrumentations or run
some more higher-level analysis on top of
-
it, and then reassemble the file for
fuzzing or coverage-guided tracking,
-
address sanitization or whatever else you
want to do. I will now hand over to
-
Matteo, who will cover coverage-guided
fuzzing and sanitization and then
-
instrumenting the binaries in the kernel.
Go ahead.
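To make the symbolization step a bit more concrete, here is a rough sketch of
the core idea using the Capstone disassembler's C API (RetroWrite itself is
implemented in Python and handles far more cases; the label naming and the few
instruction kinds covered below are simplifying assumptions):

    #include <capstone/capstone.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* Print a symbolized form of direct branches and RIP-relative operands:
     * both are references, so they become labels instead of raw offsets. */
    static void symbolize(const uint8_t *code, size_t size, uint64_t base) {
        csh handle;
        cs_insn *insn;

        if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK)
            return;
        cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);  /* we need operand details */

        size_t count = cs_disasm(handle, code, size, base, 0, &insn);
        for (size_t i = 0; i < count; i++) {
            cs_x86 *x86 = &insn[i].detail->x86;
            for (int o = 0; o < x86->op_count; o++) {
                cs_x86_op *op = &x86->operands[o];
                if (op->type == X86_OP_MEM && op->mem.base == X86_REG_RIP) {
                    /* RIP-relative operand: must be a reference, emit a label */
                    uint64_t target = insn[i].address + insn[i].size
                                      + op->mem.disp;
                    printf("%s -> data label .L%" PRIx64 "(%%rip)\n",
                           insn[i].mnemonic, target);
                } else if (op->type == X86_OP_IMM &&
                           (insn[i].id == X86_INS_CALL ||
                            insn[i].id == X86_INS_JMP ||
                            insn[i].id == X86_INS_JNE)) {
                    /* direct call/jump: Capstone already resolved the absolute
                     * target, so replace it with a label named after it */
                    printf("%s -> code label .L%" PRIx64 "\n",
                           insn[i].mnemonic, (uint64_t)op->imm);
                }
            }
        }
        if (count)
            cs_free(insn, count);
        cs_close(&handle);
    }

    int main(void) {
        /* call rel32; jne rel8 back to the first instruction; mov rax,[rip+0x10] */
        const uint8_t code[] = { 0xe8, 0x20, 0x00, 0x00, 0x00,
                                 0x75, 0xf9,
                                 0x48, 0x8b, 0x05, 0x10, 0x00, 0x00, 0x00 };
        symbolize(code, sizeof(code), 0x1000);
        return 0;
    }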
-
Nspace (Matteo Rizzo): So, now that we
have this really nice framework to rewrite
-
binaries, one of the things that we want
to add to actually get the fuzzing is this
-
coverage-tracking instrumentation. So
coverage-guided fuzzing is a way, a
-
method, for—to let the fuzzer discover
interesting inputs and interesting paths through
-
the target by itself. So the basic idea is
that the fuzzer will track coverage—the
-
parts of the programs that are covered by
different inputs by inserting some kind of
-
instrumentation. So, for example, here we
have this target program that checks if
-
the input contains the string "PNG" at the
beginning, and if it does, then it does
-
something interesting, otherwise it just
bails out and fails. So if we track the
-
part of the programs that each input
executes, the fuzzer can figure out that
-
an input that contains "P" will have
discovered a different path through the
-
program than an input that doesn't contain
it. And then so on it can, one byte at a
-
time, discover that this program expects
this magic sequence "PNG" at the start of
-
the input. So the way that the fuzzer does
this is that every time a new input
-
discovers a new path through the target, it
is considered interesting and added to a
-
corpus of interesting inputs. And every
time the fuzzer needs to generate a new
-
input, it will select something from the
corpus, mutate it randomly, and then use
-
it as the new input. So this is like
a—this is, like, conceptually pretty
-
simple, but in practice it works really
well and it really lets the fuzzer
-
discover the format that the target
expects in an unsupervised way. So as an
-
example, this is an experiment that was
run by the author of AFL—AFL is the fuzzer
-
that sort of popularized this
technique—where he was fuzzing a JPEG-
-
parsing library, starting from a corpus
that only contained the string "hello". So
-
now clearly "hello" is not a valid JPEG
image and so—but still, like, the fuzzer
-
was still able to find—to discover the
correct format. So after a while it
-
started generating some grayscale images,
on the top left, and as it generated more
-
and more inputs, it started generating
more interesting images, such as some
-
grayscale gradients, and later on even
some color images. So as you can see, this
-
really works, and it allows us to fuzz a
program without really teaching the fuzzer
-
what the input should look like. So that's
it for coverage-guided fuzzing. Now we'll
-
talk a bit about sanitizations. As a
reminder, the core idea behind
-
sanitization is that just looking for
crashes is likely to miss some of the
-
bugs. So, for example, if you have this
out-of-bounds one-byte read, that will
-
probably not crash the target, but you
would still like to catch it because it
-
could be used for an info leak, for
example. So one of the most popular
-
sanitizers is Address Sanitizer. So
Address Sanitizer will instrument all the
-
memory accesses in your program and check
for memory corruption, which—so, memory
-
corruption is a pretty dangerous class of
bugs that unfortunately still plagues C
-
and C++ programs and unsafe languages in
general. And ASan tries to catch it by
-
instrumenting the target. It is very
popular; it has been used to find
-
thousands of bugs in complex software like
Chrome and Linux, and even though it has,
-
like, a bit of a slowdown—like about 2x—it
is still really popular because it lets
-
you find many, many more bugs. So how does
it work? The basic idea is that ASan will
-
insert some special regions of memory
called 'red zones' around every object in
-
memory. So we have a small example here
where we declare a 4-byte array on the
-
stack. So ASan will allocate the array
"buf" and then add a red zone before it
-
and a red zone after it. Whenever the
program accesses the red zones, it is
-
terminated with a security violation. So
the instrumentation just prints a bug
-
report and then crashes the target. This
is very useful for detecting, for example,
-
buffer overflows or underflows and many
other kinds of bugs such as use-after-free
-
and so on. So, as an example here, we are
trying to copy 5 bytes into a 4-byte
-
buffer, and ASan will check each of the
accesses one by one. And when it sees that
-
the last byte writes to a red zone, it
detects the violation and crashes the
-
program. So this is good for us because
this bug might have not been found by
-
simply looking for crashes, but it's
definitely found if we use ASan. So this
-
is something we want for fuzzing. So now
that we've covered—briefly covered ASan we
-
can talk about instrumenting binaries in
the kernel. So Mathias left us with
-
RetroWrite, and with RetroWrite we can add
both coverage tracking and ASan to
-
binaries. So the simple—it's a really
simple idea: now that we can rewrite this
-
binary and add instructions wherever we
want, we can implement both coverage
-
tracking and ASan. In order to implement
coverage tracking, we simply have to
-
identify the start of every basic block
and add a little piece of instrumentation
-
at the start of the basic block that tells
the fuzzer 'hey, we've reached this part
-
of the program'—'hey, we've reached this
other part of the program'. Then the
-
fuzzer can figure out whether that's a new
part or not. ASan is also, like, you know,
-
it's also somewhat—it can also be
implemented in this way by finding all
-
memory accesses, and then linking with
libASan. libASan is a sort of runtime for
-
ASan that takes care of inserting the red
zones and instrument—and adding, you know,
-
like, keeping around all the metadata that
ASan needs to know where the red zones
-
are, and detecting whether a memory access
is invalid. So, how can we apply all of
-
this in the kernel? Well, first of all,
fuzzing the kernel is not as easy as
-
fuzzing some userspace program. There's
some issues here. So first of all, there's
-
crash handling. So whenever you're fuzzing
a userspace program, you expect crashes,
-
well, because that's what we're after. And
if a userspace program crashes, then the
-
OS simply terminates it gracefully.
And so the fuzzer can detect this, and
-
save the input as a crashing input, and so
on. And this is all fine. But when you're
-
fuzzing the kernel, so—if you were fuzzing
the kernel of the machine that you were
-
using for fuzzing, after a while, the
machine would just go down. Because, after
-
all, the kernel runs the machine, and if
it starts misbehaving, then all of it can
-
go wrong. And more importantly, you can
lose your crashes, because if the
-
machine crashes, then the state of the
fuzzer is lost and you have no idea what
-
your crashing input was. So what most
kernel fuzzers have to do is that they
-
resort to some kind of VM to keep the
system stable. So they fuzz the kernel in
-
a VM and then run the fuzzing agent
outside the VM. On top of that is tooling.
-
So, if you want to fuzz a user space
program, you can just download AFL or use
-
libfuzzer; there's plenty of tutorials
online, it's really easy to set up and
-
just, like—compile your program, you start
fuzzing and you're good to go. If you want
-
to fuzz the kernel, it's already much more
complicated. So, for example, if you want
-
to fuzz Linux with, say, syzkaller, which
is a popular kernel fuzzer, you have to
-
compile the kernel, you have to use a
special config that supports syzkaller,
-
you have way less guides available than
for userspace fuzzing, and in general it's
-
just much more complex and less intuitive
than just fuzzing userspace. And lastly,
-
we have the issue of determinism. So in
general, if you have a single threaded
-
userspace program, unless it uses some
kind of random number generator, it is
-
more or less deterministic. There's
nothing that affects the execution of the
-
program. But—and this is really nice if
you want to try to reproduce a test case,
-
because if you have a non-deterministic
test case, then it's really hard to know
-
whether this is really a crash or if it's
just something that you should ignore, and
-
in the kernel this is even harder, because
you don't only have concurrency, like
-
multi-processing, you also have interrupts.
So interrupts can happen at any time, and
-
if one time you got an interrupt while
executing your test case and the second
-
time you didn't, then maybe it only
crashes one time - you don't really know,
-
it's not pretty. And so again, we
have several approaches to fuzzing
-
binaries in the kernel. First one is to do
black box fuzzing. We don't really
-
like this because it doesn't find much,
especially in something complex
-
like a kernel. Approach 1 is to
use dynamic translation,
-
so, use something
like QEMU or—you name it. This works, and
-
people have used it successfully; the
problem is that it is really, really,
-
really slow. Like, we're talking about
10x-plus overhead. And as we said before,
-
the more iterations, the more test cases
you can execute in the same amount of
-
time, the better, because you find more
bugs. And on top of that, there's no
-
currently available sanitizer for
kernel binaries that works—is based on
-
this approach. So in userspace you have
something like valgrind; in the kernel,
-
you don't have anything, at least that we
know of. There is another approach, which
-
is to use Intel Processor Trace. This has
been, like—there's been some research
-
papers on this recently, and this is nice
because it allows you to collect coverage
-
at nearly zero overhead. It's, like,
really fast, but the problem is that it
-
requires hardware support, so it requires
a fairly new x86 CPU, and if you want to
-
fuzz something on ARM, say, like, your
Android driver, or if you want to use an
-
older CPU, then you're out of luck. And
what's worse, you cannot really use it for
-
sanitization, or at least not the kind of
sanitization that ASan does, because it
-
just traces the execution; it doesn't
allow you to do checks on memory accesses.
-
So Approach 3, which is what we will use,
is static rewriting. So, we had this very
-
nice framework for rewriting userspace
binaries, and then we asked ourselves, can
-
we make this work in the kernel? So we
took the system, the original RetroWrite,
-
we modified it, we implemented support for
Linux modules, and... it works! So we have
-
implemented it—we have used it to fuzz
some kernel modules, and it really shows
-
that this approach doesn't only work for
userspace; it can also be applied to the
-
kernel. So as for some implementation, the
nice thing about kernel modules is that
-
they're always position independent. So
you cannot have position—like, fixed-
-
position kernel modules because Linux just
doesn't allow it. So we sort of get that
-
for free, which is nice. And Linux modules
are also a special class of ELF files,
-
which means that the format is—even though
it's not the same as userspace binaries,
-
it's still somewhat similar, so we didn't
have to change the symbolizer that much,
-
which is also nice. And we implemented
symbolization with this, and we used it to
-
implement both code coverage and binary
ASan for kernel binary modules. So for
-
coverage: The idea behind the whole
RetroWrite project was that we wanted to
-
integrate with existing tools. So existing
fuzzing tools. We didn't want to force our
-
users to write their own fuzzer that is
compatible with RetroWrite. So for—in
-
userspace we had AFL-style coverage
tracking, and binary ASan which is
-
compatible with source-based ASan, and we
wanted to follow the same principle in the
-
kernel. So it turns out that Linux has
this built-in coverage-tracking framework
-
called kCov that is used by several
popular kernel fuzzers like syzkaller, and
-
we wanted to use it ourselves. So we
designed our coverage instrumentation so
-
that it integrates with kCov. The downside
is that you need to compile the kernel
-
with kCov, but then again, Linux is open
source, so you can sort of always do that;
-
it's usually not the kernel itself that
is the binary blob, it's
-
usually only the modules. So that's just
still fine. And the way you do this is—the
-
way you implement kCov for binary modules
is that you just have to find the start of
-
every basic block, and add a call to some
function that then stores the collected
-
coverage. So here's an example: we have a
short snippet of code with three basic
-
blocks, and all we have to do is add a
call to "trace_pc" to the start of the
-
basic block. "trace_pc" is a function that
is part of the main kernel image that then
-
collects this coverage and makes it
available to a userspace fuzzing agent. So
-
this is all really easy and it works. And
let's now see how we implemented binary
-
ASan. So as I mentioned before, when we
instrument the program with binary ASan in
-
userspace we link with libASan, which
takes care of setting up the metadata,
-
takes care of putting the red zones around
our allocations, and so on. So we had to
-
do something similar in the kernel; of
course, you cannot link with libASan in
-
the kernel, because that doesn't work, but
what we can do instead is, again, compile
-
the kernel with kASan support. So this
instruments the allocator, kmalloc, to add
-
the red zones; it allocates space for the
metadata, it keeps this metadata around,
-
does this all for us, which is really
nice. And again, the big advantage of
-
using this approach is that we can
integrate seamlessly with a kASan-
-
instrumented kernel and with fuzzers that
rely on kASan such as syzkaller. So we see
-
this as more of a plus than, like, a
limitation. And how do you implement ASan?
-
Well, you have to find every memory access
and instrument it to check the—to check
-
whether this is accessing a red zone. And
if it does then you just call this bug
-
report function that produces a stack
trace, a bug report, and crashes the
-
kernel, so that the fuzzer can detect it.
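As a rough illustration, the check inserted in front of each memory access
boils down to something like the following simplified, userspace-style model
(the shadow offset, the scale, and the report function are illustrative
stand-ins and assume the shadow region has already been set up by the
ASan/kASan runtime; the real KASAN layout and entry points differ):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* ASan-style shadow memory: one shadow byte covers 8 bytes of real memory.
     * 0 means all 8 bytes are addressable, 1..7 means only the first k bytes
     * are, and negative values (such as 0xfc) mark red zones or freed memory. */
    #define SHADOW_SCALE   3
    #define SHADOW_OFFSET  0x7fff8000UL   /* illustrative, not the kernel's value */

    static void asan_report(uintptr_t addr, size_t size) {
        fprintf(stderr, "BUG: invalid %zu-byte access at 0x%lx\n",
                size, (unsigned long)addr);
        abort();                          /* the kernel variant oopses instead */
    }

    /* the rewriter inserts the equivalent of this before every load and store */
    static inline void asan_check(uintptr_t addr, size_t size) {
        int8_t shadow = *(int8_t *)((addr >> SHADOW_SCALE) + SHADOW_OFFSET);
        if (shadow != 0) {
            /* partially valid 8-byte granule, or a poisoned red zone */
            int8_t last = (int8_t)((addr & 7) + size - 1);
            if (shadow < 0 || last >= shadow)
                asan_report(addr, size);
        }
    }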
Again, this is compatible with source-
-
based kASan, so we're happy. We can simply
load the rewritten module with added
-
instrumentation into a kernel, as long as
you have compiled the kernel with the
-
right flags, and we can use a standard
kernel fuzzer. Here for the—our
-
evaluation, we used syzkaller, a popular
kernel fuzzer by some folks at Google, and
-
it worked really well. So we've finally
reached the end of our journey, and now we
-
wanted to present some experiments we did
to see if this really works. So for
-
userspace, we wanted to compare the
performance of our binary ASan with
-
source-based ASan and with existing
solutions that also work on binaries. So
-
for userspace, you can use valgrind
memcheck. It's a memory sanitizer that is
-
based on dynamic binary translation and
works on binaries.
-
We compared it with source ASan and
RetroWrite ASan on the SPEC CPU benchmark
-
and saw how fast it was. And for the
kernel we decided to fuzz some file
-
systems and some drivers with syzkaller
using both source-based KASan and kCov and
-
kRetroWrite-based KASan and kCov. So these
are our results for userspace. So the red
-
bar is valgrind. We can see that the
execution time of valgrind is the highest.
-
It is really, really slow—like, 3, 10, 30x
overhead, way too slow for fuzzing. Then
-
in green, we have our binary ASan, which
is, like, already a large improvement. In
-
orange we have source-based ASan. And then
finally in blue we have the original code
-
without any instrumentation whatsoever. So
we can see that source-based ASan has,
-
like, 2x or 3x overhead, and binary ASan
is a bit higher, like, a bit less
-
efficient, but still somewhat close. So
that's for userspace, and for the kernel,
-
we—these are some preliminary results, so,
this is, like—I'm doing this work as part
-
of my master's thesis, and so I'm still,
like, running the evaluation. Here we can
-
see that the overhead is already, like, a
bit lower. So the reason for this is that
-
SPEC is a pure CPU benchmark; it doesn't
interact with the system that much. And so
-
any instrumentation that you add is going
to massively slow down, or, like,
-
considerably slow down the execution. By
contrast, when you fuzz a file system with
-
syzkaller, not only every test case has to
go from the high—the host to the guest and
-
then do multiple syscalls and so on, but
also every system call has to go through
-
several layers of abstraction before it
gets to the actual file system. And all
-
these—like, all of this takes a lot of
time, and so in practice the overhead of
-
our instrumentation seems to be pretty
reasonable. So, since we know that you
-
like demos, we've prepared a small demo of
kRetroWrite. So. Let's see. Yep. Okay. All
-
right, so we've prepared a small kernel
module. And this module is just, like,
-
really simple; it contains a
vulnerability, and what it does is that it
-
creates a character device. So if you're
not familiar with this, a character device
-
is like a fake file that is exposed by a
kernel driver and that you can write to
and read from. And instead of going to a
write from. And instead of going to a
file, the data that you read—that you, in
-
this case, write to the fake file—goes to
the driver and is handled by this demo
-
write function. So as we can see, this
function allocates a buffer, a 16-byte
-
buffer on the heap, and then copies some
data into it, and then it checks if the
-
data contains the string "1337". If it
does, then it accesses the buffer out of
-
bounds; you can see "alloc[16]" and the
buffer is sixteen bytes; this is an out-
-
of-bounds read by one byte. And if it
doesn't then it just accesses the buffer
-
in bounds, which is fine, and it's not a
vulnerability. So we can compile this
-
driver. OK, um... OK, and then so we have
our module, and then we will instrument it
-
using kRetroWrite. So, instrument... Yes,
please. OK. Right. So kRetroWrite did some
-
processing, and it produced an
instrumented module with ASan or kASan and
-
a symbolized assembly file. We can
actually have a look at the symbolized
-
assembly file to see what it looks like.
Yes. Yes. OK. So, is this big enough?
-
Yeah... As you can see, so—we can actually
see here the ASan instrumentation. Ah,
-
shouldn't—yeah. So, we—this is the ASan
instrumentation. The original code loads
-
some data from this address. And as you
can see, the ASan instrumentation first
-
computes the actual address, and then does
some checking—basically, this is checking
-
some metadata that ASan stores to check if
the address is in a red zone or not, and
-
then if the check fails, it
calls this ASan report which produces a
-
stack trace and crashes the kernel. So
this is fine. We can actually even look at
-
the disassembly of both modules, so...
object dump and then demo... Ah, nope. OK,
-
so on the left, we have the original
module without any instrumentation; on the
-
right, we have the module instrumented
with ASan. So as you can see, the original
-
module has "push r13" and then has this
memory load here; on the right in the
-
instrumented module, kRetroWrite inserted
the ASan instrumentation. So the original
-
load is still down here, but between that,
between the first instruction and this
-
instruction, we have—now have the kASan
instrumentation that does our check. So
-
this is all fine. Now we can actually test
it and see what it does. So we can—we will
-
boot a very simple, a very minimal Linux
system, and try to target the
-
vulnerability first with the non-
instrumented module and then with the
-
instrumented module. And we can—we will
see that in the—with the non-instrumented
-
module, the kernel will not crash, but
with the instrumented module it will crash
-
and produce a bug report. So. Let's see.
Yeah, this is a QEMU VM, I have no idea
-
why it's taking so long to boot. I'll
blame the demo gods for not being kind to
-
us. Yeah, I guess we just have to wait.
OK. So. All right, so we loaded the
-
module. We will see that it has created a
fake file character device in /dev/demo.
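The demo module's write handler looks roughly like this (a hypothetical C
reconstruction from the description earlier, not the module's actual source;
the names, sizes, and error handling are assumptions):

    #include <linux/fs.h>
    #include <linux/kernel.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>

    static ssize_t demo_write(struct file *file, const char __user *buf,
                              size_t len, loff_t *off)
    {
        char *alloc;
        size_t n = len < 15 ? len : 15;

        alloc = kmalloc(16, GFP_KERNEL);          /* 16-byte heap buffer */
        if (!alloc)
            return -ENOMEM;

        if (copy_from_user(alloc, buf, n)) {      /* copy the written data in */
            kfree(alloc);
            return -EFAULT;
        }
        alloc[n] = '\0';

        if (strstr(alloc, "1337"))
            pr_info("demo: %d\n", alloc[16]);     /* index 16 of a 16-byte buffer:
                                                     a one-byte out-of-bounds read */
        else
            pr_info("demo: %d\n", alloc[0]);      /* in bounds, fine */

        kfree(alloc);
        return len;
    }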
-
Yep. We can write this file. Yep. So this
will—this accesses the array in bounds,
-
and so this is fine. Then what we can also
do is write "1337" to it so it will access
-
the array out of bounds. So this is the
non-instrumented module, so this will not
-
crash. It will just print some garbage
value. Okay, that's it. Now we can load
-
the instrumented module instead... and do
the same experiment again. All right. We
-
can see that /dev/demo is still here. So
the module still works. Let's try to write
-
"1234" into it. This, again, doesn't
crash. But when we try to write "1337",
-
this will produce a bug report.
applause
-
So this has quite a lot of information. We
-
can see, like, the—where the memory was
allocated, there's a stack trace for that;
-
it wasn't freed, so there's no stack trace
for the free. And we see that the cache
-
size of the memory, like, it was a 16-byte
allocation. We can see the shape of the
-
memory. We see that these two zeros mean
that there are two 8-byte chunks of valid
-
memory. And then these "fc fc fc" is
the—are the red zones that I was talking
-
about before. All right, so that's it for
the demo. We will switch back to our
-
presentation now. So... hope you enjoyed
it.
-
gannimo: Cool. So after applying this to a
demo module, we also wanted to see what
-
happens if we apply this to a real file
system. After a couple of hours we
-
were—when we came back and checked on the
results, we saw a couple of issues popping
-
up, including a nice set of use-after-free
reads, a set of use-after-free writes, and
-
we checked the bug reports and we saw a
whole bunch of Linux kernel issues popping
-
up one after the other in this nondescript
module that we fuzzed. We're in the
-
process of reporting it. This will take
some time until it is fixed; that's why
-
you see the blurry lines. But as you see,
there's still quite a bit of opportunity
-
in the Linux kernel where you can apply
different forms of targeted fuzzing into
-
different modules, leverage these modules
on top of a kASan instrumented kernel and
-
then leveraging this as part of your
fuzzing toolchain to find interesting
-
kernel 0days that... yeah. You can then
develop further, or report, or do whatever
-
you want with them. Now, we've shown you
how you can take existing binary-only
-
modules, think different binary-only
drivers, or even existing modules where
-
you don't want to instrument a full set of
the Linux kernel, but only focus fuzzing
-
and exploration on a small different—small
limited piece of code and then do security
-
tests on those. We've shown you how we can
do coverage-based tracking and address
-
sanitization. But this is also up to you
on what kind of other instrumentation you
-
want. Like this is just a tool, a
framework that allows you to do arbitrary
-
forms of instrumentation. So we've taken
you on a journey from instrumenting
-
binaries over coverage-guided fuzzing and
sanitization to instrumenting modules in
-
the kernel and then finding crashes in the
kernel. Let me wrap up the talk. So, this
-
is one of the fun pieces of work that
we do in the hexhive lab at EPFL. So if
-
you're looking for postdoc opportunities
or if you're thinking about a PhD, come
-
talk to us. We're always hiring. The tools
will be released as open source. A large
-
chunk of the userspace work is already
open source. We're working on a set of
-
additional demos and so on so that you can
get started faster, leveraging the
-
different existing instrumentation that is
already out there. The userspace work is
-
already available. The kernel work will be
available in a couple of weeks. This
-
allows you to instrument real-world
binaries for fuzzing, leveraging existing
-
transformations for coverage tracking to
enable fast and effective fuzzing and
-
memory checking to detect the actual bugs
that exist there. The key takeaway from
-
this talk is that RetroWrite and
kRetroWrite enables static binary
-
rewriting at zero instrumentation cost. We
take the limitation of focusing only on
-
position-independent code, which is not a
real limitation, but we get the
-
advantage of being able to symbolize
without actually relying on heuristics, so
-
we can even symbolize large, complex
source—large, complex applications and
-
effectively rewrite those aspects and then
you can focus fuzzing on these parts.
-
Another point I want to mention is that
this enables you to reuse existing tooling
-
so you can take a binary blob, instrument
it, and then reuse, for example, Address
-
Sanitizer or existing fuzzing tools, as it
integrates really, really nicely. As I said,
-
all the code is open source. Check it out.
Try it. Let us know if it breaks. We're
-
happy to fix. We are committed to open
source. And let us know if there are any
-
questions. Thank you.
applause
-
Herald: So, thanks, guys, for an
interesting talk. We have some time for
-
questions, so we have microphones along
the aisles. We'll start from question from
-
microphone number two.
Q: Hi. Thanks for your talk and for the
-
demo. I'm not sure about the use-case you
showed for the kernel RetroWrite. 'Cause
-
you're usually interested in fuzzing
binary in kernelspace when you don't have
-
source code for the kernel. For example,
for IoT or Android and so on. But you just
-
reuse the kCov and kASan in the kernel,
and you never have the kernel in IoT or
-
Android which is compiled with that. So
are you—do you have any plans to binary
-
instrument the kernel itself, not the
modules?
-
Nspace: So we thought about that. I think
that there's some additional problems that
-
we would have to solve in order to be able
to instrument the full kernel. So other
-
than the fact that it gives us
compatibility with, like, existing tools,
-
the reason why we decided to go with
compiling the kernel with kASan and kCov
-
is that building the, like—you would
have to, like, think about it. You
-
have to instrument the memory allocator to
add red zones, which is, like, already
-
somewhat complex. You have to instrument
the exception handlers to catch, like, any
-
faults that the instrumentation detects.
You would have to, like, set up some
-
memory for the ASan shadow. So this is,
like—I think you should be able to do it,
-
but it would require a lot of additional
work. So this is, like—this was like four
-
months' thesis. So we decided to start
small and prove that it works in
-
the kernel for modules, and then leave it
to future work to actually extend it to
-
the full kernel. Also, like, I think for
Android—so in the case of Linux, the
-
kernel is GPL, right, so if the
manufacturer ships a custom kernel, they
-
have to release the source code, right?
Q: They never do.
-
Nspace: They never—well, that's a
different issue. Right?
-
gannimo: Right.
Q: So that's why I ask, because I don't
-
see how it just can be used in the real
world.
-
gannimo: Well, let me try to put this into
perspective a little bit as well. Right.
-
So there's the—what we did so far is we
leveraged existing tools, like kASan or
-
kCov, and integrated into these existing
tools. Now, doing heap-based allocation is
-
fairly simple and replacing those with
additional red zones—that instrumentation
-
you can carry out fairly well by focusing
on the different allocators. Second to
-
that, simply oopsing the kernel and
printing the stack trace is also fairly
-
straightforward. So it's not a lot of
additional effort. So it is—it involves
-
some engineering effort to port this to
non-kASan-compiled kernels. But we think
-
it is very feasible. In the interest of
time, we focused on kASan-enabled kernels,
-
so that some form of ASan is already
enabled. But yeah, this is additional
-
engineering effort. But there is also a
community out there that can help us with
-
these kind of changes. So kRetroWrite and
RetroWrite themselves are the binary
-
rewriting platform that allow you to turn
a binary into an assembly file that you
-
can then instrument and run different
passes on top of it. So another pass would
-
be a full ASan pass or kASan pass that
somebody could add and then contribute
-
back to the community.
Q: Yeah, it would be really useful.
-
Thanks.
gannimo: Cool.
-
Angel: Next question from the Internet.
Q: Yes, there is a question regarding the
-
slide on the SPEC CPU benchmark. The
second or third graph from the right had
-
an instrumented version that was faster
than the original program. Why is that?
-
gannimo: Cache effect. Thank you.
Angel: Microphone number one.
-
Q: Thank you. Thank you for the presentation.
I have a question: how many architectures do
-
you support, and do you have plans to
support more?
-
gannimo: x86_64.
Q: Okay. So no plans for ARM or MIPS,
-
or...?
gannimo: Oh, there are plans.
-
Q: Okay.
Nspace: Right, so—
-
gannimo: Right. Again, there's a finite
amount of time. We focused on the
-
technology. ARM is high up on the list. If
somebody is interested in working on it
-
and contributing, we're happy to hear from
it. Our list of targets is ARM first and
-
then maybe something else. But I think
with x86_64 and ARM we've covered a
-
majority of the interesting platforms.
Q: And second question, did you try to
-
fuzz any real closed-source program?
Because as I understand from presentation,
-
you fuzzed, like, just file systems, which we
can compile and fuzz with syzkaller like
-
in the past.
Nspace: So for the evaluation, we wanted
-
to be able to compare between the source-
based instrumentation and the binary-based
-
instrumentation, so we focused mostly on
open-source filesystem and drivers because
-
then we could instrument them with a
compiler. We haven't yet tried, but this
-
is, like, also pretty high up on the list.
We wanted to try to find some closed-
-
source drivers—there's lots of them, like
for GPUs or anything—and we'll give it a
-
try and find some 0days, perhaps.
Q: Yes, but with syzkaller, you still have
-
a problem. You have to write rules, like,
dictionaries. I mean, you have to
-
understand the format, have to communicate
with the driver.
-
Nspace: Yeah, right. But there's, for
example, closed-source file systems that
-
we are looking at.
Q: Okay. Thinking.
-
Herald: Number two.
Q: Hi. Thank you for your talk. So I don't
-
know if there are any kCov- or kASan-
equivalent solution to Windows, but I was
-
wondering if you tried, or are you
planning to do it on Windows, the
-
framework? Because I know it might be
challenging because of the driver
-
signature enforcement and PatchGuard, but
I wondered if you tried or thought about
-
it.
gannimo: Yes, we thought about it and we
-
decided against it. Windows is incredibly
hard and we are academics. The research I
-
do in my lab, or we do in my research lab,
focuses on predominantly open-source
-
software and empowers open-source
software. Doing full support for Microsoft
-
Windows is somewhat out of scope. If
somebody wants to port these tools, we are
-
happy to hear it and work with these
people. But it's a lot of additional
-
engineering effort, versus very
additional—very low additional research
-
value, so we'll have to find some form of
compromise. And, like, if you would be
-
willing to fund us, we would go ahead. But
it's—yeah, it's a cost question.
-
Q: And you're referring both to kernel and
user space, right?
-
gannimo: Yeah.
Q: Okay. Thank you.
-
Herald: Number five.
Q: Hi, thanks for the talk. This seems
-
most interesting if you're looking for
vulnerabilities in closed source kernel
-
modules, but not giving it too much
thought, it seems it's really trivial to
-
prevent this if you're writing a closed
source module.
-
gannimo: Well, how would you prevent this?
Q: Well, for starters, you would just take
-
a difference between the address of two
functions. That's not gonna be IP
-
relative, so...
Nspace: Right. So we explicitly—like, even
-
in the original RetroWrite paper—we
explicitly decided to not try to deal with
-
obfuscated code, or code that is
purposefully trying to defeat this kind of
-
rewriting. Because, like, the assumption
is that first of all, there are techniques
-
to, like, deobfuscate code or remove
these, like, checks in some way, but this
-
is, like, sort of orthogonal work. And at
the same time, I guess most drivers are
-
not really compiled with the sort of
obfuscation; they're just, like, you know,
-
they're compiled with a regular compiler.
But yeah, of course, this is, like, a
-
limitation.
gannimo: They're likely stripped, but not
-
necessarily obfuscated. At least from what
we've seen when we looked at binary-only
-
drivers.
Herald: Microphone number two.
-
Q: How do you decide where to place the
red zones? From what I heard, you talked
-
about instrumenting the allocators, but,
well, there are a lot of variables on the
-
stack, so how do you deal with those?
gannimo: Oh, yeah, that's actually super
-
cool. I refer to some extent to the paper
that is on the GitHub repo as well. If you
-
think about it, modern compilers use
canaries for buffers. Are you aware of
-
stack canaries—how stack canaries work?
So, stack canaries—like, if the compiler
-
sees there's a buffer that may be
overflown, it places a stack canary
-
between the buffer and any other data.
What we use is we—as part of our analysis
-
tool, we find these stack canaries, remove
the code that does the stack canary, and
-
use this space to place our red zones. So
we actually hack the stack canaries,
-
remove that code, and add ASan red zones
into the empty stack canaries that are now
-
there. It's actually a super cool
optimization because we piggyback on what
-
kind of work the compiler already did for
us before, and we can then leverage that
-
to gain additional benefits and protect
the stack as well.
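As a rough picture of the layout being described (the field names and sizes
are illustrative assumptions only):

    #include <stdint.h>

    /* The compiler already reserved an 8-byte canary slot next to the buffer;
     * the rewriter drops the canary check and instead poisons those 8 bytes in
     * the ASan shadow, turning the slot into a stack red zone. */
    struct frame_sketch {
        char     buf[64];       /* buffer that might be overflown */
        uint64_t canary_slot;   /* was: stack canary; now: ASan red zone */
        uint64_t saved_rbp;
        uint64_t return_addr;
    };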
-
Q: Thanks.
Angel: Another question from the Internet.
-
Q: Yes. Did you consider lifting the
binary code to LLVM IR instead of
-
generating assembler source?
gannimo: Yes. laughter But, so—a little
-
bit longer answer. Yes, we did consider
that. Yes, it would be super nice to lift
-
to LLVM IR. We've actually looked into
this. It's incredibly hard. It's
-
incredibly complex. There's no direct
mapping between the machine code
-
equivalent and the LLVM IR. You would
still need to recover all the types. So
-
it's like this magic dream that you
recover full LLVM IR, then do heavyweight
-
transformations on top of it. But this is
incredibly hard because if you compile
-
down from LLVM IR to machine code, you
lose a massive amount of information. You
-
would have to find a way to recover all of
that information, which is pretty much
-
impossible and undecidable for many cases.
So for example, just as a note, we only
-
recover control flow and we only
desymbolize control flow. For data
-
references—we don't support
instrumentation of data references yet
-
because there's still an undecidable
problem that we are facing with. I can
-
talk more about this offline, or there is
a note in the paper as well. So this is
-
just a small problem. Only if you're
lifting to assembly files. If you're
-
lifting to LLVM IR, you would have to do
full end-to-end type recovery, which is
-
massively more complicated. Yes, it would
be super nice. Unfortunately, it is
-
undecidable and really, really hard. So
you can come up with some heuristics, but
-
there is no solution that will do this
in—that will be correct 100 percent of the
-
time.
Angel: We'll take one more question from
-
microphone number six.
Q: Thank you for your talk. What kind of
-
disassemblers did you use for RetroWrite,
and did you have problems with the wrong
-
disassembly? And if so, how did you handle
it?
-
Nspace: So, RetroWrite—so we used
Capstone for the disassembly.
-
gannimo: An amazing tool, by the way.
Nspace: Yeah. So the idea is that, like,
-
we need some kind of—some information
about where the functions are. So for the
-
kernel modules, this is actually fine
because kernel modules come with this sort
-
of information because the kernel needs
it, to build stack traces, for example.
-
For userspace binaries, this is somewhat
less common, but you can use another tool
-
to try to do function identification. And
we do, like—sort of, like, disassemble the
-
entire function. So we have run into some
issues with, like—in AT&T syntax, because
-
like we wanted to use gas, GNU's
assembler, for, for...
-
gannimo: Reassembling.
Nspace: Reassembly, yeah. And some
-
instructions are a lot—you can express the
same, like, two different instructions,
-
like five-byte NOP and six-byte NOP, using
the same string of, like, text—a mnemonic,
-
an operand string. But the problem is
that, like, the kernel doesn't like it and
-
crashes. This took me like two days to
debug.
-
gannimo: So the kernel uses dynamic binary
patching when it runs, at runtime, and it
-
uses fixed offsets, so if you replace a
five-byte NOP with a six-byte NOP or vice
-
versa, your offsets change and your kernel
just blows up in your face.
-
Q: So it was kind of a case-by-case basis
where you saw the errors coming out of the
-
disassembly and you had to fix it?
Nspace: So sorry, can you repeat the
-
question?
Q: Like, for example, if you—if some
-
instruction is not supported by the
disassembler, so you saw that it crashed,
-
that there's something wrong, and then you
fix it by hand?
-
Nspace: Yeah, well, if we saw that there
was a problem with it, this—like, I don't
-
recall having any unknown instructions in
the disassembler. I don't think I've ever
-
had a problem with that. But yeah, this
was a lot of, like, you know, engineering
-
work.
gannimo: So let me repeat. The problem was
-
not a bug in the disassembler, but an
issue with the instruction format—that the
-
same mnemonic can be translated into two
different instructions, one of which was
-
five bytes long, the other one was six
bytes long. Both used the exact same
-
mnemonic. Right, so this was an issue with
assembly encoding.
-
Q: But you had no problems with
unsupported instructions which couldn't be
-
disassembled?
Nspace: No, no. Not as far as I know, at
-
least.
Angel: We have one more minute, so a very
-
short question from microphone number two.
Q: Does it work? Ah. Is your binary
-
instrumentation equally powerful as kernel
address space... I mean, kASan? So, does
-
it detect all the memory corruptions on
stack, heap and globals?
-
gannimo: No globals. But heap—it does all
of them on the heap. There's some slight
-
variation on the stack because we have to
piggyback on the canary stuff. As I
-
mentioned quickly before, there is no
reflowing and full recovery of data
-
layouts. So to get anything on the stack,
we have to piggyback on existing compiler
-
extensions like stack canaries. But—so we
don't support intra-object overflows on
-
the stack. But we do leverage the stack
canaries to get some stack benefits, which
-
is, I don't know, 90, 95 percent there
because the stack canaries are pretty
-
good. For heap, we get the same precision.
For globals, we have very limited support.
-
Q: Thanks.
Angel: So that's all the time we have for
-
this talk. You can find the speakers, I
think, afterwards offline. Please give
-
them a big round of applause for an
interesting talk.
-
applause
-
36c3 postroll music
-
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!