36c3 preroll music
Herald: So, the next talk for this
afternoon is about high speed binary
fuzzing. We have two researchers that will
be presenting the product of their latest
work, which is a framework for static
binary rewriting. Our speakers are—the
first one is a computer science master's
student at EPFL and the second one is a
security researcher and assistant
professor at EPFL. Please give a big round
of applause to Nspace and gannimo.
Applause
gannimo (Mathias Payer): Thanks for the
introduction. It's a pleasure to be here,
as always. We're going to talk about
different ways to speed up your fuzzing
and to find different kinds of
vulnerabilities or to tweak your binaries
in somewhat unintended ways. I'm Mathias
Payer or I go by gannimo on Twitter and I
am an assistant professor at EPFL working
on different forms of software security:
fuzzing sanitization, but also different
kinds of mitigations. And Matteo over
there is working on his master's thesis on
different forms of binary rewriting for
the kernel. And today we're going to take
you on a journey on how to actually
develop very fast and very efficient
binary rewriting mechanisms that allow you
to do unintended modifications to the
binaries and allow you to explore
different kinds of unintended features in
binaries. So about this talk. What we
discovered or the reason why we set out on
this journey was that fuzzing binaries is
really, really hard. There's very few
tools in user space. There's—it's
extremely hard to set it up and it's
extremely hard to set it up in a
performant way. The setup is complex. You
have to compile different tools. You have
to modify it. And the results are not
really that satisfactory. As soon as you
move to the kernel, fuzzing binaries in a
kernel is even harder. There's no tooling
whatsoever, there's very few users
actually working with binary code in the
kernel or modifying binary code, and it's
just a nightmare to work with. So what we
are presenting today is a new approach
that allows you to instrument any form of
binary code or modern binary code based on
static rewriting, which gives you full
native performance. You only pay for the
instrumentation that you add, and you can
do very heavyweight transformations on top
of it. The picture: if you look at a modern system, let's say you're looking at cat pictures in your browser, then Chrome plus the kernel plus the libc plus the graphical user interface together clock in at about 100 million lines of code.
Instrumenting all of this for some form of
security analysis is a nightmare,
especially along this large stack of
software. There's quite a bit of different
compilers involved. There's different
linkers. It may be compiled on a different
system, with different settings and so on.
And then getting your instrumentation
across all of this is pretty much
impossible and extremely hard to work
with. And we want to enable you to select
those different parts that you're actually
interested in. Modify those and then focus
your fuzzing or analysis approaches on
those small subsets of the code, giving
you a much better and stronger capability
to test the systems that you're, or those
parts of the system that you're really,
really interested in. Who's worked on
fuzzing before? Quick show of hands. Wow,
that's a bunch of you. Do you use AFL?
Yeah, most of you, AFL. Libfuzzer? Cool,
about 10, 15 percent libfuzzer, 30 percent
fuzzing, and AFL. There's quite good knowledge of fuzzing, so I'm not going to
spend too much time on fuzzing, but for
those that haven't really run their
fuzzing campaigns yet, it's a very simple
software testing technique. You're
effectively taking a binary, let's say
Chrome, as a target and you're running
this in some form of execution
environment. And fuzzing then consists of
some form of input generation that creates
new test cases, throws them at your
program and sees—and checks what is
happening with your program. And either
everything is OK, and your code is being
executed, and your input—the program
terminates, everything is fine, or you
have a bug report. If you have a bug
report, you can use this. Find the
vulnerability, maybe develop a PoC and
then come up with some form of either
exploit or patch or anything else. Right.
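
As a rough, hand-written sketch (not any particular fuzzer's code), the core loop looks something like this:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_LEN 64

    /* Toy stand-ins for the real components; any real fuzzer is far smarter. */
    static size_t mutate(uint8_t *buf)                 /* input generation: random bytes */
    {
        size_t len = (size_t)(rand() % MAX_LEN) + 1;
        for (size_t i = 0; i < len; i++)
            buf[i] = (uint8_t)rand();
        return len;
    }

    static bool run_target(const uint8_t *d, size_t n) /* "crashes" on a magic prefix */
    {
        return n >= 4 && memcmp(d, "BUG!", 4) == 0;
    }

    int main(void)
    {
        uint8_t buf[MAX_LEN];
        for (unsigned long i = 0; i < 1000000UL; i++) {
            size_t len = mutate(buf);                  /* new test case */
            if (run_target(buf, len)) {                /* execute and observe */
                printf("crash after %lu execs\n", i);  /* bug report for triage */
                return 0;
            }
        }
        puts("no crash found in budget");              /* fixed resource budget exhausted */
        return 0;
    }
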
So this is pretty much fuzzing in a nutshell. How do you get fuzzing to be effective? How can you cover large code bases, complex code, and complex environments? Well, there's a couple of
simple steps that you can take. And let's
walk quickly through effective fuzzing
101. Well, first, you want to be able to
create test cases that actually trigger
bugs. And this is a very, very complicated part. And we need
to have some notion of the inputs that a
program accepts. And we need to have some
notion of how we can explore different
parts of the program, right? Different
parts of functionality. Well, on one hand,
we could have a developer write all the
test cases by hand, but this would be kind
of boring. It would also require a lot of
human effort in creating these different
inputs and so on. So coverage guided
fuzzing has evolved as a very simple way
to guide the fuzzing process, leveraging
the information on which parts of the code
have been executed by simply tracing the
individual path through the program based
on the execution flow. So we can—the
fuzzer can use this feedback to then
modify the inputs that are being thrown at
the fuzzing process. The second step is
the fuzzer must be able to detect bugs. If
you've ever looked at a memory corruption,
if you're just writing one byte after the
end of a buffer, it's highly likely that
your software is not going to crash. But
it's still a bug, and it may still be
exploitable based on the underlying
conditions. So we want to be able to
detect violations as soon as they happen,
for example, based on some form of sanitization that we add, some form of instrumentation that we add to the
binary, that then tells us, hey, there's a
violation of the memory safety property,
and we terminate the application right
away as feedback to the fuzzer. Third, and last but not least: speed is key, right? If you're running a
fuzzing campaign, you have a fixed
resource budget. You have a couple of
cores, and you want to run for 24 hours,
48 hours, a couple of days. But in any
way, whatever your constraints are, you
have a fixed amount of instructions that
you can actually execute. And you have to
decide, am I spending my instructions on
generating new inputs, tracking
constraints, finding bugs, running
sanitization or executing the program? And
you need to find a balance between all of
them, as it is a zero sum game. You have a
fixed amount of resources and you're
trying to make the best with these
resources. So any overhead is slowing you
down. And again, this becomes an
optimization problem. How can you most
effectively use the resources that you
have available? As we are fuzzing with
source code, it's quite easy to actually
leverage existing mechanisms, and we add
all that instrumentation at compile time.
We take source code, we pipe it through
the compiler, and modern compiler platforms allow you to instrument and add
little code snippets during the
compilation process that then carry out
all these tasks that are useful for
fuzzing. For example, modern compilers can add short snippets of code for coverage tracking that record which parts of the code you have executed, or, for sanitization, snippets that record and check whether every single memory access is safe or not.
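
Conceptually, and only as a hand-written illustration rather than real compiler output, coverage instrumentation turns a function into something like the following; with clang's -fsanitize-coverage=trace-pc the inserted calls go to __sanitizer_cov_trace_pc instead, provided by the fuzzer's runtime:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hand-written stand-in for the callback a compiler would insert. */
    static void cov_trace(uint32_t block_id)
    {
        static uint8_t seen[4];
        if (!seen[block_id]) {
            seen[block_id] = 1;
            printf("new block reached: %u\n", block_id);
        }
    }

    int contains_zero(const uint8_t *data, size_t len)
    {
        cov_trace(0);                 /* entry block */
        for (size_t i = 0; i < len; i++) {
            cov_trace(1);             /* loop body block */
            if (data[i] == 0) {
                cov_trace(2);         /* "found a zero" block */
                return 1;
            }
        }
        cov_trace(3);                 /* fall-through exit block */
        return 0;
    }

    int main(void)
    {
        const uint8_t a[] = {1, 2, 3};
        const uint8_t b[] = {1, 0, 3};
        contains_zero(a, sizeof a);   /* exercises blocks 0, 1, 3 */
        contains_zero(b, sizeof b);   /* additionally reaches block 2 */
        return 0;
    }
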
And then when you're running the
instrumented binary, everything is fine
and you can detect the policy violations
as you go along. Now if you would have
source code for everything, this would be
amazing. But it's often not the case,
right? We may be able on Linux to cover a
large part of the software stack by
focusing only on source-code-based
approaches. But there may be applications
where no source code is available. If we
move to Android or other mobile systems,
there's many drivers that are not
available as open source or just available
as binary blobs, or the full software
stack may be closed-source and we only get
the binaries. And we still want to find
vulnerabilities in these complex software
stacks that span hundreds of millions of
lines of code in a very efficient way. The
only solution to cover this part of the massive code base is to actually rewrite and focus on binaries. A very simple
approach could be black box fuzzing, but
this is—this doesn't really get you
anywhere because you don't get any
feedback; you don't get any information if
you're triggering bugs. So one simple
approach, and this is the approach that is
most dominantly used today, is to rewrite
the program or the binary dynamically. So
you're taking the binary and during
execution you use some form of dynamic
binary instrumentation based on Pin, angr,
or some other binary rewriting tool and
translate the target at runtime, adding this binary instrumentation on top of it as you're executing it. It's simple, it's
straightforward, but it comes at a
terrible performance cost of ten to a
hundred x slow down, which is not really
effective. And you're spending all your
cores and your cycles on just executing
the binary instrumentation. So we don't
really want to do this and we want to have
something that's more effective than that.
So what we are focusing on is to do static
rewriting. It involves a much more complex
analysis as we are rewriting the binary
before it is being executed, and we have
to recover all of the control flow, all of
the different mechanisms, but it results
in a much better performance. And we can
get more bang for our buck. So why is
static rewriting so challenging? Well,
first, simply adding code will break the
target. So look at this disassembled piece of code here: a simple loop that loads data, decrements a register, and then jumps if you're not at the end of the array, iterating through the array. Now, as you look at the jump-not-
zero instruction, the last instruction of
the snippet, it is a relative offset. So
it jumps backward seven bytes. Which is
nice if you just execute the code as is.
But as soon as you want to insert new
code, you change the offsets in the
program, and you're modifying all these
different offsets. And simply adding new
code somewhere in between will break the
target. So a core feature that we need to
enforce, or core property that we need to
enforce, is that we must find all the
references and properly adjust them, both
relative offsets and absolute offsets as
well. Getting a single one wrong will
break everything. What makes this problem
really, really hard is that if you're
looking at the binary, a byte is a byte,
right? There's no way for us to
distinguish between scalars and
references, and in fact they are
indistinguishable. Getting a single
reference wrong breaks the target and
would introduce arbitrary crashes. So we
have to come up with ways that allow us to
distinguish between the two. So for
example, if you have this code here, it
takes a value and stores it somewhere on
the stack. This could come from two
different kinds of high-level constructs. On one hand, it could be taking the address of a function and storing this function address in a stack variable. Or it could be just storing a scalar in a stack variable. And these two are indistinguishable, yet as soon as we add new code, the offsets will change: if it is a function address, we would have to modify the value; if it is a scalar, we have to keep the same value.
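
At the source level, the two cases could be as simple as this (purely illustrative; the constant is an arbitrary number that merely looks like an address):

    /* Nothing here refers to a real binary; the value is made up. */
    void handler(void) { }

    void ambiguous(void)
    {
        void (*fp)(void) = handler;              /* stores the address of a function:
                                                    a reference the rewriter must adjust
                                                    when code moves */
        unsigned long not_a_pointer = 0x4011b6;  /* stores a scalar that happens to
                                                    look like an address: it must
                                                    stay untouched */
        (void)fp;
        (void)not_a_pointer;
    }
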
So how can we come up with a way that allows us to distinguish between the two and rewrite binaries by recovering this missing information? So let us take you on a journey
towards instrumenting binaries in the
kernel. This is what we aim for. We'll
start with the simple case of
instrumenting binaries in user land, talk
about different kinds of coverage guided
fuzzing and what kind of instrumentation
we can add, what kind of sanitization we
can add, and then focus on taking it all together and applying it to kernel binaries to see what will fall out of it. Let's start with instrumenting
binaries first. I will now talk a little
bit about RetroWrite, our mechanism and
our tool that enables static binary
instrumentation by symbolizing existing
binaries. So we recover the information
and we translate relative offsets and
absolute offsets into actual labels that
are added to the assembly file. The
instrumentation can then work on the
recovered assembly file, which can then be
reassembled into a binary that can then be
executed for fuzzing. We implement
coverage tracking and binary address
sanitizer on top of this, leveraging
abstraction as we go forward. The key to
enabling this kind of binary rewriting is
position-independent code. And position-
independent code has become the de-facto
standard for any code that is being
executed on a modern system. And it
effectively says that it is code that can
be loaded at any arbitrary address in your
address space as you are executing
binaries. It is essential and a
requirement if you want to have address
space layout randomization or if you want
to use shared libraries, which de facto
you want to use in all these different
systems. So since a couple of years, all
the code that you're executing on your
phones, on your desktops, on your laptops
is position-independent code. And the idea behind position-independent code is that you can load it anywhere in your address space; you can therefore not use any hard-coded static addresses, and you have to inform the system through relocations or use relative addresses so that the system can relocate these different references. On
x86_64, position-independent code
leverages addressing that is relative to
the instruction pointer. So for example,
it uses the current instruction pointer
and then a relative offset to that
instruction pointer to reference global
variables, other functions and so on. And
this is a very easy way for us to
distinguish references from constants,
especially in PIE binaries. If it is RIP-
relative, it is a reference; everything
else is a constant. And we can build our
translation algorithm and our translation
mechanism based on this fundamental
finding to remove any form of heuristic
that is needed by focusing especially on
position-independent code. So we're
supporting position-independent code; we
are—we don't support non-position-
independent code, but we give you the
guarantee that we can rewrite all the
different code that is out there. So
symbolization works as follows: If you
have the little bit of code on the lower
right, symbolization replaces first all
the references with assembler labels. So
look at the call instruction and the jump-
not-zero instruction; the call instruction
references an absolute address and the
jump-not-zero instruction jumps backward
relative 15 bytes. So by focusing on these
relative jumps and calls, we can replace
them with actual labels and rewrite the
binary as follows: so we're calling a
function, replacing it with the actual
label, and for the jump-not-zero we are
inserting an actual label in the assembly
code and adding a backward reference. For
PC-relative addresses, for example the
data load, we can then replace it with the
name of the actual data that we have
recovered, and we can then add all the
different relocations and use that as
auxiliary information on top of it. After
these three steps, we can insert any new
code in between, and can therefore add
different forms of instrumentation or run some higher-level analysis on top of
it, and then reassemble the file for
fuzzing or coverage-guided tracking,
address sanitization or whatever else you
want to do. I will now hand over to
Matteo, who will cover coverage-guided
fuzzing and sanitization and then
instrumenting the binaries in the kernel.
Go ahead.
Nspace (Matteo Rizzo): So, now that we
have this really nice framework to rewrite
binaries, one of the things that we want
to add to actually get the fuzzing is this
coverage-tracking instrumentation. So
coverage-guided fuzzing is a way, a
method, to let the fuzzer discover interesting inputs and interesting paths through the target by itself. So the basic idea is that the fuzzer will track coverage—the parts of the program that are covered by different inputs—by inserting some kind of
instrumentation. So, for example, here we
have this target program that checks if
the input contains the string "PNG" at the
beginning, and if it does, then it does
something interesting, otherwise it just
bails out and fails. So if we track the parts of the program that each input executes, the fuzzer can figure out that an input that contains "P" will have discovered a different path through the program than an input that doesn't contain it. And so on: one byte at a time, it can discover that this program expects the magic sequence "PNG" at the start of the input.
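
A toy target of that shape, written here only for illustration as a libFuzzer-style entry point, could look like this:

    #include <stddef.h>
    #include <stdint.h>

    /* Toy example; build with something like: clang -fsanitize=fuzzer,address target.c
     * The byte-wise checks mean each correct byte reaches a new basic block,
     * which is exactly the kind of feedback coverage-guided fuzzing latches onto. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        if (size < 3)
            return 0;       /* bail out: too short */
        if (data[0] != 'P')
            return 0;       /* inputs starting with 'P' discover a new path */
        if (data[1] != 'N')
            return 0;
        if (data[2] != 'G')
            return 0;
        /* ... do something "interesting" with the PNG-like input ... */
        return 0;
    }
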
So the way that the fuzzer does this is that every time a new input discovers a new path through the target, it
is considered interesting and added to a
corpus of interesting inputs. And every
time the fuzzer needs to generate a new
input, it will select something from the
corpus, mutate it randomly, and then use
it as the new input. So this is like
a—this is, like, conceptually pretty
simple, but in practice it works really
well and it really lets the fuzzer
discover the format that the target
expects in an unsupervised way. So as an
example, this is an experiment that was
run by the author of AFL—AFL is the fuzzer
that sort of popularized this
technique—where he was fuzzing a JPEG-
parsing library, starting from a corpus
that only contained the string "hello". So
now clearly "hello" is not a valid JPEG
image and so—but still, like, the fuzzer
was still able to find—to discover the
correct format. So after a while it
started generating some grayscale images,
on the top left, and as it generated more
and more inputs, it started generating
more interesting images, such as some
grayscale gradients, and later on even
some color images. So as you can see, this
really works, and it allows us to fuzz a
program without really teaching the fuzzer
what the input should look like. So that's
it for coverage-guided fuzzing. Now we'll
talk a bit about sanitizations. As a
reminder, the core idea behind
sanitization is that just looking for
crashes is likely to miss some of the
bugs. So, for example, if you have this
out-of-bounds one-byte read, that will
probably not crash the target, but you
would still like to catch it because it
could be used for an info leak, for
example. So one of the most popular
sanitizers is Address Sanitizer. So
Address Sanitizer will instrument all the
memory accesses in your program and check
for memory corruption, which—so, memory
corruption is a pretty dangerous class of
bugs that unfortunately still plagues C
and C++ programs and unsafe languages in
general. And ASan tries to catch it by
instrumenting the target. It is very
popular; it has been used to find
thousands of bugs in complex software like
Chrome and Linux, and even though it has,
like, a bit of a slowdown—like about 2x—it
is still really popular because it lets
you find many, many more bugs. So how does
it work? The basic idea is that ASan will
insert some special regions of memory
called 'red zones' around every object in
memory. So we have a small example here
where we declare a 4-byte array on the
stack. So ASan will allocate the array
"buf" and then add a red zone before it
and a red zone after it. Whenever the
program accesses the red zones, it is
terminated with a security violation. So
the instrumentation just prints a bug
report and then crashes the target. This
is very useful for detecting, for example,
buffer overflows or underflows and many
other kinds of bugs such as use-after-free
and so on. So, as an example here, we are trying to copy 5 bytes into a 4-byte buffer, and ASan will check each of the accesses one by one. And when it sees that the last byte writes to a red zone, it detects the violation and crashes the program.
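
In plain C, the situation described here is simply the following (illustrative only; build it with and without -fsanitize=address to see the difference):

    #include <string.h>

    /* The uninstrumented build will most likely run to completion; the ASan
     * build reports a stack-buffer-overflow because the fifth byte lands in
     * the red zone placed right after buf. */
    int main(void)
    {
        char buf[4];
        memcpy(buf, "AAAAA", 5);   /* one byte past the end of a 4-byte buffer */
        return buf[0];
    }
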
So this is good for us because this bug might not have been found by simply looking for crashes, but it is definitely found if we use ASan. So this
is something we want for fuzzing. So now
that we've covered—briefly covered ASan we
can talk about instrumenting binaries in
the kernel. So Mathias left us with
RetroWrite, and with RetroWrite we can add
both coverage tracking and ASan to
binaries. So the simple—it's a really
simple idea: now that we can rewrite this
binary and add instructions wherever we
want, we can implement both coverage
tracking and ASan. In order to implement
coverage tracking, we simply have to
identify the start of every basic block
and add a little piece of instrumentation
at the start of the basic block that tells
the fuzzer 'hey, we've reached this part
of the program'—'hey, we've reached this
other part of the program'. Then the
fuzzer can figure out whether that's a new
part or not. ASan is also, like, you know,
it's also somewhat—it can also be
implemented in this way by finding all
memory accesses, and then linking with
libASan. libASan is a sort of runtime for
ASan that takes care of inserting the red
zones and instrument—and adding, you know,
like, keeping around all the metadata that
ASan needs to know where the red zones
are, and detecting whether a memory access
is invalid. So, how can we apply all of
this in the kernel? Well, first of all,
fuzzing the kernel is not as easy as
fuzzing some userspace program. There's
some issues here. So first of all, there's
crash handling. So whenever you're fuzzing
a userspace program, you expect crashes,
well, because that's what we're after. And
if a userspace program crashes, then the
OS simply terminates the process gracefully.
And so the fuzzer can detect this, and
save the input as a crashing input, and so
on. And this is all fine. But when you're
fuzzing the kernel, so—if you were fuzzing
the kernel of the machine that you were
using for fuzzing, after a while, the
machine would just go down. Because, after
all, the kernel runs the machine, and if
it starts misbehaving, then all of it can
go wrong. And more importantly, you can
lose your crashes, because if the
machine crashes, then the state of the
fuzzer is lost and you have no idea what
your crashing input was. So what most
kernel fuzzers have to do is that they
resort to some kind of VM to keep the
system stable. So they fuzz the kernel in
a VM and then run the fuzzing agent
outside the VM. On top of that is tooling.
So, if you want to fuzz a user space
program, you can just download AFL or use
libfuzzer; there's plenty of tutorials
online, it's really easy to set up and
just, like—compile your program, you start
fuzzing and you're good to go. If you want
to fuzz the kernel, it's already much more
complicated. So, for example, if you want
to fuzz Linux with, say, syzkaller, which
is a popular kernel fuzzer, you have to
compile the kernel, you have to use a
special config that supports syzkaller,
you have way less guides available than
for userspace fuzzing, and in general it's
just much more complex and less intuitive
than just fuzzing userspace. And lastly,
we have the issue of determinism. So in
general, if you have a single threaded
userspace program, unless it uses some
kind of random number generator, it is
more or less deterministic. There's
nothing that affects the execution of the
program. But—and this is really nice if
you want to try to reproduce a test case,
because if you have a non-deterministic
test case, then it's really hard to know
whether this is really a crash or if it's
just something that you should ignore, and
in the kernel this is even harder, because
you don't only have concurrency, like
multi-processing, you also have interrupts.
So interrupts can happen at any time, and
if one time you got an interrupt while
executing your test case and the second
time you didn't, then maybe it only
crashes one time - you don't really know,
it's not pretty. And so again, we
have several approaches to fuzzing
binaries in the kernel. First one is to do
black box fuzzing. We don't really
like this because it doesn't find much,
especially in something complex
like a kernel. Approach 1 is to
use dynamic translation,
so, use something
like QEMU or—you name it. This works, and
people have used it successfully; the
problem is that it is really, really,
really slow. Like, we're talking about
10x-plus overhead. And as we said before,
the more iterations, the more test cases
you can execute in the same amount of
time, the better, because you find more
bugs. And on top of that, there's no
currently available sanitizer for
kernel binaries that works—is based on
this approach. So in userspace you have
something like valgrind; in the kernel,
you don't have anything, at least that we
know of. There is another approach, which
is to use Intel Processor Trace. This has
been, like—there's been some research
papers on this recently, and this is nice
because it allows you to collect coverage
at nearly zero overhead. It's, like,
really fast, but the problem is that it
requires hardware support, so it requires
a fairly new x86 CPU, and if you want to
fuzz something on ARM, say, like, your
Android driver, or if you want to use an
older CPU, then you're out of luck. And
what's worse, you cannot really use it for
sanitization, or at least not the kind of
sanitization that ASan does, because it
just traces the execution; it doesn't
allow you to do checks on memory accesses.
So Approach 3, which is what we will use,
is static rewriting. So, we had this very
nice framework for rewriting userspace
binaries, and then we asked ourselves, can
we make this work in the kernel? So we
took the system, the original RetroWrite,
we modified it, we implemented support for
Linux modules, and... it works! So we have
implemented it—we have used it to fuzz
some kernel modules, and it really shows
that this approach doesn't only work for
userspace; it can also be applied to the
kernel. So as for some implementation, the
nice thing about kernel modules is that
they're always position independent. So
you cannot have position—like, fixed-
position kernel modules because Linux just
doesn't allow it. So we sort of get that
for free, which is nice. And Linux modules
are also a special class of ELF files,
which means that the format is—even though
it's not the same as userspace binaries,
it's still somewhat similar, so we didn't
have to change the symbolizer that much,
which is also nice. And we implemented
symbolization with this, and we used it to
implement both code coverage and binary
ASan for kernel binary modules. So for
coverage: The idea behind the whole
RetroWrite project was that we wanted to
integrate with existing tools. So existing
fuzzing tools. We didn't want to force our
users to write their own fuzzer that is
compatible with RetroWrite. So for—in
userspace we had AFL-style coverage
tracking, and binary ASan which is
compatible with source-based ASan, and we
wanted to follow the same principle in the
kernel. So it turns out that Linux has
this built-in coverage-tracking framework
called kCov that is used by several
popular kernel fuzzers like syzkaller, and
we wanted to use it ourselves. So we
designed our coverage instrumentation so
that it integrates with kCov. The downside
is that you need to compile the kernel
with kCov, but then again, Linux is open
source, so you can sort of always do that;
it's usually not the kernel itself that is the binary blob; it's usually only the modules. So that's still fine. And the way you do this is—the
way you implement kCov for binary modules
is that you just have to find the start of
every basic block, and add a call to some
function that then stores the collected
coverage. So here's an example: we have a short snippet of code with three basic blocks, and all we have to do is add a call to "trace_pc" at the start of each basic block. "trace_pc" is a function that is part of the main kernel image that then collects this coverage and makes it available to a userspace fuzzing agent. So this is all really easy and it works.
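
For reference, a userspace fuzzing agent can read that coverage through the kCov debugfs interface; the sketch below is adapted from the example in the kernel's Documentation/dev-tools/kcov.rst, and exact constants and paths may vary between kernel versions:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
    #define KCOV_ENABLE     _IO('c', 100)
    #define KCOV_DISABLE    _IO('c', 101)
    #define COVER_SIZE      (64 << 10)      /* number of recorded PCs */
    #define KCOV_TRACE_PC   0

    int main(void)
    {
        int fd = open("/sys/kernel/debug/kcov", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }
        if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE)) { perror("init"); return 1; }

        unsigned long *cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (cover == MAP_FAILED) { perror("mmap"); return 1; }

        if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC)) { perror("enable"); return 1; }
        __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);   /* reset PC counter */

        /* Run the test case: syscalls issued by this thread are traced,
         * including code in an instrumented module reached through them. */
        read(-1, NULL, 0);

        unsigned long n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
        for (unsigned long i = 0; i < n; i++)
            printf("0x%lx\n", cover[i + 1]);                /* covered kernel PCs */

        ioctl(fd, KCOV_DISABLE, 0);
        close(fd);
        return 0;
    }
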
And let's now see how we implemented binary
ASan. So as I mentioned before, when we
instrument the program with binary ASan in
userspace we link with libASan, which
takes care of setting up the metadata,
takes care of putting the red zones around
our allocations, and so on. So we had to
do something similar in the kernel; of
course, you cannot link with libASan in
the kernel, because that doesn't work, but
what we can do instead is, again, compile
the kernel with kASan support. So this
instruments the allocator, kmalloc, to add
the red zones; it allocates space for the
metadata, it keeps this metadata around,
does this all for us, which is really
nice. And again, the big advantage of
using this approach is that we can
integrate seamlessly with a kASan-
instrumented kernel and with fuzzers that
rely on kASan such as syzkaller. So we see
this as more of a plus than, like, a
limitation. And how do you implement ASan?
Well, you have to find every memory access
and instrument it to check the—to check
whether this is accessing a red zone. And
if it does then you just call this bug
report function that produces a stack
trace, a bug report, and crashes the
kernel, so that the fuzzer can detect it.
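
As a rough sketch of what such a check does (not the exact code kRetroWrite emits; the shadow offset and the report routine below are placeholders that depend on the kernel's KASAN configuration):

    #include <stddef.h>
    #include <stdint.h>

    /* Each shadow byte describes 8 bytes of memory and holds 0 (all 8 valid),
     * a small positive n (only the first n bytes valid), or a negative marker
     * such as 0xfc for a red zone. */
    #define SHADOW_OFFSET 0xdffffc0000000000UL               /* placeholder shadow base */

    void report_bad_access(uintptr_t addr, size_t size, int is_write);  /* placeholder */

    static inline void check_access(uintptr_t addr, size_t size, int is_write)
    {
        int8_t shadow = *(int8_t *)((addr >> 3) + SHADOW_OFFSET);
        if (shadow != 0 &&
            (int8_t)((addr & 7) + size - 1) >= shadow)       /* also true for negative markers */
            report_bad_access(addr, size, is_write);         /* print report, crash the kernel */
    }
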
Again, this is compatible with source-
based kASan, so we're happy. We can simply
load the rewritten module with added
instrumentation into a kernel, as long as
you have compiled the kernel with the
right flags, and we can use a standard
kernel fuzzer. Here for the—our
evaluation, we used syzkaller, a popular
kernel fuzzer by some folks at Google, and
it worked really well. So we've finally
reached the end of our journey, and now we
wanted to present some experiments we did
to see if this really works. So for
userspace, we wanted to compare the
performance of our binary ASan with
source-based ASan and with existing
solutions that also work on binaries. So
for userspace, you can use valgrind
memcheck. It's a memory sanitizer that is based on dynamic binary translation and works on binaries.
We compared it with source ASan and
RetroWrite ASan on the SPEC CPU benchmark
and saw how fast it was. And for the
kernel we decided to fuzz some file
systems and some drivers with syzkaller
using both source-based kASan and kCov and kRetroWrite-based kASan and kCov. So these
are our results for userspace. So the red
bar is valgrind. We can see that the
execution time of valgrind is the highest.
It is really, really slow—like, 3, 10, 30x
overhead, way too slow for fuzzing. Then
in green, we have our binary ASan, which
is, like, already a large improvement. In
orange we have source-based ASan. And then
finally in blue we have the original code
without any instrumentation whatsoever. So
we can see that source-based ASan has,
like, 2x or 3x overhead, and binary ASan
is a bit higher, like, a bit less
efficient, but still somewhat close. So
that's for userspace, and for the kernel,
we—these are some preliminary results, so,
this is, like—I'm doing this work as part
of my master's thesis, and so I'm still,
like, running the evaluation. Here we can
see that the overhead is already, like, a
bit lower. So the reason for this is that
SPEC is a pure CPU benchmark; it doesn't
interact with the system that much. And so
any instrumentation that you add is going
to massively slow down, or, like,
considerably slow down the execution. By
contrast, when you fuzz a file system with
syzkaller, not only every test case has to
go from the host to the guest and
then do multiple syscalls and so on, but
also every system call has to go through
several layers of abstraction before it
gets to the actual file system. And all
these—like, all of this takes a lot of
time, and so in practice the overhead of
our instrumentation seems to be pretty
reasonable. So, since we know that you
like demos, we've prepared a small demo of
kRetroWrite. So. Let's see. Yep. Okay. All
right, so we've prepared a small kernel
module. And this module is just, like,
really simple; it contains a
vulnerability, and what it does is that it
creates a character device. So if you're
not familiar with this, a character device
is like a fake file that is exposed by a
kernel driver and that you can write to and read from. And instead of going to a
file, the data that you read—that you, in
this case, write to the fake file—goes to
the driver and is handled by this demo
write function. So as we can see, this function allocates a 16-byte buffer on the heap, and then copies some data into it, and then it checks if the data contains the string "1337". If it does, then it accesses the buffer out of bounds; you can see "alloc[16]" and the buffer is sixteen bytes, so this is an out-of-bounds read by one byte. And if it doesn't, then it just accesses the buffer in bounds, which is fine, and it's not a vulnerability.
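
A hypothetical reconstruction of such a module, with made-up names since the full source was not shown, might look roughly like this:

    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>
    #include <linux/slab.h>
    #include <linux/uaccess.h>

    static ssize_t demo_write(struct file *f, const char __user *buf,
                              size_t len, loff_t *off)
    {
        size_t n = len < 16 ? len : 16;
        char *alloc = kmalloc(16, GFP_KERNEL);      /* 16-byte heap buffer */

        if (!alloc)
            return -ENOMEM;
        if (copy_from_user(alloc, buf, n)) {
            kfree(alloc);
            return -EFAULT;
        }
        if (n >= 4 && !strncmp(alloc, "1337", 4))
            pr_info("demo: %d\n", alloc[16]);       /* out-of-bounds read: index 16
                                                       of a 16-byte allocation */
        else
            pr_info("demo: %d\n", alloc[0]);        /* in-bounds access, fine */
        kfree(alloc);
        return len;
    }

    static const struct file_operations demo_fops = {
        .owner = THIS_MODULE,
        .write = demo_write,
    };

    static struct miscdevice demo_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "demo",                            /* appears as /dev/demo */
        .fops  = &demo_fops,
    };

    module_misc_device(demo_dev);
    MODULE_LICENSE("GPL");
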
So we can compile this driver. OK, and then so we have
our module, and then we will instrument it
using kRetroWrite. So, instrument... Yes,
please. OK. Right. So kRetroWrite did some
processing, and it produced an
instrumented module with ASan or kASan and
a symbolized assembly file. We can
actually have a look at the symbolized
assembly file to see what it looks like.
Yes. Yes. OK. So, is this big enough?
Yeah... As you can see, so—we can actually
see here the ASan instrumentation. Ah,
shouldn't—yeah. So, we—this is the ASan
instrumentation. The original code loads
some data from this address. And as you
can see, the ASan instrumentation first
computes the actual address, and then does
some checking—basically, this is checking
some metadata that ASan stores to check if
the address is in a red zone or not, and
then if the check fails, it calls this ASan report function, which produces a
stack trace and crashes the kernel. So
this is fine. We can actually even look at
the disassembly of both modules, so...
object dump and then demo... Ah, nope. OK,
so on the left, we have the original
module without any instrumentation; on the
right, we have the module instrumented
with ASan. So as you can see, the original
module has "push r13" and then has this
memory load here; on the right in the
instrumented module, kRetroWrite inserted
the ASan instrumentation. So the original
load is still down here, but between that,
between the first instruction and this
instruction, we have—now have the kASan
instrumentation that does our check. So
this is all fine. Now we can actually test
it and see what it does. So we can—we will
boot a very simple, a very minimal Linux
system, and try to target the
vulnerability first with the non-
instrumented module and then with the
instrumented module. And we can—we will
see that in the—with the non-instrumented
module, the kernel will not crash, but
with the instrumented module it will crash
and produce a bug report. So. Let's see.
Yeah, this is a QEMU VM, I have no idea
why it's taking so long to boot. I'll
blame the demo gods for not being kind to
us. Yeah, I guess we just have to wait.
OK. So. All right, so we loaded the
module. We will see that it has created a
fake file character device in /dev/demo.
Yep. We can write to this file. Yep. So this
will—this accesses the array in bounds,
and so this is fine. Then what we can also
do is write "1337" to it so it will access
the array out of bounds. So this is the
non-instrumented module, so this will not
crash. It will just print some garbage
value. Okay, that's it. Now we can load
the instrumented module instead... and do
the same experiment again. All right. We
can see that /dev/demo is still here. So
the module still works. Let's try to write
"1234" into it. This, again, doesn't
crash. But when we try to write "1337",
this will produce a bug report.
applause
So this has quite a lot of information. We
can see, like, the—where the memory was
allocated, there's a stack trace for that;
it wasn't freed, so there's no stack trace
for the free. And we see that the cache
size of the memory, like, it was a 16-byte
allocation. We can see the shape of the
memory. We see that these two zeros mean that there are two 8-byte chunks of valid memory. And then these "fc fc fc" bytes are the red zones that I was talking
about before. All right, so that's it for
the demo. We will switch back to our
presentation now. So... hope you enjoyed
it.
gannimo: Cool. So after applying this to a
demo module, we also wanted to see what
happens if we apply this to a real file
system. After a couple of hours we
were—when we came back and checked on the
results, we saw a couple of issues popping
up, including a nice set of use-after-free
reads, a set of use-after-free writes, and
we checked the bug reports and we saw a
whole bunch of Linux kernel issues popping
up one after the other in this nondescript
module that we fuzzed. We're in the
process of reporting it. This will take
some time until it is fixed; that's why
you see the blurry lines. But as you see,
there's still quite a bit of opportunity
in the Linux kernel where you can apply
different forms of targeted fuzzing into
different modules, leverage these modules
on top of a kASan instrumented kernel and
then leveraging this as part of your
fuzzing toolchain to find interesting
kernel 0days that... yeah. You can then
develop further, or report, or do whatever
you want with them. Now, we've shown you
how you can take existing binary-only
modules, think different binary-only
drivers, or even existing modules where
you don't want to instrument a full set of
the Linux kernel, but only focus fuzzing
and exploration on a small different—small
limited piece of code and then do security
tests on those. We've shown you how we can
do coverage-based tracking and address
sanitization. But this is also up to you
on what kind of other instrumentation you
want. Like this is just a tool, a
framework that allows you to do arbitrary
forms of instrumentation. So we've taken
you on a journey from instrumenting
binaries over coverage-guided fuzzing and
sanitization to instrumenting modules in
the kernel and then finding crashes in the
kernel. Let me wrap up the talk. So, this
is one of the fun pieces of work that we do in the HexHive lab at EPFL. So if
you're looking for postdoc opportunities
or if you're thinking about a PhD, come
talk to us. We're always hiring. The tools
will be released as open source. A large
chunk of the userspace work is already
open source. We're working on a set of
additional demos and so on so that you can
get started faster, leveraging the
different existing instrumentation that is
already out there. The userspace work is
already available. The kernel work will be
available in a couple of weeks. This
allows you to instrument real-world
binaries for fuzzing, leveraging existing
transformations for coverage tracking to
enable fast and effective fuzzing and
memory checking to detect the actual bugs
that exist there. The key takeaway from
this talk is that RetroWrite and
kRetroWrite enable static binary rewriting at zero instrumentation cost. We take the limitation of focusing only on position-independent code, which is not a real limitation in practice, but we get the
advantage of being able to symbolize
without actually relying on heuristics, so
we can even symbolize large, complex
source—large, complex applications and
effectively rewrite those aspects and then
you can focus fuzzing on these parts.
Another point I want to mention is that
this enables you to reuse existing tooling
so you can take a binary blob, instrument
it, and then reuse, for example, Address
Sanitizer or existing fuzzing tools, as it
integrates really, really nicely. As I said,
all the code is open source. Check it out.
Try it. Let us know if it breaks. We're
happy to fix. We are committed to open
source. And let us know if there are any
questions. Thank you.
applause
Herald: So, thanks, guys, for an
interesting talk. We have some time for
questions, so we have microphones along
the aisles. We'll start with a question from
microphone number two.
Q: Hi. Thanks for your talk and for the
demo. I'm not sure about the use-case you
showed for the kernel RetroWrite. 'Cause
you're usually interested in fuzzing
binaries in kernel space when you don't have source code for the kernel. For example, for IoT or Android and so on. But you just reuse the kCov and kASan in the kernel, and you never have a kernel in IoT or Android that is compiled with that. So
are you—do you have any plans to binary
instrument the kernel itself, not the
modules?
Nspace: So we thought about that. I think
that there's some additional problems that
we would have to solve in order to be able
to instrument the full kernel. So other
than the fact that it gives us
compatibility with, like, existing tools,
the reason why we decided to go with
compiling the kernel with kASan and kCov
is that, like, you have to think about it: you have to instrument the memory allocator to
add red zones, which is, like, already
somewhat complex. You have to instrument
the exception handlers to catch, like, any
faults that the instrumentation detects.
You would have to, like, set up some
memory for the ASan shadow. So this is,
like—I think you should be able to do it,
but it would require a lot of additional
work. So this was, like, a four-month thesis. So we decided to start
small and prove that it works in
the kernel for modules, and then leave it
to future work to actually extend it to
the full kernel. Also, like, I think for
Android—so in the case of Linux, the
kernel is GPL, right, so if the
manufacturer ships a custom kernel, they
have to release the source code, right?
Q: They never do.
Nspace: They never—well, that's a
different issue. Right?
gannimo: Right.
Q: So that's why I ask, because I don't
see how it just can be used in the real
world.
gannimo: Well, let me try to put this into
perspective a little bit as well. Right.
So there's the—what we did so far is we
leveraged existing tools, like kASan or
kCov, and integrated into these existing
tools. Now, doing heap-based allocation is
fairly simple and replacing those with
additional red zones—that instrumentation
you can carry out fairly well by focusing
on the different allocators. Second to
that, simply oopsing the kernel and
printing the stack trace is also fairly
straightforward. So it's not a lot of
additional effort. So it is—it involves
some engineering effort to port this to
non-kASan-compiled kernels. But we think
it is very feasible. In the interest of
time, we focused on kASan-enabled kernels,
so that some form of ASan is already
enabled. But yeah, this is additional
engineering effort. But there is also a
community out there that can help us with
these kind of changes. So kRetroWrite and
RetroWrite themselves are the binary
rewriting platform that allow you to turn
a binary into an assembly file that you
can then instrument and run different
passes on top of it. So another pass would
be a full ASan pass or kASan pass that
somebody could add and then contribute
back to the community.
Q: Yeah, it would be really useful.
Thanks.
gannimo: Cool.
Angel: Next question from the Internet.
Q: Yes, there is a question regarding the
slide on the SPEC CPU benchmark. The
second or third graph from the right had
an instrumented version that was faster
than the original program. Why is that?
gannimo: Cache effect. Thank you.
Angel: Microphone number one.
Q: Thank you. Thank you for the presentation. I have a question: how many architectures do you support, and do you plan to support more?
gannimo: x86_64.
Q: Okay. So no plans for ARM or MIPS,
or...?
gannimo: Oh, there are plans.
Q: Okay.
Nspace: Right, so—
gannimo: Right. Again, there's a finite
amount of time. We focused on the
technology. ARM is high up on the list. If
somebody is interested in working on it
and contributing, we're happy to hear from
it. Our list of targets is ARM first and
then maybe something else. But I think
with x86_64 and ARM we've covered a
majority of the interesting platforms.
Q: And second question, did you try to
fuzz any real closed-source program?
Because as I understand from the presentation, you fuzz, like, just file systems, which we can compile and fuzz with syzkaller like in the past.
Nspace: So for the evaluation, we wanted
to be able to compare between the source-
based instrumentation and the binary-based
instrumentation, so we focused mostly on
open-source filesystems and drivers because
then we could instrument them with a
compiler. We haven't yet tried, but this
is, like, also pretty high up on the list.
We wanted to try to find some closed-
source drivers—there's lots of them, like
for GPUs or anything—and we'll give it a
try and find some 0days, perhaps.
Q: Yes, but with syzkaller, you still have
a problem. You have to write rules, like,
dictionaries. I mean, you have to
understand the format, have to communicate
with the driver.
Nspace: Yeah, right. But there's, for
example, closed-source file systems that
we are looking at.
Q: Okay. Thinking.
Herald: Number two.
Q: Hi. Thank you for your talk. So I don't
know if there are any kCov- or kASan-
equivalent solutions for Windows, but I was wondering if you tried, or are you planning to port the framework to Windows? Because I know it might be
challenging because of the driver
signature enforcement and PatchGuard, but
I wondered if you tried or thought about
it.
gannimo: Yes, we thought about it and we
decided against it. Windows is incredibly
hard and we are academics. The research I
do in my lab, or we do in my research lab,
focuses on predominantly open-source
software and empowers open-source
software. Doing full support for Microsoft
Windows is somewhat out of scope. If
somebody wants to port these tools, we are
happy to hear it and work with these
people. But it's a lot of additional
engineering effort versus very low additional research value, so we'll have to find some form of
compromise. And, like, if you would be
willing to fund us, we would go ahead. But
it's—yeah, it's a cost question.
Q: And you're referring both to kernel and
user space, right?
gannimo: Yeah.
Q: Okay. Thank you.
Herald: Number five.
Q: Hi, thanks for the talk. This seems
most interesting if you're looking for
vulnerabilities in closed source kernel
modules, but not giving it too much
thought, it seems it's really trivial to
prevent this if you're writing a closed
source module.
gannimo: Well, how would you prevent this?
Q: Well, for starters, you would just take
a difference between the address of two
functions. That's not gonna be IP
relative, so...
Nspace: Right. So we explicitly—like, even
in the original RetroWrite paper—we
explicitly decided to not try to deal with
obfuscated code, or code that is
purposefully trying to defeat this kind of
rewriting. Because, like, the assumption
is that first of all, there are techniques
to, like, deobfuscate code or remove
these, like, checks in some way, but this
is, like, sort of orthogonal work. And at
the same time, I guess most drivers are
not really compiled with the sort of
obfuscation; they're just, like, you know,
they're compiled with a regular compiler.
But yeah, of course, this is, like, a
limitation.
gannimo: They're likely stripped, but not
necessarily obfuscated. At least from what
we've seen when we looked at binary-only
drivers.
Herald: Microphone number two.
Q: How do you decide where to place the
red zones? From what I heard, you talked
about instrumenting the allocators, but,
well, there are a lot of variables on the
stack, so how do you deal with those?
gannimo: Oh, yeah, that's actually super
cool. I refer to some extent to the paper
that is on the GitHub repo as well. If you
think about it, modern compilers use
canaries for buffers. Are you aware of
stack canaries—how stack canaries work?
So, stack canaries—like, if the compiler
sees there's a buffer that may be
overflown, it places a stack canary
between the buffer and any other data.
What we use is we—as part of our analysis
tool, we find these stack canaries, remove
the code that does the stack canary, and
use this space to place our red zones. So
we actually hack the stack canaries, remove that code, and add ASan red zones into the now-empty stack canary slots. It's actually a super cool
optimization because we piggyback on what
kind of work the compiler already did for
us before, and we can then leverage that
to gain additional benefits and protect
the stack as well.
Q: Thanks.
Angel: Another question from the Internet.
Q: Yes. Did you consider lifting the
binary code to LLVM IR instead of
generating assembler source?
gannimo: Yes. laughter But, so—a little
bit longer answer. Yes, we did consider
that. Yes, it would be super nice to lift
to LLVM IR. We've actually looked into
this. It's incredibly hard. It's
incredibly complex. There's no direct
mapping between the machine code
equivalent and the LLVM IR. You would
still need to recover all the types. So
it's like this magic dream that you
recover full LLVM IR, then do heavyweight
transformations on top of it. But this is
incredibly hard because if you compile
down from LLVM IR to machine code, you
lose a massive amount of information. You
would have to find a way to recover all of
that information, which is pretty much
impossible and undecidable for many cases.
So for example, just as a note, we only
recover control flow and we only
symbolize control flow. For data
references—we don't support
instrumentation of data references yet
because there's still an undecidable
problem that we are faced with. I can
talk more about this offline, or there is
a note in the paper as well. So this is
just a small problem. Only if you're
lifting to assembly files. If you're
lifting to LLVM IR, you would have to do
full end-to-end type recovery, which is
massively more complicated. Yes, it would
be super nice. Unfortunately, it is
undecidable and really, really hard. So
you can come up with some heuristics, but
there is no solution that will do this
in—that will be correct 100 percent of the
time.
Angel: We'll take one more question from
microphone number six.
Q: Thank you for your talk. What kind of
disassemblers did you use for RetroWrite,
and did you have problems with the wrong
disassembly? And if so, how did you handle
it?
Nspace: So, RetroWrite—so we used
Capstone for the disassembly.
gannimo: An amazing tool, by the way.
Nspace: Yeah. So the idea is that, like,
we need some kind of—some information
about where the functions are. So for the
kernel modules, this is actually fine
because kernel modules come with this sort
of information because the kernel needs
it, to build stack traces, for example.
For userspace binaries, this is somewhat
less common, but you can use another tool
to try to do function identification. And
we do, like—sort of, like, disassemble the
entire function. So we have run into some
issues with, like—in AT&T syntax, because
like we wanted to use gas, GNU's
assembler, for, for...
gannimo: Reassembling.
Nspace: Reassembly, yeah. And some
instructions are—you can express two different instructions, like a five-byte NOP and a six-byte NOP, using the same, like, text: a mnemonic and an operand string. But the problem is
that, like, the kernel doesn't like it and
crashes. This took me like two days to
debug.
gannimo: So the kernel uses dynamic binary
patching when it runs, at runtime, and it
uses fixed offsets, so if you replace a
five-byte NOP with a six-byte NOP or vice
versa, your offsets change and your kernel
just blows up in your face.
Q: So it was kind of a case-by-case basis
where you saw the errors coming out of the
disassembly and you had to fix it?
Nspace: So sorry, can you repeat the
question?
Q: Like, for example, if you—if some
instruction is not supported by the
disassembler, so you saw that it crashed,
that there's something wrong, and then you
fix it by hand?
Nspace: Yeah, well, if we saw that there
was a problem with it, this—like, I don't
recall having any unknown instructions in
the disassembler. I don't think I've ever
had a problem with that. But yeah, this
was a lot of, like, you know, engineering
work.
gannimo: So let me repeat. The problem was
not a bug in the disassembler, but an
issue with the instruction format—that the
same mnemonic can be translated into two
different instructions, one of which was
five bytes long, the other one was six
bytes long. Both used the exact same
mnemonic. Right, so this was an issue with
assembly encoding.
Q: But you had no problems with
unsupported instructions which couldn't be
disassembled?
Nspace: No, no. Not as far as I know, at
least.
Angel: We have one more minute, so a very
short question from microphone number two.
Q: Does it work? Ah. Is your binary
instrumentation equally powerful as kernel
address space... I mean, kASan? So, does
it detect all the memory corruptions on
stack, heap and globals?
gannimo: No globals. But heap—it does all
of them on the heap. There's some slight
variation on the stack because we have to
piggyback on the canary stuff. As I
mentioned quickly before, there is no
reflowing and full recovery of data
layouts. So to get anything on the stack,
we have to piggyback on existing compiler
extensions like stack canaries. But—so we
don't support intra-object overflows on
the stack. But we do leverage the stack canaries to get some stack benefits, which
is, I don't know, 90, 95 percent there
because the stack canaries are pretty
good. For heap, we get the same precision.
For globals, we have very limited support.
Q: Thanks.
Angel: So that's all the time we have for
this talk. You can find the speakers, I
think, afterwards offline. Please give
them a big round of applause for an
interesting talk.
applause
36c3 postroll music
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!