-
36c3 preroll music
-
Herald: So, the next talk for this
afternoon is about high speed binary
-
fuzzing. We have two researchers that will
be presenting the product of their latest
-
work, which is a framework for static
binary rewriting. Our speakers are—the
-
first one is a computer science master's
student at EPFL and the second one is a
-
security researcher and assistant
professor at EPFL. Please give a big round
-
of applause to Nspace and gannimo.
-
Applause
-
gannimo (Mathias Payer): Thanks for the
introduction. It's a pleasure to be here,
-
as always. We're going to talk about
different ways to speed up your fuzzing
-
and to find different kinds of
vulnerabilities or to tweak your binaries
-
in somewhat unintended ways. I'm Mathias
Payer or I go by gannimo on Twitter and I
-
am an assistant professor at EPFL working
on different forms of software security:
-
fuzzing, sanitization, but also different
kinds of mitigations. And Matteo over
-
there is working on his master's thesis on
different forms of binary rewriting for
-
the kernel. And today we're going to take
you on a journey on how to actually
-
develop very fast and very efficient
binary rewriting mechanisms that allow you
-
to do unintended modifications to the
binaries and allow you to explore
-
different kinds of unintended features in
binaries. So about this talk. What we
-
discovered or the reason why we set out on
this journey was that fuzzing binaries is
-
really, really hard. There's very few
tools in user space. There's—it's
-
extremely hard to set it up and it's
extremely hard to set it up in a
-
performant way. The setup is complex. You
have to compile different tools. You have
-
to modify it. And the results are not
really that satisfactory. As soon as you
-
move to the kernel, fuzzing binaries in a
kernel is even harder. There's no tooling
-
whatsoever, there's very few users
actually working with binary code in the
-
kernel or modifying binary code, and it's
just a nightmare to work with. So what we
-
are presenting today is a new approach
that allows you to instrument any form of
-
binary code or modern binary code based on
static rewriting, which gives you full
-
native performance. You only pay for the
instrumentation that you add, and you can
-
do very heavyweight transformations on top
of it. The picture, if you look at the
-
modern system, let's say we are looking at
a modern setup. Let's say you're looking
-
at cat pictures in your browser: Chrome
plus the kernel plus the libc plus the
-
graphical user interface together clock in
at about 100 million lines of code.
-
Instrumenting all of this for some form of
security analysis is a nightmare,
-
especially along this large stack of
software. There's quite a bit of different
-
compilers involved. There's different
linkers. It may be compiled on a different
-
system, with different settings and so on.
And then getting your instrumentation
-
across all of this is pretty much
impossible and extremely hard to work
-
with. And we want to enable you to select
those different parts that you're actually
-
interested in. Modify those and then focus
your fuzzing or analysis approaches on
-
those small subsets of the code, giving
you a much better and stronger capability
-
to test the systems that you're, or those
parts of the system that you're really,
-
really interested in. Who's worked on
fuzzing before? Quick show of hands. Wow,
-
that's a bunch of you. Do you use AFL?
Yeah, most of you, AFL. Libfuzzer? Cool,
-
about 10, 15 percent libfuzzer, 30 percent
fuzzing, and AFL. There's quite good
-
knowledge of fuzzing, so I'm not going to
spend too much time on fuzzing, but for
-
those that haven't really run their
fuzzing campaigns yet, it's a very simple
-
software testing technique. You're
effectively taking a binary, let's say
-
Chrome, as a target and you're running
this in some form of execution
-
environment. And fuzzing then consists of
some form of input generation that creates
-
new test cases, throws them at your
program and sees—and checks what is
-
happening with your program. And either
everything is OK, and your code is being
-
executed, and your input—the program
terminates, everything is fine, or you
-
have a bug report. If you have a bug
report, you can use this. Find the
-
vulnerability, maybe develop a PoC and
then come up with some form of either
-
exploit or patch or anything else. Right.
So this is pretty much fuzzing in a
-
nutshell. How do you get fuzzing to be
effective? How can you cover large source
-
bases, complex code, and complex
environment? Well, there's a couple of
-
simple steps that you can take. And let's
walk quickly through effective fuzzing
-
101. Well, first, you want to be able to
create test cases that actually trigger
-
bugs. And this is a very, very
complicated, complicated part. And we need
-
to have some notion of the inputs that a
program accepts. And we need to have some
-
notion of how we can explore different
parts of the program, right? Different
-
parts of functionality. Well, on one hand,
we could have a developer write all the
-
test cases by hand, but this would be kind
of boring. It would also require a lot of
-
human effort in creating these different
inputs and so on. So coverage guided
-
fuzzing has evolved as a very simple way
to guide the fuzzing process, leveraging
-
the information on which parts of the code
have been executed by simply tracing the
-
individual path through the program based
on the execution flow. So we can—the
-
fuzzer can use this feedback to then
modify the inputs that are being thrown at
-
the fuzzing process. The second step is
the fuzzer must be able to detect bugs. If
-
you've ever looked at a memory corruption,
if you're just writing one byte after the
-
end of a buffer, it's highly likely that
your software is not going to crash. But
-
it's still a bug, and it may still be
exploitable based on the underlying
-
conditions. So we want to be able to
detect violations as soon as they happen,
-
for example, based on some form of
sanitization that we add, some form of
-
instrumentation that we add to the to the
binary, that then tells us, hey, there's a
-
violation of the memory safety property,
and we terminate the application right
-
away as a feedback to the fuzzer. Third,
but the—and last but not least: Speed is
-
key, right? For if you're running a
fuzzing campaign, you have a fixed
-
resource budget. You have a couple of
cores, and you want to run for 24 hours,
-
48 hours, a couple of days. But in any
way, whatever your constraints are, you
-
have a fixed amount of instructions that
you can actually execute. And you have to
-
decide, am I spending my instructions on
generating new inputs, tracking
-
constraints, finding bugs, running
sanitization or executing the program? And
-
you need to find a balance between all of
them, as it is a zero sum game. You have a
-
fixed amount of resources and you're
trying to make the best with these
-
resources. So any overhead is slowing you
down. And again, this becomes an
-
optimization problem. How can you most
effectively use the resources that you
-
have available? As we are fuzzing with
source code, it's quite easy to actually
-
leverage existing mechanisms, and we add
all that instrumentation at compile time.
-
We take source code, we pipe it through
the compiler and modern compiler
-
platforms, allow you to instrument and add
little code snippets during the
-
compilation process that then carry out
all these tasks that are useful for
-
fuzzing. For example, modern compilers can
add short snippets of code for coverage
-
tracking that will record which parts of
the code that you have executed, or for
-
sanitization, which records and checks every
single memory access if it is safe or not.
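To make that concrete, here is a minimal sketch of what such compiler-inserted
coverage tracking boils down to (the hook, the coverage map, and the magic-byte
check below are toy stand-ins for illustration, not what a real compiler pass
such as LLVM's SanitizerCoverage actually emits):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint8_t coverage_map[256];            /* toy coverage map */

    static void trace_block(uint8_t block_id) {  /* "this basic block ran" */
        coverage_map[block_id] = 1;
    }

    /* what a simple magic-byte check looks like after instrumentation */
    static int parse(const char *buf) {
        trace_block(0);                          /* function entry */
        if (strncmp(buf, "PNG", 3) == 0) {
            trace_block(1);                      /* "magic matched" block */
            return 1;
        }
        trace_block(2);                          /* "bailed out" block */
        return 0;
    }

    int main(void) {
        parse("PNG data");
        for (int i = 0; i < 3; i++)
            printf("block %d: %s\n", i, coverage_map[i] ? "hit" : "not hit");
        return 0;
    }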
-
And then when you're running the
instrumented binary, everything is fine
-
and you can detect the policy violations
as you go along. Now if you would have
-
source code for everything, this would be
amazing. But it's often not the case,
-
right? We may be able on Linux to cover a
large part of the protocol stack by
-
focusing only on source-code-based
approaches. But there may be applications
-
where no source code is available. If we
move to Android or other mobile systems,
-
there's many drivers that are not
available as open source or just available
-
as binary blobs, or the full software
stack may be closed-source and we only get
-
the binaries. And we still want to find
vulnerabilities in these complex software
-
stacks that span hundreds of millions of
lines of code in a very efficient way. The
-
only solution to cover this part of
massive code base is to actually rewrite
-
and focus on binaries. A very simple
approach could be black box fuzzing, but
-
this is—this doesn't really get you
anywhere because you don't get any
-
feedback; you don't get any information if
you're triggering bugs. So one simple
-
approach, and this is the approach that is
most dominantly used today, is to rewrite
-
the program or the binary dynamically. So
you're taking the binary and during
-
execution you use some form of dynamic
binary instrumentation based on Pin, angr,
-
or some other binary rewriting tool and
translate the target at runtime, adding
-
this binary instrumentation on top of it
as you're executing it. It's simple, it's
-
straightforward, but it comes at a
terrible performance cost of ten to a
-
hundred x slow down, which is not really
effective. And you're spending all your
-
cores and your cycles on just executing
the binary instrumentation. So we don't
-
really want to do this and we want to have
something that's more effective than that.
-
So what we are focusing on is to do static
rewriting. It involves a much more complex
-
analysis as we are rewriting the binary
before it is being executed, and we have
-
to recover all of the control flow, all of
the different mechanisms, but it results
-
in a much better performance. And we can
get more bang for our buck. So why is
-
static rewriting so challenging? Well,
first, simply adding code will break the
-
target. So if you are disassembling this
piece of code here, which is a simple loop
-
that loads data, decrements the registers,
and then jumps if you're not at the end of
-
the array and keeps iterating through this
array. Now, as you look at the jump-not-
-
zero instruction, the last instruction of
the snippet, it is a relative offset. So
-
it jumps backward seven bytes. Which is
nice if you just execute the code as is.
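A minimal sketch of how such a relative branch resolves, and why inserting
bytes in front of it silently changes where it lands (the addresses and sizes
below are made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* x86 relative branches land at: address of next instruction + displacement */
    static uint64_t branch_target(uint64_t insn_addr, unsigned insn_len,
                                  int32_t disp) {
        return insn_addr + insn_len + disp;
    }

    int main(void) {
        uint64_t jnz_at = 0x1007;  /* a 2-byte jnz with an 8-bit displacement of -7 */

        /* originally the branch jumps back to the loop head at 0x1002 */
        printf("before rewriting:        0x%llx\n",
               (unsigned long long)branch_target(jnz_at, 2, -7));

        /* insert 5 bytes of instrumentation between the loop head and the jnz:
         * the jnz moves down by 5 bytes, but its displacement is still -7,
         * so it now lands in the middle of the inserted code instead */
        printf("after inserting 5 bytes: 0x%llx\n",
               (unsigned long long)branch_target(jnz_at + 5, 2, -7));
        return 0;
    }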
-
But as soon as you want to insert new
code, you change the offsets in the
-
program, and you're modifying all these
different offsets. And simply adding new
-
code somewhere in between will break the
target. So a core feature that we need to
-
enforce, or core property that we need to
enforce, is that we must find all the
-
references and properly adjust them, both
relative offsets and absolute offsets as
-
well. Getting a single one wrong will
break everything. What makes this problem
-
really, really hard is that if you're
looking at the binary, a byte is a byte,
-
right? There's no way for us to
distinguish between scalars and
-
references, and in fact they are
indistinguishable. Getting a single
-
reference wrong breaks the target and
would introduce arbitrary crashes. So we
-
have to come up with ways that allow us to
distinguish between the two. So for
-
example, if you have this code here, it
takes a value and stores it somewhere on
-
the stack. This could come from two
different kind of high-level constructs.
-
On one hand, it could be taking the
address of a function and storing this
-
function address somewhere and in a stack
variable. Or it could be just storing a
-
scalar in a stack variable. And these two
are indistinguishable, and rewriting them,
-
as soon as we add new code, the offsets
will change. If it is a function, we would
-
have to modify the value; if it is a
scalar, we have to keep the same value. So
-
how can we come up with a way that allows
us to distinguish between the two and
-
rewrite binaries by recovering this
missing information? So let us take—let me
-
take you or let us take you on a journey
towards instrumenting binaries in the
-
kernel. This is what we aim for. We'll
start with the simple case of
-
instrumenting binaries in user land, talk
about different kinds of coverage guided
-
fuzzing and what kind of instrumentation
we can add, what kind of sanitization we
-
can add, and then focusing on taking it
all together and applying it to kernel
-
binaries to see what will fall out of
it. Let's start with instrumenting
-
binaries first. I will now talk a little
bit about RetroWrite, our mechanism and
-
our tool that enables static binary
instrumentation by symbolizing existing
-
binaries. So we recover the information
and we translate relative offsets and
-
absolute offsets into actual labels that
are added to the assembly file. The
-
instrumentation can then work on the
recovered assembly file, which can then be
-
reassembled into a binary that can then be
executed for fuzzing. We implement
-
coverage tracking and binary address
sanitizer on top of this, leveraging
-
abstraction as we go forward. The key to
enabling this kind of binary rewriting is
-
position-independent code. And position-
independent code has become the de-facto
-
standard for any code that is being
executed on a modern system. And it
-
effectively says that it is code that can
be loaded at any arbitrary address in your
-
address space as you are executing
binaries. It is essential and a
-
requirement if you want to have address
space layout randomization or if you want
-
to use shared libraries, which de facto
you want to use in all these different
-
systems. So since a couple of years, all
the code that you're executing on your
-
phones, on your desktops, on your laptops
is position-independent code. And the idea
-
behind position-independent code is
that you can load it anywhere in your
-
address space and you can therefore not
use any hard-coded static addresses and
-
you have to inform the system of
relocations or pick relative
-
addresses—to—on how the system can
relocate these different mechanisms. On
-
x86_64, position-independent code
leverages addressing that is relative to
-
the instruction pointer. So for example,
it uses the current instruction pointer
-
and then a relative offset to that
instruction pointer to reference global
-
variables, other functions and so on. And
this is a very easy way for us to
-
distinguish references from constants,
especially in PIE binaries. If it is RIP-
-
relative, it is a reference; everything
else is a constant. And we can build our
-
translation algorithm and our translation
mechanism based on this fundamental
-
finding to remove any form of heuristic
that is needed by focusing especially on
-
position-independent code. So we're
supporting position-independent code; we
-
are—we don't support non-position-
independent code, but we give you the
-
guarantee that we can rewrite all the
different code that is out there. So
-
symbolization works as follows: If you
have the little bit of code on the lower
-
right, symbolization replaces first all
the references with assembler labels. So
-
look at the call instruction and the jump-
not-zero instruction; the call instruction
-
references an absolute address and the
jump-not-zero instruction jumps backward
-
relative 15 bytes. So by focusing on these
relative jumps and calls, we can replace
-
them with actual labels and rewrite the
binary as follows: so we're calling a
-
function, replacing it with the actual
label, and for the jump-not-zero we are
-
inserting an actual label in the assembly
code and adding a backward reference. For
-
PC-relative addresses, for example the
data load, we can then replace it with the
-
name of the actual data that we have
recovered, and we can then add all the
-
different relocations and use that as
auxiliary information on top of it. After
-
these three steps, we can insert any new
code in between, and can therefore add
-
different forms of instrumentations or run
some more higher-level analysis on top of
-
it, and then reassemble the file for
fuzzing or coverage-guided tracking,
-
address sanitization or whatever else you
want to do. I will now hand over to
-
Matteo, who will cover coverage-guided
fuzzing and sanitization and then
-
instrumenting the binaries in the kernel.
Go ahead.
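To make the symbolization step a bit more concrete, here is a rough sketch of
the core idea using the Capstone disassembler's C API (RetroWrite itself is
implemented in Python and handles far more cases; the label naming and the few
instruction kinds covered below are simplifying assumptions):

    #include <capstone/capstone.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* Print a symbolized form of direct branches and RIP-relative operands:
     * both are references, so they become labels instead of raw offsets. */
    static void symbolize(const uint8_t *code, size_t size, uint64_t base) {
        csh handle;
        cs_insn *insn;

        if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK)
            return;
        cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);  /* we need operand details */

        size_t count = cs_disasm(handle, code, size, base, 0, &insn);
        for (size_t i = 0; i < count; i++) {
            cs_x86 *x86 = &insn[i].detail->x86;
            for (int o = 0; o < x86->op_count; o++) {
                cs_x86_op *op = &x86->operands[o];
                if (op->type == X86_OP_MEM && op->mem.base == X86_REG_RIP) {
                    /* RIP-relative operand: must be a reference, emit a label */
                    uint64_t target = insn[i].address + insn[i].size
                                      + op->mem.disp;
                    printf("%s -> data label .L%" PRIx64 "(%%rip)\n",
                           insn[i].mnemonic, target);
                } else if (op->type == X86_OP_IMM &&
                           (insn[i].id == X86_INS_CALL ||
                            insn[i].id == X86_INS_JMP ||
                            insn[i].id == X86_INS_JNE)) {
                    /* direct call/jump: Capstone already resolved the absolute
                     * target, so replace it with a label named after it */
                    printf("%s -> code label .L%" PRIx64 "\n",
                           insn[i].mnemonic, (uint64_t)op->imm);
                }
            }
        }
        if (count)
            cs_free(insn, count);
        cs_close(&handle);
    }

    int main(void) {
        /* call rel32; jne rel8 back to the first instruction; mov rax,[rip+0x10] */
        const uint8_t code[] = { 0xe8, 0x20, 0x00, 0x00, 0x00,
                                 0x75, 0xf9,
                                 0x48, 0x8b, 0x05, 0x10, 0x00, 0x00, 0x00 };
        symbolize(code, sizeof(code), 0x1000);
        return 0;
    }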
-
Nspace (Matteo Rizzo): So, now that we
have this really nice framework to rewrite
-
binaries, one of the things that we want
to add to actually get the fuzzing is this
-
coverage-tracking instrumentation. So
coverage-guided fuzzing is a way, a
-
method, for—to let the fuzzer discover
interesting inputs and interesting paths through
-
the target by itself. So the basic idea is
that the fuzzer will track coverage—the
-
parts of the programs that are covered by
different inputs by inserting some kind of
-
instrumentation. So, for example, here we
have this target program that checks if
-
the input contains the string "PNG" at the
beginning, and if it does, then it does
-
something interesting, otherwise it just
bails out and fails. So if we track the
-
part of the programs that each input
executes, the fuzzer can figure out that
-
an input that contains "P" will have
discovered a different path through the
-
program than an input that doesn't contain
it. And then so on it can, one byte at a
-
time, discover that this program expects
this magic sequence "PNG" at the start of
-
the input. So the way that the fuzzer does
this is that every time a new input
-
discovers a new path through the target, it
is considered interesting and added to a
-
corpus of interesting inputs. And every
time the fuzzer needs to generate a new
-
input, it will select something from the
corpus, mutate it randomly, and then use
-
it as the new input. So this is like
a—this is, like, conceptually pretty
-
simple, but in practice it works really
well and it really lets the fuzzer
-
discover the format that the target
expects in an unsupervised way. So as an
-
example, this is an experiment that was
run by the author of AFL—AFL is the fuzzer
-
that sort of popularized this
technique—where he was fuzzing a JPEG-
-
parsing library, starting from a corpus
that only contained the string "hello". So
-
now clearly "hello" is not a valid JPEG
image and so—but still, like, the fuzzer
-
was still able to find—to discover the
correct format. So after a while it
-
started generating some grayscale images,
on the top left, and as it generated more
-
and more inputs, it started generating
more interesting images, such as some
-
grayscale gradients, and later on even
some color images. So as you can see, this
-
really works, and it allows us to fuzz a
program without really teaching the fuzzer
-
what the input should look like. So that's
it for coverage-guided fuzzing. Now we'll
-
talk a bit about sanitizations. As a
reminder, the core idea behind
-
sanitization is that just looking for
crashes is likely to miss some of the
-
bugs. So, for example, if you have this
out-of-bounds one-byte read, that will
-
probably not crash the target, but you
would still like to catch it because it
-
could be used for an info leak, for
example. So one of the most popular
-
sanitizers is Address Sanitizer. So
Address Sanitizer will instrument all the
-
memory accesses in your program and check
for memory corruption, which—so, memory
-
corruption is a pretty dangerous class of
bugs that unfortunately still plagues C
-
and C++ programs and unsafe languages in
general. And ASan tries to catch it by
-
instrumenting the target. It is very
popular; it has been used to find
-
thousands of bugs in complex software like
Chrome and Linux, and even though it has,
-
like, a bit of a slowdown—like about 2x—it
is still really popular because it lets
-
you find many, many more bugs. So how does
it work? The basic idea is that ASan will
-
insert some special regions of memory
called 'red zones' around every object in
-
memory. So we have a small example here
where we declare a 4-byte array on the
-
stack. So ASan will allocate the array
"buf" and then add a red zone before it
-
and a red zone after it. Whenever the
program accesses the red zones, it is
-
terminated with a security violation. So
the instrumentation just prints a bug
-
report and then crashes the target. This
is very useful for detecting, for example,
-
buffer overflows or underflows and many
other kinds of bugs such as use-after-free
-
and so on. So, as an example here, we are
trying to copy 5 bytes into a 4-byte
-
buffer, and ASan will check each of the
accesses one by one. And when it sees that
-
the last byte writes to a red zone, it
detects the violation and crashes the
-
program. So this is good for us because
this bug might have not been found by
-
simply looking for crashes, but it's
definitely found if we use ASan. So this
-
is something we want for fuzzing. So now
that we've covered—briefly covered ASan we
-
can talk about instrumenting binaries in
the kernel. So Mathias left us with
-
RetroWrite, and with RetroWrite we can add
both coverage tracking and ASan to
-
binaries. So the simple—it's a really
simple idea: now that we can rewrite this
-
binary and add instructions wherever we
want, we can implement both coverage
-
tracking and ASan. In order to implement
coverage tracking, we simply have to
-
identify the start of every basic block
and add a little piece of instrumentation
-
at the start of the basic block that tells
the fuzzer 'hey, we've reached this part
-
of the program'—'hey, we've reached this
other part of the program'. Then the
-
fuzzer can figure out whether that's a new
part or not. ASan is also, like, you know,
-
it's also somewhat—it can also be
implemented in this way by finding all
-
memory accesses, and then linking with
libASan. libASan is a sort of runtime for
-
ASan that takes care of inserting the red
zones and instrument—and adding, you know,
-
like, keeping around all the metadata that
ASan needs to know where the red zones
-
are, and detecting whether a memory access
is invalid. So, how can we apply all of
-
this in the kernel? Well, first of all,
fuzzing the kernel is not as easy as
-
fuzzing some userspace program. There's
some issues here. So first of all, there's
-
crash handling. So whenever you're fuzzing
a userspace program, you expect crashes,
-
well, because that's what we're after. And
if a userspace program crashes, then the
-
OS simply terminates it gracefully.
And so the fuzzer can detect this, and
-
save the input as a crashing input, and so
on. And this is all fine. But when you're
-
fuzzing the kernel, so—if you were fuzzing
the kernel of the machine that you were
-
using for fuzzing, after a while, the
machine would just go down. Because, after
-
all, the kernel runs the machine, and if
it starts misbehaving, then all of it can
-
go wrong. And more importantly, you can
lose your crashes, because if the
-
machine crashes, then the state of the
fuzzer is lost and you have no idea what
-
your crashing input was. So what most
kernel fuzzers have to do is that they
-
resort to some kind of VM to keep the
system stable. So they fuzz the kernel in
-
a VM and then run the fuzzing agent
outside the VM. On top of that is tooling.
-
So, if you want to fuzz a user space
program, you can just download AFL or use
-
libfuzzer; there's plenty of tutorials
online, it's really easy to set up and
-
just, like—compile your program, you start
fuzzing and you're good to go. If you want
-
to fuzz the kernel, it's already much more
complicated. So, for example, if you want
-
to fuzz Linux with, say, syzkaller, which
is a popular kernel fuzzer, you have to
-
compile the kernel, you have to use a
special config that supports syzkaller,
-
you have way less guides available than
for userspace fuzzing, and in general it's
-
just much more complex and less intuitive
than just fuzzing userspace. And lastly,
-
we have the issue of determinism. So in
general, if you have a single threaded
-
userspace program, unless it uses some
kind of random number generator, it is
-
more or less deterministic. There's
nothing that affects the execution of the
-
program. But—and this is really nice if
you want to try to reproduce a test case,
-
because if you have a non-deterministic
test case, then it's really hard to know
-
whether this is really a crash or if it's
just something that you should ignore, and
-
in the kernel this is even harder, because
you don't only have concurrency, like
-
multi-processing, you also have interrupts.
So interrupts can happen at any time, and
-
if one time you got an interrupt while
executing your test case and the second
-
time you didn't, then maybe it only
crashes one time - you don't really know,
-
it's not pretty. And so again, we
have several approaches to fuzzing
-
binaries in the kernel. First one is to do
black box fuzzing. We don't really
-
like this because it doesn't find much,
especially in something complex
-
like a kernel. Approach 1 is to
use dynamic translation,
-
so, use something
like QEMU or—you name it. This works, and
-
people have used it successfully; the
problem is that it is really, really,
-
really slow. Like, we're talking about
10x-plus overhead. And as we said before,
-
the more iterations, the more test cases
you can execute in the same amount of
-
time, the better, because you find more
bugs. And on top of that, there's no
-
currently available sanitizer for
kernel binaries that works—is based on
-
this approach. So in userspace you have
something like valgrind; in the kernel,
-
you don't have anything, at least that we
know of. There is another approach, which
-
is to use Intel Processor Trace. This has
been, like—there's been some research
-
papers on this recently, and this is nice
because it allows you to collect coverage
-
at nearly zero overhead. It's, like,
really fast, but the problem is that it
-
requires hardware support, so it requires
a fairly new x86 CPU, and if you want to
-
fuzz something on ARM, say, like, your
Android driver, or if you want to use an
-
older CPU, then you're out of luck. And
what's worse, you cannot really use it for
-
sanitization, or at least not the kind of
sanitization that ASan does, because it
-
just traces the execution; it doesn't
allow you to do checks on memory accesses.
-
So Approach 3, which is what we will use,
is static rewriting. So, we had this very
-
nice framework for rewriting userspace
binaries, and then we asked ourselves, can
-
we make this work in the kernel? So we
took the system, the original RetroWrite,
-
we modified it, we implemented support for
Linux modules, and... it works! So we have
-
implemented it—we have used it to fuzz
some kernel modules, and it really shows
-
that this approach doesn't only work for
userspace; it can also be applied to the
-
kernel. So as for some implementation, the
nice thing about kernel modules is that
-
they're always position independent. So
you cannot have position—like, fixed-
-
position kernel modules because Linux just
doesn't allow it. So we sort of get that
-
for free, which is nice. And Linux modules
are also a special class of ELF files,
-
which means that the format is—even though
it's not the same as userspace binaries,
-
it's still somewhat similar, so we didn't
have to change the symbolizer that much,
-
which is also nice. And we implemented
symbolization with this, and we used it to
-
implement both code coverage and binary
ASan for kernel binary modules. So for
-
coverage: The idea behind the whole
RetroWrite project was that we wanted to
-
integrate with existing tools. So existing
fuzzing tools. We didn't want to force our
-
users to write their own fuzzer that is
compatible with RetroWrite. So for—in
-
userspace we had AFL-style coverage
tracking, and binary ASan which is
-
compatible with source-based ASan, and we
wanted to follow the same principle in the
-
kernel. So it turns out that Linux has
this built-in coverage-tracking framework
-
called kCov that is used by several
popular kernel fuzzers like syzkaller, and
-
we wanted to use it ourselves. So we
designed our coverage instrumentation so
-
that it integrates with kCov. The downside
is that you need to compile the kernel
-
with kCov, but then again, Linux is open
source, so you can sort of always do that;
-
it's usually not the kernel itself that
is the binary blob, it's
-
usually only the modules. So that's just
still fine. And the way you do this is—the
-
way you implement kCov for binary modules
is that you just have to find the start of
-
every basic block, and add a call to some
function that then stores the collected
-
coverage. So here's an example: we have a
short snippet of code with three basic
-
blocks, and all we have to do is add a
call to "trace_pc" to the start of the
-
basic block. "trace_pc" is a function that
is part of the main kernel image that then
-
collects this coverage and makes it
available to a userspace fuzzing agent. So
-
this is all really easy and it works. And
let's now see how we implemented binary
-
ASan. So as I mentioned before, when we
instrument the program with binary ASan in
-
userspace we link with libASan, which
takes care of setting up the metadata,
-
takes care of putting the red zones around
our allocations, and so on. So we had to
-
do something similar in the kernel; of
course, you cannot link with libASan in
-
the kernel, because that doesn't work, but
what we can do instead is, again, compile
-
the kernel with kASan support. So this
instruments the allocator, kmalloc, to add
-
the red zones; it allocates space for the
metadata, it keeps this metadata around,
-
does this all for us, which is really
nice. And again, the big advantage of
-
using this approach is that we can
integrate seamlessly with a kASan-
-
instrumented kernel and with fuzzers that
rely on kASan such as syzkaller. So we see
-
this as more of a plus than, like, a
limitation. And how do you implement ASan?
-
Well, you have to find every memory access
and instrument it to check the—to check
-
whether this is accessing a red zone. And
if it does then you just call this bug
-
report function that produces a stack
trace, a bug report, and crashes the
-
kernel, so that the fuzzer can detect it.
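As a rough illustration, the check inserted in front of each memory access
boils down to something like the following simplified, userspace-style model
(the shadow offset, the scale, and the report function are illustrative
stand-ins and assume the shadow region has already been set up by the
ASan/kASan runtime; the real KASAN layout and entry points differ):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* ASan-style shadow memory: one shadow byte covers 8 bytes of real memory.
     * 0 means all 8 bytes are addressable, 1..7 means only the first k bytes
     * are, and negative values (such as 0xfc) mark red zones or freed memory. */
    #define SHADOW_SCALE   3
    #define SHADOW_OFFSET  0x7fff8000UL   /* illustrative, not the kernel's value */

    static void asan_report(uintptr_t addr, size_t size) {
        fprintf(stderr, "BUG: invalid %zu-byte access at 0x%lx\n",
                size, (unsigned long)addr);
        abort();                          /* the kernel variant oopses instead */
    }

    /* the rewriter inserts the equivalent of this before every load and store */
    static inline void asan_check(uintptr_t addr, size_t size) {
        int8_t shadow = *(int8_t *)((addr >> SHADOW_SCALE) + SHADOW_OFFSET);
        if (shadow != 0) {
            /* partially valid 8-byte granule, or a poisoned red zone */
            int8_t last = (int8_t)((addr & 7) + size - 1);
            if (shadow < 0 || last >= shadow)
                asan_report(addr, size);
        }
    }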
Again, this is compatible with source-
-
based kASan, so we're happy. We can simply
load the rewritten module with added
-
instrumentation into a kernel, as long as
you have compiled the kernel with the
-
right flags, and we can use a standard
kernel fuzzer. Here for the—our
-
evaluation, we used syzkaller, a popular
kernel fuzzer by some folks at Google, and
-
it worked really well. So we've finally
reached the end of our journey, and now we
-
wanted to present some experiments we did
to see if this really works. So for
-
userspace, we wanted to compare the
performance of our binary ASan with
-
source-based ASan and with existing
solutions that also work on binaries. So
-
for userspace, you can use valgrind
memcheck. It's a memory sanitizer that is
-
based on dynamic binary translation and
works on binaries.
-
We compared it with source ASan and
RetroWrite ASan on the SPEC CPU benchmark
-
and saw how fast it was. And for the
kernel we decided to fuzz some file
-
systems and some drivers with syzkaller
using both source-based KASan and kCov and
-
kRetroWrite-based KASan and kCov. So these
are our results for userspace. So the red
-
bar is valgrind. We can see that the
execution time of valgrind is the highest.
-
It is really, really slow—like, 3, 10, 30x
overhead, way too slow for fuzzing. Then
-
in green, we have our binary ASan, which
is, like, already a large improvement. In
-
orange we have source-based ASan. And then
finally in blue we have the original code
-
without any instrumentation whatsoever. So
we can see that source-based ASan has,
-
like, 2x or 3x overhead, and binary ASan
is a bit higher, like, a bit less
-
efficient, but still somewhat close. So
that's for userspace, and for the kernel,
-
we—these are some preliminary results, so,
this is, like—I'm doing this work as part
-
of my master's thesis, and so I'm still,
like, running the evaluation. Here we can
-
see that the overhead is already, like, a
bit lower. So the reason for this is that
-
SPEC is a pure CPU benchmark; it doesn't
interact with the system that much. And so
-
any instrumentation that you add is going
to massively slow down, or, like,
-
considerably slow down the execution. By
contrast, when you fuzz a file system with
-
syzkaller, not only every test case has to
go from the high—the host to the guest and
-
then do multiple syscalls and so on, but
also every system call has to go through
-
several layers of abstraction before it
gets to the actual file system. And all
-
these—like, all of this takes a lot of
time, and so in practice the overhead of
-
our instrumentation seems to be pretty
reasonable. So, since we know that you
-
like demos, we've prepared a small demo of
kRetroWrite. So. Let's see. Yep. Okay. All
-
right, so we've prepared a small kernel
module. And this module is just, like,
-
really simple; it contains a
vulnerability, and what it does is that it
-
creates a character device. So if you're
not familiar with this, a character device
-
is like a fake file that is exposed by a
kernel driver and that you can write to
and read from. And instead of going to a
write from. And instead of going to a
file, the data that you read—that you, in
-
this case, write to the fake file—goes to
the driver and is handled by this demo
-
write function. So as we can see, this
function allocates a buffer, a 16-byte
-
buffer on the heap, and then copies some
data into it, and then it checks if the
-
data contains the string "1337". If it
does, then it accesses the buffer out of
-
bounds; you can see "alloc[16]" and the
buffer is sixteen bytes; this is an out-
-
of-bounds read by one byte. And if it
doesn't then it just accesses the buffer
-
in bounds, which is fine, and it's not a
vulnerability. So we can compile this
-
driver. OK, um... OK, and then so we have
our module, and then we will instrument it
-
using kRetroWrite. So, instrument... Yes,
please. OK. Right. So kRetroWrite did some
-
processing, and it produced an
instrumented module with ASan or kASan and
-
a symbolized assembly file. We can
actually have a look at the symbolized
-
assembly file to see what it looks like.
Yes. Yes. OK. So, is this big enough?
-
Yeah... As you can see, so—we can actually
see here the ASan instrumentation. Ah,
-
shouldn't—yeah. So, we—this is the ASan
instrumentation. The original code loads
-
some data from this address. And as you
can see, the ASan instrumentation first
-
computes the actual address, and then does
some checking—basically, this is checking
-
some metadata that ASan stores to check if
the address is in a red zone or not, and
-
then if the check fails, it
calls this ASan report which produces a
-
stack trace and crashes the kernel. So
this is fine. We can actually even look at
-
the disassembly of both modules, so...
object dump and then demo... Ah, nope. OK,
-
so on the left, we have the original
module without any instrumentation; on the
-
right, we have the module instrumented
with ASan. So as you can see, the original
-
module has "push r13" and then has this
memory load here; on the right in the
-
instrumented module, kRetroWrite inserted
the ASan instrumentation. So the original
-
load is still down here, but between that,
between the first instruction and this
-
instruction, we have—now have the kASan
instrumentation that does our check. So
-
this is all fine. Now we can actually test
it and see what it does. So we can—we will
-
boot a very simple, a very minimal Linux
system, and try to target the
-
vulnerability first with the non-
instrumented module and then with the
-
instrumented module. And we can—we will
see that in the—with the non-instrumented
-
module, the kernel will not crash, but
with the instrumented module it will crash
-
and produce a bug report. So. Let's see.
Yeah, this is a QEMU VM, I have no idea
-
why it's taking so long to boot. I'll
blame the demo gods for not being kind to
-
us. Yeah, I guess we just have to wait.
OK. So. All right, so we loaded the
-
module. We will see that it has created a
fake file character device in /dev/demo.
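The demo module's write handler looks roughly like this (a hypothetical C
reconstruction from the description earlier, not the module's actual source;
the names, sizes, and error handling are assumptions):

    #include <linux/fs.h>
    #include <linux/kernel.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>

    static ssize_t demo_write(struct file *file, const char __user *buf,
                              size_t len, loff_t *off)
    {
        char *alloc;
        size_t n = len < 15 ? len : 15;

        alloc = kmalloc(16, GFP_KERNEL);          /* 16-byte heap buffer */
        if (!alloc)
            return -ENOMEM;

        if (copy_from_user(alloc, buf, n)) {      /* copy the written data in */
            kfree(alloc);
            return -EFAULT;
        }
        alloc[n] = '\0';

        if (strstr(alloc, "1337"))
            pr_info("demo: %d\n", alloc[16]);     /* index 16 of a 16-byte buffer:
                                                     a one-byte out-of-bounds read */
        else
            pr_info("demo: %d\n", alloc[0]);      /* in bounds, fine */

        kfree(alloc);
        return len;
    }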
-
Yep. We can write this file. Yep. So this
will—this accesses the array in bounds,
-
and so this is fine. Then what we can also
do is write "1337" to it so it will access
-
the array out of bounds. So this is the
non-instrumented module, so this will not
-
crash. It will just print some garbage
value. Okay, that's it. Now we can load
-
the instrumented module instead... and do
the same experiment again. All right. We
-
can see that /dev/demo is still here. So
the module still works. Let's try to write
-
"1234" into it. This, again, doesn't
crash. But when we try to write "1337",
-
this will produce a bug report.
applause
-
So this has quite a lot of information. We
-
can see, like, the—where the memory was
allocated, there's a stack trace for that;
-
it wasn't freed, so there's no stack trace
for the free. And we see that the cache
-
size of the memory, like, it was a 16-byte
allocation. We can see the shape of the
-
memory. We see that these two zeros mean
that there are two 8-byte chunks of valid
-
memory. And then these "fc fc fc" is
the—are the red zones that I was talking
-
about before. All right, so that's it for
the demo. We will switch back to our
-
presentation now. So... hope you enjoyed
it.
-
gannimo: Cool. So after applying this to a
demo module, we also wanted to see what
-
happens if we apply this to a real file
system. After a couple of hours we
-
were—when we came back and checked on the
results, we saw a couple of issues popping
-
up, including a nice set of use-after-free
reads, a set of use-after-free writes, and
-
we checked the bug reports and we saw a
whole bunch of Linux kernel issues popping
-
up one after the other in this nondescript
module that we fuzzed. We're in the
-
process of reporting it. This will take
some time until it is fixed; that's why
-
you see the blurry lines. But as you see,
there's still quite a bit of opportunity
-
in the Linux kernel where you can apply
different forms of targeted fuzzing into
-
different modules, leverage these modules
on top of a kASan instrumented kernel and
-
then leveraging this as part of your
fuzzing toolchain to find interesting
-
kernel 0days that... yeah. You can then
develop further, or report, or do whatever
-
you want with them. Now, we've shown you
how you can take existing binary-only
-
modules, think different binary-only
drivers, or even existing modules where
-
you don't want to instrument a full set of
the Linux kernel, but only focus fuzzing
-
and exploration on a small different—small
limited piece of code and then do security
-
tests on those. We've shown you how we can
do coverage-based tracking and address
-
sanitization. But this is also up to you
on what kind of other instrumentation you
-
want. Like this is just a tool, a
framework that allows you to do arbitrary
-
forms of instrumentation. So we've taken
you on a journey from instrumenting
-
binaries over coverage-guided fuzzing and
sanitization to instrumenting modules in
-
the kernel and then finding crashes in the
kernel. Let me wrap up the talk. So, this
-
is one of the fun pieces of work that
we do in the hexhive lab at EPFL. So if
-
you're looking for postdoc opportunities
or if you're thinking about a PhD, come
-
talk to us. We're always hiring. The tools
will be released as open source. A large
-
chunk of the userspace work is already
open source. We're working on a set of
-
additional demos and so on so that you can
get started faster, leveraging the
-
different existing instrumentation that is
already out there. The userspace work is
-
already available. The kernel work will be
available in a couple of weeks. This
-
allows you to instrument real-world
binaries for fuzzing, leveraging existing
-
transformations for coverage tracking to
enable fast and effective fuzzing and
-
memory checking to detect the actual bugs
that exist there. The key takeaway from
-
this talk is that RetroWrite and
kRetroWrite enables static binary
-
rewriting at zero instrumentation cost. We
take the limitation of focusing only on
-
position-independent code, which is not a
real limitation, but we get the
-
advantage of being able to symbolize
without actually relying on heuristics, so
-
we can even symbolize large, complex
source—large, complex applications and
-
effectively rewrite those aspects and then
you can focus fuzzing on these parts.
-
Another point I want to mention is that
this enables you to reuse existing tooling
-
so you can take a binary blob, instrument
it, and then reuse, for example, Address
-
Sanitizer or existing fuzzing tools, as it
integrates really, really nicely. As I said,
-
all the code is open source. Check it out.
Try it. Let us know if it breaks. We're
-
happy to fix. We are committed to open
source. And let us know if there are any
-
questions. Thank you.
applause
-
Herald: So, thanks, guys, for an
interesting talk. We have some time for
-
questions, so we have microphones along
the aisles. We'll start from question from
-
microphone number two.
Q: Hi. Thanks for your talk and for the
-
demo. I'm not sure about the use-case you
showed for the kernel RetroWrite. 'Cause
-
you're usually interested in fuzzing
binary in kernelspace when you don't have
-
source code for the kernel. For example,
for IoT or Android and so on. But you just
-
reuse the kCov and kASan in the kernel,
and you never have the kernel in IoT or
-
Android which is compiled with that. So
are you—do you have any plans to binary
-
instrument the kernel itself, not the
modules?
-
Nspace: So we thought about that. I think
that there's some additional problems that
-
we would have to solve in order to be able
to instrument the full kernel. So other
-
than the fact that it gives us
compatibility with, like, existing tools,
-
the reason why we decided to go with
compiling the kernel with kASan and kCov
-
is that building the, like—you would
have to, like, think about it. You
-
have to instrument the memory allocator to
add red zones, which is, like, already
-
somewhat complex. You have to instrument
the exception handlers to catch, like, any
-
faults that the instrumentation detects.
You would have to, like, set up some
-
memory for the ASan shadow. So this is,
like—I think you should be able to do it,
-
but it would require a lot of additional
work. So this is, like—this was like four
-
months' thesis. So we decided to start
small and prove that it works in
-
the kernel for modules, and then leave it
to future work to actually extend it to
-
the full kernel. Also, like, I think for
Android—so in the case of Linux, the
-
kernel is GPL, right, so if the
manufacturer ships a custom kernel, they
-
have to release the source code, right?
Q: They never do.
-
Nspace: They never—well, that's a
different issue. Right?
-
gannimo: Right.
Q: So that's why I ask, because I don't
-
see how it just can be used in the real
world.
-
gannimo: Well, let me try to put this into
perspective a little bit as well. Right.
-
So there's the—what we did so far is we
leveraged existing tools, like kASan or
-
kCov, and integrated into these existing
tools. Now, doing heap-based allocation is
-
fairly simple and replacing those with
additional red zones—that instrumentation
-
you can carry out fairly well by focusing
on the different allocators. Second to
-
that, simply oopsing the kernel and
printing the stack trace is also fairly
-
straightforward. So it's not a lot of
additional effort. So it is—it involves
-
some engineering effort to port this to
non-kASan-compiled kernels. But we think
-
it is very feasible. In the interest of
time, we focused on kASan-enabled kernels,
-
so that some form of ASan is already
enabled. But yeah, this is additional
-
engineering effort. But there is also a
community out there that can help us with
-
these kind of changes. So kRetroWrite and
RetroWrite themselves are the binary
-
rewriting platform that allow you to turn
a binary into an assembly file that you
-
can then instrument and run different
passes on top of it. So another pass would
-
be a full ASan pass or kASan pass that
somebody could add and then contribute
-
back to the community.
Q: Yeah, it would be really useful.
-
Thanks.
gannimo: Cool.
-
Angel: Next question from the Internet.
Q: Yes, there is a question regarding the
-
slide on the SPEC CPU benchmark. The
second or third graph from the right had
-
an instrumented version that was faster
than the original program. Why is that?
-
gannimo: Cache effect. Thank you.
Angel: Microphone number one.
-
Q: Thank you. Thank you for the presentation.
I have a question: how many architectures do
-
you support, and do you have plans to
support more?
-
gannimo: x86_64.
Q: Okay. So no plans for ARM or MIPS,
-
or...?
gannimo: Oh, there are plans.
-
Q: Okay.
Nspace: Right, so—
-
gannimo: Right. Again, there's a finite
amount of time. We focused on the
-
technology. ARM is high up on the list. If
somebody is interested in working on it
-
and contributing, we're happy to hear from
it. Our list of targets is ARM first and
-
then maybe something else. But I think
with x86_64 and ARM we've covered a
-
majority of the interesting platforms.
Q: And second question, did you try to
-
fuzz any real closed-source program?
Because as I understand from presentation,
-
you fuzzed, like, just file systems, which we
can compile and fuzz with syzkaller like
-
in the past.
Nspace: So for the evaluation, we wanted
-
to be able to compare between the source-
based instrumentation and the binary-based
-
instrumentation, so we focused mostly on
open-source filesystem and drivers because
-
then we could instrument them with a
compiler. We haven't yet tried, but this
-
is, like, also pretty high up on the list.
We wanted to try to find some closed-
-
source drivers—there's lots of them, like
for GPUs or anything—and we'll give it a
-
try and find some 0days, perhaps.
Q: Yes, but with syzkaller, you still have
-
a problem. You have to write rules, like,
dictionaries. I mean, you have to
-
understand the format, have to communicate
with the driver.
-
Nspace: Yeah, right. But there's, for
example, closed-source file systems that
-
we are looking at.
Q: Okay. Thinking.
-
Herald: Number two.
Q: Hi. Thank you for your talk. So I don't
-
know if there are any kCov- or kASan-
equivalent solution to Windows, but I was
-
wondering if you tried, or are you
planning to do it on Windows, the
-
framework? Because I know it might be
challenging because of the driver
-
signature enforcement and PatchGuard, but
I wondered if you tried or thought about
-
it.
gannimo: Yes, we thought about it and we
-
decided against it. Windows is incredibly
hard and we are academics. The research I
-
do in my lab, or we do in my research lab,
focuses on predominantly open-source
-
software and empowers open-source
software. Doing full support for Microsoft
-
Windows is somewhat out of scope. If
somebody wants to port these tools, we are
-
happy to hear it and work with these
people. But it's a lot of additional
-
engineering effort, versus very
additional—very low additional research
-
value, so we'll have to find some form of
compromise. And, like, if you would be
-
willing to fund us, we would go ahead. But
it's—yeah, it's a cost question.
-
Q: And you're referring both to kernel and
user space, right?
-
gannimo: Yeah.
Q: Okay. Thank you.
-
Herald: Number five.
Q: Hi, thanks for the talk. This seems
-
most interesting if you're looking for
vulnerabilities in closed source kernel
-
modules, but not giving it too much
thought, it seems it's really trivial to
-
prevent this if you're writing a closed
source module.
-
gannimo: Well, how would you prevent this?
Q: Well, for starters, you would just take
-
a difference between the address of two
functions. That's not gonna be IP
-
relative, so...
Nspace: Right. So we explicitly—like, even
-
in the original RetroWrite paper—we
explicitly decided to not try to deal with
-
obfuscated code, or code that is
purposefully trying to defeat this kind of
-
rewriting. Because, like, the assumption
is that first of all, there are techniques
-
to, like, deobfuscate code or remove
these, like, checks in some way, but this
-
is, like, sort of orthogonal work. And at
the same time, I guess most drivers are
-
not really compiled with the sort of
obfuscation; they're just, like, you know,
-
they're compiled with a regular compiler.
But yeah, of course, this is, like, a
-
limitation.
gannimo: They're likely stripped, but not
-
necessarily obfuscated. At least from what
we've seen when we looked at binary-only
-
drivers.
Herald: Microphone number two.
-
Q: How do you decide where to place the
red zones? From what I heard, you talked
-
about instrumenting the allocators, but,
well, there are a lot of variables on the
-
stack, so how do you deal with those?
gannimo: Oh, yeah, that's actually super
-
cool. I refer to some extent to the paper
that is on the GitHub repo as well. If you
-
think about it, modern compilers use
canaries for buffers. Are you aware of
-
stack canaries—how stack canaries work?
So, stack canaries—like, if the compiler
-
sees there's a buffer that may be
overflown, it places a stack canary
-
between the buffer and any other data.
What we use is we—as part of our analysis
-
tool, we find these stack canaries, remove
the code that does the stack canary, and
-
use this space to place our red zones. So
we actually hack the stack canaries,
-
remove that code, and add ASan red zones
into the empty stack canaries that are now
-
there. It's actually a super cool
optimization because we piggyback on what
-
kind of work the compiler already did for
us before, and we can then leverage that
-
to gain additional benefits and protect
the stack as well.
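As a rough picture of the layout being described (the field names and sizes
are illustrative assumptions only):

    #include <stdint.h>

    /* The compiler already reserved an 8-byte canary slot next to the buffer;
     * the rewriter drops the canary check and instead poisons those 8 bytes in
     * the ASan shadow, turning the slot into a stack red zone. */
    struct frame_sketch {
        char     buf[64];       /* buffer that might be overflown */
        uint64_t canary_slot;   /* was: stack canary; now: ASan red zone */
        uint64_t saved_rbp;
        uint64_t return_addr;
    };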
-
Q: Thanks.
Angel: Another question from the Internet.
-
Q: Yes. Did you consider lifting the
binary code to LLVM IR instead of
-
generating assembler source?
gannimo: Yes. laughter But, so—a little
-
bit longer answer. Yes, we did consider
that. Yes, it would be super nice to lift
-
to LLVM IR. We've actually looked into
this. It's incredibly hard. It's
-
incredibly complex. There's no direct
mapping between the machine code
-
equivalent and the LLVM IR. You would
still need to recover all the types. So
-
it's like this magic dream that you
recover full LLVM IR, then do heavyweight
-
transformations on top of it. But this is
incredibly hard because if you compile
-
down from LLVM IR to machine code, you
lose a massive amount of information. You
-
would have to find a way to recover all of
that information, which is pretty much
-
impossible and undecidable for many cases.
So for example, just as a note, we only
-
recover control flow and we only
desymbolize control flow. For data
-
references—we don't support
instrumentation of data references yet
-
because there's still an undecidable
problem that we are facing with. I can
-
talk more about this offline, or there is
a note in the paper as well. So this is
-
just a small problem. Only if you're
lifting to assembly files. If you're
-
lifting to LLVM IR, you would have to do
full end-to-end type recovery, which is
-
massively more complicated. Yes, it would
be super nice. Unfortunately, it is
-
undecidable and really, really hard. So
you can come up with some heuristics, but
-
there is no solution that will do this
in—that will be correct 100 percent of the
-
time.
Angel: We'll take one more question from
-
microphone number six.
Q: Thank you for your talk. What kind of
-
disassemblers did you use for RetroWrite,
and did you have problems with the wrong
-
disassembly? And if so, how did you handle
it?
-
Nspace: So, RetroWrite—so we used
Capstone for the disassembly.
-
gannimo: An amazing tool, by the way.
Nspace: Yeah. So the idea is that, like,
-
we need some kind of—some information
about where the functions are. So for the
-
kernel modules, this is actually fine
because kernel modules come with this sort
-
of information because the kernel needs
it, to build stack traces, for example.
-
For userspace binaries, this is somewhat
less common, but you can use another tool
-
to try to do function identification. And
we do, like—sort of, like, disassemble the
-
entire function. So we have run into some
issues with, like—in AT&T syntax, because
-
like we wanted to use gas, GNU's
assembler, for, for...
-
gannimo: Reassembling.
Nspace: Reassembly, yeah. And some
-
instructions are a lot—you can express the
same, like, two different instructions,
-
like five-byte NOP and six-byte NOP, using
the same string of, like, text—a mnemonic,
-
an operand string. But the problem is
that, like, the kernel doesn't like it and
-
crashes. This took me like two days to
debug.
-
gannimo: So the kernel uses dynamic binary
patching when it runs, at runtime, and it
-
uses fixed offsets, so if you replace a
five-byte NOP with a six-byte NOP or vice
-
versa, your offsets change and your kernel
just blows up in your face.
-
Q: So it was kind of a case-by-case basis
where you saw the errors coming out of the
-
disassembly and you had to fix it?
Nspace: So sorry, can you repeat the
-
question?
Q: Like, for example, if you—if some
-
instruction is not supported by the
disassembler, so you saw that it crashed,
-
that there's something wrong, and then you
fix it by hand?
-
Nspace: Yeah, well, if we saw that there
was a problem with it, this—like, I don't
-
recall having any unknown instructions in
the disassembler. I don't think I've ever
-
had a problem with that. But yeah, this
was a lot of, like, you know, engineering
-
work.
gannimo: So let me repeat. The problem was
-
not a bug in the disassembler, but an
issue with the instruction format—that the
-
same mnemonic can be translated into two
different instructions, one of which was
-
five bytes long, the other one was six
bytes long. Both used the exact same
-
mnemonic. Right, so this was an issue with
assembly encoding.
-
Q: But you had no problems with
unsupported instructions which couldn't be
-
disassembled?
Nspace: No, no. Not as far as I know, at
-
least.
Angel: We have one more minute, so a very
-
short question from microphone number two.
Q: Does it work? Ah. Is your binary
-
instrumentation equally powerful as kernel
address space... I mean, kASan? So, does
-
it detect all the memory corruptions on
stack, heap and globals?
-
gannimo: No globals. But heap—it does all
of them on the heap. There's some slight
-
variation on the stack because we have to
piggyback on the canary stuff. As I
-
mentioned quickly before, there is no
reflowing and full recovery of data
-
layouts. So to get anything on the stack,
we have to piggyback on existing compiler
-
extensions like stack canaries. But—so we
don't support intra-object overflows on
-
the stack. But we do leverage the stack
canaries to get some stack benefits, which
-
is, I don't know, 90, 95 percent there
because the stack canaries are pretty
-
good. For heap, we get the same precision.
For globals, we have very limited support.
-
Q: Thanks.
Angel: So that's all the time we have for
-
this talk. You can find the speakers, I
think, afterwards offline. Please give
-
them a big round of applause for an
interesting talk.
-
applause
-
36c3 postroll music
-
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!