1
00:00:00,000 --> 00:00:09,520
<i>36c3 prerol music</i>

2
00:00:18,410 --> 00:00:23,250
Herald: So, the next talk for this
afternoon is about high speed binary

3
00:00:23,250 --> 00:00:28,110
fuzzing. We have two researchers that will
be presenting the product of their latest

4
00:00:28,110 --> 00:00:33,640
work, which is a framework for static
binary rewriting. Our speakers are—the

5
00:00:33,640 --> 00:00:38,580
first one is a computer science master's
student at EPFL and the second one is a

6
00:00:38,580 --> 00:00:42,730
security researcher and assistant
professor at EPFL. Please give a big round

7
00:00:42,730 --> 00:00:45,048
of applause to Nspace and gannimo.

8
00:00:45,048 --> 00:00:50,280
<i>Applause</i>

9
00:00:50,280 --> 00:00:52,610
gannimo (Mathias Payer): Thanks for the
introduction. It's a pleasure to be here,

10
00:00:52,610 --> 00:00:57,850
as always. We're going to talk about
different ways to speed up your fuzzing

11
00:00:57,850 --> 00:01:02,050
and to find different kinds of
vulnerabilities or to tweak your binaries

12
00:01:02,050 --> 00:01:08,070
in somewhat unintended ways. I'm Mathias
Payer or I go by gannimo on Twitter and I

13
00:01:08,070 --> 00:01:14,440
am an assistant professor at EPFL working
on different forms of software security:

14
00:01:14,440 --> 00:01:18,700
fuzzing sanitization, but also different
kinds of mitigations. And Matteo over

15
00:01:18,700 --> 00:01:24,160
there is working on his master's thesis on
different forms of binary rewriting for

16
00:01:24,160 --> 00:01:27,820
the kernel. And today we're going to take
you on a journey on how to actually

17
00:01:27,820 --> 00:01:32,180
develop very fast and very efficient
binary rewriting mechanisms that allow you

18
00:01:32,180 --> 00:01:37,710
to do unintended modifications to the
binaries and allow you to explore

19
00:01:37,710 --> 00:01:45,700
different kinds of unintended features in
binaries. So about this talk. What we

20
00:01:45,700 --> 00:01:49,729
discovered or the reason why we set out on
this journey was that fuzzing binaries is

21
00:01:49,729 --> 00:01:56,460
really, really hard. There's very few
tools in user space. There's—it's

22
00:01:56,460 --> 00:01:59,680
extremely hard to set it up and it's
extremely hard to set it up in a

23
00:01:59,680 --> 00:02:04,479
performant way. The setup is complex. You
have to compile different tools. You have

24
00:02:04,479 --> 00:02:08,520
to modify it. And the results are not
really that satisfactory. As soon as you

25
00:02:08,520 --> 00:02:13,320
move to the kernel, fuzzing binaries in a
kernel is even harder. There's no tooling

26
00:02:13,320 --> 00:02:16,880
whatsoever, there's very few users
actually working with binary code in the

27
00:02:16,880 --> 00:02:22,630
kernel or modifying binary code, and it's
just a nightmare to work with. So what we

28
00:02:22,630 --> 00:02:26,850
are presenting today is a new approach
that allows you to instrument any form of

29
00:02:26,850 --> 00:02:31,920
binary code or modern binary code based on
static rewriting, which gives you full

30
00:02:31,920 --> 00:02:36,819
native performance. You only pay for the
instrumentation that you add, and you can

31
00:02:36,819 --> 00:02:41,690
do very heavyweight transformations on top
of it. The picture, if you look at the

32
00:02:41,690 --> 00:02:47,470
modern system, let's say we are looking at
a modern setup. Let's say you're looking

33
00:02:47,470 --> 00:02:52,700
at cat pictures in your browser: Chrome
plus the kernel plus the libc plus the

34
00:02:52,700 --> 00:02:57,920
graphical user interface together clog up
at about 100 million lines of code.

35
00:02:57,920 --> 00:03:02,670
Instrumenting all of this for some form of
security analysis is a nightmare,

36
00:03:02,670 --> 00:03:06,690
especially along this large stack of
software. There's quite a bit of different

37
00:03:06,690 --> 00:03:11,260
compilers involved. There's different
linkers. It may be compiled on a different

38
00:03:11,260 --> 00:03:14,620
system, with different settings and so on.
And then getting your instrumentation

39
00:03:14,620 --> 00:03:18,569
across all of this is pretty much
impossible and extremely hard to work

40
00:03:18,569 --> 00:03:24,269
with. And we want to enable you to select
those different parts that you're actually

41
00:03:24,269 --> 00:03:29,629
interested in. Modify those and then focus
your fuzzing or analysis approaches on

42
00:03:29,629 --> 00:03:35,040
those small subsets of the code, giving
you a much better and stronger capability

43
00:03:35,040 --> 00:03:38,690
to test the systems that you're, or those
parts of the system that you're really,

44
00:03:38,690 --> 00:03:45,659
really interested in. Who's worked on
fuzzing before? Quick show of hands. Wow,

45
00:03:45,659 --> 00:03:54,379
that's a bunch of you. Do you use AFL?
Yeah, most of you, AFL. Libfuzzer? Cool,

46
00:03:54,379 --> 00:03:59,760
about 10, 15 percent libfuzzer, 30 percent
fuzzing, and AFL. There's a quite good

47
00:03:59,760 --> 00:04:03,980
knowledge of fuzzing, so I'm not going to
spend too much time on fuzzing, but for

48
00:04:03,980 --> 00:04:07,500
those that haven't really run their
fuzzing campaigns yet, it's a very simple

49
00:04:07,500 --> 00:04:12,060
software testing technique. You're
effectively taking a binary, let's say

50
00:04:12,060 --> 00:04:16,480
Chrome, as a target and you're running
this in some form of execution

51
00:04:16,480 --> 00:04:20,959
environment. And fuzzing then consists of
some form of input generation that creates

52
00:04:20,959 --> 00:04:26,620
new test cases, throws them at your
program and sees—and checks what is

53
00:04:26,620 --> 00:04:31,310
happening with your program. And either
everything is OK, and your code is being

54
00:04:31,310 --> 00:04:35,640
executed, and your input—the program
terminates, everything is fine, or you

55
00:04:35,640 --> 00:04:39,773
have a bug report. If you have a bug
report, you can use this. Find the

56
00:04:39,773 --> 00:04:44,520
vulnerability, maybe develop a PoC and
then come up with some form of either

57
00:04:44,520 --> 00:04:49,240
exploit or patch or anything else. Right.
So this is pretty much fuzzing in a in a

58
00:04:49,240 --> 00:04:55,560
nutshell. How do you get fuzzing to be
effective? How can you cover large source

59
00:04:55,560 --> 00:05:00,419
bases, complex code, and complex
environment? Well, there's a couple of

60
00:05:00,419 --> 00:05:04,979
simple steps that you can take. And let's
walk quickly through effective fuzzing

61
00:05:04,979 --> 00:05:12,630
101. Well, first, you want to be able to
create test cases that actually trigger

62
00:05:12,630 --> 00:05:18,100
bugs. And this is a very, very
complicated, complicated part. And we need

63
00:05:18,100 --> 00:05:22,800
to have some notion of the inputs that a
program accepts. And we need to have some

64
00:05:22,800 --> 00:05:27,780
notion of how we can explore different
parts of the program, right? Different

65
00:05:27,780 --> 00:05:30,870
parts of functionality. Well, on one hand,
we could have a developer write all the

66
00:05:30,870 --> 00:05:34,370
test cases by hand, but this would be kind
of boring. It would also require a lot of

67
00:05:34,370 --> 00:05:40,220
human effort in creating these different
inputs and so on. So coverage guided

68
00:05:40,220 --> 00:05:46,990
fuzzing has evolved as a very simple way
to guide the fuzzing process, leveraging

69
00:05:46,990 --> 00:05:51,220
the information on which parts of the code
have been executed by simply tracing the

70
00:05:51,220 --> 00:05:58,500
individual path through the program based
on the execution flow. So we can—the

71
00:05:58,500 --> 00:06:03,460
fuzzer can use this feedback to then
modify the inputs that are being thrown at

72
00:06:03,460 --> 00:06:09,830
the fuzzing process. The second step is
the fuzzer must be able to detect bugs. If

73
00:06:09,830 --> 00:06:13,080
you've ever looked at a memory corruption,
if you're just writing one byte after the

74
00:06:13,080 --> 00:06:18,490
end of a buffer, it's highly likely that
your software is not going to crash. But

75
00:06:18,490 --> 00:06:21,180
it's still a bug, and it may still be
exploitable based on the underlying

76
00:06:21,180 --> 00:06:26,690
conditions. So we want to be able to
detect violations as soon as they happen,

77
00:06:26,690 --> 00:06:31,600
for example, based on on some form of
sanitization that we add, some form of

78
00:06:31,600 --> 00:06:35,400
instrumentation that we add to the to the
binary, that then tells us, hey, there's a

79
00:06:35,400 --> 00:06:39,729
violation of the memory safety property,
and we terminate the application right

80
00:06:39,729 --> 00:06:45,300
away as a feedback to the fuzzer. Third,
but the—and last but not least: Speed is

81
00:06:45,300 --> 00:06:49,569
key, right? For if you're running a
fuzzing campaign, you have a fixed

82
00:06:49,569 --> 00:06:54,639
resource budget. You have a couple of
cores, and you want to run for 24 hours,

83
00:06:54,639 --> 00:06:59,470
48 hours, a couple of days. But in any
way, whatever your constraints are, you

84
00:06:59,470 --> 00:07:04,210
have a fixed amount of instructions that
you can actually execute. And you have to

85
00:07:04,210 --> 00:07:08,699
decide, am I spending my instructions on
generating new inputs, tracking

86
00:07:08,699 --> 00:07:14,139
constraints, finding bugs, running
sanitization or executing the program? And

87
00:07:14,139 --> 00:07:17,790
you need to find a balance between all of
them, as it is a zero sum game. You have a

88
00:07:17,790 --> 00:07:20,870
fixed amount of resources and you're
trying to make the best with these

89
00:07:20,870 --> 00:07:26,890
resources. So any overhead is slowing you
down. And again, this becomes an

90
00:07:26,890 --> 00:07:30,819
optimization problem. How can you most
effectively use the resources that you

91
00:07:30,819 --> 00:07:37,580
have available? As we are fuzzing with
source code, it's quite easy to actually

92
00:07:37,580 --> 00:07:41,770
leverage existing mechanisms, and we add
all that instrumentation at compile time.

93
00:07:41,770 --> 00:07:45,630
We take source code, we pipe it through
the compiler and modern compiler

94
00:07:45,630 --> 00:07:51,169
platforms, allow you to instrument and add
little code snippets during the

95
00:07:51,169 --> 00:07:55,419
compilation process that then carry out
all these tasks that are useful for

96
00:07:55,419 --> 00:08:00,270
fuzzing. For example, modern compilers can
add short snippets of code for coverage

97
00:08:00,270 --> 00:08:03,990
tracking that will record which parts of
the code that you have executed, or for

98
00:08:03,990 --> 00:08:08,770
sanitization which record and check every
single memory access if it is safe or not.

99
00:08:08,770 --> 00:08:12,360
And then when you're running the
instrumented binary, everything is fine

100
00:08:12,360 --> 00:08:17,380
and you can detect the policy violations
as you go along. Now if you would have

101
00:08:17,380 --> 00:08:21,330
source code for everything, this would be
amazing. But it's often not the case,

102
00:08:21,330 --> 00:08:28,129
right? We may be able on Linux to cover a
large part of the protocol stack by

103
00:08:28,129 --> 00:08:33,940
focusing only on source-code-based
approaches. But there may be applications

104
00:08:33,940 --> 00:08:39,300
where no source code is available. If we
move to Android or other mobile systems,

105
00:08:39,300 --> 00:08:43,199
there's many drivers that are not
available as open source or just available

106
00:08:43,199 --> 00:08:48,630
as binary blobs, or the full software
stack may be closed-source and we only get

107
00:08:48,630 --> 00:08:52,329
the binaries. And we still want to find
vulnerabilities in these complex software

108
00:08:52,329 --> 00:08:59,530
stacks that span hundreds of millions of
lines of code in a very efficient way. The

109
00:08:59,530 --> 00:09:04,620
only solution to cover this part of
massive code base is to actually rewrite

110
00:09:04,620 --> 00:09:08,990
and focus on binaries. A very simple
approach could be black box fuzzing, but

111
00:09:08,990 --> 00:09:11,620
this is—this doesn't really get you
anywhere because you don't get any

112
00:09:11,620 --> 00:09:16,100
feedback; you don't get any information if
you're triggering bugs. So one simple

113
00:09:16,100 --> 00:09:20,290
approach, and this is the approach that is
most dominantly used today, is to rewrite

114
00:09:20,290 --> 00:09:26,040
the program or the binary dynamically. So
you're taking the binary and during

115
00:09:26,040 --> 00:09:32,010
execution you use some form of dynamic
binary instrumentation based on Pin, angr,

116
00:09:32,010 --> 00:09:37,140
or some other binary rewriting tool and
translate the targeted runtime, adding

117
00:09:37,140 --> 00:09:43,330
this binary instrumentation on top of it
as you're executing it. It's simple, it's

118
00:09:43,330 --> 00:09:46,930
straightforward, but it comes at a
terrible performance cost of ten to a

119
00:09:46,930 --> 00:09:51,600
hundred x slow down, which is not really
effective. And you're spending all your

120
00:09:51,600 --> 00:09:57,600
cores and your cycles on just executing
the binary instrumentation. So we don't

121
00:09:57,600 --> 00:10:01,790
really want to do this and we want to have
something that's more effective than that.

122
00:10:01,790 --> 00:10:07,360
So what we are focusing on is to do static
rewriting. It involves a much more complex

123
00:10:07,360 --> 00:10:12,380
analysis as we are rewriting the binary
before it is being executed, and we have

124
00:10:12,380 --> 00:10:17,880
to recover all of the control flow, all of
the different mechanisms, but it results

125
00:10:17,880 --> 00:10:24,690
in a much better performance. And we can
get more bang for our buck. So why is

126
00:10:24,690 --> 00:10:30,830
static rewriting so challenging? Well,
first, simply adding code will break the

127
00:10:30,830 --> 00:10:35,320
target. So if you are disassembling this
piece of code here, which is a simple loop

128
00:10:35,320 --> 00:10:40,620
that loads data, decrements the registers,
and then jumps if you're not at the end of

129
00:10:40,620 --> 00:10:46,470
the array and keeps iterating through this
array. Now, as you look at the jump-not-

130
00:10:46,470 --> 00:10:52,100
zero instruction, the last instruction of
the snippet, it is a relative offset. So

131
00:10:52,100 --> 00:10:57,990
it jumps backward seven bytes. Which is
nice if you just execute the code as is.

132
00:10:57,990 --> 00:11:02,040
But as soon as you want to insert new
code, you change the offsets in the

133
00:11:02,040 --> 00:11:07,110
program, and you're modifying all these
different offsets. And simply adding new

134
00:11:07,110 --> 00:11:12,769
code somewhere in between will break the
target. So a core feature that we need to

135
00:11:12,769 --> 00:11:18,170
enforce, or core property that we need to
enforce, is that we must find all the

136
00:11:18,170 --> 00:11:24,050
references and properly adjust them, both
relative offsets and absolute offsets as

137
00:11:24,050 --> 00:11:29,800
well. Getting a single one wrong will
break everything. What makes this problem

138
00:11:29,800 --> 00:11:34,520
really, really hard is that if you're
looking at the binary, a byte is a byte,

139
00:11:34,520 --> 00:11:38,320
right? There's no way for us to
distinguish between scalars and

140
00:11:38,320 --> 00:11:43,649
references, and in fact they are
indistinguishable. Getting a single

141
00:11:43,649 --> 00:11:50,400
reference wrong breaks the target and
would introduce arbitrary crashes. So we

142
00:11:50,400 --> 00:11:54,460
have to come up with ways that allow us to
distinguish between the two. So for

143
00:11:54,460 --> 00:11:59,899
example, if you have this code here, it
takes a value and stores it somewhere on

144
00:11:59,899 --> 00:12:07,060
the stack. This could come from two
different kind of high-level constructs.

145
00:12:07,060 --> 00:12:12,170
On one hand, it could be taking the
address of a function and storing this

146
00:12:12,170 --> 00:12:16,540
function address somewhere and in a stack
variable. Or it could be just storing a

147
00:12:16,540 --> 00:12:21,579
scalar in a stack variable. And these two
are indistinguishable, and rewriting them,

148
00:12:21,579 --> 00:12:25,220
as soon as we add new code, the offsets
will change. If it is a function, we would

149
00:12:25,220 --> 00:12:31,800
have to modify the value; if it is a
scalar, we have to keep the same value. So

150
00:12:31,800 --> 00:12:35,510
how can we come up with a way that allows
us to distinguish between the two and

151
00:12:35,510 --> 00:12:44,610
rewrite binaries by recovering this
missing information? So let us take—let me

152
00:12:44,610 --> 00:12:48,120
take you or let us take you on a journey
towards instrumenting binaries in the

153
00:12:48,120 --> 00:12:53,070
kernel. This is what we aim for. We'll
start with the simple case of

154
00:12:53,070 --> 00:12:57,410
instrumenting binaries in user land, talk
about different kinds of coverage guided

155
00:12:57,410 --> 00:13:01,750
fuzzing and what kind of instrumentation
we can add, what kind of sanitization we

156
00:13:01,750 --> 00:13:06,390
can add, and then focusing on taking it
all together and applying it to kernel

157
00:13:06,390 --> 00:13:11,480
binaries to see what what will fall out of
it. Let's start with instrumenting

158
00:13:11,480 --> 00:13:17,019
binaries first. I will now talk a little
bit about RetroWrite, our mechanism and

159
00:13:17,019 --> 00:13:24,560
our tool that enables static binary
instrumentation by symbolizing existing

160
00:13:24,560 --> 00:13:30,800
binaries. So we recover the information
and we translate relative offsets and

161
00:13:30,800 --> 00:13:39,710
absolute offsets into actual labels that
are added to the assembly file. The

162
00:13:39,710 --> 00:13:42,760
instrumentation can then work on the
recovered assembly file, which can then be

163
00:13:42,760 --> 00:13:48,110
reassembled into a binary that can then be
executed for fuzzing. We implement

164
00:13:48,110 --> 00:13:52,459
coverage tracking and binary address
sanitizer on top of this, leveraging

165
00:13:52,459 --> 00:13:57,970
abstraction as we go forward. The key to
enabling this kind of binary rewriting is

166
00:13:57,970 --> 00:14:02,170
position-independent code. And position-
independent code has become the de-facto

167
00:14:02,170 --> 00:14:07,420
standard for any code that is being
executed on a modern system. And it

168
00:14:07,420 --> 00:14:12,019
effectively says that it is code that can
be loaded at any arbitrary address in your

169
00:14:12,019 --> 00:14:15,600
address space as you are executing
binaries. It is essential and a

170
00:14:15,600 --> 00:14:19,010
requirement if you want to have address
space layout randomization or if you want

171
00:14:19,010 --> 00:14:22,269
to use shared libraries, which de facto
you want to use in all these different

172
00:14:22,269 --> 00:14:26,090
systems. So since a couple of years, all
the code that you're executing on your

173
00:14:26,090 --> 00:14:33,079
phones, on your desktops, on your laptops
is position-independent code. And the idea

174
00:14:33,079 --> 00:14:36,680
between the position-independent code is
that you can load it anywhere in your

175
00:14:36,680 --> 00:14:41,040
address space and you can therefore not
use any hard-coded static addresses and

176
00:14:41,040 --> 00:14:44,420
you have to inform the system of
relocations or pick relative

177
00:14:44,420 --> 00:14:52,920
addresses—to—on how the system can
relocate these different mechanisms. On

178
00:14:52,920 --> 00:14:58,540
x86_64, position-independent code
leverages addressing that is relative to

179
00:14:58,540 --> 00:15:03,440
the instruction pointer. So for example,
it uses the current instruction pointer

180
00:15:03,440 --> 00:15:07,519
and then a relative offset to that
instruction pointer to reference global

181
00:15:07,519 --> 00:15:12,030
variables, other functions and so on. And
this is a very easy way for us to

182
00:15:12,030 --> 00:15:17,710
distinguish references from constants,
especially in PIE binaries. If it is RIP-

183
00:15:17,710 --> 00:15:21,360
relative, it is a reference; everything
else is a constant. And we can build our

184
00:15:21,360 --> 00:15:25,690
translation algorithm and our translation
mechanism based on this fundamental

185
00:15:25,690 --> 00:15:30,130
finding to remove any form of heuristic
that is needed by focusing especially on

186
00:15:30,130 --> 00:15:35,030
position-independent code. So we're
supporting position-independent code; we

187
00:15:35,030 --> 00:15:38,920
are—we don't support non-position-
independent code, but we give you the

188
00:15:38,920 --> 00:15:43,200
guarantee that we can rewrite all the
different code that is out there. So

189
00:15:43,200 --> 00:15:48,449
symbolization works as follows: If you
have the little bit of code on the lower

190
00:15:48,449 --> 00:15:54,030
right, symbolization replaces first all
the references with assembler labels. So

191
00:15:54,030 --> 00:15:57,700
look at the call instruction and the jump-
not-zero instruction; the call instruction

192
00:15:57,700 --> 00:16:02,399
references an absolute address and the
jump-not-zero instruction jumps backward

193
00:16:02,399 --> 00:16:08,259
relative 15 bytes. So by focusing on these
relative jumps and calls, we can replace

194
00:16:08,259 --> 00:16:12,020
them with actual labels and rewrite the
binary as follows: so we're calling a

195
00:16:12,020 --> 00:16:15,839
function, replacing it with the actual
label, and for the jump-not-zero we are

196
00:16:15,839 --> 00:16:21,020
inserting an actual label in the assembly
code and adding a backward reference. For

197
00:16:21,020 --> 00:16:26,089
PC-relative addresses, for example the
data load, we can then replace it with the

198
00:16:26,089 --> 00:16:30,329
name of the actual data that we have
recovered, and we can then add all the

199
00:16:30,329 --> 00:16:35,630
different relocations and use that as
auxiliary information on top of it. After

200
00:16:35,630 --> 00:16:43,480
these three steps, we can insert any new
code in between, and can therefore add

201
00:16:43,480 --> 00:16:47,420
different forms of instrumentations or run
some more higher-level analysis on top of

202
00:16:47,420 --> 00:16:53,940
it, and then reassemble the file for
fuzzing or coverage-guided tracking,

203
00:16:53,940 --> 00:16:59,100
address sanitization or whatever else you
want to do. I will now hand over to

204
00:16:59,100 --> 00:17:04,490
Matteo, who will cover coverage-guided
fuzzing and sanitization and then

205
00:17:04,490 --> 00:17:07,260
instrumenting the binaries in the kernel.
Go ahead.

206
00:17:07,260 --> 00:17:11,300
Nspace (Matteo Rizzo): So, now that we
have this really nice framework to rewrite

207
00:17:11,300 --> 00:17:16,500
binaries, one of the things that we want
to add to actually get the fuzzing is this

208
00:17:16,500 --> 00:17:22,960
coverage-tracking instrumentation. So
coverage-guided fuzzing is a way, a

209
00:17:22,960 --> 00:17:27,549
method, for—to let the fuzzer discover
interesting inputs, an interesting path to

210
00:17:27,549 --> 00:17:35,520
the target by itself. So the basic idea is
that the fuzzer will track coverage—the

211
00:17:35,520 --> 00:17:39,190
parts of the programs that are covered by
different inputs by inserting some kind of

212
00:17:39,190 --> 00:17:43,419
instrumentation. So, for example, here we
have this target program that checks if

213
00:17:43,419 --> 00:17:48,651
the input contains the string "PNG" at the
beginning, and if it does, then it does

214
00:17:48,651 --> 00:17:53,559
something interesting, otherwise it just
bails out and fails. So if we track the

215
00:17:53,559 --> 00:17:58,240
part of the programs that each input
executes, the fuzzer can figure out that

216
00:17:58,240 --> 00:18:03,100
an input that contains "P" will have
discovered a different path through the

217
00:18:03,100 --> 00:18:08,080
program than input that doesn't contain
it. And then so on it can, one byte at a

218
00:18:08,080 --> 00:18:13,360
time, discover that this program expects
this magic sequence "PNG" at the start of

219
00:18:13,360 --> 00:18:19,280
the input. So the way that the fuzzer does
this is that every time a new input

220
00:18:19,280 --> 00:18:23,730
discovers a new path though the target, it
is considered interesting and added to a

221
00:18:23,730 --> 00:18:28,890
corpus of interesting inputs. And every
time the fuzzer needs to generate a new

222
00:18:28,890 --> 00:18:35,610
input, it will select something from the
corpus, mutate it randomly, and then use

223
00:18:35,610 --> 00:18:39,830
it as the new input. So this is like
a—this is, like, conceptually pretty

224
00:18:39,830 --> 00:18:43,150
simple, but in practice it works really
well and it really lets the fuzzer

225
00:18:43,150 --> 00:18:47,740
discover the format that the target
expects in an unsupervised way. So as an

226
00:18:47,740 --> 00:18:53,010
example, this is an experiment that was
run by the author of AFL—AFL is the fuzzer

227
00:18:53,010 --> 00:18:58,049
that sort of popularized this
technique—where he was fuzzing a JPEG-

228
00:18:58,049 --> 00:19:02,160
parsing library, starting from a corpus
that only contained the string "hello". So

229
00:19:02,160 --> 00:19:07,650
now clearly "hello" is not a valid JPEG
image and so—but still, like, the fuzzer

230
00:19:07,650 --> 00:19:12,070
was still able to find—to discover the
correct format. So after a while it

231
00:19:12,070 --> 00:19:17,580
started generating some grayscale images,
on the top left, and as it generated more

232
00:19:17,580 --> 00:19:20,720
and more inputs, it started generating
more interesting images, such as some

233
00:19:20,720 --> 00:19:25,120
grayscale gradients, and later on even
some color images. So as you can see, this

234
00:19:25,120 --> 00:19:30,630
really works, and it allows us to fuzz a
program without really teaching the fuzzer

235
00:19:30,630 --> 00:19:34,600
how the input should look like. So that's
it for coverage-guided fuzzing. Now we'll

236
00:19:34,600 --> 00:19:38,190
talk a bit about sanitizations. As a
reminder, the core idea behind

237
00:19:38,190 --> 00:19:42,330
sanitization is that just looking for
crashes is likely to miss some of the

238
00:19:42,330 --> 00:19:45,919
bugs. So, for example, if you have this
out-of-bounds one-byte read, that will

239
00:19:45,919 --> 00:19:49,590
probably not crash the target, but you
would still like to catch it because it

240
00:19:49,590 --> 00:19:53,080
could be used for an info leak, for
example. So one of the most popular

241
00:19:53,080 --> 00:19:59,030
sanitizers is Address Sanitizer. So
Address Sanitizer will instrument all the

242
00:19:59,030 --> 00:20:04,630
memory accesses in your program and check
for memory corruption, which—so, memory

243
00:20:04,630 --> 00:20:08,809
corruption is a pretty dangerous class of
bugs that unfortunately still plagues C

244
00:20:08,809 --> 00:20:16,770
and C++ programs and unsafe languages in
general. And ASan tries to catch it by

245
00:20:16,770 --> 00:20:21,220
instrumenting the target. It is very
popular; it has been used to find

246
00:20:21,220 --> 00:20:26,900
thousands of bugs in complex software like
Chrome and Linux, and even though it has,

247
00:20:26,900 --> 00:20:31,500
like, a bit of a slowdown—like about 2x—it
is still really popular because it lets

248
00:20:31,500 --> 00:20:37,120
you find many, many more bugs. So how does
it work? The basic idea is that ASan will

249
00:20:37,120 --> 00:20:41,790
insert some special regions of memory
called 'red zones' around every object in

250
00:20:41,790 --> 00:20:47,270
memory. So we have a small example here
where we declare a 4-byte array on the

251
00:20:47,270 --> 00:20:53,700
stack. So ASan will allocate the array
"buf" and then add a red zone before it

252
00:20:53,700 --> 00:20:59,060
and a red zone after it. Whenever the
program accesses the red zones, it is

253
00:20:59,060 --> 00:21:02,660
terminated with a security violation. So
the instrumentation just prints a bug

254
00:21:02,660 --> 00:21:07,419
report and then crashes the target. This
is very useful for detecting, for example,

255
00:21:07,419 --> 00:21:11,400
buffer overflows or underflows and many
other kinds of bugs such as use-after-free

256
00:21:11,400 --> 00:21:16,230
and so on. So, as an example here, we are
trying to copy 5 bytes into a 4-byte

257
00:21:16,230 --> 00:21:22,580
buffer, and ASan will check each of the
accesses one by one. And when it sees that

258
00:21:22,580 --> 00:21:26,810
the last byte writes to a red zone, it
detects the violation and crashes the

259
00:21:26,810 --> 00:21:32,370
program. So this is good for us because
this bug might have not been found by

260
00:21:32,370 --> 00:21:36,120
simply looking for crashes, but it's
definitely found if we use ASan. So this

261
00:21:36,120 --> 00:21:40,750
is something we want for fuzzing. So now
that we've covered—briefly covered ASan we

262
00:21:40,750 --> 00:21:45,970
can talk about instrumenting binaries in
the kernel. So Mathias left us with

263
00:21:45,970 --> 00:21:52,580
RetroWrite, and with RetroWrite we can add
both coverage tracking and ASan to

264
00:21:52,580 --> 00:21:57,410
binaries. So the simple—it's a really
simple idea: now that we can rewrite this

265
00:21:57,410 --> 00:22:02,760
binary and add instructions wherever we
want, we can implement both coverage

266
00:22:02,760 --> 00:22:07,390
tracking and ASan. In order to implement
coverage tracking, we simply have to

267
00:22:07,390 --> 00:22:11,710
identify the start of every basic block
and add a little piece of instrumentation

268
00:22:11,710 --> 00:22:15,789
at the start of the basic block that tells
the fuzzer 'hey, we've reached this part

269
00:22:15,789 --> 00:22:19,400
of the program'—'hey, we've reached this
other part of the program'. Then the

270
00:22:19,400 --> 00:22:25,039
fuzzer can figure out whether that's a new
part or not. ASan is also, like, you know,

271
00:22:25,039 --> 00:22:29,240
it's also somewhat—it can also be
implemented in this way by finding all

272
00:22:29,240 --> 00:22:33,929
memory accesses, and then linking with
libASan. libASan is a sort of runtime for

273
00:22:33,929 --> 00:22:38,820
ASan that takes care of inserting the red
zones and instrument—and adding, you know,

274
00:22:38,820 --> 00:22:43,340
like, keeping around all the metadata that
ASan needs to know where the red zones

275
00:22:43,340 --> 00:22:48,419
are, and detecting whether a memory access
is invalid. So, how can we apply all of

276
00:22:48,419 --> 00:22:52,309
this in the kernel? Well, first of all,
fuzzing the kernel is not as easy as

277
00:22:52,309 --> 00:22:57,920
fuzzing some userspace program. There's
some issues here. So first of all, there's

278
00:22:57,920 --> 00:23:01,950
crash handling. So whenever you're fuzzing
a userspace program, you expect crashes,

279
00:23:01,950 --> 00:23:06,289
well, because that's what we're after. And
if a userspace program crashes, then the

280
00:23:06,289 --> 00:23:11,410
US simply terminates the crash gracefully.
And so the fuzzer can detect this, and

281
00:23:11,410 --> 00:23:16,270
save the input as a crashing input, and so
on. And this is all fine. But when you're

282
00:23:16,270 --> 00:23:19,470
fuzzing the kernel, so—if you were fuzzing
the kernel of the machine that you were

283
00:23:19,470 --> 00:23:23,040
using for fuzzing, after a while, the
machine would just go down. Because, after

284
00:23:23,040 --> 00:23:27,180
all, the kernel runs the machine, and if
it starts misbehaving, then all of it can

285
00:23:27,180 --> 00:23:31,720
go wrong. And more importantly, you can
lose your crashes, because the if the

286
00:23:31,720 --> 00:23:35,450
machine crashes, then the state of the
fuzzer is lost and you have no idea what

287
00:23:35,450 --> 00:23:39,590
your crashing input was. So what most
kernel fuzzers have to do is that they

288
00:23:39,590 --> 00:23:43,419
resort to some kind of VM to keep the
system stable. So they fuzz the kernel in

289
00:23:43,419 --> 00:23:48,500
a VM and then run the fuzzing agent
outside the VM. On top of that is tooling.

290
00:23:48,500 --> 00:23:52,710
So, if you want to fuzz a user space
program, you can just download AFL or use

291
00:23:52,710 --> 00:23:57,540
libfuzzer; there's plenty of tutorials
online, it's really easy to set up and

292
00:23:57,540 --> 00:24:01,200
just, like—compile your program, you start
fuzzing and you're good to go. If you want

293
00:24:01,200 --> 00:24:05,240
to fuzz the kernel, it's already much more
complicated. So, for example, if you want

294
00:24:05,240 --> 00:24:09,390
to fuzz Linux with, say, syzkaller, which
is a popular kernel fuzzer, you have to

295
00:24:09,390 --> 00:24:14,030
compile the kernel, you have to use a
special config that supports syzkaller,

296
00:24:14,030 --> 00:24:20,100
you have way less guides available than
for userspace fuzzing, and in general it's

297
00:24:20,100 --> 00:24:24,940
just much more complex and less intuitive
than just fuzzing userspace. And lastly,

298
00:24:24,940 --> 00:24:29,330
we have the issue of determinism. So in
general, if you have a single threaded

299
00:24:29,330 --> 00:24:32,770
userspace program, unless it uses some
kind of random number generator, it is

300
00:24:32,770 --> 00:24:38,210
more or less deterministic. There's
nothing that affects the execution of the

301
00:24:38,210 --> 00:24:42,299
program. But—and this is really nice if
you want to try to reproduce a test case,

302
00:24:42,299 --> 00:24:46,340
because if you have a non-deterministic
test case, then it's really hard to know

303
00:24:46,340 --> 00:24:50,680
whether this is really a crash or if it's
just something that you should ignore, and

304
00:24:50,680 --> 00:24:56,280
in the kernel this is even harder, because
you don't only have concurrency, like

305
00:24:56,280 --> 00:25:01,200
multi-processing, you also have interrupts.
So interrupts can happen at any time, and

306
00:25:01,200 --> 00:25:05,850
if one time you got an interrupt while
executing your test case and the second

307
00:25:05,850 --> 00:25:09,947
time you didn't, then maybe it only
crashes one time - you don't really know,

308
00:25:09,947 --> 00:25:15,910
it's not pretty. And so again, we
have several approaches to fuzzing

309
00:25:15,910 --> 00:25:20,550
binaries in the kernel. First one is to do
black box fuzzing. We don't really

310
00:25:20,550 --> 00:25:23,677
like this because it doesn't find much,
especially in something complex

311
00:25:23,677 --> 00:25:27,380
like a kernel. Approach 1 is to
use dynamic translation,

312
00:25:27,380 --> 00:25:32,620
so, use something
like QEMU or—you name it. This works, and

313
00:25:32,620 --> 00:25:35,121
people have used it successfully; the
problem is that it is really, really,

314
00:25:35,121 --> 00:25:41,500
really slow. Like, we're talking about
10x-plus overhead. And as we said before,

315
00:25:41,500 --> 00:25:45,570
the more iterations, the more test cases
you can execute in the same amount of

316
00:25:45,570 --> 00:25:50,700
time, the better, because you find more
bugs. And on top of that, there's no

317
00:25:50,700 --> 00:25:57,520
currently available sanitizer for
kernel binaries that works—is based on

318
00:25:57,520 --> 00:26:01,309
this approach. So in userspace you have
something like valgrind; in the kernel,

319
00:26:01,309 --> 00:26:05,071
you don't have anything, at least that we
know of. There is another approach, which

320
00:26:05,071 --> 00:26:09,951
is to use Intel Processor Trace. This has
been, like—there's been some research

321
00:26:09,951 --> 00:26:14,240
papers on this recently, and this is nice
because it allows you to collect coverage

322
00:26:14,240 --> 00:26:18,040
at nearly zero overhead. It's, like,
really fast, but the problem is that it

323
00:26:18,040 --> 00:26:23,020
requires hardware support, so it requires
a fairly new x86 CPU, and if you want to

324
00:26:23,020 --> 00:26:27,159
fuzz something on ARM, say, like, your
Android driver, or if you want to use an

325
00:26:27,159 --> 00:26:32,120
older CPU, then you're out of luck. And
what's worse, you cannot really use it for

326
00:26:32,120 --> 00:26:36,490
sanitization, or at least not the kind of
sanitization that ASan does, because it

327
00:26:36,490 --> 00:26:41,770
just traces the execution; it doesn't
allow you to do checks on memory accesses.

328
00:26:41,770 --> 00:26:47,350
So Approach 3, which is what we will use,
is static rewriting. So, we had this very

329
00:26:47,350 --> 00:26:50,750
nice framework for rewriting userspace
binaries, and then we asked ourselves, can

330
00:26:50,750 --> 00:26:56,659
we make this work in the kernel? So we
took the system, the original RetroWrite,

331
00:26:56,659 --> 00:27:02,650
we modified it, we implemented support for
Linux modules, and... it works! So we have

332
00:27:02,650 --> 00:27:08,110
implemented it—we have used it to fuzz
some kernel modules, and it really shows

333
00:27:08,110 --> 00:27:11,640
that this approach doesn't only work for
userspace; it can also be applied to the

334
00:27:11,640 --> 00:27:18,510
kernel. So as for some implementation, the
nice thing about kernel modules is that

335
00:27:18,510 --> 00:27:22,170
they're always position independent. So
you cannot have position—like, fixed-

336
00:27:22,170 --> 00:27:26,370
position kernel modules because Linux just
doesn't allow it. So we sort of get that

337
00:27:26,370 --> 00:27:32,220
for free, which is nice. And Linux modules
are also a special class of ELF files,

338
00:27:32,220 --> 00:27:35,890
which means that the format is—even though
it's not the same as userspace binaries,

339
00:27:35,890 --> 00:27:40,310
it's still somewhat similar, so we didn't
have to change the symbolizer that much,

340
00:27:40,310 --> 00:27:46,539
which is also nice. And we implemented
symbolization with this, and we used it to

341
00:27:46,539 --> 00:27:54,490
implement both code coverage and binary
ASan for kernel binary modules. So for

342
00:27:54,490 --> 00:27:59,039
coverage: The idea behind the whole
RetroWrite project was that we wanted to

343
00:27:59,039 --> 00:28:03,500
integrate with existing tools. So existing
fuzzing tools. We didn't want to force our

344
00:28:03,500 --> 00:28:08,770
users to write their own fuzzer that is
compatible with RetroWrite. So for—in

345
00:28:08,770 --> 00:28:13,470
userspace we had AFL-style coverage
tracking, and binary ASan which is

346
00:28:13,470 --> 00:28:16,490
compatible with source-based ASan, and we
wanted to follow the same principle in the

347
00:28:16,490 --> 00:28:21,900
kernel. So it turns out that Linux has
this built-in coverage-tracking framework

348
00:28:21,900 --> 00:28:26,529
called kCov that is used by several
popular kernel fuzzers like syzkaller, and

349
00:28:26,529 --> 00:28:31,049
we wanted to use it ourselves. So we
designed our coverage instrumentation so

350
00:28:31,049 --> 00:28:36,590
that it integrates with kCov. The downside
is that you need to compile the kernel

351
00:28:36,590 --> 00:28:40,690
with kCov, but then again, Linux is open
source, so you can sort of always do that;

352
00:28:40,690 --> 00:28:44,279
the kernel usually—it's usually not the
kernel, it is a binary blob, but it's

353
00:28:44,279 --> 00:28:48,929
usually only the modules. So that's just
still fine. And the way you do this is—the

354
00:28:48,929 --> 00:28:53,370
way you implement kCov for binary modules
is that you just have to find the start of

355
00:28:53,370 --> 00:28:58,539
every basic block, and add a call to some
function that then stores the collected

356
00:28:58,539 --> 00:29:02,530
coverage. So here's an example: we have a
short snippet of code with three basic

357
00:29:02,530 --> 00:29:07,620
blocks, and all we have to do is add a
call to "trace_pc" to the start of the

358
00:29:07,620 --> 00:29:11,940
basic block. "trace_pc" is a function that
is part of the main kernel image that then

359
00:29:11,940 --> 00:29:17,230
collects this coverage and makes it
available to a userspace fuzzing agent. So

360
00:29:17,230 --> 00:29:21,210
this is all really easy and it works. And
let's now see how we implemented binary

361
00:29:21,210 --> 00:29:25,600
ASan. So as I mentioned before, when we
instrument the program with binary ASan in

362
00:29:25,600 --> 00:29:29,690
userspace we link with libASan, which
takes care of setting up the metadata,

363
00:29:29,690 --> 00:29:33,880
takes care of putting the red zones around
our allocations, and so on. So we had to

364
00:29:33,880 --> 00:29:37,330
do something similar in the kernel; of
course, you cannot link with libASan in

365
00:29:37,330 --> 00:29:42,630
the kernel, because that doesn't work, but
what we can do instead is, again, compile

366
00:29:42,630 --> 00:29:47,240
the kernel with kASan support. So this
instruments the allocator, kmalloc, to add

367
00:29:47,240 --> 00:29:52,110
the red zones; it allocates space for the
metadata, it keeps this metadata around,

368
00:29:52,110 --> 00:29:56,279
does this all for us, which is really
nice. And again, the big advantage of

369
00:29:56,279 --> 00:30:00,580
using this approach is that we can
integrate seamlessly with a kASan-

370
00:30:00,580 --> 00:30:05,800
instrumented kernel and with fuzzers that
rely on kASan such as syzkaller. So we see

371
00:30:05,800 --> 00:30:11,500
this as more of a plus than, like, a
limitation. And how do you implement ASan?

372
00:30:11,500 --> 00:30:16,561
Well, you have to find every memory access
and instrument it to check the—to check

373
00:30:16,561 --> 00:30:22,370
whether this is accessing a red zone. And
if it does then you just call this bug

374
00:30:22,370 --> 00:30:26,010
report function that produces a stack
trace, a bug report, and crashes the

375
00:30:26,010 --> 00:30:29,649
kernel, so that the fuzzer can detect it.
Again, this is compatible with source-

376
00:30:29,649 --> 00:30:36,990
based kASan, so we're happy. We can simply
load the rewritten module with added

377
00:30:36,990 --> 00:30:40,220
instrumentation into a kernel, as long as
you have compiled the kernel with the

378
00:30:40,220 --> 00:30:44,340
right flags, and we can use a standard
kernel fuzzer. Here for the—our

379
00:30:44,340 --> 00:30:49,910
evaluation, we used syzkaller, a popular
kernel fuzzer by some folks at Google, and

380
00:30:49,910 --> 00:30:55,460
it worked really well. So we've finally
reached the end of our journey, and now we

381
00:30:55,460 --> 00:31:00,470
wanted to present some experiments we did
to see if this really works. So for

382
00:31:00,470 --> 00:31:05,289
userspace, we wanted to compare the
performance of our binary ASan with

383
00:31:05,289 --> 00:31:10,360
source-based ASan and with existing
solutions that also work on binaries. So

384
00:31:10,360 --> 00:31:15,860
for userspace, you can use valgrind
memcheck. It's a memory sanitizer that is

385
00:31:15,860 --> 00:31:20,850
based on binary translation and dynamic
binary translation and works on binaries.

386
00:31:20,850 --> 00:31:25,460
We compared it with source ASan and
RetroWrite ASan on the SPEC CPU benchmark

387
00:31:25,460 --> 00:31:31,100
and saw how fast it was. And for the
kernel we decided to fuzz some file

388
00:31:31,100 --> 00:31:37,519
systems and some drivers with syzkaller
using both source-based KASan and kCov and

389
00:31:37,519 --> 00:31:44,671
kRetroWrite-based KASan and kCov. So these
are our results for userspace. So the red

390
00:31:44,671 --> 00:31:48,990
bar is valgrind. We can see that the
execution time of valgrind is the highest.

391
00:31:48,990 --> 00:31:55,892
It is really, really slow—like, 3, 10, 30x
overhead, way too slow for fuzzing. Then

392
00:31:55,892 --> 00:32:02,580
in green, we have our binary ASan, which
is, like, already a large improvement. In

393
00:32:02,580 --> 00:32:07,059
orange we have source-based ASan. And then
finally in blue we have the original code

394
00:32:07,059 --> 00:32:11,090
without any instrumentation whatsoever. So
we can see that source-based ASan has,

395
00:32:11,090 --> 00:32:16,659
like, 2x or 3x overhead, and binary ASan
is a bit higher, like, a bit less

396
00:32:16,659 --> 00:32:21,312
efficient, but still somewhat close. So
that's for userspace, and for the kernel,

397
00:32:21,312 --> 00:32:25,440
we—these are some preliminary results, so,
this is, like—I'm doing this work as part

398
00:32:25,440 --> 00:32:29,897
of my master's thesis, and so I'm still,
like, running the evaluation. Here we can

399
00:32:29,897 --> 00:32:33,419
see that the overhead is already, like, a
bit lower. So the reason for this is that

400
00:32:33,419 --> 00:32:39,690
SPEC is a pure CPU benchmark; it doesn't
interact with the system that much. And so

401
00:32:39,690 --> 00:32:44,416
any instrumentation that you add is going
to massively slow down, or, like,

402
00:32:44,416 --> 00:32:49,320
considerably slow down the execution. By
contrast, when you fuzz a file system with

403
00:32:49,320 --> 00:32:56,460
syzkaller, not only every test case has to
go from the high—the host to the guest and

404
00:32:56,460 --> 00:33:01,770
then do multiple syscalls and so on, but
also every system call has to go through

405
00:33:01,770 --> 00:33:05,368
several layers of abstraction before it
gets to the actual file system. And all

406
00:33:05,368 --> 00:33:09,610
these—like, all of this takes a lot of
time, and so in practice the overhead of

407
00:33:09,610 --> 00:33:15,581
our instrumentation seems to be pretty
reasonable. So, since we know that you

408
00:33:15,581 --> 00:33:32,838
like demos, we've prepared a small demo of
kRetroWrite. So. Let's see. Yep. Okay. All

409
00:33:32,838 --> 00:33:40,470
right, so we've prepared a small kernel
module. And this module is just, like,

410
00:33:40,470 --> 00:33:45,669
really simple; it contains a
vulnerability, and what it does is that it

411
00:33:45,669 --> 00:33:49,929
creates a character device. So if you're
not familiar with this, a character device

412
00:33:49,929 --> 00:33:55,130
is like a fake file that is exposed by a
kernel driver and that it can read to and

413
00:33:55,130 --> 00:34:01,630
write from. And instead of going to a
file, the data that you read—that you, in

414
00:34:01,630 --> 00:34:05,590
this case, write to the fake file—goes to
the driver and is handled by this demo

415
00:34:05,590 --> 00:34:10,481
write function. So as we can see, this
function allocates a buffer, a 16-byte

416
00:34:10,481 --> 00:34:14,850
buffer on the heap, and then copies some
data into it, and then it checks if the

417
00:34:14,850 --> 00:34:19,970
data contains the string "1337". If it
does, then it accesses the buffer out of

418
00:34:19,970 --> 00:34:23,446
bounds; you can see "alloc[16]" and the
buffer is sixteen bytes; this is an out-

419
00:34:23,446 --> 00:34:27,550
of-bounds read by one byte. And if it
doesn't then it just accesses the buffer

420
00:34:27,550 --> 00:34:33,050
in bounds, which is fine, and it's not a
vulnerability. So we can compile this

421
00:34:33,050 --> 00:34:47,450
driver. OK, um... OK, and then so we have
our module, and then we will instrument it

422
00:34:47,450 --> 00:35:01,495
using kRetroWrite. So, instrument... Yes,
please. OK. Right. So kRetroWrite did some

423
00:35:01,495 --> 00:35:07,329
processing, and it produced an
instrumented module with ASan or kASan and

424
00:35:07,329 --> 00:35:09,770
a symbolized assembly file. We can
actually have a look at the symbolized

425
00:35:09,770 --> 00:35:17,740
assembly file to see what it looks like.
Yes. Yes. OK. So, is this big enough?

426
00:35:17,740 --> 00:35:22,900
Yeah... As you can see, so—we can actually
see here the ASan instrumentation. Ah,

427
00:35:22,900 --> 00:35:29,329
shouldn't—yeah. So, we—this is the ASan
instrumentation. The original code loads

428
00:35:29,329 --> 00:35:33,290
some data from this address. And as you
can see, the ASan instrumentation first

429
00:35:33,290 --> 00:35:38,240
computes the actual address, and then does
some checking—basically, this is checking

430
00:35:38,240 --> 00:35:44,430
some metadata that ASan stores to check if
the address is in a red zone or not, and

431
00:35:44,430 --> 00:35:49,430
then if the fail check fails, then it
calls this ASan report which produces a

432
00:35:49,430 --> 00:35:54,829
stack trace and crashes the kernel. So
this is fine. We can actually even look at

433
00:35:54,829 --> 00:36:17,820
the disassembly of both modules, so...
object dump and then demo... Ah, nope. OK,

434
00:36:17,820 --> 00:36:21,830
so on the left, we have the original
module without any instrumentation; on the

435
00:36:21,830 --> 00:36:27,070
right, we have the module instrumented
with ASan. So as you can see, the original

436
00:36:27,070 --> 00:36:33,160
module has "push r13" and then has this
memory load here; on the right in the

437
00:36:33,160 --> 00:36:38,559
instrumented module, kRetroWrite inserted
the ASan instrumentation. So the original

438
00:36:38,559 --> 00:36:43,940
load is still down here, but between that,
between the first instruction and this

439
00:36:43,940 --> 00:36:47,851
instruction, we have—now have the kASan
instrumentation that does our check. So

440
00:36:47,851 --> 00:36:56,700
this is all fine. Now we can actually test
it and see what it does. So we can—we will

441
00:36:56,700 --> 00:37:02,210
boot a very simple, a very minimal Linux
system, and try to target the

442
00:37:02,210 --> 00:37:05,793
vulnerability first with the non-
instrumented module and then with the

443
00:37:05,793 --> 00:37:10,410
instrumented module. And we can—we will
see that in the—with the non-instrumented

444
00:37:10,410 --> 00:37:14,550
module, the kernel will not crash, but
with the instrumented module it will crash

445
00:37:14,550 --> 00:37:22,434
and produce a bug report. So. Let's see.
Yeah, this is a QEMU VM, I have no idea

446
00:37:22,434 --> 00:37:27,481
why it's taking so long to boot. I'll
blame the the demo gods not being kind to

447
00:37:27,481 --> 00:37:39,730
us. Yeah, I guess we just have to wait.
OK. So. All right, so we loaded the

448
00:37:39,730 --> 00:37:47,334
module. We will see that it has created a
fake file character device in /dev/demo.

449
00:37:47,334 --> 00:37:59,020
Yep. We can write this file. Yep. So this
will—this accesses the array in bounds,

450
00:37:59,020 --> 00:38:04,410
and so this is fine. Then what we can also
do is write "1337" to it so it will access

451
00:38:04,410 --> 00:38:08,968
the array out of bounds. So this is the
non instrumented module, so this will not

452
00:38:08,968 --> 00:38:14,050
crash. It will just print some garbage
value. Okay, that's it. Now we can load

453
00:38:14,050 --> 00:38:25,890
the instrumented module instead... and do
the same experiment again. All right. We

454
00:38:25,890 --> 00:38:31,640
can see that /dev/demo is still here. So
the module still works. Let's try to write

455
00:38:31,640 --> 00:38:38,540
"1234" into it. This, again, doesn't
crash. But when we try to write "1337",

456
00:38:38,540 --> 00:38:47,940
this will produce a bug report.
<i>applause</i>

457
00:38:47,940 --> 00:38:51,129
So this has quite a lot of information. We

458
00:38:51,129 --> 00:38:55,700
can see, like, the—where the memory was
allocated, there's a stack trace for that;

459
00:38:55,700 --> 00:39:02,150
it wasn't freed, so there's no stack trace
for the free. And we see that the cache

460
00:39:02,150 --> 00:39:06,760
size of the memory, like, it was a 16-byte
allocation. We can see the shape of the

461
00:39:06,760 --> 00:39:10,900
memory. We see that these two zeros means
that there's two 8-byte chunks of valid

462
00:39:10,900 --> 00:39:15,550
memory. And then these "fc fc fc" is
the—are the red zones that I was talking

463
00:39:15,550 --> 00:39:19,980
about before. All right, so that's it for
the demo. We will switch back to our

464
00:39:19,980 --> 00:39:24,630
presentation now. So... hope you enjoyed
it.

465
00:39:24,630 --> 00:39:30,530
gannimo: Cool. So after applying this to a
demo module, we also wanted to see what

466
00:39:30,530 --> 00:39:35,365
happens if we apply this to a real file
system. After a couple of hours we

467
00:39:35,365 --> 00:39:41,390
were—when we came back and checked on the
results, we saw a couple of issues popping

468
00:39:41,390 --> 00:39:48,720
up, including a nice set of use-after-free
reads, a set of use-after-free writes, and

469
00:39:48,720 --> 00:39:56,220
we checked the bug reports and we saw a
whole bunch of Linux kernel issues popping

470
00:39:56,220 --> 00:40:02,640
up one after the other in this nondescript
module that we fuzzed. We're in the

471
00:40:02,640 --> 00:40:06,930
process of reporting it. This will take
some time until it is fixed; that's why

472
00:40:06,930 --> 00:40:13,470
you see the blurry lines. But as you see,
there's still quite a bit of opportunity

473
00:40:13,470 --> 00:40:19,190
in the Linux kernel where you can apply
different forms of targeted fuzzing into

474
00:40:19,190 --> 00:40:26,349
different modules, leverage these modules
on top of a kASan instrumented kernel and

475
00:40:26,349 --> 00:40:31,720
then leveraging this as part of your
fuzzing toolchain to find interesting

476
00:40:31,720 --> 00:40:39,080
kernel 0days that... yeah. You can then
develop further, or report, or do whatever

477
00:40:39,080 --> 00:40:44,766
you want with them. Now, we've shown you
how you can take existing binary-only

478
00:40:44,766 --> 00:40:51,250
modules, think different binary-only
drivers, or even existing modules where

479
00:40:51,250 --> 00:40:55,800
you don't want to instrument a full set of
the Linux kernel, but only focus fuzzing

480
00:40:55,800 --> 00:41:02,130
and exploration on a small different—small
limited piece of code and then do security

481
00:41:02,130 --> 00:41:09,247
tests on those. We've shown you how we can
do coverage-based tracking and address

482
00:41:09,247 --> 00:41:13,500
sanitization. But this is also up to you
on what kind of other instrumentation you

483
00:41:13,500 --> 00:41:17,890
want. Like this is just a tool, a
framework that allows you to do arbitrary

484
00:41:17,890 --> 00:41:23,780
forms of instrumentation. So we've taken
you on a journey from instrumenting

485
00:41:23,780 --> 00:41:29,380
binaries over coverage-guided fuzzing and
sanitization to instrumenting modules in

486
00:41:29,380 --> 00:41:36,692
the kernel and then finding crashes in the
kernel. Let me wrap up the talk. So, this

487
00:41:36,692 --> 00:41:41,581
is one of the the fun pieces of work that
we do in the hexhive lab at EPFL. So if

488
00:41:41,581 --> 00:41:45,740
you're looking for postdoc opportunities
or if you're thinking about a PhD, come

489
00:41:45,740 --> 00:41:51,809
talk to us. We're always hiring. The tools
will be released as open source. A large

490
00:41:51,809 --> 00:41:57,319
chunk of the userspace work is already
open source. We're working on a set of

491
00:41:57,319 --> 00:42:02,350
additional demos and so on so that you can
get started faster, leveraging the

492
00:42:02,350 --> 00:42:07,810
different existing instrumentation that is
already out there. The userspace work is

493
00:42:07,810 --> 00:42:12,139
already available. The kernel work will be
available in a couple of weeks. This

494
00:42:12,139 --> 00:42:16,770
allows you to instrument real-world
binaries for fuzzing, leveraging existing

495
00:42:16,770 --> 00:42:21,200
transformations for coverage tracking to
enable fast and effective fuzzing and

496
00:42:21,200 --> 00:42:26,490
memory checking to detect the actual bugs
that exist there. The key takeaway from

497
00:42:26,490 --> 00:42:32,430
this talk is that RetroWrite and
kRetroWrite enables static binary

498
00:42:32,430 --> 00:42:38,300
rewriting at zero instrumentation cost. We
take the limitation of focusing only on

499
00:42:38,300 --> 00:42:43,240
position-independent code, which is not a
real implementation, but we get the

500
00:42:43,240 --> 00:42:47,800
advantage of being able to symbolize
without actually relying on heuristics, so

501
00:42:47,800 --> 00:42:55,380
we can even symbolize large, complex
source—large, complex applications and

502
00:42:55,380 --> 00:43:01,090
effectively rewrite those aspects and then
you can focus fuzzing on these parts.

503
00:43:01,090 --> 00:43:06,329
Another point I want to mention is that
this enables you to reuse existing tooling

504
00:43:06,329 --> 00:43:10,981
so you can take a binary blob, instrument
it, and then reuse, for example, Address

505
00:43:10,981 --> 00:43:15,966
Sanitizer or existing fuzzing tools, as it
integrates really, really nice. As I said,

506
00:43:15,966 --> 00:43:22,700
all the code is open source. Check it out.
Try it. Let us know if it breaks. We're

507
00:43:22,700 --> 00:43:27,521
happy to fix. We are committed to open
source. And let us know if there are any

508
00:43:27,521 --> 00:43:36,750
questions. Thank you.
<i>applause</i>

509
00:43:36,750 --> 00:43:42,250
Herald: So, thanks, guys, for an
interesting talk. We have some time for

510
00:43:42,250 --> 00:43:47,180
questions, so we have microphones along
the aisles. We'll start from question from

511
00:43:47,180 --> 00:43:51,079
microphone number two.
Q: Hi. Thanks for your talk and for the

512
00:43:51,079 --> 00:43:59,400
demo. I'm not sure about the use-case you
showed for the kernel RetroWrite. 'Cause

513
00:43:59,400 --> 00:44:05,579
you're usually interested in fuzzing
binary in kernelspace when you don't have

514
00:44:05,579 --> 00:44:13,980
source code for the kernel. For example,
for IoT or Android and so on. But you just

515
00:44:13,980 --> 00:44:22,260
reuse the kCov and kASan in the kernel,
and you never have the kernel in IoT or

516
00:44:22,260 --> 00:44:28,599
Android which is compiled with that. So
are you—do you have any plans to binary

517
00:44:28,599 --> 00:44:31,666
instrument the kernel itself, not the
modules?

518
00:44:31,666 --> 00:44:39,390
Nspace: So we thought about that. I think
that there's some additional problems that

519
00:44:39,390 --> 00:44:43,910
we would have to solve in order to be able
to instrument the full kernel. So other

520
00:44:43,910 --> 00:44:47,819
than the fact that it gives us
compatibility with, like, existing tools,

521
00:44:47,819 --> 00:44:51,720
the reason why we decided to go with
compiling the kernel with kASan and kCov

522
00:44:51,720 --> 00:44:56,757
is that building the, like—you would you
have to, like, think about it. You

523
00:44:56,757 --> 00:45:01,540
have to instrument the memory allocator to
add red zones, which is, like, already

524
00:45:01,540 --> 00:45:07,069
somewhat complex. You have to instrument
the exception handlers to catch, like, any

525
00:45:07,069 --> 00:45:12,240
faults that the instrumentation detects.
You would have to, like, set up some

526
00:45:12,240 --> 00:45:17,480
memory for the ASan shadow. So this is,
like—I think you should be able to do it,

527
00:45:17,480 --> 00:45:21,690
but it would require a lot of additional
work. So this is, like—this was like four

528
00:45:21,690 --> 00:45:25,510
months' thesis. So we decided to start
small and prove that it works in

529
00:45:25,510 --> 00:45:30,470
the kernel for modules, and then leave it
to future work to actually extend it to

530
00:45:30,470 --> 00:45:37,558
the full kernel. Also, like, I think for
Android—so in the case of Linux, the

531
00:45:37,558 --> 00:45:42,072
kernel is GPL, right, so if the
manufacturers ships a custom kernel, they

532
00:45:42,072 --> 00:45:44,614
have to release the source code, right?
Q: They never do.

533
00:45:44,614 --> 00:45:47,220
Nspace: They never—well, that's a
different issue. Right?

534
00:45:47,220 --> 00:45:49,009
gannimo: Right.
Q: So that's why I ask, because I don't

535
00:45:49,009 --> 00:45:51,839
see how it just can be used in the real
world.

536
00:45:51,839 --> 00:45:57,122
gannimo: Well, let me try to put this into
perspective a little bit as well. Right.

537
00:45:57,122 --> 00:46:02,030
So there's the—what we did so far is we
leveraged existing tools, like kASan or

538
00:46:02,030 --> 00:46:09,440
kCov, and integrated into these existing
tools. Now, doing heap-based allocation is

539
00:46:09,440 --> 00:46:13,572
fairly simple and replacing those with
additional red zones—that instrumentation

540
00:46:13,572 --> 00:46:20,203
you can carry out fairly well by focusing
on the different allocators. Second to

541
00:46:20,203 --> 00:46:24,972
that, simply oopsing the kernel and
printing the stack trace is also fairly

542
00:46:24,972 --> 00:46:29,250
straightforward. So it's not a lot of
additional effort. So it is—it involves

543
00:46:29,250 --> 00:46:38,471
some engineering effort to port this to
non-kASan-compiled kernels. But we think

544
00:46:38,471 --> 00:46:44,740
it is very feasible. In the interest of
time, we focused on kASan-enabled kernels,

545
00:46:44,740 --> 00:46:50,960
so that some form of ASan is already
enabled. But yeah, this is additional

546
00:46:50,960 --> 00:46:55,660
engineering effort. But there is also a
community out there that can help us with

547
00:46:55,660 --> 00:47:00,960
these kind of changes. So kRetroWrite and
RetroWrite themselves are the binary

548
00:47:00,960 --> 00:47:07,060
rewriting platform that allow you to turn
a binary into an assembly file that you

549
00:47:07,060 --> 00:47:11,619
can then instrument and run different
passes on top of it. So another pass would

550
00:47:11,619 --> 00:47:16,399
be a full ASan pass or kASan pass that
somebody could add and then contribute

551
00:47:16,399 --> 00:47:19,100
back to the community.
Q: Yeah, it would be really useful.

552
00:47:19,100 --> 00:47:20,186
Thanks.
gannimo: Cool.

553
00:47:20,186 --> 00:47:24,260
Angel: Next question from the Internet.
Q: Yes, there is a question regarding the

554
00:47:24,260 --> 00:47:30,890
slide on the SPEC CPU benchmark. The
second or third graph from the right had

555
00:47:30,890 --> 00:47:36,700
an instrumented version that was faster
than the original program. Why is that?

556
00:47:36,700 --> 00:47:42,299
gannimo: Cache effect. Thank you.
Angel: Microphone number one.

557
00:47:42,299 --> 00:47:47,032
Q: Thank you. Thank you for presentation.
I have question: how many architecture do

558
00:47:47,032 --> 00:47:51,210
you support, and if you have support more,
what then?

559
00:47:51,210 --> 00:47:56,400
gannimo: x86_64.
Q: Okay. So no plans for ARM or MIPS,

560
00:47:56,400 --> 00:47:58,130
or...?
gannimo: Oh, there are plans.

561
00:47:58,130 --> 00:48:01,390
Q: Okay.
Nspace: Right, so—

562
00:48:01,390 --> 00:48:05,980
gannimo: Right. Again, there's a finite
amount of time. We focused on the

563
00:48:05,980 --> 00:48:11,778
technology. ARM is high up on the list. If
somebody is interested in working on it

564
00:48:11,778 --> 00:48:17,670
and contributing, we're happy to hear from
it. Our list of targets is ARM first and

565
00:48:17,670 --> 00:48:22,915
then maybe something else. But I think
with x86_64 and ARM we've covered a

566
00:48:22,915 --> 00:48:33,420
majority of the interesting platforms.
Q: And second question, did you try to

567
00:48:33,420 --> 00:48:37,970
fuzz any real closed-source program?
Because as I understand from presentation,

568
00:48:37,970 --> 00:48:44,710
you fuzz, like, just file system, what we
can compile and fuzz with syzkaller like

569
00:48:44,710 --> 00:48:48,570
in the past.
Nspace: So for the evaluation, we wanted

570
00:48:48,570 --> 00:48:52,130
to be able to compare between the source-
based instrumentation and the binary-based

571
00:48:52,130 --> 00:48:57,460
instrumentation, so we focused mostly on
open-source filesystem and drivers because

572
00:48:57,460 --> 00:49:02,058
then we could instrument them with a
compiler. We haven't yet tried, but this

573
00:49:02,058 --> 00:49:05,740
is, like, also pretty high up on the list.
We wanted to try to find some closed-

574
00:49:05,740 --> 00:49:10,609
source drivers—there's lots of them, like
for GPUs or anything—and we'll give it a

575
00:49:10,609 --> 00:49:15,460
try and find some 0days, perhaps.
Q: Yes, but with syzkaller, you still have

576
00:49:15,460 --> 00:49:22,582
a problem. You have to write rules, like,
dictionaries. I mean, you have to

577
00:49:22,582 --> 00:49:24,599
understand the format, have to communicate
with the driver.

578
00:49:24,599 --> 00:49:28,550
Nspace: Yeah, right But there's, for
example, closed-source file systems that

579
00:49:28,550 --> 00:49:33,270
we are looking at.
Q: Okay. Thinking.

580
00:49:33,270 --> 00:49:38,657
Herald: Number two.
Q: Hi. Thank you for your talk. So I don't

581
00:49:38,657 --> 00:49:45,070
know if there are any kCov- or kASan-
equivalent solution to Windows, but I was

582
00:49:45,070 --> 00:49:49,933
wondering if you tried, or are you
planning to do it on Windows, the

583
00:49:49,933 --> 00:49:52,540
framework? Because I know it might be
challenging because of the driver

584
00:49:52,540 --> 00:49:56,849
signature enforcement and PatchGuard, but
I wondered if you tried or thought about

585
00:49:56,849 --> 00:49:59,290
it.
gannimo: Yes, we thought about it and we

586
00:49:59,290 --> 00:50:06,383
decided against it. Windows is incredibly
hard and we are academics. The research I

587
00:50:06,383 --> 00:50:11,800
do in my lab, or we do in my research lab,
focuses on predominantly open-source

588
00:50:11,800 --> 00:50:17,060
software and empowers open-source
software. Doing full support for Microsoft

589
00:50:17,060 --> 00:50:20,780
Windows is somewhat out of scope. If
somebody wants to port these tools, we are

590
00:50:20,780 --> 00:50:24,190
happy to hear it and work with these
people. But it's a lot of additional

591
00:50:24,190 --> 00:50:28,530
engineering effort, versus very
additional—very low additional research

592
00:50:28,530 --> 00:50:33,060
value, so we'll have to find some form of
compromise. And, like, if you would be

593
00:50:33,060 --> 00:50:38,650
willing to fund us, we would go ahead. But
it's—yeah, it's a cost question.

594
00:50:38,650 --> 00:50:42,089
Q: And you're referring both to kernel and
user space, right?

595
00:50:42,089 --> 00:50:45,089
gannimo: Yeah.
Q: Okay. Thank you.

596
00:50:45,089 --> 00:50:48,105
Herald: Number five.
Q: Hi, thanks for the talk. This seems

597
00:50:48,105 --> 00:50:52,400
most interesting if you're looking for
vulnerabilities in closed source kernel

598
00:50:52,400 --> 00:50:58,359
modules, but not giving it too much
thought, it seems it's really trivial to

599
00:50:58,359 --> 00:51:01,920
prevent this if you're writing a closed
source module.

600
00:51:01,920 --> 00:51:07,130
gannimo: Well, how would you prevent this?
Q: Well, for starters, you would just take

601
00:51:07,130 --> 00:51:11,492
a difference between the address of two
functions. That's not gonna be IP

602
00:51:11,492 --> 00:51:15,860
relative, so...
Nspace: Right. So we explicitly—like, even

603
00:51:15,860 --> 00:51:21,589
in the original RetroWrite paper—we
explicitly decided to not try to deal with

604
00:51:21,589 --> 00:51:25,777
obfuscated code, or code that is
purposefully trying to defeat this kind of

605
00:51:25,777 --> 00:51:30,510
rewriting. Because, like, the assumption
is that first of all, there are techniques

606
00:51:30,510 --> 00:51:34,099
to, like, deobfuscate code or remove
these, like, checks in some way, but this

607
00:51:34,099 --> 00:51:39,510
is, like, sort of orthogonal work. And at
the same time, I guess most drivers are

608
00:51:39,510 --> 00:51:43,980
not really compiled with the sort of
obfuscation; they're just, like, you know,

609
00:51:43,980 --> 00:51:47,657
they're compiled with a regular compiler.
But yeah, of course, this is, like, a

610
00:51:47,657 --> 00:51:50,070
limitation.
gannimo: They're likely stripped, but not

611
00:51:50,070 --> 00:51:54,281
necessarily obfuscated. At least from what
we've seen when we looked at binary-only

612
00:51:54,281 --> 00:51:58,980
drivers.
Herald: Microphone number two.

613
00:51:58,980 --> 00:52:04,350
Q: How do you decide where to place the
red zones? From what I heard, you talked

614
00:52:04,350 --> 00:52:10,030
about instrumenting the allocators, but,
well, there are a lot of variables on the

615
00:52:10,030 --> 00:52:13,270
stack, so how do you deal with those?
gannimo: Oh, yeah, that's actually super

616
00:52:13,270 --> 00:52:20,159
cool. I refer to some extent to the paper
that is on the GitHub repo as well. If you

617
00:52:20,159 --> 00:52:26,778
think about it, modern compilers use
canaries for buffers. Are you aware of

618
00:52:26,778 --> 00:52:31,150
stack canaries—how stack canaries work?
So, stack canaries—like, if the compiler

619
00:52:31,150 --> 00:52:34,440
sees there's a buffer that may be
overflown, it places a stack canary

620
00:52:34,440 --> 00:52:39,740
between the buffer and any other data.
What we use is we—as part of our analysis

621
00:52:39,740 --> 00:52:44,750
tool, we find these stack canaries, remove
the code that does the stack canary, and

622
00:52:44,750 --> 00:52:49,420
use this space to place our red zones. So
we actually hack the stack in areas,

623
00:52:49,420 --> 00:52:54,569
remove that code, and add ASan red zones
into the empty stack canaries that are now

624
00:52:54,569 --> 00:52:58,599
there. It's actually a super cool
optimization because we piggyback on what

625
00:52:58,599 --> 00:53:02,630
kind of work the compiler already did for
us before, and we can then leverage that

626
00:53:02,630 --> 00:53:06,780
to gain additional benefits and protect
the stack as well.

627
00:53:06,780 --> 00:53:11,120
Q: Thanks.
Angel: Another question from the Internet.

628
00:53:16,039 --> 00:53:20,920
Q: Yes. Did you consider lifting the
binary code to LLVM IR instead of

629
00:53:20,920 --> 00:53:28,370
generating assembler source?
gannimo: Yes. <i>laughter</i> But, so—a little

630
00:53:28,370 --> 00:53:32,060
bit longer answer. Yes, we did consider
that. Yes, it would be super nice to lift

631
00:53:32,060 --> 00:53:38,710
to LLVM IR. We've actually looked into
this. It's incredibly hard. It's

632
00:53:38,710 --> 00:53:42,270
incredibly complex. There's no direct
mapping between the machine code

633
00:53:42,270 --> 00:53:48,490
equivalent and the LLVM IR. You would
still need to recover all the types. So

634
00:53:48,490 --> 00:53:51,800
it's like this magic dream that you
recover full LLVM IR, then do heavyweight

635
00:53:51,800 --> 00:53:57,470
transformations on top of it. But this is
incredibly hard because if you compile

636
00:53:57,470 --> 00:54:03,570
down from LLVM IR to machine code, you
lose a massive amount of information. You

637
00:54:03,570 --> 00:54:07,150
would have to find a way to recover all of
that information, which is pretty much

638
00:54:07,150 --> 00:54:14,990
impossible and undecidable for many cases.
So for example, just as a note, we only

639
00:54:14,990 --> 00:54:19,420
recover control flow and we only
desymbolize control flow. For data

640
00:54:19,420 --> 00:54:23,030
references—we don't support
instrumentation of data references yet

641
00:54:23,030 --> 00:54:28,839
because there's still an undecidable
problem that we are facing with. I can

642
00:54:28,839 --> 00:54:32,859
talk more about this offline, or there is
a note in the paper as well. So this is

643
00:54:32,859 --> 00:54:37,270
just a small problem. Only if you're
lifting to assembly files. If you're

644
00:54:37,270 --> 00:54:41,700
lifting to LLVM IR, you would have to do
full end-to-end type recovery, which is

645
00:54:41,700 --> 00:54:46,400
massively more complicated. Yes, it would
be super nice. Unfortunately, it is

646
00:54:46,400 --> 00:54:50,530
undecidable and really, really hard. So
you can come up with some heuristics, but

647
00:54:50,530 --> 00:54:55,270
there is no solution that will do this
in—that will be correct 100 percent of the

648
00:54:55,270 --> 00:54:57,490
time.
Angel: We'll take one more question from

649
00:54:57,490 --> 00:55:02,609
microphone number six.
Q: Thank you for your talk. What kind of

650
00:55:02,609 --> 00:55:07,299
disassemblers did you use for RetroWrite,
and did you have problems with the wrong

651
00:55:07,299 --> 00:55:12,880
disassembly? And if so, how did you handle
it?

652
00:55:12,880 --> 00:55:18,790
Nspace: So, RetroWrite—so we used
Capstone for the disassembly.

653
00:55:18,790 --> 00:55:24,150
gannimo: An amazing tool, by the way.
Nspace: Yeah. So the idea is that, like,

654
00:55:24,150 --> 00:55:30,240
we need some kind of—some information
about where the functions are. So for the

655
00:55:30,240 --> 00:55:33,549
kernel modules, this is actually fine
because kernel modules come with this sort

656
00:55:33,549 --> 00:55:37,730
of information because the kernel needs
it, to build stack traces, for example.

657
00:55:37,730 --> 00:55:41,869
For userspace binaries, this is somewhat
less common, but you can use another tool

658
00:55:41,869 --> 00:55:46,170
to try to do function identification. And
we do, like—sort of, like, disassemble the

659
00:55:46,170 --> 00:55:54,500
entire function. So we have run into some
issues with, like—in AT&T syntax, because

660
00:55:54,500 --> 00:55:59,650
like we wanted to use gas, GNU's
assembler, for, for...

661
00:55:59,650 --> 00:56:04,240
gannimo: Reassembling.
Nspace: Reassembly, yeah. And some

662
00:56:04,240 --> 00:56:09,819
instructions are a lot—you can express the
same, like, two different instructions,

663
00:56:09,819 --> 00:56:15,670
like five-byte NOP and six-byte NOP, using
the same string of, like, text—a mnemonic,

664
00:56:15,670 --> 00:56:19,970
an operand string. But the problem is
that, like, the kernel doesn't like it and

665
00:56:19,970 --> 00:56:21,970
crashes. This took me like two days to
debug.

666
00:56:21,970 --> 00:56:27,640
gannimo: So the kernel uses dynamic binary
patching when it runs, at runtime, and it

667
00:56:27,640 --> 00:56:32,980
uses fixed offsets, so if you replace a
five-byte NOP with a six-byte NOP or vice

668
00:56:32,980 --> 00:56:37,830
versa, your offsets change and your kernel
just blows up in your face.

669
00:56:37,830 --> 00:56:43,099
Q: So it was kind of a case-on-case basis
where you saw the errors coming out of the

670
00:56:43,099 --> 00:56:47,920
disassembly and you had to fix it?
Nspace: So sorry, can you repeat the

671
00:56:47,920 --> 00:56:51,030
question?
Q: Like, for example, if you—if some

672
00:56:51,030 --> 00:56:54,910
instruction is not supported by the
disassembler, so you saw that it crashed,

673
00:56:54,910 --> 00:56:58,000
that there's something wrong, and then you
fix it by hand?

674
00:56:58,000 --> 00:57:02,940
Nspace: Yeah, well, if we saw that there
was a problem with it, this—like, I don't

675
00:57:02,940 --> 00:57:06,960
recall having any unknown instructions in
the dissasembler. I don't think I've ever

676
00:57:06,960 --> 00:57:11,290
had a problem with that. But yeah, this
was a lot of, like, you know, engineering

677
00:57:11,290 --> 00:57:14,290
work.
gannimo: So let me repeat. The problem was

678
00:57:14,290 --> 00:57:19,220
not a bug in the disassembler, but an
issue with the instruction format—that the

679
00:57:19,220 --> 00:57:24,530
same mnemonic can be translated into two
different instructions, one of which was

680
00:57:24,530 --> 00:57:29,089
five bytes long, the other one was six
bytes long. Both used the exact same

681
00:57:29,089 --> 00:57:32,880
mnemonic. Right, so this was an issue with
assembly encoding.

682
00:57:32,880 --> 00:57:38,290
Q: But you had no problems with
unsupported instructions which couldn't be

683
00:57:38,290 --> 00:57:41,339
disassembled?
Nspace: No, no. Not as far as I know, at

684
00:57:41,339 --> 00:57:43,339
least.
Angel: We have one more minute, so a very

685
00:57:43,339 --> 00:57:52,069
short question from microphone number two.
Q: Does it work? Ah. Is your binary

686
00:57:52,069 --> 00:58:02,020
instrumentation equally powerful as kernel
address space... I mean, kASan? So, does

687
00:58:02,020 --> 00:58:06,349
it detect all the memory corruptions on
stack, heap and globals?

688
00:58:06,349 --> 00:58:13,050
gannimo: No globals. But heap—it does all
of them on the heap. There's some slight

689
00:58:13,050 --> 00:58:20,150
variation on the stack because we have to
piggyback on the canary stuff. As I

690
00:58:20,150 --> 00:58:23,880
mentioned quickly before, there is no
reflowing and full recovery of data

691
00:58:23,880 --> 00:58:28,990
layouts. So to get anything on the stack,
we have to piggyback on existing compiler

692
00:58:28,990 --> 00:58:36,650
extensions like stack canaries. But—so we
don't support intra-object overflows on

693
00:58:36,650 --> 00:58:40,631
the stack. But we do leverage the stack in
areas to get some stack benefits, which

694
00:58:40,631 --> 00:58:45,490
is, I don't know, 90, 95 percent there
because the stack canaries are pretty

695
00:58:45,490 --> 00:58:51,319
good. For heap, we get the same precision.
For globals, we have very limited support.

696
00:58:51,319 --> 00:58:54,290
Q: Thanks.
Angel: So that's all the time we have for

697
00:58:54,290 --> 00:58:57,600
this talk. You can find the speakers, I
think, afterwards offline. Please give

698
00:58:57,600 --> 00:58:59,820
them a big round of applause for an
interesting talk.

699
00:58:59,820 --> 00:59:03,050
<i>applause</i>

700
00:59:03,050 --> 00:59:07,360
<i>36c3 postrol music</i>

701
00:59:07,360 --> 00:59:29,000
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!