36c3 preroll music Herald: So, the next talk for this afternoon is about high-speed binary fuzzing. We have two researchers that will be presenting the product of their latest work, which is a framework for static binary rewriting. Our speakers are—the first one is a computer science master's student at EPFL and the second one is a security researcher and assistant professor at EPFL. Please give a big round of applause to Nspace and gannimo. applause gannimo (Mathias Payer): Thanks for the introduction. It's a pleasure to be here, as always. We're going to talk about different ways to speed up your fuzzing and to find different kinds of vulnerabilities or to tweak your binaries in somewhat unintended ways. I'm Mathias Payer, or I go by gannimo on Twitter, and I am an assistant professor at EPFL working on different forms of software security: fuzzing, sanitization, but also different kinds of mitigations. And Matteo over there is working on his master's thesis on different forms of binary rewriting for the kernel. And today we're going to take you on a journey on how to actually develop very fast and very efficient binary rewriting mechanisms that allow you to do unintended modifications to binaries and allow you to explore different kinds of unintended features in binaries. So, about this talk. What we discovered, or the reason why we set out on this journey, was that fuzzing binaries is really, really hard. There are very few tools in user space. It's extremely hard to set up, and it's extremely hard to set up in a performant way. The setup is complex. You have to compile different tools. You have to modify them. And the results are not really that satisfactory. As soon as you move to the kernel, fuzzing binaries in the kernel is even harder. There's no tooling whatsoever, there are very few users actually working with binary code in the kernel or modifying binary code, and it's just a nightmare to work with. So what we are presenting today is a new approach that allows you to instrument any form of modern binary code based on static rewriting, which gives you full native performance. You only pay for the instrumentation that you add, and you can do very heavyweight transformations on top of it. The picture, if you look at a modern system, let's say we are looking at a modern setup: let's say you're looking at cat pictures in your browser. Chrome plus the kernel plus the libc plus the graphical user interface together clock in at about 100 million lines of code. Instrumenting all of this for some form of security analysis is a nightmare, especially along this large stack of software. There are quite a few different compilers involved. There are different linkers. It may be compiled on a different system, with different settings, and so on. And then getting your instrumentation across all of this is pretty much impossible and extremely hard to work with. And we want to enable you to select those different parts that you're actually interested in, modify those, and then focus your fuzzing or analysis approaches on those small subsets of the code, giving you a much better and stronger capability to test the systems, or those parts of the system, that you're really, really interested in. Who's worked on fuzzing before? Quick show of hands. Wow, that's a bunch of you. Do you use AFL? Yeah, most of you, AFL. Libfuzzer? Cool, about 10, 15 percent libfuzzer, 30 percent fuzzing, and AFL.
There's quite good knowledge of fuzzing here, so I'm not going to spend too much time on fuzzing, but for those that haven't really run their fuzzing campaigns yet, it's a very simple software testing technique. You're effectively taking a binary, let's say Chrome, as a target and you're running this in some form of execution environment. And fuzzing then consists of some form of input generation that creates new test cases, throws them at your program, and checks what is happening with your program. Either everything is OK—your code is being executed, the program terminates, everything is fine—or you have a bug report. If you have a bug report, you can use this: find the vulnerability, maybe develop a PoC, and then come up with some form of exploit or patch or anything else. Right. So this is pretty much fuzzing in a nutshell. How do you get fuzzing to be effective? How can you cover large source bases, complex code, and complex environments? Well, there are a couple of simple steps that you can take. And let's walk quickly through effective fuzzing 101. Well, first, you want to be able to create test cases that actually trigger bugs. And this is a very, very complicated part. We need to have some notion of the inputs that a program accepts, and we need to have some notion of how we can explore different parts of the program, right? Different parts of functionality. Well, on one hand, we could have a developer write all the test cases by hand, but this would be kind of boring. It would also require a lot of human effort in creating these different inputs and so on. So coverage-guided fuzzing has evolved as a very simple way to guide the fuzzing process, leveraging the information on which parts of the code have been executed by simply tracing the individual path through the program based on the execution flow. The fuzzer can use this feedback to then modify the inputs that are being thrown at the fuzzing process. The second step is that the fuzzer must be able to detect bugs. If you've ever looked at a memory corruption: if you're just writing one byte after the end of a buffer, it's highly likely that your software is not going to crash. But it's still a bug, and it may still be exploitable based on the underlying conditions. So we want to be able to detect violations as soon as they happen, for example based on some form of sanitization, some form of instrumentation that we add to the binary, that then tells us, hey, there's a violation of the memory safety property, and we terminate the application right away as feedback to the fuzzer. Third—and last but not least: speed is key, right? If you're running a fuzzing campaign, you have a fixed resource budget. You have a couple of cores, and you want to run for 24 hours, 48 hours, a couple of days. But whatever your constraints are, you have a fixed amount of instructions that you can actually execute. And you have to decide: am I spending my instructions on generating new inputs, tracking constraints, finding bugs, running sanitization, or executing the program? And you need to find a balance between all of them, as it is a zero-sum game. You have a fixed amount of resources and you're trying to make the best of these resources. So any overhead is slowing you down. And again, this becomes an optimization problem: how can you most effectively use the resources that you have available?
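Since many in the room have used AFL or libfuzzer, the following is just a minimal, self-contained sketch of the feedback loop described here. The toy target, its block IDs, and the one-byte mutation strategy are stand-ins invented for illustration; this is not AFL's or RetroWrite's code.

```python
import random

def run_target(data: bytes) -> set:
    """Toy stand-in for an instrumented target: returns the IDs of the
    'basic blocks' this input reached (here: a tiny 'PNG' header check)."""
    covered = {0}
    if data[:1] == b"P":
        covered.add(1)
        if data[1:2] == b"N":
            covered.add(2)
            if data[2:3] == b"G":
                covered.add(3)  # the interesting code behind the header check
    return covered

def mutate(data: bytes) -> bytes:
    """Single random byte flip; real fuzzers use many more mutation strategies."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

corpus = [b"hello"]   # seed input
seen = set()          # coverage observed so far across all runs

for _ in range(200_000):
    candidate = mutate(random.choice(corpus))
    coverage = run_target(candidate)
    if not coverage <= seen:        # reached a block we had never seen before?
        seen |= coverage
        corpus.append(candidate)    # keep interesting inputs for further mutation

print(sorted(seen), corpus[-1][:3])
```

Run as-is, the loop discovers the "PNG" prefix one byte at a time, exactly the unsupervised behaviour described next for the JPEG experiment.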
When we are fuzzing with source code, it's quite easy to actually leverage existing mechanisms, and we add all that instrumentation at compile time. We take source code, we pipe it through the compiler, and modern compiler platforms allow you to instrument and add little code snippets during the compilation process that then carry out all these tasks that are useful for fuzzing. For example, modern compilers can add short snippets of code for coverage tracking that record which parts of the code you have executed, or for sanitization, which check every single memory access to see whether it is safe or not. And then when you're running the instrumented binary, everything is fine and you can detect the policy violations as you go along. Now, if you had source code for everything, this would be amazing. But it's often not the case, right? We may be able, on Linux, to cover a large part of the protocol stack by focusing only on source-code-based approaches. But there may be applications where no source code is available. If we move to Android or other mobile systems, there are many drivers that are not available as open source or are just available as binary blobs, or the full software stack may be closed-source and we only get the binaries. And we still want to find vulnerabilities in these complex software stacks that span hundreds of millions of lines of code in a very efficient way. The only solution to cover this part of the massive code base is to actually rewrite and focus on binaries. A very simple approach could be black-box fuzzing, but this doesn't really get you anywhere because you don't get any feedback; you don't get any information on whether you're triggering bugs. So one simple approach, and this is the approach that is most dominantly used today, is to rewrite the program or the binary dynamically. So you're taking the binary, and during execution you use some form of dynamic binary instrumentation based on Pin, angr, or some other binary rewriting tool and translate the target at runtime, adding this binary instrumentation on top of it as you're executing it. It's simple, it's straightforward, but it comes at a terrible performance cost of 10x to 100x slowdown, which is not really effective. And you're spending all your cores and your cycles on just executing the binary instrumentation. So we don't really want to do this, and we want to have something that's more effective than that. So what we are focusing on is static rewriting. It involves a much more complex analysis, as we are rewriting the binary before it is being executed, and we have to recover all of the control flow, all of the different mechanisms, but it results in much better performance. And we can get more bang for our buck. So why is static rewriting so challenging? Well, first, simply adding code will break the target. If you disassemble this piece of code here, which is a simple loop that loads data, decrements the register, and then jumps if you're not at the end of the array, and keeps iterating through this array—now, if you look at the jump-not-zero instruction, the last instruction of the snippet, it uses a relative offset. So it jumps backward seven bytes. Which is nice if you just execute the code as is. But as soon as you want to insert new code, you change the offsets in the program, and you're modifying all these different offsets. And simply adding new code somewhere in between will break the target.
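To see why, here is a tiny numeric sketch of the jnz example above. The addresses and instruction sizes are made up for illustration; the point is only that a relative branch encodes a distance from the next instruction, so inserting bytes between the branch and its target silently retargets it unless the rewriter patches the displacement.

```python
# Toy model of the backward loop branch from the slide (sizes are illustrative).
loop_start = 0x1000      # target of the backward jump (top of the loop body)
jnz_addr   = 0x1005      # address of the jnz instruction
jnz_size   = 2           # short jnz: opcode byte + 8-bit displacement
disp       = -7          # encoded relative displacement ("jump back 7 bytes")

# The CPU resolves the target as: end of the jnz + displacement.
assert jnz_addr + jnz_size + disp == loop_start

# Insert 5 bytes of instrumentation at the top of the loop body, i.e. between
# the jump target and the jump itself.  Everything after the insertion shifts.
inserted = 5
new_jnz_addr = jnz_addr + inserted

print(hex(new_jnz_addr + jnz_size + disp))               # 0x1005: lands mid-loop, broken
print(hex(new_jnz_addr + jnz_size + (disp - inserted)))  # 0x1000: displacement patched, correct
```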
So a core feature, or core property, that we need to enforce is that we must find all the references and properly adjust them, both relative offsets and absolute offsets. Getting a single one wrong will break everything. What makes this problem really, really hard is that if you're looking at the binary, a byte is a byte, right? There's no way for us to distinguish between scalars and references; in fact, they are indistinguishable. Getting a single reference wrong breaks the target and would introduce arbitrary crashes. So we have to come up with ways that allow us to distinguish between the two. For example, this code here takes a value and stores it somewhere on the stack. This could come from two different kinds of high-level constructs. On one hand, it could be taking the address of a function and storing this function address in a stack variable. Or it could be just storing a scalar in a stack variable. And these two are indistinguishable, and as soon as we add new code while rewriting them, the offsets will change. If it is a function, we would have to modify the value; if it is a scalar, we have to keep the same value. So how can we come up with a way that allows us to distinguish between the two and rewrite binaries by recovering this missing information? So let us take you on a journey towards instrumenting binaries in the kernel. This is what we aim for. We'll start with the simple case of instrumenting binaries in user land, talk about different kinds of coverage-guided fuzzing and what kind of instrumentation we can add, what kind of sanitization we can add, and then focus on taking it all together and applying it to kernel binaries to see what will fall out of it. Let's start with instrumenting binaries first. I will now talk a little bit about RetroWrite, our mechanism and our tool that enables static binary instrumentation by symbolizing existing binaries. So we recover the information and we translate relative offsets and absolute offsets into actual labels that are added to the assembly file. The instrumentation can then work on the recovered assembly file, which can then be reassembled into a binary that can be executed for fuzzing. We implement coverage tracking and a binary Address Sanitizer on top of this, leveraging this abstraction as we go forward. The key to enabling this kind of binary rewriting is position-independent code. And position-independent code has become the de facto standard for any code that is being executed on a modern system. It effectively means code that can be loaded at any arbitrary address in your address space as you are executing binaries. It is essential and a requirement if you want to have address space layout randomization or if you want to use shared libraries, which de facto you want to use on all these different systems. So for a couple of years now, all the code that you're executing on your phones, on your desktops, on your laptops has been position-independent code. And the idea behind position-independent code is that you can load it anywhere in your address space; you can therefore not use any hard-coded static addresses, and you have to inform the system of relocations, or use relative addresses, so that the system can relocate these different mechanisms. On x86_64, position-independent code leverages addressing that is relative to the instruction pointer.
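Before the concrete walkthrough, here is a minimal, hypothetical sketch of the rule that follows, using the Capstone Python bindings (the same disassembler RetroWrite builds on, though this is not RetroWrite's actual code): a RIP-relative memory operand is treated as a reference and gets a label, while the same 32-bit value used as an immediate stays a scalar. Relative call and jump targets are handled analogously from their displacements.

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64
from capstone.x86 import X86_OP_MEM, X86_REG_RIP

# Two instructions that contain the same 32-bit value 0x200b35:
#   lea rax, [rip + 0x200b35]      -> the value is a reference (RIP-relative)
#   mov dword ptr [rsp], 0x200b35  -> the value is a plain scalar
code = bytes.fromhex("488d05350b2000" "c70424350b2000")

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True  # needed so Capstone exposes operand information

for insn in md.disasm(code, 0x1000):
    rip_target = None
    for op in insn.operands:
        if op.type == X86_OP_MEM and op.mem.base == X86_REG_RIP:
            # Target = address of the next instruction + encoded displacement.
            rip_target = insn.address + insn.size + op.mem.disp
    if rip_target is not None:
        print(f"{insn.mnemonic} {insn.op_str}: reference, emit label .LC{rip_target:x}")
    else:
        print(f"{insn.mnemonic} {insn.op_str}: scalar, keep as-is")
```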
So for example, it uses the current instruction pointer and then a relative offset to that instruction pointer to reference global variables, other functions, and so on. And this is a very easy way for us to distinguish references from constants, especially in PIE binaries. If it is RIP-relative, it is a reference; everything else is a constant. And we can build our translation algorithm and our translation mechanism on this fundamental finding to remove any form of heuristic, by focusing especially on position-independent code. So we're supporting position-independent code; we don't support non-position-independent code, but we give you the guarantee that we can rewrite all the different code that is out there. So symbolization works as follows: if you have the little bit of code on the lower right, symbolization first replaces all the references with assembler labels. So look at the call instruction and the jump-not-zero instruction; the call instruction references an absolute address and the jump-not-zero instruction jumps backward 15 bytes. So by focusing on these relative jumps and calls, we can replace them with actual labels and rewrite the binary as follows: we're calling a function, replacing it with the actual label, and for the jump-not-zero we are inserting an actual label in the assembly code and adding a backward reference. For PC-relative addresses, for example the data load, we can then replace it with the name of the actual data that we have recovered, and we can then add all the different relocations and use those as auxiliary information on top of it. After these three steps, we can insert any new code in between, and can therefore add different forms of instrumentation or run some higher-level analysis on top of it, and then reassemble the file for fuzzing, coverage tracking, address sanitization, or whatever else you want to do. I will now hand over to Matteo, who will cover coverage-guided fuzzing and sanitization and then instrumenting binaries in the kernel. Go ahead. Nspace (Matteo Rizzo): So, now that we have this really nice framework to rewrite binaries, one of the things that we want to add to actually get to fuzzing is this coverage-tracking instrumentation. So coverage-guided fuzzing is a method to let the fuzzer discover interesting inputs and interesting paths through the target by itself. The basic idea is that the fuzzer will track coverage—the parts of the program that are covered by different inputs—by inserting some kind of instrumentation. So, for example, here we have this target program that checks if the input contains the string "PNG" at the beginning, and if it does, then it does something interesting, otherwise it just bails out and fails. So if we track the parts of the program that each input executes, the fuzzer can figure out that an input that contains "P" will have discovered a different path through the program than an input that doesn't contain it. And so on: it can, one byte at a time, discover that this program expects this magic sequence "PNG" at the start of the input. The way the fuzzer does this is that every time a new input discovers a new path through the target, it is considered interesting and added to a corpus of interesting inputs. And every time the fuzzer needs to generate a new input, it will select something from the corpus, mutate it randomly, and then use it as the new input.
So this is conceptually pretty simple, but in practice it works really well, and it really lets the fuzzer discover the format that the target expects in an unsupervised way. As an example, this is an experiment that was run by the author of AFL—AFL is the fuzzer that sort of popularized this technique—where he was fuzzing a JPEG-parsing library, starting from a corpus that only contained the string "hello". Now, clearly "hello" is not a valid JPEG image, but still, the fuzzer was able to discover the correct format. So after a while it started generating some grayscale images, on the top left, and as it generated more and more inputs, it started generating more interesting images, such as some grayscale gradients, and later on even some color images. So as you can see, this really works, and it allows us to fuzz a program without really teaching the fuzzer what the input should look like. So that's it for coverage-guided fuzzing. Now we'll talk a bit about sanitization. As a reminder, the core idea behind sanitization is that just looking for crashes is likely to miss some of the bugs. So, for example, if you have this out-of-bounds one-byte read, that will probably not crash the target, but you would still like to catch it because it could be used for an info leak, for example. One of the most popular sanitizers is Address Sanitizer. Address Sanitizer will instrument all the memory accesses in your program and check for memory corruption. Memory corruption is a pretty dangerous class of bugs that unfortunately still plagues C and C++ programs and unsafe languages in general, and ASan tries to catch it by instrumenting the target. It is very popular; it has been used to find thousands of bugs in complex software like Chrome and Linux, and even though it has a bit of a slowdown—like about 2x—it is still really popular because it lets you find many, many more bugs. So how does it work? The basic idea is that ASan will insert some special regions of memory called 'red zones' around every object in memory. We have a small example here where we declare a 4-byte array on the stack. So ASan will allocate the array "buf" and then add a red zone before it and a red zone after it. Whenever the program accesses the red zones, it is terminated with a security violation. So the instrumentation just prints a bug report and then crashes the target. This is very useful for detecting, for example, buffer overflows or underflows and many other kinds of bugs such as use-after-free and so on. As an example, here we are trying to copy 5 bytes into a 4-byte buffer, and ASan will check each of the accesses one by one. And when it sees that the last byte writes to a red zone, it detects the violation and crashes the program. So this is good for us because this bug might not have been found by simply looking for crashes, but it's definitely found if we use ASan. So this is something we want for fuzzing. So now that we've briefly covered ASan, we can talk about instrumenting binaries in the kernel. Mathias left us with RetroWrite, and with RetroWrite we can add both coverage tracking and ASan to binaries. It's a really simple idea: now that we can rewrite binaries and add instructions wherever we want, we can implement both coverage tracking and ASan.
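To make the red-zone idea concrete, here is a deliberately simplified toy model with byte-granular poisoning in pure Python. Real ASan instead keeps one shadow byte per 8 application bytes and hooks the allocator, but the check the inserted instrumentation performs is conceptually the same.

```python
REDZONE = 16  # bytes of poisoned memory on each side of an allocation

class ToyAsanHeap:
    """Byte-granular toy model of ASan red zones (not ASan's real shadow encoding)."""

    def __init__(self, size: int = 1 << 16):
        self.shadow = bytearray(b"\x01" * size)  # 1 = poisoned, 0 = addressable
        self.cursor = 0

    def malloc(self, n: int) -> int:
        addr = self.cursor + REDZONE                  # leading red zone
        self.shadow[addr:addr + n] = b"\x00" * n      # the object itself is valid
        self.cursor = addr + n + REDZONE              # trailing red zone
        return addr

    def check(self, addr: int, size: int) -> None:
        """What the inserted instrumentation does before every load/store."""
        if any(self.shadow[addr:addr + size]):
            raise RuntimeError(f"red-zone access at {addr:#x}, size {size}")

heap = ToyAsanHeap()
buf = heap.malloc(4)      # char buf[4];
heap.check(buf, 4)        # in-bounds access: silent
try:
    heap.check(buf, 5)    # copying 5 bytes into a 4-byte buffer: caught
except RuntimeError as err:
    print(err)
```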
In order to implement coverage tracking, we simply have to identify the start of every basic block and add a little piece of instrumentation at the start of the basic block that tells the fuzzer 'hey, we've reached this part of the program', 'hey, we've reached this other part of the program'. Then the fuzzer can figure out whether that's a new part or not. ASan can also be implemented in this way, by finding all memory accesses and then linking with libASan. libASan is a sort of runtime for ASan that takes care of inserting the red zones, keeping around all the metadata that ASan needs to know where the red zones are, and detecting whether a memory access is invalid. So, how can we apply all of this in the kernel? Well, first of all, fuzzing the kernel is not as easy as fuzzing some userspace program. There are some issues here. First of all, there's crash handling. Whenever you're fuzzing a userspace program, you expect crashes, well, because that's what we're after. And if a userspace program crashes, then the OS simply handles the crash gracefully. And so the fuzzer can detect this, and save the input as a crashing input, and so on. And this is all fine. But when you're fuzzing the kernel, so—if you were fuzzing the kernel of the machine that you were using for fuzzing, after a while the machine would just go down. Because, after all, the kernel runs the machine, and if it starts misbehaving, then everything can go wrong. And more importantly, you can lose your crashes, because if the machine crashes, then the state of the fuzzer is lost and you have no idea what your crashing input was. So what most kernel fuzzers do is that they resort to some kind of VM to keep the system stable. They fuzz the kernel in a VM and then run the fuzzing agent outside the VM. On top of that, there's tooling. If you want to fuzz a userspace program, you can just download AFL or use libfuzzer; there are plenty of tutorials online, it's really easy to set up: you compile your program, you start fuzzing and you're good to go. If you want to fuzz the kernel, it's already much more complicated. For example, if you want to fuzz Linux with, say, syzkaller, which is a popular kernel fuzzer, you have to compile the kernel, you have to use a special config that supports syzkaller, you have way fewer guides available than for userspace fuzzing, and in general it's just much more complex and less intuitive than just fuzzing userspace. And lastly, we have the issue of determinism. In general, if you have a single-threaded userspace program, unless it uses some kind of random number generator, it is more or less deterministic. There's nothing else that affects the execution of the program. And this is really nice if you want to try to reproduce a test case, because if you have a non-deterministic test case, then it's really hard to know whether this is really a crash or if it's just something that you should ignore. And in the kernel this is even harder, because you don't only have concurrency, like multi-processing, you also have interrupts. Interrupts can happen at any time, and if one time you got an interrupt while executing your test case and the second time you didn't, then maybe it only crashes one time; you don't really know, it's not pretty. And so again, we have several approaches to fuzzing binaries in the kernel. The first one is to do black-box fuzzing.
We don't really like this because it doesn't find much, especially in something complex like a kernel. Approach 1 is to use dynamic translation, so, use something like QEMU or—you name it. This works, and people have used it successfully; the problem is that it is really, really, really slow. Like, we're talking about 10x-plus overhead. And as we said before, the more iterations, the more test cases you can execute in the same amount of time, the better, because you find more bugs. And on top of that, there's no currently available sanitizer for kernel binaries that is based on this approach. So in userspace you have something like valgrind; in the kernel, you don't have anything, at least that we know of. There is another approach, which is to use Intel Processor Trace. There have been some research papers on this recently, and this is nice because it allows you to collect coverage at nearly zero overhead. It's really fast, but the problem is that it requires hardware support, so it requires a fairly new x86 CPU, and if you want to fuzz something on ARM, say your Android driver, or if you want to use an older CPU, then you're out of luck. And what's worse, you cannot really use it for sanitization, or at least not the kind of sanitization that ASan does, because it just traces the execution; it doesn't allow you to do checks on memory accesses. So Approach 3, which is what we will use, is static rewriting. We had this very nice framework for rewriting userspace binaries, and we asked ourselves, can we make this work in the kernel? So we took the system, the original RetroWrite, we modified it, we implemented support for Linux modules, and... it works! We have used it to fuzz some kernel modules, and it really shows that this approach doesn't only work for userspace; it can also be applied to the kernel. As for the implementation: the nice thing about kernel modules is that they're always position-independent. You cannot have fixed-position kernel modules, because Linux just doesn't allow it. So we sort of get that for free, which is nice. And Linux modules are also a special class of ELF files, which means that the format, even though it's not the same as userspace binaries, is still somewhat similar, so we didn't have to change the symbolizer that much, which is also nice. And we implemented symbolization with this, and we used it to implement both code coverage and binary ASan for binary kernel modules. So, for coverage: the idea behind the whole RetroWrite project was that we wanted to integrate with existing tools. So, existing fuzzing tools. We didn't want to force our users to write their own fuzzer that is compatible with RetroWrite. In userspace we had AFL-style coverage tracking, and binary ASan which is compatible with source-based ASan, and we wanted to follow the same principle in the kernel. So it turns out that Linux has this built-in coverage-tracking framework called kCov that is used by several popular kernel fuzzers like syzkaller, and we wanted to use it ourselves. So we designed our coverage instrumentation so that it integrates with kCov. The downside is that you need to compile the kernel with kCov, but then again, Linux is open source, so you can sort of always do that; usually it's not the kernel that is the binary blob, it's usually only the modules. So that's still fine.
And the way you implement kCov for binary modules is that you just have to find the start of every basic block and add a call to a function that stores the collected coverage. So here's an example: we have a short snippet of code with three basic blocks, and all we have to do is add a call to "trace_pc" at the start of each basic block. "trace_pc" is a function that is part of the main kernel image that collects this coverage and makes it available to a userspace fuzzing agent. So this is all really easy and it works. Let's now see how we implemented binary ASan. As I mentioned before, when we instrument a program with binary ASan in userspace, we link with libASan, which takes care of setting up the metadata, takes care of putting the red zones around our allocations, and so on. So we had to do something similar in the kernel; of course, you cannot link with libASan in the kernel, because that doesn't work, but what we can do instead is, again, compile the kernel with kASan support. This instruments the allocator, kmalloc, to add the red zones; it allocates space for the metadata, it keeps this metadata around, it does all of this for us, which is really nice. And again, the big advantage of using this approach is that we can integrate seamlessly with a kASan-instrumented kernel and with fuzzers that rely on kASan, such as syzkaller. So we see this as more of a plus than a limitation. And how do you implement ASan? Well, you have to find every memory access and instrument it to check whether it is accessing a red zone. And if it does, then you just call a bug-report function that produces a stack trace and a bug report and crashes the kernel, so that the fuzzer can detect it. Again, this is compatible with source-based kASan, so we're happy. We can simply load the rewritten module with the added instrumentation into a kernel, as long as you have compiled the kernel with the right flags, and we can use a standard kernel fuzzer. For our evaluation, we used syzkaller, a popular kernel fuzzer by some folks at Google, and it worked really well. So we've finally reached the end of our journey, and now we want to present some experiments we did to see if this really works. For userspace, we wanted to compare the performance of our binary ASan with source-based ASan and with existing solutions that also work on binaries. For userspace, you can use valgrind memcheck; it's a memory sanitizer that is based on dynamic binary translation and works on binaries. We compared it with source ASan and RetroWrite ASan on the SPEC CPU benchmark and measured how fast each was. And for the kernel, we decided to fuzz some file systems and some drivers with syzkaller, using both source-based kASan and kCov and kRetroWrite-based kASan and kCov. So these are our results for userspace. The red bar is valgrind. We can see that the execution time of valgrind is the highest. It is really, really slow—like 3x, 10x, 30x overhead, way too slow for fuzzing. Then in green, we have our binary ASan, which is already a large improvement. In orange we have source-based ASan. And then finally in blue we have the original code without any instrumentation whatsoever. So we can see that source-based ASan has about 2x or 3x overhead, and binary ASan is a bit higher, a bit less efficient, but still somewhat close.
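Before moving on to the kernel numbers, here is a rough sketch of the basic-block pass described a moment ago. It is illustrative only, not kRetroWrite's actual code: basic blocks are naively approximated by local labels, the example listing is invented, and a real binary pass must also preserve any registers and flags that are live across the inserted call. __sanitizer_cov_trace_pc is the hook (the "trace_pc" mentioned above) that a kCov-enabled kernel provides.

```python
# Insert a kCov-style hook at the start of each basic block of a symbolized listing.
KCOV_HOOK = "\tcall __sanitizer_cov_trace_pc"   # assumes a kCov-enabled kernel

def add_coverage(symbolized_lines):
    out = []
    for line in symbolized_lines:
        out.append(line)
        # Naive block detection: a local label marks a block entry point.
        if line.strip().endswith(":") and line.strip().startswith(".L"):
            out.append(KCOV_HOOK)               # runs every time this block is entered
    return out

example = [
    "demo_write:",
    "\ttest %rsi, %rsi",
    "\tje .L2",
    ".L1:",
    "\tmovzbl (%rdi), %eax",
    ".L2:",
    "\tret",
]
print("\n".join(add_coverage(example)))
```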
So that's for userspace, and for the kernel, these are some preliminary results—I'm doing this work as part of my master's thesis, and so I'm still running the evaluation. Here we can see that the overhead is already a bit lower. The reason for this is that SPEC is a pure CPU benchmark; it doesn't interact with the system that much. And so any instrumentation that you add is going to considerably slow down the execution. By contrast, when you fuzz a file system with syzkaller, not only does every test case have to go from the host to the guest and then do multiple syscalls and so on, but also every system call has to go through several layers of abstraction before it gets to the actual file system. All of this takes a lot of time, and so in practice the overhead of our instrumentation seems to be pretty reasonable. So, since we know that you like demos, we've prepared a small demo of kRetroWrite. So. Let's see. Yep. Okay. All right, so we've prepared a small kernel module. And this module is really simple; it contains a vulnerability, and what it does is that it creates a character device. If you're not familiar with this, a character device is like a fake file that is exposed by a kernel driver and that you can write to and read from. And instead of going to a file, the data that you write to the fake file goes to the driver and is handled by this demo write function. As we can see, this function allocates a 16-byte buffer on the heap, and then copies some data into it, and then it checks if the data contains the string "1337". If it does, then it accesses the buffer out of bounds; you can see "alloc[16]" and the buffer is sixteen bytes; this is an out-of-bounds read by one byte. And if it doesn't, then it just accesses the buffer in bounds, which is fine, and it's not a vulnerability. So we can compile this driver. OK, um... OK, and then so we have our module, and then we will instrument it using kRetroWrite. So, instrument... Yes, please. OK. Right. So kRetroWrite did some processing, and it produced an instrumented module with ASan, or kASan, and a symbolized assembly file. We can actually have a look at the symbolized assembly file to see what it looks like. Yes. Yes. OK. So, is this big enough? Yeah... As you can see, we can actually see the ASan instrumentation here. The original code loads some data from this address. And as you can see, the ASan instrumentation first computes the actual address, and then does some checking—basically, this is checking some metadata that ASan stores, to check if the address is in a red zone or not, and then if the check fails, it calls this ASan report function, which produces a stack trace and crashes the kernel. So this is fine. We can actually even look at the disassembly of both modules, so... objdump and then demo... Ah, nope. OK, so on the left, we have the original module without any instrumentation; on the right, we have the module instrumented with ASan. As you can see, the original module has "push r13" and then has this memory load here; on the right, in the instrumented module, kRetroWrite inserted the ASan instrumentation. So the original load is still down here, but in between, between the first instruction and this instruction, we now have the kASan instrumentation that does our check. So this is all fine.
Now we can actually test it and see what it does. We will boot a very simple, very minimal Linux system and try to trigger the vulnerability, first with the non-instrumented module and then with the instrumented module. And we will see that with the non-instrumented module, the kernel will not crash, but with the instrumented module it will crash and produce a bug report. So. Let's see. Yeah, this is a QEMU VM; I have no idea why it's taking so long to boot. I'll blame the demo gods not being kind to us. Yeah, I guess we just have to wait. OK. So. All right, so we loaded the module. We will see that it has created a fake file, a character device, in /dev/demo. Yep. We can write to this file. Yep. So this accesses the array in bounds, and so this is fine. Then what we can also do is write "1337" to it, so it will access the array out of bounds. So this is the non-instrumented module, so this will not crash. It will just print some garbage value. Okay, that's it. Now we can load the instrumented module instead... and do the same experiment again. All right. We can see that /dev/demo is still here. So the module still works. Let's try to write "1234" into it. This, again, doesn't crash. But when we try to write "1337", this will produce a bug report. applause So this has quite a lot of information. We can see where the memory was allocated; there's a stack trace for that. It wasn't freed, so there's no stack trace for the free. And we see the cache the memory came from: it was a 16-byte allocation. We can see the shadow of the memory. We see that these two zeros mean that there are two 8-byte chunks of valid memory. And then these "fc fc fc" are the red zones that I was talking about before. All right, so that's it for the demo. We will switch back to our presentation now. So... hope you enjoyed it. gannimo: Cool. So after applying this to a demo module, we also wanted to see what happens if we apply this to a real file system. After a couple of hours, when we came back and checked on the results, we saw a couple of issues popping up, including a nice set of use-after-free reads, a set of use-after-free writes, and we checked the bug reports and we saw a whole bunch of Linux kernel issues popping up one after the other in this nondescript module that we fuzzed. We're in the process of reporting them. This will take some time until they are fixed; that's why you see the blurry lines. But as you see, there's still quite a bit of opportunity in the Linux kernel where you can apply different forms of targeted fuzzing to different modules, load these modules on top of a kASan-instrumented kernel, and then leverage this as part of your fuzzing toolchain to find interesting kernel 0days that... yeah. You can then develop further, or report, or do whatever you want with them. Now, we've shown you how you can take existing binary-only modules, think different binary-only drivers, or even existing modules where you don't want to instrument the full Linux kernel, but only focus fuzzing and exploration on a small, limited piece of code, and then do security tests on those. We've shown you how we can do coverage-based tracking and address sanitization. But it is also up to you what kind of other instrumentation you want. This is just a tool, a framework that allows you to do arbitrary forms of instrumentation.
So we've taken you on a journey from instrumenting binaries, over coverage-guided fuzzing and sanitization, to instrumenting modules in the kernel and then finding crashes in the kernel. Let me wrap up the talk. This is one of the fun pieces of work that we do in the HexHive lab at EPFL. So if you're looking for postdoc opportunities or if you're thinking about a PhD, come talk to us. We're always hiring. The tools will be released as open source. A large chunk of the userspace work is already open source. We're working on a set of additional demos and so on so that you can get started faster, leveraging the different existing instrumentation that is already out there. The userspace work is already available. The kernel work will be available in a couple of weeks. This allows you to instrument real-world binaries for fuzzing, leveraging existing transformations for coverage tracking, to enable fast and effective fuzzing, and memory checking to detect the actual bugs that exist there. The key takeaway from this talk is that RetroWrite and kRetroWrite enable static binary rewriting at zero cost beyond the instrumentation you add. We take the limitation of focusing only on position-independent code, which is not a real limitation, but we get the advantage of being able to symbolize without actually relying on heuristics, so we can even symbolize large, complex applications and effectively rewrite those aspects, and then you can focus fuzzing on these parts. Another point I want to mention is that this enables you to reuse existing tooling, so you can take a binary blob, instrument it, and then reuse, for example, Address Sanitizer or existing fuzzing tools, as it integrates really, really nicely. As I said, all the code is open source. Check it out. Try it. Let us know if it breaks. We're happy to fix it. We are committed to open source. And let us know if there are any questions. Thank you. applause Herald: So, thanks, guys, for an interesting talk. We have some time for questions; we have microphones along the aisles. We'll start with a question from microphone number two. Q: Hi. Thanks for your talk and for the demo. I'm not sure about the use case you showed for the kernel RetroWrite. 'Cause you're usually interested in fuzzing binaries in kernel space when you don't have source code for the kernel. For example, for IoT or Android and so on. But you just reuse the kCov and kASan in the kernel, and you never have a kernel in IoT or Android which is compiled with that. So do you have any plans to binary-instrument the kernel itself, not only the modules? Nspace: So we thought about that. I think that there are some additional problems that we would have to solve in order to be able to instrument the full kernel. Other than the fact that it gives us compatibility with existing tools, the reason why we decided to go with compiling the kernel with kASan and kCov is that you would have to think about a few things. You have to instrument the memory allocator to add red zones, which is already somewhat complex. You have to instrument the exception handlers to catch any faults that the instrumentation detects. You would have to set up some memory for the ASan shadow. So I think you should be able to do it, but it would require a lot of additional work. And this was a four-month thesis.
So we decided to start small and prove that it works in the kernel for modules, and then leave it to future work to actually extend it to the full kernel. Also, I think for Android—so in the case of Linux, the kernel is GPL, right, so if the manufacturer ships a custom kernel, they have to release the source code, right? Q: They never do. Nspace: They never—well, that's a different issue, right? gannimo: Right. Q: So that's why I ask, because I don't see how it can be used in the real world. gannimo: Well, let me try to put this into perspective a little bit. What we did so far is we leveraged existing tools, like kASan or kCov, and integrated with these existing tools. Now, instrumenting heap-based allocations and replacing those with versions that add red zones—that instrumentation you can carry out fairly well by focusing on the different allocators. Second to that, simply oopsing the kernel and printing the stack trace is also fairly straightforward. So it's not a lot of additional effort. It involves some engineering effort to port this to non-kASan-compiled kernels, but we think it is very feasible. In the interest of time, we focused on kASan-enabled kernels, so that some form of ASan is already enabled. But yeah, this is additional engineering effort. But there is also a community out there that can help us with these kinds of changes. So kRetroWrite and RetroWrite themselves are the binary rewriting platform that allows you to turn a binary into an assembly file that you can then instrument and run different passes on top of. So another pass would be a full ASan pass or kASan pass that somebody could add and then contribute back to the community. Q: Yeah, it would be really useful. Thanks. gannimo: Cool. Angel: Next question from the Internet. Q: Yes, there is a question regarding the slide on the SPEC CPU benchmark. The second or third graph from the right had an instrumented version that was faster than the original program. Why is that? gannimo: Cache effects. Thank you. Angel: Microphone number one. Q: Thank you. Thank you for the presentation. I have a question: how many architectures do you support, and do you have plans to support more? gannimo: x86_64. Q: Okay. So no plans for ARM or MIPS, or...? gannimo: Oh, there are plans. Q: Okay. Nspace: Right, so— gannimo: Right. Again, there's a finite amount of time. We focused on the technology. ARM is high up on the list. If somebody is interested in working on it and contributing, we're happy to hear from you. Our list of targets is ARM first and then maybe something else. But I think with x86_64 and ARM we've covered the majority of the interesting platforms. Q: And second question, did you try to fuzz any real closed-source program? Because as I understand from the presentation, you fuzzed just file systems that we can compile and fuzz with syzkaller as before. Nspace: So for the evaluation, we wanted to be able to compare between the source-based instrumentation and the binary-based instrumentation, so we focused mostly on open-source file systems and drivers, because then we could instrument them with a compiler. We haven't yet tried, but this is also pretty high up on the list. We want to find some closed-source drivers—there are lots of them, like for GPUs or anything—and we'll give it a try and find some 0days, perhaps. Q: Yes, but with syzkaller, you still have a problem. You have to write rules, like, dictionaries.
I mean, you have to understand the format, you have to communicate with the driver. Nspace: Yeah, right. But there are, for example, closed-source file systems that we are looking at. Q: Okay, thank you. Herald: Number two. Q: Hi. Thank you for your talk. So I don't know if there are any kCov- or kASan-equivalent solutions for Windows, but I was wondering if you tried, or are you planning, to port the framework to Windows? Because I know it might be challenging because of driver signature enforcement and PatchGuard, but I wondered if you tried or thought about it. gannimo: Yes, we thought about it and we decided against it. Windows is incredibly hard and we are academics. The research we do in my lab focuses predominantly on open-source software and empowers open-source software. Doing full support for Microsoft Windows is somewhat out of scope. If somebody wants to port these tools, we are happy to hear it and work with these people. But it's a lot of additional engineering effort versus very low additional research value, so we'll have to find some form of compromise. And, like, if you would be willing to fund us, we would go ahead. But it's—yeah, it's a cost question. Q: And you're referring both to kernel and user space, right? gannimo: Yeah. Q: Okay. Thank you. Herald: Number five. Q: Hi, thanks for the talk. This seems most interesting if you're looking for vulnerabilities in closed-source kernel modules, but, not giving it too much thought, it seems it's really trivial to prevent this if you're writing a closed-source module. gannimo: Well, how would you prevent this? Q: Well, for starters, you would just take the difference between the addresses of two functions. That's not gonna be RIP-relative, so... Nspace: Right. So we explicitly—like, even in the original RetroWrite paper—we explicitly decided not to try to deal with obfuscated code, or code that is purposefully trying to defeat this kind of rewriting. Because the assumption is that, first of all, there are techniques to deobfuscate code or remove these checks in some way, but this is sort of orthogonal work. And at the same time, I guess most drivers are not really compiled with that sort of obfuscation; they're just compiled with a regular compiler. But yeah, of course, this is a limitation. gannimo: They're likely stripped, but not necessarily obfuscated. At least from what we've seen when we looked at binary-only drivers. Herald: Microphone number two. Q: How do you decide where to place the red zones? From what I heard, you talked about instrumenting the allocators, but, well, there are a lot of variables on the stack, so how do you deal with those? gannimo: Oh, yeah, that's actually super cool. I refer to some extent to the paper that is on the GitHub repo as well. If you think about it, modern compilers use canaries for buffers. Are you aware of stack canaries—how stack canaries work? So, stack canaries: if the compiler sees there's a buffer that may be overflowed, it places a stack canary between the buffer and any other data. What we do is, as part of our analysis tool, we find these stack canaries, remove the code that handles the stack canary, and use this space to place our red zones. So we actually hack the stack canaries, remove that code, and add ASan red zones into the empty stack canary slots that are now there.
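As a purely hypothetical illustration of that canary trick (this is not the actual RetroWrite pass): on x86_64 Linux, userspace canaries are loaded from %fs:0x28, so the setup sequence can be spotted in the symbolized assembly and the canary's stack slot reused as a red zone.

```python
import re

# Hypothetical sketch: locate the stack-canary slot in symbolized AT&T assembly.
# The matching is deliberately naive; a real pass would also remove the matching
# canary check in the function epilogue and poison the freed 8-byte slot.
CANARY_LOAD = re.compile(r"mov\w*\s+%fs:0x28,\s*(%\w+)")
STACK_STORE = re.compile(r"mov\w*\s+(%\w+),\s*(-?0x[0-9a-fA-F]+\(%r[bs]p\))")

def find_canary_slot(lines):
    canary_reg = None
    for line in lines:
        m = CANARY_LOAD.search(line)
        if m:
            canary_reg = m.group(1)          # register now holding the canary
            continue
        if canary_reg:
            m = STACK_STORE.search(line)
            if m and m.group(1) == canary_reg:
                return m.group(2)            # stack slot to repurpose as a red zone
    return None

prologue = [
    "\tmovq %fs:0x28, %rax",
    "\tmovq %rax, -0x8(%rbp)",
]
print(find_canary_slot(prologue))            # -0x8(%rbp)
```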
It's actually a super cool optimization, because we piggyback on work the compiler already did for us, and we can then leverage that to gain additional benefits and protect the stack as well. Q: Thanks. Angel: Another question from the Internet. Q: Yes. Did you consider lifting the binary code to LLVM IR instead of generating assembler source? gannimo: Yes. laughter But, so—a little bit longer answer. Yes, we did consider that. Yes, it would be super nice to lift to LLVM IR. We've actually looked into this. It's incredibly hard. It's incredibly complex. There's no direct mapping between the machine code and the LLVM IR. You would still need to recover all the types. So it's like this magic dream that you recover full LLVM IR and then do heavyweight transformations on top of it. But this is incredibly hard, because when you compile down from LLVM IR to machine code, you lose a massive amount of information. You would have to find a way to recover all of that information, which is pretty much impossible and undecidable in many cases. So for example, just as a note, we only recover control flow and we only symbolize control flow. For data references, we don't support instrumentation of data references yet, because there's still an undecidable problem that we are faced with. I can talk more about this offline, or there is a note in the paper as well. So this is just a small problem, and only if you're lifting to assembly files. If you're lifting to LLVM IR, you would have to do full end-to-end type recovery, which is massively more complicated. Yes, it would be super nice. Unfortunately, it is undecidable and really, really hard. So you can come up with some heuristics, but there is no solution that will be correct 100 percent of the time. Angel: We'll take one more question from microphone number six. Q: Thank you for your talk. What kind of disassembler did you use for RetroWrite, and did you have problems with wrong disassembly? And if so, how did you handle it? Nspace: So, for RetroWrite we used Capstone for the disassembly. gannimo: An amazing tool, by the way. Nspace: Yeah. So the idea is that we need some information about where the functions are. For kernel modules, this is actually fine, because kernel modules come with this sort of information, because the kernel needs it to build stack traces, for example. For userspace binaries, this is somewhat less common, but you can use another tool to do function identification. And then we disassemble the entire function. We have run into some issues with AT&T syntax, because we wanted to use gas, GNU's assembler, for... gannimo: Reassembling. Nspace: Reassembly, yeah. And for some instructions, you can express two different instructions, like a five-byte NOP and a six-byte NOP, using the same text—the same mnemonic and operand string. But the problem is that the kernel doesn't like it and crashes. This took me like two days to debug. gannimo: So the kernel uses dynamic binary patching when it runs, at runtime, and it uses fixed offsets, so if you replace a five-byte NOP with a six-byte NOP or vice versa, your offsets change and your kernel just blows up in your face. Q: So it was kind of a case-by-case basis, where you saw the errors coming out of the disassembly and you had to fix it? Nspace: Sorry, can you repeat the question?
Q: Like, for example, if some instruction is not supported by the disassembler, you saw that it crashed, that there's something wrong, and then you fixed it by hand? Nspace: Well, if we saw that there was a problem with it—like, I don't recall having any unknown instructions in the disassembler. I don't think I've ever had a problem with that. But yeah, this was a lot of engineering work. gannimo: So let me repeat. The problem was not a bug in the disassembler, but an issue with the instruction format: the same mnemonic can be translated into two different instructions, one of which was five bytes long, the other one six bytes long. Both used the exact same mnemonic. So this was an issue with assembly encoding. Q: But you had no problems with unsupported instructions which couldn't be disassembled? Nspace: No, no. Not as far as I know, at least. Angel: We have one more minute, so a very short question from microphone number two. Q: Does it work? Ah. Is your binary instrumentation equally powerful as kernel address space... I mean, kASan? So, does it detect all the memory corruptions on the stack, heap, and globals? gannimo: No globals. But heap—it does all of them on the heap. There's some slight variation on the stack, because we have to piggyback on the canary stuff. As I mentioned quickly before, there is no reflowing and full recovery of data layouts. So to get anything on the stack, we have to piggyback on existing compiler extensions like stack canaries. So we don't support intra-object overflows on the stack. But we do leverage the stack canaries to get some stack benefits, which is, I don't know, 90, 95 percent there, because the stack canaries are pretty good. For the heap, we get the same precision. For globals, we have very limited support. Q: Thanks. Angel: So that's all the time we have for this talk. You can find the speakers afterwards offline. Please give them a big round of applause for an interesting talk. applause 36c3 postroll music Subtitles created by c3subtitles.de in the year 2021. Join, and help us!