*preroll music* Herald: Our next speaker for today is a computer science PhD student at UC Santa Barbara. He is a member of the Shellphish hacking team and he's also an organizer of the iCTF hacking competition. Please give a big round of applause to Nilo Redini. *applause* Nilo: Thanks for the introduction, hello everyone. My name is Nilo, and today I'm going to present my work Karonte: identifying multi-binary vulnerabilities in embedded firmware at scale. This work is a joint effort between me and several of my colleagues at UC Santa Barbara and ASU. This talk is going to be about IoT devices, so before starting, let's get an overview of IoT devices. IoT devices are everywhere. Research suggests they will reach 20 billion units by the end of next year. And a recent study conducted this year, in 2019, on 16 million households showed that more than 70 percent of homes in North America already have an IoT network-connected device. IoT devices make everyday life smarter. You can literally say "Alexa, I'm cold" and Alexa will interact with the thermostat and increase the temperature of your room. Usually the way we interact with IoT devices is through our smartphone. We send a request over the local network to some device, a router or a door lock, or we might send the same request through a cloud endpoint, which is usually managed by the vendor of the IoT device. Another way is through IoT hubs: the smartphone sends the request to some IoT hub, which in turn sends the request to some other IoT device. As you can imagine, IoT devices use and collect our data, and some data is more sensitive than other data. For instance, think of the data that is collected by my lightbulb versus the data that is collected by our security camera. As such, IoT devices can compromise people's safety and privacy. Think, for example, about the security implications of a faulty smart lock or of the brakes of your smart car. So the question that we asked is: are IoT devices secure? Well, like everything else, they are not. In 2016 the Mirai botnet compromised and leveraged millions of IoT devices to disrupt core Internet services such as Twitter, GitHub and Netflix. And in 2018, 154 vulnerabilities affecting IoT devices were published, which represented an increase of 15% compared to 2017 and an increase of 115% compared to 2016. So then we wondered: why is it hard to secure IoT devices? To answer this question we have to look at how IoT devices work and how they are made. Usually, when you remove all the plastic and peripherals, IoT devices look like this: a board with some chips on it. Usually you can find a big chip, the microcontroller, which runs the firmware, and one or more peripheral controllers, which interact with external peripherals such as the motor of your smart lock or cameras. Though the design is generic, implementations are very diverse. For instance, firmware may run on several different architectures, such as ARM, MIPS, x86, PowerPC and so forth, and sometimes the architectures are even proprietary, which means that if a security analyst wants to understand what's going on in the firmware, they'll have a hard time without the vendor's specifications. Also, these devices operate in environments with limited resources, which means that they run small, optimized code. For instance, vendors might implement their own version of some known algorithm in an optimized way. Also, IoT devices manage external peripherals that often use custom code.
Again, by peripherals we mean things like cameras, sensors and so forth. The firmware of an IoT device can be either Linux based or a blob firmware; Linux-based firmware is by far the most common. A study showed that 86% of firmware samples are based on Linux; blob firmware, on the other hand, is usually an operating system and user applications packaged into a single binary. In any case, firmware samples are usually made of multiple components. For instance, let's say that you have your smartphone and you send a request to your IoT device. This request will be received by a binary which we term the border binary, which in this example is a web server. The request will be received, parsed, and then it might be sent to another binary, called the handler binary, which will take the request, work on it, produce an answer and send it back to the web server, which in turn will produce a response to send to the smartphone. So, to come back to the question: why is it hard to secure IoT devices? Well, the answer is because IoT devices are in practice very diverse. Of course, various works have been proposed to analyze and secure firmware for IoT devices, some of them using static analysis, others using dynamic analysis, and several others using a combination of both. Here I list several of them; again, at the end of the presentation there is a bibliography with the titles of these works. Of course, all these approaches have some problems. For instance, current dynamic analysis approaches are hard to apply at scale because of the customized environments that IoT devices work in. Usually, when you try to dynamically execute a firmware sample, it's going to check whether the peripherals are connected and working properly. In a case where you can't have the peripherals, it's going to be hard to actually run the firmware. Also, current static analysis approaches are based on what we call the single-binary approach, which means that binaries from a firmware sample are taken individually and analyzed. This approach might produce many false positives. For instance, let's say again that we have our two binaries. This is actually an example that we found in one firmware sample: the web server takes the user request, parses it to produce some data, assigns this data to an environment variable and eventually executes the handler binary. Now, if you look, the parsing function contains a string comparison which checks whether some keyword is present in the request. If so, it just returns the whole request; otherwise, it constrains the size of the request to 128 bytes and returns it. The handler binary, in turn, when spawned, will receive the data by doing a getenv on QUERY_STRING, but it will also do a getenv on another environment variable, which in this case is not user controlled: the user cannot influence the content of that variable. Then it calls the function process_request. This function eventually does two string copies, one from the user data and the other from the log path, into two different local variables that are 128 bytes long. Now, in the first case, as we have seen before, the data can be larger than 128 bytes, so this string copy may result in a bug, while in the second case it will not, because here we assume that the system handles its own data properly. Throughout this work, we call the first type of binary the setter binary, meaning the binary that takes the data and sets it for another binary to consume, and the second type of binary we call the getter binary.
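A minimal C sketch of the two binaries just described might look as follows. Only the elements mentioned in the talk (parse_URI, the QUERY_STRING data key, process_request and the 128-byte local buffers) come from the example; the keyword string, the LOG_PATH variable name and the surrounding glue code are illustrative assumptions.

```c
/* Sketch of the setter binary (the web server). */
#include <stdlib.h>
#include <string.h>

static char *parse_URI(char *request) {
    /* If some keyword is present, the whole, unconstrained request is
     * returned; otherwise it is constrained to 128 bytes. The actual
     * keyword is not named in the talk. */
    if (strstr(request, "keyword") != NULL)
        return request;
    request[127] = '\0';
    return request;
}

void serve(char *request) {
    char *data = parse_URI(request);
    setenv("QUERY_STRING", data, 1);   /* data key shared with the getter   */
    system("/bin/handler");            /* spawn the handler (getter) binary */
}

/* Sketch of the getter binary (the handler). */
static void process_request(const char *q, const char *log_path) {
    char arg[128];                     /* 128-byte local buffers             */
    char path[128];
    strcpy(arg, q);                    /* q may exceed 128 bytes: potential overflow */
    strcpy(path, log_path);            /* system-controlled, assumed well-formed     */
}

void handle(void) {
    char *query = getenv("QUERY_STRING"); /* user-controlled                    */
    char *log   = getenv("LOG_PATH");     /* not user-controlled (assumed name) */
    if (query != NULL && log != NULL)
        process_request(query, log);
}
```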
So current bug-finding tools are inadequate: either bugs are left undiscovered, if the analysis only considers the binaries that receive network requests, or the analysis is likely to produce many false positives, if it considers all of the binaries individually. So then we wondered how these different components actually communicate. They communicate through what is called interprocess communication (IPC), which is basically a finite set of paradigms used by binaries to communicate, such as files, environment variables, MMIO and so forth. All these IPC paradigms are identified by data keys, which are, for instance, file names or, in the case of the example from before, here on the right, the QUERY_STRING environment variable. Each binary that relies on some shared data must know the endpoint where such data will be available, for instance a file name, a socket endpoint or an environment variable. This means that, usually, data keys are hardcoded in the program itself, as we saw before. To find bugs in firmware in a precise manner, we need to track how user data is introduced and propagated across the different binaries. OK, let's talk about our work. Before we start talking about Karonte, we define our threat model. We hypothesize that an attacker sends arbitrary requests over the network, both LAN and WAN, directly to the IoT device. Though we said before that sometimes IoT devices communicate through the cloud, research has shown that some form of local communication is usually available, for instance during the setup phase of the device. Karonte is a static analysis tool that tracks data flow across multiple binaries to find vulnerabilities. Let's see how it works. In the first step, Karonte finds the binaries that introduce user input into the firmware. We call these border binaries; they are the binaries that basically interface the device to the outside world, which in our example is the web server. Then it tracks how data is shared with other binaries within the firmware sample, which in this example means understanding that the web server communicates with the handler binary, and it builds what we call the BDG, which stands for binary dependency graph. It is basically a graph representation of the data dependencies among the different binaries. Then we detect vulnerabilities that arise from the misuse of the data, using the BDG. This is an overview of our system. We start by taking a packed firmware image and we unpack it. We find the border binaries. Then we build the binary dependency graph, which relies on a set of CPFs, as we will see soon; CPF stands for Communication Paradigm Finder. Then we find the specifics of the communication, for instance the constraints applied to the data that is shared, through our multi-binary data-flow analysis module. Eventually we run our insecure interaction detection module, which takes all this information and produces alerts. Our system is completely static and relies on our static taint engine. So let's see each one of these steps in more detail. The unpacking procedure is pretty easy: we use the off-the-shelf firmware unpacking tool binwalk. And then we have to find the border binaries. Border binaries are binaries that receive data from the network, and we hypothesize that they contain parsers to validate the data they receive. So in order to find them, we have to find parsers which accept data from the network and parse this data. To find parsers we rely on related work, which uses a few metrics and quantifies, with a number, the likelihood that a function contains parsing capabilities. The metrics that we use are the number of basic blocks, the number of memory-comparison operations and the number of branches. While these identify parsers, we also have to find out whether a binary takes data from the network. As such, we define two more metrics: first, we check whether the binary contains any network-related keywords, such as SOAP, http and so forth, and then we check whether there exists a data flow between a read from a socket and a memory-comparison operation. Once we have all these metrics for each function, we compute what we call a parsing score, which is basically a sum of products. Once we have a parsing score for each function in a binary, we represent the binary with its highest parsing score. Once we have that for each binary in the firmware, we cluster the binaries using the DBSCAN density-based algorithm and consider the cluster with the highest parsing scores as containing the set of border binaries.
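A rough sketch of how such a parsing score could be computed is shown below. The metrics are the ones just listed; the weights and the exact way the "sum of products" combines them are illustrative assumptions, not necessarily Karonte's actual formula, and the DBSCAN clustering step is only referenced in a comment.

```c
#include <stddef.h>

/* Per-function metrics described above. */
struct func_metrics {
    unsigned basic_blocks;      /* number of basic blocks                       */
    unsigned mem_comparisons;   /* number of memory-comparison operations       */
    unsigned branches;          /* number of branches                           */
    int has_network_keyword;    /* 0/1: compares data against "soap", "http"... */
    int socket_to_memcmp_flow;  /* 0/1: data flow from a socket read to memcmp  */
};

/* "Sum of products": each metric is multiplied by a weight and the results
 * are summed. The weights below are made up for this sketch. */
static double parsing_score(const struct func_metrics *m) {
    return 0.5 * m->basic_blocks
         + 1.0 * m->mem_comparisons
         + 0.5 * m->branches
         + 2.0 * m->has_network_keyword
         + 2.0 * m->socket_to_memcmp_flow;
}

/* A binary is represented by its highest-scoring function. The per-binary
 * scores are then clustered (the talk uses DBSCAN, not shown here) and the
 * cluster with the highest scores is taken as the set of border binaries. */
double binary_parsing_score(const struct func_metrics *funcs, size_t n) {
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = parsing_score(&funcs[i]);
        if (s > best)
            best = s;
    }
    return best;
}
```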
After this, we build the binary dependency graph. Again, the binary dependency graph represents the data dependencies among the binaries in a firmware sample. For instance, this simple graph tells us that a binary A communicates with a binary C using files, and that the same binary A communicates with another binary B using environment variables. Let's see how this works. We start from the identified border binaries, we taint the data compared against the network-related keywords that we found, and we run a static taint analysis to detect whether the binary relies on any IPC paradigm to share the data. If we find that it does, we establish whether the binary is a setter or a getter, which again means whether the binary sets the data to be consumed by another binary, or whether it actually gets the data and consumes it. Then we retrieve the employed data key, which in the example before was the keyword QUERY_STRING. And finally we scan the firmware sample to find other binaries that may rely on the same data key and schedule them for further analysis. To understand whether a binary relies on any IPC, we use what we call CPFs, which again stands for Communication Paradigm Finders. We designed a CPF for each IPC, and the CPFs are also used to find the same data keys within the firmware sample. We also provide Karonte with a generic CPF to cover cases where the IPC is unknown, or cases where the vendor implemented their own version of some IPC: for example, they don't use setenv, but they implemented their own setenv. The idea behind this generic CPF, which we call the semantic CPF, is that a data key has to be used as an index to set, or to get, some data, as in this simple example.
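As an illustration, a vendor-specific IPC of the kind the semantic CPF targets might look like the following C sketch, where the data key is used as an index into some shared storage. The code is purely hypothetical and is not taken from any firmware analyzed in the talk.

```c
#include <string.h>
#include <stddef.h>

#define MAX_ENTRIES 64

struct kv_entry {
    char key[32];
    char val[256];
};

/* Shared storage, e.g. an NVRAM-like region. */
static struct kv_entry storage[MAX_ENTRIES];
static size_t n_entries;

/* A vendor-specific "setenv": the data key selects where the value goes. */
void custom_set(const char *key, const char *val) {
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(storage[i].key, key) == 0) {
            strncpy(storage[i].val, val, sizeof(storage[i].val) - 1);
            return;
        }
    }
    if (n_entries < MAX_ENTRIES) {
        strncpy(storage[n_entries].key, key, sizeof(storage[n_entries].key) - 1);
        strncpy(storage[n_entries].val, val, sizeof(storage[n_entries].val) - 1);
        n_entries++;
    }
}

/* The matching "getenv": the same data key retrieves the value. */
const char *custom_get(const char *key) {
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp(storage[i].key, key) == 0)
            return storage[i].val;
    return NULL;
}
```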
So let's see how the BDG algorithm works. We start from the border binary, which again receives the server request and parses the URI, and we see that here it runs a string comparison against some network-related keyword. As such, we taint the variable p, and we see that the variable p is returned from the function to these two different points. As such, we continue, and now we see that data gets tainted, and the variable data is passed to the function setenv. At this point, the environment CPF will understand that tainted data is set to an environment variable, and it will understand that this binary is indeed a setter binary that uses the environment. Then we retrieve the data key QUERY_STRING and we search within the firmware sample for all the other binaries that rely on the same data key. It will find that this binary relies on the same data key and will schedule it for further analysis. After this algorithm, we build the BDG by creating edges between setters and getters for each data key. The multi-binary data flow analysis uses the BDG to find and propagate the data constraints from a setter to a getter. Here we apply only the least strict constraints: between two program points there might be an infinite number of paths and, in theory, an infinite number of constraints that we could propagate from the setter binary to the getter binary, but since our goal is to find bugs, we only propagate the least strict set of constraints. Let's see an example. So again, we have our two binaries and we see that the variable passed to the setenv function is data, which comes from two different paths out of the parse_URI function. In the first case the data that is passed is unconstrained, while in the second case, at line 8, it is constrained to be at most 128 bytes. As such, we only propagate the constraints of the first one. In turn, the getter binary will retrieve this variable from the environment and assign it to the variable query, which in this case will be unconstrained. The insecure interaction detection runs a static taint analysis and checks whether tainted data can reach a sink in an unsafe way. We consider as sinks memcpy-like functions, which are functions semantically equivalent to memcpy, strcpy and so forth. We also raise an alert if we see a dereference of a tainted variable, or if we see comparisons of tainted variables in loop conditions, to detect possible DoS vulnerabilities. Let's see an example again. We know that our query variable is tainted and unconstrained. Then we follow the taint into the function process_request, which, as we see, will eventually copy the data from q to arg. Now, arg is 128 bytes long while q is unconstrained, and therefore we generate an alert here. Our static taint engine is based on BootStomp and is completely based on symbolic execution, which means that the taint is propagated following the program's data flow. Let's see an example. Assume that we have this code: the first instruction takes the result of some seed function that might return, for instance, some user input. In the symbolic world, what we do is create a symbolic variable ty and assign to it a tainted variable that we call TAINT_ty, which is the taint target. The next instruction assigns to x the value ty plus 5; in the symbolic world we just follow the data flow and x gets assigned TAINT_ty plus 5, which effectively taints x as well. If at some point x is overwritten with some constant data, the taint is automatically removed. In BootStomp's original design, the taint is also removed when data is constrained. For instance, here we can see that the variable n is tainted but is then constrained between two values, 0 and 255, and therefore the taint is removed.
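As a toy illustration of these propagation rules (taint follows the data flow, and overwriting with a constant removes it), consider the following C sketch. The real engine operates on symbolic expressions inside a BootStomp-style symbolic executor; the simple per-variable table below is only a stand-in.

```c
#include <stdio.h>

/* Toy taint map: one slot per program variable, 0 means untainted,
 * any other value is a taint tag (e.g. TAINT_ty). */
enum var { TY, X, NUM_VARS };
static int taint[NUM_VARS];

static void assign_seed(enum var dst, int tag)      { taint[dst] = tag; }        /* ty = seed()   */
static void assign_expr(enum var dst, enum var src) { taint[dst] = taint[src]; } /* x = ty + 5    */
static void assign_const(enum var dst)              { taint[dst] = 0; }          /* x = constant  */

int main(void) {
    const int TAINT_ty = 1;

    assign_seed(TY, TAINT_ty);        /* ty gets the taint tag TAINT_ty          */
    assign_expr(X, TY);               /* x = ty + 5: x inherits TAINT_ty         */
    printf("x tainted? %d\n", taint[X] != 0);   /* 1 */

    assign_const(X);                  /* x overwritten with constant data        */
    printf("x tainted? %d\n", taint[X] != 0);   /* 0: taint removed              */

    /* In BootStomp's original design, constraining a tainted value (e.g.
     * 0 <= n <= 255) also removes its taint; that rule is not modeled in
     * this toy sketch. */
    return 0;
}
```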
In our taint engine we have two additions: we added a path prioritization strategy and we added taint dependencies. The path prioritization strategy prioritizes paths that propagate the taint and deprioritizes those that remove it. For instance, say again that some user input comes from some function, so the variable user_input gets tainted, and it is then passed to another function called parse. Here, as you can see, there is a possibly infinite number of symbolic paths through this while loop, but only one of them will return tainted data, while the others won't. So the path prioritization strategy prioritizes this path over the others. This is implemented by finding basic blocks within a function that return non-constant data: if one is found, we follow its return before considering the others. Taint dependencies allow smarter untainting strategies. Let's see the example again. We know that user_input here is tainted and is then parsed; then we see that its length is computed and stored in a variable n, its size is checked, and if it is greater than 512 bytes the function returns; otherwise it copies the data. Now, in this case it might happen that, if this strlen function is not analyzed because of some static analysis decision, the taint tag of cmd is different from the taint tag of n. In that case, though n gets untainted, cmd is not untainted, and the strcpy raises a false positive. To fix this problem, we create a dependency between the taint tag of n and the taint tag of cmd, and when n gets untainted, cmd gets untainted as well, so we don't have any more false positives. This procedure is automatic: we find functions that implement strlen-like, semantically equivalent code and create the taint tag dependencies.
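A minimal sketch of how such a taint-tag dependency could behave is shown below, assuming a simple table of tags. The names TAINT_cmd and TAINT_n mirror the example above; the data structures are illustrative, not the engine's actual ones.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_TAGS 16

static bool tainted[MAX_TAGS];
static int  depends_on[MAX_TAGS];   /* depends_on[a] == b: untainting b also untaints a */

static void untaint(int tag) {
    tainted[tag] = false;
    for (int t = 0; t < MAX_TAGS; t++)
        if (depends_on[t] == tag && tainted[t])
            untaint(t);             /* propagate the untaint to dependent tags */
}

int main(void) {
    for (int t = 0; t < MAX_TAGS; t++)
        depends_on[t] = -1;

    /* Tags from the example: cmd is the user buffer, n = strlen(cmd). */
    const int TAINT_cmd = 0, TAINT_n = 1;
    tainted[TAINT_cmd] = tainted[TAINT_n] = true;

    /* strlen was recognized as length-of(cmd), so tie the two tags. */
    depends_on[TAINT_cmd] = TAINT_n;

    /* n gets constrained (<= 512) and is untainted: cmd follows. */
    untaint(TAINT_n);
    printf("cmd still tainted? %d\n", tainted[TAINT_cmd]);   /* prints 0 */
    return 0;
}
```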
OK, let's see our evaluation. We ran three different evaluations on two different data sets: the first one composed of 53 of the latest firmware samples from seven vendors, and the second one of 899 firmware samples gathered from related work. In the first case, the total number of binaries considered was a bit more than 8.5k, and our system generated 87 alerts, of which 51 were found to be true positives; 34 of them were multi-binary vulnerabilities, which means that the vulnerability was found by tracking the data flow from the setter to the getter binary. We also ran a comparative evaluation, in which we tried to measure the effort that an analyst would go through when analyzing firmware using different strategies. In the first one, we considered each and every binary in the firmware sample independently and ran the analysis for up to seven days per firmware: the system generated almost 21,000 alerts while covering only almost 2.5k binaries. In the second case, we found the border binaries, the parsers, and statically analyzed only them, and the system generated 9.3k alerts. Notice that in this case, since we don't know how the user input is introduced, we consider every IPC found in a binary as a possible source of user input, and this holds for all of these experiments. In the third case, we ran the BDG but considered each binary independently, which means that we don't propagate constraints and we run a static single-binary analysis on each one of them; the system generated almost 15,000 alerts. Finally, we ran Karonte, and the generated alerts were only 74. We also ran a larger-scale analysis on the 899 firmware samples, and we found that almost 40% of them were multi-binary, which means that the network functionality is carried out by more than one binary, and the system generated about 1,000 alerts. Now, there is a lot going on in this table; the details are in the paper, and in this presentation I'll just go through some of them. We found that, on average, a firmware sample contains 4 border binaries and a BDG contains 5 binaries, and some BDGs have more than 10 binaries. We also plotted some statistics, and we found that 80% of the firmware samples were analyzed within a day, as you can see from the top-left figure. However, the experiments showed a great variance, which we found was due to implementation details: for instance, we found that angr would take more than seven hours to build some CFGs, and sometimes the variance was due to a high number of data keys. Also, as you can see from the second picture from the top, the number of paths does not have an impact on the total time, and, as you can see from the bottom two pictures, performance is not heavily affected by firmware size, where by firmware size we mean the number of binaries in a firmware sample and the total number of basic blocks. So let's see how to run Karonte. The procedure is pretty straightforward: first you get a firmware sample, then you create a configuration file containing information about the firmware sample, and then you run it. So let's see how. This is an example of a configuration file. It contains various pieces of information, but most of them are optional. The only ones that are not are these: the firmware path, that is, the path to your firmware, and these two, the architecture of the firmware and the base address, if the firmware is a firmware blob. All the other fields are optional, and you can set them if you have some information about the firmware. A detailed explanation of all of these fields is in our GitHub repo. Once you have set up the configuration file, you can run Karonte. We provide a Docker container; you can find the link in our GitHub repo. I'm going to run it now, but it's not going to finish, because it would take several hours. All you have to do is merely... *typing noises* just run it on the configuration file and it's going to do each step that we saw. Eventually I'm going to stop it, because it would take several hours anyway. Eventually it will produce a result file that... I ran this yesterday, so you can see it here. There is a lot going on here; I'm just going to go through some important information. One thing that you can see is that these are the border binaries that Karonte found. Now, there might be some false positives; I'm not sure how many there are here, but as long as there are no false negatives, or the number is very low, it's fine. In this case, wait... oh, I might have removed something. All right, here, perfect. In this case, this one, httpd, is a true positive, which is the web server we were talking about before. Then we have the BDG: in this case, we can see that Karonte found that httpd communicates with two different binaries, fileaccess.cgi and cgibin. Then we have information about the CPFs: we can see here that httpd has 28 data keys and that the semantic CPF found 27 of them, and then there might be one other here or somewhere that I don't see. Anyway. And then we have a list of alerts. Now, some of those may be duplicates because of loops, so you could go ahead and inspect all of them manually, but I wrote a utility that you can use, which basically filters out all the loops for you. Let me remember what I called it. This one? Yeah. And you can see that in total it generated, the system generated 6... 7... 8 alerts. So let's see one of them.
Oh, and I recently realized that the path I'm reporting in the log is not the path from the setter binary through the getter binary to the sink, but only the path within the getter binary up to the sink. I'm going to fix this in the next few days and report the whole paths. Anyway. So here we can see that the key Content-Type contains user input and is passed in an unsafe way to the sink at this address, and the binary in question is called fileaccess.cgi. So we can see what happens there. *keyboard noises* If you look here, we have a string copy that copies the content of haystack to destination; haystack basically comes from this getenv. And if you look, destination comes in as a parameter of this function, and this buffer is only 0x68 bytes big. And this turned out to be an actual true positive. OK. So, in summary, we presented a strategy to track data flow across different binaries, and we evaluated our system on 952 firmware samples. Some takeaways: analyzing firmware is not easy and vulnerabilities persist; firmware samples are made of interconnected components; static analysis can still be used to efficiently find vulnerabilities at scale; and finding how components communicate is key for precision. Here is the bibliography that I used throughout the presentation, and I'm going to take questions. *applause* Herald: So thank you, Nilo, for a very interesting talk. If you have questions, we have three microphones: one, two and three. If you have a question, please head to a microphone and we'll take your question. Yes, microphone number two. Q: Do you rely on imports from libc or something like that, or do you have issues with, like, statically linked binaries or stripped binaries, or is it all semantic analysis of a function? Nilo: So, okay. We use angr. So, for example, if you have an indirect call, we use angr to figure out what the target is. And to answer your question: if the firmware uses libc, some CPFs do rely on that; for instance, the environment CPF checks whether the setenv or getenv functions are called. But we also use the semantic CPF, which covers cases where that information is missing, like when there is no libc or some vendor reimplemented their own functions. We use that CPF to try to understand the semantics of a function and determine whether it is, for example, a custom setenv. Q: Yeah, thanks. Herald: Microphone number three. Q: In embedded environments you often also have the case that the getter might work on DMA, through some kind of vendor driver. Are you considering this? And, second part of the question, how would you then distinguish this from your generic IPC? Because I can imagine that they look very similar in the actual code. Nilo: So, if I understand your question correctly, you mention a case of MMIO where some data is retrieved directly from some address in memory. What we found is that these addresses are usually hardcoded somewhere, so the vendor knows that, for example, from address A to address B there is data from some peripheral. So when we find such a hardcoded address, we treat it as a read of some interesting data. Q: Okay. And this would also be distinguishable by your sort of CPF, the generic CPF would be distinguishable... Nilo: Yeah. Yeah, yeah. Q: ...from a DMA driver by using this fixed address, I assume. Nilo: Yeah, that's what the semantic CPF does, among other things. Q: Okay, thank you. Nilo: Sure. Herald: Another question from microphone number 3. Q: What's the license for Karonte? Nilo: Sorry?
Q: I checked the software license, I checked the git repository, and there is no license at all. Nilo: That is a very good question. I haven't thought about it yet. I will. Herald: Any more questions from here or from the Internet? Okay, then a big round of applause for Nilo again for his talk. *postroll music* Subtitles created by many many volunteers and the c3subtitles.de team. Join us, and help us!