0:00:00.000,0:00:09.804 preroll music 0:00:09.804,0:00:24.745 Herald: Our next speaker for today is a[br]computer science PhD student at UC Santa 0:00:24.745,0:00:30.805 Barbara. He is a member of the Shellphish[br]Hacking Team and he's also the organizer 0:00:30.805,0:00:35.816 of the iCTF Hacking Competition. Please[br]give a big round of applause to Nilo 0:00:35.816,0:00:36.228 Redini. 0:00:36.228,0:00:39.510 applause 0:00:39.510,0:00:46.671 Nilo: Thanks for the introduction, hello[br]to everyone. My name is Nilo, and today 0:00:46.671,0:00:52.330 I'm going to present to you my work Karonte:[br]identifying multi-binary vulnerabilities 0:00:52.330,0:00:56.486 in embedded firmware at scale. This work[br]is a joint effort between me and 0:00:56.486,0:01:02.101 several of my colleagues at UC[br]Santa Barbara and ASU. This talk is going 0:01:02.101,0:01:08.247 to be about IoT devices. So before[br]starting, let's see an overview of IoT 0:01:08.247,0:01:13.904 devices. IoT devices are everywhere. As[br]the research suggests, they will reach 0:01:13.904,0:01:19.762 20 billion units by the end of the next[br]year. And a recent study conducted this 0:01:19.762,0:01:25.769 year in 2019 on 16 million households[br]showed that more than 70 percent of homes 0:01:25.769,0:01:31.836 in North America already have a[br]network-connected IoT device. IoT devices make 0:01:31.836,0:01:37.660 everyday life smarter. You can literally[br]say "Alexa, I'm cold" and Alexa will 0:01:37.660,0:01:43.573 interact with the thermostat and increase[br]the temperature of your room. Usually the 0:01:43.573,0:01:49.610 way we interact with the IoT devices is[br]through our smartphone. We send a request 0:01:49.610,0:01:55.164 over the local network to some device, a[br]router or door lock, or we might send the 0:01:55.164,0:02:01.139 same request through a cloud endpoint,[br]which is usually managed by the vendor of 0:02:01.139,0:02:07.290 the IoT device. 
Another way is through[br]IoT hubs: the smartphone will send the request 0:02:07.290,0:02:13.663 to some IoT hub, which in turn will send[br]the request to some other IoT devices. As 0:02:13.663,0:02:18.879 you can imagine, IoT devices use and[br]collect our data and some data is more 0:02:18.879,0:02:23.376 sensitive than others. For instance, think[br]of all the data that is collected by my 0:02:23.376,0:02:29.731 lightbulb or data that is collected by our[br]security camera. As such, IoT devices can 0:02:29.731,0:02:37.081 compromise people's safety and privacy.[br]Think, for example, about the security 0:02:37.081,0:02:44.330 implications of a faulty smart lock or the[br]brakes of your smart car. So the question 0:02:44.330,0:02:53.126 that we asked is: Are IoT devices secure?[br]Well, like everything else, they are not. 0:02:53.126,0:03:00.953 OK, in 2016 the Mirai botnet compromised[br]and leveraged millions of IoT devices to 0:03:00.953,0:03:06.965 disrupt core Internet services such as[br]Twitter, GitHub and Netflix. And in 2018, 0:03:06.965,0:03:13.294 154 vulnerabilities affecting IoT devices[br]were published, which represented an 0:03:13.294,0:03:20.915 increase of 15% compared to 2017 and an[br]increase of 115% compared to 2016. So then 0:03:20.915,0:03:27.710 we wonder: why is it hard to secure IoT[br]devices? To answer this question we have 0:03:27.710,0:03:33.635 to look at how IoT devices work and how they[br]are made. Usually when you remove all the 0:03:33.635,0:03:40.415 plastic and peripherals IoT devices look[br]like this: a board with some chips lying 0:03:40.415,0:03:45.604 on it. Usually you can find the big chip,[br]the microcontroller which runs the 0:03:45.604,0:03:50.535 firmware and one or more peripheral[br]controllers which interact with external 0:03:50.535,0:03:57.188 peripherals such as the motor of your[br]smart lock or cameras. Though the design 0:03:57.188,0:04:03.445 is generic, implementations are very[br]diverse. 
For instance, firmware may run on 0:04:03.445,0:04:08.775 several different architectures such as[br]ARM, MIPS, x86, PowerPC and so forth. And 0:04:08.775,0:04:14.349 sometimes they are even proprietary, which[br]means that if a security analyst wants to 0:04:14.349,0:04:20.041 understand what's going on in the[br]firmware, they'll have a hard time if they 0:04:20.041,0:04:26.060 don't have the vendor specifics. Also,[br]they're operating in environments with 0:04:26.060,0:04:30.563 limited resources, which means that they[br]run small and optimized code. For 0:04:30.563,0:04:38.041 instance, vendors might implement their[br]own version of some known algorithm in an 0:04:38.041,0:04:45.265 optimized way. Also, IoT devices manage[br]external peripherals that often use custom 0:04:45.265,0:04:51.245 code. Again, by peripherals we mean things like[br]cameras, sensors and so forth. The 0:04:51.245,0:04:57.479 firmware of IoT devices can be either[br]Linux based or a blob firmware. Linux-based 0:04:57.479,0:05:03.127 firmware is by far the most common: a study[br]showed that 86% of firmware is based on 0:05:03.127,0:05:07.900 Linux. On the other hand, blob[br]firmware is usually an operating system and 0:05:07.900,0:05:15.010 user applications packaged in a single[br]binary. In any case, firmware samples are 0:05:15.010,0:05:20.020 usually made of multiple components. For[br]instance, let's say that you have your 0:05:20.020,0:05:26.410 smartphone and you send a request to your[br]IoT device. This request will be received 0:05:26.410,0:05:33.190 by a binary which we term the border binary,[br]which in this example is a web server. 
The 0:05:33.190,0:05:37.990 request will be received, parsed, and then[br]it might be sent to another binary code, 0:05:37.990,0:05:43.150 the handler binary, which will take the[br]request, work on it, produce an answer, 0:05:43.150,0:05:48.130 send it back to the webserver, which in[br]turn would produce a response to send to 0:05:48.130,0:05:54.100 the smartphone. So to come back to the[br]question why is it hard to secure IoT 0:05:54.100,0:06:01.060 devices? Well, the answer is because IoT[br]devices are in practice very diverse. Of 0:06:01.060,0:06:05.890 course, there have been various work that[br]have been proposed to analyze and secure 0:06:05.890,0:06:11.500 firmware for IoT devices. Some of them[br]using static analysis. Others using 0:06:11.500,0:06:15.910 dynamic analysis and several others using[br]a combination of both. Here I wrote 0:06:15.910,0:06:19.690 several of them. Again at the end of the[br]presentation there is a bibliography with 0:06:19.690,0:06:28.990 the title of these works. Of course, all[br]these approaches have some problems. For 0:06:28.990,0:06:33.850 instance, the current dynamic analysis are[br]hard to apply to scale because of the 0:06:33.850,0:06:39.430 customized environments that IoT devices[br]work on. Usually when you try to 0:06:39.430,0:06:45.400 dynamically execute a firmware, it's gonna[br]check if the peripherals are connected and 0:06:45.400,0:06:49.780 are working properly. In a case where you[br]can't have the peripherals, it's gonna be 0:06:49.780,0:06:55.390 hard to actually run the firmware. Also[br]current static analysis approaches are 0:06:55.390,0:07:00.580 based on what we call the single binary[br]approach, which means that binaries from a 0:07:00.580,0:07:05.620 firmware are taken individually and[br]analysed. This approach might produce many 0:07:05.620,0:07:11.530 false positives. For instance, so let's[br]say again that we have our two binaries. 
0:07:11.530,0:07:17.320 This is actually an example that we found[br]on one firmware, so the web server will 0:07:17.320,0:07:22.990 take the user request, will parse the[br]request and produce some data, will set 0:07:22.990,0:07:27.430 this data to an environment variable and[br]eventually will execute the handler binary. 0:07:27.430,0:07:33.670 Now, as you can see, the parsing function[br]contains a string compare which checks if 0:07:33.670,0:07:37.930 some keyword is present in the request.[br]And if so, it just returns the whole 0:07:37.930,0:07:43.780 request. Otherwise, it will constrain the[br]size of the request to 128 bytes and 0:07:43.780,0:07:51.790 return it. The handler binary in turn when[br]spawned will receive the data by doing a 0:07:51.790,0:07:59.380 getenv on the query string, but also will[br]getenv on another environment variable 0:07:59.380,0:08:04.060 which in this case is not user controlled:[br]the user cannot influence the content 0:08:04.060,0:08:10.480 of this variable. Then it's gonna call the[br]function process_request. This function 0:08:10.480,0:08:16.690 eventually will do two string copies, one[br]from the user data, the other one from the 0:08:16.690,0:08:22.930 log path, into two different local variables[br]that are 128 bytes long. Now in the first 0:08:22.930,0:08:28.360 case, as we have seen before, the data can[br]be greater than 128 bytes and this string 0:08:28.360,0:08:33.460 copy may result in a bug, while in the[br]second case it will not, because here we 0:08:33.460,0:08:40.810 assume that the system handles its own[br]data in a good manner. So throughout this 0:08:40.810,0:08:45.550 work, we're gonna call the first type of[br]binary the setter binary, which means 0:08:45.550,0:08:50.530 that it is the binary that takes the data[br]and sets the data for another binary to 0:08:50.530,0:08:57.700 consume. And the second type of binary we[br]call the getter binary. 
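The setter/getter example just described can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the firmware or from Karonte: the keyword `op=`, the variable name `LOG_PATH`, and all function names are hypothetical stand-ins for what is on the slide, and the NUL terminator of C strings is ignored for simplicity.

```python
MAX_LEN = 128  # size of the handler's local buffers in the example


def parse_request(request: str) -> str:
    """Setter side (web server): decide what data to export."""
    # Hypothetical keyword; the talk only says "some keyword".
    if "op=" in request:
        return request            # whole request, no length bound
    return request[:MAX_LEN]      # bounded to 128 bytes


def set_query(request: str, env: dict) -> None:
    """Setter: publish the parsed data under the data key QUERY_STRING."""
    env["QUERY_STRING"] = parse_request(request)


def unsafe_copies(env: dict) -> dict:
    """Getter side (handler): report which 128-byte copies could overflow."""
    query = env.get("QUERY_STRING", "")
    log_path = env.get("LOG_PATH", "")   # hypothetical, not user controlled
    return {
        "query": len(query) > MAX_LEN,
        "log_path": len(log_path) > MAX_LEN,
    }
```

Looking at the handler alone, both copies look equally suspicious; looking at the web server alone, no copy happens at all. Only the combination shows that the query copy is reachable with unbounded user data while the log path copy is not.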
So the 0:08:57.700,0:09:01.570 current bug finding tools are inadequate[br]because bugs are left undiscovered 0:09:01.570,0:09:08.080 if the analysis only considers those[br]binaries that receive network requests, or 0:09:08.080,0:09:12.750 they're likely to produce many false[br]positives if the analysis considers all of 0:09:12.750,0:09:19.410 them individually. So then we wonder how[br]these different components actually 0:09:19.410,0:09:23.430 communicate. They communicate through what[br]is called interprocess communication (IPC), 0:09:23.430,0:09:28.890 which is basically a finite set of[br]paradigms used by binaries to communicate, 0:09:28.890,0:09:36.660 such as files, environment variables, MMIO[br]and so forth. All these channels are 0:09:36.660,0:09:42.150 identified by data keys, which are file[br]names, or in the case of the example 0:09:42.150,0:09:49.440 before here on the right, it's the query[br]string environment variable. Each binary 0:09:49.440,0:09:53.280 that relies on some shared data must know[br]the endpoint where such data will be 0:09:53.280,0:09:57.540 available, for instance, again, a[br]file name or even a socket endpoint 0:09:58.080,0:10:02.910 or the environment variable. This means[br]that usually, data keys are hard-coded in the 0:10:02.910,0:10:10.770 program itself, as we saw before. To find[br]bugs in firmware in a precise manner, we 0:10:10.770,0:10:14.100 need to track how user data is introduced[br]and propagated across the different 0:10:14.100,0:10:22.680 binaries. Okay, let's talk about our work.[br]Before we start talking about Karonte, we 0:10:22.680,0:10:27.930 define our threat model. We hypothesize[br]that an attacker sends arbitrary requests 0:10:27.930,0:10:33.360 over the network, both LAN and WAN,[br]directly to the IoT device. 
Though we said 0:10:33.360,0:10:38.640 before that sometimes IoT device can[br]communicate through the clouds, research 0:10:38.640,0:10:42.690 showed that some form of local[br]communication is usually available, for 0:10:42.690,0:10:50.040 instance, during the setup phase of the[br]device. Karonte is defined as a static 0:10:50.040,0:10:54.270 analysis tool that tracks data flow across[br]multiple binaries, to find 0:10:54.270,0:11:00.690 vulnerabilities. Let's see how it works.[br]So the first step, Karonte find those 0:11:00.690,0:11:04.590 binaries that introduce the user input[br]into the firmware. We call these border 0:11:04.590,0:11:09.180 binaries, which are the binaries, that[br]basically interface the device to the 0:11:09.180,0:11:15.570 outside world. Which in the example is our[br]web server. Then it tracks how a data is 0:11:15.570,0:11:20.760 shared with other binaries within the[br]firmware sample. Which we'll understand in 0:11:20.760,0:11:25.170 this example, the web server communicates[br]with the handle binary, and builds what we 0:11:25.170,0:11:30.630 call the BDG. BDG which stands for binary[br]dependency graph. It's basically a graph 0:11:30.630,0:11:39.720 representation of the data dependencies[br]among different binaries. Then we detect 0:11:39.720,0:11:45.360 vulnerabilities that arise from the misuse[br]of the data using the BDG. This is an 0:11:45.360,0:11:52.650 overview of our system. We start by taking[br]a packed firmware, we unpack it. We find 0:11:52.650,0:11:58.740 the border binaries. Then we build the[br]binary dependency graph, which relies on a 0:11:58.740,0:12:04.800 set of CPFs, as we will see soon. CPF[br]stands for Communication Paradigm Finder. 0:12:04.800,0:12:10.320 Then we find the specifics of the[br]communication, for instance, like the 0:12:10.320,0:12:16.140 constraints applied to the data that is[br]shared through our module multi-binary 0:12:16.140,0:12:20.550 data-flow analysis. 
Eventually we run our[br]insecure interaction detection module, 0:12:20.550,0:12:26.040 which basically takes all the information[br]and produces alerts. Our system is 0:12:26.040,0:12:32.430 completely static and relies on our static[br]taint engine. So let's see each one of 0:12:32.430,0:12:37.320 these steps in more detail. The[br]unpacking procedure is pretty easy: we use 0:12:37.320,0:12:42.600 the off-the-shelf firmware unpacking tool[br]binwalk. And then we have to find the 0:12:42.600,0:12:47.730 border binaries. Now, we said that border[br]binaries basically are binaries that 0:12:47.730,0:12:54.150 receive data from the network. And we[br]hypothesize that they will contain parsers to 0:12:54.150,0:12:57.930 validate the data that they receive. So[br]in order to find them, we have to find 0:12:57.930,0:13:04.170 parsers which accept data from the network and[br]parse this data. To find parsers we rely 0:13:04.170,0:13:12.900 on related work, which basically uses a[br]few metrics and expresses as a number 0:13:12.900,0:13:18.000 the likelihood that a function contains[br]parsing capabilities. The metrics that 0:13:18.000,0:13:22.470 we used are number of basic blocks, number[br]of memory comparison operations and number 0:13:22.470,0:13:29.070 of branches. Now while these identify[br]parsers, we also have to find out if a binary 0:13:29.070,0:13:34.110 takes data from the network. As such, we[br]define two more metrics. First, we 0:13:34.110,0:13:39.480 check if the binary contains any network-related[br]keywords such as SOAP, HTTP and so 0:13:39.480,0:13:45.240 forth. And then we check if there exists a[br]data flow between a read from a socket and a 0:13:45.240,0:13:51.660 memory comparison operation. Once we have[br]all these metrics for each function, we 0:13:51.660,0:13:56.070 compute what is called a parsing score,[br]which basically is just a sum of products. 
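As a rough illustration of a "sum of products" over the five metrics just listed, here is a minimal Python sketch. The talk does not give the actual formula or coefficients, so the weights below are purely hypothetical placeholders, and the class and function names are invented for this example; the paper should be consulted for the real definition.

```python
from dataclasses import dataclass


@dataclass
class FunctionMetrics:
    basic_blocks: int          # number of basic blocks
    mem_comparisons: int       # number of memory comparison operations
    branches: int              # number of branches
    network_keywords: int      # network-related strings (e.g. SOAP, HTTP)
    socket_to_cmp_flow: bool   # data flow from a socket read to a comparison


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {
    "basic_blocks": 1,
    "mem_comparisons": 1,
    "branches": 1,
    "network_keywords": 5,
    "socket_to_cmp_flow": 10,
}


def parsing_score(m: FunctionMetrics) -> int:
    """Weighted sum of the metrics of one function."""
    return (
        WEIGHTS["basic_blocks"] * m.basic_blocks
        + WEIGHTS["mem_comparisons"] * m.mem_comparisons
        + WEIGHTS["branches"] * m.branches
        + WEIGHTS["network_keywords"] * m.network_keywords
        + WEIGHTS["socket_to_cmp_flow"] * int(m.socket_to_cmp_flow)
    )


def binary_score(functions: list) -> int:
    """A binary is represented by the highest score among its functions."""
    return max((parsing_score(f) for f in functions), default=0)
```

With weights like these, a large function that never touches network data can still score lower than a smaller one that reads from a socket and compares the result against protocol keywords, which is the intuition behind adding the two network metrics.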
0:13:56.070,0:14:01.710 Once we got a parsing score for each[br]function in a binary, we represent the 0:14:01.710,0:14:07.680 binary with its highest parsing score.[br]Once we got that for each binary in the 0:14:07.680,0:14:14.370 firmware we cluster them using the DBSCAN[br]density based algorithm and consider the 0:14:14.370,0:14:18.240 cluster with the highest parsing score as[br]containing the set of border binaries. 0:14:18.240,0:14:25.620 After this, we build the binary dependency[br]graph. Again the binary dependency graph 0:14:25.620,0:14:29.790 represents the data dependency among the[br]binaries in a firmware sample. For 0:14:29.790,0:14:35.430 instance, this simple graph will tell us[br]that a binary A communicates with binary C 0:14:35.430,0:14:40.770 using files and the same binary A[br]communicates with another binary B using 0:14:40.770,0:14:47.310 environment variables. Let's see how this[br]works. So we start from the identified 0:14:47.310,0:14:53.010 border binaries and then we taint the data[br]compared against network related keywords 0:14:53.010,0:14:58.320 that we found and run a static analysis,[br]static taint analysis to detect whether 0:14:58.320,0:15:04.680 the binary relies on any IPC paradigm to[br]share the data. If we find that it does, 0:15:04.680,0:15:09.360 we establish if the binary is a setter or[br]a getter, which again means that if the 0:15:09.360,0:15:13.320 binary is setting the data to be consumed[br]by another binary, or if the binary 0:15:13.320,0:15:20.520 actually gets the data and consumes it.[br]Then we retrieve the employed data key 0:15:20.520,0:15:25.860 which in the example before was the[br]keyword QUERY_STRING. And finally we scan 0:15:25.860,0:15:30.450 the firmware sample to find other binaries[br]that may rely on the same data keys and 0:15:30.450,0:15:35.820 schedule them for further analysis. 
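The scheduling loop described here is essentially a worklist algorithm. The Python sketch below is a simplified model under strong assumptions: each binary's setter and getter data keys are given up front in a dictionary, whereas in Karonte they are discovered by taint analysis and the CPFs. The function name and the data layout are hypothetical.

```python
from collections import deque


def build_bdg(firmware: dict, border_binaries: list) -> set:
    """Return BDG edges as (setter, getter, data_key) tuples.

    `firmware` maps a binary name to {"sets": set_of_keys, "gets": set_of_keys}.
    """
    seen = set(border_binaries)
    queue = deque(border_binaries)

    # Start from the border binaries and follow shared data keys.
    while queue:
        current = queue.popleft()
        keys = firmware[current]["sets"] | firmware[current]["gets"]
        for key in keys:
            # Scan the firmware for other binaries relying on the same key.
            for other, roles in firmware.items():
                if other == current or other in seen:
                    continue
                if key in roles["sets"] or key in roles["gets"]:
                    seen.add(other)
                    queue.append(other)   # schedule for further analysis

    # Connect setters to getters for each shared data key.
    edges = set()
    for setter in seen:
        for key in firmware[setter]["sets"]:
            for getter in seen:
                if getter != setter and key in firmware[getter]["gets"]:
                    edges.add((setter, getter, key))
    return edges
```

Binaries that share no data key with anything reachable from a border binary are never scheduled, which is how this approach avoids analyzing every binary in the firmware sample.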
To[br]understand whether a binary relies on any 0:15:35.820,0:15:42.510 IPC, we use what we call CPFs, which again[br]means communication paradigm finders. We 0:15:42.510,0:15:52.290 design a CPF for each IPC. And the CPFs[br]are also used to find the same data keys 0:15:52.290,0:15:56.280 within the firmware sample. We also[br]provide Karonte with a generic CPF to 0:15:56.280,0:16:00.390 cover those cases where the IPC is[br]unknown, or those cases where the vendor 0:16:00.390,0:16:06.090 implemented their own version of some[br]IPC. So for example they don't use 0:16:06.090,0:16:13.350 setenv, but they implemented their own[br]setenv. The idea behind this generic CPF, 0:16:13.350,0:16:19.740 which we call the semantic CPF, is that data[br]keys have to be used as an index to set or to 0:16:19.740,0:16:27.870 get some data, as in this simple example. So[br]let's see how the BDG algorithm works. We 0:16:27.870,0:16:31.890 start from the border binary, which again[br]will start from the server request and 0:16:31.890,0:16:38.250 will parse the URI, and we see that here it[br]runs a string comparison against some 0:16:38.250,0:16:44.850 network-related keyword. As such, we taint[br]the variable P. And we see that the 0:16:44.850,0:16:52.800 variable P is returned from the function[br]at these two different points. As such, we 0:16:52.800,0:16:57.180 continue. And now we see that data gets[br]tainted and the variable data is passed 0:16:57.180,0:17:02.310 to the function setenv. At this point, the[br]environment CPF will understand that 0:17:02.310,0:17:08.460 tainted data is set to an[br]environment variable and will understand 0:17:08.460,0:17:13.680 that this binary is indeed a setter[br]binary that uses the environment. Then we 0:17:13.680,0:17:18.540 retrieve the data key QUERY_STRING and[br]we'll search within the firmware sample 0:17:18.540,0:17:28.066 for all the other binaries that rely on the[br]same data key. 
And it will find that this 0:17:28.066,0:17:29.880 binary relies on the same data key and[br]will schedule it for further analysis. 0:17:29.880,0:17:37.020 After this algorithm we build the BDG by[br]creating edges between setters and getters 0:17:37.020,0:17:45.150 for each data key. The multi-binary data-flow[br]analysis uses the BDG to find and 0:17:45.150,0:17:51.270 propagate the data constraints from a[br]setter to a getter. Now, through this we 0:17:51.270,0:17:56.610 apply only the least strict constraints,[br]which means that between two 0:17:56.610,0:18:02.760 program points, there might be an infinite[br]number of paths and, in theory, an 0:18:02.760,0:18:06.690 infinite number of constraints that we can[br]propagate from the setter binary to the 0:18:06.690,0:18:11.790 getter binary. But since our goal here is[br]to find bugs, we only propagate the least 0:18:11.790,0:18:17.040 strict set of constraints. Let's see an[br]example. So again, we have our two 0:18:17.040,0:18:24.060 binaries and we see that the variable that[br]is passed to the setenv function is data, 0:18:24.060,0:18:29.490 which comes from two different paths in[br]the parse URI function. In the first case, 0:18:29.490,0:18:35.040 the data that is passed is unconstrained,[br]while in the second case, at line 8, it is 0:18:35.040,0:18:40.470 constrained to be at most 128 bytes. As[br]such, we only propagate the constraints of 0:18:40.470,0:18:49.980 the first path. In turn, the getter binary[br]will retrieve this variable from the 0:18:49.980,0:18:55.830 environment and set the variable query,[br]which in this case will be 0:18:55.830,0:19:03.390 unconstrained. Insecure interaction[br]detection runs a static taint analysis and 0:19:03.390,0:19:07.650 checks whether tainted data can reach a[br]sink in an unsafe way. 
We consider as 0:19:07.650,0:19:12.660 sinks memcpy-like functions, which are[br]functions that are semantically 0:19:12.660,0:19:19.050 equivalent to memcpy, strcpy and so forth. We[br]raise an alert if we see that there is a 0:19:19.050,0:19:23.100 dereference of a tainted variable and if[br]we see there are comparisons of tainted 0:19:23.100,0:19:31.620 variables in loop conditions, to detect[br]possible DoS vulnerabilities. Let's see an 0:19:31.620,0:19:37.260 example again. So we got here. We know[br]that our query variable is tainted and 0:19:37.260,0:19:43.770 it's unconstrained. And then we follow the[br]taint in the function process_request, 0:19:43.770,0:19:52.740 which we see will eventually copy the data[br]from q to arg. Now we see that arg is 128 0:19:52.740,0:20:01.050 bytes long while q is unconstrained and[br]therefore we generate an alert here. Our 0:20:01.050,0:20:04.980 static taint engine is based on BootStomp[br]and is completely based on symbolic 0:20:04.980,0:20:09.750 execution, which means that the taint is[br]propagated following the program data 0:20:09.750,0:20:14.430 flow. Let's see an example. So assuming[br]that we have this code, the first 0:20:14.430,0:20:19.620 instruction takes the result from some[br]seed function that might return, for 0:20:19.620,0:20:25.755 instance, some user input. And in the[br]symbolic world, what we do is we create a 0:20:25.755,0:20:33.630 symbolic variable ty and assign to it a[br]tainted variable that we call TAINT_ty, 0:20:33.630,0:20:40.290 which is the taint target. In the next[br]instruction, x takes the value ty plus 5, 0:20:40.290,0:20:46.890 and in the symbolic world we just follow the[br]data flow and x gets assigned TAINT_ty 0:20:46.890,0:20:54.300 plus 5, which effectively taints x as well. If[br]at some point x is overwritten with some 0:20:54.300,0:21:00.900 constant data, the taint is automatically[br]removed. 
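The propagation rule in this example can be mimicked with a tiny toy model in Python. This is not how the engine is implemented (it runs on symbolic execution over binaries); the `Value` class and the helper names below are hypothetical and only show why following the data flow taints `x` and why overwriting it with a constant removes the taint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    expr: str      # symbolic expression, kept as a string for readability
    tainted: bool  # True if the expression depends on a taint source


def seed(name: str) -> Value:
    """Model a seed function returning user input: a fresh tainted symbol."""
    return Value(f"TAINT_{name}", True)


def const(n: int) -> Value:
    """A constant carries no taint."""
    return Value(str(n), False)


def add(a: Value, b: Value) -> Value:
    """The result is tainted if either operand is tainted."""
    return Value(f"({a.expr} + {b.expr})", a.tainted or b.tainted)


ty = seed("ty")          # ty = seed()
x = add(ty, const(5))    # x = ty + 5, so x is now tainted
x_after = const(0)       # x = 0, the old tainted expression is gone
```

Because taint is a property of the expression rather than of the variable name, reassigning `x` to a constant needs no special untainting step.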
In the original design of 0:21:00.900,0:21:07.860 BootStomp, the taint is also removed when[br]data is constrained. For instance, here we 0:21:07.860,0:21:11.880 can see that the variable n is tainted but[br]then is constrained between two values, 0 0:21:11.880,0:21:19.770 and 255, and therefore the taint is[br]removed. In our taint engine we have two 0:21:19.770,0:21:26.610 additions. We added a path prioritization[br]strategy and we added taint dependencies. 0:21:26.610,0:21:32.430 The path prioritization strategy favors[br]paths that propagate the taint and 0:21:33.030,0:21:39.030 deprioritizes those that remove it. For[br]instance, say again that some user input 0:21:39.030,0:21:46.110 comes from some function and the variable[br]user input gets tainted and 0:21:46.110,0:21:51.180 then is passed to another function called[br]parse. Here, as you can see, there is possibly 0:21:51.180,0:21:57.930 an infinite number of symbolic paths in[br]this while loop, but only one will return tainted 0:21:57.930,0:22:05.490 data, while the others won't. So the path[br]prioritization strategy favors this 0:22:05.490,0:22:09.990 path over the others. This has been[br]implemented by finding basic blocks within 0:22:09.990,0:22:16.140 a function that return nonconstant data.[br]And if one is found, we follow its return 0:22:16.140,0:22:21.870 before considering the others. Taint[br]dependencies allow smart untaint 0:22:21.870,0:22:26.310 strategies. Let's see the example again.[br]So we know that user input here is 0:22:26.310,0:22:33.900 tainted, is then parsed, and then we see[br]that its length is computed and stored in 0:22:33.900,0:22:40.755 a variable n. Its size is checked and if[br]it's higher than 512 bytes, the function 0:22:40.755,0:22:48.210 will return. 
Otherwise it copies the data.[br]Now in this case, it might happen that if 0:22:48.210,0:22:53.535 this strlen function is not analyzed[br]because of some static analysis 0:22:53.535,0:23:00.780 imprecisions, the taint tag of cmd might be[br]different from the taint tag of n and in 0:23:00.780,0:23:07.380 this case, though n gets untainted, cmd[br]is not untainted and the strcpy can raise 0:23:07.380,0:23:15.540 a false positive. So to fix[br]this problem, we basically create a 0:23:15.540,0:23:21.360 dependency between the taint tag of n and[br]the taint tag of cmd. And when n gets 0:23:21.360,0:23:28.410 untainted, cmd gets untainted as well. So[br]we don't have any more false positives. This 0:23:28.410,0:23:33.330 procedure is automatic: we find[br]functions that implement strlen 0:23:33.330,0:23:40.140 semantically equivalent code and create[br]taint tag dependencies. OK. Let's see our 0:23:40.140,0:23:48.240 evaluation. We ran 3 different evaluations[br]on 2 different data sets. The first one is 0:23:48.240,0:23:55.140 composed of 53 latest firmware samples[br]from seven vendors, and the second one of 899 0:23:55.140,0:24:02.340 firmware samples gathered from related work. In[br]the first case, we can see that the total 0:24:02.340,0:24:09.720 number of binaries considered is 8.5k, a[br]few more than that. And our system 0:24:09.720,0:24:15.900 generated 87 alerts, of which 51 were found[br]to be true positives and 34 of them were 0:24:15.900,0:24:21.960 multi-binary vulnerabilities, which means[br]that the vulnerability was found by 0:24:21.960,0:24:27.990 tracking the data flow from the setter to[br]the getter binary. We also ran a 0:24:27.990,0:24:32.010 comparative evaluation, in which we basically[br]tried to measure the effort that an 0:24:32.010,0:24:37.260 analyst would go through in analyzing[br]firmware using different strategies. 
In 0:24:37.260,0:24:41.280 the first one, we consider each and every[br]binary in the firmware sample 0:24:41.280,0:24:49.050 independently and run the analysis for up[br]to seven days for each firmware. The 0:24:49.050,0:24:57.390 system generated almost 21000 alerts,[br]considering only almost 2.5k binaries. In 0:24:57.390,0:25:04.020 the second case we found the border[br]binaries, the parsers, and we statically 0:25:04.020,0:25:11.070 analyzed only them, and the system[br]generated 9.3k alerts. Notice that in this 0:25:11.070,0:25:15.630 case, since we don't know how the user[br]input is introduced, in this 0:25:15.630,0:25:21.120 experiment we consider every IPC that we[br]find in the binary as a possible source of 0:25:21.120,0:25:28.470 user input. And this is true for all of[br]them. In the third case we ran the BDG but 0:25:28.470,0:25:33.060 we considered each binary independently,[br]which means that we don't propagate 0:25:33.060,0:25:37.800 constraints and we run a static[br]single-binary analysis on each one of them. And 0:25:37.800,0:25:45.750 the system generated almost 15000 alerts.[br]Finally, we ran Karonte and the generated 0:25:45.750,0:25:55.230 alerts were only 74. We also ran a larger-scale[br]analysis on 899 firmware samples. 0:25:55.230,0:26:01.380 And we found that almost 40% of them were[br]multi-binary, which means that the network 0:26:01.380,0:26:08.220 functionalities were carried out by more[br]than one binary. And the system generated 0:26:08.220,0:26:16.620 1000 alerts. Now, there is a lot going on[br]in this table; the details are in the 0:26:16.620,0:26:21.660 paper. Here in this presentation I'll just go[br]through some of them. So we found 0:26:21.660,0:26:27.360 that on average, a firmware contains 4[br]border binaries. 
A BDG contains 5 binaries 0:26:27.360,0:26:34.050 and some BDGs have more than 10 binaries.[br]Also, we plotted some statistics and we found 0:26:34.050,0:26:39.030 that 80% of the firmware samples were analysed[br]within a day, as you can see from the top 0:26:39.030,0:26:46.350 left figure. However, experiments[br]presented a great variance, which we found 0:26:46.350,0:26:51.300 was due to implementation details. For[br]instance we found that angr would take 0:26:51.300,0:26:56.220 more than seven hours to build some CFGs.[br]And sometimes it was due to a high 0:26:56.220,0:27:01.650 number of data keys. Also, we found that[br]the number of paths, as you can see from 0:27:01.650,0:27:09.480 this second picture from the top,[br]does not have an impact on 0:27:09.480,0:27:15.030 the total time. And as you can see from[br]the bottom two pictures, performance is not 0:27:15.870,0:27:23.610 heavily affected by firmware size.[br]By firmware size here we mean the number of 0:27:23.610,0:27:29.610 binaries in a firmware sample and the[br]total number of basic blocks. So let's see 0:27:29.610,0:27:35.190 how to run Karonte. The procedure is[br]pretty straightforward. So first you get a 0:27:35.190,0:27:38.790 firmware sample. You create a[br]configuration file containing information 0:27:38.790,0:27:45.150 about the firmware sample and then you run[br]it. So let's see how. So this is an 0:27:45.150,0:27:51.450 example of a configuration file. It[br]contains several fields, but most of them 0:27:51.450,0:27:55.290 are optional. The only ones that are not[br]are this one, the firmware path, that is the 0:27:55.290,0:28:00.300 path to your firmware, and these two, the[br]architecture of the firmware and the base 0:28:00.300,0:28:07.170 address, if the firmware is a[br]firmware blob. All the other fields are 0:28:07.170,0:28:12.381 optional. And you can set them if you have[br]some information about the firmware. 
A 0:28:12.381,0:28:18.330 detailed explanation of all of these[br]fields are on our GitHub repo. Once you 0:28:18.330,0:28:23.981 set the configuration file, you can run[br]Karonte. Now we provide a Docker 0:28:23.981,0:28:28.666 container, you can find the link on our[br]GitHub repo. And I'm gonna run it, but 0:28:28.666,0:28:41.402 it's not gonna finish because it's gonna[br]take several hours. But all you have to do 0:28:41.402,0:28:53.225 is merely... typing noises just run it[br]on the configuration file and it's gonna 0:28:53.225,0:28:57.630 do each step that we saw. Eventually I'm[br]going to stop it because it's going to 0:28:57.630,0:29:02.537 take several hours anyway. Eventually it[br]will produce a result file that... I ran 0:29:02.537,0:29:07.857 this yesterday so you can see it here.[br]There is a lot going on here. I'm just 0:29:07.857,0:29:14.780 gonna go through some important like[br]information. So one thing that you can see 0:29:14.780,0:29:21.923 is that these are the border binaries that[br]Karonte found. Now, there might be some 0:29:21.923,0:29:26.360 false positives. I'm not sure how many[br]there are here. But as long as there are 0:29:26.360,0:29:32.131 no false negatives or the number is very[br]low, it's fine. It's good. In this case, 0:29:32.131,0:29:38.879 wait. Oh, I might have removed something.[br]All right, here, perfect. In this case, 0:29:38.879,0:29:45.444 this guy httpd is a true positive, which[br]is the web server that we were talking 0:29:45.444,0:29:52.185 before. Then we have the BDG. In this[br]case, we can see that Karonte found that 0:29:52.185,0:30:00.252 httpd communicates with two different[br]binaries, fileaccess.cgi and cgibin. Then 0:30:00.252,0:30:10.799 we have information about the CPFs. For[br]instance, here we can see that. Sorry. So 0:30:10.799,0:30:19.775 we can see here that httpd has 28 data[br]keys. 
And that the semantic CPF found 27 0:30:19.775,0:30:26.823 of them and then there might be one other[br]here or somewhere that I don't see. 0:30:26.823,0:30:35.835 Anyway. And then we have a list of alerts.[br]Now, thanks. Now, some of those may be 0:30:35.835,0:30:44.135 duplicates because of loops, so you can go[br]ahead and inspect all of them manually. 0:30:44.135,0:30:50.982 But I wrote a utility that you can use,[br]which is basically gonna filter out 0:30:50.982,0:31:02.100 all the loops for you. Now, to remember what[br]I called it. This guy? Yeah. And you can 0:31:02.100,0:31:13.368 see that in total the system[br]generated 6... 7... 8 alerts. So let's see 0:31:13.368,0:31:20.579 one of them. Oh, and I recently realized[br]that the path that I'm reporting in the 0:31:20.579,0:31:25.970 log is not the path from the setter[br]binary to the getter binary, to the sink. 0:31:25.970,0:31:31.426 It's only the part from the getter binary[br]up to the sink. I'm gonna fix this in the 0:31:31.426,0:31:37.552 next few days and report the whole path.[br]Anyway. So here we can see that the key 0:31:37.552,0:31:43.395 content type contains user input and it's[br]passed in an unsafe way to the sink 0:31:43.395,0:31:49.688 at this address. Now. And the[br]binary in question is called 0:31:49.688,0:32:02.416 fileaccess.cgi. So we can see what happens[br]there. keyboard noises If you see here, 0:32:02.416,0:32:12.480 we have a string copy that copies the[br]content of haystack to destination; 0:32:12.480,0:32:20.751 haystack comes basically from this getenv.[br]And if you see, destination comes as a 0:32:20.751,0:32:30.001 parameter from this function,[br]and this buffer is as big as 0:32:30.001,0:32:38.895 0x68 bytes. And this turned out to be[br]actually a true positive. OK. So in summary, we 0:32:38.895,0:32:46.529 presented a strategy to track data flow[br]across different binaries.
We evaluated 0:32:46.529,0:32:52.972 our system on 952 firmware samples. And[br]some takeaways: analyzing firmware is not 0:32:52.972,0:32:58.156 easy and vulnerabilities persist. We found[br]out that firmware is made of 0:32:58.156,0:33:02.660 interconnected components, and static[br]analysis can still be used to efficiently 0:33:02.660,0:33:07.730 find vulnerabilities at scale, and finding[br]that communication is key for precision. 0:33:07.730,0:33:12.229 Here's the bibliography that I used[br]throughout the presentation and I'm gonna 0:33:12.229,0:33:12.956 take questions. 0:33:12.956,0:33:18.431 applause 0:33:18.431,0:33:27.366 Herald: So thank you, Nilo, for a very[br]interesting talk. If you have questions, 0:33:27.366,0:33:32.470 we have three microphones: one, two and[br]three. If you have a question, please go 0:33:32.470,0:33:37.684 ahead to the microphone and we'll take your[br]question. Yes. Microphone number two. 0:33:37.684,0:33:41.995 Q: Do you rely on imports from libc or[br]something like that or do you have some 0:33:41.995,0:33:46.733 issues with like statically linked[br]binaries, stripped binaries or is it all 0:33:46.733,0:33:51.895 semantic analysis of a function?[br]Nilo: So. Okay. We use angr. So for 0:33:51.895,0:33:57.277 example, if you have an indirect call, we[br]use angr to figure out what the target is. 0:33:57.277,0:34:02.627 And to answer your question, like, whether we[br]use libc: some CPFs do. For instance, the 0:34:02.627,0:34:08.313 environment CPF checks whether the[br]setenv or getenv functions are called. But 0:34:08.313,0:34:12.873 we also use the semantic CPF,[br]basically in cases where information is 0:34:12.873,0:34:17.687 missing, like there is no such thing as[br]libc or some vendors reimplemented their 0:34:17.687,0:34:21.977 own functions. We use the CPF to actually[br]try to understand the semantics of the 0:34:21.977,0:34:25.888 function and understand if it's, for[br]example, a custom setenv.
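[Editor's sketch, not part of the talk] The idea behind the answer above, that one binary sets a data key (for example through setenv) and another binary reads the same key (for example through getenv), can be illustrated by pairing setters and getters on shared keys. This is a simplified sketch, not Karonte's implementation: the function `build_edges`, the tuple layout, and the key names `CONTENT_TYPE` and `QUERY_STRING` are assumptions made for illustration. Only the binary names httpd, fileaccess.cgi and cgibin come from the demo.

```python
from collections import defaultdict


def build_edges(uses):
    """Pair setter and getter binaries that share a data key.

    `uses` is an iterable of (binary, role, data_key) tuples, where role is
    either "setter" or "getter". Returns a sorted list of
    (setter_binary, getter_binary, data_key) edges.
    """
    setters = defaultdict(set)
    getters = defaultdict(set)
    for binary, role, key in uses:
        if role == "setter":
            setters[key].add(binary)
        elif role == "getter":
            getters[key].add(binary)
        else:
            raise ValueError("unknown role: " + role)

    edges = set()
    for key, setting_binaries in setters.items():
        for src in setting_binaries:
            for dst in getters.get(key, ()):
                if src != dst:  # a binary talking to itself is not an edge
                    edges.add((src, dst, key))
    return sorted(edges)


# Made-up observations: the key names are hypothetical.
observations = [
    ("httpd", "setter", "CONTENT_TYPE"),
    ("httpd", "setter", "QUERY_STRING"),
    ("fileaccess.cgi", "getter", "CONTENT_TYPE"),
    ("cgibin", "getter", "QUERY_STRING"),
]

if __name__ == "__main__":
    for src, dst, key in build_edges(observations):
        print(src, "->", dst, "via", key)
```

In this toy version the setter and getter roles are given as input; in the talk, recognizing them is the job of the CPFs, including the semantic CPF for custom reimplementations.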
0:34:25.888,0:34:29.900 Q: Yeah, thanks.[br]Herald: Microphone number three. 0:34:29.900,0:34:36.905 Q: In embedded environments you often[br]also have that the getter might work on a DMA, 0:34:36.905,0:34:43.233 some kind of vendor driver on a DMA. Are[br]you considering this? And second part of 0:34:43.233,0:34:47.793 the question, how would you then[br]distinguish this from your generic IPC? 0:34:47.793,0:34:52.502 Because I can imagine that they look very[br]similar in the actual code. 0:34:52.502,0:34:58.752 Nilo: So if I understand your[br]question correctly, you mention a case of MMIO where 0:34:58.752,0:35:03.956 some data is retrieved directly from some[br]address in memory. So what we found is 0:35:03.956,0:35:08.434 that these addresses are usually hardcoded[br]somewhere. So the vendor knows that, for 0:35:08.434,0:35:13.280 example, from this address A to this[br]address B there is some data from 0:35:13.280,0:35:18.857 this peripheral. So when we find some[br]hardcoded address like that, we think that this 0:35:18.857,0:35:21.688 is a read of some interesting[br]data. 0:35:21.688,0:35:28.073 Q: Okay. And this would be also[br]distinguishable from your sort of CPF, the 0:35:28.073,0:35:32.180 generic CPF would be distinguishable...[br]Nilo: Yeah. Yeah, yeah. 0:35:32.180,0:35:35.775 Q: ...from a DMA driver by using this[br]fixed address assuming. 0:35:35.775,0:35:39.827 Nilo: Yeah. That's what the semantic CPF[br]does, among other things. 0:35:39.827,0:35:41.336 Q: Okay. Thank you.[br]Nilo: Sure. 0:35:41.336,0:35:43.856 Herald: Another question for microphone[br]number 3. 0:35:43.856,0:35:46.117 Q: What's the license for Karonte?[br]Nilo: Sorry? 0:35:46.117,0:35:51.130 Q: I checked the software license, I[br]checked the git repository and there is no 0:35:51.130,0:35:53.440 license at all.[br]Nilo: That is a very good question. I 0:35:53.440,0:36:00.610 haven't thought about it yet.
I will.[br]Herald: Any more questions from here or 0:36:00.610,0:36:04.410 from the Internet? Okay. Then a big round[br]of applause to Nilo again for your talk. 0:36:04.410,0:36:24.820 postroll music 0:36:24.820,0:36:31.630 Subtitles created by many many volunteers and[br]the c3subtitles.de team. Join us, and help us!