36C3 - Identifying Multi-Binary Vulnerabilities in Embedded Firmware at Scale

0:00 - 0:10

preroll music
0:10 - 0:25

Herald: Our next speaker for today is a
computer science PhD student at UC Santa
0:25 - 0:31

Barbara. He is a member of the Shellphish
Hacking Team and he's also the organizer
0:31 - 0:36

of the iCTF Hacking Competition. Please
give a big round of applause to Nilo
0:36 - 0:36

Redini.
0:36 - 0:40

applause
0:40 - 0:47

Nilo: Thanks for the introduction, hello
to everyone. My name is Nilo, and today
0:47 - 0:52

I'm going to present my work, Karonte:
identifying multi-binary vulnerabilities
0:52 - 0:56

in embedded firmware at scale. This work
is a joint effort between me and
0:56 - 1:02

several of my colleagues at University of
Santa Barbara and ASU. This talk is going
1:02 - 1:08

to be about IoT devices. So before
starting, let's see an overview about IoT
1:08 - 1:14

devices. IoT devices are everywhere. As
the research suggests, they will reach the
1:14 - 1:20

20 billion units by the end of the next
year. And a recent study conducted this
1:20 - 1:26

year in 2019 on 16 million households
showed that more than 70 percent of homes
1:26 - 1:32

in North America already have an IoT
network connected device. IoT devices make
1:32 - 1:38

everyday life smarter. You can literally
say "Alexa, I'm cold" and Alexa will
1:38 - 1:44

interact with the thermostat and increase
the temperature of your room. Usually the
1:44 - 1:50

way we interact with the IoT devices is
through our smartphone. We send a request
1:50 - 1:55

to the local network, to some device,
router or door lock, or we might send the
1:55 - 2:01

same request through a cloud endpoint,
which is usually managed by the vendor of
2:01 - 2:07

the IoT device. Another way is through the
IoT hubs, smartphone will send the request
2:07 - 2:14

to some IoT hub, which in turn will send
the request to some other IoT devices. As
2:14 - 2:19

you can imagine, IoT devices use and
collect our data and some data is more
2:19 - 2:23

sensitive than others. For instance, think
of all the data that is collected by my
2:23 - 2:30

lightbulb or data that is collected by our
security camera. As such, IoT devices can
2:30 - 2:37

compromise people's safety and privacy.
Think, for example, about the security
2:37 - 2:44

implications of a faulty smart lock or the
brakes of your smart car. So the question
2:44 - 2:53

that we asked is: Are IoT devices secure?
Well, like everything else, they are not.
2:53 - 3:01

OK, in 2016 the Mirai botnet compromised
and leveraged millions of IoT devices to
3:01 - 3:07

disrupt core Internet services such as
Twitter, GitHub and Netflix. And in 2018,
3:07 - 3:13

154 vulnerabilities affecting IoT devices
were published, which represented an
3:13 - 3:21

increment of 15% compared to 2017 and an
increase of 115% compared to 2016. So then
3:21 - 3:28

we wonder: So why is it hard to secure IoT
devices? To answer this question we have
3:28 - 3:34

to look at how IoT devices work and how they
are made. Usually when you remove all the
3:34 - 3:40

plastic and peripherals IoT devices look
like this. A board with some chips laying
3:40 - 3:46

on it. Usually you can find the big chip,
the microcontroller which runs the
3:46 - 3:51

firmware and one or more peripheral
controllers which interact with external
3:51 - 3:57

peripherals such as the motor of your
smart lock or cameras. Though the design
3:57 - 4:03

is generic, implementations are very
diverse. For instance, firmware may run on
4:03 - 4:09

several different architectures such as
ARM, MIPS, x86, PowerPC and so forth. And
4:09 - 4:14

sometimes they are even proprietary, which
means that if a security analyst wants to
4:14 - 4:20

understand what's going on in the
firmware, he'll have a hard time if he
4:20 - 4:26

doesn't have the vendor specifics. Also,
they're operating in environments with
4:26 - 4:31

limited resources, which means that they
run small and optimized code. For
4:31 - 4:38

instance, vendors might implement their
own version of some known algorithm in an
4:38 - 4:45

optimized way. Also, IoT devices manage
external peripherals that often use custom
4:45 - 4:51

code. Again, with peripherals we mean like
cameras, sensors and so forth. The
4:51 - 4:57

firmware of IoT devices can be either
Linux-based or blob firmware. Linux-based
4:57 - 5:03

firmware is by far the most common. A study
showed that 86% of firmware are based on
5:03 - 5:08

Linux. On the other hand, blob
firmware is usually an operating system and
5:08 - 5:15

user applications packaged in a single
binary. In any case, firmware samples are
5:15 - 5:20

usually made of multiple components. For
instance, let's say that you have your
5:20 - 5:26

smart phone and you send a request to your
IoT device. This request will be received
5:26 - 5:33

by a binary which we term the border binary,
which in this example is a web server. The
5:33 - 5:38

request will be received, parsed, and then
it might be sent to another binary, called
5:38 - 5:43

the handler binary, which will take the
request, work on it, produce an answer,
5:43 - 5:48

send it back to the webserver, which in
turn would produce a response to send to
5:48 - 5:54

the smartphone. So to come back to the
question why is it hard to secure IoT
5:54 - 6:01

devices? Well, the answer is because IoT
devices are in practice very diverse. Of
6:01 - 6:06

course, various works have been
proposed to analyze and secure
6:06 - 6:12

firmware for IoT devices. Some of them
using static analysis. Others using
6:12 - 6:16

dynamic analysis and several others using
a combination of both. Here I wrote
6:16 - 6:20

several of them. Again at the end of the
presentation there is a bibliography with
6:20 - 6:29

the title of these works. Of course, all
these approaches have some problems. For
6:29 - 6:34

instance, current dynamic analyses are
hard to apply at scale because of the
6:34 - 6:39

customized environments that IoT devices
work on. Usually when you try to
6:39 - 6:45

dynamically execute a firmware, it's gonna
check if the peripherals are connected and
6:45 - 6:50

are working properly. In a case where you
can't have the peripherals, it's gonna be
6:50 - 6:55

hard to actually run the firmware. Also
current static analysis approaches are
6:55 - 7:01

based on what we call the single binary
approach, which means that binaries from a
7:01 - 7:06

firmware are taken individually and
analysed. This approach might produce many
7:06 - 7:12

false positives. For instance, so let's
say again that we have our two binaries.
7:12 - 7:17

This is actually an example that we found
on one firmware, so the web server will
7:17 - 7:23

take the user request, will parse the
request and produce some data, will set
7:23 - 7:27

this data to an environment variable and
eventually will execute the handler binary.
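The hand-off just described can be sketched in Python. This is only an illustration under stated assumptions, not the firmware's actual code: the function names `parse_request` and `serve`, the key name `QUERY_STRING`, and the use of a child Python process as a stand-in for the handler binary are all hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical data key shared between the two programs (an assumption).
DATA_KEY = "QUERY_STRING"

# Stand-in for the handler binary: it reads the shared data from its
# environment and reports how many characters it received.
HANDLER_SOURCE = (
    "import os\n"
    f"print(len(os.environ.get({DATA_KEY!r}, '')))\n"
)


def parse_request(raw_request: str) -> str:
    """Toy parser: keep whatever follows the first '?' in the request."""
    _, _, query = raw_request.partition("?")
    return query


def serve(raw_request: str) -> str:
    """Setter side: parse, publish via the environment, spawn the handler."""
    data = parse_request(raw_request)
    child_env = dict(os.environ, **{DATA_KEY: data})
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The point of the sketch is that the data crosses a process boundary: an analysis that looks only at the handler sees an environment read, with no indication of where the value came from.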
7:27 - 7:34

Now, as you can see, the parsing function
contains a string comparison which checks if
7:34 - 7:38

some keyword is present in the request.
And if so, it just returns the whole
7:38 - 7:44

request. Otherwise, it will constrain the
size of the request to 128 bytes and
7:44 - 7:52

return it. The handler binary in turn when
spawned will receive the data by doing a
7:52 - 7:59

getenv on the query string, but also will
getenv on another environment variable
7:59 - 8:04

which in this case is not user controlled
and the user cannot influence the content
8:04 - 8:10

of this variable. Then it's gonna call
the function process_request. This function
8:10 - 8:17

eventually will do two string copies. One
from the user data, the other one from the
8:17 - 8:23

log path on two different local variables
that are 128 bytes long. Now in the first
8:23 - 8:28

case, as we have seen before, the data can
be greater than 128 bytes and this string
8:28 - 8:33

copy may result in a bug. While in the
second case it will not. Because here we
8:33 - 8:41

assume that the system handles its own
data in a good manner. So throughout this
8:41 - 8:46

work, we're gonna call the first type of
binary, the setter binary, which means
8:46 - 8:51

that it is the binary that takes the data
and sets the data for another binary to be
8:51 - 8:58

consumed. And the second type of binary we
call the getter binary. So the
8:58 - 9:02

current bug finding tools are inadequate
because either bugs are left undiscovered
9:02 - 9:08

if the analysis only considers those
binaries that received network requests or
9:08 - 9:13

they're likely to produce many false
positives if the analysis considers all of
9:13 - 9:19

them individually. So then we wonder how
these different components actually
9:19 - 9:23

communicate. They communicate through what
is called interprocess communication,
9:23 - 9:29

which is basically a finite set of
paradigms used by binaries to communicate
9:29 - 9:37

such as files, environment variables, MMIO
and so forth. All these pieces are
9:37 - 9:42

represented by data keys, which are file
names, or in the case of the example
9:42 - 9:49

before here on the right, it's the query
string environment variable. Each binary
9:49 - 9:53

that relies on some shared data must know
the endpoint where such data will be
9:53 - 9:58

available, for instance, again, like a
file name or like even a socket endpoint
9:58 - 10:03

or the environment variable. This means
that usually, data keys are coded in the
10:03 - 10:11

program itself, as we saw before. To find
bugs in firmware, in a precise manner, we
10:11 - 10:14

need to track how user data is introduced
and propagated across the different
10:14 - 10:23

binaries. Okay, let's talk about our work.
Before we start talking about Karonte, we
10:23 - 10:28

define our threat model. We hypothesize
that an attacker sends arbitrary requests
10:28 - 10:33

over the network, both LAN and WAN
directly to the IoT device. Though we said
10:33 - 10:39

before that sometimes IoT devices can
communicate through the cloud, research
10:39 - 10:43

showed that some form of local
communication is usually available, for
10:43 - 10:50

instance, during the setup phase of the
device. Karonte is defined as a static
10:50 - 10:54

analysis tool that tracks data flow across
multiple binaries, to find
10:54 - 11:01

vulnerabilities. Let's see how it works.
So as the first step, Karonte finds those
11:01 - 11:05

binaries that introduce the user input
into the firmware. We call these border
11:05 - 11:09

binaries, which are the binaries that
basically interface the device to the
11:09 - 11:16

outside world. Which in the example is our
web server. Then it tracks how data is
11:16 - 11:21

shared with other binaries within the
firmware sample. It will understand that, in
11:21 - 11:25

this example, the web server communicates
with the handler binary, and builds what we
11:25 - 11:31

call the BDG, which stands for binary
dependency graph. It's basically a graph
11:31 - 11:40

representation of the data dependencies
among different binaries. Then we detect
11:40 - 11:45

vulnerabilities that arise from the misuse
of the data using the BDG. This is an
11:45 - 11:53

overview of our system. We start by taking
a packed firmware, we unpack it. We find
11:53 - 11:59

the border binaries. Then we build the
binary dependency graph, which relies on a
11:59 - 12:05

set of CPFs, as we will see soon. CPF
stands for Communication Paradigm Finder.
12:05 - 12:10

Then we find the specifics of the
communication, for instance, like the
12:10 - 12:16

constraints applied to the data that is
shared, through our multi-binary
12:16 - 12:21

data-flow analysis module. Eventually we run our
insecure interaction detection module,
12:21 - 12:26

which basically takes all the information
and produces alerts. Our system is
12:26 - 12:32

completely static and relies on our static
taint engine. So let's see each one of
12:32 - 12:37

these steps in more detail. The
unpacking procedure is pretty easy, we use
12:37 - 12:43

the off-the-shelf firmware unpacking tool
binwalk. And then we have to find the
12:43 - 12:48

border binaries. Now we see that border
binaries basically are binaries that
12:48 - 12:54

receive data from the network. And we
hypothesize that they will contain parsers to
12:54 - 12:58

validate the data that they received. So
in order to find them, we have to find
12:58 - 13:04

parsers which accept data from network and
parse this data. To find parsers we rely
13:04 - 13:13

on related work, which basically uses a
few metrics to define, as a number,
13:13 - 13:18

the likelihood for a function to contain
parsing capabilities. These metrics that
13:18 - 13:22

we used are number of basic blocks, number
of memory comparison operations and number
13:22 - 13:29

of branches. Now while these define
parsers, we also have to find if a binary
13:29 - 13:34

takes data from the network. As such, we
define two more metrics. The first one, we
13:34 - 13:39

check if binary contains any network
related keywords such as SOAP, HTTP and so
13:39 - 13:45

forth. And then we check if there exists a
data flow between read from socket and a
13:45 - 13:52

memory comparison operation. Once we have
all these metrics for each function, we
13:52 - 13:56

compute what is called a parsing score,
which basically is just a sum of products.
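As a rough Python illustration of a "sum of products" score over the five metrics just listed: the talk does not give the coefficients, so the values in `WEIGHTS` are placeholders, and the class and function names are hypothetical rather than Karonte's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionFeatures:
    """Per-function metrics named in the talk."""
    basic_blocks: int
    memory_comparisons: int
    branches: int
    has_network_keywords: bool
    socket_read_reaches_comparison: bool


# Placeholder coefficients (assumption): the real values are not stated here.
WEIGHTS = (1.0, 1.0, 1.0, 10.0, 10.0)


def parsing_score(features: FunctionFeatures, weights=WEIGHTS) -> float:
    """Weighted sum: each metric multiplied by its coefficient, then added up."""
    values = (
        features.basic_blocks,
        features.memory_comparisons,
        features.branches,
        int(features.has_network_keywords),
        int(features.socket_read_reaches_comparison),
    )
    return sum(w * v for w, v in zip(weights, values))


def binary_score(functions) -> float:
    """A binary is represented by the highest score among its functions."""
    return max((parsing_score(f) for f in functions), default=0.0)
```

With these placeholder weights, a function with 10 basic blocks, 3 memory comparisons, 4 branches and a network keyword scores 27.0.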
13:56 - 14:02

Once we got a parsing score for each
function in a binary, we represent the
14:02 - 14:08

binary with its highest parsing score.
Once we got that for each binary in the
14:08 - 14:14

firmware we cluster them using the DBSCAN
density based algorithm and consider the
14:14 - 14:18

cluster with the highest parsing score as
containing the set of border binaries.
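The selection step can be sketched with a small one-dimensional DBSCAN written in plain Python. This is a simplified stand-in, not Karonte's implementation: the `eps` and `min_samples` values, the function names, and the reading of "cluster with the highest parsing score" as "the cluster containing the top-scoring binary" are all assumptions.

```python
def dbscan_1d(scores, eps, min_samples):
    """Return one label per score: 0, 1, ... for clusters, -1 for noise."""
    n = len(scores)
    # Neighborhood of each point, including the point itself.
    neighbors = [
        [j for j in range(n) if abs(scores[i] - scores[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only from core points
    return labels


def border_binaries(binary_scores, eps=5.0, min_samples=2):
    """Pick the cluster that contains the highest-scoring binary."""
    if not binary_scores:
        return set()
    names = list(binary_scores)
    scores = [binary_scores[name] for name in names]
    labels = dbscan_1d(scores, eps, min_samples)
    best = max(range(len(names)), key=lambda i: scores[i])
    if labels[best] == -1:
        return {names[best]}
    return {n for n, label in zip(names, labels) if label == labels[best]}
```

For example, with hypothetical scores of 95 and 92 for two network daemons and 10 to 12 for three utilities, the two daemons form the top cluster and are reported as border binaries.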
14:18 - 14:26

After this, we build the binary dependency
graph. Again the binary dependency graph
14:26 - 14:30

represents the data dependency among the
binaries in a firmware sample. For
14:30 - 14:35

instance, this simple graph will tell us
that a binary A communicates with binary C
14:35 - 14:41

using files and the same binary A
communicates with another binary B using
14:41 - 14:47

environment variables. Let's see how this
works. So we start from the identified
14:47 - 14:53

border binaries and then we taint the data
compared against network related keywords
14:53 - 14:58

that we found and run a static analysis,
static taint analysis to detect whether
14:58 - 15:05

the binary relies on any IPC paradigm to
share the data. If we find that it does,
15:05 - 15:09

we establish if the binary is a setter or
a getter, which again means that if the
15:09 - 15:13

binary is setting the data to be consumed
by another binary, or if the binary
15:13 - 15:21

actually gets the data and consumes it.
Then we retrieve the employed data key
15:21 - 15:26

which in the example before was the
keyword QUERY_STRING. And finally we scan
15:26 - 15:30

the firmware sample to find other binaries
that may rely on the same data keys and
15:30 - 15:36

schedule them for further analysis. To
understand whether a binary relies on any
15:36 - 15:43

IPC, we use what we call CPFs, which again
means communication paradigm finder. We
15:43 - 15:52

design a CPF for each IPC. And the CPFs
are also used to find the same data keys
15:52 - 15:56

within the firmware sample. We also
provide Karonte with a generic CPF to
15:56 - 16:00

cover those cases where the IPC is
unknown. Or those cases where the vendor
16:00 - 16:06

implemented their own versions of some
IPC. So for example they don't use the
16:06 - 16:13

setenv. But they implemented their own
setenv. The idea behind this generic CPF
16:13 - 16:20

that we call the semantic CPF is that data
keys have to be used as an index to set, or to
16:20 - 16:28

get some data in this simple example. So
let's see how the BDG algorithm works. We
16:28 - 16:32

start from the border binary, which again
will start from the server request and
16:32 - 16:38

will parse the URI and we see that here it
runs a string comparison against some
16:38 - 16:45

network related keyword. As such, we taint
the variable P. And we see that the
16:45 - 16:53

variable P is returned from the function
to these two different points. As such, we
16:53 - 16:57

continue. And now we see that data gets
tainted and the variable data is passed
16:57 - 17:02

to the function setenv. At this point, the
environment CPF will understand that
17:02 - 17:08

tainted data is passed, is set to an
environment variable and will understand
17:08 - 17:14

that this binary is indeed the setter
binary that uses the environment. Then we
17:14 - 17:19

retrieve the data key QUERY_STRING and
we'll search within the firmware sample
17:19 - 17:28

all the other binaries that rely on the
same data key. And it will find that this
17:28 - 17:30

binary relies on the same data key and
will schedule this for further analysis.
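A small Python sketch of how the collected facts can be turned into a graph: once each binary's role, IPC paradigm, and data key are known, a setter is linked to every getter that uses the same key over the same paradigm. The `IpcUse` record, the `build_bdg` function, and the binary names in the usage note are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import NamedTuple


class IpcUse(NamedTuple):
    """One observation: a binary sets or gets data under a data key."""
    binary: str
    role: str      # "setter" or "getter"
    paradigm: str  # e.g. "environment" or "file"
    data_key: str


def build_bdg(uses):
    """Return edges (setter, getter, paradigm, data_key)."""
    setters = defaultdict(set)
    getters = defaultdict(set)
    for use in uses:
        channel = (use.paradigm, use.data_key)
        target = setters if use.role == "setter" else getters
        target[channel].add(use.binary)

    edges = set()
    for channel, sources in setters.items():
        for src in sources:
            for dst in getters.get(channel, ()):
                if src != dst:  # a binary reading its own data is not an edge
                    edges.add((src, dst, *channel))
    return edges
```

Given a `web_server` that sets `QUERY_STRING` in the environment and a `handler` that gets it, the result is a single edge from `web_server` to `handler`; a getter whose key has no setter produces no edge.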
17:30 - 17:37

After this algorithm we build the BDG by
creating edges between setters and getters
17:37 - 17:45

for each data key. The multi binary data
flow analysis uses the BDG to find and
17:45 - 17:51

propagate the data constraints from a
setter to a getter. Now, through this we
17:51 - 17:57

apply only the least strict constraints,
which means that ideally between two
17:57 - 18:03

program points, there might be an infinite
number of paths and, in theory, an
18:03 - 18:07

infinite amount of constraints that we can
propagate from the setter binary to the
18:07 - 18:12

getter binary. But since our goal here is
to find bugs, we only propagate the least
18:12 - 18:17

strict set of constraints. Let's see an
example. So again, we have our two
18:17 - 18:24

binaries and we see that the variable that
is passed to the setenv function is data,
18:24 - 18:29

which comes from two different paths from
the parse URI function. In the first case,
18:29 - 18:35

the data that is passed is unconstrained,
while in the second case, at line 8, it is
18:35 - 18:40

constrained to be at most 128 bytes. As
such, we only propagate the constraints of
18:40 - 18:50

the first guy. In turn, the getter binary
will retrieve this variable from the
18:50 - 18:56

environment and set the variable query.
Oh, sorry. Which in this case will be
18:56 - 19:03

unconstrained. Insecure interaction
detection runs a static taint analysis and
19:03 - 19:08

checks whether tainted data can reach a
sink in an unsafe way. We consider as
19:08 - 19:13

sinks memcpy-like functions, which are
functions that are semantically
19:13 - 19:19

equivalent to memcpy, strcpy and so forth. We
raise an alert if we see that there is a
19:19 - 19:23

dereference of a tainted variable and if
we see there are comparisons of tainted
19:23 - 19:32

variables in loop conditions to detect
possible DoS vulnerabilities. Let's see an
19:32 - 19:37

example again. So we got here. We know
that our query variable is tainted and
19:37 - 19:44

it's unconstrained. And then we follow the
taint in the function process_request,
19:44 - 19:53

which we see will eventually copy the data
from q to arg. Now we see that arg is 128
19:53 - 20:01

bytes long while q is unconstrained and
therefore we generate an alert here. Our
20:01 - 20:05

static taint engine is based on BootStomp
and is completely based on symbolic
20:05 - 20:10

execution, which means that the taint is
propagated following the program data
20:10 - 20:14

flow. Let's see an example. So assuming
that we have this code, the first
20:14 - 20:20

instruction takes the result from some
seed function that might return for
20:20 - 20:26

instance, some user input. And in a
symbolic world, what we do is we create a
20:26 - 20:34

symbolic variable ty and assign to it a
tainted variable that we call TAINT_ty,
20:34 - 20:40

which is the taint target. The next
instruction, x, takes the value ty plus 5
20:40 - 20:47

and in the symbolic world we just follow the
data flow and x gets assigned TAINT_ty
20:47 - 20:54

plus 5 which effectively taints also X. If
at some point X is overwritten with some
20:54 - 21:01

constant data, the taint is automatically
removed. In the original design of
21:01 - 21:08

BootStomp, the taint is also removed when
data is constrained. For instance, here we
21:08 - 21:12

can see that the variable n is tainted but
then is constrained between two values 0
21:12 - 21:20

and 255. And therefore, the taint is
removed. In our taint engine we have two
21:20 - 21:27

additions. We added a path prioritization
strategy and we add taint dependencies.
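The propagation rules described for the original design can be mimicked with a toy model in Python. This is not the real engine, which works through symbolic execution; `Value`, `seed`, `constant`, and `constrain` are hypothetical helpers that only mirror the three rules: taint follows the data flow, a constant overwrite drops it, and a bounding constraint drops it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A symbolic expression plus the taint tags it depends on."""
    expr: str
    taints: frozenset = frozenset()

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(str(other))
        # Taint follows the data flow: the result carries both operands' tags.
        return Value(f"({self.expr} + {other.expr})", self.taints | other.taints)

    @property
    def tainted(self) -> bool:
        return bool(self.taints)


def seed(name: str) -> Value:
    """Result of a seed function, e.g. user input: a fresh tainted symbol."""
    return Value(f"TAINT_{name}", frozenset({name}))


def constant(value) -> Value:
    """Constant data carries no taint, so overwriting with it untaints."""
    return Value(str(value))


def constrain(value: Value, low: int, high: int) -> Value:
    """Original-design rule: a value bounded on both sides loses its taint."""
    return Value(f"bounded({value.expr}, {low}, {high})")
```

Usage mirrors the slide: `ty = seed("ty")` and `x = ty + 5` leave `x` tainted with expression `(TAINT_ty + 5)`; `x = constant(0)` removes the taint; `constrain(seed("n"), 0, 255)` is untainted.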
21:27 - 21:32

The path prioritization strategy favors
paths that propagate the taint and
21:33 - 21:39

deprioritizes those that remove it. For
instance, say again that some user input
21:39 - 21:46

comes from some function and the variable
user input gets tainted and
21:46 - 21:51

then is passed to another function called
parse. Here, as you can see, there is possibly
21:51 - 21:58

an infinite number of symbolic paths in
this while loop. But only one will return tainted
21:58 - 22:05

data. While the others won't. So the path
prioritization strategy favors this
22:05 - 22:10

path instead of the others. This has been
implemented by finding basic blocks within
22:10 - 22:16

a function that return non-constant data.
And if one is found, we follow its return
22:16 - 22:22

before considering the others. Taint
dependencies allow smart untaint
22:22 - 22:26

strategies. Let's see again the example.
So we know that user input here is
22:26 - 22:34

tainted, is then parsed and then we see
that its length is computed and stored in
22:34 - 22:41

a variable n. Its size is checked and if
it's higher than 512 bytes, the function
22:41 - 22:48

will return. Otherwise it copies the data.
Now in this case, it might happen that if
22:48 - 22:54

this strlen function is not analyzed
because of some static analysis
22:54 - 23:01

imprecisions, the taint tag of cmd might be
different from the taint tag of n and in
23:01 - 23:07

this case, though n gets untainted, cmd
is not untainted and the strcpy can raise
23:07 - 23:16

a false positive. So to fix
this problem, we basically create a
23:16 - 23:21

dependency between the taint tag of n and
the taint tag of cmd. And when n gets
23:21 - 23:28

untainted, cmd gets untainted as well. So
we don't have any more false positives. This
23:28 - 23:33

procedure is automatic and we find
functions that implement code
23:33 - 23:40

semantically equivalent to strlen and create
taint tag dependencies. OK. Let's see our
23:40 - 23:48

evaluation. We ran 3 different evaluations
on 2 different data sets. The first one
23:48 - 23:55

composed of 53 recent firmware samples
from seven vendors, and a second one of 899
23:55 - 24:02

firmware gathered from related work. In
the first case, we can see that the total
24:02 - 24:10

number of binaries considered is 8.5k,
a few more than that. And our system
24:10 - 24:16

generated 87 alerts of which 51 were found
to be true positives and 34 of them were
24:16 - 24:22

multibinary vulnerabilities, which means
that the vulnerability was found by
24:22 - 24:28

tracking the data flow from the setter to
the getter binary. We also ran a
24:28 - 24:32

comparative evaluation, in which we basically
tried to measure the effort that an
24:32 - 24:37

analyst would go through in analyzing
firmware using different strategies. In
24:37 - 24:41

the first one, we consider each and every
binary in the firmware sample
24:41 - 24:49

independently and run the analysis for up
to seven days for each firmware. The
24:49 - 24:57

system generated almost 21,000 alerts,
considering only about 2.5k binaries. In
24:57 - 25:04

the second case we found the border
binaries, the parsers and we statically
25:04 - 25:11

analyzed only them, and the system
generated 9.3k alerts. Notice that in this
25:11 - 25:16

case, since we don't know how the user
input is introduced, like in this
25:16 - 25:21

experiment, we consider every IPC that we
find in the binary as a possible source of
25:21 - 25:28

user input. And this is true for all of
them. In the third case we ran the BDG but
25:28 - 25:33

we considered each binary independently.
Which means that we don't propagate
25:33 - 25:38

constraints and we run a static single-
binary analysis on each one of them. And
25:38 - 25:46

the system generated almost 15000 alerts.
Finally, we run Karonte and the generated
25:46 - 25:55

alerts were only 74. We also run a larger
scale analysis on 899 firmware samples.
25:55 - 26:01

And we found that almost 40% of them were
multi binary, which means that the network
26:01 - 26:08

functionalities were carried out by more
than one binary. And the system generated
26:08 - 26:17

1000 alerts. Now, there is a lot going on
in this table; the details are in the
26:17 - 26:22

paper. Here in this presentation I just go
through some highlights. So we found
26:22 - 26:27

that on average, a firmware contains 4
border binaries. A BDG contains 5 binaries
26:27 - 26:34

and some BDG have more than 10 binaries.
Also, we plot some statistics and we found
26:34 - 26:39

that 80% of the firmware were analysed
within a day, as you can see from the top
26:39 - 26:46

left figure. However, experiments
presented a great variance which we found
26:46 - 26:51

was due to implementation details. For
instance we found that angr would take
26:51 - 26:56

more than seven hours to build some CFGs.
And sometimes they were due to a high
26:56 - 27:02

number of data keys. Also, we found that
the number of paths, as you can see from
27:02 - 27:09

this second picture from the top, the
number of paths does not have an impact on
27:09 - 27:15

the total time. And as you can see from
the bottom two pictures, performance is not
27:16 - 27:24

heavily affected by firmware size.
By firmware size here we mean the number of
27:24 - 27:30

binaries in a firmware sample and the
total number of basic blocks. So let's see
27:30 - 27:35

how to run Karonte. The procedure is
pretty straightforward. So first you get a
27:35 - 27:39

firmware sample. You create a
configuration file containing information
27:39 - 27:45

of the firmware sample and then you run
it. So let's see how. So this is an
27:45 - 27:51

example of a configuration file. It
contains the information, but most of them
27:51 - 27:55

are optional. The only ones that are not
are this one: Firmware path, that is the
27:55 - 28:00

path to your firmware. And these two, the
architecture of the firmware and the base
28:00 - 28:07

address if the firmware is a blob, is a
firmware blob. All the other fields are
28:07 - 28:12

optional. And you can set them if you have
some information about the firmware. A
28:12 - 28:18

detailed explanation of all of these
fields is on our GitHub repo. Once you
28:18 - 28:24

set the configuration file, you can run
Karonte. Now we provide a Docker
28:24 - 28:29

container, you can find the link on our
GitHub repo. And I'm gonna run it, but
28:29 - 28:41

it's not gonna finish because it's gonna
take several hours. But all you have to do
28:41 - 28:53

is merely... typing noises just run it
on the configuration file and it's gonna
28:53 - 28:58

do each step that we saw. Eventually I'm
going to stop it because it's going to
28:58 - 29:03

take several hours anyway. Eventually it
will produce a result file that... I ran
29:03 - 29:08

this yesterday so you can see it here.
There is a lot going on here. I'm just
29:08 - 29:15

gonna go through some important like
information. So one thing that you can see
29:15 - 29:22

is that these are the border binaries that
Karonte found. Now, there might be some
29:22 - 29:26

false positives. I'm not sure how many
there are here. But as long as there are
29:26 - 29:32

no false negatives or the number is very
low, it's fine. It's good. In this case,
29:32 - 29:39

wait. Oh, I might have removed something.
All right, here, perfect. In this case,
29:39 - 29:45

this guy httpd is a true positive, which
is the web server that we were talking about
29:45 - 29:52

before. Then we have the BDG. In this
case, we can see that Karonte found that
29:52 - 30:00

httpd communicates with two different
binaries, fileaccess.cgi and cgibin. Then
30:00 - 30:11

we have information about the CPFs. For
instance, here we can see that. Sorry. So
30:11 - 30:20

we can see here that httpd has 28 data
keys. And that the semantic CPF found 27
30:20 - 30:27

of them and then there might be one other
here or somewhere that I don't see.
30:27 - 30:36

Anyway. And then we have a list of alerts.
Now, thanks. Now, some of those may be
30:36 - 30:44

duplicates because of loops, so you can go
ahead and inspect all of them manually.
30:44 - 30:51

But I wrote a utility that you can use,
which is basically gonna filter out
30:51 - 31:02

all the loops for you. Now to remember how
I called it. This guy? Yeah. And you can
31:02 - 31:13

see that in total it generated, the system
generated 6... 7... 8 alerts. So let's see
31:13 - 31:21

one of them. Oh, and I recently realized
that the path that I'm reporting on the
31:21 - 31:26

log. It's not the path from the setter
binary to the getter binary, to the sink.
31:26 - 31:31

But it's only related to the getter binary
up to the sink. I'm gonna fix this in the
31:31 - 31:38

next days and report the whole paths.
Anyway. So here we can see that the key
31:38 - 31:43

content type contains user input and it's
passed in an unsafe way to the sink
31:43 - 31:50

at this address. And the
binary in question is called
31:50 - 32:02

fileaccess.cgi. So we can see what happens
there. keyboard noises If you see here,
32:02 - 32:12

we have a string copy that copies the
content of haystack to destination,
32:12 - 32:21

haystack comes basically from this getenv.
And, as you can see, destination comes as a
32:21 - 32:30

parameter of this function, and
this buffer is as big as
32:30 - 32:39

0x68 bytes. And this turned out to be
actually a true positive. OK. So in summary, we
32:39 - 32:47

presented a strategy to track data flow
across different binaries. We evaluated
32:47 - 32:53

our system on 952 firmware samples and
some takeaways. Analyzing firmware is not
32:53 - 32:58

easy and vulnerabilities persist. We found
out that firmware is made of
32:58 - 33:03

interconnected components and static
analysis can still be used to efficiently
33:03 - 33:08

find vulnerabilities at scale and finding
that communication is key for precision.
33:08 - 33:12

Here's the bibliography that I used
throughout the presentation and I'm gonna
33:12 - 33:13

take questions.
33:13 - 33:18

applause
33:18 - 33:27

Herald: So thank you, Nilo, for a very
interesting talk. If you have questions,
33:27 - 33:32

we have three microphones one, two and
three. If you have a question, please go
33:32 - 33:38

ahead to the microphone and we'll take your
question. Yes. Microphone number two.
33:38 - 33:42

Q: Do you rely on imports from libc or
something like that or do you have some
33:42 - 33:47

issues with like statically linked
binaries, stripped binaries or is it all
33:47 - 33:52

semantic analysis of a function?
Nilo: So. Okay. We use angr. So for
33:52 - 33:57

example, if you have an indirect call, we
use angr to figure out, what's the target?
33:57 - 34:03

And to answer your question on whether we
use libc: some CPFs do. For instance, the
34:03 - 34:08

environment CPF checks if the
setenv or getenv functions are called. But
34:08 - 34:13

also we use the semantic CPF, which
basically, in cases where information is
34:13 - 34:18

missing like there is no such thing as
libc or some vendors reimplemented their
34:18 - 34:22

own functions. We use the CPF to actually
try to understand the semantics of the
34:22 - 34:26

function and understand if it's, for
example, a custom setenv.
34:26 - 34:30

Q: Yeah, thanks.
Herald: Microphone number three.
34:30 - 34:37

Q: In embedded environments you often have
also that the getter might work on a DMA,
34:37 - 34:43

some kind of vendor driver on a DMA. Are
you considering this? And second part of
34:43 - 34:48

the question, how would you then
distinguish this from your generic IPC?
34:48 - 34:53

Because I can imagine that they look very
similar in the actual code.
34:53 - 34:59

Nilo: So if I understand correctly your
question, you mention a case of MMIO where
34:59 - 35:04

some data is retrieved directly from some
address in memory. So what we found is
35:04 - 35:08

that these addresses are usually hardcoded
somewhere. So the vendor knows that, for
35:08 - 35:13

example, from this address A to this
address B there is some data from
35:13 - 35:19

this peripheral. So when we find that some
hardcoded address, we think that this
35:19 - 35:22

is like some read from some interesting
data.
35:22 - 35:28

Q: Okay. And this would be also
distinguishable from your sort of CPF, the
35:28 - 35:32

generic CPF would be distinguishable...
Nilo: Yeah. Yeah, yeah.
35:32 - 35:36

Q: ...from a DMA driver by using this
fixed address assuming.
35:36 - 35:40

Nilo: Yeah. That's what the semantic CPF
does, among the other things.
35:40 - 35:41

Q: Okay. Thank you.
Nilo: Sure.
35:41 - 35:44

Herald: Another question for microphone
number 3.
35:44 - 35:46

Q: What's the license for Karonte?
Nilo: Sorry?
35:46 - 35:51

Q: I checked the software license, I
checked the git repository and there is no
35:51 - 35:53

license like at all.
Nilo: That is a very good question. I
35:53 - 36:01

haven't thought about it yet. I will.
Herald: Any more questions from here or
36:01 - 36:04

from the Internet? Okay. Then a big round
of applause to Nilo again for your talk.
36:04 - 36:25

postroll music
36:25 - 36:32

Subtitles created by many many volunteers and
the c3subtitles.de team. Join us, and help us!

Title:: 36C3 - Identifying Multi-Binary Vulnerabilities in Embedded Firmware at Scale
Video Language:: English
Duration:: 36:36
