*preroll music*
Herald: Our next speaker for today is a
computer science PhD student at UC Santa
Barbara. He is a member of the Shellphish
hacking team and he's also an organizer
of the iCTF hacking competition. Please
give a big round of applause to Nilo
Redini.
*applause*
Nilo: Thanks for the introduction, hello
to everyone. My name is Nilo, and today
I'm going to present my work Karonte:
identifying multi-binary vulnerabilities
in embedded firmware at scale. This work
is a joint effort between me and
several of my colleagues at UC Santa
Barbara and ASU. This talk is going
to be about IoT devices. So before
starting, let's get an overview of IoT
devices. IoT devices are everywhere. As
research suggests, they will reach
20 billion units by the end of next
year. And a recent study conducted this
year, 2019, on 16 million households
showed that more than 70 percent of homes
in North America already have an IoT
network-connected device. IoT devices make
everyday life smarter. You can literally
say "Alexa, I'm cold" and Alexa will
interact with the thermostat and increase
the temperature of your room. Usually, the
way we interact with IoT devices is
through our smartphone. We send a request
over the local network to some device, a
router or a door lock, or we might send the
same request through a cloud endpoint,
which is usually managed by the vendor of
the IoT device. Another way is through
IoT hubs: the smartphone sends the request
to some IoT hub, which in turn sends
the request to some other IoT device. As
you can imagine, IoT devices use and
collect our data, and some data is more
sensitive than other data. For instance, think
of the data that is collected by my
lightbulb versus the data that is collected by our
security camera. As such, IoT devices can
compromise people's safety and privacy.
Think, for example, about the security
implications of a faulty smart lock or of the
brakes of your smart car. So the question
that we asked is: Are IoT devices secure?
Well, like everything else, they are not.
OK, in 2016 the Mirai botnet compromised
and leveraged millions of IoT devices to
disrupt core Internet services such as
Twitter, GitHub and Netflix. And in 2018,
154 vulnerabilities affecting IoT devices
were published, which represented an
increase of 15% compared to 2017 and an
increase of 115% compared to 2016. So then
we wondered: why is it hard to secure IoT
devices? To answer this question we have
to look at how IoT devices work and how they
are made. Usually, when you remove all the
plastic and peripherals IoT devices look
like this. A board with some chips laying
on it. Usually you can find a big chip,
the microcontroller, which runs the
firmware, and one or more peripheral
controllers, which interact with external
peripherals such as the motor of your
smart lock or cameras. Though the design
is generic, implementations are very
diverse. For instance, firmware may run on
several different architectures such as
ARM, MIPS, x86, PowerPC and so forth. And
sometimes they are even proprietary, which
means that if a security analyst wants to
understand what's going on in the
firmware, they'll have a hard time without
the vendor's specifications. Also,
these devices operate in environments with
limited resources, which means that they
run small and optimized code. For
instance, vendors might implement their
own version of some known algorithm in an
optimized way. Also, IoT devices manage
external peripherals that often use custom
code; again, by peripherals we mean things like
cameras, sensors and so forth. The
firmware of IoT devices can be either
Linux-based or a firmware blob; Linux-based
firmware is by far the most common. A study
showed that 86% of firmware samples are based on
Linux; blob firmware, on the other hand,
is usually an operating system and
user applications packaged into a single
binary. In any case, firmware samples are
usually made of multiple components. For
instance, let's say that you have your
smartphone and you send a request to your
IoT device. This request will be received
by a binary which we term the border binary,
which in this example is a web server. The
request will be received, parsed, and then
it might be sent to another binary, called
the handler binary, which will take the
request, work on it, produce an answer,
and send it back to the web server, which in
turn will produce a response to send to
the smartphone. So, to come back to the
question why is it hard to secure IoT
devices? Well, the answer is because IoT
devices are in practice very diverse. Of
course, various works have been proposed
to analyze and secure firmware for IoT
devices, some of them using static analysis,
others using dynamic analysis, and several
others using a combination of both. Here I
list several of them; at the end of the
presentation there is a bibliography with
the titles of these works. Of course, all
these approaches have some problems. For
instance, current dynamic analyses are
hard to apply at scale because of the
customized environments that IoT devices
work in. Usually, when you try to
dynamically execute a firmware sample, it's gonna
check if the peripherals are connected and
working properly; in a case where you
can't have the peripherals, it's gonna be
hard to actually run the firmware. Also,
current static analysis approaches are
based on what we call the single-binary
approach, which means that binaries from a
firmware sample are taken individually and
analysed. This approach might produce many
false positives. For instance, let's
say again that we have our two binaries.
This is actually an example that we found
in one firmware sample: the web server will
take the user request, will parse the
request and produce some data, will set
this data in an environment variable and
eventually will execute the handler binary.
Now, if you look, the parsing function
contains a string compare which checks if
some keyword is present in the request.
If so, it just returns the whole
request; otherwise, it will constrain the
size of the request to 128 bytes and
return it. The handler binary in turn, when
spawned, will receive the data by doing a
getenv on QUERY_STRING, but it will also
getenv another environment variable,
which in this case is not user controlled:
the user cannot influence the content
of this variable. Then it's gonna call the
function process_request. This function
will eventually do two string copies: one
from the user data, the other from the
log path, into two different local variables
that are each 128 bytes long. Now, in the first
case, as we have seen before, the data can
be greater than 128 bytes, and this string
copy may result in a bug; in the
second case it will not, because here we
assume that the system handles its own
data properly. So, throughout this
work, we're gonna call the first type of
binary the setter binary, meaning
the binary that takes the data
and sets it for another binary to
consume, and the second type of binary
the getter binary.
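To make the running example concrete, here is a minimal C sketch of the pattern just described; the identifiers, the keyword check, and the handler path are illustrative and not taken from the actual firmware.

```c
/* Hypothetical sketch of the two binaries described above (identifiers invented).
 * In the real firmware these are two separate programs; they are shown together
 * here only for readability. */
#include <stdlib.h>
#include <string.h>

/* --- Setter binary (the web server) --- */

/* Returns the whole request if it contains some keyword (unconstrained path),
 * otherwise constrains it to at most 128 bytes. */
char *parse_request(char *request) {
    if (strstr(request, "keyword") != NULL)
        return request;                 /* may be longer than 128 bytes */
    if (strlen(request) > 127)
        request[127] = '\0';            /* constrained to 128 bytes */
    return request;
}

void handle_connection(char *request) {
    char *data = parse_request(request);
    setenv("QUERY_STRING", data, 1);    /* data key shared with the getter */
    system("/www/cgi-bin/handler");     /* spawn the handler (getter) binary */
}

/* --- Getter binary (the handler) --- */

void process_request(const char *q, const char *log_path) {
    char arg[128], log[128];
    strcpy(arg, q);        /* q may exceed 128 bytes -> potential overflow */
    strcpy(log, log_path); /* system-controlled data, assumed well formed */
}

int handler_main(void) {
    const char *query = getenv("QUERY_STRING"); /* user-controlled via the setter */
    const char *path  = getenv("LOG_PATH");     /* not user-controlled */
    if (query && path)
        process_request(query, path);
    return 0;
}
```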
So current bug-finding tools are inadequate:
bugs are left undiscovered
if the analysis only considers the
binaries that receive network requests, and
many false positives are likely to be
produced if the analysis considers all of
them individually. So then we wondered how
these different components actually
communicate. They communicate through what
are called interprocess communication,
which is basically a finite set of
paradigms used by binaries to communicate,
such as files, environment variables, MMIO
and so forth. All these paradigms are
identified by data keys, which are, for
example, file names or, in the case of the
example from before, here on the right, the
QUERY_STRING environment variable. Each binary
that relies on some shared data must know
the endpoint where such data will be
available, for instance a file name, a
socket endpoint, or an environment variable.
This means that, usually, data keys are
hardcoded in the program itself, as we saw
before. To find bugs in firmware in a
precise manner, we need to track how user
data is introduced and propagated across
the different binaries.
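As an aside, this is what a data key looks like in practice. The sketch below is hypothetical (the file name is invented), but it shows the point: the same hardcoded string appears in both the setter and the getter binary, which is what makes the shared endpoint recognizable.

```c
/* Sketch: the data key is a hardcoded string shared by both endpoints.
 * export_config() and import_config() would live in two different binaries. */
#include <stdio.h>

/* Setter side: writes shared data to a well-known file name. */
void export_config(const char *data) {
    FILE *f = fopen("/tmp/shared_config", "w"); /* "/tmp/shared_config" is the data key */
    if (f) { fputs(data, f); fclose(f); }
}

/* Getter side (a different binary): reads the same hardcoded file name. */
void import_config(char *buf, size_t len) {
    FILE *f = fopen("/tmp/shared_config", "r"); /* same data key, hardcoded again */
    if (f) { fgets(buf, (int)len, f); fclose(f); }
}
```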
Okay, let's talk about our work. Before we
start talking about Karonte, we
define our threat model. We hypothesize
that an attacker sends arbitrary requests
over the network, both LAN and WAN,
directly to the IoT device. Though we said
before that sometimes IoT devices
communicate through the cloud, research
showed that some form of local
communication is usually available, for
instance during the setup phase of the
device. Karonte is a static
analysis tool that tracks data flow across
multiple binaries, to find
vulnerabilities. Let's see how it works.
So, as a first step, Karonte finds those
binaries that introduce the user input
into the firmware. We call these border
binaries: the binaries that
basically interface the device to the
outside world, which in our example is the
web server. Then it tracks how data is
shared with other binaries within the
firmware sample. In this example it will
understand that the web server communicates
with the handler binary, and it builds what we
call the BDG, which stands for Binary
Dependency Graph: basically a graph
representation of the data dependencies
among different binaries. Then we detect
vulnerabilities that arise from the misuse
of the data, using the BDG. This is an
overview of our system. We start by taking
a packed firmware image and unpacking it. We find
the border binaries. Then we build the
binary dependency graph, which relies on a
set of CPFs, as we will see soon; CPF
stands for Communication Paradigm Finder.
Then we find the specifics of the
communication, for instance the
constraints applied to the data that is
shared, through our multi-binary
data-flow analysis module. Eventually we run our
insecure interaction detection module,
which basically takes all this information
and produces alerts. Our system is
completely static and relies on our static
taint engine. So let's see each one of
these steps in more detail. The
unpacking procedure is pretty easy, we use
the off-the-shelf firmware unpacking tool
binwalk. And then we have to find the
border binaries. We said that border
binaries are basically binaries that
receive data from the network, and we
hypothesize that they contain parsers to
validate the data that they receive. So,
in order to find them, we have to find
parsers which accept data from the network and
parse this data. To find parsers we rely
on related work, which basically uses a
few metrics and expresses, as a number,
the likelihood that a function contains
parsing capabilities. The metrics that
we used are the number of basic blocks, the number
of memory comparison operations and the number
of branches. Now, while these identify
parsers, we also have to find out whether a binary
takes data from the network. As such, we
define two more metrics: first, we
check if the binary contains any network-related
keywords such as SOAP, http and so
forth; then, we check if there exists a
data flow between a read from a socket and a
memory comparison operation. Once we have
all these metrics for each function, we
compute what is called a parsing score,
which basically is just a sum of products.
Once we have a parsing score for each
function in a binary, we represent the
binary by its highest parsing score.
Once we have that for each binary in the
firmware, we cluster them using the DBSCAN
density-based algorithm and consider the
cluster with the highest parsing score as
containing the set of border binaries.
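As a rough illustration of how such a per-function score could be put together (the weights and the exact combination below are made up for the example, not the ones used by Karonte):

```c
/* Hypothetical sketch of a per-function parsing score: a weighted sum of the
 * metrics mentioned above. The weights are illustrative only. */
struct func_metrics {
    unsigned n_basic_blocks;   /* number of basic blocks */
    unsigned n_mem_cmps;       /* number of memory comparison operations */
    unsigned n_branches;       /* number of branches */
    int has_net_keywords;      /* 1 if network keywords (http, soap, ...) are referenced */
    int socket_to_cmp_flow;    /* 1 if data flows from a socket read to a memcmp */
};

double parsing_score(const struct func_metrics *m) {
    return 1.0 * m->n_basic_blocks
         + 2.0 * m->n_mem_cmps
         + 1.0 * m->n_branches
         + 5.0 * m->has_net_keywords
         + 5.0 * m->socket_to_cmp_flow;
}
```

Each binary is then represented by the maximum score over its functions, and the binaries are clustered on that value.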
After this, we build the binary dependency
graph. Again the binary dependency graph
represents the data dependency among the
binaries in a firmware sample. For
instance, this simple graph will tell us
that a binary A communicates with binary C
using files and the same binary A
communicates with another binary B using
environment variables. Let's see how this
works. So we start from the identified
border binaries, and then we taint the data
that is compared against the network-related
keywords we found and run a static
taint analysis to detect whether
the binary relies on any IPC paradigm to
share the data. If we find that it does,
we establish whether the binary is a setter or
a getter, which again means whether the
binary is setting data to be consumed
by another binary, or whether it
actually gets the data and consumes it.
Then we retrieve the employed data key
which in the example before was the
keyword QUERY_STRING. And finally we scan
the firmware sample to find other binaries
that may rely on the same data keys and
schedule them for further analysis. To
understand whether a binary relies on any
IPC, we use what we call CPFs, which again
stands for Communication Paradigm Finders. We
design a CPF for each IPC, and the CPFs
are also used to find the same data keys
within the firmware sample. We also
provide Karonte with a generic CPF to
cover those cases where the IPC is
unknown, or those cases where the vendor
implemented their own version of some
IPC: for example, they don't use
setenv, but implemented their own
setenv. The idea behind this generic CPF,
which we call the semantic CPF, is that a data
key has to be used as an index to set or to
get some data, as in this simple example.
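For instance, a vendor-specific replacement for setenv might look like the following sketch (entirely hypothetical code); the semantic CPF looks for this kind of shape, where a data key is used as an index into a shared store for a set or a get.

```c
/* Hypothetical vendor re-implementation of an environment-like store.
 * The data key (e.g. "QUERY_STRING") is used as an index to set or get the
 * data, which is the pattern the generic (semantic) CPF tries to recognize. */
#include <string.h>

#define MAX_ENTRIES 64

struct kv { char key[32]; char value[256]; };
static struct kv store[MAX_ENTRIES];
static int n_entries;

void custom_set(const char *key, const char *value) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(store[i].key, key) == 0) {        /* key used as index */
            strncpy(store[i].value, value, sizeof store[i].value - 1);
            return;
        }
    if (n_entries < MAX_ENTRIES) {
        strncpy(store[n_entries].key, key, sizeof store[n_entries].key - 1);
        strncpy(store[n_entries].value, value, sizeof store[n_entries].value - 1);
        n_entries++;
    }
}

const char *custom_get(const char *key) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(store[i].key, key) == 0)          /* same key, getter side */
            return store[i].value;
    return NULL;
}
```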
So let's see how the BDG algorithm works. We
start from the border binary, which again
will receive the server request and
parse the URI, and we see that here it
runs a string comparison against some
network-related keyword. As such, we taint
the variable P, and we see that the
variable P is returned from the function
to these two different points. As such, we
continue, and now we see that the variable
data gets tainted and is passed
to the function setenv. At this point, the
environment CPF will understand that
tainted data is set to an
environment variable, and will conclude
that this binary is indeed a setter
binary that uses the environment. Then we
retrieve the data key QUERY_STRING and
search within the firmware sample for
all the other binaries that rely on the
same data key. It will find that this
binary relies on the same data key and
will schedule it for further analysis.
After this algorithm, we build the BDG by
creating edges between setters and getters
for each data key. The multi-binary data
flow analysis uses the BDG to find and
propagate the data constraints from a
setter to a getter. Here we apply only the
least strict constraints, which means that
between two program points there might be
an infinite number of paths and, in theory,
an infinite number of constraints that we
could propagate from the setter binary to
the getter binary; but since our goal here
is to find bugs, we only propagate the least
strict set of constraints. Let's see an
example. So again, we have our two
binaries and we see that the variable that
is passed to the setenv function is data,
which comes from two different paths from
the parse URI function. In the first case,
the data that is passed is unconstrained;
in the second case, at line 8, it is
constrained to be at most 128 bytes. As
such, we only propagate the constraints of
the first path. In turn, the getter binary
will retrieve this variable from the
environment and set the variable query,
which in this case will be
unconstrained. The insecure interaction
detection module runs a static taint analysis and
checks whether tainted data can reach a
sink in an unsafe way. We consider as
sinks memcpy-like functions, that is,
functions that implement semantics
equivalent to memcpy, strcpy and so forth. We
also raise an alert if we see a
dereference of a tainted variable, and if
we see comparisons of tainted
variables in loop conditions, to detect
possible DoS vulnerabilities.
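As a hedged illustration, the sketch below (invented identifiers) collects the three situations that trigger an alert:

```c
/* Sketch of the alert-triggering patterns: tainted, unconstrained data reaching
 * a memcpy-like sink, a tainted dereference, and a tainted loop bound. */
#include <string.h>

void sink_examples(char *tainted, unsigned tainted_len, char **tainted_ptr) {
    char buf[128];

    strcpy(buf, tainted);              /* memcpy-like sink: the size of the tainted
                                          data is unconstrained -> alert */

    char c = **tainted_ptr;            /* dereference of a tainted pointer -> alert */
    (void)c;

    for (unsigned i = 0; i < tainted_len; i++)   /* tainted variable in a loop
                                                    condition -> possible DoS alert */
        buf[i % sizeof buf] = 'A';
}
```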
Let's see the example again. So here we know
that our query variable is tainted and
unconstrained. We then follow the
taint into the function process_request,
which we see will eventually copy the data
from q to arg. Now, arg is 128
bytes long while q is unconstrained, and
therefore we generate an alert here. Our
static taint engine is based on BootStomp
and relies entirely on symbolic
execution, which means that the taint is
propagated following the program data
flow. Let's see an example. So, assuming
we have this code: the first
instruction takes the result of some
seed function that might return, for
instance, some user input. In the
symbolic world, what we do is create a
symbolic variable ty and assign to it a
tainted value that we call TAINT_ty,
which is the taint target. The next
instruction assigns x the value ty plus 5;
in the symbolic world we just follow the
data flow, and x gets assigned TAINT_ty
plus 5, which effectively taints x as well. If
at some point x is overwritten with some
constant data, the taint is automatically
removed. In BootStomp's original design, the
taint is also removed when
the data is constrained: for instance, here we
can see that the variable n is tainted but
is then constrained between the two values 0
and 255, and therefore the taint is
removed.
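In code form, the example just described looks roughly like the following sketch, where seed() stands for any function returning user input and the comments describe the taint state:

```c
/* A sketch of taint propagation along the data flow during symbolic execution.
 * seed() is a hypothetical source of user input. */
extern int seed(void);

int example(void) {
    int ty = seed();   /* ty := TAINT_ty (symbolic, tainted) */
    int x  = ty + 5;   /* x := TAINT_ty + 5 -> x is tainted as well */

    x = 10;            /* x overwritten with a constant -> the taint on x is removed */

    int n = seed();    /* n := TAINT_n */
    if (n < 0 || n > 255)
        return -1;
    /* n is now constrained to [0, 255]; in BootStomp's original design this
     * constraint also removes the taint on n. */
    return x + n;
}
```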
In our taint engine we have two additions:
we added a path prioritization
strategy and taint dependencies.
The path prioritization strategy prioritizes
paths that propagate the taint and
deprioritizes those that remove it. For
instance, say again that some user input
comes from some function, so the variable
user_input gets tainted, and it is
then passed to another function called
parse. Here, if you look, there are possibly
an infinite number of symbolic paths through
this while loop, but only one will return tainted
data while the others won't. So the path
prioritization strategy prioritizes this
path over the others. This has been
implemented by finding basic blocks within
a function that return non-constant data;
if one is found, we follow its return
before considering the others.
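A sketch of the kind of parse function meant here (names invented):

```c
#include <stddef.h>

/* The while loop admits many symbolic paths, but only one basic block returns
 * non-constant (tainted) data; the path prioritization strategy follows that
 * return first. */
char *parse(char *user_input /* tainted */) {
    while (*user_input) {
        if (*user_input == '=')
            return user_input + 1;   /* returns part of the tainted input:
                                        explored first */
        user_input++;
    }
    return NULL;                     /* constant return value: deprioritized */
}
```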
Taint dependencies allow smarter untainting
strategies. Let's look at the example again.
So we know that the user input here is
tainted and is then parsed; then we see
that its length is computed and stored in
a variable n, and if it is greater than
512 bytes the function returns, otherwise
it copies the data. Now, in this case it
might happen that, if the strlen function
is not analyzed because of some static
analysis decision, the taint tag of cmd is
different from the taint tag of n; in that
case, even though n gets untainted, cmd
is not untainted, and the strcpy raises
a false positive. To fix
this problem, we basically create a
dependency between the taint tag of n and
the taint tag of cmd, so when n gets
untainted, cmd gets untainted as well and
we don't have any more false positives. This
procedure is automatic: we find
functions that implement code semantically
equivalent to strlen and create the
taint tag dependencies.
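In code, the situation looks roughly like the following sketch (identifiers invented); the comments show the taint tags and the dependency that avoids the false positive:

```c
#include <string.h>

extern char *parse(char *input);     /* hypothetical parser returning tainted data */

/* Sketch of the taint-dependency case: n may carry a different taint tag than
 * cmd if strlen is not analyzed, so untainting n must also untaint cmd. */
void handle(char *user_input /* tainted */) {
    char buf[512];
    char *cmd = parse(user_input);   /* cmd tainted: tag T_cmd */
    size_t n  = strlen(cmd);         /* n tainted: tag T_n; the engine records the
                                        dependency T_n -> T_cmd */
    if (n >= sizeof buf)
        return;                      /* constrains n: T_n is removed and, through
                                        the dependency, T_cmd is removed as well */
    strcpy(buf, cmd);                /* no longer reported -> no false positive */
}
```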
OK, let's look at our evaluation. We ran three
different evaluations on two different data
sets: the first one composed of 53 recent
firmware samples from seven vendors, and the
second one of 899 firmware samples gathered
from related work.
the first case, we can see that the total
number of binaries considered is a bit more
than 8.5k. Our system
generated 87 alerts, of which 51 were found
to be true positives, and 34 of them were
multi-binary vulnerabilities, which means
that the vulnerability was found by
tracking the data flow from the setter to
the getter binary. We also ran a
comparative evaluation, in which we basically
tried to measure the effort that an
analyst would go through in analyzing
firmware using different strategies. In
the first one, we considered each and every
binary in the firmware sample
independently and ran the analysis for up
to seven days for each firmware sample. The
system generated almost 21,000 alerts,
considering only about 2.5k binaries. In
the second case, we found the border
binaries, the parsers, and we statically
analyzed only them, and the system
generated 9.3k alerts. Notice that in this
case, since we don't know how the user
input is introduced, in this
experiment we consider every IPC that we
find in the binary as a possible source of
user input, and this holds for all of these
experiments. In the third case, we built the
BDG but considered each binary independently,
which means that we don't propagate
constraints and we run a static single-binary
analysis on each one of them; the
system generated almost 15,000 alerts.
Finally, we ran Karonte, and the generated
alerts were only 74. We also ran a larger
scale analysis on 899 firmware samples.
And we found that almost 40% of them were
multi-binary, which means that the network
functionality was carried out by more
than one binary, and the system generated
1000 alerts. Now, there is a lot going on
in this table; the details are in the
paper, and in this presentation I'll just go
through some of them. We found
that, on average, a firmware sample contains 4
border binaries, a BDG contains 5 binaries,
and some BDGs have more than 10 binaries.
We also plotted some statistics, and we found
that 80% of the firmware samples were analysed
within a day, as you can see from the
top-left figure. However, the experiments
presented a great variance, which we found
was due to implementation details: for
instance, we found that angr would take
more than seven hours to build some CFGs,
and sometimes the variance was due to a high
number of data keys. Also, as you can see
from the second picture from the top, we
found that the number of paths does not
have an impact on the total time. And as
you can see from the bottom two pictures,
performance is not heavily affected by
firmware size; by firmware size here we
mean the number of binaries in a firmware
sample and the total number of basic
blocks. So let's see
how to run Karonte. The procedure is
pretty straightforward. So first you get a
firmware sample. You create a
configuration file containing information
about the firmware sample, and then you run
it. So let's see how. This is an
example of a configuration file. It
contains several pieces of information, but
most of them are optional. The only ones
that are not are this one, the firmware path,
that is, the path to your firmware, and
these two, the architecture of the firmware
and the base address if the firmware is a
firmware blob. All the other fields are
optional, and you can set them if you have
some information about the firmware. A
detailed explanation of all of these
fields is in our GitHub repo. Once you
set the configuration file, you can run
Karonte. Now we provide a Docker
container, you can find the link on our
GitHub repo. And I'm gonna run it, but
it's not gonna finish because it's gonna
take several hours. But all you have to do
is merely... *typing noises* just run it
on the configuration file and it's gonna
do each step that we saw. Eventually I'm
going to stop it because it's going to
take several hours anyway. Eventually it
will produce a result file that... I ran
this yesterday so you can see it here.
There is a lot going on here; I'm just
gonna go through some of the important
information. So one thing that you can see
is that these are the border binaries that
Karonte found. Now, there might be some
false positives. I'm not sure how many
there are here. But as long as there are
no false negatives or the number is very
low, it's fine. It's good. In this case,
wait. Oh, I might have removed something.
All right, here, perfect. In this case,
this one, httpd, is a true positive, which
is the web server that we were talking
about before. Then we have the BDG. In this
case, we can see that Karonte found that
httpd communicates with two different
binaries, fileaccess.cgi and cgibin. Then
we have information about the CPFs. For
instance, we can see here that httpd has
28 data keys, and that the semantic CPF
found 27 of them; there might be one other
here or somewhere that I don't see.
Anyway. Then we have a list of alerts.
Now, some of those may be
duplicates because of loops. You could go
ahead and inspect all of them manually,
but I wrote a utility that you can use,
which is basically gonna filter out
the duplicates from loops for you. Let me
remember what I called it. This one? Yeah.
And you can see that in total the system
generated 6... 7... 8 alerts. So let's look
at one of them. Oh, and I recently realized
that the path that I'm reporting in the
log is not the path from the setter
binary to the getter binary to the sink,
but only the part from the getter binary
up to the sink. I'm gonna fix this in the
next few days and report the whole paths.
Anyway. So here we can see that the data key
CONTENT_TYPE contains user input and is
passed in an unsafe way to the sink
at this address, and the
binary in question is called
fileaccess.cgi. So we can see what happens
there. *keyboard noises* If you look here,
we have a string copy that copies the
content of haystack to destination;
haystack basically comes from this getenv,
and if you look, destination comes as a
parameter of this function, and this
buffer is only 0x68 bytes big. And this
turned out to be an actual true positive.
OK. So, in summary, we presented a strategy
to track data flow across different
binaries, and we evaluated our system on
952 firmware samples. Some takeaways:
analyzing firmware is not easy and
vulnerabilities persist; firmware is made of
interconnected components; static
analysis can still be used to efficiently
find vulnerabilities at scale; and modeling
the communication between binaries is key
for precision.
Here's the bibliography that I used
throughout the presentation, and I'm gonna
take questions.
*applause*
Herald: So thank you, Nilo, for a very
interesting talk. If you have questions,
we have three microphones one, two and
three. If you have a question, please go
head to the microphone and we'll take your
question. Yes. Microphone number two.
Q: Do you rely on imports from libc or
something like that or do you have some
issues with like statically linked
binaries, stripped binaries or is it all
semantic analysis of a function?
Nilo: So. Okay. We use angr. So for
example, if you have an indirect call, we
use angr to figure out what the target is.
And to answer your question: if the
firmware uses libc, some CPFs do rely on
it; for instance, the environment CPF
checks whether the setenv or getenv
functions are called. But we also use the
semantic CPF, which covers cases where
information is missing, like when there is
no libc or some vendor reimplemented their
own functions: we use that CPF to try to
understand the semantics of a function and
figure out whether it is, for example, a
custom setenv.
Q: Yeah, thanks.
Herald: Microphone number three.
Q: In embedded environments you often have
also that the getter might work on a DMA,
some kind of vendor driver on a DMA. Are
you considering this? And second part of
the question, how would you then
distinguish this from your generic IPC?
Because I can imagine that they look very
similar in the actual code.
Nilo: So if I understand correctly your
question, you mention a case of MMIO where
some data is retrieved directly from some
address in memory. So what we found is
that these addresses are usually hardcoded
somewhere. So the vendor knows that, for
example, that from this address A to this
address B there is data from this
peripheral. So when we find such a
hardcoded address, we treat it as a read
of some interesting data.
Q: Okay. And this would be also
distinguishable from your sort of CPF, the
generic CPF would be distinguishable...
Nilo: Yeah. Yeah, yeah.
Q: ...from a DMA driver by using this
fixed address assuming.
Nilo: Yeah. That's what the semantic CPF
does, among the other things.
Q: Okay. Thank you.
Nilo: Sure.
Herald: Another question for microphone
number 3.
Q: What's the license for Karonte?
Nilo: Sorry?
Q: I checked the software license, I
checked the git repository and there is no
license like at all.
Nilo: That is a very good question. I
haven't thought about it yet. I will.
Herald: Any more questions from here or
from the Internet? Okay. Then a big round
of applause to Nilo again for your talk.
*postroll music*
Subtitles created by many many volunteers and
the c3subtitles.de team. Join us, and help us!