Music

Herald: The next talk is about how risky the software you use is. You may have heard about Trump versus a Russian security company. We won't judge it, we won't comment on it, but we dislike the prejudgments in this case. Tim Carstens and Parker Thompson will tell you a little bit more about how risky the software you use is. Tim Carstens is CITL's Acting Director and Parker Thompson is CITL's lead engineer. Please welcome with a very, very warm applause: Tim and Parker!

Thanks.

Applause
Tim Carstens: Howdy, howdy. So my name is Tim Carstens. I'm the acting director of the Cyber Independent Testing Lab. That's four words; we'll talk about all four today, especially "cyber." With me today is our lead engineer Parker Thompson. Not on stage are our other collaborators, Patrick Stach and Sarah Zatko, and, present in the room but not on stage, Mudge. So today we're going to be talking about our work. The introduction that was given was phrased in terms of Kaspersky and all of that; I'm not gonna be speaking about Kaspersky, and I guarantee you I'm not gonna be speaking about my president. Right, yeah? Okay. Thank you.

Applause
All right, so why don't we go ahead and kick off. I'll mention now that parts of this presentation are going to be quite technical. Not most of it, and I will always include analogies and other aids if you are here for security but are not a bit-twiddler. But if you do want to be able to review some of the technical material, if I go through it too fast, if you like to read, if you're a mathematician or a computer scientist: our slides are already available for download at this site here. We thank our partners at power door for getting that set up for us.

Let's get started on the real material here. Alright, so we are CITL: a nonprofit organization based in the United States, founded by our chief scientist Sarah Zatko and our board chair Mudge. Our mission is a public-good mission. We are hackers, but our mission here is actually to look out for people who do not know very much about machines, or as much as the other hackers do.
Specifically, we seek to improve the state of software security by providing the public with accurate reporting on the security of popular software, right? That was a mouthful for you. But no doubt every single one of you has received questions of the form: what do I run on my phone, what do I do with this, what do I do with that, how do I protect myself, all these other things. Lots of people in the general public are looking for agency in computing. No one is offering it to them, and so we're trying to provide a forcing function on the software field in order to enable consumers and users and all these things.
Our social-good work is funded largely by charitable monies from the Ford Foundation, whom we thank a great deal. But we also have major partnerships with Consumer Reports, which is a major organization in the United States that generally, broadly, looks at consumer goods for safety and performance, and with The Digital Standard, which probably would be of great interest to many people here at Congress, as it is a holistic standard for protecting user rights. We'll talk about some of the work that goes into those things in a bit, but first I want to give the big picture of what it is we're really trying to do in one short little sentence: something like this, but for security. What are the important facts, how does it rate, is it easy to consume, is it easy to look at it and say this thing is good, this thing is not good? Something like this, but for software security. Sounds hard, doesn't it? So I want to talk a little bit about what I mean by "something like this."
There are lots of consumer outlook, watchdog, and protection groups, some private, some government, which are looking to do this for various things that are not software security. You can see some examples here that are big in the United States; I happen to not like these as much as some of the newer consumer labels coming out of the EU. But nonetheless they are examples of the kinds of things people have done in other fields, fields that are not security, to try to achieve that same end. And when these things work well, it is for three reasons. One: it has to contain the relevant information. Two: it has to be based in fact; we're not talking opinions, this is not a book club or something like that. And three: it has to be actionable. You have to be able to know how to make a decision based on it. How do you do that for software security? How do you do that for software security? So the rest of the talk is going to go in three parts.
First, we're going to give a bit of an overview of the more consumer-facing side of what we do: look at some data that we have reported on earlier, and all these other kinds of good things. We're then going to get terrifyingly, terrifyingly technical. And then after that we'll talk about the tools to actually implement all this stuff. The technical part comes before the tools, which tells you just how terrifyingly technical we're gonna get. It's gonna be fun, right? So, how do you do this for software security: the consumer version.
If you set forth on the task of trying to measure software security (many people here probably work in the security field, perhaps as consultants doing reviews; certainly I used to), then probably what you're thinking to yourself right now is that there are lots and lots of things that affect the security of a piece of software. Some of them you're only gonna see if you go reversing, and some of them are just kicking around on the ground waiting for you to notice. So we're going to talk about both of those kinds of things that you might measure. But here you see these giant charts: on the left we have Microsoft Excel on OS X, on the right Google Chrome for OS X. This is a couple of years old at this point, maybe one and a half years old. I'm not expecting you to be able to read these; the real point is to say: look at all of the different things you can measure very easily.
How do you distill it, how do you boil it down? This is the opposite of a good consumer safety label. If you've ever done any consulting, this is the kind of report you hand a client to tell them how good their software is, right? It's the opposite of consumer grade. But the reason I'm showing it here is because I'm gonna call out some things. Maybe you can't process all of this, because it's too much material, but once I call them out, just like NP, you're gonna recognize them instantly.
So for example, Excel, at the time of this review: look at this column of dots. What are these dots telling you? They're telling you: look at all these libraries, all of them are 32-bit only. Not 64 bits. Take a look at Chrome: the exact opposite, a 64-bit binary, right? What are some other things? Excel, again, on OS X: maybe you can see these danger warning signs that go straight up the whole thing. That's the absence of major hardening flags in the binary headers; we'll talk about what that means exactly in a bit. But also, if you hop over here, you'll see that, yeah, Chrome has all the different hardening protections that a binary might enable, on OS X that is, but it also has more dots in this column here off to the right. And what do those dots represent?
Those dots represent functions, functions that historically have been a source of trouble: functions that are very hard to call correctly. If you're a C programmer, the "gets" function is a good example, but there are lots of them. And you can see here that Chrome doesn't mind; it uses them all a bunch. And Excel, not so much. And if you know the history of Microsoft, the Trustworthy Computing initiative, the SDL, and all of that, you will know that a very long time ago Microsoft made a decision and said: we're gonna start purging some of these risky functions from our code bases, because we think it's easier to ban them than to teach our devs to use them correctly. And you see that reverberating out in their software. Google, on the other hand, says: yeah, those functions can be dangerous to use, but if you know how to use them they can be very good, and so they're permitted.
The point all of this is building to is that if you start by just measuring every little thing that your static analyzers can detect in a piece of software, two things happen. One: you wind up with way more data than you can show in a slide. And two: the engineering process, the software development life cycle that went into the software, will leave behind artifacts that tell you something about the decisions that went into designing that engineering process. So, you know, Google, for example: quite rigorous as far as hitting, you know, "GCC dash enable-all-the-compiler-protections". Microsoft, maybe less good at that, but much more rigid about things that were very popular ideas when they introduced Trustworthy Computing. Alright. So the big takeaway from this material is that, again, the software engineering process results in artifacts in the software that people can find.
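As an aside for the bit-twiddlers: here is a minimal sketch of the kind of artifact check being described, written against the pyelftools package for Linux ELF binaries. CITL's actual analyzers are not public, so the observables and the risky-function list below are illustrative assumptions, not their real model:

```python
# Minimal sketch: checksec-style artifact detection on a Linux ELF binary.
# Assumes pyelftools (pip install pyelftools); the observables and the
# risky-function list are an illustrative subset, not CITL's real model.
from elftools.elf.elffile import ELFFile

RISKY_FUNCS = {"gets", "strcpy", "sprintf"}  # hypothetical subset

def observables(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        obs = {
            "64_bit": elf.elfclass == 64,
            # PIE binaries have type ET_DYN; a prerequisite for full ASLR.
            "pie": elf.header["e_type"] == "ET_DYN",
            "nx": True,     # assume NX unless an executable stack is requested
            "relro": False,
        }
        for seg in elf.iter_segments():
            if seg["p_type"] == "PT_GNU_STACK" and seg["p_flags"] & 0x1:
                obs["nx"] = False            # executable stack requested
            if seg["p_type"] == "PT_GNU_RELRO":
                obs["relro"] = True
        # Risky imported functions show up in the dynamic symbol table.
        dynsym = elf.get_section_by_name(".dynsym")
        if dynsym:
            obs["risky_imports"] = sorted(
                s.name for s in dynsym.iter_symbols() if s.name in RISKY_FUNCS)
        return obs

print(observables("/bin/ls"))  # e.g. {'64_bit': True, 'pie': True, ...}
```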
Alright. Ok, so that's that's a whole
-
bunch of data, certainly it's not a
consumer-friendly label. So how do you
-
start to get in towards the consumer zone?
Well, the main defect of the big reports
-
that we just saw is that it's too much
information. It's a very dense on data but
-
it's very hard to distill it to the "so
what" of it, right?
-
And so this here is one of our earlier
attempts to go ahead and do that
-
What are these charts, how did we come up with them? Well, on the previous slide we saw all these different factors that you can analyze in software; basically, here's how we arrive at this. For each of those things: pick a weight. Go ahead and compute a score, average against the weights: tada, now you have some number. You can do that for each of the libraries in the piece of software, and if you do that for each of the libraries, you can then go ahead and produce these histograms to show, you know, this percentage of the DLLs had a score in this range. Boom, there's a bar, right? How do you pick those weights? We'll talk about that in a sec; it's very technical. The takeaway, though, is that you wind up with these charts. Now, I've obscured the labels, and the reason I've done that is because I don't really care that much about the actual counts. I want to talk about the shapes of these charts: it's a qualitative thing.
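For concreteness, here is a hedged sketch of the weight-and-average scheme just described; the weights and feature names are hypothetical, not the ones CITL actually uses:

```python
# Minimal sketch of the weighted scoring described above: per-library scores
# averaged against hand-picked weights, then binned into a histogram.
# The weights and feature names are hypothetical, not CITL's real model.
from collections import Counter

WEIGHTS = {"pie": 3.0, "nx": 2.0, "relro": 2.0, "no_risky_imports": 1.0}

def library_score(features):
    """features: dict of observable name -> bool for one library."""
    total = sum(WEIGHTS.values())
    got = sum(w for name, w in WEIGHTS.items() if features.get(name))
    return got / total  # 0.0 (worst) .. 1.0 (best)

def histogram(libraries, bins=10):
    """libraries: list of feature dicts, one per DLL/shared object."""
    counts = Counter(min(int(library_score(f) * bins), bins - 1)
                     for f in libraries)
    return [counts.get(b, 0) for b in range(bins)]

libs = [{"pie": True, "nx": True, "relro": True, "no_risky_imports": True},
        {"pie": False, "nx": True, "relro": False, "no_risky_imports": False}]
print(histogram(libs))  # tall bars to the right = better in this model
```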
So here: good scores appear on the right, bad scores appear on the left. The histogram measures all the libraries and components, and so a very secure piece of software in this model manifests as a tall bar far to the right. And you can see a clear example in our custom Gentoo build. Anyone here who is a Gentoo fan knows: hey, I'm going to install this thing, I think I'm going to go ahead and turn on every single one of those flags. And lo and behold, if you do that, yeah, you wind up with a tall bar far to the right. Here's Ubuntu 16; I bet it's 16.04, but I don't recall exactly, 16 LTS. Here you see a lot of tall bars to the right, not quite as consolidated as a custom Gentoo build, but that makes sense, doesn't it? Because you don't do your whole Ubuntu build yourself.
Now I want to contrast. Over here on the right we see, in the same model, an analysis of the firmware obtained from two smart televisions, last year's models from Samsung and LG, and here are the model numbers. We did this work in concert with Consumer Reports. And what do you notice about these histograms? Are the bars tall and to the right? No, they look almost normal; not quite, but that doesn't really matter. The main thing that matters is that this is the shape you would expect to get if you were basically playing a random game to decide which security features to enable in your software. This is the shape of not having a security program, is my bet. That's my bet. And so what do you see? You see heavy concentration here in the middle, right, that seems fair, and it tails off. On the Samsung nothing scored all that great; same on the LG. Both of them are running their respective operating systems, and they're basically just inheriting whatever security came from whatever open source thing they forked.
This right here is the kind of thing that we exist to produce: charts showing that the current practices in the not-so-consumer-friendly space of running your own Linux distros far exceed the products being delivered, certainly in this case in the smart TV market. But I think you might agree with me, it's much worse than this. So let's dig into that a little bit more; I have a different point that I want to make about that same data set.
So this table here is again looking at the LG, Samsung, and Gentoo Linux installations, and on this table we're just pulling out some of the easy-to-identify security features you might enable in a binary. So: percentage of binaries with address space layout randomization. Let's talk about that. On our Gentoo build it's over 99%. That also holds for the Amazon Linux AMI; it holds in Ubuntu. ASLR is incredibly common in modern Linux. And despite that, fewer than 70 percent of the binaries on the LG television had it enabled. Fewer than 70 percent. And the Samsung was doing better than that, I guess, but 80 percent is pretty disappointing when a default install of a mainstream Linux distro is going to get you 99, right? And it only gets worse, it only gets worse, you know?
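If you want to reproduce that kind of table entry yourself, here is a hedged sketch that walks an unpacked firmware tree and counts PIE executables (the usual on-disk proxy for ASLR support), again using pyelftools; the firmware path is hypothetical:

```python
# Hedged sketch: estimate a "percentage of binaries with ASLR" figure for
# an unpacked firmware tree by counting PIE ELF objects, roughly as in the
# table above. Uses pyelftools; 'firmware_root/' is a hypothetical path.
# Note: ET_DYN also matches shared libraries, which is fine for this tally.
import os
from elftools.elf.elffile import ELFFile
from elftools.common.exceptions import ELFError

def pie_ratio(root):
    total = pie = 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if f.read(4) != b"\x7fELF":
                        continue          # skip non-ELF files
                    f.seek(0)
                    is_pie = ELFFile(f).header["e_type"] == "ET_DYN"
            except (ELFError, OSError):
                continue
            total += 1
            pie += is_pie
    return pie / total if total else 0.0

print(f"{pie_ratio('firmware_root/'):.1%} of binaries are PIE/ASLR-ready")
```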
RELRO support: if you don't know what that is, that's okay, but if you do, look at this abysmal coverage coming out of these IoT devices. Very sad. And you see it over and over and over again. I'm showing this because some people in this room, or watching this video, ship software, and I have a message to those people who ship software, who aren't working on, say, Chrome or any of the other big-name Pwn2Own kinds of targets. Look at this: you can be leading the pack by mastering the fundamentals. You can be leading the pack by mastering the fundamentals. This is a point that we as a security field really need to be driving home.
One of the things that we're seeing here in our data is that if you're the vendor who is shipping the product everyone in the security field has heard of, then maybe your game is pretty decent, right? If you're shipping, say, Windows, or if you're shipping Firefox, or whatever. But if you're doing one of these things where people are just kind of beating you up for default passwords, then your problems go way further than just default passwords. The house is messy; it needs to be cleaned. So for the rest of the talk, like I said, we're going to be discussing a lot of other things that amount to getting a peek behind the curtain at where some of these things come from, and getting very specific about how this business works. But if you're interested in more of the high-level material, especially in interesting results and insights, some of which I'm going to have here later: I really encourage you to take a look at the talk from this past summer by our chief scientist Sarah Zatko, which is predominantly on the topic of surprising results in the data.
Today, though, this being our first time presenting here in Europe, we figured we would take more of an overarching view: what we're doing, why we're excited about it, and where it's headed. So we're about to move into a little bit of the underlying theory: why I think it's reasonable to even try to measure the security of software from a technical perspective. But before we can get into that, I need to talk a little bit about our goals, so that the motivation behind the decisions and the theory is clear. Our goals are really simple; it's a very easy organization to run because of that. Goal number one: remain independent of vendor influence. We are not the first organization to purport to be looking out for the consumer. But unlike many of our predecessors, we are not taking money from the people we review, right? Seems like some basic stuff. Seems like some basic stuff, right? Thank you, okay.
Two: automated, comparable, quantitative analysis. Why automated? Well, we need our test results to be reproducible. If Tim goes in, opens up your software in IDA, and finds a bunch of stuff, that's not a very repeatable kind of standard for things. And so we're interested in things which are automated. We'll talk about that; maybe a few hackers in here know how hard that is. But then, last: we're also acting as a watchdog. We're protecting the interests of the user, the consumer, however you would like to look at it.
But we also have three non-goals, three non-goals that are equally important. One: we have a non-goal of finding and disclosing vulnerabilities. I reserve the right to find and disclose vulnerabilities. But that's not my goal, it's not my goal. Another non-goal is to tell software vendors what to do. If a vendor asks me how to remediate their terrible score, I will tell them what we are measuring, but I'm not there to help them remediate. It's on them to be able to ship a secure product without me holding their hand. We'll see. And then three, a non-goal: perform free security testing for vendors. Our testing happens after you release. Because when you release your software, you are telling people it is ready to be used. Is it really, though? Is it really, though, right?
Applause
Yeah, thank you. So we are not there to give you a preview of what your score will be. There is no sum of money you can hand me that will get you an early preview of what your score is. You can try me, you can try me: there's a fee for trying me. There's a fee for trying me. But I'm not gonna look at your stuff until I'm ready to drop it, right? Yeah, bitte, yeah.
All right. So, moving into this theory territory: three big questions, three big questions that need to be addressed if you want to do our work efficiently. One: what works? What works for improving security, what are the things that you really want to see in software? Two: how do you recognize when it's being done? It's no good if someone hands you a piece of software and says, "I've done all the latest things," and it's a complete black box. If you can't check the claim, the claim is as good as false, in practical terms, period. Software has to be reviewable or, a priori, I'll think you're full of it. And then three: who's doing it? Of all the things that work, that you can recognize, who's actually doing them? You know, our field is famous for ruining people's holidays and weekends over Friday bug disclosures, New Year's Eve bug disclosures. I would like us to also be famous for calling out those teams and those software organizations which are being as good as the bad guys are being bad, yeah? So, provide someone an incentive to maybe be happy to see us for a change, right? Okay, so thank you. Yeah, all right.
So how do we actually pull these things off? The basic idea. So, I'm going to get into some deeper theory; if you're not a theorist, I want you to focus on this slide. And I'm gonna bring it back, it's not all theory from here on out after this, but if you're not a theorist, I really want you to focus on this slide. The basic motivation behind what we're doing, the technical motivation, why we think that it's possible to measure and report on security: it all boils down to this, right?
So we start with a thought experiment, a Gedankenexperiment, right? Given a piece of software we can ask, one: overall, how secure is it? Kind of a vague question, but you could imagine there are versions of that question. And two: what are its vulnerabilities? Maybe you want to nitpick with me about what the word vulnerability means, but broadly, you know, this is a much more specific question. And here's the enticing thing: the first question appears to ask for less information than the second question. And if we were taking bets, I would put my money on yes, it actually does ask for less information. What do I mean by that? Well, let's say that someone told you all of the vulnerabilities in a system, right? They said, "Hey, I got them all." You're like: all right, that's cool. And if someone asks you, hey, how secure is this system, you can give them a very precise answer. You can say it has N vulnerabilities, and they're of this kind, and all this stuff. So certainly the second question is enough to answer the first. But is the reverse true? Namely, if someone were to tell you, for example, "hey, this piece of software has exactly 32 vulnerabilities in it," does that make it easier to find any of them? Right, there's room to maybe do that using some algorithms that are not yet in existence.
Certainly the computer scientists in here are saying, "well, you know, yeah, maybe counting the number of SAT solutions doesn't help you practically find solutions. But it might, and we just don't know." Okay, fine, fine, fine. Maybe these things are the same, but my experience in security, and the experience of many others perhaps, is that they probably aren't the same question. And this motivates what I'm calling here Zatko's question, which is basically asking for an algorithm that demonstrates that the first question is easier than the second question. So Zatko's question: develop a heuristic which can efficiently answer one, but not necessarily two. If you're looking for a metaphor, if you want to know why I care about this distinction, I want you to think about certain controversial technologies: maybe think about, say, nuclear technology. An algorithm that answers one, but not two, is a very safe algorithm to publish. A very safe algorithm to publish indeed.
Okay, Claude Shannon would like more information; happy to oblige. Let's take a look at this question from a different perspective, maybe a more hands-on perspective: the hacker perspective, right? If you're a hacker and you're watching me up here, and I'm waving my hands around and showing you charts, maybe you're thinking to yourself: yeah boy, what have you got? How does this actually go? And maybe what you're thinking to yourself is that finding good vulns is an artisan craft, right? You're in IDA, you're reversing, you're doing all these things; I don't know, all that stuff. And this kind of clever game, cleverness, is not something that feels very automatable. But on the other hand, there are a lot of tools that do automate things, and so it's not completely not automatable.
And if you're into fuzzing, then perhaps you are aware of this very simple observation: if your harness is perfect, if you really know what you're doing, if you have a decent fuzzer, then in principle fuzzing can find every single problem. You have to be able to look for it, you have to be able to harness for it, but in principle it will. So the hacker perspective on Zatko's question is maybe of two minds: on the one hand, assessing security is a game of cleverness; but on the other hand, we're kind of right now at the cusp of having some game-changing tech really go. Maybe you're saying fuzzing is not at the cusp; I promise, it's just at the cusp. We haven't seen all that fuzzing has to offer, and so maybe there's room, maybe there's room for some automation to be possible in pursuit of Zatko's question. Of course, there are many challenges still in using existing hacker technology, mostly in the form of various open questions. For example, if you're into fuzzing: hey, identifying unique crashes, there's an open question. We'll talk about some of those.
But I'm going to offer another perspective here. Maybe you're not in the business of doing software reviews, but you know a little computer science, and maybe that computer science has you wondering: what's this guy talking about? I'm here to acknowledge that. So, whatever you think the word security means: I've got a list of questions up here, and whatever you think the word security means, probably some of these questions are relevant to your definition. Right.
Does the software have a hidden backdoor or any kind of hidden functionality? Does it handle crypto material correctly? Et cetera, and so forth. Anyone in here who knows some computability theory knows that every single one of these questions, and many others like them, is undecidable, due to reasons essentially no different from the reason the halting problem is undecidable. Which is to say, due to reasons essentially first identified and studied by Alan Turing, a long time before we had microarchitectures and all these other things. And so the computability perspective says that, whatever your definition of security is, ultimately you have this recognizability problem: a fancy way of saying that algorithms won't be able to recognize secure software, because of these undecidability issues. The takeaway, the takeaway is that the computability angle on all of this says: anyone who's in the business that we're in has to use heuristics. You have to, you have to.
All right, this guy gets it. All right, so on the tech side, the last technical perspective that we're going to take now is certainly the most abstract, which is the Bayesian perspective. So if you're a frequentist, you need to get with the times; everything is Bayesian now. So, let's talk about this for a bit. Only two slides of math, I promise, only two! So, let's say that I have some corpus of software. Perhaps it's a collection of all modern browsers, perhaps it's the collection of all the packages in the Debian repository, perhaps it's everything on GitHub that builds on this system, perhaps it's a hard drive full of warez that some guy mailed you, right? You have some corpus of software, and for a random program in that corpus we can consider this probability: the probability distribution of which software is secure versus which is not. For reasons described in the computability perspective, this number is not a computable number for any reasonable definition of security. So that's neat.
And so, for practical terms, if you want to do some probabilistic reasoning, you need some surrogate for that, and so we consider this here. So, instead of considering the probability that a piece of software is secure, a non-computable, non-verifiable claim, we take a look here at this indexed collection of probabilities. This is a countably infinite family of probability distributions: basically, P sub h,k is just the probability that, for a random piece of software in the corpus, h work units of fuzzing will find no more than k unique crashes. And why is this relevant? Well, at the bottom we have this analytic observation, which is that in the limit as h goes to infinity, you're basically saying: "Hey, you know, if I fuzz this thing forever, what does that look like?" And, essentially, here we have analytically that this should converge: the P sub h,1 should converge to the probability that a piece of software just simply cannot be made to crash. Not the same thing as being secure, but certainly not a small concern, and relevant to security.
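For readers who want the symbols, here is a reconstruction of that family in LaTeX, following the talk's verbal description; the exact notation on the slide may differ:

```latex
% For a random program s drawn from the corpus S:
P_{h,k} \;=\; \Pr_{s \sim S}\!\left[\, h \text{ work units of fuzzing } s
              \text{ find at most } k \text{ unique crashes} \,\right]

% The analytic observation: fuzzing "forever" approaches the probability
% that s cannot be made to crash at all.
\lim_{h \to \infty} P_{h,1} \;=\;
    \Pr_{s \sim S}\!\left[\, s \text{ cannot be made to crash} \,\right]
```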
So, none of that stuff was actually Bayesian yet; we need to get there. And so here we go: the previous slide described a probability distribution measured based on fuzzing. But fuzzing is expensive, and it is also not an answer to Zatko's question, because it finds vulnerabilities; it doesn't measure security in the general sense. And so here's where we make the jump to conditional probabilities. Let M be some observable property of software: has ASLR, has RELRO, calls these functions, doesn't call those functions... take your pick. For random s in S we now consider these conditional probability distributions. This is the same kind of probability as we had on the previous slide, but conditioned on this observable being true, and this leads to the refined, CITL variant of Zatko's question: which observable properties of software satisfy that, when the software has property M, the probability of fuzzing being hard is very high? That's what this version of the question asks, and here we say for large log(h)/k; in other words, exponentially more fuzzing than you'd expect to find bugs. So this is the technical version of what we're after. All of this can be explored, you can brute-force your way to finding all of this stuff, and that's exactly what we're doing.
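A hedged sketch of what that brute-force exploration can look like on the ground: given per-binary fuzzing outcomes and observables, estimate the conditional probability of fuzzing being hard for each observable, then rank them. The record format is made up for illustration; the real pipeline runs this at cluster scale:

```python
# Hedged sketch of the brute-force search described above: given fuzzing
# results and per-binary observables, estimate P(fuzzing finds <= k unique
# crashes | observable M) for every observable M, and rank the Ms.
from collections import defaultdict

def conditional_hardness(records, k=0):
    """records: list of dicts like
       {"crashes": 3, "observables": {"aslr", "relro"}} (hypothetical)."""
    seen = defaultdict(lambda: [0, 0])   # M -> [hard count, total count]
    for r in records:
        for m in r["observables"]:
            seen[m][1] += 1
            seen[m][0] += r["crashes"] <= k
    # Empirical P(fuzzing finds <= k unique crashes | binary has property M)
    return sorted(((hard / total, m) for m, (hard, total) in seen.items()),
                  reverse=True)

data = [{"crashes": 0, "observables": {"aslr", "relro"}},
        {"crashes": 7, "observables": {"relro"}},
        {"crashes": 0, "observables": {"aslr"}}]
for p, m in conditional_hardness(data):
    print(f"P(hard | {m}) ~ {p:.2f}")   # here: aslr 1.00, relro 0.50
```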
So we're looking for all kinds of things, all kinds of things that correlate with fuzzing having low yield on a piece of software, and there's a lot of ways in which that can happen. It could be that you are looking at a feature of software that literally prevents crashes; maybe it's the never-crash flag, I don't know. But most of the things I've talked about, ASLR, RELRO, etc., don't prevent crashes. In fact, ASLR can take non-crashing programs and make them crash; it's the number one reason vendors don't enable it, right? So why am I talking about ASLR? Why am I talking about RELRO? Why am I talking about all these things that have nothing to do with stopping crashes, while claiming I'm measuring crashes? This is because, in the Bayesian perspective, correlation is not the same thing as causation, right? Correlation is not the same thing as causation. It could be that M's presence literally prevents crashes, but it could also be that, by some underlying coincidence, the things we're looking for are mostly only found in software that's robust against crashing.
If you're looking for security, I submit to you that the difference doesn't matter. Okay, end of my math, danke. I will now go ahead and give this really nice analogy for all those things that I just described. So we're looking for indicators of a piece of software being secure enough to be good for consumers, right? So here's an analogy. Let's say you're a geologist: you study minerals and all of that, and you're looking for diamonds. Who isn't, right? Want those diamonds! And how do you find diamonds? Even in places that are rich in diamonds, diamonds are not common. You don't just go walking around in your boots, kicking until your toe stubs on a diamond. You don't do that. Instead you look for other minerals that are mostly only found near diamonds but are much more abundant in those locations than the diamonds. So, this is mineral science 101, I guess, I don't know. So, for example, you want to go find diamonds: put on your boots and go kicking until you find some chromite, look for some diopside, look for some garnet. None of these things turn into diamonds, none of these things cause diamonds, but if you're finding good concentrations of these things, then, statistically, there are probably diamonds nearby. That's what we're doing. We're not looking for the things that cause good security per se. Rather, we're looking for the indicators that you have put the effort into your software. How's that working out for us? How's that working out for us?
Well, we're still doing studies. It's early to say exactly, but we do have the following interesting coincidence. Here I present a collection of prices that somebody gave for so-called underground exploits. I can tell you these prices are maybe a little low these days, but if you work in that business, if you go to SyScan, if you do that kind of stuff, maybe you know that this is ballpark. It's ballpark. Alright, and, just a coincidence, maybe it means we're on the right track, I don't know, but it's an encouraging sign: when we run these programs through our analysis, our rankings more or less correspond to the actual prices that you encounter in the wild for access via these applications. Up above, I have one of our histogram charts. You can see here that Chrome and Edge in this particular model scored very close to the same, and it's a test model, so let's say they're basically the same.
Firefox is behind there a little bit. I don't have Safari on this chart, because these are all Windows applications, but the Safari score falls in between. So, lots of theory, lots of theory, lots of theory, and then we have this. We're going to go ahead now and hand off to our lead engineer, Parker, who is going to talk about some of the concrete stuff, the non-chalkboard stuff, the software stuff that actually makes this work.
Thompson: Yeah, so I want to talk about the process of actually doing it: building the tooling that's required to collect these observables. Effectively, how do you go mining for indicator minerals? But first, the progression of where we are and where we're going. We initially broke this out into three major tracks of our technology. We have our static analysis engine, which started as a prototype; we have now recently completed a much more mature and solid engine that's allowing us to be much more extensible, to dig deeper into programs, and to provide much deeper observables. Then we have the data collection and data reporting. Tim showed some of our early stabs at this. We're right now in the process of building new engines to make the data more accessible and easy to work with, and hopefully more of that will be available soon. Finally, we have our fuzzer track. We needed to get some early data, so we played with some existing off-the-shelf fuzzers, including AFL, and, while that was fun, unfortunately it's a lot of work to manually instrument a lot of fuzzers for hundreds of binaries.
So we then built an automated solution that started to get us closer to having a fuzzing harness that could autogenerate itself, depending on the software and the software's behavior. But, right now, unfortunately, that technology showed us more deficiencies than it showed successes. So we are now working on a much more mature fuzzer that will allow us to dig deeper into programs as we're running and collect very specific things that we need for our model and our analysis. But on to our analytic pipeline today. This is one of the most concrete components of our engine, and one of the most fun!
We effectively wanted some type of
-
software hopper, where you could just pour
programs in, installers and then, on the
-
other end, come reports: Fully annotated
and actionable information that we can
-
present to people. So, we went about the
process of building a large-scale engine.
-
It starts off with a simple REST API,
where we can push software in, which then
-
gets moved over to our computation cluster
that effectively provides us a fabric to
-
work with. It makes is made up of a lot of
different software suites, starting off
-
with our data processing, which is done by
apache spark and then moves over into data
-
data handling and data analysis in spark,
and then we have a common HDFS layer to
-
provide a place for the data to be stored
and then a resource manager and Yarn. All
-
of that is backed by our compute and data
nodes, which scale out linearly. That then
-
moves into our data science engine, which
is effectively spark with Apache Zeppelin,
-
which provides us a really fun interface
where we can work with the data in an
-
interactive manner but be kicking off
large-scale jobs into the cluster. And
-
finally, this goes into our report
generation engine. What this bought us,
-
was the ability to linearly scale and make
that hopper bigger and bigger as we need,
-
but also provide us a way to process data
that doesn't fit in a single machine's
-
RAM. You can push the instance sizes as
you large as you want, but we have
-
datasets that blow away any single host
RAM set. So this allows us to work with
-
really large collections of observables.
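As a flavor of what a job on that stack might look like, here is a hedged PySpark sketch aggregating per-binary observable records from HDFS; the path and field names are hypothetical, since the real CITL schema is not public:

```python
# Hedged sketch of the kind of aggregation job that runs on the cluster
# described above: PySpark reading per-binary observable records and
# computing crash statistics per observable. Paths/fields are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("observable-stats").getOrCreate()

# One JSON record per analyzed binary, e.g.
# {"binary": "...", "observable": "aslr", "unique_crashes": 3}
df = spark.read.json("hdfs:///citl/observables/*.json")

stats = (df.groupBy("observable")
           .agg(F.count("*").alias("binaries"),
                F.avg("unique_crashes").alias("mean_crashes"),
                # fraction of binaries where fuzzing found no crashes
                F.avg((F.col("unique_crashes") == 0).cast("double"))
                 .alias("p_no_crash")))

stats.orderBy(F.desc("p_no_crash")).show()
```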
I want to dive down now into our actual static analysis. But first we have to explore the problem space, because it's a nasty one. Effectively, CITL's mission is to process as much software as possible. Hopefully all of it, but it's hard to get your hands on all the binaries that are out there. When you start to look at that problem, you understand there are a lot of combinations: there are a lot of CPU architectures, a lot of operating systems, a lot of file formats, a lot of environments the software gets deployed into, and every single one of them has its own app armoring features. A feature can be specifically available for one combination but not on another, and you don't want to penalize a developer for not turning on a feature they never had access to turn on. So, effectively, we need to solve this in a much more generic way.
And so what we did is: our static analysis engine effectively looks like a gigantic collection of abstraction libraries for handling binary programs. You take in some type of input file, be it ELF, PE, or Mach-O, and then the pipeline splits. It goes off into two major analyzer classes. First, our format analyzers, which look at the software much like how a linker or loader would look at it: I want to understand how it's going to be loaded up and what type of armoring feature is going to be applied, and then we can run analyzers over that. In order to achieve that, we need abstraction libraries that can provide us an abstract memory map, a symbol resolver, and generic section properties. So all that feeds in, and then we run over a collection of analyzers to collect data and observables.
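A minimal sketch of the format-dispatch entry point being described, sniffing magic bytes to route a file to the right analyzer class; the analyzers themselves are stubbed, and CITL's abstraction layers are of course far richer:

```python
# Minimal sketch: detect the container format (ELF, PE, Mach-O) by magic
# bytes and route to per-format analyzers. Analyzer bodies are stubs here.
import struct

MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCAFEBABE}  # 32-bit, 64-bit, fat

def detect_format(path):
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"\x7fELF":
        return "elf"
    if head[:2] == b"MZ":                       # DOS stub of a PE file
        return "pe"
    if len(head) == 4:
        be = struct.unpack(">I", head)[0]
        le = struct.unpack("<I", head)[0]
        if be in MACHO_MAGICS or le in MACHO_MAGICS:
            return "macho"
    return "unknown"

def analyze(path):
    fmt = detect_format(path)
    # Here the real pipeline would fan out to format analyzers (the
    # loader's view) and code analyzers (the instruction/CFG view).
    return {"format": fmt}

print(analyze("/bin/ls"))   # {'format': 'elf'} on a Linux host
```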
Next we have our code analyzers; these are the analyzers that run over the code itself. I need to be able to look at every possible executable path. In order to do that, we need to do function discovery, feed that into a control flow recovery engine, and then, as a post-processing step, dig through all of the possible metadata in the software, such as a switch table or something like that, to get even deeper into the software. This provides us a basic list of basic blocks, functions, and instruction ranges, and does so in an efficient manner, so we can process a lot of software as it goes. Then all that gets fed over into the main modular analyzers. Finally, all of this comes together, gets put into a gigantic blob of observables, and is fed up to the pipeline.
We really want to thank the Ford Foundation for supporting our work in this, because the pipeline and the static analysis have been a massive boon for our project, and we're only now beginning to really get our engine running, and we're having a great time with it. So, digging into the observables themselves: what are we looking at? Let's break them apart. First, the format structure components: things like ASLR, DEP, RELRO, basic app armoring that's going to be enabled at the OS layer when the binary gets loaded up or linked. We also collect other metadata about the program, such as: what libraries are linked in? What does its dependency tree look like, completely? How did those libraries score? Because that can affect your main software. An interesting example on Linux: if you link a library that requires an executable stack, guess what, your software now has an executable stack, even if you didn't mark it that way. So we need to understand what ecosystem the software is gonna live in.
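A hedged pyelftools sketch of that dependency-aware concern: list a binary's DT_NEEDED libraries, and check whether a given ELF object requests an executable stack via PT_GNU_STACK. Resolving library paths and walking the full tree is left out for brevity:

```python
# Hedged sketch: enumerate a binary's direct library dependencies and flag
# executable-stack requests, since on Linux a library that needs an
# executable stack forces one on the main program. Uses pyelftools.
from elftools.elf.elffile import ELFFile
from elftools.elf.dynamic import DynamicSection

def needed_libs(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        for section in elf.iter_sections():
            if isinstance(section, DynamicSection):
                return [tag.needed for tag in section.iter_tags()
                        if tag.entry.d_tag == "DT_NEEDED"]
    return []

def wants_exec_stack(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        for seg in elf.iter_segments():
            if seg["p_type"] == "PT_GNU_STACK":
                return bool(seg["p_flags"] & 0x1)   # PF_X bit
    return False   # a missing PT_GNU_STACK segment is itself suspicious

print(needed_libs("/bin/ls"))       # e.g. ['libselinux.so.1', 'libc.so.6']
print(wants_exec_stack("/bin/ls"))  # False on a sane distro build
```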
And the code structure analyzers look at things like functionality: what's the software doing? What type of app armoring is getting injected into the code? A great example of that is something like stack guards, or FORTIFY_SOURCE. These are armoring features that only really apply, and can only be observed, inside the control flow or inside the actual instructions themselves. This is why control flow graphs are key.
We played around with a number of different ways of analyzing software that we could scale out, and ultimately we came down to working with control flow graphs. Provided here is a basic visualization of what I'm talking about with a control flow graph, provided by Binary Ninja, which has wonderful visualization tools, hence this picture and not our engine, because we don't build very many visualization engines. You basically have a function that's broken up into basic blocks, which are broken up into instructions, and then you have basic flow between them. Having this as an iterable structure that we can work with allows us to walk every single instruction, understand the references, understand where code and data is being referenced, and how it is being referenced.
And then, what type of functionality is being used? So this is a great way to find something like whether or not your stack guards are being applied on every function that needs them, how deep they are being applied, and whether the compiler is possibly introducing errors into your armoring features, which are interesting side studies. Another reason we did this is because we want to push the concept of observables even farther. Take this example: you want to be able to make instruction abstractions. For all major architectures you can break instructions up into major categories, be it arithmetic instructions, data manipulation instructions like loads and stores, and then control flow instructions. Then with these basic fundamental building blocks you can make artifacts. Think of them like a unit of functionality: it has some type of input, some type of output, and it performs some type of operation. And then with these little units of functionality, you can link them together and think of these artifacts as maybe sub-basic-block, or crossing a few basic blocks, but a different way to break up the software. Because a basic block is just a branch break, but we want to look at functionality breaks, because these artifacts can provide the basic fundamental building blocks of the software itself. That matters more when we want to start doing symbolic lifting, so that we can lift the entire software up into a generic representation that we can slice and dice as needed.
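A small sketch of that instruction-bucketing idea using the Capstone disassembler; the mnemonic lists are a tiny illustrative subset, not CITL's taxonomy:

```python
# Hedged sketch: disassemble an x86-64 code buffer with Capstone and bucket
# each instruction into arithmetic, data-movement, or control-flow
# categories, the first step toward the "artifacts" described above.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

CATEGORIES = {
    "arith":   {"add", "sub", "imul", "xor", "inc", "dec"},
    "data":    {"mov", "lea", "push", "pop", "movzx"},
    "control": {"jmp", "je", "jne", "call", "ret"},
}

def categorize(code, addr=0x1000):
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    out = []
    for insn in md.disasm(code, addr):
        cat = next((c for c, ms in CATEGORIES.items()
                    if insn.mnemonic in ms), "other")
        out.append((insn.address, insn.mnemonic, cat))
    return out

# add rax, rbx ; xor rcx, rcx ; call +0 ; ret
code = b"\x48\x01\xd8\x48\x31\xc9\xe8\x00\x00\x00\x00\xc3"
for addr, mnem, cat in categorize(code):
    print(hex(addr), mnem, cat)
```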
Moving on from there, I want to talk about fuzzing a little bit more. Fuzzing is effectively at the heart of our project. It provides us the rich dataset that we can use to derive a model, and it also provides us awesome other metadata on the side. But why? Why do we care about fuzzing? Why is fuzzing the metric that you build an engine on, that you build a model on, that you derive some type of reasoning from? Think of the sets of bugs, vulnerabilities, and exploitable vulnerabilities. In an ideal world you'd want to just have a machine that pulls out exploitable vulnerabilities. Unfortunately, this is exceedingly costly, due to a series of decision problems that sit between these sets. So now consider the superset: bugs, or faults. A fuzzer, or other software, can easily recognize faults, but if you want to move down the sets, you unfortunately need to jump through a lot of decision hoops. For example, if you want to move to a vulnerability, you have to understand: does the attacker have some type of control? Is there a trust boundary being crossed? Is this software configured in the right way for this to be vulnerable right now? These are human factors that are not deducible from the outside. You then amplify this decision problem even worse going to exploitable vulnerabilities. So if we collect the superset of bugs, we know that there is some proportion of the subsets in there. This provides us a dataset that is easily recognizable and that we can collect in a cost-efficient manner.
Finally, fuzzing is key, and we're investing a lot of our time right now in working on a new fuzzing engine, because there are some key things we want to do. We want to be able to understand all of the different paths the software could be taking; as you're fuzzing, you're effectively driving the software down as many unique paths, while referencing as many unique data manipulations, as possible. So if we save off every path and annotate the ones that are faulting, we now have this beautiful, rich data set of exactly where the software went as we were driving it in specific ways. Then we feed that back into our static analysis engine and begin to generate those instruction abstractions, those artifacts. And with that, imagine we have these gigantic traces of instruction abstractions. From there we can then begin to train the model to explore around the fault location, and begin to understand and study the fundamental building blocks of what a bug looks like in an abstract, instruction-agnostic way. This is why we're spending a lot of time on our fuzzing engine right now. But hopefully soon we'll be able to talk about that more, maybe in a tech track and not the policy track.
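A hedged sketch of the path bookkeeping just described: store each execution as its sequence of basic-block addresses, dedupe by hash, and annotate faulting paths. How the traces are captured (instrumentation, emulation) is out of scope here:

```python
# Hedged sketch: a corpus of fuzzed execution paths, deduplicated by a hash
# of the basic-block trace, with faulting paths annotated for later study.
import hashlib

class PathCorpus:
    def __init__(self):
        self.paths = {}   # path hash -> {"blocks", "faulted", "hits"}

    def record(self, block_trace, faulted):
        """block_trace: iterable of basic-block start addresses."""
        blocks = tuple(block_trace)
        key = hashlib.sha256(repr(blocks).encode()).hexdigest()
        entry = self.paths.setdefault(key, {"blocks": blocks,
                                            "faulted": False, "hits": 0})
        entry["hits"] += 1
        entry["faulted"] |= faulted
        return key

    def faulting_paths(self):
        return [e for e in self.paths.values() if e["faulted"]]

corpus = PathCorpus()
corpus.record([0x1000, 0x1010, 0x1030], faulted=False)
corpus.record([0x1000, 0x1020, 0x1040], faulted=True)
print(len(corpus.paths), len(corpus.faulting_paths()))  # 2 1
```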
C: Yeah, "so from then on, when anything went wrong with the computer, we said it had bugs in it." laughs All right, I promised you a technical journey, I promised you a technical journey into the dark abyss, as deep as you want to get with it. So let's wrap it up and bring it up a little bit here. We've talked a great deal today about some theory. We've talked about the development of our tooling and everything else, and so I figured I should end with some things that are not in progress, but in fact are done, yesterday's news, just to share that here with Europe. So in the midst of all of our development we have been discovering and reporting bugs. Again, this is not our primary purpose, really, but you can't help doing it. You know how computers are these days: you find bugs just by turning them on, right?
So we've been disclosing. A little while ago, at DEF CON and Black Hat, our chief scientist Sarah, together with Mudge, went ahead and dropped this bombshell on the Firefox team: for some period of time they had had ASLR disabled on OS X. When we first found it, we assumed it was a bug in our tools. When we first mentioned it in a talk, they came to us and said it's definitely a bug in our tools, or might be, or some level of surprise. And then people started looking into it, and in fact, at one point it had been enabled and then temporarily disabled. No one knew; everyone thought it was on. It takes someone looking to notice that kind of stuff, right? Major shout-out, though: they fixed it immediately, despite our full disclosure on stage and everything. So, very impressed. But in addition to popping surprises on people, we've also been doing the usual process of submitting patches and bugs, particularly to LLVM and QEMU, and if you work in software analysis you can probably guess why.
Incidentally, if you're looking for a target to fuzz, if you want to go home from CCC and find a ton of findings: LLVM comes with a bunch of parsers. You should fuzz them, you should fuzz them. And I say that because I know for a fact you are gonna get a bunch of findings, and it'd be really nice. I would appreciate it if I didn't have to pay people to fix them. So if you wouldn't mind disclosing them, that would help.
But besides these bug reports and all these other things, we've also been working with lots of others. We gave a talk earlier this summer, Sarah gave a talk earlier this summer, about these things, and she presented findings comparing some of these base scores of different Linux distributions. And based on those findings, there was a person on the Fedora red team, Jason Callaway, who sat there, and, well, I can't read his mind, but I'm sure he was thinking to himself: golly, it would be nice not to be surprised at the next one of these talks. They score very well, by the way; they were leading in many, many of our metrics. Well, in any case, he left Vegas and went back home, and he and his colleagues have been working on essentially re-implementing much of our tooling, so that they can check the stuff that we check before they release. Before they release. Looking for security before you release! That would be a good thing for others to do, and I'm hoping that idea really catches on. laughs Yeah, right, that would be nice. That would be nice.
But in addition to that, in addition to that: our mission really is to get results out to the public, and so, in order to achieve that, we have broad partnerships with Consumer Reports and The Digital Standard. Especially if you're into cyber policy, I really encourage you to take a look at the proposed Digital Standard, which encompasses the things we look for and so much more: URLs, data traffic in motion, cryptography, update mechanisms, and all that good stuff.
So, where we are and where we're going. The big takeaways, if you're looking for the "so what": three points for you. One: we are building the tooling necessary to do larger and larger studies regarding these surrogate security scores. My hope is that in the not-too-distant future, I will be able, with my colleagues, to publish some really nice findings about which things you can observe in software have a suspiciously high correlation with the software being good. Nobody really knows right now; it's an empirical question. As far as I know, the study hasn't been done. We've been running it at a small scale, and we're building the tooling to do it at a much larger scale. We are hoping that this winds up being a useful field in security as that technology develops. In the meantime, our static analyzers are already making surprising discoveries: hit YouTube and take a look for Sarah Zatko's recent talks at DEFCON/Black Hat. Lots of fun findings in there, lots of things that anyone who looks would have found. Lots of that.
And then lastly, if you are in the business of shipping software, and you are thinking to yourself, okay, so these guys, someone gave them some money to mess up my day, and you're wondering what you can do to not have your day messed up: one simple piece of advice, one simple piece of advice. Make sure your software employs every exploit mitigation technique Mudge has ever, or will ever, hear of. And he's heard of a lot of them. Turn all those things on. And if you don't know anything about that stuff, if nobody on your team knows anything about that stuff: you're here, you now know about that stuff, so do that. And if you're not here, then you should be here. Danke, danke.
Herald Angel: Thank you, Tim and Parker. Do we have any questions from the audience? It's really hard to see you with that bright light in my face. I think the signal angel has a question.

Signal Angel: So the IRC channel was impressed by your tools and the models that you wrote, and they are wondering what's going to happen to them, because you do have funding from the Ford Foundation now. What are your plans with this? Do you plan on commercializing it, or is it going to be open source, or how do we get our hands on this?
C: It's an excellent question. So, for the time being, the money that we are receiving is to develop the tooling, pay for the AWS instances, pay for the engineers, and all that stuff. As for the direction we would like to take things as an organization: I have no interest in running a monopoly. That sounds like a fantastic amount of work and I really don't want to do it. However, I have a great deal of interest in taking the gains that we are making in the technology and releasing the data, so that other competent researchers can go through and find useful things that we may not have noticed ourselves. So we're not at a point where we are releasing data in bulk just yet, but that is simply a matter of engineering; our tools are still in flux. When we do that, we want to make sure the data is correct, and so our software has to have its own low bug counts and all these other things. But ultimately, that is for the scientific aspect of our mission, though the science is not our primary mission. Our primary mission is to apply it to help consumers. At the same time, it is our belief that an opaque model is as good as crap. No one should trust an opaque model: if somebody tells you that they have some statistics, and they do not provide you with any underlying data, and it is not reproducible, you should ignore them. Consequently, what we are working towards right now is getting to a point where we will be able to share all of those findings: the surrogate scores, the interesting correlations between observables and fuzzing. All that will be public as the material comes online.

Signal Angel: Thank you.
C: Thank you.

Herald Angel: Thank you. And microphone number three, please.
Mic3: Hi, thanks, some really interesting work you presented here. There's something I'm not sure I understand about the approach that you're taking. If you are evaluating the security of, say, a library function or the implementation of a network protocol, for example, there'd be a precise specification you could check against, and the techniques you're using would make sense to me. But it's not so clear, since the goal you've set for yourself is to evaluate the security of consumer software. It's not clear to me whether it's fair to call these results security scores in the absence of a threat model. So my question is: how is it meaningful to make a claim that a piece of software is secure if you don't have a threat model for it?
C: This is an excellent question, and anyone who disagrees is wrong. Security without a threat model is not security at all; it's absolutely a true point. So, the things that we are looking for: most of them are things that you will already find present in your threat model. For example, we were reporting on the presence of things like ASLR and lots of other things that get to the heart of the exploitability of a piece of software. So, for example, if we are reviewing a piece of software that has no attack surface, then it is canonically not in the threat model, and in that sense it makes no sense to report on its overall security. On the other hand, if we're talking about software like, say, a word processor, a browser, anything on your phone, anything that talks on the network, if we're talking about those kinds of applications, then I would argue that exploit mitigations and the other things that we are measuring are almost certainly very relevant. So there's a sense in which what we are measuring is the lowest common denominator among what we imagine are the dominant threat models for the applications. A hand-wavy answer, but I promised heuristics, so there you go.

Mic3: Thanks.

C: Thank you.
Herald Angel: Any questions? No raised hands, okay. Then the Herald can ask a question, because I never get to. So the question is: you mentioned these security labels earlier. What institution could give out the security labels? Because the vendor obviously has no interest in IT security.

C: Yes, it's a very good question. So, our partnership with Consumer Reports. I don't know if you're familiar with them, but in the United States, Consumer Reports is a major, huge consumer watchdog organization. They test the safety of automobiles, they test lots of consumer appliances, all kinds of things, both to see if they function more or less as advertised, but most importantly they're checking for quality, reliability, and safety. So our partnership with Consumer Reports is all about us doing our work and then publishing it. For example, the data on the televisions that we presented was all collected and published in partnership with Consumer Reports.

Herald: Thank you.
C: Thank you.

Herald: Any other questions from the stream? I hear a no. Well, in this case: people, thank Tim and Parker for their nice talk and please give them a very, very warm round of applause.

Applause

C: Thank you. T: Thank you.

Subtitles created by c3subtitles.de in the year 2017. Join, and help us!