Music

Herald: The next talk is about how risky the software you use is. You may have heard about Trump versus a Russian security company. We won't judge it, we won't comment on it, but we dislike the prejudgments in this case. Tim Carstens and Parker Thompson will tell you a little bit more about how risky the software you use is. Tim Carstens is CITL's Acting Director and Parker Thompson is CITL's lead engineer. Please welcome with a very, very warm applause: Tim and Parker!

Thanks.

Applause
Tim Carstens: Howdy, howdy. So my name is Tim Carstens. I'm the acting director of the Cyber Independent Testing Lab. That's four words; we'll talk about all four today, especially "cyber." With me today is our lead engineer Parker Thompson. Not on stage are our other collaborators, Patrick Stach and Sarah Zatko, and, present in the room but not on stage, Mudge. So today we're going to be talking about our work. The introduction that was given was phrased in terms of Kaspersky and all of that; I'm not gonna be speaking about Kaspersky, and I guarantee you I'm not gonna be speaking about my president. Right, yeah? Okay. Thank you.

Applause
All right, so why don't we go ahead and kick off. I'll mention now that parts of this presentation are going to be quite technical. Not most of it, and I will always include analogies and other aids if you are here for security but are not a bit-twiddler. But if you do want to be able to review some of the technical material, if I go through it too fast, if you like to read, if you're a mathematician or a computer scientist: our slides are already available for download at this site here. We thank our partners at power door for getting that set up for us.

Let's get started on the real material here. Alright, so we are CITL: a nonprofit organization based in the United States, founded by our chief scientist Sarah Zatko and our board chair Mudge. Our mission is a public-good mission. We are hackers, but our mission here is actually to look out for people who do not know very much about machines, or as much as the other hackers do.
Specifically, we seek to improve the state of software security by providing the public with accurate reporting on the security of popular software, right? That was a mouthful for you. But no doubt every single one of you has received questions of the form: what do I run on my phone, what do I do with this, what do I do with that, how do I protect myself, all these other things. Lots of people in the general public are looking for agency in computing. No one is offering it to them, and so we're trying to provide a forcing function on the software field in order to enable consumers and users and all these things.
Our social-good work is funded largely by charitable monies from the Ford Foundation, whom we thank a great deal. But we also have major partnerships with Consumer Reports, which is a major organization in the United States that generally, broadly, looks at consumer goods for safety and performance, and with The Digital Standard, which probably would be of great interest to many people here at Congress, as it is a holistic standard for protecting user rights. We'll talk about some of the work that goes into those things in a bit, but first I want to give the big picture of what it is we're really trying to do in one short little sentence: something like this, but for security. What are the important facts, how does it rate, is it easy to consume, is it easy to look at it and say this thing is good, this thing is not good? Something like this, but for software security. Sounds hard, doesn't it? So I want to talk a little bit about what I mean by "something like this."
There are lots of consumer outlook, watchdog, and protection groups, some private, some government, which are looking to do this for various things that are not software security. You can see some examples here that are big in the United States; I happen to not like these as much as some of the newer consumer labels coming out of the EU. But nonetheless they are examples of the kinds of things people have done in other fields, fields that are not security, to try to achieve that same end. And when these things work well, it is for three reasons. One: it has to contain the relevant information. Two: it has to be based in fact; we're not talking opinions, this is not a book club or something like that. And three: it has to be actionable. You have to be able to know how to make a decision based on it. How do you do that for software security? How do you do that for software security? So the rest of the talk is going to go in three parts.
First, we're going to give a bit of an overview of the more consumer-facing side of what we do: look at some data that we have reported on earlier, and all these other kinds of good things. We're then going to get terrifyingly, terrifyingly technical. And then after that we'll talk about the tools to actually implement all this stuff. The technical part comes before the tools, which tells you just how terrifyingly technical we're gonna get. It's gonna be fun, right? So, how do you do this for software security: the consumer version.
If you set forth on the task of trying to measure software security (many people here probably work in the security field, perhaps as consultants doing reviews; certainly I used to), then probably what you're thinking to yourself right now is that there are lots and lots of things that affect the security of a piece of software. Some of them you're only gonna see if you go reversing, and some of them are just kicking around on the ground waiting for you to notice. So we're going to talk about both of those kinds of things that you might measure. But here you see these giant charts: on the left we have Microsoft Excel on OS X, on the right Google Chrome for OS X. This is a couple of years old at this point, maybe one and a half years old. I'm not expecting you to be able to read these; the real point is to say: look at all of the different things you can measure very easily.
How do you distill it, how do you boil it down? This is the opposite of a good consumer safety label. If you've ever done any consulting, this is the kind of report you hand a client to tell them how good their software is, right? It's the opposite of consumer grade. But the reason I'm showing it here is because I'm gonna call out some things. Maybe you can't process all of this, because it's too much material, but once I call them out, just like NP, you're gonna recognize them instantly.
So for example, Excel, at the time of this review: look at this column of dots. What are these dots telling you? They're telling you: look at all these libraries, all of them are 32-bit only. Not 64 bits. Take a look at Chrome: the exact opposite, a 64-bit binary, right? What are some other things? Excel, again, on OS X: maybe you can see these danger warning signs that go straight up the whole thing. That's the absence of major hardening flags in the binary headers; we'll talk about what that means exactly in a bit. But also, if you hop over here, you'll see that, yeah, Chrome has all the different hardening protections that a binary might enable, on OS X that is, but it also has more dots in this column here off to the right. And what do those dots represent?
Those dots represent functions, functions that historically have been a source of trouble: functions that are very hard to call correctly. If you're a C programmer, the "gets" function is a good example, but there are lots of them. And you can see here that Chrome doesn't mind; it uses them all a bunch. And Excel, not so much. And if you know the history of Microsoft, the Trustworthy Computing initiative, the SDL, and all of that, you will know that a very long time ago Microsoft made a decision and said: we're gonna start purging some of these risky functions from our code bases, because we think it's easier to ban them than to teach our devs to use them correctly. And you see that reverberating out in their software. Google, on the other hand, says: yeah, those functions can be dangerous to use, but if you know how to use them they can be very good, and so they're permitted.
The point all of this is building to is that if you start by just measuring every little thing that your static analyzers can detect in a piece of software, two things happen. One: you wind up with way more data than you can show in a slide. And two: the engineering process, the software development life cycle that went into the software, will leave behind artifacts that tell you something about the decisions that went into designing that engineering process. So, you know, Google, for example: quite rigorous as far as hitting, you know, "GCC dash enable-all-the-compiler-protections". Microsoft, maybe less good at that, but much more rigid about things that were very popular ideas when they introduced Trustworthy Computing. Alright. So the big takeaway from this material is that, again, the software engineering process results in artifacts in the software that people can find.
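As an aside for the bit-twiddlers: here is a minimal sketch of the kind of artifact check being described, written against the pyelftools package for Linux ELF binaries. CITL's actual analyzers are not public, so the observables and the risky-function list below are illustrative assumptions, not their real model:

```python
# Minimal sketch: checksec-style artifact detection on a Linux ELF binary.
# Assumes pyelftools (pip install pyelftools); the observables and the
# risky-function list are an illustrative subset, not CITL's real model.
from elftools.elf.elffile import ELFFile

RISKY_FUNCS = {"gets", "strcpy", "sprintf"}  # hypothetical subset

def observables(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        obs = {
            "64_bit": elf.elfclass == 64,
            # PIE binaries have type ET_DYN; a prerequisite for full ASLR.
            "pie": elf.header["e_type"] == "ET_DYN",
            "nx": True,     # assume NX unless an executable stack is requested
            "relro": False,
        }
        for seg in elf.iter_segments():
            if seg["p_type"] == "PT_GNU_STACK" and seg["p_flags"] & 0x1:
                obs["nx"] = False            # executable stack requested
            if seg["p_type"] == "PT_GNU_RELRO":
                obs["relro"] = True
        # Risky imported functions show up in the dynamic symbol table.
        dynsym = elf.get_section_by_name(".dynsym")
        if dynsym:
            obs["risky_imports"] = sorted(
                s.name for s in dynsym.iter_symbols() if s.name in RISKY_FUNCS)
        return obs

print(observables("/bin/ls"))  # e.g. {'64_bit': True, 'pie': True, ...}
```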
Alright. Ok, so that's that's a whole
-
bunch of data, certainly it's not a
consumer-friendly label. So how do you
-
start to get in towards the consumer zone?
Well, the main defect of the big reports
-
that we just saw is that it's too much
information. It's a very dense on data but
-
it's very hard to distill it to the "so
what" of it, right?
-
And so this here is one of our earlier
attempts to go ahead and do that
-
What are these charts, how did we come up with them? Well, on the previous slide we saw all these different factors that you can analyze in software; basically, here's how we arrive at this. For each of those things: pick a weight. Go ahead and compute a score, average against the weights: tada, now you have some number. You can do that for each of the libraries in the piece of software, and if you do that for each of the libraries, you can then go ahead and produce these histograms to show, you know, this percentage of the DLLs had a score in this range. Boom, there's a bar, right? How do you pick those weights? We'll talk about that in a sec; it's very technical. The takeaway, though, is that you wind up with these charts. Now, I've obscured the labels, and the reason I've done that is because I don't really care that much about the actual counts. I want to talk about the shapes of these charts: it's a qualitative thing.
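For concreteness, here is a hedged sketch of the weight-and-average scheme just described; the weights and feature names are hypothetical, not the ones CITL actually uses:

```python
# Minimal sketch of the weighted scoring described above: per-library scores
# averaged against hand-picked weights, then binned into a histogram.
# The weights and feature names are hypothetical, not CITL's real model.
from collections import Counter

WEIGHTS = {"pie": 3.0, "nx": 2.0, "relro": 2.0, "no_risky_imports": 1.0}

def library_score(features):
    """features: dict of observable name -> bool for one library."""
    total = sum(WEIGHTS.values())
    got = sum(w for name, w in WEIGHTS.items() if features.get(name))
    return got / total  # 0.0 (worst) .. 1.0 (best)

def histogram(libraries, bins=10):
    """libraries: list of feature dicts, one per DLL/shared object."""
    counts = Counter(min(int(library_score(f) * bins), bins - 1)
                     for f in libraries)
    return [counts.get(b, 0) for b in range(bins)]

libs = [{"pie": True, "nx": True, "relro": True, "no_risky_imports": True},
        {"pie": False, "nx": True, "relro": False, "no_risky_imports": False}]
print(histogram(libs))  # tall bars to the right = better in this model
```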
So here: good scores appear on the right, bad scores appear on the left. The histogram measures all the libraries and components, and so a very secure piece of software in this model manifests as a tall bar far to the right. And you can see a clear example in our custom Gentoo build. Anyone here who is a Gentoo fan knows: hey, I'm going to install this thing, I think I'm going to go ahead and turn on every single one of those flags. And lo and behold, if you do that, yeah, you wind up with a tall bar far to the right. Here's Ubuntu 16; I bet it's 16.04, but I don't recall exactly, 16 LTS. Here you see a lot of tall bars to the right, not quite as consolidated as a custom Gentoo build, but that makes sense, doesn't it? Because you don't do your whole Ubuntu build yourself.
Now I want to contrast. Over here on the right we see, in the same model, an analysis of the firmware obtained from two smart televisions, last year's models from Samsung and LG, and here are the model numbers. We did this work in concert with Consumer Reports. And what do you notice about these histograms? Are the bars tall and to the right? No, they look almost normal; not quite, but that doesn't really matter. The main thing that matters is that this is the shape you would expect to get if you were basically playing a random game to decide which security features to enable in your software. This is the shape of not having a security program, is my bet. That's my bet. And so what do you see? You see heavy concentration here in the middle, right, that seems fair, and it tails off. On the Samsung nothing scored all that great; same on the LG. Both of them are running their respective operating systems, and they're basically just inheriting whatever security came from whatever open source thing they forked.
This right here is the kind of thing that we exist to produce: charts showing that the current practices in the not-so-consumer-friendly space of running your own Linux distros far exceed the products being delivered, certainly in this case in the smart TV market. But I think you might agree with me, it's much worse than this. So let's dig into that a little bit more; I have a different point that I want to make about that same data set.
So this table here is again looking at the LG, Samsung, and Gentoo Linux installations, and on this table we're just pulling out some of the easy-to-identify security features you might enable in a binary. So: percentage of binaries with address space layout randomization. Let's talk about that. On our Gentoo build it's over 99%. That also holds for the Amazon Linux AMI; it holds in Ubuntu. ASLR is incredibly common in modern Linux. And despite that, fewer than 70 percent of the binaries on the LG television had it enabled. Fewer than 70 percent. And the Samsung was doing better than that, I guess, but 80 percent is pretty disappointing when a default install of a mainstream Linux distro is going to get you 99, right? And it only gets worse, it only gets worse, you know?
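If you want to reproduce that kind of table entry yourself, here is a hedged sketch that walks an unpacked firmware tree and counts PIE executables (the usual on-disk proxy for ASLR support), again using pyelftools; the firmware path is hypothetical:

```python
# Hedged sketch: estimate a "percentage of binaries with ASLR" figure for
# an unpacked firmware tree by counting PIE ELF objects, roughly as in the
# table above. Uses pyelftools; 'firmware_root/' is a hypothetical path.
# Note: ET_DYN also matches shared libraries, which is fine for this tally.
import os
from elftools.elf.elffile import ELFFile
from elftools.common.exceptions import ELFError

def pie_ratio(root):
    total = pie = 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if f.read(4) != b"\x7fELF":
                        continue          # skip non-ELF files
                    f.seek(0)
                    is_pie = ELFFile(f).header["e_type"] == "ET_DYN"
            except (ELFError, OSError):
                continue
            total += 1
            pie += is_pie
    return pie / total if total else 0.0

print(f"{pie_ratio('firmware_root/'):.1%} of binaries are PIE/ASLR-ready")
```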
RELRO support: if you don't know what that is, that's okay, but if you do, look at this abysmal coverage coming out of these IoT devices. Very sad. And you see it over and over and over again. I'm showing this because some people in this room, or watching this video, ship software, and I have a message to those people who ship software, who aren't working on, say, Chrome or any of the other big-name Pwn2Own kinds of targets. Look at this: you can be leading the pack by mastering the fundamentals. You can be leading the pack by mastering the fundamentals. This is a point that we as a security field really need to be driving home.
One of the things that we're seeing here in our data is that if you're the vendor who is shipping the product everyone in the security field has heard of, then maybe your game is pretty decent, right? If you're shipping, say, Windows, or if you're shipping Firefox, or whatever. But if you're doing one of these things where people are just kind of beating you up for default passwords, then your problems go way further than just default passwords. The house is messy; it needs to be cleaned. So for the rest of the talk, like I said, we're going to be discussing a lot of other things that amount to getting a peek behind the curtain at where some of these things come from, and getting very specific about how this business works. But if you're interested in more of the high-level material, especially in interesting results and insights, some of which I'm going to have here later: I really encourage you to take a look at the talk from this past summer by our chief scientist Sarah Zatko, which is predominantly on the topic of surprising results in the data.
Today, though, this being our first time presenting here in Europe, we figured we would take more of an overarching view: what we're doing, why we're excited about it, and where it's headed. So we're about to move into a little bit of the underlying theory: why I think it's reasonable to even try to measure the security of software from a technical perspective. But before we can get into that, I need to talk a little bit about our goals, so that the motivation behind the decisions and the theory is clear. Our goals are really simple; it's a very easy organization to run because of that. Goal number one: remain independent of vendor influence. We are not the first organization to purport to be looking out for the consumer. But unlike many of our predecessors, we are not taking money from the people we review, right? Seems like some basic stuff. Seems like some basic stuff, right? Thank you, okay.
Two: automated, comparable, quantitative analysis. Why automated? Well, we need our test results to be reproducible. If Tim goes in, opens up your software in IDA, and finds a bunch of stuff, that's not a very repeatable kind of standard for things. And so we're interested in things which are automated. We'll talk about that; maybe a few hackers in here know how hard that is. But then, last: we're also acting as a watchdog. We're protecting the interests of the user, the consumer, however you would like to look at it.
But we also have three non-goals, three non-goals that are equally important. One: we have a non-goal of finding and disclosing vulnerabilities. I reserve the right to find and disclose vulnerabilities. But that's not my goal, it's not my goal. Another non-goal is to tell software vendors what to do. If a vendor asks me how to remediate their terrible score, I will tell them what we are measuring, but I'm not there to help them remediate. It's on them to be able to ship a secure product without me holding their hand. We'll see. And then three, a non-goal: perform free security testing for vendors. Our testing happens after you release. Because when you release your software, you are telling people it is ready to be used. Is it really, though? Is it really, though, right?
Applause
Yeah, thank you. So we are not there to give you a preview of what your score will be. There is no sum of money you can hand me that will get you an early preview of what your score is. You can try me, you can try me: there's a fee for trying me. There's a fee for trying me. But I'm not gonna look at your stuff until I'm ready to drop it, right? Yeah, bitte, yeah.
All right. So, moving into this theory territory: three big questions, three big questions that need to be addressed if you want to do our work efficiently. One: what works? What works for improving security, what are the things that you really want to see in software? Two: how do you recognize when it's being done? It's no good if someone hands you a piece of software and says, "I've done all the latest things," and it's a complete black box. If you can't check the claim, the claim is as good as false, in practical terms, period. Software has to be reviewable or, a priori, I'll think you're full of it. And then three: who's doing it? Of all the things that work, that you can recognize, who's actually doing them? You know, our field is famous for ruining people's holidays and weekends over Friday bug disclosures, New Year's Eve bug disclosures. I would like us to also be famous for calling out those teams and those software organizations which are being as good as the bad guys are being bad, yeah? So, provide someone an incentive to maybe be happy to see us for a change, right? Okay, so thank you. Yeah, all right.
So how do we actually pull these things off? The basic idea. So, I'm going to get into some deeper theory; if you're not a theorist, I want you to focus on this slide. And I'm gonna bring it back, it's not all theory from here on out after this, but if you're not a theorist, I really want you to focus on this slide. The basic motivation behind what we're doing, the technical motivation, why we think that it's possible to measure and report on security: it all boils down to this, right?
So we start with a thought experiment, a Gedankenexperiment, right? Given a piece of software we can ask, one: overall, how secure is it? Kind of a vague question, but you could imagine there are versions of that question. And two: what are its vulnerabilities? Maybe you want to nitpick with me about what the word vulnerability means, but broadly, you know, this is a much more specific question. And here's the enticing thing: the first question appears to ask for less information than the second question. And if we were taking bets, I would put my money on yes, it actually does ask for less information. What do I mean by that? Well, let's say that someone told you all of the vulnerabilities in a system, right? They said, "Hey, I got them all." You're like: all right, that's cool. And if someone asks you, hey, how secure is this system, you can give them a very precise answer. You can say it has N vulnerabilities, and they're of this kind, and all this stuff. So certainly the second question is enough to answer the first. But is the reverse true? Namely, if someone were to tell you, for example, "hey, this piece of software has exactly 32 vulnerabilities in it," does that make it easier to find any of them? Right, there's room to maybe do that using some algorithms that are not yet in existence.
Certainly the computer scientists in here are saying, "well, you know, yeah, maybe counting the number of SAT solutions doesn't help you practically find solutions. But it might, and we just don't know." Okay, fine, fine, fine. Maybe these things are the same, but my experience in security, and the experience of many others perhaps, is that they probably aren't the same question. And this motivates what I'm calling here Zatko's question, which is basically asking for an algorithm that demonstrates that the first question is easier than the second question. So Zatko's question: develop a heuristic which can efficiently answer one, but not necessarily two. If you're looking for a metaphor, if you want to know why I care about this distinction, I want you to think about certain controversial technologies: maybe think about, say, nuclear technology. An algorithm that answers one, but not two, is a very safe algorithm to publish. A very safe algorithm to publish indeed.
Okay, Claude Shannon would like more information; happy to oblige. Let's take a look at this question from a different perspective, maybe a more hands-on perspective: the hacker perspective, right? If you're a hacker and you're watching me up here, and I'm waving my hands around and showing you charts, maybe you're thinking to yourself: yeah boy, what have you got? How does this actually go? And maybe what you're thinking to yourself is that finding good vulns is an artisan craft, right? You're in IDA, you're reversing, you're doing all these things; I don't know, all that stuff. And this kind of clever game, cleverness, is not something that feels very automatable. But on the other hand, there are a lot of tools that do automate things, and so it's not completely not automatable.
And if you're into fuzzing, then perhaps you are aware of this very simple observation: if your harness is perfect, if you really know what you're doing, if you have a decent fuzzer, then in principle fuzzing can find every single problem. You have to be able to look for it, you have to be able to harness for it, but in principle it will. So the hacker perspective on Zatko's question is maybe of two minds: on the one hand, assessing security is a game of cleverness; but on the other hand, we're kind of right now at the cusp of having some game-changing tech really go. Maybe you're saying fuzzing is not at the cusp; I promise, it's just at the cusp. We haven't seen all that fuzzing has to offer, and so maybe there's room, maybe there's room for some automation to be possible in pursuit of Zatko's question. Of course, there are many challenges still in using existing hacker technology, mostly in the form of various open questions. For example, if you're into fuzzing: hey, identifying unique crashes, there's an open question. We'll talk about some of those.
But I'm going to offer another perspective here. Maybe you're not in the business of doing software reviews, but you know a little computer science, and maybe that computer science has you wondering: what's this guy talking about? I'm here to acknowledge that. So, whatever you think the word security means: I've got a list of questions up here, and whatever you think the word security means, probably some of these questions are relevant to your definition. Right.
Does the software have a hidden backdoor or any kind of hidden functionality? Does it handle crypto material correctly? Et cetera, and so forth. Anyone in here who knows some computability theory knows that every single one of these questions, and many others like them, is undecidable, due to reasons essentially no different from the reason the halting problem is undecidable. Which is to say, due to reasons essentially first identified and studied by Alan Turing, a long time before we had microarchitectures and all these other things. And so the computability perspective says that, whatever your definition of security is, ultimately you have this recognizability problem: a fancy way of saying that algorithms won't be able to recognize secure software, because of these undecidability issues. The takeaway, the takeaway is that the computability angle on all of this says: anyone who's in the business that we're in has to use heuristics. You have to, you have to.
All right, this guy gets it. All right, so on the tech side, the last technical perspective that we're going to take now is certainly the most abstract, which is the Bayesian perspective. So if you're a frequentist, you need to get with the times; everything is Bayesian now. So, let's talk about this for a bit. Only two slides of math, I promise, only two! So, let's say that I have some corpus of software. Perhaps it's a collection of all modern browsers, perhaps it's the collection of all the packages in the Debian repository, perhaps it's everything on GitHub that builds on this system, perhaps it's a hard drive full of warez that some guy mailed you, right? You have some corpus of software, and for a random program in that corpus we can consider this probability: the probability distribution of which software is secure versus which is not. For reasons described in the computability perspective, this number is not a computable number for any reasonable definition of security. So that's neat.
And so, for practical terms, if you want to do some probabilistic reasoning, you need some surrogate for that, and so we consider this here. So, instead of considering the probability that a piece of software is secure, a non-computable, non-verifiable claim, we take a look here at this indexed collection of probabilities. This is a countably infinite family of probability distributions: basically, P sub h,k is just the probability that, for a random piece of software in the corpus, h work units of fuzzing will find no more than k unique crashes. And why is this relevant? Well, at the bottom we have this analytic observation, which is that in the limit as h goes to infinity, you're basically saying: "Hey, you know, if I fuzz this thing forever, what does that look like?" And, essentially, here we have analytically that this should converge: the P sub h,1 should converge to the probability that a piece of software just simply cannot be made to crash. Not the same thing as being secure, but certainly not a small concern, and relevant to security.
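For readers who want the symbols, here is a reconstruction of that family in LaTeX, following the talk's verbal description; the exact notation on the slide may differ:

```latex
% For a random program s drawn from the corpus S:
P_{h,k} \;=\; \Pr_{s \sim S}\!\left[\, h \text{ work units of fuzzing } s
              \text{ find at most } k \text{ unique crashes} \,\right]

% The analytic observation: fuzzing "forever" approaches the probability
% that s cannot be made to crash at all.
\lim_{h \to \infty} P_{h,1} \;=\;
    \Pr_{s \sim S}\!\left[\, s \text{ cannot be made to crash} \,\right]
```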
So, none of that stuff was actually Bayesian yet; we need to get there. And so here we go: the previous slide described a probability distribution measured based on fuzzing. But fuzzing is expensive, and it is also not an answer to Zatko's question, because it finds vulnerabilities; it doesn't measure security in the general sense. And so here's where we make the jump to conditional probabilities. Let M be some observable property of software: has ASLR, has RELRO, calls these functions, doesn't call those functions... take your pick. For random s in S we now consider these conditional probability distributions. This is the same kind of probability as we had on the previous slide, but conditioned on this observable being true, and this leads to the refined, CITL variant of Zatko's question: which observable properties of software satisfy that, when the software has property M, the probability of fuzzing being hard is very high? That's what this version of the question asks, and here we say for large log(h)/k; in other words, exponentially more fuzzing than you'd expect to find bugs. So this is the technical version of what we're after. All of this can be explored, you can brute-force your way to finding all of this stuff, and that's exactly what we're doing.
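A hedged sketch of what that brute-force exploration can look like on the ground: given per-binary fuzzing outcomes and observables, estimate the conditional probability of fuzzing being hard for each observable, then rank them. The record format is made up for illustration; the real pipeline runs this at cluster scale:

```python
# Hedged sketch of the brute-force search described above: given fuzzing
# results and per-binary observables, estimate P(fuzzing finds <= k unique
# crashes | observable M) for every observable M, and rank the Ms.
from collections import defaultdict

def conditional_hardness(records, k=0):
    """records: list of dicts like
       {"crashes": 3, "observables": {"aslr", "relro"}} (hypothetical)."""
    seen = defaultdict(lambda: [0, 0])   # M -> [hard count, total count]
    for r in records:
        for m in r["observables"]:
            seen[m][1] += 1
            seen[m][0] += r["crashes"] <= k
    # Empirical P(fuzzing finds <= k unique crashes | binary has property M)
    return sorted(((hard / total, m) for m, (hard, total) in seen.items()),
                  reverse=True)

data = [{"crashes": 0, "observables": {"aslr", "relro"}},
        {"crashes": 7, "observables": {"relro"}},
        {"crashes": 0, "observables": {"aslr"}}]
for p, m in conditional_hardness(data):
    print(f"P(hard | {m}) ~ {p:.2f}")   # here: aslr 1.00, relro 0.50
```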
So we're looking for all kinds of things, all kinds of things that correlate with fuzzing having low yield on a piece of software, and there's a lot of ways in which that can happen. It could be that you are looking at a feature of software that literally prevents crashes; maybe it's the never-crash flag, I don't know. But most of the things I've talked about, ASLR, RELRO, etc., don't prevent crashes. In fact, ASLR can take non-crashing programs and make them crash; it's the number one reason vendors don't enable it, right? So why am I talking about ASLR? Why am I talking about RELRO? Why am I talking about all these things that have nothing to do with stopping crashes, while claiming I'm measuring crashes? This is because, in the Bayesian perspective, correlation is not the same thing as causation, right? Correlation is not the same thing as causation. It could be that M's presence literally prevents crashes, but it could also be that, by some underlying coincidence, the things we're looking for are mostly only found in software that's robust against crashing.
If you're looking for security, I submit to you that the difference doesn't matter. Okay, end of my math, danke. I will now go ahead and give this really nice analogy for all those things that I just described. So we're looking for indicators of a piece of software being secure enough to be good for consumers, right? So here's an analogy. Let's say you're a geologist: you study minerals and all of that, and you're looking for diamonds. Who isn't, right? Want those diamonds! And how do you find diamonds? Even in places that are rich in diamonds, diamonds are not common. You don't just go walking around in your boots, kicking until your toe stubs on a diamond. You don't do that. Instead you look for other minerals that are mostly only found near diamonds but are much more abundant in those locations than the diamonds. So, this is mineral science 101, I guess, I don't know. So, for example, you want to go find diamonds: put on your boots and go kicking until you find some chromite, look for some diopside, look for some garnet. None of these things turn into diamonds, none of these things cause diamonds, but if you're finding good concentrations of these things, then, statistically, there are probably diamonds nearby. That's what we're doing. We're not looking for the things that cause good security per se. Rather, we're looking for the indicators that you have put the effort into your software. How's that working out for us? How's that working out for us?
Well, we're still doing studies. It's early to say exactly, but we do have the following interesting coincidence. Here I present a collection of prices that somebody gave for so-called underground exploits. I can tell you these prices are maybe a little low these days, but if you work in that business, if you go to SyScan, if you do that kind of stuff, maybe you know that this is ballpark. It's ballpark. Alright, and, just a coincidence, maybe it means we're on the right track, I don't know, but it's an encouraging sign: when we run these programs through our analysis, our rankings more or less correspond to the actual prices that you encounter in the wild for access via these applications. Up above, I have one of our histogram charts. You can see here that Chrome and Edge in this particular model scored very close to the same, and it's a test model, so let's say they're basically the same.
Firefox is behind there a little bit. I don't have Safari on this chart, because these are all Windows applications, but the Safari score falls in between. So, lots of theory, lots of theory, lots of theory, and then we have this. We're going to go ahead now and hand off to our lead engineer, Parker, who is going to talk about some of the concrete stuff, the non-chalkboard stuff, the software stuff that actually makes this work.
Thompson: Yeah, so I want to talk about the process of actually doing it: building the tooling that's required to collect these observables. Effectively, how do you go mining for indicator minerals? But first, the progression of where we are and where we're going. We initially broke this out into three major tracks of our technology. We have our static analysis engine, which started as a prototype; we have now recently completed a much more mature and solid engine that's allowing us to be much more extensible, to dig deeper into programs, and to provide much deeper observables. Then we have the data collection and data reporting. Tim showed some of our early stabs at this. We're right now in the process of building new engines to make the data more accessible and easy to work with, and hopefully more of that will be available soon. Finally, we have our fuzzer track. We needed to get some early data, so we played with some existing off-the-shelf fuzzers, including AFL, and, while that was fun, unfortunately it's a lot of work to manually instrument a lot of fuzzers for hundreds of binaries.
So we then built an automated solution that started to get us closer to having a fuzzing harness that could autogenerate itself, depending on the software and the software's behavior. But, right now, unfortunately, that technology showed us more deficiencies than it showed successes. So we are now working on a much more mature fuzzer that will allow us to dig deeper into programs as we're running and collect very specific things that we need for our model and our analysis. But on to our analytic pipeline today. This is one of the most concrete components of our engine, and one of the most fun!
We effectively wanted some type of
-
software hopper, where you could just pour
programs in, installers and then, on the
-
other end, come reports: Fully annotated
and actionable information that we can
-
present to people. So, we went about the
process of building a large-scale engine.
-
It starts off with a simple REST API,
where we can push software in, which then
-
gets moved over to our computation cluster
that effectively provides us a fabric to
-
work with. It makes is made up of a lot of
different software suites, starting off
-
with our data processing, which is done by
apache spark and then moves over into data
-
data handling and data analysis in spark,
and then we have a common HDFS layer to
-
provide a place for the data to be stored
and then a resource manager and Yarn. All
-
of that is backed by our compute and data
nodes, which scale out linearly. That then
-
moves into our data science engine, which
is effectively spark with Apache Zeppelin,
-
which provides us a really fun interface
where we can work with the data in an
-
interactive manner but be kicking off
large-scale jobs into the cluster. And
-
finally, this goes into our report
generation engine. What this bought us,
-
was the ability to linearly scale and make
that hopper bigger and bigger as we need,
-
but also provide us a way to process data
that doesn't fit in a single machine's
-
RAM. You can push the instance sizes as
you large as you want, but we have
-
datasets that blow away any single host
RAM set. So this allows us to work with
-
really large collections of observables.
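As a flavor of what a job on that stack might look like, here is a hedged PySpark sketch aggregating per-binary observable records from HDFS; the path and field names are hypothetical, since the real CITL schema is not public:

```python
# Hedged sketch of the kind of aggregation job that runs on the cluster
# described above: PySpark reading per-binary observable records and
# computing crash statistics per observable. Paths/fields are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("observable-stats").getOrCreate()

# One JSON record per analyzed binary, e.g.
# {"binary": "...", "observable": "aslr", "unique_crashes": 3}
df = spark.read.json("hdfs:///citl/observables/*.json")

stats = (df.groupBy("observable")
           .agg(F.count("*").alias("binaries"),
                F.avg("unique_crashes").alias("mean_crashes"),
                # fraction of binaries where fuzzing found no crashes
                F.avg((F.col("unique_crashes") == 0).cast("double"))
                 .alias("p_no_crash")))

stats.orderBy(F.desc("p_no_crash")).show()
```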
I want to dive down now into our actual static analysis. But first we have to explore the problem space, because it's a nasty one. Effectively, CITL's mission is to process as much software as possible. Hopefully all of it, but it's hard to get your hands on all the binaries that are out there. When you start to look at that problem, you understand there are a lot of combinations: there are a lot of CPU architectures, a lot of operating systems, a lot of file formats, a lot of environments the software gets deployed into, and every single one of them has its own app armoring features. A feature can be specifically available for one combination but not on another, and you don't want to penalize a developer for not turning on a feature they never had access to turn on. So, effectively, we need to solve this in a much more generic way.
And so what we did is: our static analysis engine effectively looks like a gigantic collection of abstraction libraries for handling binary programs. You take in some type of input file, be it ELF, PE, or Mach-O, and then the pipeline splits. It goes off into two major analyzer classes. First, our format analyzers, which look at the software much like how a linker or loader would look at it: I want to understand how it's going to be loaded up and what type of armoring feature is going to be applied, and then we can run analyzers over that. In order to achieve that, we need abstraction libraries that can provide us an abstract memory map, a symbol resolver, and generic section properties. So all that feeds in, and then we run over a collection of analyzers to collect data and observables.
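A minimal sketch of the format-dispatch entry point being described, sniffing magic bytes to route a file to the right analyzer class; the analyzers themselves are stubbed, and CITL's abstraction layers are of course far richer:

```python
# Minimal sketch: detect the container format (ELF, PE, Mach-O) by magic
# bytes and route to per-format analyzers. Analyzer bodies are stubs here.
import struct

MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCAFEBABE}  # 32-bit, 64-bit, fat

def detect_format(path):
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"\x7fELF":
        return "elf"
    if head[:2] == b"MZ":                       # DOS stub of a PE file
        return "pe"
    if len(head) == 4:
        be = struct.unpack(">I", head)[0]
        le = struct.unpack("<I", head)[0]
        if be in MACHO_MAGICS or le in MACHO_MAGICS:
            return "macho"
    return "unknown"

def analyze(path):
    fmt = detect_format(path)
    # Here the real pipeline would fan out to format analyzers (the
    # loader's view) and code analyzers (the instruction/CFG view).
    return {"format": fmt}

print(analyze("/bin/ls"))   # {'format': 'elf'} on a Linux host
```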
Next we have our code analyzers; these are the analyzers that run over the code itself. I need to be able to look at every possible executable path. In order to do that, we need to do function discovery, feed that into a control flow recovery engine, and then, as a post-processing step, dig through all of the possible metadata in the software, such as a switch table or something like that, to get even deeper into the software. This provides us a basic list of basic blocks, functions, and instruction ranges, and does so in an efficient manner, so we can process a lot of software as it goes. Then all that gets fed over into the main modular analyzers. Finally, all of this comes together, gets put into a gigantic blob of observables, and is fed up to the pipeline.
We really want to thank the Ford Foundation for supporting our work in this, because the pipeline and the static analysis have been a massive boon for our project, and we're only now beginning to really get our engine running, and we're having a great time with it. So, digging into the observables themselves: what are we looking at? Let's break them apart. First, the format structure components: things like ASLR, DEP, RELRO, basic app armoring that's going to be enabled at the OS layer when the binary gets loaded up or linked. We also collect other metadata about the program, such as: what libraries are linked in? What does its dependency tree look like, completely? How did those libraries score? Because that can affect your main software. An interesting example on Linux: if you link a library that requires an executable stack, guess what, your software now has an executable stack, even if you didn't mark it that way. So we need to understand what ecosystem the software is gonna live in.
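A hedged pyelftools sketch of that dependency-aware concern: list a binary's DT_NEEDED libraries, and check whether a given ELF object requests an executable stack via PT_GNU_STACK. Resolving library paths and walking the full tree is left out for brevity:

```python
# Hedged sketch: enumerate a binary's direct library dependencies and flag
# executable-stack requests, since on Linux a library that needs an
# executable stack forces one on the main program. Uses pyelftools.
from elftools.elf.elffile import ELFFile
from elftools.elf.dynamic import DynamicSection

def needed_libs(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        for section in elf.iter_sections():
            if isinstance(section, DynamicSection):
                return [tag.needed for tag in section.iter_tags()
                        if tag.entry.d_tag == "DT_NEEDED"]
    return []

def wants_exec_stack(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        for seg in elf.iter_segments():
            if seg["p_type"] == "PT_GNU_STACK":
                return bool(seg["p_flags"] & 0x1)   # PF_X bit
    return False   # a missing PT_GNU_STACK segment is itself suspicious

print(needed_libs("/bin/ls"))       # e.g. ['libselinux.so.1', 'libc.so.6']
print(wants_exec_stack("/bin/ls"))  # False on a sane distro build
```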
And the code structure analyzers look at things like functionality: what's the software doing? What type of app armoring is getting injected into the code? A great example of that is something like stack guards, or FORTIFY_SOURCE. These are armoring features that only really apply, and can only be observed, inside the control flow or inside the actual instructions themselves. This is why control flow graphs are key.
We played around with a number of different ways of analyzing software that we could scale out, and ultimately we came down to working with control flow graphs. Provided here is a basic visualization of what I'm talking about with a control flow graph, provided by Binary Ninja, which has wonderful visualization tools, hence this picture and not our engine, because we don't build very many visualization engines. You basically have a function that's broken up into basic blocks, which are broken up into instructions, and then you have basic flow between them. Having this as an iterable structure that we can work with allows us to walk every single instruction, understand the references, understand where code and data is being referenced, and how it is being referenced.
And then, what type of functionality is being used? So this is a great way to find something like whether or not your stack guards are being applied on every function that needs them, how deep they are being applied, and whether the compiler is possibly introducing errors into your armoring features, which are interesting side studies. Another reason we did this is because we want to push the concept of observables even farther. Take this example: you want to be able to make instruction abstractions. For all major architectures you can break instructions up into major categories, be it arithmetic instructions, data manipulation instructions like loads and stores, and then control flow instructions. Then with these basic fundamental building blocks you can make artifacts. Think of them like a unit of functionality: it has some type of input, some type of output, and it performs some type of operation. And then with these little units of functionality, you can link them together and think of these artifacts as maybe sub-basic-block, or crossing a few basic blocks, but a different way to break up the software. Because a basic block is just a branch break, but we want to look at functionality breaks, because these artifacts can provide the basic fundamental building blocks of the software itself. That matters more when we want to start doing symbolic lifting, so that we can lift the entire software up into a generic representation that we can slice and dice as needed.
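A small sketch of that instruction-bucketing idea using the Capstone disassembler; the mnemonic lists are a tiny illustrative subset, not CITL's taxonomy:

```python
# Hedged sketch: disassemble an x86-64 code buffer with Capstone and bucket
# each instruction into arithmetic, data-movement, or control-flow
# categories, the first step toward the "artifacts" described above.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

CATEGORIES = {
    "arith":   {"add", "sub", "imul", "xor", "inc", "dec"},
    "data":    {"mov", "lea", "push", "pop", "movzx"},
    "control": {"jmp", "je", "jne", "call", "ret"},
}

def categorize(code, addr=0x1000):
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    out = []
    for insn in md.disasm(code, addr):
        cat = next((c for c, ms in CATEGORIES.items()
                    if insn.mnemonic in ms), "other")
        out.append((insn.address, insn.mnemonic, cat))
    return out

# add rax, rbx ; xor rcx, rcx ; call +0 ; ret
code = b"\x48\x01\xd8\x48\x31\xc9\xe8\x00\x00\x00\x00\xc3"
for addr, mnem, cat in categorize(code):
    print(hex(addr), mnem, cat)
```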
Moving on from there, I want to talk about fuzzing a little bit more. Fuzzing is effectively at the heart of our project. It provides us the rich dataset that we can use to derive a model, and it also provides us awesome other metadata on the side. But why? Why do we care about fuzzing? Why is fuzzing the metric that you build an engine on, that you build a model on, that you derive some type of reasoning from? Think of the sets of bugs, vulnerabilities, and exploitable vulnerabilities. In an ideal world you'd want to just have a machine that pulls out exploitable vulnerabilities. Unfortunately, this is exceedingly costly, due to a series of decision problems that sit between these sets. So now consider the superset: bugs, or faults. A fuzzer, or other software, can easily recognize faults, but if you want to move down the sets, you unfortunately need to jump through a lot of decision hoops. For example, if you want to move to a vulnerability, you have to understand: does the attacker have some type of control? Is there a trust boundary being crossed? Is this software configured in the right way for this to be vulnerable right now? These are human factors that are not deducible from the outside. You then amplify this decision problem even worse going to exploitable vulnerabilities. So if we collect the superset of bugs, we know that there is some proportion of the subsets in there. This provides us a dataset that is easily recognizable and that we can collect in a cost-efficient manner.
Finally, fuzzing is key, and we're investing a lot of our time right now in working on a new fuzzing engine, because there are some key things we want to do. We want to be able to understand all of the different paths the software could be taking; as you're fuzzing, you're effectively driving the software down as many unique paths, while referencing as many unique data manipulations, as possible. So if we save off every path and annotate the ones that are faulting, we now have this beautiful, rich data set of exactly where the software went as we were driving it in specific ways. Then we feed that back into our static analysis engine and begin to generate those instruction abstractions, those artifacts. And with that, imagine we have these gigantic traces of instruction abstractions. From there we can then begin to train the model to explore around the fault location, and begin to understand and study the fundamental building blocks of what a bug looks like in an abstract, instruction-agnostic way. This is why we're spending a lot of time on our fuzzing engine right now. But hopefully soon we'll be able to talk about that more, maybe in a tech track and not the policy track.
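A hedged sketch of the path bookkeeping just described: store each execution as its sequence of basic-block addresses, dedupe by hash, and annotate faulting paths. How the traces are captured (instrumentation, emulation) is out of scope here:

```python
# Hedged sketch: a corpus of fuzzed execution paths, deduplicated by a hash
# of the basic-block trace, with faulting paths annotated for later study.
import hashlib

class PathCorpus:
    def __init__(self):
        self.paths = {}   # path hash -> {"blocks", "faulted", "hits"}

    def record(self, block_trace, faulted):
        """block_trace: iterable of basic-block start addresses."""
        blocks = tuple(block_trace)
        key = hashlib.sha256(repr(blocks).encode()).hexdigest()
        entry = self.paths.setdefault(key, {"blocks": blocks,
                                            "faulted": False, "hits": 0})
        entry["hits"] += 1
        entry["faulted"] |= faulted
        return key

    def faulting_paths(self):
        return [e for e in self.paths.values() if e["faulted"]]

corpus = PathCorpus()
corpus.record([0x1000, 0x1010, 0x1030], faulted=False)
corpus.record([0x1000, 0x1020, 0x1040], faulted=True)
print(len(corpus.paths), len(corpus.faulting_paths()))  # 2 1
```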
C: Yeah, "so from then on, when anything went wrong with the computer, we said it had bugs in it." laughs All right, I promised you a technical journey, I promised you a technical journey into the dark abyss, as deep as you want to get with it. So let's wrap it up and bring it up a little bit here. We've talked a great deal today about some theory. We've talked about the development of our tooling and everything else, and so I figured I should end with some things that are not in progress, but in fact are done, yesterday's news, just to share that here with Europe. So in the midst of all of our development we have been discovering and reporting bugs. Again, this is not our primary purpose, really, but you can't help doing it. You know how computers are these days: you find bugs just by turning them on, right?
So we've been disclosing. A little while ago, at DEF CON and Black Hat, our chief scientist Sarah, together with Mudge, went ahead and dropped this bombshell on the Firefox team: for some period of time they had had ASLR disabled on OS X. When we first found it, we assumed it was a bug in our tools. When we first mentioned it in a talk, they came to us and said it's definitely a bug in our tools, or might be, or some level of surprise. And then people started looking into it, and in fact, at one point it had been enabled and then temporarily disabled. No one knew; everyone thought it was on. It takes someone looking to notice that kind of stuff, right? Major shout-out, though: they fixed it immediately, despite our full disclosure on stage and everything. So, very impressed. But in addition to popping surprises on people, we've also been doing the usual process of submitting patches and bugs, particularly to LLVM and QEMU, and if you work in software analysis you can probably guess why.
Incidentally, if you're looking for a target to fuzz, if you want to go home from CCC and find a ton of findings: LLVM comes with a bunch of parsers. You should fuzz them, you should fuzz them. And I say that because I know for a fact you are gonna get a bunch of findings, and it'd be really nice. I would appreciate it if I didn't have to pay people to fix them. So if you wouldn't mind disclosing them, that would help.
But besides these bug reports and all these other things, we've also been working with lots of others. We gave a talk earlier this summer, Sarah gave a talk earlier this summer, about these things, and she presented findings comparing some of these base scores of different Linux distributions. And based on those findings, there was a person on the Fedora red team, Jason Callaway, who sat there, and, well, I can't read his mind, but I'm sure he was thinking to himself: golly, it would be nice not to be surprised at the next one of these talks. They score very well, by the way; they were leading in many, many of our metrics. Well, in any case, he left Vegas and went back home, and he and his colleagues have been working on essentially re-implementing much of our tooling, so that they can check the stuff that we check before they release. Before they release. Looking for security before you release! That would be a good thing for others to do, and I'm hoping that idea really catches on. laughs Yeah, right, that would be nice. That would be nice.
But in addition to that, in addition to that: our mission really is to get results out to the public, and so, in order to achieve that, we have broad partnerships with Consumer Reports and The Digital Standard. Especially if you're into cyber policy, I really encourage you to take a look at the proposed Digital Standard, which encompasses the things we look for and so much more: URLs, data traffic in motion, cryptography, update mechanisms, and all that good stuff.
So, where we are and where we're going. The big takeaways, if you're looking for the "so what": three points for you. One: we are building the tooling necessary to do larger and larger studies regarding these surrogate security scores. My hope is that in the not-too-distant future, I will be able, with my colleagues, to publish some really nice findings about which things you can observe in software have a suspiciously high correlation with the software being good. Nobody really knows right now; it's an empirical question. As far as I know, the study hasn't been done. We've been running it at a small scale, and we're building the tooling to do it at a much larger scale. We are hoping that this winds up being a useful field in security as that technology develops. In the meantime, our static analyzers are already making surprising discoveries: hit YouTube and take a look for Sarah Zatko's recent talks at DEFCON/Black Hat. Lots of fun findings in there, lots of things that anyone who looks would have found. Lots of that.
And then lastly, if you are in the business of shipping software, and you are thinking to yourself, okay, so these guys, someone gave them some money to mess up my day, and you're wondering what you can do to not have your day messed up: one simple piece of advice, one simple piece of advice. Make sure your software employs every exploit mitigation technique Mudge has ever, or will ever, hear of. And he's heard of a lot of them. Turn all those things on. And if you don't know anything about that stuff, if nobody on your team knows anything about that stuff: you're here, you now know about that stuff, so do that. And if you're not here, then you should be here. Danke, danke.
Herald Angel: Thank you, Tim and Parker. Do we have any questions from the audience? It's really hard to see you with that bright light in my face. I think the signal angel has a question.

Signal Angel: So the IRC channel was impressed by your tools and the models that you wrote, and they are wondering what's going to happen to them, because you do have funding from the Ford Foundation now. What are your plans with this? Do you plan on commercializing it, or is it going to be open source, or how do we get our hands on this?
C: It's an excellent question. So, for the time being, the money that we are receiving is to develop the tooling, pay for the AWS instances, pay for the engineers, and all that stuff. As for the direction we would like to take things as an organization: I have no interest in running a monopoly. That sounds like a fantastic amount of work and I really don't want to do it. However, I have a great deal of interest in taking the gains that we are making in the technology and releasing the data, so that other competent researchers can go through and find useful things that we may not have noticed ourselves. So we're not at a point where we are releasing data in bulk just yet, but that is simply a matter of engineering; our tools are still in flux. When we do that, we want to make sure the data is correct, and so our software has to have its own low bug counts and all these other things. But ultimately, that is for the scientific aspect of our mission, though the science is not our primary mission. Our primary mission is to apply it to help consumers. At the same time, it is our belief that an opaque model is as good as crap. No one should trust an opaque model: if somebody tells you that they have some statistics, and they do not provide you with any underlying data, and it is not reproducible, you should ignore them. Consequently, what we are working towards right now is getting to a point where we will be able to share all of those findings: the surrogate scores, the interesting correlations between observables and fuzzing. All that will be public as the material comes online.

Signal Angel: Thank you.
C: Thank you.

Herald Angel: Thank you. And microphone number three, please.
Mic3: Hi, thanks, some really interesting work you presented here. There's something I'm not sure I understand about the approach that you're taking. If you are evaluating the security of, say, a library function or the implementation of a network protocol, for example, there'd be a precise specification you could check against, and the techniques you're using would make sense to me. But it's not so clear, since the goal you've set for yourself is to evaluate the security of consumer software. It's not clear to me whether it's fair to call these results security scores in the absence of a threat model. So my question is: how is it meaningful to make a claim that a piece of software is secure if you don't have a threat model for it?
C: This is an excellent question, and anyone who disagrees is wrong. Security without a threat model is not security at all; it's absolutely a true point. So, the things that we are looking for: most of them are things that you will already find present in your threat model. For example, we were reporting on the presence of things like ASLR and lots of other things that get to the heart of the exploitability of a piece of software. So, for example, if we are reviewing a piece of software that has no attack surface, then it is canonically not in the threat model, and in that sense it makes no sense to report on its overall security. On the other hand, if we're talking about software like, say, a word processor, a browser, anything on your phone, anything that talks on the network, if we're talking about those kinds of applications, then I would argue that exploit mitigations and the other things that we are measuring are almost certainly very relevant. So there's a sense in which what we are measuring is the lowest common denominator among what we imagine are the dominant threat models for the applications. A hand-wavy answer, but I promised heuristics, so there you go.

Mic3: Thanks.

C: Thank you.
Herald Angel: Any questions? No raised hands, okay. Then the Herald can ask a question, because I never get to. So the question is: you mentioned these security labels earlier. What institution could give out the security labels? Because the vendor obviously has no interest in IT security.

C: Yes, it's a very good question. So, our partnership with Consumer Reports. I don't know if you're familiar with them, but in the United States, Consumer Reports is a major, huge consumer watchdog organization. They test the safety of automobiles, they test lots of consumer appliances, all kinds of things, both to see if they function more or less as advertised, but most importantly they're checking for quality, reliability, and safety. So our partnership with Consumer Reports is all about us doing our work and then publishing it. For example, the data on the televisions that we presented was all collected and published in partnership with Consumer Reports.

Herald: Thank you.
C: Thank you.

Herald: Any other questions from the stream? I hear a no. Well, in this case: people, thank Tim and Parker for their nice talk and please give them a very, very warm round of applause.

Applause

C: Thank you. T: Thank you.

Subtitles created by c3subtitles.de in the year 2017. Join, and help us!