-
34c3 intro
-
Herald: He's been publishing in academia
-
for almost 30 years. Please join me in
giving him a warm welcome to 34c3.
-
Applause
Alastiar Reid: Thank you very much for
-
your introduction. So I'm going to be
talking about the ARM processors and
-
they're incredibly widely used. You find
them in phones, tablets, IOT devices,
-
hard-disk drives; they're all over. And if
you think about it, what I'm saying is:
-
They're in all the things, which contain
your most private and personal data, so
-
it's really, really important that they do
exactly what they should be doing and
-
nothing else. We need to make sure we
really understand them and that means it's
-
important that we can analyze them for any
malware, look for vulnerabilities and so
-
on. So I'm going to be talking about some
work I started about six years ago,
-
creating a very precise specification of
what an ARM processor does and I'm going
-
to be talking also about back in April I'm
released this their specification in a
-
machine readable form. And I should say
that I'm working with Kimbridge University
-
to turn that in something you can use. So
so this talk I'm going to mostly actually
-
talk about this executable processor
specification, that's going to be the bulk
-
of the talk but at the end that sets me up
very nicely to talk about a formally
-
verified software. So I thought, given the
theme of the Congress, it would be more
-
useful to emphasize things you could
actually do. So the specification that ARM
-
released, what's in it? Well, there's lots
of instruction descriptions of course but
-
there's also lots of really interesting
security features; things to do with
-
memory protection, exceptions, privilege
checks and so on.
-
So there's lots of really interesting
stuff ... of your interest in their how
-
secure your devices and how to make sure
it really is secure. Throughout the talk
-
I'll be giving a bunch of links; you can
go and download the specs right now from
-
the first link but please wait to the end
of the talk and there's also the
-
specification rendered as HTML, as tools
that can take the specification release
-
apart and find useful information in it;
I've written blogs and papers about it as
-
well. And so let's just dive into, look at
the bits of the actual specification. So
-
the first thing is all the really
important security features in the specif-
-
in our processor are controlled by, what I
call, the system control registers and the
-
top dog among all those control registers
is this one here called SCTLR. Just trips
-
off the tongue, doesn't it? So SCTLR is
where - it's full of lots of individual
-
control bits, which affect either
optimizations, the processor's doing, or
-
also security features. And so let's just
zoom in and one of them, to give you an
-
idea of what kind of information the spec
contains.
-
So here's some documentation, telling you
about a WXN bit. What does that do? It
-
makes sure that any code, any, that your
stack cannot contain code. you can't boot
-
instructions on the code, and on the
stack, because they're proce- if you set
-
this bit the processor won't execute them.
In other words: This is the bit that
-
triggered the requirement for things like
return-oriented programming. Okay, so what
-
can you do with this fact? Well, suppose
you're in the habit of reverse engineering
-
some code. You might, your disassemble may
show you something like this. And you're
-
probably all staring at this going: "What
on earth does that do?", because it's
-
extremely cryptic. But using the
information that's in the XML version of
-
the release you could easily figure out
how to build, how to decode some of the
-
more cryptic parts and go "Oh actually,
it's that SCTLR register", right, so you
-
could decode the cryptic name for the
number for the register into that. But you
-
could do a bit more than that. You can see
it's setting one of the bits in the
-
register so - it is of course the bit I
just told you about WXN, so we could dig
-
into that and, so we could kind of use the
information from the XML to render it,
-
perhaps, as like this.
So you can make things a bit more useful
-
and you can also take that documentation
that was there, that told you what the WXN
-
bit does, and maybe, if you're a feeling
really energetic, you could, when you
-
click on it, mouse over or whatever, it
could bring up some of that documentation.
-
And and that makes specf- that makes it
much easier to understand what the cryptic
-
piece of code at the top is doing. Okay,
so that's just a very shallow thing you
-
can get from the specification. If we dig
into the instruction descriptions there's
-
also things like the assembly language of
- the specification of the assembly syntax
-
and. So, something I did a few years ago
and which I just wrote a blog article
-
about over the weekend was was it's
possible to actually take that
-
specification, turn it into a
disassembler. So I first of all
-
transformed it into the code I'm showing
at the bottom. What this is showing is how
-
to
take a binary description of an
-
instruction so that's the the top line of
typewriter font and and then turn that
-
into strings, which is what this the code
at the bottom is describing how to do. So
-
so you can use this as a disassembler.
It's actually possible to run it in
-
reverse and use it as an assembler; you
basically run the code from bottom to top
-
and you can turn strings into binary
instructions. And we'll see more of this
-
running things in reverse in a few slides
time. So the main thing about instructions
-
is of course they do something. So the
specification contains a description of
-
exactly what an instruction does and that
description is as code, which, as a
-
programmer, makes me very happy, right. I
don't understand the English words in the
-
specification but I do understand what the
code does. So one thing you can do with
-
code is execute it, so let's just walk
through - let's suppose ... take an
-
instruction and I match against that
diagram there. I might get some values for
-
for some of the variables from that and
then I can walk through, step by step,
-
evaluating this code, until I eventually
realize that register five is having some
-
value assigned to it.
Okay, so that's a fairly basic thing you
-
can do with the specification;
interpreters are fairly easy thing to
-
implement but once you have it there's a
lot you can build on top of it. And one
-
thing that's surprisingly easy to
implement is to extract a symbolic
-
representation of what the instruction
does. So I'll just show you quickly how
-
you do that, using the interpreter. So
let's take those same concrete values but
-
I'm also, I've added three other variables
at the side there, which I'm going to
-
treat as symbolic variables. And as I walk
through the code I won't just calculate
-
some concrete values, like the value five
or 32, I'll also build up a graph,
-
representing exactly how I came up with
these values at five and so on. So as I'm
-
executing the code I can build a graph
representing exactly what this code is
-
doing. And what I can do now is just throw
away the code and focus on what that graph
-
is telling me.
So I now have a symbolic representation of
-
one slice through that, through that
instruction and I can feed that to a
-
constraint solver, so if any of you have
heard of Z3 or SMT solvers, that's what
-
I'm talking about here. And a constraint
solver is a really useful tool, because
-
you can run code forwards through it, so
given some input values you can say what
-
are the outputs from this function or
from this instruction but you can also run
-
them backwards, given some
output value, some final result you want
-
to see, you can ask what inputs would
cause this to happen. And this is just
-
tremendously useful if you're trying to
figure out what instructions you want to
-
use to generate some particular effect.
All right, so if you're trying to figure
-
out how some particular register could be
accessed, how some particular thing could
-
be turned on or off, being able to ask
what inputs will cause this to happen is
-
incredibly useful. And you can also use
the constraint solver to ask for
-
intermediate values, in the middle of the
calculation. You know if you know some
-
value you want to see there you can ask
what the inputs that would cause that to
-
happen.
So, if any of you are familiar with tools
-
like KLEE, which is a symbolic execution
tool for based on LLVM, then they use
-
something similar to this. So, I've shown
you are fairly simple draft, something I
-
could show you how you build it, kind of
on the fly. This is the actual graph for
-
that same instruction. Here I've added in
a lot more nodes to do with some of the
-
functions that were being called and to do
with the calculating, whether to take a
-
branch or not. So this is about 80 or 90
nodes. And I've been experimenting with
-
extending this in two ways. So this is
just one path through the execution of an
-
instruction, so one way to improve on that
is to build a graph that represents all
-
possible paths through the instruction.
And that's much more useful, because you
-
can you then can point at something and
say "how can I make that happen?" and it
-
will tell you exactly what inputs will
cause the path that will make that happen.
-
I've also been experimenting with taking
the entire specification, right, so that
-
stuff that handles exceptions, that
fetches instructions or execute
-
instructions. It contains all
instructions. And I've been experimenting
-
with building a graph representing the
entire specification all at once. And
-
that's even more useful, because now
pretty much any question you have about
-
the specification, you can ask the
constraint solver and it will come back
-
and give you an answer. And this graph for
the full specification is quite large.
-
It's about half a million nodes and that's
for the smallest specification that our
-
major uses. So it's about half a million
nodes. But the great thing is modern
-
constraint solvers can read in that half
million nodes and still give you answers.
-
Typically in about 1 to 10 seconds for
most of the questions posed to them. So,
-
these are just tremendously useful tools,
if you're wanting to be able to understand
-
exactly what the specification does,
and exactly how some program is going to
-
behave or figure out what program you want
to write to make it behave some particular
-
way. Okay so I've talked a lot about
instructions, but most of us actually run
-
programs, right? So to turn this the
specification into something in execute
-
programs, in other words turn it into a
simulator for the ARM architecture, you
-
need to add a little bit of a loop that
will handle interrupts, fake instructions
-
and then it can execute them and handle
any exceptions. So, I... So I did this. I
-
added this loop to this specification. And
then, I thought I'd better try testing the
-
specification. And, so the good news for
me, because I work for ARM I have access
-
to ARM's internal test suite. Which is
something that ARM has been working on
-
basically since the company started 25, 30
years ago. So it's quite a large test
-
suite. It is extremely thorough, has tens
of thousands of test programs in it, runs
-
billions of instructions. And so I set
about testing the, testing the
-
specification using this test suite. And
if any of you have tested software you'll
-
be familiar with a graph that looks like
this, right? The start things don't work
-
all that well. Gradually you get things
working pretty well but then there's a
-
heavy tail of difficult to find bugs.
Okay, so, and that's basically what I
-
found when I was testing the
specification. But you should all be
-
shocked by what I just said. Because what
I'm seeing is ARM's official specification
-
could not pass ARM's official test suite
when I started, right. I mean that's
-
that's pretty shocking. And I'm telling
you this and emphasizing it, not because
-
I think ARM's specification was especially
bad. I think it was just typically bad. I
-
think if you were to take any
specification for, you know, any real-
-
world system and actually test it against
the test suite, well first of all you'd
-
find it's not an executable specification
most the time and secondly you'd find it
-
wouldn't work. And you'd probably find it
would work as badly as ARM's did. So it's
-
not just that it didn't pass all the
tests. It didn't pass any tests. In fact
-
it took me weeks to get the processor or
the specification to come out of reset.
-
Right? Just to get the starting fix to get
the first instruction took weeks. So and I
-
think so I think you'd find this if you
were to try any other specification.
-
What's my next slide? So, moving on to
useful thing you can do with the
-
specification, something we tried last
summer was using it for fuzz testing. So,
-
fuzz testing consists of taking a program
and throwing random inputs at the
-
program and just seeing what breaks and it
pretty much always breaks and a modern
-
fuzztester like maybe AFL to make it
more effective. It monitors something
-
about how the program is executing and
uses that to guide its choice of what to
-
change next. So, if it's finding...in
particular monitor whether an
-
instruction... whether the program is
taking a branch or not and if it sees it's
-
taking lots of new branches then it goes:
"Ok I'll keep trying more of what I'm
-
doing at the moment because it seems to be
effective, and if it's and not finding any
-
new branches, then it will look for
something else to try changing. So, that's
-
kind of a normal fuzzing. What you're
doing is basically trying to kind of
-
maximize your branch coverage. So, what we
did though, when we did this with the
-
specification, was, we actually monitored
branches not just in the in the binary
-
that we were analyzing but also in the
specification. And what this gave us was
-
basically: if you got.. Suppose, your the
binary you're analyzing is just straight
-
line code. There's no branches in it at
all. Then there's nothing really for your
-
fuzzer to work with right. So it doesn't
see that the code is interesting, it
-
quickly moves on to something else.
But maybe your straight line cord is doing
-
something really interesting, like
accessing a privileged register. Well,
-
there will be a branch around that to
check that you do have the privilege you
-
require.
Or maybe it's accessing a memory in which
-
cases memory protection checks. There's
also checks for: Is this a device
-
register? Or is this a kind of RAM or ROM?
So, there's a number of different branches
-
there and that gives the fuzzer
interesting things to... interesting
-
feedback to play with. So, when we set
this going on one of our microkernel,
-
we analyzed the system call interface for
that microkernel. And one of the things
-
the fuzzer surprised us with was it said:
Suppose I take the stack pointer and point
-
it into the middle of this device space
and then take an exception immediately,
-
what happens? And the answer was: there
was a bug in the microkernel and what it
-
did was: it first thing it would do is
read from the stack, where the stack
-
pointer was pointing, so we do a read from
one of the devices, which doesn't wasn't
-
really what was intended. In fact it
completely breaks a security model. So,
-
fuzztesting using not just coverage of
the monitoring branches in the binary but
-
also the specification can find you a
bunch of really interesting things. And I
-
hope some of you will I pick this idea up
and run with it. So, the reason that I was
-
doing all this work was I wanted to be
able to formally verify ARM processors and
-
so I needed to create a specification
before I could do that. So, sorry, let me
-
just take a drink here... So, this is an
overly simplified picture of a processor,
-
it's more or less what processors looked
like 25, 30 years ago, in fact.
-
But it's good enough for the example. So,
if you want to check a processor, is
-
correct, then what you can do is: add an
extra logic, which will monitor the values
-
at the start of an instruction executing
the values that are finally produced at
-
the end of an instruction executing, and
then if you feed those the specification
-
you can see what the right answer should
have been. You can compare that with what
-
the processor actually did. So, you can do
this using a test-based approach, right:
-
just feed in inputs and check that
everything matches.
-
But you can also do it using a formal
verification tool called a "bounded model
-
checker". And a bounded model checker is
like one of those constraint solvers I
-
mentioned earlier on crack cocaine. So,
what it will do is: it won't just try kind
-
of one step for what can happen but will
actually try multiple steps: all possible
-
combinations of inputs for one
instruction, two instructions, three
-
instructions, longer and longer sequences
of instructions, looking for ways they can
-
fail the property. So, and this turned out
to be an incredibly effective way of
-
finding bugs in our processors.
We've actually, we're going to be using
-
this on all future processor developments.
So, there's a paper that we wrote about
-
this but, I recommend that you go
find the video for the top by Clifford
-
Wolf from a couple of hours ago, which
describes a very similar process. Okay, so
-
I'm encouraging you to see the
specification and find something awesome
-
to do with it. There's a bunch of ideas I
have suggested there, and there's a few
-
extras which I didn't have time to
describe here. But now, let me turn to
-
this title of the talk: How can you trust
formally verified software? So, what I'm
-
talking about here is: verifying a program
against some specification. But, of
-
course, programs don't just run in a
vacuum. They run into some operating
-
system that, they use some libraries and
they're also written in some programming
-
language. And, so, when you verify your
program against your specification, you're
-
also verifying them against the
specifications of Linux, glibc and ISO-C,
-
none of which have good specifications. In
fact, they're just terrible specifications
-
which I bear little resemblance to what
compilers and operating systems actually
-
do in practice. So, if you... the current
state of things is: if you do verify your
-
program against these specifications, you
will find lots of bugs. You will make your
-
software a lot more reliable, but you'll
be doing it against specifications, which
-
are probably not very
good. Just as ARM's specification was not
-
very good before I tested it really
thoroughly. And so: Do I trust formally
-
verified software? No, not really. It's a
lot better for being formally verified,
-
but I'd also want to test it quite
thoroughly and maybe to use a bit of fuzz
-
testing as well. Okay, so this is my final
slide, by the way. So, I'm encouraging you
-
to go out and do something with the
specification. If you're interested in
-
this, then do contact me! Do ask me
questions, if you have trouble. I'm also
-
going to mention: if there's any white hat
hackers out there in the... white hat
-
hackers in the audience, then do please
talk to me or Milisch Meriac who's here in
-
the front row, if you're interested in
working at ARM and I like to thank a whole
-
lot of people at ARM and elsewhere, who've
helped me in this work and also I'd like
-
to thank you for giving me your attention
for the last half hour.
-
Applause
-
Herald: So, we have time for some
-
questions. I would ask anyone that has a
question to line up at one of the
-
microphones that are in the aisles here
one through eight. Questions for Alastair
-
there about formal verification, about
working at ARM, how is the weather in
-
Cambridge. Try to keep it on topic. And
signal angel: do we have already questions
-
from online?
Signal Angel: No questions yet.
-
Herald: Okay, then let's go to microphone
number one.
-
Microphone 1: Hi... I was just
AR: Maybe tiptoes.
-
MIC 1: Yes, I was just curious how are you
actually using the machine specification
-
in order to generate VCs for the SMG
solver. Because you didn't really get a
-
chance to talk about that I guess.
AR: Trying to think how I can describe
-
that briefly... My basic idea is to take
the specification, which basically... the
-
specification is describing a piece of
hardware. And so, I try to do what a
-
hardware engineer would do, if they were
actually building a machine that
-
implemented it. So I end up with something
that's essentially a giant circuit. So,
-
that was the graph I was describing. So,
whenever this control flow, whenever the
-
control flow joins back up, I have to
introduce things to slide between the
-
value of what was calculated in the left
or the right. And so on.
-
MIC 1: Yeah, I guess I'm mostly curious
about, like, what logics you're using for,
-
like, the solvers and stuff like that.
AR: I see. Just very basic solvers because
-
they run fastest.
MIC 1: Then ugh
-
H: Thank you. So, microphone 6 please.
MIC 6: I was just wondering, if you could
-
talk some little bit about the language
you used to write your specification.
-
AR: So this is a language which basically
I inherited from ARM's documentation. So,
-
all processors are described using
pseudocode and what I realized was that
-
the pseudocode the ARM had started writing
was actually very close to being a
-
language. And so, I sort of reverse
engineered the language hiding inside the
-
pseudocode, built some tools for it, and
just kind of figured out what the language
-
could possibly mean, given what I thought
processors actually did.
-
And the language itself is... it's just a
very simple imperative language. It's
-
actually got inherits from BBC basic for
anyone who's about the same age as me and
-
remembers BBC basic or it's a bit like
Pascal... It's it's not a complicated
-
language. It's actually designed so that
as many people as possible can read it and
-
know exactly what it says without any
doubt, without having to consult a
-
language lawyer.
H: Signal angel? Yet anything?
-
Signal Angel: Yes, now we've got a
question: "Has ARM its own form of the
-
Intel Management engine?"
-
AR: No is the short answer. Yeah, I don't
-
think we have anything quite like that. If
you... yeah, I'll just say no.
-
Laughter
H: Microphone one.
-
MIC 1: Hi! On that question that we had
before about the specification language:
-
Have you considered using Z3's own
language for expressing the instructions?
-
AR: So, Z3's own language is basically
write kind of s-expressions, which, if you
-
like lisp is wonderful but for anybody who
doesn't like Lisp it's a bit of a turn-off
-
and a bit of a barrier to understanding
it. So, again the specification is
-
designed so that people can look at it and
go: "Oh, I see what's going on here", and
-
not have... and I just try to minimize the
barriers.
-
MIC 1: Fair enough!
H: Last call, signal angel!
-
SA: How long does the complete test of the
arms specification take?
-
AR: About two years. So, yeah, so a modern
processor team about 80% of that team will
-
be verification engineers. And, so,
they're basically writing new tests,
-
running old tests, diagnosing them, just
doing that continuously for the entire
-
life of a project, which is probably about
three years. And after about the first
-
year, you basically have a processor that it
mostly works, and after that it's kind of
-
debugging it and it's, you know, kind of
fine-tuning the performance. So, yeah, it
-
takes a really long time. To run the
actual tests, I don't actually know
-
because actually one of my colleagues in
the audience, they've actually run the
-
tests. But, yeah, I don't know... and we
use a lot of processors in parallel, so I
-
really don't have an idea.
H: Thank you so much, Alastair! Let's give
-
him another warm round of applause!
Applause
-
34c3 outro
-
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!