WEBVTT
00:00:00.000 --> 00:00:14.710
34c3 intro
00:00:14.710 --> 00:00:16.290
Herald: He's been publishing in academia
00:00:16.290 --> 00:00:22.520
for almost 30 years. Please join me in
giving him a warm welcome to 34c3.
00:00:22.520 --> 00:00:30.367
Applause
Alastiar Reid: Thank you very much for
00:00:30.367 --> 00:00:35.797
your introduction. So I'm going to be
talking about the ARM processors and
00:00:35.797 --> 00:00:41.300
they're incredibly widely used. You find
them in phones, tablets, IOT devices,
00:00:41.300 --> 00:00:46.449
hard-disk drives; they're all over. And if
you think about it, what I'm saying is:
00:00:46.449 --> 00:00:50.500
They're in all the things, which contain
your most private and personal data, so
00:00:50.500 --> 00:00:54.819
it's really, really important that they do
exactly what they should be doing and
00:00:54.819 --> 00:00:59.469
nothing else. We need to make sure we
really understand them and that means it's
00:00:59.469 --> 00:01:05.309
important that we can analyze them for any
malware, look for vulnerabilities and so
00:01:05.309 --> 00:01:11.990
on. So I'm going to be talking about some
work I started about six years ago,
00:01:11.990 --> 00:01:19.560
creating a very precise specification of
what an ARM processor does and I'm going
00:01:19.560 --> 00:01:26.670
to be talking also about back in April I'm
released this their specification in a
00:01:26.670 --> 00:01:34.189
machine readable form. And I should say
that I'm working with Kimbridge University
00:01:34.189 --> 00:01:42.539
to turn that in something you can use. So
so this talk I'm going to mostly actually
00:01:42.539 --> 00:01:47.560
talk about this executable processor
specification, that's going to be the bulk
00:01:47.560 --> 00:01:53.249
of the talk but at the end that sets me up
very nicely to talk about a formally
00:01:53.249 --> 00:01:57.979
verified software. So I thought, given the
theme of the Congress, it would be more
00:01:57.979 --> 00:02:05.539
useful to emphasize things you could
actually do. So the specification that ARM
00:02:05.539 --> 00:02:10.660
released, what's in it? Well, there's lots
of instruction descriptions of course but
00:02:10.660 --> 00:02:14.690
there's also lots of really interesting
security features; things to do with
00:02:14.690 --> 00:02:20.380
memory protection, exceptions, privilege
checks and so on.
00:02:20.380 --> 00:02:24.440
So there's lots of really interesting
stuff ... of your interest in their how
00:02:24.440 --> 00:02:30.690
secure your devices and how to make sure
it really is secure. Throughout the talk
00:02:30.690 --> 00:02:36.080
I'll be giving a bunch of links; you can
go and download the specs right now from
00:02:36.080 --> 00:02:42.580
the first link but please wait to the end
of the talk and there's also the
00:02:42.580 --> 00:02:47.790
specification rendered as HTML, as tools
that can take the specification release
00:02:47.790 --> 00:02:52.560
apart and find useful information in it;
I've written blogs and papers about it as
00:02:52.560 --> 00:03:00.910
well. And so let's just dive into, look at
the bits of the actual specification. So
00:03:00.910 --> 00:03:07.150
the first thing is all the really
important security features in the specif-
00:03:07.150 --> 00:03:13.070
in our processor are controlled by, what I
call, the system control registers and the
00:03:13.070 --> 00:03:19.570
top dog among all those control registers
is this one here called SCTLR. Just trips
00:03:19.570 --> 00:03:26.370
off the tongue, doesn't it? So SCTLR is
where - it's full of lots of individual
00:03:26.370 --> 00:03:32.120
control bits, which affect either
optimizations, the processor's doing, or
00:03:32.120 --> 00:03:36.950
also security features. And so let's just
zoom in and one of them, to give you an
00:03:36.950 --> 00:03:40.210
idea of what kind of information the spec
contains.
00:03:40.210 --> 00:03:46.770
So here's some documentation, telling you
about a WXN bit. What does that do? It
00:03:46.770 --> 00:03:53.190
makes sure that any code, any, that your
stack cannot contain code. you can't boot
00:03:53.190 --> 00:03:58.250
instructions on the code, and on the
stack, because they're proce- if you set
00:03:58.250 --> 00:04:04.180
this bit the processor won't execute them.
In other words: This is the bit that
00:04:04.180 --> 00:04:12.220
triggered the requirement for things like
return-oriented programming. Okay, so what
00:04:12.220 --> 00:04:18.380
can you do with this fact? Well, suppose
you're in the habit of reverse engineering
00:04:18.380 --> 00:04:24.310
some code. You might, your disassemble may
show you something like this. And you're
00:04:24.310 --> 00:04:28.120
probably all staring at this going: "What
on earth does that do?", because it's
00:04:28.120 --> 00:04:33.131
extremely cryptic. But using the
information that's in the XML version of
00:04:33.131 --> 00:04:40.710
the release you could easily figure out
how to build, how to decode some of the
00:04:40.710 --> 00:04:45.710
more cryptic parts and go "Oh actually,
it's that SCTLR register", right, so you
00:04:45.710 --> 00:04:50.490
could decode the cryptic name for the
number for the register into that. But you
00:04:50.490 --> 00:04:54.720
could do a bit more than that. You can see
it's setting one of the bits in the
00:04:54.720 --> 00:05:00.990
register so - it is of course the bit I
just told you about WXN, so we could dig
00:05:00.990 --> 00:05:08.740
into that and, so we could kind of use the
information from the XML to render it,
00:05:08.740 --> 00:05:13.560
perhaps, as like this.
So you can make things a bit more useful
00:05:13.560 --> 00:05:17.550
and you can also take that documentation
that was there, that told you what the WXN
00:05:17.550 --> 00:05:23.470
bit does, and maybe, if you're a feeling
really energetic, you could, when you
00:05:23.470 --> 00:05:27.320
click on it, mouse over or whatever, it
could bring up some of that documentation.
00:05:27.320 --> 00:05:31.870
And and that makes specf- that makes it
much easier to understand what the cryptic
00:05:31.870 --> 00:05:36.240
piece of code at the top is doing. Okay,
so that's just a very shallow thing you
00:05:36.240 --> 00:05:43.720
can get from the specification. If we dig
into the instruction descriptions there's
00:05:43.720 --> 00:05:52.570
also things like the assembly language of
- the specification of the assembly syntax
00:05:52.570 --> 00:05:59.030
and. So, something I did a few years ago
and which I just wrote a blog article
00:05:59.030 --> 00:06:05.190
about over the weekend was was it's
possible to actually take that
00:06:05.190 --> 00:06:09.900
specification, turn it into a
disassembler. So I first of all
00:06:09.900 --> 00:06:16.170
transformed it into the code I'm showing
at the bottom. What this is showing is how
00:06:16.170 --> 00:06:18.250
to
take a binary description of an
00:06:18.250 --> 00:06:25.710
instruction so that's the the top line of
typewriter font and and then turn that
00:06:25.710 --> 00:06:31.470
into strings, which is what this the code
at the bottom is describing how to do. So
00:06:31.470 --> 00:06:35.242
so you can use this as a disassembler.
It's actually possible to run it in
00:06:35.242 --> 00:06:41.940
reverse and use it as an assembler; you
basically run the code from bottom to top
00:06:41.940 --> 00:06:47.080
and you can turn strings into binary
instructions. And we'll see more of this
00:06:47.080 --> 00:06:57.430
running things in reverse in a few slides
time. So the main thing about instructions
00:06:57.430 --> 00:07:03.330
is of course they do something. So the
specification contains a description of
00:07:03.330 --> 00:07:08.650
exactly what an instruction does and that
description is as code, which, as a
00:07:08.650 --> 00:07:13.150
programmer, makes me very happy, right. I
don't understand the English words in the
00:07:13.150 --> 00:07:17.900
specification but I do understand what the
code does. So one thing you can do with
00:07:17.900 --> 00:07:22.221
code is execute it, so let's just walk
through - let's suppose ... take an
00:07:22.221 --> 00:07:29.300
instruction and I match against that
diagram there. I might get some values for
00:07:29.300 --> 00:07:35.680
for some of the variables from that and
then I can walk through, step by step,
00:07:35.680 --> 00:07:40.380
evaluating this code, until I eventually
realize that register five is having some
00:07:40.380 --> 00:07:43.889
value assigned to it.
Okay, so that's a fairly basic thing you
00:07:43.889 --> 00:07:47.380
can do with the specification;
interpreters are fairly easy thing to
00:07:47.380 --> 00:07:53.210
implement but once you have it there's a
lot you can build on top of it. And one
00:07:53.210 --> 00:07:57.820
thing that's surprisingly easy to
implement is to extract a symbolic
00:07:57.820 --> 00:08:01.680
representation of what the instruction
does. So I'll just show you quickly how
00:08:01.680 --> 00:08:07.400
you do that, using the interpreter. So
let's take those same concrete values but
00:08:07.400 --> 00:08:11.600
I'm also, I've added three other variables
at the side there, which I'm going to
00:08:11.600 --> 00:08:16.620
treat as symbolic variables. And as I walk
through the code I won't just calculate
00:08:16.620 --> 00:08:24.230
some concrete values, like the value five
or 32, I'll also build up a graph,
00:08:24.230 --> 00:08:29.710
representing exactly how I came up with
these values at five and so on. So as I'm
00:08:29.710 --> 00:08:34.830
executing the code I can build a graph
representing exactly what this code is
00:08:34.830 --> 00:08:41.188
doing. And what I can do now is just throw
away the code and focus on what that graph
00:08:41.188 --> 00:08:46.259
is telling me.
So I now have a symbolic representation of
00:08:46.259 --> 00:08:51.540
one slice through that, through that
instruction and I can feed that to a
00:08:51.540 --> 00:08:56.369
constraint solver, so if any of you have
heard of Z3 or SMT solvers, that's what
00:08:56.369 --> 00:09:05.129
I'm talking about here. And a constraint
solver is a really useful tool, because
00:09:05.129 --> 00:09:09.940
you can run code forwards through it, so
given some input values you can say what
00:09:09.940 --> 00:09:16.930
are the outputs from this function or
from this instruction but you can also run
00:09:16.930 --> 00:09:22.279
them backwards, given some
output value, some final result you want
00:09:22.279 --> 00:09:26.790
to see, you can ask what inputs would
cause this to happen. And this is just
00:09:26.790 --> 00:09:31.070
tremendously useful if you're trying to
figure out what instructions you want to
00:09:31.070 --> 00:09:36.389
use to generate some particular effect.
All right, so if you're trying to figure
00:09:36.389 --> 00:09:41.800
out how some particular register could be
accessed, how some particular thing could
00:09:41.800 --> 00:09:47.449
be turned on or off, being able to ask
what inputs will cause this to happen is
00:09:47.449 --> 00:09:53.569
incredibly useful. And you can also use
the constraint solver to ask for
00:09:53.569 --> 00:09:58.069
intermediate values, in the middle of the
calculation. You know if you know some
00:09:58.069 --> 00:10:02.859
value you want to see there you can ask
what the inputs that would cause that to
00:10:02.859 --> 00:10:07.260
happen.
So, if any of you are familiar with tools
00:10:07.260 --> 00:10:13.949
like KLEE, which is a symbolic execution
tool for based on LLVM, then they use
00:10:13.949 --> 00:10:19.999
something similar to this. So, I've shown
you are fairly simple draft, something I
00:10:19.999 --> 00:10:26.300
could show you how you build it, kind of
on the fly. This is the actual graph for
00:10:26.300 --> 00:10:31.259
that same instruction. Here I've added in
a lot more nodes to do with some of the
00:10:31.259 --> 00:10:35.980
functions that were being called and to do
with the calculating, whether to take a
00:10:35.980 --> 00:10:46.279
branch or not. So this is about 80 or 90
nodes. And I've been experimenting with
00:10:46.279 --> 00:10:52.360
extending this in two ways. So this is
just one path through the execution of an
00:10:52.360 --> 00:10:56.779
instruction, so one way to improve on that
is to build a graph that represents all
00:10:56.779 --> 00:11:00.499
possible paths through the instruction.
And that's much more useful, because you
00:11:00.499 --> 00:11:04.990
can you then can point at something and
say "how can I make that happen?" and it
00:11:04.990 --> 00:11:11.920
will tell you exactly what inputs will
cause the path that will make that happen.
00:11:11.920 --> 00:11:16.351
I've also been experimenting with taking
the entire specification, right, so that
00:11:16.351 --> 00:11:20.509
stuff that handles exceptions, that
fetches instructions or execute
00:11:20.509 --> 00:11:24.699
instructions. It contains all
instructions. And I've been experimenting
00:11:24.699 --> 00:11:29.790
with building a graph representing the
entire specification all at once. And
00:11:29.790 --> 00:11:33.470
that's even more useful, because now
pretty much any question you have about
00:11:33.470 --> 00:11:36.720
the specification, you can ask the
constraint solver and it will come back
00:11:36.720 --> 00:11:43.490
and give you an answer. And this graph for
the full specification is quite large.
00:11:43.490 --> 00:11:48.540
It's about half a million nodes and that's
for the smallest specification that our
00:11:48.540 --> 00:11:53.050
major uses. So it's about half a million
nodes. But the great thing is modern
00:11:53.050 --> 00:11:58.009
constraint solvers can read in that half
million nodes and still give you answers.
00:11:58.009 --> 00:12:04.040
Typically in about 1 to 10 seconds for
most of the questions posed to them. So,
00:12:04.040 --> 00:12:08.259
these are just tremendously useful tools,
if you're wanting to be able to understand
00:12:08.259 --> 00:12:15.670
exactly what the specification does,
and exactly how some program is going to
00:12:15.670 --> 00:12:20.689
behave or figure out what program you want
to write to make it behave some particular
00:12:20.689 --> 00:12:28.850
way. Okay so I've talked a lot about
instructions, but most of us actually run
00:12:28.850 --> 00:12:33.990
programs, right? So to turn this the
specification into something in execute
00:12:33.990 --> 00:12:38.260
programs, in other words turn it into a
simulator for the ARM architecture, you
00:12:38.260 --> 00:12:43.839
need to add a little bit of a loop that
will handle interrupts, fake instructions
00:12:43.839 --> 00:12:49.980
and then it can execute them and handle
any exceptions. So, I... So I did this. I
00:12:49.980 --> 00:12:55.860
added this loop to this specification. And
then, I thought I'd better try testing the
00:12:55.860 --> 00:13:03.610
specification. And, so the good news for
me, because I work for ARM I have access
00:13:03.610 --> 00:13:08.259
to ARM's internal test suite. Which is
something that ARM has been working on
00:13:08.259 --> 00:13:12.720
basically since the company started 25, 30
years ago. So it's quite a large test
00:13:12.720 --> 00:13:17.579
suite. It is extremely thorough, has tens
of thousands of test programs in it, runs
00:13:17.579 --> 00:13:25.140
billions of instructions. And so I set
about testing the, testing the
00:13:25.140 --> 00:13:30.119
specification using this test suite. And
if any of you have tested software you'll
00:13:30.119 --> 00:13:33.620
be familiar with a graph that looks like
this, right? The start things don't work
00:13:33.620 --> 00:13:39.529
all that well. Gradually you get things
working pretty well but then there's a
00:13:39.529 --> 00:13:45.899
heavy tail of difficult to find bugs.
Okay, so, and that's basically what I
00:13:45.899 --> 00:13:49.869
found when I was testing the
specification. But you should all be
00:13:49.869 --> 00:13:56.379
shocked by what I just said. Because what
I'm seeing is ARM's official specification
00:13:56.379 --> 00:14:03.290
could not pass ARM's official test suite
when I started, right. I mean that's
00:14:03.290 --> 00:14:10.269
that's pretty shocking. And I'm telling
you this and emphasizing it, not because
00:14:10.269 --> 00:14:16.850
I think ARM's specification was especially
bad. I think it was just typically bad. I
00:14:16.850 --> 00:14:22.589
think if you were to take any
specification for, you know, any real-
00:14:22.589 --> 00:14:27.819
world system and actually test it against
the test suite, well first of all you'd
00:14:27.819 --> 00:14:32.389
find it's not an executable specification
most the time and secondly you'd find it
00:14:32.389 --> 00:14:38.499
wouldn't work. And you'd probably find it
would work as badly as ARM's did. So it's
00:14:38.499 --> 00:14:42.850
not just that it didn't pass all the
tests. It didn't pass any tests. In fact
00:14:42.850 --> 00:14:47.809
it took me weeks to get the processor or
the specification to come out of reset.
00:14:47.809 --> 00:14:54.009
Right? Just to get the starting fix to get
the first instruction took weeks. So and I
00:14:54.009 --> 00:15:00.379
think so I think you'd find this if you
were to try any other specification.
00:15:00.379 --> 00:15:11.129
What's my next slide? So, moving on to
useful thing you can do with the
00:15:11.129 --> 00:15:16.959
specification, something we tried last
summer was using it for fuzz testing. So,
00:15:16.959 --> 00:15:21.490
fuzz testing consists of taking a program
and throwing random inputs at the
00:15:21.490 --> 00:15:27.959
program and just seeing what breaks and it
pretty much always breaks and a modern
00:15:27.959 --> 00:15:34.329
fuzztester like maybe AFL to make it
more effective. It monitors something
00:15:34.329 --> 00:15:39.351
about how the program is executing and
uses that to guide its choice of what to
00:15:39.351 --> 00:15:45.181
change next. So, if it's finding...in
particular monitor whether an
00:15:45.181 --> 00:15:50.179
instruction... whether the program is
taking a branch or not and if it sees it's
00:15:50.179 --> 00:15:54.240
taking lots of new branches then it goes:
"Ok I'll keep trying more of what I'm
00:15:54.240 --> 00:15:58.379
doing at the moment because it seems to be
effective, and if it's and not finding any
00:15:58.379 --> 00:16:03.910
new branches, then it will look for
something else to try changing. So, that's
00:16:03.910 --> 00:16:06.970
kind of a normal fuzzing. What you're
doing is basically trying to kind of
00:16:06.970 --> 00:16:15.230
maximize your branch coverage. So, what we
did though, when we did this with the
00:16:15.230 --> 00:16:20.389
specification, was, we actually monitored
branches not just in the in the binary
00:16:20.389 --> 00:16:28.399
that we were analyzing but also in the
specification. And what this gave us was
00:16:28.399 --> 00:16:33.040
basically: if you got.. Suppose, your the
binary you're analyzing is just straight
00:16:33.040 --> 00:16:37.459
line code. There's no branches in it at
all. Then there's nothing really for your
00:16:37.459 --> 00:16:41.519
fuzzer to work with right. So it doesn't
see that the code is interesting, it
00:16:41.519 --> 00:16:47.199
quickly moves on to something else.
But maybe your straight line cord is doing
00:16:47.199 --> 00:16:50.940
something really interesting, like
accessing a privileged register. Well,
00:16:50.940 --> 00:16:54.899
there will be a branch around that to
check that you do have the privilege you
00:16:54.899 --> 00:16:58.250
require.
Or maybe it's accessing a memory in which
00:16:58.250 --> 00:17:03.649
cases memory protection checks. There's
also checks for: Is this a device
00:17:03.649 --> 00:17:09.750
register? Or is this a kind of RAM or ROM?
So, there's a number of different branches
00:17:09.750 --> 00:17:13.429
there and that gives the fuzzer
interesting things to... interesting
00:17:13.429 --> 00:17:20.320
feedback to play with. So, when we set
this going on one of our microkernel,
00:17:22.770 --> 00:17:30.130
we analyzed the system call interface for
that microkernel. And one of the things
00:17:30.130 --> 00:17:35.330
the fuzzer surprised us with was it said:
Suppose I take the stack pointer and point
00:17:35.330 --> 00:17:39.940
it into the middle of this device space
and then take an exception immediately,
00:17:39.940 --> 00:17:45.530
what happens? And the answer was: there
was a bug in the microkernel and what it
00:17:45.530 --> 00:17:48.679
did was: it first thing it would do is
read from the stack, where the stack
00:17:48.679 --> 00:17:55.320
pointer was pointing, so we do a read from
one of the devices, which doesn't wasn't
00:17:55.320 --> 00:18:00.060
really what was intended. In fact it
completely breaks a security model. So,
00:18:00.060 --> 00:18:06.610
fuzztesting using not just coverage of
the monitoring branches in the binary but
00:18:06.610 --> 00:18:10.940
also the specification can find you a
bunch of really interesting things. And I
00:18:10.940 --> 00:18:20.110
hope some of you will I pick this idea up
and run with it. So, the reason that I was
00:18:20.110 --> 00:18:29.290
doing all this work was I wanted to be
able to formally verify ARM processors and
00:18:29.290 --> 00:18:34.769
so I needed to create a specification
before I could do that. So, sorry, let me
00:18:34.769 --> 00:18:42.690
just take a drink here... So, this is an
overly simplified picture of a processor,
00:18:42.690 --> 00:18:48.159
it's more or less what processors looked
like 25, 30 years ago, in fact.
00:18:48.159 --> 00:18:53.170
But it's good enough for the example. So,
if you want to check a processor, is
00:18:53.170 --> 00:18:59.770
correct, then what you can do is: add an
extra logic, which will monitor the values
00:18:59.770 --> 00:19:03.140
at the start of an instruction executing
the values that are finally produced at
00:19:03.140 --> 00:19:07.700
the end of an instruction executing, and
then if you feed those the specification
00:19:07.700 --> 00:19:11.830
you can see what the right answer should
have been. You can compare that with what
00:19:11.830 --> 00:19:17.620
the processor actually did. So, you can do
this using a test-based approach, right:
00:19:17.620 --> 00:19:21.200
just feed in inputs and check that
everything matches.
00:19:21.200 --> 00:19:25.570
But you can also do it using a formal
verification tool called a "bounded model
00:19:25.570 --> 00:19:31.250
checker". And a bounded model checker is
like one of those constraint solvers I
00:19:31.250 --> 00:19:36.760
mentioned earlier on crack cocaine. So,
what it will do is: it won't just try kind
00:19:36.760 --> 00:19:41.799
of one step for what can happen but will
actually try multiple steps: all possible
00:19:41.799 --> 00:19:46.130
combinations of inputs for one
instruction, two instructions, three
00:19:46.130 --> 00:19:49.809
instructions, longer and longer sequences
of instructions, looking for ways they can
00:19:49.809 --> 00:19:56.279
fail the property. So, and this turned out
to be an incredibly effective way of
00:19:56.279 --> 00:20:00.530
finding bugs in our processors.
We've actually, we're going to be using
00:20:00.530 --> 00:20:07.720
this on all future processor developments.
So, there's a paper that we wrote about
00:20:07.720 --> 00:20:12.299
this but, I recommend that you go
find the video for the top by Clifford
00:20:12.299 --> 00:20:20.620
Wolf from a couple of hours ago, which
describes a very similar process. Okay, so
00:20:20.620 --> 00:20:25.779
I'm encouraging you to see the
specification and find something awesome
00:20:25.779 --> 00:20:30.070
to do with it. There's a bunch of ideas I
have suggested there, and there's a few
00:20:30.070 --> 00:20:36.850
extras which I didn't have time to
describe here. But now, let me turn to
00:20:36.850 --> 00:20:43.350
this title of the talk: How can you trust
formally verified software? So, what I'm
00:20:43.350 --> 00:20:50.200
talking about here is: verifying a program
against some specification. But, of
00:20:50.200 --> 00:20:54.510
course, programs don't just run in a
vacuum. They run into some operating
00:20:54.510 --> 00:21:01.559
system that, they use some libraries and
they're also written in some programming
00:21:01.559 --> 00:21:08.210
language. And, so, when you verify your
program against your specification, you're
00:21:08.210 --> 00:21:15.309
also verifying them against the
specifications of Linux, glibc and ISO-C,
00:21:15.309 --> 00:21:21.940
none of which have good specifications. In
fact, they're just terrible specifications
00:21:21.940 --> 00:21:27.550
which I bear little resemblance to what
compilers and operating systems actually
00:21:27.550 --> 00:21:34.559
do in practice. So, if you... the current
state of things is: if you do verify your
00:21:34.559 --> 00:21:39.580
program against these specifications, you
will find lots of bugs. You will make your
00:21:39.580 --> 00:21:47.961
software a lot more reliable, but you'll
be doing it against specifications, which
00:21:47.961 --> 00:21:51.639
are probably not very
good. Just as ARM's specification was not
00:21:51.639 --> 00:21:57.850
very good before I tested it really
thoroughly. And so: Do I trust formally
00:21:57.850 --> 00:22:03.350
verified software? No, not really. It's a
lot better for being formally verified,
00:22:03.350 --> 00:22:06.591
but I'd also want to test it quite
thoroughly and maybe to use a bit of fuzz
00:22:06.591 --> 00:22:15.549
testing as well. Okay, so this is my final
slide, by the way. So, I'm encouraging you
00:22:15.549 --> 00:22:19.500
to go out and do something with the
specification. If you're interested in
00:22:19.500 --> 00:22:25.779
this, then do contact me! Do ask me
questions, if you have trouble. I'm also
00:22:25.779 --> 00:22:31.759
going to mention: if there's any white hat
hackers out there in the... white hat
00:22:31.759 --> 00:22:36.639
hackers in the audience, then do please
talk to me or Milisch Meriac who's here in
00:22:36.639 --> 00:22:44.750
the front row, if you're interested in
working at ARM and I like to thank a whole
00:22:44.750 --> 00:22:50.150
lot of people at ARM and elsewhere, who've
helped me in this work and also I'd like
00:22:50.150 --> 00:22:54.149
to thank you for giving me your attention
for the last half hour.
00:22:54.149 --> 00:22:59.789
Applause
00:22:59.789 --> 00:23:03.330
Herald: So, we have time for some
00:23:03.330 --> 00:23:06.340
questions. I would ask anyone that has a
question to line up at one of the
00:23:06.340 --> 00:23:10.559
microphones that are in the aisles here
one through eight. Questions for Alastair
00:23:10.559 --> 00:23:14.019
there about formal verification, about
working at ARM, how is the weather in
00:23:14.019 --> 00:23:19.539
Cambridge. Try to keep it on topic. And
signal angel: do we have already questions
00:23:19.539 --> 00:23:22.590
from online?
Signal Angel: No questions yet.
00:23:22.590 --> 00:23:25.409
Herald: Okay, then let's go to microphone
number one.
00:23:25.409 --> 00:23:33.150
Microphone 1: Hi... I was just
AR: Maybe tiptoes.
00:23:33.150 --> 00:23:38.009
MIC 1: Yes, I was just curious how are you
actually using the machine specification
00:23:38.009 --> 00:23:42.360
in order to generate VCs for the SMG
solver. Because you didn't really get a
00:23:42.360 --> 00:23:47.759
chance to talk about that I guess.
AR: Trying to think how I can describe
00:23:47.759 --> 00:23:56.639
that briefly... My basic idea is to take
the specification, which basically... the
00:23:56.639 --> 00:24:00.630
specification is describing a piece of
hardware. And so, I try to do what a
00:24:00.630 --> 00:24:04.610
hardware engineer would do, if they were
actually building a machine that
00:24:04.610 --> 00:24:09.360
implemented it. So I end up with something
that's essentially a giant circuit. So,
00:24:09.360 --> 00:24:14.289
that was the graph I was describing. So,
whenever this control flow, whenever the
00:24:14.289 --> 00:24:18.960
control flow joins back up, I have to
introduce things to slide between the
00:24:18.960 --> 00:24:22.259
value of what was calculated in the left
or the right. And so on.
00:24:22.259 --> 00:24:27.270
MIC 1: Yeah, I guess I'm mostly curious
about, like, what logics you're using for,
00:24:27.270 --> 00:24:31.240
like, the solvers and stuff like that.
AR: I see. Just very basic solvers because
00:24:31.240 --> 00:24:34.440
they run fastest.
MIC 1: Then ugh
00:24:34.440 --> 00:24:40.539
H: Thank you. So, microphone 6 please.
MIC 6: I was just wondering, if you could
00:24:40.539 --> 00:24:46.730
talk some little bit about the language
you used to write your specification.
00:24:46.730 --> 00:24:54.309
AR: So this is a language which basically
I inherited from ARM's documentation. So,
00:24:54.309 --> 00:24:58.440
all processors are described using
pseudocode and what I realized was that
00:24:58.440 --> 00:25:02.789
the pseudocode the ARM had started writing
was actually very close to being a
00:25:02.789 --> 00:25:06.879
language. And so, I sort of reverse
engineered the language hiding inside the
00:25:06.879 --> 00:25:14.200
pseudocode, built some tools for it, and
just kind of figured out what the language
00:25:14.200 --> 00:25:20.370
could possibly mean, given what I thought
processors actually did.
00:25:20.370 --> 00:25:27.850
And the language itself is... it's just a
very simple imperative language. It's
00:25:27.850 --> 00:25:33.740
actually got inherits from BBC basic for
anyone who's about the same age as me and
00:25:33.740 --> 00:25:39.620
remembers BBC basic or it's a bit like
Pascal... It's it's not a complicated
00:25:39.620 --> 00:25:44.059
language. It's actually designed so that
as many people as possible can read it and
00:25:44.059 --> 00:25:47.540
know exactly what it says without any
doubt, without having to consult a
00:25:47.540 --> 00:25:53.239
language lawyer.
H: Signal angel? Yet anything?
00:25:53.239 --> 00:25:58.990
Signal Angel: Yes, now we've got a
question: "Has ARM its own form of the
00:25:58.999 --> 00:26:01.099
Intel Management engine?"
00:26:08.950 --> 00:26:12.220
AR: No is the short answer. Yeah, I don't
00:26:12.220 --> 00:26:20.160
think we have anything quite like that. If
you... yeah, I'll just say no.
00:26:20.160 --> 00:26:22.929
Laughter
H: Microphone one.
00:26:22.929 --> 00:26:27.970
MIC 1: Hi! On that question that we had
before about the specification language:
00:26:27.970 --> 00:26:33.779
Have you considered using Z3's own
language for expressing the instructions?
00:26:33.779 --> 00:26:41.380
AR: So, Z3's own language is basically
write kind of s-expressions, which, if you
00:26:41.380 --> 00:26:46.690
like lisp is wonderful but for anybody who
doesn't like Lisp it's a bit of a turn-off
00:26:46.690 --> 00:26:50.169
and a bit of a barrier to understanding
it. So, again the specification is
00:26:50.169 --> 00:26:54.410
designed so that people can look at it and
go: "Oh, I see what's going on here", and
00:26:54.410 --> 00:26:57.259
not have... and I just try to minimize the
barriers.
00:26:57.259 --> 00:27:03.230
MIC 1: Fair enough!
H: Last call, signal angel!
00:27:04.684 --> 00:27:07.923
SA: How long does the complete test of the
arms specification take?
00:27:09.540 --> 00:27:20.929
AR: About two years. So, yeah, so a modern
processor team about 80% of that team will
00:27:20.929 --> 00:27:26.679
be verification engineers. And, so,
they're basically writing new tests,
00:27:26.679 --> 00:27:30.150
running old tests, diagnosing them, just
doing that continuously for the entire
00:27:30.150 --> 00:27:34.960
life of a project, which is probably about
three years. And after about the first
00:27:34.960 --> 00:27:38.860
year, you basically have a processor that it
mostly works, and after that it's kind of
00:27:38.860 --> 00:27:47.590
debugging it and it's, you know, kind of
fine-tuning the performance. So, yeah, it
00:27:47.590 --> 00:27:53.929
takes a really long time. To run the
actual tests, I don't actually know
00:27:53.929 --> 00:28:00.400
because actually one of my colleagues in
the audience, they've actually run the
00:28:00.400 --> 00:28:07.960
tests. But, yeah, I don't know... and we
use a lot of processors in parallel, so I
00:28:07.960 --> 00:28:12.620
really don't have an idea.
H: Thank you so much, Alastair! Let's give
00:28:12.620 --> 00:28:18.476
him another warm round of applause!
Applause
00:28:18.476 --> 00:28:24.017
34c3 outro
00:28:24.017 --> 00:28:40.000
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!