1
00:00:00,000 --> 00:00:14,710
34c3 intro
2
00:00:14,710 --> 00:00:16,290
Herald: He's been publishing in academia
3
00:00:16,290 --> 00:00:22,520
for almost 30 years. Please join me in
giving him a warm welcome to 34c3.
4
00:00:22,520 --> 00:00:30,367
Applause
Alastiar Reid: Thank you very much for
5
00:00:30,367 --> 00:00:35,797
your introduction. So I'm going to be
talking about the ARM processors and
6
00:00:35,797 --> 00:00:41,300
they're incredibly widely used. You find
them in phones, tablets, IOT devices,
7
00:00:41,300 --> 00:00:46,449
hard-disk drives; they're all over. And if
you think about it, what I'm saying is:
8
00:00:46,449 --> 00:00:50,500
They're in all the things, which contain
your most private and personal data, so
9
00:00:50,500 --> 00:00:54,819
it's really, really important that they do
exactly what they should be doing and
10
00:00:54,819 --> 00:00:59,469
nothing else. We need to make sure we
really understand them and that means it's
11
00:00:59,469 --> 00:01:05,309
important that we can analyze them for any
malware, look for vulnerabilities and so
12
00:01:05,309 --> 00:01:11,990
on. So I'm going to be talking about some
work I started about six years ago,
13
00:01:11,990 --> 00:01:19,560
creating a very precise specification of
what an ARM processor does and I'm going
14
00:01:19,560 --> 00:01:26,670
to be talking also about back in April I'm
released this their specification in a
15
00:01:26,670 --> 00:01:34,189
machine readable form. And I should say
that I'm working with Kimbridge University
16
00:01:34,189 --> 00:01:42,539
to turn that in something you can use. So
so this talk I'm going to mostly actually
17
00:01:42,539 --> 00:01:47,560
talk about this executable processor
specification, that's going to be the bulk
18
00:01:47,560 --> 00:01:53,249
of the talk but at the end that sets me up
very nicely to talk about a formally
19
00:01:53,249 --> 00:01:57,979
verified software. So I thought, given the
theme of the Congress, it would be more
20
00:01:57,979 --> 00:02:05,539
useful to emphasize things you could
actually do. So the specification that ARM
21
00:02:05,539 --> 00:02:10,660
released, what's in it? Well, there's lots
of instruction descriptions of course but
22
00:02:10,660 --> 00:02:14,690
there's also lots of really interesting
security features; things to do with
23
00:02:14,690 --> 00:02:20,380
memory protection, exceptions, privilege
checks and so on.
24
00:02:20,380 --> 00:02:24,440
So there's lots of really interesting
stuff ... of your interest in their how
25
00:02:24,440 --> 00:02:30,690
secure your devices and how to make sure
it really is secure. Throughout the talk
26
00:02:30,690 --> 00:02:36,080
I'll be giving a bunch of links; you can
go and download the specs right now from
27
00:02:36,080 --> 00:02:42,580
the first link but please wait to the end
of the talk and there's also the
28
00:02:42,580 --> 00:02:47,790
specification rendered as HTML, as tools
that can take the specification release
29
00:02:47,790 --> 00:02:52,560
apart and find useful information in it;
I've written blogs and papers about it as
30
00:02:52,560 --> 00:03:00,910
well. And so let's just dive into, look at
the bits of the actual specification. So
31
00:03:00,910 --> 00:03:07,150
the first thing is all the really
important security features in the specif-
32
00:03:07,150 --> 00:03:13,070
in our processor are controlled by, what I
call, the system control registers and the
33
00:03:13,070 --> 00:03:19,570
top dog among all those control registers
is this one here called SCTLR. Just trips
34
00:03:19,570 --> 00:03:26,370
off the tongue, doesn't it? So SCTLR is
where - it's full of lots of individual
35
00:03:26,370 --> 00:03:32,120
control bits, which affect either
optimizations, the processor's doing, or
36
00:03:32,120 --> 00:03:36,950
also security features. And so let's just
zoom in and one of them, to give you an
37
00:03:36,950 --> 00:03:40,210
idea of what kind of information the spec
contains.
38
00:03:40,210 --> 00:03:46,770
So here's some documentation, telling you
about a WXN bit. What does that do? It
39
00:03:46,770 --> 00:03:53,190
makes sure that any code, any, that your
stack cannot contain code. you can't boot
40
00:03:53,190 --> 00:03:58,250
instructions on the code, and on the
stack, because they're proce- if you set
41
00:03:58,250 --> 00:04:04,180
this bit the processor won't execute them.
In other words: This is the bit that
42
00:04:04,180 --> 00:04:12,220
triggered the requirement for things like
return-oriented programming. Okay, so what
43
00:04:12,220 --> 00:04:18,380
can you do with this fact? Well, suppose
you're in the habit of reverse engineering
44
00:04:18,380 --> 00:04:24,310
some code. You might, your disassemble may
show you something like this. And you're
45
00:04:24,310 --> 00:04:28,120
probably all staring at this going: "What
on earth does that do?", because it's
46
00:04:28,120 --> 00:04:33,131
extremely cryptic. But using the
information that's in the XML version of
47
00:04:33,131 --> 00:04:40,710
the release you could easily figure out
how to build, how to decode some of the
48
00:04:40,710 --> 00:04:45,710
more cryptic parts and go "Oh actually,
it's that SCTLR register", right, so you
49
00:04:45,710 --> 00:04:50,490
could decode the cryptic name for the
number for the register into that. But you
50
00:04:50,490 --> 00:04:54,720
could do a bit more than that. You can see
it's setting one of the bits in the
51
00:04:54,720 --> 00:05:00,990
register so - it is of course the bit I
just told you about WXN, so we could dig
52
00:05:00,990 --> 00:05:08,740
into that and, so we could kind of use the
information from the XML to render it,
53
00:05:08,740 --> 00:05:13,560
perhaps, as like this.
So you can make things a bit more useful
54
00:05:13,560 --> 00:05:17,550
and you can also take that documentation
that was there, that told you what the WXN
55
00:05:17,550 --> 00:05:23,470
bit does, and maybe, if you're a feeling
really energetic, you could, when you
56
00:05:23,470 --> 00:05:27,320
click on it, mouse over or whatever, it
could bring up some of that documentation.
57
00:05:27,320 --> 00:05:31,870
And and that makes specf- that makes it
much easier to understand what the cryptic
58
00:05:31,870 --> 00:05:36,240
piece of code at the top is doing. Okay,
so that's just a very shallow thing you
59
00:05:36,240 --> 00:05:43,720
can get from the specification. If we dig
into the instruction descriptions there's
60
00:05:43,720 --> 00:05:52,570
also things like the assembly language of
- the specification of the assembly syntax
61
00:05:52,570 --> 00:05:59,030
and. So, something I did a few years ago
and which I just wrote a blog article
62
00:05:59,030 --> 00:06:05,190
about over the weekend was was it's
possible to actually take that
63
00:06:05,190 --> 00:06:09,900
specification, turn it into a
disassembler. So I first of all
64
00:06:09,900 --> 00:06:16,170
transformed it into the code I'm showing
at the bottom. What this is showing is how
65
00:06:16,170 --> 00:06:18,250
to
take a binary description of an
66
00:06:18,250 --> 00:06:25,710
instruction so that's the the top line of
typewriter font and and then turn that
67
00:06:25,710 --> 00:06:31,470
into strings, which is what this the code
at the bottom is describing how to do. So
68
00:06:31,470 --> 00:06:35,242
so you can use this as a disassembler.
It's actually possible to run it in
69
00:06:35,242 --> 00:06:41,940
reverse and use it as an assembler; you
basically run the code from bottom to top
70
00:06:41,940 --> 00:06:47,080
and you can turn strings into binary
instructions. And we'll see more of this
71
00:06:47,080 --> 00:06:57,430
running things in reverse in a few slides
time. So the main thing about instructions
72
00:06:57,430 --> 00:07:03,330
is of course they do something. So the
specification contains a description of
73
00:07:03,330 --> 00:07:08,650
exactly what an instruction does and that
description is as code, which, as a
74
00:07:08,650 --> 00:07:13,150
programmer, makes me very happy, right. I
don't understand the English words in the
75
00:07:13,150 --> 00:07:17,900
specification but I do understand what the
code does. So one thing you can do with
76
00:07:17,900 --> 00:07:22,221
code is execute it, so let's just walk
through - let's suppose ... take an
77
00:07:22,221 --> 00:07:29,300
instruction and I match against that
diagram there. I might get some values for
78
00:07:29,300 --> 00:07:35,680
for some of the variables from that and
then I can walk through, step by step,
79
00:07:35,680 --> 00:07:40,380
evaluating this code, until I eventually
realize that register five is having some
80
00:07:40,380 --> 00:07:43,889
value assigned to it.
Okay, so that's a fairly basic thing you
81
00:07:43,889 --> 00:07:47,380
can do with the specification;
interpreters are fairly easy thing to
82
00:07:47,380 --> 00:07:53,210
implement but once you have it there's a
lot you can build on top of it. And one
83
00:07:53,210 --> 00:07:57,820
thing that's surprisingly easy to
implement is to extract a symbolic
84
00:07:57,820 --> 00:08:01,680
representation of what the instruction
does. So I'll just show you quickly how
85
00:08:01,680 --> 00:08:07,400
you do that, using the interpreter. So
let's take those same concrete values but
86
00:08:07,400 --> 00:08:11,600
I'm also, I've added three other variables
at the side there, which I'm going to
87
00:08:11,600 --> 00:08:16,620
treat as symbolic variables. And as I walk
through the code I won't just calculate
88
00:08:16,620 --> 00:08:24,230
some concrete values, like the value five
or 32, I'll also build up a graph,
89
00:08:24,230 --> 00:08:29,710
representing exactly how I came up with
these values at five and so on. So as I'm
90
00:08:29,710 --> 00:08:34,830
executing the code I can build a graph
representing exactly what this code is
91
00:08:34,830 --> 00:08:41,188
doing. And what I can do now is just throw
away the code and focus on what that graph
92
00:08:41,188 --> 00:08:46,259
is telling me.
So I now have a symbolic representation of
93
00:08:46,259 --> 00:08:51,540
one slice through that, through that
instruction and I can feed that to a
94
00:08:51,540 --> 00:08:56,369
constraint solver, so if any of you have
heard of Z3 or SMT solvers, that's what
95
00:08:56,369 --> 00:09:05,129
I'm talking about here. And a constraint
solver is a really useful tool, because
96
00:09:05,129 --> 00:09:09,940
you can run code forwards through it, so
given some input values you can say what
97
00:09:09,940 --> 00:09:16,930
are the outputs from this function or
from this instruction but you can also run
98
00:09:16,930 --> 00:09:22,279
them backwards, given some
output value, some final result you want
99
00:09:22,279 --> 00:09:26,790
to see, you can ask what inputs would
cause this to happen. And this is just
100
00:09:26,790 --> 00:09:31,070
tremendously useful if you're trying to
figure out what instructions you want to
101
00:09:31,070 --> 00:09:36,389
use to generate some particular effect.
All right, so if you're trying to figure
102
00:09:36,389 --> 00:09:41,800
out how some particular register could be
accessed, how some particular thing could
103
00:09:41,800 --> 00:09:47,449
be turned on or off, being able to ask
what inputs will cause this to happen is
104
00:09:47,449 --> 00:09:53,569
incredibly useful. And you can also use
the constraint solver to ask for
105
00:09:53,569 --> 00:09:58,069
intermediate values, in the middle of the
calculation. You know if you know some
106
00:09:58,069 --> 00:10:02,859
value you want to see there you can ask
what the inputs that would cause that to
107
00:10:02,859 --> 00:10:07,260
happen.
So, if any of you are familiar with tools
108
00:10:07,260 --> 00:10:13,949
like KLEE, which is a symbolic execution
tool for based on LLVM, then they use
109
00:10:13,949 --> 00:10:19,999
something similar to this. So, I've shown
you are fairly simple draft, something I
110
00:10:19,999 --> 00:10:26,300
could show you how you build it, kind of
on the fly. This is the actual graph for
111
00:10:26,300 --> 00:10:31,259
that same instruction. Here I've added in
a lot more nodes to do with some of the
112
00:10:31,259 --> 00:10:35,980
functions that were being called and to do
with the calculating, whether to take a
113
00:10:35,980 --> 00:10:46,279
branch or not. So this is about 80 or 90
nodes. And I've been experimenting with
114
00:10:46,279 --> 00:10:52,360
extending this in two ways. So this is
just one path through the execution of an
115
00:10:52,360 --> 00:10:56,779
instruction, so one way to improve on that
is to build a graph that represents all
116
00:10:56,779 --> 00:11:00,499
possible paths through the instruction.
And that's much more useful, because you
117
00:11:00,499 --> 00:11:04,990
can you then can point at something and
say "how can I make that happen?" and it
118
00:11:04,990 --> 00:11:11,920
will tell you exactly what inputs will
cause the path that will make that happen.
119
00:11:11,920 --> 00:11:16,351
I've also been experimenting with taking
the entire specification, right, so that
120
00:11:16,351 --> 00:11:20,509
stuff that handles exceptions, that
fetches instructions or execute
121
00:11:20,509 --> 00:11:24,699
instructions. It contains all
instructions. And I've been experimenting
122
00:11:24,699 --> 00:11:29,790
with building a graph representing the
entire specification all at once. And
123
00:11:29,790 --> 00:11:33,470
that's even more useful, because now
pretty much any question you have about
124
00:11:33,470 --> 00:11:36,720
the specification, you can ask the
constraint solver and it will come back
125
00:11:36,720 --> 00:11:43,490
and give you an answer. And this graph for
the full specification is quite large.
126
00:11:43,490 --> 00:11:48,540
It's about half a million nodes and that's
for the smallest specification that our
127
00:11:48,540 --> 00:11:53,050
major uses. So it's about half a million
nodes. But the great thing is modern
128
00:11:53,050 --> 00:11:58,009
constraint solvers can read in that half
million nodes and still give you answers.
129
00:11:58,009 --> 00:12:04,040
Typically in about 1 to 10 seconds for
most of the questions posed to them. So,
130
00:12:04,040 --> 00:12:08,259
these are just tremendously useful tools,
if you're wanting to be able to understand
131
00:12:08,259 --> 00:12:15,670
exactly what the specification does,
and exactly how some program is going to
132
00:12:15,670 --> 00:12:20,689
behave or figure out what program you want
to write to make it behave some particular
133
00:12:20,689 --> 00:12:28,850
way. Okay so I've talked a lot about
instructions, but most of us actually run
134
00:12:28,850 --> 00:12:33,990
programs, right? So to turn this the
specification into something in execute
135
00:12:33,990 --> 00:12:38,260
programs, in other words turn it into a
simulator for the ARM architecture, you
136
00:12:38,260 --> 00:12:43,839
need to add a little bit of a loop that
will handle interrupts, fake instructions
137
00:12:43,839 --> 00:12:49,980
and then it can execute them and handle
any exceptions. So, I... So I did this. I
138
00:12:49,980 --> 00:12:55,860
added this loop to this specification. And
then, I thought I'd better try testing the
139
00:12:55,860 --> 00:13:03,610
specification. And, so the good news for
me, because I work for ARM I have access
140
00:13:03,610 --> 00:13:08,259
to ARM's internal test suite. Which is
something that ARM has been working on
141
00:13:08,259 --> 00:13:12,720
basically since the company started 25, 30
years ago. So it's quite a large test
142
00:13:12,720 --> 00:13:17,579
suite. It is extremely thorough, has tens
of thousands of test programs in it, runs
143
00:13:17,579 --> 00:13:25,140
billions of instructions. And so I set
about testing the, testing the
144
00:13:25,140 --> 00:13:30,119
specification using this test suite. And
if any of you have tested software you'll
145
00:13:30,119 --> 00:13:33,620
be familiar with a graph that looks like
this, right? The start things don't work
146
00:13:33,620 --> 00:13:39,529
all that well. Gradually you get things
working pretty well but then there's a
147
00:13:39,529 --> 00:13:45,899
heavy tail of difficult to find bugs.
Okay, so, and that's basically what I
148
00:13:45,899 --> 00:13:49,869
found when I was testing the
specification. But you should all be
149
00:13:49,869 --> 00:13:56,379
shocked by what I just said. Because what
I'm seeing is ARM's official specification
150
00:13:56,379 --> 00:14:03,290
could not pass ARM's official test suite
when I started, right. I mean that's
151
00:14:03,290 --> 00:14:10,269
that's pretty shocking. And I'm telling
you this and emphasizing it, not because
152
00:14:10,269 --> 00:14:16,850
I think ARM's specification was especially
bad. I think it was just typically bad. I
153
00:14:16,850 --> 00:14:22,589
think if you were to take any
specification for, you know, any real-
154
00:14:22,589 --> 00:14:27,819
world system and actually test it against
the test suite, well first of all you'd
155
00:14:27,819 --> 00:14:32,389
find it's not an executable specification
most the time and secondly you'd find it
156
00:14:32,389 --> 00:14:38,499
wouldn't work. And you'd probably find it
would work as badly as ARM's did. So it's
157
00:14:38,499 --> 00:14:42,850
not just that it didn't pass all the
tests. It didn't pass any tests. In fact
158
00:14:42,850 --> 00:14:47,809
it took me weeks to get the processor or
the specification to come out of reset.
159
00:14:47,809 --> 00:14:54,009
Right? Just to get the starting fix to get
the first instruction took weeks. So and I
160
00:14:54,009 --> 00:15:00,379
think so I think you'd find this if you
were to try any other specification.
161
00:15:00,379 --> 00:15:11,129
What's my next slide? So, moving on to
useful thing you can do with the
162
00:15:11,129 --> 00:15:16,959
specification, something we tried last
summer was using it for fuzz testing. So,
163
00:15:16,959 --> 00:15:21,490
fuzz testing consists of taking a program
and throwing random inputs at the
164
00:15:21,490 --> 00:15:27,959
program and just seeing what breaks and it
pretty much always breaks and a modern
165
00:15:27,959 --> 00:15:34,329
fuzztester like maybe AFL to make it
more effective. It monitors something
166
00:15:34,329 --> 00:15:39,351
about how the program is executing and
uses that to guide its choice of what to
167
00:15:39,351 --> 00:15:45,181
change next. So, if it's finding...in
particular monitor whether an
168
00:15:45,181 --> 00:15:50,179
instruction... whether the program is
taking a branch or not and if it sees it's
169
00:15:50,179 --> 00:15:54,240
taking lots of new branches then it goes:
"Ok I'll keep trying more of what I'm
170
00:15:54,240 --> 00:15:58,379
doing at the moment because it seems to be
effective, and if it's and not finding any
171
00:15:58,379 --> 00:16:03,910
new branches, then it will look for
something else to try changing. So, that's
172
00:16:03,910 --> 00:16:06,970
kind of a normal fuzzing. What you're
doing is basically trying to kind of
173
00:16:06,970 --> 00:16:15,230
maximize your branch coverage. So, what we
did though, when we did this with the
174
00:16:15,230 --> 00:16:20,389
specification, was, we actually monitored
branches not just in the in the binary
175
00:16:20,389 --> 00:16:28,399
that we were analyzing but also in the
specification. And what this gave us was
176
00:16:28,399 --> 00:16:33,040
basically: if you got.. Suppose, your the
binary you're analyzing is just straight
177
00:16:33,040 --> 00:16:37,459
line code. There's no branches in it at
all. Then there's nothing really for your
178
00:16:37,459 --> 00:16:41,519
fuzzer to work with right. So it doesn't
see that the code is interesting, it
179
00:16:41,519 --> 00:16:47,199
quickly moves on to something else.
But maybe your straight line cord is doing
180
00:16:47,199 --> 00:16:50,940
something really interesting, like
accessing a privileged register. Well,
181
00:16:50,940 --> 00:16:54,899
there will be a branch around that to
check that you do have the privilege you
182
00:16:54,899 --> 00:16:58,250
require.
Or maybe it's accessing a memory in which
183
00:16:58,250 --> 00:17:03,649
cases memory protection checks. There's
also checks for: Is this a device
184
00:17:03,649 --> 00:17:09,750
register? Or is this a kind of RAM or ROM?
So, there's a number of different branches
185
00:17:09,750 --> 00:17:13,429
there and that gives the fuzzer
interesting things to... interesting
186
00:17:13,429 --> 00:17:20,320
feedback to play with. So, when we set
this going on one of our microkernel,
187
00:17:22,770 --> 00:17:30,130
we analyzed the system call interface for
that microkernel. And one of the things
188
00:17:30,130 --> 00:17:35,330
the fuzzer surprised us with was it said:
Suppose I take the stack pointer and point
189
00:17:35,330 --> 00:17:39,940
it into the middle of this device space
and then take an exception immediately,
190
00:17:39,940 --> 00:17:45,530
what happens? And the answer was: there
was a bug in the microkernel and what it
191
00:17:45,530 --> 00:17:48,679
did was: it first thing it would do is
read from the stack, where the stack
192
00:17:48,679 --> 00:17:55,320
pointer was pointing, so we do a read from
one of the devices, which doesn't wasn't
193
00:17:55,320 --> 00:18:00,060
really what was intended. In fact it
completely breaks a security model. So,
194
00:18:00,060 --> 00:18:06,610
fuzztesting using not just coverage of
the monitoring branches in the binary but
195
00:18:06,610 --> 00:18:10,940
also the specification can find you a
bunch of really interesting things. And I
196
00:18:10,940 --> 00:18:20,110
hope some of you will I pick this idea up
and run with it. So, the reason that I was
197
00:18:20,110 --> 00:18:29,290
doing all this work was I wanted to be
able to formally verify ARM processors and
198
00:18:29,290 --> 00:18:34,769
so I needed to create a specification
before I could do that. So, sorry, let me
199
00:18:34,769 --> 00:18:42,690
just take a drink here... So, this is an
overly simplified picture of a processor,
200
00:18:42,690 --> 00:18:48,159
it's more or less what processors looked
like 25, 30 years ago, in fact.
201
00:18:48,159 --> 00:18:53,170
But it's good enough for the example. So,
if you want to check a processor, is
202
00:18:53,170 --> 00:18:59,770
correct, then what you can do is: add an
extra logic, which will monitor the values
203
00:18:59,770 --> 00:19:03,140
at the start of an instruction executing
the values that are finally produced at
204
00:19:03,140 --> 00:19:07,700
the end of an instruction executing, and
then if you feed those the specification
205
00:19:07,700 --> 00:19:11,830
you can see what the right answer should
have been. You can compare that with what
206
00:19:11,830 --> 00:19:17,620
the processor actually did. So, you can do
this using a test-based approach, right:
207
00:19:17,620 --> 00:19:21,200
just feed in inputs and check that
everything matches.
208
00:19:21,200 --> 00:19:25,570
But you can also do it using a formal
verification tool called a "bounded model
209
00:19:25,570 --> 00:19:31,250
checker". And a bounded model checker is
like one of those constraint solvers I
210
00:19:31,250 --> 00:19:36,760
mentioned earlier on crack cocaine. So,
what it will do is: it won't just try kind
211
00:19:36,760 --> 00:19:41,799
of one step for what can happen but will
actually try multiple steps: all possible
212
00:19:41,799 --> 00:19:46,130
combinations of inputs for one
instruction, two instructions, three
213
00:19:46,130 --> 00:19:49,809
instructions, longer and longer sequences
of instructions, looking for ways they can
214
00:19:49,809 --> 00:19:56,279
fail the property. So, and this turned out
to be an incredibly effective way of
215
00:19:56,279 --> 00:20:00,530
finding bugs in our processors.
We've actually, we're going to be using
216
00:20:00,530 --> 00:20:07,720
this on all future processor developments.
So, there's a paper that we wrote about
217
00:20:07,720 --> 00:20:12,299
this but, I recommend that you go
find the video for the top by Clifford
218
00:20:12,299 --> 00:20:20,620
Wolf from a couple of hours ago, which
describes a very similar process. Okay, so
219
00:20:20,620 --> 00:20:25,779
I'm encouraging you to see the
specification and find something awesome
220
00:20:25,779 --> 00:20:30,070
to do with it. There's a bunch of ideas I
have suggested there, and there's a few
221
00:20:30,070 --> 00:20:36,850
extras which I didn't have time to
describe here. But now, let me turn to
222
00:20:36,850 --> 00:20:43,350
this title of the talk: How can you trust
formally verified software? So, what I'm
223
00:20:43,350 --> 00:20:50,200
talking about here is: verifying a program
against some specification. But, of
224
00:20:50,200 --> 00:20:54,510
course, programs don't just run in a
vacuum. They run into some operating
225
00:20:54,510 --> 00:21:01,559
system that, they use some libraries and
they're also written in some programming
226
00:21:01,559 --> 00:21:08,210
language. And, so, when you verify your
program against your specification, you're
227
00:21:08,210 --> 00:21:15,309
also verifying them against the
specifications of Linux, glibc and ISO-C,
228
00:21:15,309 --> 00:21:21,940
none of which have good specifications. In
fact, they're just terrible specifications
229
00:21:21,940 --> 00:21:27,550
which I bear little resemblance to what
compilers and operating systems actually
230
00:21:27,550 --> 00:21:34,559
do in practice. So, if you... the current
state of things is: if you do verify your
231
00:21:34,559 --> 00:21:39,580
program against these specifications, you
will find lots of bugs. You will make your
232
00:21:39,580 --> 00:21:47,961
software a lot more reliable, but you'll
be doing it against specifications, which
233
00:21:47,961 --> 00:21:51,639
are probably not very
good. Just as ARM's specification was not
234
00:21:51,639 --> 00:21:57,850
very good before I tested it really
thoroughly. And so: Do I trust formally
235
00:21:57,850 --> 00:22:03,350
verified software? No, not really. It's a
lot better for being formally verified,
236
00:22:03,350 --> 00:22:06,591
but I'd also want to test it quite
thoroughly and maybe to use a bit of fuzz
237
00:22:06,591 --> 00:22:15,549
testing as well. Okay, so this is my final
slide, by the way. So, I'm encouraging you
238
00:22:15,549 --> 00:22:19,500
to go out and do something with the
specification. If you're interested in
239
00:22:19,500 --> 00:22:25,779
this, then do contact me! Do ask me
questions, if you have trouble. I'm also
240
00:22:25,779 --> 00:22:31,759
going to mention: if there's any white hat
hackers out there in the... white hat
241
00:22:31,759 --> 00:22:36,639
hackers in the audience, then do please
talk to me or Milisch Meriac who's here in
242
00:22:36,639 --> 00:22:44,750
the front row, if you're interested in
working at ARM and I like to thank a whole
243
00:22:44,750 --> 00:22:50,150
lot of people at ARM and elsewhere, who've
helped me in this work and also I'd like
244
00:22:50,150 --> 00:22:54,149
to thank you for giving me your attention
for the last half hour.
245
00:22:54,149 --> 00:22:59,789
Applause
246
00:22:59,789 --> 00:23:03,330
Herald: So, we have time for some
247
00:23:03,330 --> 00:23:06,340
questions. I would ask anyone that has a
question to line up at one of the
248
00:23:06,340 --> 00:23:10,559
microphones that are in the aisles here
one through eight. Questions for Alastair
249
00:23:10,559 --> 00:23:14,019
there about formal verification, about
working at ARM, how is the weather in
250
00:23:14,019 --> 00:23:19,539
Cambridge. Try to keep it on topic. And
signal angel: do we have already questions
251
00:23:19,539 --> 00:23:22,590
from online?
Signal Angel: No questions yet.
252
00:23:22,590 --> 00:23:25,409
Herald: Okay, then let's go to microphone
number one.
253
00:23:25,409 --> 00:23:33,150
Microphone 1: Hi... I was just
AR: Maybe tiptoes.
254
00:23:33,150 --> 00:23:38,009
MIC 1: Yes, I was just curious how are you
actually using the machine specification
255
00:23:38,009 --> 00:23:42,360
in order to generate VCs for the SMG
solver. Because you didn't really get a
256
00:23:42,360 --> 00:23:47,759
chance to talk about that I guess.
AR: Trying to think how I can describe
257
00:23:47,759 --> 00:23:56,639
that briefly... My basic idea is to take
the specification, which basically... the
258
00:23:56,639 --> 00:24:00,630
specification is describing a piece of
hardware. And so, I try to do what a
259
00:24:00,630 --> 00:24:04,610
hardware engineer would do, if they were
actually building a machine that
260
00:24:04,610 --> 00:24:09,360
implemented it. So I end up with something
that's essentially a giant circuit. So,
261
00:24:09,360 --> 00:24:14,289
that was the graph I was describing. So,
whenever this control flow, whenever the
262
00:24:14,289 --> 00:24:18,960
control flow joins back up, I have to
introduce things to slide between the
263
00:24:18,960 --> 00:24:22,259
value of what was calculated in the left
or the right. And so on.
264
00:24:22,259 --> 00:24:27,270
MIC 1: Yeah, I guess I'm mostly curious
about, like, what logics you're using for,
265
00:24:27,270 --> 00:24:31,240
like, the solvers and stuff like that.
AR: I see. Just very basic solvers because
266
00:24:31,240 --> 00:24:34,440
they run fastest.
MIC 1: Then ugh
267
00:24:34,440 --> 00:24:40,539
H: Thank you. So, microphone 6 please.
MIC 6: I was just wondering, if you could
268
00:24:40,539 --> 00:24:46,730
talk some little bit about the language
you used to write your specification.
269
00:24:46,730 --> 00:24:54,309
AR: So this is a language which basically
I inherited from ARM's documentation. So,
270
00:24:54,309 --> 00:24:58,440
all processors are described using
pseudocode and what I realized was that
271
00:24:58,440 --> 00:25:02,789
the pseudocode the ARM had started writing
was actually very close to being a
272
00:25:02,789 --> 00:25:06,879
language. And so, I sort of reverse
engineered the language hiding inside the
273
00:25:06,879 --> 00:25:14,200
pseudocode, built some tools for it, and
just kind of figured out what the language
274
00:25:14,200 --> 00:25:20,370
could possibly mean, given what I thought
processors actually did.
275
00:25:20,370 --> 00:25:27,850
And the language itself is... it's just a
very simple imperative language. It's
276
00:25:27,850 --> 00:25:33,740
actually got inherits from BBC basic for
anyone who's about the same age as me and
277
00:25:33,740 --> 00:25:39,620
remembers BBC basic or it's a bit like
Pascal... It's it's not a complicated
278
00:25:39,620 --> 00:25:44,059
language. It's actually designed so that
as many people as possible can read it and
279
00:25:44,059 --> 00:25:47,540
know exactly what it says without any
doubt, without having to consult a
280
00:25:47,540 --> 00:25:53,239
language lawyer.
H: Signal angel? Yet anything?
281
00:25:53,239 --> 00:25:58,990
Signal Angel: Yes, now we've got a
question: "Has ARM its own form of the
282
00:25:58,999 --> 00:26:01,099
Intel Management engine?"
283
00:26:08,950 --> 00:26:12,220
AR: No is the short answer. Yeah, I don't
284
00:26:12,220 --> 00:26:20,160
think we have anything quite like that. If
you... yeah, I'll just say no.
285
00:26:20,160 --> 00:26:22,929
Laughter
H: Microphone one.
286
00:26:22,929 --> 00:26:27,970
MIC 1: Hi! On that question that we had
before about the specification language:
287
00:26:27,970 --> 00:26:33,779
Have you considered using Z3's own
language for expressing the instructions?
288
00:26:33,779 --> 00:26:41,380
AR: So, Z3's own language is basically
write kind of s-expressions, which, if you
289
00:26:41,380 --> 00:26:46,690
like lisp is wonderful but for anybody who
doesn't like Lisp it's a bit of a turn-off
290
00:26:46,690 --> 00:26:50,169
and a bit of a barrier to understanding
it. So, again the specification is
291
00:26:50,169 --> 00:26:54,410
designed so that people can look at it and
go: "Oh, I see what's going on here", and
292
00:26:54,410 --> 00:26:57,259
not have... and I just try to minimize the
barriers.
293
00:26:57,259 --> 00:27:03,230
MIC 1: Fair enough!
H: Last call, signal angel!
294
00:27:04,684 --> 00:27:07,923
SA: How long does the complete test of the
arms specification take?
295
00:27:09,540 --> 00:27:20,929
AR: About two years. So, yeah, so a modern
processor team about 80% of that team will
296
00:27:20,929 --> 00:27:26,679
be verification engineers. And, so,
they're basically writing new tests,
297
00:27:26,679 --> 00:27:30,150
running old tests, diagnosing them, just
doing that continuously for the entire
298
00:27:30,150 --> 00:27:34,960
life of a project, which is probably about
three years. And after about the first
299
00:27:34,960 --> 00:27:38,860
year, you basically have a processor that it
mostly works, and after that it's kind of
300
00:27:38,860 --> 00:27:47,590
debugging it and it's, you know, kind of
fine-tuning the performance. So, yeah, it
301
00:27:47,590 --> 00:27:53,929
takes a really long time. To run the
actual tests, I don't actually know
302
00:27:53,929 --> 00:28:00,400
because actually one of my colleagues in
the audience, they've actually run the
303
00:28:00,400 --> 00:28:07,960
tests. But, yeah, I don't know... and we
use a lot of processors in parallel, so I
304
00:28:07,960 --> 00:28:12,620
really don't have an idea.
H: Thank you so much, Alastair! Let's give
305
00:28:12,620 --> 00:28:18,476
him another warm round of applause!
Applause
306
00:28:18,476 --> 00:28:24,017
34c3 outro
307
00:28:24,017 --> 00:28:40,000
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!