1
00:00:09,519 --> 00:00:14,479
applause
2
00:00:14,479 --> 00:00:16,930
Peter Sewell: Thank you for that
introduction and to the organisers for
3
00:00:16,930 --> 00:00:23,459
this opportunity to speak to you guys. So
my name is Peter Sewell. I'm an academic
4
00:00:23,459 --> 00:00:30,609
computer scientist and as such, I bear a
heavy burden of guilt for what we have to
5
00:00:30,609 --> 00:00:38,040
work with. Shared guilt, to be fair, but
guilt all the same. In many, many years of
6
00:00:38,040 --> 00:00:44,670
higher education, I know two things about
computers. Maybe you've figured them out
7
00:00:44,670 --> 00:00:55,000
slightly faster. The first thing, there's
an awful lot of them about. Shocking. And
8
00:00:55,000 --> 00:01:03,300
the second is that they go wrong. They go
wrong a lot. Really a lot. A lot. A lot!
9
00:01:03,300 --> 00:01:09,009
And they go wrong in many interestingly
different ways. So sometimes they go wrong
10
00:01:09,009 --> 00:01:17,850
in spectacular ways, right? With
fireworks. And here, we see...
11
00:01:17,850 --> 00:01:24,030
laughterapplause
12
00:01:24,030 --> 00:01:31,260
Here we see half a billion dollars worth
of sparkle. You don't get that every day.
13
00:01:31,260 --> 00:01:34,970
And so what was going on? So as I
understand it, what was going on was that
14
00:01:34,970 --> 00:01:40,600
the - this was the first flight of the
Ariane 5 launcher and it reused some of
15
00:01:40,600 --> 00:01:45,851
the guidance software from Ariane 4.
That's fine. Good, tested code. Should
16
00:01:45,851 --> 00:01:53,030
reuse it. But the launch profile for
Ariane 5 went sideways a bit more - on
17
00:01:53,030 --> 00:01:58,830
purpose - than the Ariane 5 launch. And as
a result, one of the sideways values in
18
00:01:58,830 --> 00:02:04,740
the guidance control software overflowed.
Bad. And hence, the guidance software
19
00:02:04,740 --> 00:02:08,449
decided that something bad was going wrong
and in order to not land rockets on
20
00:02:08,449 --> 00:02:15,841
people's heads, it kindly blew itself up.
Okay. So that's a spectacular. Sometimes
21
00:02:15,841 --> 00:02:21,200
they go wrong in insidious ways. And you
people probably know much more about that
22
00:02:21,200 --> 00:02:29,370
than me. Very insidious ways. Exploited by
bad people to our detriment. And they go
23
00:02:29,370 --> 00:02:36,990
wrong in insidious ways, exploited by our
kindly governments for our protection.
24
00:02:36,990 --> 00:02:44,959
Right? And these - there are many, many,
many different causes of these things but
25
00:02:44,959 --> 00:02:49,420
many of these causes are completely
trivial. You know, you have one line of
26
00:02:49,420 --> 00:02:56,150
code in the wrong place - something like
that, right? So that's very, very bad and
27
00:02:56,150 --> 00:03:01,040
you're all conscious of how bad that is.
But from my point of view, this is as
28
00:03:01,040 --> 00:03:06,959
nothing with the real disaster that we
have been facing and continue to face.
29
00:03:06,959 --> 00:03:14,950
Right? The real disaster is the enormous
waste of human life and energy and effort
30
00:03:14,950 --> 00:03:21,360
and emotion dealing with these crap
machines that we've built. Right? And it's
31
00:03:21,360 --> 00:03:25,489
hard to quantify that. I did a little
experiment a couple of months ago. I
32
00:03:25,489 --> 00:03:34,030
Googled up "Android problem" and "Windows
problem" and got about 300 million hits
33
00:03:34,030 --> 00:03:43,840
for each, right? Somehow indicative of the
amount of wrongness. So this should you
34
00:03:43,840 --> 00:03:49,790
seem kind of familiar to you. If you think
back - think back to when you were young.
35
00:03:49,790 --> 00:03:55,700
Maybe back in the 19th century when you
were young, and what were you doing then
36
00:03:55,700 --> 00:04:00,940
as a community of hackers and makers and
builders? Well, this was the full-on
37
00:04:00,940 --> 00:04:07,130
industrial revolution. You were building
stuff. You were building beautiful bridges
38
00:04:07,130 --> 00:04:14,739
of cast-iron and stone with numerous
arches and pillars. And every so often
39
00:04:14,739 --> 00:04:22,300
they would fall down. I believe someone
miscalculated the wind loading. And we
40
00:04:22,300 --> 00:04:30,630
would build magical steam engines to whisk
us away. And every so often, they would
41
00:04:30,630 --> 00:04:35,770
blow up. And that kind of thing, as you
may have noticed in the last hundred years
42
00:04:35,770 --> 00:04:41,810
- sorry - that kind of thing doesn't
happen very often any more, right? We have
43
00:04:41,810 --> 00:04:46,990
pretty good civil and mechanical
engineering. And what does it mean to have
44
00:04:46,990 --> 00:04:52,620
good civil and mechanical engineering? It
means that we know enough thermodynamics
45
00:04:52,620 --> 00:04:56,750
and materials science and quality control
management and all that kind of thing to
46
00:04:56,750 --> 00:05:01,830
model our designs before we build them and
predict how they're going to behave pretty
47
00:05:01,830 --> 00:05:10,750
well and with pretty high confidence,
right? I see an optimisation in this talk.
48
00:05:10,750 --> 00:05:15,340
For computer science, for computing
systems we can predict with 100 per cent
49
00:05:15,340 --> 00:05:20,450
certainty that they will not work.
Ah, time for lunch!
50
00:05:20,450 --> 00:05:26,509
applause
51
00:05:26,509 --> 00:05:33,330
But it's not a very satisfying answer,
really. So why is it so hard? Let me
52
00:05:33,330 --> 00:05:38,500
discuss a couple of the reasons why
computing is tricky. In some sense,
53
00:05:38,500 --> 00:05:45,430
trickier than steam engines. The first
reason is that there is just too much
54
00:05:45,430 --> 00:05:50,360
code. Right? So this is one of the more
scary graphs I've found on the internet
55
00:05:50,360 --> 00:05:55,470
recently. This is a measure of code in
hundreds of millions of lines. Each of
56
00:05:55,470 --> 00:06:00,250
these blocks I think is 10 million lines
of code. And in the little letters - you
57
00:06:00,250 --> 00:06:04,860
might not be able to see we have Android
and the Large Hadron Collider and Windows
58
00:06:04,860 --> 00:06:12,820
Vista and Facebook, US Army Future Combat
System, Mac OS and software in a high-end
59
00:06:12,820 --> 00:06:15,010
car. Right?
60
00:06:15,010 --> 00:06:23,230
laughterapplause
61
00:06:23,230 --> 00:06:25,320
I'm flying home.
62
00:06:25,320 --> 00:06:27,810
laughter
63
00:06:27,810 --> 00:06:33,520
So we that's a bit of a problem, really.
We're never going to get that right. Then
64
00:06:33,520 --> 00:06:37,410
there's too much legacy complexity there,
right? Really too much. So as
65
00:06:37,410 --> 00:06:41,840
right-minded, pure-minded people you might
think that software and hardware should be
66
00:06:41,840 --> 00:06:46,490
architected beautifully with regular
structure a bit like a Mies van der Rohe
67
00:06:46,490 --> 00:06:53,660
skyscraper or something. Clean. Okay. You
know that's not realistic, right? You
68
00:06:53,660 --> 00:06:59,470
might expect a bit of baroqueness. You
might expect maybe something more like
69
00:06:59,470 --> 00:07:02,680
this. You know, it's a bit twiddly but it
still hangs together. It's still got
70
00:07:02,680 --> 00:07:08,160
structure. It still works. But then you
stop and you think and you realise that
71
00:07:08,160 --> 00:07:15,770
software and hardware is built by very
smart people and usually in big groups and
72
00:07:15,770 --> 00:07:20,300
subject to a whole assortment of rather
intense commercial pressure and related
73
00:07:20,300 --> 00:07:25,210
open-source pressure and using the best
components and tools that they know and
74
00:07:25,210 --> 00:07:27,169
they like.
75
00:07:27,169 --> 00:07:34,520
laughterapplause
76
00:07:34,520 --> 00:07:38,990
Someone from the room: (Indistinct)
You forgot management!
77
00:07:38,990 --> 00:07:42,480
And the best management that
human beings can arrange.
78
00:07:42,480 --> 00:07:49,680
laughterapplause
79
00:07:49,680 --> 00:07:54,910
So we end up with something
spectacularly ingenious.
80
00:07:54,910 --> 00:08:05,490
laughterapplause
81
00:08:05,490 --> 00:08:09,730
You can see here, if you look
closely, you can see all the parts.
82
00:08:09,730 --> 00:08:11,000
laughter
83
00:08:11,000 --> 00:08:15,770
This was just a found picture from the
internet. But you can see there's a bunch
84
00:08:15,770 --> 00:08:19,060
of, you know, transparent sort of
silicon-like stuff down here. Let's say
85
00:08:19,060 --> 00:08:25,079
that's our hardware. And over here, I see
a C compiler with all of its phases.
86
00:08:25,079 --> 00:08:28,599
laughter
87
00:08:28,599 --> 00:08:36,930
And these bottles, that's the file system,
I reckon. And there's a TCP stack in here
88
00:08:36,930 --> 00:08:43,690
somewhere. And there's a piece that
belongs to the NSA, obviously. And up
89
00:08:43,690 --> 00:08:50,259
here, I think up here - yeah, there's a
JavaScript engine and a TLS stack and a
90
00:08:50,259 --> 00:08:56,050
web browser. And then perched at the top
is the software that most people use to
91
00:08:56,050 --> 00:08:57,910
talk to their banks.
92
00:08:57,910 --> 00:09:01,489
laughter
93
00:09:01,489 --> 00:09:09,489
It's just - it's just insane, right? And
then we have to look at how we build these
94
00:09:09,489 --> 00:09:14,089
pieces, right? So we build them
fundamentally by a process of trial and
95
00:09:14,089 --> 00:09:20,649
error, right? Just occasionally we write
some specifications in text. And then we
96
00:09:20,649 --> 00:09:26,029
write some code and maybe we write a few
ad hoc texts and then we test and fix
97
00:09:26,029 --> 00:09:30,610
until the thing is marketable and then we
test and fix and extend until the thing is
98
00:09:30,610 --> 00:09:36,069
no longer marketable and then we keep on
using it until we can't anymore. Right? So
99
00:09:36,069 --> 00:09:40,259
the fundamental process that we're using
to develop these systems is trial and
100
00:09:40,259 --> 00:09:47,930
error by exploring particular execution
paths of the code on particular inputs.
101
00:09:47,930 --> 00:09:54,459
But - you know this, but I want to
highlight - there are too many. The number
102
00:09:54,459 --> 00:09:59,649
of execution paths scales typically at
least exponentially with the size of the
103
00:09:59,649 --> 00:10:04,490
code and the number of states scales
exponentially with the size of the data
104
00:10:04,490 --> 00:10:08,790
that you're dealing with and the number of
possible inputs is also shockingly large.
105
00:10:08,790 --> 00:10:14,920
There is no way that we can do this and we
should perhaps stop deluding ourselves
106
00:10:14,920 --> 00:10:21,129
that there is. And then the most
fundamental reason that computers are hard
107
00:10:21,129 --> 00:10:27,899
is that they are too discrete, right? If
you take a bridge or, I don't know, a
108
00:10:27,899 --> 00:10:36,519
chair or something - where's something
slightly bendable? You take a chair and
109
00:10:36,519 --> 00:10:44,199
you exert a bit more force on it than it's
used to - I'm not very strong - and it
110
00:10:44,199 --> 00:10:51,299
bends a bit. It deforms continuously. If
you take a bridge and you tighten up one
111
00:10:51,299 --> 00:10:55,239
of the bolts a bit too much or you
underspecify one of the girders by 10 per
112
00:10:55,239 --> 00:11:01,019
cent, occasionally you'll reach a point of
catastrophic failure but usually it will
113
00:11:01,019 --> 00:11:04,089
just be a little bit weaker or a little
bit stronger or a little bit wibbly.
114
00:11:04,089 --> 00:11:08,960
Right? But computer systems, you change a
bit - sometimes it doesn't make a
115
00:11:08,960 --> 00:11:13,179
difference but disturbingly often you
change a line of code - as in system of
116
00:11:13,179 --> 00:11:19,600
those bugs that we were talking about -
the whole thing becomes broken, right? So
117
00:11:19,600 --> 00:11:26,929
it's quite different from that traditional
engineering. It's really different. Okay.
118
00:11:26,929 --> 00:11:34,850
So, "A New Dawn", this thing is titled.
What can we do? What might we do? So I'm
119
00:11:34,850 --> 00:11:40,879
going to talk about several different
possibilities, all right? One thing, we
120
00:11:40,879 --> 00:11:46,490
can do more of the stuff which is
traditionally called software engineering.
121
00:11:46,490 --> 00:11:50,910
It's not really engineering in my book,
but some of it's good. I mean, it's
122
00:11:50,910 --> 00:11:55,119
useful, you know? We can do more testing
and have more assertions in our code and
123
00:11:55,119 --> 00:11:58,920
better management maybe and all that kind
of stuff. We can do all that - we should
124
00:11:58,920 --> 00:12:04,339
do all that, but it's not going to make a
very big change, all right? It's not
125
00:12:04,339 --> 00:12:13,989
addressing any of those root causes of the
problem. What else could we do? So as an
126
00:12:13,989 --> 00:12:18,570
academic computer scientist, I've devoted
several years of my life to working in the
127
00:12:18,570 --> 00:12:24,509
area of programming languages and I look
at the programming languages used out
128
00:12:24,509 --> 00:12:36,919
there in the world today. Oh, it's just
disgusting! It's shocking.
129
00:12:36,919 --> 00:12:43,399
applause
130
00:12:43,399 --> 00:12:50,420
The reasons that languages become popular
and built into our shared infrastructure,
131
00:12:50,420 --> 00:12:54,790
the reasons for that have almost nothing
to do with how well those languages are
132
00:12:54,790 --> 00:13:00,600
designed and in fact whether
they are designed at all, right?
133
00:13:00,600 --> 00:13:04,110
applause
134
00:13:04,110 --> 00:13:09,040
So I didn't really want to get too much
hate mail after this talk so I'm going to
135
00:13:09,040 --> 00:13:13,999
try and avoid naming particular languages
as much as I can, but I encourage you to
136
00:13:13,999 --> 00:13:20,279
think away as I speak and imagine the
examples. No, that was - I didn't want to
137
00:13:20,279 --> 00:13:23,730
get much hate mail. Sorry.
138
00:13:23,730 --> 00:13:25,160
laughter
139
00:13:25,160 --> 00:13:26,579
So...
140
00:13:26,579 --> 00:13:32,069
laughterapplause
141
00:13:32,069 --> 00:13:36,540
So at the very least, we could use
programming languages based on ideas from
142
00:13:36,540 --> 00:13:47,389
the 1970s instead of the 1960s. I mean,
really, come on. Right? You know, back in
143
00:13:47,389 --> 00:13:54,120
'67 or '69 or whatever we had ECPL and C
and only a couple of years later we had
144
00:13:54,120 --> 00:14:00,069
languages that guaranteed type and memory
safety and enforcement of abstraction
145
00:14:00,069 --> 00:14:05,089
boundaries and gave us rich mechanisms for
programming so we could say what we mean
146
00:14:05,089 --> 00:14:11,350
and not futz about with 23 million
pointers, about 1 per cent of which were
147
00:14:11,350 --> 00:14:16,629
completely wrong, right? So for any
particular - in the context of any
148
00:14:16,629 --> 00:14:20,459
particular project or any particular
existing legacy infrastructure there may
149
00:14:20,459 --> 00:14:25,789
be very good reasons why one just can't
make a switch. Right? But for us
150
00:14:25,789 --> 00:14:31,860
collectively, there is no excuse. It's
just wretched, right?
151
00:14:31,860 --> 00:14:37,040
applause
152
00:14:37,040 --> 00:14:40,969
It's completely bonkers. And then the
other thing that you see as a programming
153
00:14:40,969 --> 00:14:49,239
language researcher is that - er, well, to
make another analogy, it looks as if
154
00:14:49,239 --> 00:14:55,529
anyone who was capable of driving a car
considered themselves an expert on car
155
00:14:55,529 --> 00:15:03,559
design, right? There is in fact a rather
well-established subject of, you know,
156
00:15:03,559 --> 00:15:08,239
programming language design and we know
how to specify completely precisely,
157
00:15:08,239 --> 00:15:13,019
mathematically precisely not just the
syntax like we had in these BNF grammars
158
00:15:13,019 --> 00:15:18,160
from the 1960s and still have, but also
the behaviour of these things. We know how
159
00:15:18,160 --> 00:15:22,699
to specify a few operational semantics and
the type systems. And we know how to do
160
00:15:22,699 --> 00:15:27,360
mathematical proofs that our language
designs are self-consistent, that they do
161
00:15:27,360 --> 00:15:35,059
guarantee good properties of arbitrary
well-typed programs that you write, right?
162
00:15:35,059 --> 00:15:38,759
We can do this now. We can do this, if we
have to, post hoc. And people are doing
163
00:15:38,759 --> 00:15:45,669
this for JavaScript and C and god help
them, PHP and other such languages. But
164
00:15:45,669 --> 00:15:50,829
even better, we can do it at design time.
And if you do this at design time, you see
165
00:15:50,829 --> 00:15:55,509
the structure. You have to - the act of
writing that specification forces you to
166
00:15:55,509 --> 00:15:59,939
consider all the cases and all the
interactions between features. You can
167
00:15:59,939 --> 00:16:07,861
still get it wrong, but you get it wrong
in more interesting ways. Right? So people
168
00:16:07,861 --> 00:16:12,119
are starting to do this now. So I think
there was a talk on the first day of this
169
00:16:12,119 --> 00:16:17,429
by Hannes and David who are building
system software in maybe not the best sys
170
00:16:17,429 --> 00:16:21,800
language you could possibly imagine, but
something which compared with C is as
171
00:16:21,800 --> 00:16:27,759
manna from heaven. Anil Madhavapeddy and
his colleagues in Cambridge have been
172
00:16:27,759 --> 00:16:32,759
working hard to build system software in
at least moderately modern languages. And
173
00:16:32,759 --> 00:16:40,780
it works. Sometimes you have to use C but
surprisingly often you really don't. Yeah,
174
00:16:40,780 --> 00:16:47,049
so I teach, as my introducer said, in the
University of Cambridge. And I teach
175
00:16:47,049 --> 00:16:51,269
semantics. And one of my fears is that I
will accidentally let someone out in the
176
00:16:51,269 --> 00:16:54,970
world who will accidentally find
themselves over a weekend inventing a
177
00:16:54,970 --> 00:16:58,149
little scripting language which will
accidentally take over the world and
178
00:16:58,149 --> 00:17:05,359
become popular and I won't have explained
to them what they have to do or even that
179
00:17:05,359 --> 00:17:08,189
there is a subject they have to
understand. And if you want to understand
180
00:17:08,189 --> 00:17:18,740
it, there's lots of stuff you can look at.
Okay. Let's go to the uppermost extreme.
181
00:17:18,740 --> 00:17:23,829
If we want software and hardware that
actually works, we should prove it
182
00:17:23,829 --> 00:17:30,700
correct. Right? So this is something which
academics, especially in the domain of
183
00:17:30,700 --> 00:17:36,830
labelled "formal methods" in the '70s and
'80s, they've been pushing this, promoting
184
00:17:36,830 --> 00:17:44,789
this as an idea and a promise and an
exhortation for decades, right? Why don't
185
00:17:44,789 --> 00:17:50,169
we just prove our programs correct? And
for some of those decades it wasn't really
186
00:17:50,169 --> 00:17:54,570
plausible. I remember back when I was
considerably younger, it was a big thing
187
00:17:54,570 --> 00:17:59,540
to proof that if you took a linked list
and reversed it twice, you got back the
188
00:17:59,540 --> 00:18:08,059
same thing. Right? That was hard, then.
But now, well, we can't do this for the
189
00:18:08,059 --> 00:18:14,419
Linux kernel, let's say, but we can
surprising often do this. And I just give
190
00:18:14,419 --> 00:18:19,590
you a couple of the examples of modern
day, state-of-the-art academic
191
00:18:19,590 --> 00:18:25,720
verification projects. People have
verified compilers all the way from C-like
192
00:18:25,720 --> 00:18:31,769
languages down to assembly or from ML-like
languages. You know, higher-order
193
00:18:31,769 --> 00:18:36,959
functional imperative languages all the
way down to binary models of how x86
194
00:18:36,959 --> 00:18:44,470
behaves. And people have verified LLVM
optimisation paths. There's verified
195
00:18:44,470 --> 00:18:49,880
software fault isolation which I believe
goes faster than the original non-verified
196
00:18:49,880 --> 00:18:56,130
form. Win for them. There's a verified
hypervisor from Nectar group in Australia.
197
00:18:56,130 --> 00:19:00,570
Right? Verified secure hypervisor with
proofs of security properties. There's any
198
00:19:00,570 --> 00:19:06,690
amount of work verifying crypto protocols,
sometimes with respect to assumptions - I
199
00:19:06,690 --> 00:19:12,510
think reasonable assumption on what the
underlying mathematics is giving you. So
200
00:19:12,510 --> 00:19:22,919
we can do this. I maybe have to explain a
bit what I mean by "prove". To some
201
00:19:22,919 --> 00:19:30,990
computery people, "I proved it works"
means "I tested it a little bit".
202
00:19:30,990 --> 00:19:32,510
laughter
203
00:19:32,510 --> 00:19:34,960
In extremis, "it compiled".
204
00:19:34,960 --> 00:19:44,120
laughterapplause
205
00:19:44,120 --> 00:19:50,770
To a program analysis person, you might
run some sophisticated tool but written in
206
00:19:50,770 --> 00:19:54,649
untrusted code and possibly doing an
approximate and maybe unsound analysis and
207
00:19:54,649 --> 00:19:59,720
you might find an absence of certain kinds
of bugs. That's tremendously useful; it's
208
00:19:59,720 --> 00:20:04,640
not what we would really call "proved".
I mean, nor would they, to be fair.
209
00:20:04,640 --> 00:20:08,970
Scientists generally don't use the word
because they know they're not doing it,
210
00:20:08,970 --> 00:20:13,559
right? They're doing controlled
experiments, which gives them, you know,
211
00:20:13,559 --> 00:20:20,070
information for or against some
hypothesis. Mathematician write proofs.
212
00:20:20,070 --> 00:20:25,139
You probably did that when you were young.
And those proofs are careful mathematical
213
00:20:25,139 --> 00:20:30,139
arguments usually written on paper with
pencils and their aim is to convince a
214
00:20:30,139 --> 00:20:33,389
human being that if that mathematician was
really nailed up against the wall and
215
00:20:33,389 --> 00:20:38,980
pushed, they could expand that into a
completely detailed proof. But what we
216
00:20:38,980 --> 00:20:47,750
have in computing, we don't have the rich
mathematical structure that these people
217
00:20:47,750 --> 00:20:51,950
are working with, we have a tremendous
amount of ad hoc and stupid detail mixed
218
00:20:51,950 --> 00:20:57,769
in which a smidgin of rich mathematical
structure. So - and we have hundreds of
219
00:20:57,769 --> 00:21:03,399
millions of lines of code. So this is just
not going to cut it. And if we want say
220
00:21:03,399 --> 00:21:08,440
the word "prove" ever, then the only thing
which is legitimate is to do honest
221
00:21:08,440 --> 00:21:13,230
program proof and that means you have to
have some system that machine-checks that
222
00:21:13,230 --> 00:21:17,540
your proof is actually a proof. And
sometime we'll also have machines that
223
00:21:17,540 --> 00:21:22,250
will sort of help and sort of hinder that
process, right? And there's a variety of
224
00:21:22,250 --> 00:21:27,340
these systems. Coq and HOL4 and Isabelle4
- Isabelle/HOL and what have you that we
225
00:21:27,340 --> 00:21:36,799
can look up. So using these systems we can
prove nontrivial facts about these. Not
226
00:21:36,799 --> 00:21:41,530
necessarily that they do what you want
them to do, but that they meet the precise
227
00:21:41,530 --> 00:21:48,519
specification that we've written down. We
can do that. But it's tricky, right? So
228
00:21:48,519 --> 00:21:52,930
these projects, these are by academic
standards all quite big, you know? This is
229
00:21:52,930 --> 00:21:58,009
something like, I don't know, 10 people
for a few years or something. By industry
230
00:21:58,009 --> 00:22:03,399
scale, maybe not that big. By your scale,
maybe not that big. But still a lot of
231
00:22:03,399 --> 00:22:08,149
work for a research group to do. And quite
high-end work. It can take, you know,
232
00:22:08,149 --> 00:22:13,730
literally a few years to become really
fluent in using these tools, right? And
233
00:22:13,730 --> 00:22:20,139
there isn't as yet sort of an open-source,
collaborative movement doing this kind of
234
00:22:20,139 --> 00:22:26,710
stuff. Maybe it's the time to start. Maybe
in five years, ten years. I don't know. If
235
00:22:26,710 --> 00:22:35,680
you want a challenge, there's a challenge.
But it's really quite hard, right? It's
236
00:22:35,680 --> 00:22:40,830
not something that one can quite put
forward credibly as a solution for all
237
00:22:40,830 --> 00:22:46,529
software and hardware development, right?
So that leads me to, like an intermediate
238
00:22:46,529 --> 00:22:53,399
point. That should have been a four there.
What can we do which improves the quality
239
00:22:53,399 --> 00:22:58,340
of our system, which injects some
mathematical rigour but which involves
240
00:22:58,340 --> 00:23:06,370
less work and is kind of easy? And what we
can do is take some of these interfaces
241
00:23:06,370 --> 00:23:11,679
and be precise about what they are. And
initially, we have to do that after the
242
00:23:11,679 --> 00:23:16,941
fact. We have to reverse-engineer good
specs of existing stuff. But then we can
243
00:23:16,941 --> 00:23:23,470
use the same techniques to make much
cleaner and better-tested and things which
244
00:23:23,470 --> 00:23:30,670
are easier to test in the future. That's
the idea. So my colleagues and I have been
245
00:23:30,670 --> 00:23:35,360
doing this kind of stuff for the last
quite a few years and I just want to give
246
00:23:35,360 --> 00:23:41,570
you just a - two little vignettes just to
sort of show you how it goes. So I can't
247
00:23:41,570 --> 00:23:46,759
give you much detail. And this is,
obviously, joint work with a whole bunch
248
00:23:46,759 --> 00:23:55,399
of very smart and good people, some of
which I name here. So this is not me, this
249
00:23:55,399 --> 00:24:00,840
is a whole community effort. So I'm going
to talk for a little bit about the
250
00:24:00,840 --> 00:24:05,799
behaviour of multiprocessors, so at that
hardware interface. And I'm going to talk
251
00:24:05,799 --> 00:24:11,090
a tiny little bit about the behaviour of
the TCP and socket API, right? Network
252
00:24:11,090 --> 00:24:17,169
protocols. So - and there'll be some
similarities and some differences between
253
00:24:17,169 --> 00:24:24,990
these two things. So multiprocessors. You
probably want your computers to go fast.
254
00:24:24,990 --> 00:24:31,090
Most people do. And it's an obvious idea
to glom together, because processors don't
255
00:24:31,090 --> 00:24:35,529
go that fast, let's glom together a whole
bunch of processors and make them all talk
256
00:24:35,529 --> 00:24:46,150
to the same memory. This is not a new
idea. It goes back at least to 1962 when
257
00:24:46,150 --> 00:24:51,380
the Burroughs D825 had, I think, two
processors talking to a single shared
258
00:24:51,380 --> 00:24:55,850
memory. It had outstanding features
including truly modular hardware with
259
00:24:55,850 --> 00:25:02,649
parallel processing throughout, and - some
things do not change very much - the
260
00:25:02,649 --> 00:25:11,640
complement of compiling languages was to
be expanded, right? 1962, so that'll be 52
261
00:25:11,640 --> 00:25:18,820
years ago. Deary me. Okay. Now, how do
these things behave? So let me show you
262
00:25:18,820 --> 00:25:22,899
some code, right? It's a hacker
conference. There should be code. Let me
263
00:25:22,899 --> 00:25:29,030
show you two example programs and these
will both be programs with two threads and
264
00:25:29,030 --> 00:25:34,570
they will both be acting on two shared
variables, X and Y. And in the initial
265
00:25:34,570 --> 00:25:42,899
state, X and Y are both zero. So, the
first program. On this thread we write X
266
00:25:42,899 --> 00:25:48,970
and then we read Y, and over here we write
Y and then we read X, right? Now, these
267
00:25:48,970 --> 00:25:53,629
are operating, you know, in an
interleaving concurrency, you might think,
268
00:25:53,629 --> 00:25:58,030
so there's no synchronisation between
those two threads so we might interleave
269
00:25:58,030 --> 00:26:03,269
them. We might run all of thread 0 before
all of thread 1 or all of thread 1 before
270
00:26:03,269 --> 00:26:10,340
all of thread 0 or they might be mixed in
like that or like that or like that or
271
00:26:10,340 --> 00:26:16,940
like that. There's six possible ways of
interleaving two lists of two things. And
272
00:26:16,940 --> 00:26:22,980
in all of those ways you never see in the
same execution that this reads from the
273
00:26:22,980 --> 00:26:30,169
initial state AND this reads from the
initial state. You never see that. So you
274
00:26:30,169 --> 00:26:35,479
might expect and you might assume in your
code that these are the possible outcomes.
275
00:26:35,479 --> 00:26:40,490
So what happens if you were to actually
run that code on my laptop in a particular
276
00:26:40,490 --> 00:26:46,970
test harness? Well, okay, you see
occasionally they're quite well
277
00:26:46,970 --> 00:26:52,499
synchronised and they both read the other
guy's right. Sorry, big call. They both
278
00:26:52,499 --> 00:26:57,510
read the other guy's right. And quite
often one thread comes completely before
279
00:26:57,510 --> 00:27:09,769
the other. But sometimes, sometimes they
both see 0. Huh. Rats. Well, that's
280
00:27:09,769 --> 00:27:18,840
interesting. Let me show you another
program. So now, thread 0 writes some
281
00:27:18,840 --> 00:27:23,440
stuff. Maybe it writes a big bunch of
data, maybe multi-word data. And then it
282
00:27:23,440 --> 00:27:28,249
writes, let's say that's a flag announcing
to some another thread that that data is
283
00:27:28,249 --> 00:27:33,249
ready to go now. And the other thread
busy-waits reading that value until it
284
00:27:33,249 --> 00:27:38,710
gets 1, until it sees the flag. And then
it reads the data. So you might want, as a
285
00:27:38,710 --> 00:27:43,610
programmer, you might want a guarantee
that this read will always see the data
286
00:27:43,610 --> 00:27:48,759
that you have initialised. Hey? That would
be like a message-passing kind of an
287
00:27:48,759 --> 00:27:56,229
idiom. So what happens if you run that?
Well, on the x86 processors that we've
288
00:27:56,229 --> 00:28:03,929
tested, you always see that value. Good.
This is a descent coding idiom on x86. All
289
00:28:03,929 --> 00:28:12,230
right. On Arma IBM Power processors,
however, you see sometimes you just ignore
290
00:28:12,230 --> 00:28:16,850
the value written and you see the initial
state. So this is not a good
291
00:28:16,850 --> 00:28:25,539
message-passing idiom, right? So what's
going on? Well, these behaviours, it might
292
00:28:25,539 --> 00:28:31,260
be surprising, eh? And when you see
surprising behaviour in hardware, you have
293
00:28:31,260 --> 00:28:36,100
two choices. Either it's a bug in the
hardware, and we have found a number of
294
00:28:36,100 --> 00:28:45,259
bugs in hardware. It's always - it's very
rewarding. Or it's a bug in your
295
00:28:45,259 --> 00:28:51,350
expectation. Or it's a bug in your test
harness. Right? So what you do is you walk
296
00:28:51,350 --> 00:28:58,050
along and you walk up to your friendly
processor architect in IBM or ARM or x86
297
00:28:58,050 --> 00:29:04,770
land, and we have worked with all of these
people. And you ask them, "Is this a bug?"
298
00:29:04,770 --> 00:29:12,360
And a processor architect is a person who
is - has the authority to decide whether
299
00:29:12,360 --> 00:29:16,240
some behaviour is intended or is a bug.
300
00:29:16,240 --> 00:29:17,750
laughter
301
00:29:17,750 --> 00:29:24,669
That's what they do, essentially. And for
these, they say, "Oh, we meant it to be
302
00:29:24,669 --> 00:29:29,239
like that". Right? No question. "We meant
it to be like that. That's perfectly
303
00:29:29,239 --> 00:29:36,559
proper. We have, because you and everybody
else wanted their computers to go fast, we
304
00:29:36,559 --> 00:29:41,690
as processor designers have introduced all
manner of sophisticated optimizations,
305
00:29:41,690 --> 00:29:45,200
which if you were running sequential code,
you would never notice, but if you start
306
00:29:45,200 --> 00:29:49,289
running fancy concurrent code like this,
fancy concurrent code that isn't just
307
00:29:49,289 --> 00:29:56,049
trivially locked, you can notice", so this
is intentional behaviour. And if you want
308
00:29:56,049 --> 00:30:00,129
to write that code, like a mutual
exclusion algorithm or a message-passing
309
00:30:00,129 --> 00:30:06,460
algorithm or something, then you need to
insert special instructions, memory
310
00:30:06,460 --> 00:30:11,180
barrier instructions. And so you go away
and you say, "What do they do?" And you
311
00:30:11,180 --> 00:30:18,949
look up in the vendor documentation. And
you get stuff like this. I'll read it out:
312
00:30:18,949 --> 00:30:22,190
(Reads quickly) "For any applicable pair
AB, the memory barrier ensures that a will
313
00:30:22,190 --> 00:30:24,740
be performed with respect to any processor
or mechanism, to the extent required by
314
00:30:24,740 --> 00:30:27,590
the associated memory coherence required
at beauties, before b is performed with
315
00:30:27,590 --> 00:30:30,360
respect to that prior or mechanism. A
includes all applicable storage accesses
316
00:30:30,360 --> 00:30:33,090
by any such processor or mechanism that
have been performed with respect to P1
317
00:30:33,090 --> 00:30:36,210
before the memory barrier is created. B
includes all applicable storage accesses
318
00:30:36,210 --> 00:30:38,670
by any such processor or mechanism that
are performed after a Load instruction
319
00:30:38,670 --> 00:30:41,290
executed by that processor or mechanism
has returned the value stored by a store
320
00:30:41,290 --> 00:30:42,700
that is in B."
321
00:30:42,700 --> 00:30:49,850
applause
322
00:30:49,850 --> 00:30:58,630
Hands up if you understand what that
means? Hands up if you think that if you
323
00:30:58,630 --> 00:31:02,139
thought about it and read the rest of the
book, you would understand or you could
324
00:31:02,139 --> 00:31:09,730
understand what that means? Hands up the
people who think that we're doomed forever?
325
00:31:09,730 --> 00:31:12,920
laughter
326
00:31:12,920 --> 00:31:16,979
So I'm sorry to the first few groups,
but the last ones are right.
327
00:31:16,979 --> 00:31:22,299
laughterapplause
328
00:31:22,299 --> 00:31:28,850
This looks like it's really intricate and
really carefully thought out stuff. And we
329
00:31:28,850 --> 00:31:32,220
thought that for a while and we spent
literally years trying to make precise
330
00:31:32,220 --> 00:31:37,159
mathematical models that give precise
meaning to these words, but actually it
331
00:31:37,159 --> 00:31:45,460
can't be done. So what do you have to do
in that circumstance? You have to go away
332
00:31:45,460 --> 00:31:52,120
and you have to invent some kind of a
model. And there's a - this is a really
333
00:31:52,120 --> 00:31:56,190
fundamental problem, that we - on the one
hand, we develop our software by this
334
00:31:56,190 --> 00:32:01,960
trial and error process, but on the other
hand are points like this. We have these
335
00:32:01,960 --> 00:32:06,470
loose specifications, supposedly, that
have to cover all manner of behaviour of
336
00:32:06,470 --> 00:32:10,730
many generations of processors that might
all behave the same way, written just in
337
00:32:10,730 --> 00:32:16,769
text. And we tell people they should write
"to the spec", but the only way they have
338
00:32:16,769 --> 00:32:22,450
of developing their code is to run it and
run particular executions on the
339
00:32:22,450 --> 00:32:26,960
particular hardware implementations that
they have, whose relationship to the spec
340
00:32:26,960 --> 00:32:34,740
is very elusive. We can't use those thick
books that you get or those quite weighty
341
00:32:34,740 --> 00:32:39,260
PDFs that you get these days from the
processor vendors to test programs. You
342
00:32:39,260 --> 00:32:43,970
can't feed a program into that thick book
and get output out. And you can't test
343
00:32:43,970 --> 00:32:47,429
processor implementations and you can't
prove anything and you can't even
344
00:32:47,429 --> 00:32:53,430
communicate between human beings, right?
So these things, really, they don't exist.
345
00:32:53,430 --> 00:32:58,500
So what can we do? Well, the best we can
do at this point in time is a bit of
346
00:32:58,500 --> 00:33:04,950
empirical science, right? So we can invent
some model off the top of our heads,
347
00:33:04,950 --> 00:33:09,649
trying to describe just the
programmer-visible behaviour, not the
348
00:33:09,649 --> 00:33:15,529
internal structure. And we can make a
tool. Because that's not prose now. Now we
349
00:33:15,529 --> 00:33:18,950
can - now it's stuff. It's real
mathematics and we can turn that into code
350
00:33:18,950 --> 00:33:23,200
and execute it. We can make tools that
take programs and execute it and - small
351
00:33:23,200 --> 00:33:28,280
programs - tell us the set of all
behaviours allowed by that model. And then
352
00:33:28,280 --> 00:33:33,759
we can compare those sets of behaviours
against the experimental observations. And
353
00:33:33,759 --> 00:33:38,360
you might have to fix the model and you
might find hard bugs. And then the model
354
00:33:38,360 --> 00:33:42,869
also has to make sense to the architect so
you can discuss with the architects what
355
00:33:42,869 --> 00:33:47,100
they intend and what their institution is,
and then you can also prove some other
356
00:33:47,100 --> 00:33:50,980
facts about it to give a bit more
assurance and then because you probably
357
00:33:50,980 --> 00:33:57,249
got this wrong the first couple of times,
you can go back. And this results or has
358
00:33:57,249 --> 00:34:04,509
resulted in models which are not
guaranteed to be correct, but they are
359
00:34:04,509 --> 00:34:08,639
effectively the de facto standard for
understanding the concurrency behaviour of
360
00:34:08,639 --> 00:34:14,300
these things, you know? And various people
use them. And just to give you a tiny
361
00:34:14,300 --> 00:34:18,570
snippet - I'll have to speed up a bit - a
tiny snippet of what the model looks like,
362
00:34:18,570 --> 00:34:22,969
it's basically just an abstract machine.
It's got some stuff for the threads that
363
00:34:22,969 --> 00:34:26,949
handles the programmer-visible speculative
execution, and some stuff for a storage
364
00:34:26,949 --> 00:34:32,739
subsystem that handles the fact that in
these architectures, memory rights can be
365
00:34:32,739 --> 00:34:38,270
propagated to different threads at
different times, right? And there's a -
366
00:34:38,270 --> 00:34:42,540
that model has a state which is just a
collection of some sets and lists and
367
00:34:42,540 --> 00:34:47,230
partial orders and a few other things
which I won't talk about. And it has some
368
00:34:47,230 --> 00:34:51,030
transitions, right? In any state, there
might be several possible transitions.
369
00:34:51,030 --> 00:34:54,699
This is just a sample. I'm not going to
explain all of this. But when you want to
370
00:34:54,699 --> 00:34:58,570
propagate a write to another thread, there
have to be some preconditions that you have to
371
00:34:58,570 --> 00:35:07,140
satisfy. And than there is an action. It's not
amazingly complicated, really. So this is text,
372
00:35:07,140 --> 00:35:13,319
but it's very precisely written text and it has
the great advantage that you can go through
373
00:35:13,319 --> 00:35:16,570
these bullet points with your friendly
architect and say, "Is this what you
374
00:35:16,570 --> 00:35:22,440
mean?" And they can think about it and say
yes or no. But for the actual definition,
375
00:35:22,440 --> 00:35:27,309
that's in mathematics but it's mathematics
that's quite close to code. I mean, it's
376
00:35:27,309 --> 00:35:32,380
terribly close to pure, functional code
with outside effects. And just to give you
377
00:35:32,380 --> 00:35:37,230
an idea, this is some of it and those are
three of those conditions in the real,
378
00:35:37,230 --> 00:35:42,660
honest, true version. Then we can take
this mathematics and because it's in a
379
00:35:42,660 --> 00:35:49,270
particular style, we can compile this into
actually OCaml and then OCaml byte code
380
00:35:49,270 --> 00:35:55,770
and then we can run it for batch jobs. But
then you can compile that OCaml byte code
381
00:35:55,770 --> 00:36:02,579
to JavaScript and glue on an user interface
and stick it into a web browser and then you
382
00:36:02,579 --> 00:36:07,760
have a web interface that lets people
explore this model. And that's also a
383
00:36:07,760 --> 00:36:16,190
necessary part of validating that it makes
sense. Okay. This is not rocket science.
384
00:36:16,190 --> 00:36:21,710
You've missed the talk for rocket science,
I'm sorry. All we're doing here is
385
00:36:21,710 --> 00:36:28,630
specifying the intended behaviour of a
system. Okay? That's - it's not very
386
00:36:28,630 --> 00:36:32,490
technical but it's unusual. And we happen
to be doing it in this mathematical
387
00:36:32,490 --> 00:36:37,049
language, but you could use in fact almost
any language so long as you understand
388
00:36:37,049 --> 00:36:40,170
what you're doing. What you have to
understand is that you're writing
389
00:36:40,170 --> 00:36:45,230
something which is not an implementation.
It is something which, given some observed
390
00:36:45,230 --> 00:36:49,810
behaviour, some arbitrary observed
behaviour, will be able to decide if it's
391
00:36:49,810 --> 00:36:54,580
allowed, if you want it to be allowed or
not, right? It has to be executed as a
392
00:36:54,580 --> 00:37:02,059
test oracle. The key question for getting
this to work is to understand how much
393
00:37:02,059 --> 00:37:05,730
non-determinism or loose specification
there is in the system that you're working
394
00:37:05,730 --> 00:37:08,890
with, right? So if everything is
completely deterministic, you can write a
395
00:37:08,890 --> 00:37:13,839
reference implementation in the cleanest
language of your choice and just compare
396
00:37:13,839 --> 00:37:19,270
the output of that and the output of the
real thing, right? But if there's more
397
00:37:19,270 --> 00:37:24,000
non-determinism, then you can't do that.
I'm going to have to abbreviate this a
398
00:37:24,000 --> 00:37:31,369
little tiny bit. So for the TCP network
protocols, there is more non-determinism.
399
00:37:31,369 --> 00:37:35,220
You know what the TCP is, yes? A protocol
that gives you sort of reliable
400
00:37:35,220 --> 00:37:41,720
connections, assuming that the world is
made of good people, right? You look at
401
00:37:41,720 --> 00:37:47,640
the standards for TCP, and they're the
same kind of wibbly text from the 1980s
402
00:37:47,640 --> 00:37:52,400
and you look at the implementations and
they are a ghastly mess. And you try and
403
00:37:52,400 --> 00:37:55,690
understand the relationship between the
two of them and you can't because the
404
00:37:55,690 --> 00:38:01,740
standard, again, is not the definition. It
doesn't define in all circumstances what
405
00:38:01,740 --> 00:38:07,010
behaviour is allowed and what isn't. So
again, we had to make up a specification
406
00:38:07,010 --> 00:38:14,310
in the first instance just of this host, a
single endpoint, and observing its
407
00:38:14,310 --> 00:38:18,510
sockets, API, calls and returns and its
behaviour on the wire, right? And when
408
00:38:18,510 --> 00:38:21,969
you've decided - and a few internal debug
events. And when you've decided what to
409
00:38:21,969 --> 00:38:26,619
observe, then observation is just a trace,
it's just a list of those events. And you
410
00:38:26,619 --> 00:38:34,750
have to be able to ask your spec "Is that
list admissible or not"? But there's a lot
411
00:38:34,750 --> 00:38:38,820
of non-determinism. Variation between the
implementations, variations within the
412
00:38:38,820 --> 00:38:45,650
implementations. And that's internal. It's
not announced in the API or on the wire
413
00:38:45,650 --> 00:38:50,980
maybe until much later when the
implementation picks a new window size or
414
00:38:50,980 --> 00:38:56,890
something. You can't tell until quite a
lot later in the trace what it's picked.
415
00:38:56,890 --> 00:39:02,740
And that makes the job of checking whether
a trace is admissible by the spec much
416
00:39:02,740 --> 00:39:08,559
harder than it has to be. Right? And if
you had designed the TCP protocol and
417
00:39:08,559 --> 00:39:14,470
those implementations for to be testable
in this very discriminating way, back in
418
00:39:14,470 --> 00:39:20,470
the 1980s when you were designing TCP
protocol, it would be much easier. And for
419
00:39:20,470 --> 00:39:25,650
new protocols, one should make sure that
you're doing this in this particular way.
420
00:39:25,650 --> 00:39:30,930
I don't think I want to talk about that
slide. That's just one of the rules of
421
00:39:30,930 --> 00:39:35,480
that specification. But the key fact about
that spec is the - well, we handled the
422
00:39:35,480 --> 00:39:40,210
real protocols for arbitrary inputs. But
it's structured not just for this
423
00:39:40,210 --> 00:39:46,240
executability as a test oracle, but it's
structured for clarity, not performance,
424
00:39:46,240 --> 00:39:49,750
which is scarcely ever what anyone ever
does, right? So it's a whole new kind of
425
00:39:49,750 --> 00:39:54,480
thing. And the testing is very
discriminating. So we found all manner of
426
00:39:54,480 --> 00:40:03,839
amusing and bizarre bugs which I think I
don't have time to talk about. Okay. So
427
00:40:03,839 --> 00:40:12,610
I've described three kinds of things that
we might do. The first thing - for
428
00:40:12,610 --> 00:40:18,800
goodness sake - I mean, it's just a
no-brainer. Just do it already. Everybody.
429
00:40:18,800 --> 00:40:29,670
All of you. All of you. Right? This middle
zone is a very interesting zone as far as
430
00:40:29,670 --> 00:40:33,920
I'm concerned, right? It's - out of what -
a place where we could get a very good
431
00:40:33,920 --> 00:40:39,220
benefit for not that much effort, right?
We can do it, if necessary, post hoc. We
432
00:40:39,220 --> 00:40:44,490
can also do it and even better, at design
time. We have to do this in a way that
433
00:40:44,490 --> 00:40:48,240
makes our executions testable as test
oracles, and another thing that that
434
00:40:48,240 --> 00:40:52,119
enables is completely random testing. When
you've got a test oracle, you can just
435
00:40:52,119 --> 00:40:58,210
feed in random goop. This is fuzzing, but
better fuzzing - feed in random goop and
436
00:40:58,210 --> 00:41:02,859
check at every point whether what the
behaviour that you're getting is allowable
437
00:41:02,859 --> 00:41:09,271
or not. And then they're written for
clarity, not for performance, and that
438
00:41:09,271 --> 00:41:13,619
means you can see what you're doing,
right? You can see the complexity. If you
439
00:41:13,619 --> 00:41:17,140
go ahead and you add some feature to your
protocol or your programming language or
440
00:41:17,140 --> 00:41:22,420
whatever and you're working just with text
specifications, then you can't see the
441
00:41:22,420 --> 00:41:24,759
interactions. You just chuck in a couple
of paragraphs and you think for a few
442
00:41:24,759 --> 00:41:30,210
minutes, right? And if you're lucky, you
make an implementation. But if you're
443
00:41:30,210 --> 00:41:35,309
writing a spec that has to cover all the
cases like this, the act of doing that
444
00:41:35,309 --> 00:41:41,150
encourages you to think or helps you think
about the excess complexity. And you might
445
00:41:41,150 --> 00:41:46,690
think that's too complex, I'm being silly,
I'll make it simpler. And it has to be
446
00:41:46,690 --> 00:41:51,630
usable for proof. So this, I think also is
pretty much a no-brainer. And it doesn't
447
00:41:51,630 --> 00:41:56,650
require great technical, you know,
mathematical skill either, right? Lots of
448
00:41:56,650 --> 00:42:01,760
people can do this. And then there's
option 3, best option, full-on formal
449
00:42:01,760 --> 00:42:07,930
verification of the key components. And
that is now also in reach. You can imagine
450
00:42:07,930 --> 00:42:16,309
secure systems made using actual verified
compilers and verified hypervisors with
451
00:42:16,309 --> 00:42:22,540
verified TLS stacks and you can imagine
making things out of those or making those
452
00:42:22,540 --> 00:42:27,870
and then making things out of those. And
maybe one should be thinking about that.
453
00:42:27,870 --> 00:42:34,070
So, maybe not appropriate for everything.
If you were writing Word, you would not
454
00:42:34,070 --> 00:42:38,599
wish to do these things. Probably, you're
not. But for this key infrastructure that
455
00:42:38,599 --> 00:42:46,420
we really depend on, we trust even though
it's not trustworthy, we have to do this.
456
00:42:46,420 --> 00:42:52,450
Is this a new dawn? I wonder if there's
anyone standing next to a light switch
457
00:42:52,450 --> 00:42:58,920
that can dim them for me. I didn't ask
them beforehand, so... If you think back
458
00:42:58,920 --> 00:43:05,540
the last 70-odd years, we started building
computers in around 1940. It's been pretty
459
00:43:05,540 --> 00:43:12,760
dark from the point of view of rigorous,
solid, reliable engineering, stuff that is
460
00:43:12,760 --> 00:43:21,760
actually trustworthy. Pretty dark, I would
say. Maybe, just maybe there's a tiny
461
00:43:21,760 --> 00:43:26,339
smidgen of light coming through the doors.
And we start to have a little bit of a
462
00:43:26,339 --> 00:43:31,109
clue and we start to get reusable models
of processor architectures and programming
463
00:43:31,109 --> 00:43:35,540
languages and network protocols and what
have you. It's just the beginnings of the
464
00:43:35,540 --> 00:43:39,790
analogue of that thermodynamics and
materials science and quality control
465
00:43:39,790 --> 00:43:47,819
management and what have you that we have
from mechanical engineering. And we've got
466
00:43:47,819 --> 00:43:57,200
no choice. If we don't, the next 75 years
is going to be a lot worse. You can just
467
00:43:57,200 --> 00:44:02,400
imagine, right? So I went to this - as I'm
sure many of you do - this great talk on
468
00:44:02,400 --> 00:44:08,370
some Apple boot loader exploit yesterday
which was relying on some feature that had
469
00:44:08,370 --> 00:44:14,900
been introduced in the early '80s and was
still present. And you can imagine in 100
470
00:44:14,900 --> 00:44:22,800
years from now, you can imagine as long as
human civilisation endures, the x86
471
00:44:22,800 --> 00:44:28,869
architecture and the socket API and all of
this stuff, it will be embedded in some
472
00:44:28,869 --> 00:44:34,099
monumental ghastly stack of
virtualisation layers forever.
473
00:44:34,099 --> 00:44:45,760
laughterapplause
474
00:44:45,760 --> 00:44:50,720
So I'd like to thank especially all of my
colleagues that have been working with me
475
00:44:50,720 --> 00:44:56,550
or not with me in these directions. I'd
like to thank our generous funders who
476
00:44:56,550 --> 00:45:01,389
support this stuff. I'd like to thank you
for your attention and I exhort you,
477
00:45:01,389 --> 00:45:04,850
sort it out.
478
00:45:04,850 --> 00:45:20,800
applause
479
00:45:20,800 --> 00:45:25,300
Herald: Thank you very much, Peter, for
those interesting insights to our
480
00:45:25,300 --> 00:45:31,269
programming. We have now time for a Q & A,
so those people who have to leave the
481
00:45:31,269 --> 00:45:39,970
room, please do so quietly and as fast as
possible, so we can go on and hear what
482
00:45:39,970 --> 00:45:51,270
the questions are. So we
start with microphone 4, please.
483
00:45:51,270 --> 00:45:58,790
Mic 4: Hello. Thanks for the talk and my
question is if you got an oracle of the
484
00:45:58,790 --> 00:46:04,270
software behaviour which can translate
every possible input to a correct output,
485
00:46:04,270 --> 00:46:08,410
how can this be less complex and
error-prone than the reference
486
00:46:08,410 --> 00:46:10,510
implementation?
487
00:46:10,510 --> 00:46:15,620
Peter: Good question. So the first point
is that this oracle doesn't have to
488
00:46:15,620 --> 00:46:22,010
produce the outputs. It only has to check
that the inputs and the outputs match up.
489
00:46:22,010 --> 00:46:26,560
And then the second point is that in
general it might have to have much of the
490
00:46:26,560 --> 00:46:33,221
same complexity, but by writing it to be
clear as opposed to to be fast, you may
491
00:46:33,221 --> 00:46:37,730
adopt a completely different structure,
right? So the structure of our TCP spec is
492
00:46:37,730 --> 00:46:43,540
organised by the behaviour that you see
for particular transitions. The structure
493
00:46:43,540 --> 00:46:49,160
of a real implementation has fast-path
code and lots of complicated intertwined
494
00:46:49,160 --> 00:46:55,309
branching and all manner of excess
complexity, of implementation complexity,
495
00:46:55,309 --> 00:46:56,879
if you will.
496
00:46:56,879 --> 00:47:01,450
Mic 4: Thanks.
Herald: So microphone 3, please?
497
00:47:01,450 --> 00:47:06,710
Mic 3: I wanted to ask you what you
thought about programming by manipulating
498
00:47:06,710 --> 00:47:13,859
the abstract syntax tree directly so as to
not allow incompilable programs because of
499
00:47:13,859 --> 00:47:18,410
some, like, you're missing a semicolon or
something like that. What's your thoughts
500
00:47:18,410 --> 00:47:20,220
on that?
501
00:47:20,220 --> 00:47:23,700
Peter: So that's - in the grand scheme of
things, I think that's not a very big
502
00:47:23,700 --> 00:47:28,890
deal, right? So there's - people have
worked on structured editors for lordy
503
00:47:28,890 --> 00:47:33,309
knows how many years. It's not a big deal
because it's very easy for a compiler to
504
00:47:33,309 --> 00:47:38,569
detect that kind of thing. And even more,
it's very easy for a compiler of a
505
00:47:38,569 --> 00:47:43,140
sensibly designed language to detect type
errors, even for a very sophisticated type
506
00:47:43,140 --> 00:47:49,589
system. So I don't think that - that's not
- I mean, some people might find it
507
00:47:49,589 --> 00:47:52,930
helpful, but I don't think it's getting to
the heart of the matter.
508
00:47:52,930 --> 00:47:54,250
Mic 3: Thanks.
509
00:47:54,250 --> 00:47:59,850
Herald: So we've got some questions
from our signal angels from the IRC.
510
00:47:59,850 --> 00:48:06,150
Signal angel: Yes. There's one question.
It's about the repository for Isabelle and
511
00:48:06,150 --> 00:48:11,300
HOL proofs. It should be on source forge,
and you said there are no open source
512
00:48:11,300 --> 00:48:15,480
repositories. What about them?
513
00:48:15,480 --> 00:48:20,589
Peter: I'm not quite sure what the
question is, really. So there's a
514
00:48:20,589 --> 00:48:28,230
repository of Isabelle formal proofs which
is the archival form of proofs, it's
515
00:48:28,230 --> 00:48:34,790
called. I didn't really mean to say there
are no open source repositories and in
516
00:48:34,790 --> 00:48:38,500
fact I don't know under what conditions
most of those proofs are licensed. They
517
00:48:38,500 --> 00:48:43,210
probably are open. But there isn't an open
source community building these things,
518
00:48:43,210 --> 00:48:47,710
right? It's still pretty
much an academic pursuit.
519
00:48:47,710 --> 00:48:50,329
Herald: Microphone 2, please.
520
00:48:50,329 --> 00:48:55,530
Mic 2: Hello. Thanks again for your talk.
I just want to add something that you
521
00:48:55,530 --> 00:49:01,970
didn't address, and that's that we
actually need more good studies in
522
00:49:01,970 --> 00:49:07,970
software engineering. We often hear a lot
of experts and also very well-known
523
00:49:07,970 --> 00:49:11,740
computer scientists say things like,
"Well, we just need to do functional
524
00:49:11,740 --> 00:49:17,839
programming and OOP is bad and stuff like
that", which I think there's a lot of
525
00:49:17,839 --> 00:49:24,109
truth to it, but we actually need studies
where we test these kinds of claims that
526
00:49:24,109 --> 00:49:29,530
we make, because when you look at other
fields which it also did compare to, like,
527
00:49:29,530 --> 00:49:34,250
medicine, if somebody comes around and
is well-known and says things like,
528
00:49:34,250 --> 00:49:42,230
"homeopathy works", we don't believe them.
We do trials, we do good trials. And
529
00:49:42,230 --> 00:49:50,430
there's a lot of myths out there, like, or
not well tested things, like "hire the
530
00:49:50,430 --> 00:49:54,739
best programmers, they are 100 times
more productive", "do steady type
531
00:49:54,739 --> 00:50:00,430
systems", and stuff like that. And we need
to verify that this is true, that this
532
00:50:00,430 --> 00:50:02,210
helps. And it's...
533
00:50:02,210 --> 00:50:07,490
Peter: Okay. So in the grand scheme of
things, arguably you're right. In the
534
00:50:07,490 --> 00:50:12,739
grandest scheme of things, computer
science is actually a branch of psychology
535
00:50:12,739 --> 00:50:18,609
or really sociology. We are trying to
understand how large groups of people can
536
00:50:18,609 --> 00:50:25,839
combine to make things which are insanely
complicated. Now, for - would it be good
537
00:50:25,839 --> 00:50:31,329
if we had, you know, solid evidence that
programming in this language was better
538
00:50:31,329 --> 00:50:38,450
than programming in that language? Well,
yes, but it's staggeringly difficult and
539
00:50:38,450 --> 00:50:45,079
expensive to do honest experiments of that
nature and they're scarcely ever done,
540
00:50:45,079 --> 00:50:49,040
right? You might do little experiments on
little groups of students but it's really
541
00:50:49,040 --> 00:50:55,930
difficult. And then some of these things
which I'm saying, when one is familiar
542
00:50:55,930 --> 00:51:01,740
with the different possible options, are
just blindingly obvious. I mean, there are
543
00:51:01,740 --> 00:51:09,161
reasons why these people are using OCaml
for their system programming, right? It's
544
00:51:09,161 --> 00:51:16,609
not - you know, it's not - "Homeopathy is
right", but "Homeopathy is wrong" which is
545
00:51:16,609 --> 00:51:21,230
the kind of argument being put forward.
546
00:51:21,230 --> 00:51:24,260
Herald: Question from
microphone 5, please.
547
00:51:24,260 --> 00:51:29,609
Mic 5: So where are you using ECC
memory for your testing - up here.
548
00:51:29,609 --> 00:51:33,510
Peter: When you say up here,
there's a bit of ambiguity.
549
00:51:33,510 --> 00:51:34,819
Mic 5: Here.
550
00:51:34,819 --> 00:51:37,860
Peter: Thank you. Say again?
551
00:51:37,860 --> 00:51:43,119
Mic 5: So where are you using ECC memory
for the testing you showed about the
552
00:51:43,119 --> 00:51:51,070
results of the multithreaded results due
to memory barriers and memory reorderings?
553
00:51:51,070 --> 00:51:55,450
Peter: Well...
554
00:51:55,450 --> 00:52:00,130
Mic 5: Okay, but even if you were or you
were not, the point I want to make is that
555
00:52:00,130 --> 00:52:05,950
you can expect about 1 bit flip each
minute in a modern computer system that
556
00:52:05,950 --> 00:52:13,670
may completely change what your software
is doing, so perhaps we also have to look
557
00:52:13,670 --> 00:52:19,180
in ways how we can work if something goes
wrong at the very low level so that we
558
00:52:19,180 --> 00:52:25,010
kind of have a check against our
specification on a more higher level of
559
00:52:25,010 --> 00:52:31,940
our software. So it is valuable to do good
engineering on the low levels, but still I
560
00:52:31,940 --> 00:52:37,670
think we will not solve the problems of
computation and computer engineering just
561
00:52:37,670 --> 00:52:42,520
by proving things in the mathematical
domain because computers are physical
562
00:52:42,520 --> 00:52:47,739
entities. They have errors and so on and
we really have to deal with them as well.
563
00:52:47,739 --> 00:52:54,340
Peter: So it's certainly true that there
are such random errors. In fact, I would
564
00:52:54,340 --> 00:52:59,480
put the point that I would argue that you
have to be able to model the physics well
565
00:52:59,480 --> 00:53:03,680
enough and you have to be able to model
the statistics of those errors well
566
00:53:03,680 --> 00:53:07,750
enough, so that's a different kind of
mathematics. And it's certainly valuable
567
00:53:07,750 --> 00:53:13,290
and necessary. But if you look at the
statistic you just quoted, is that the
568
00:53:13,290 --> 00:53:22,109
main cause of why systems go wrong? Except
possibly for space hardware, no. So
569
00:53:22,109 --> 00:53:26,499
important, yes. The most important thing
that we should be paying attention to? I
570
00:53:26,499 --> 00:53:29,940
don't really think so.
571
00:53:29,940 --> 00:53:32,980
Herald: Microphone 6, please?
572
00:53:32,980 --> 00:53:41,200
Mic 6: Hi. I really think that what you're
proposing to specify and verify more key
573
00:53:41,200 --> 00:53:47,160
components is - would be a valuable
addition, but how do you make sure that
574
00:53:47,160 --> 00:53:51,880
your specification actually matches the
behaviour that you want to do or to have?
575
00:53:51,880 --> 00:53:58,300
Peter: So as I described, we do as partial
validation of those specifications, we do
576
00:53:58,300 --> 00:54:03,050
a lot of testing against the
implementations, against a range of
577
00:54:03,050 --> 00:54:07,060
existing implementations. That's one
source of validation. Another source of
578
00:54:07,060 --> 00:54:11,520
validation is that you talk to the
architects and the designers. You want the
579
00:54:11,520 --> 00:54:16,929
internal structure to match their intent.
You want it to be comprehensible to them.
580
00:54:16,929 --> 00:54:21,510
Another kind of validation that we do is
prove properties about them. So we proved
581
00:54:21,510 --> 00:54:28,750
that you can correctly compile from a C11
concurrency, a mathematical model of that,
582
00:54:28,750 --> 00:54:35,432
to a IBM Power concurrency. And that kind
of proof gives you a bit more level of
583
00:54:35,432 --> 00:54:39,450
assurance in both. So none of this is
giving you a total guarantee. You
584
00:54:39,450 --> 00:54:43,980
certainly don't claim a total guarantee.
But it gives you pretty good levels of
585
00:54:43,980 --> 00:54:47,880
assurance by normal standards.
586
00:54:47,880 --> 00:54:49,580
Herald: Mic 4?
587
00:54:49,580 --> 00:54:56,309
Mic 4: Yes. Thanks again. You proposed
random tests, and with my expertise, it's
588
00:54:56,309 --> 00:55:02,859
very annoying if you have tests where the
outcome is sometimes a failure and you
589
00:55:02,859 --> 00:55:10,499
want to have reproducible tests to fix a
bug. So how do we bring this aspects to
590
00:55:10,499 --> 00:55:17,850
test more variety in data and to have it
reproducible together?
591
00:55:17,850 --> 00:55:24,800
Peter: Well, if - as I was mentioning, for
the TCP thing, one of the - so the problem
592
00:55:24,800 --> 00:55:29,150
is reproducibility is exactly this
internal non-determinism, right? The
593
00:55:29,150 --> 00:55:33,389
system makes this scheduling choice or
what have you and you can't see what it
594
00:55:33,389 --> 00:55:39,599
is, or the checker can't see what it is.
So one way to do that is to really design
595
00:55:39,599 --> 00:55:45,369
the protocols in a different way to expose
that non-determinism. The other fact about
596
00:55:45,369 --> 00:55:53,421
this kind of general purpose specification
as test oracle idea is that in some sense,
597
00:55:53,421 --> 00:55:59,240
it doesn't have to be reproducible. Right?
The specification is giving a yes-no
598
00:55:59,240 --> 00:56:05,510
answer for an arbitrary test. And that
means that you can use arbitrary tests.
599
00:56:05,510 --> 00:56:09,730
Figuring out the root causes of the
differences afterwards may be tricky, but
600
00:56:09,730 --> 00:56:13,200
you can use them for assurance.
601
00:56:13,200 --> 00:56:18,489
Herald: So we now got time for two
more questions. Microphone 1, please?
602
00:56:18,489 --> 00:56:23,069
Mic 1: Hiya. Thanks for a great talk. So
what you've described seems to be a middle
603
00:56:23,069 --> 00:56:28,259
ground between a full-on, formal
verification and more practical testing,
604
00:56:28,259 --> 00:56:34,309
like, in between. So where do you see in
the future where formal verifications will
605
00:56:34,309 --> 00:56:39,831
go? Will they converge to the middle or
whether it will still be there just to
606
00:56:39,831 --> 00:56:44,299
verify something that's more
well-specified?
607
00:56:44,299 --> 00:56:49,940
Peter: So the progress of sort of the
whole subject of formal verification in
608
00:56:49,940 --> 00:56:56,930
general, if you look at that over the last
10 years or so, it's been very - it's been
609
00:56:56,930 --> 00:57:01,349
really amazing compared with what we could
do in the 80s and 90s and early 2000s,
610
00:57:01,349 --> 00:57:06,480
right? So the scope of things, the scale
of things which are - for which it is
611
00:57:06,480 --> 00:57:11,180
becoming feasible to do verification is
getting bigger. And I think that will
612
00:57:11,180 --> 00:57:16,339
continue. You know, I don't know when you
might see a completely verified TOR
613
00:57:16,339 --> 00:57:23,980
client, let's say, but that's not
inconceivable now. And 20 years ago, that
614
00:57:23,980 --> 00:57:29,470
would have been completely inconceivable.
So that is expanding and at the same time,
615
00:57:29,470 --> 00:57:36,829
I hope we see more of this interface, text
based oracle specification - and these -
616
00:57:36,829 --> 00:57:41,000
when you're doing formal verification of a
piece of the system, you have to have
617
00:57:41,000 --> 00:57:47,810
these indicators already defined, all
right? So these two fit very well together.
618
00:57:47,810 --> 00:57:51,489
Herald: So the last question
from microphone 2, please.
619
00:57:51,489 --> 00:57:56,040
Mic 2: Hi. You mentioned that you often
find bugs in hardware. Is there any effort
620
00:57:56,040 --> 00:58:00,310
to verify the transistors on chips
themselves as well?
621
00:58:00,310 --> 00:58:03,440
Peter: So there's a whole world of
hardware verification. That wasn't my
622
00:58:03,440 --> 00:58:10,640
topic today. And most of the big processor
vendors have teams working on this.
623
00:58:10,640 --> 00:58:14,440
Unsurprisingly, if you remember your
history, many of them have teams focusing
624
00:58:14,440 --> 00:58:21,059
on the floating-point behaviour of their
processors. And they're doing - they're
625
00:58:21,059 --> 00:58:26,981
also doing by the standards of ten years
ago, pretty spectacularly well. So there
626
00:58:26,981 --> 00:58:33,181
are companies that have the whole of some
execution unit formally verified. There's
627
00:58:33,181 --> 00:58:36,979
been a lot of work over the years on
verification of cache protocols and such
628
00:58:36,979 --> 00:58:43,100
like. Right? There's a lot of work on not
verifying the hardware as a whole, but
629
00:58:43,100 --> 00:58:47,519
verifying that, you know, the RTL-level
description of the hardware matches some
630
00:58:47,519 --> 00:58:54,060
lower level description. So
there is a lot of that going on.
631
00:58:54,060 --> 00:58:57,619
Herald: Thank you very much
again for this great talk.
632
00:58:57,619 --> 00:59:00,480
Give him another applause.
Thank you, Peter.
633
00:59:00,480 --> 00:59:01,790
Peter: Thank you.
634
00:59:01,790 --> 00:59:04,219
applause
635
00:59:04,219 --> 00:59:15,000
subtitles created by c3subtitles.de
Join, and help us!