WEBVTT
00:00:09.519 --> 00:00:14.479
applause
00:00:14.479 --> 00:00:16.930
Peter Sewell: Thank you for that
introduction and to the organisers for
00:00:16.930 --> 00:00:23.459
this opportunity to speak to you guys. So
my name is Peter Sewell. I'm an academic
00:00:23.459 --> 00:00:30.609
computer scientist and as such, I bear a
heavy burden of guilt for what we have to
00:00:30.609 --> 00:00:38.040
work with. Shared guilt, to be fair, but
guilt all the same. In many, many years of
00:00:38.040 --> 00:00:44.670
higher education, I know two things about
computers. Maybe you've figured them out
00:00:44.670 --> 00:00:55.000
slightly faster. The first thing, there's
an awful lot of them about. Shocking. And
00:00:55.000 --> 00:01:03.300
the second is that they go wrong. They go
wrong a lot. Really a lot. A lot. A lot!
00:01:03.300 --> 00:01:09.009
And they go wrong in many interestingly
different ways. So sometimes they go wrong
00:01:09.009 --> 00:01:17.850
in spectacular ways, right? With
fireworks. And here, we see...
00:01:17.850 --> 00:01:24.030
laughterapplause
00:01:24.030 --> 00:01:31.260
Here we see half a billion dollars worth
of sparkle. You don't get that every day.
00:01:31.260 --> 00:01:34.970
And so what was going on? So as I
understand it, what was going on was that
00:01:34.970 --> 00:01:40.600
the - this was the first flight of the
Ariane 5 launcher and it reused some of
00:01:40.600 --> 00:01:45.851
the guidance software from Ariane 4.
That's fine. Good, tested code. Should
00:01:45.851 --> 00:01:53.030
reuse it. But the launch profile for
Ariane 5 went sideways a bit more - on
00:01:53.030 --> 00:01:58.830
purpose - than the Ariane 5 launch. And as
a result, one of the sideways values in
00:01:58.830 --> 00:02:04.740
the guidance control software overflowed.
Bad. And hence, the guidance software
00:02:04.740 --> 00:02:08.449
decided that something bad was going wrong
and in order to not land rockets on
00:02:08.449 --> 00:02:15.841
people's heads, it kindly blew itself up.
Okay. So that's a spectacular. Sometimes
00:02:15.841 --> 00:02:21.200
they go wrong in insidious ways. And you
people probably know much more about that
00:02:21.200 --> 00:02:29.370
than me. Very insidious ways. Exploited by
bad people to our detriment. And they go
00:02:29.370 --> 00:02:36.990
wrong in insidious ways, exploited by our
kindly governments for our protection.
00:02:36.990 --> 00:02:44.959
Right? And these - there are many, many,
many different causes of these things but
00:02:44.959 --> 00:02:49.420
many of these causes are completely
trivial. You know, you have one line of
00:02:49.420 --> 00:02:56.150
code in the wrong place - something like
that, right? So that's very, very bad and
00:02:56.150 --> 00:03:01.040
you're all conscious of how bad that is.
But from my point of view, this is as
00:03:01.040 --> 00:03:06.959
nothing with the real disaster that we
have been facing and continue to face.
00:03:06.959 --> 00:03:14.950
Right? The real disaster is the enormous
waste of human life and energy and effort
00:03:14.950 --> 00:03:21.360
and emotion dealing with these crap
machines that we've built. Right? And it's
00:03:21.360 --> 00:03:25.489
hard to quantify that. I did a little
experiment a couple of months ago. I
00:03:25.489 --> 00:03:34.030
Googled up "Android problem" and "Windows
problem" and got about 300 million hits
00:03:34.030 --> 00:03:43.840
for each, right? Somehow indicative of the
amount of wrongness. So this should you
00:03:43.840 --> 00:03:49.790
seem kind of familiar to you. If you think
back - think back to when you were young.
00:03:49.790 --> 00:03:55.700
Maybe back in the 19th century when you
were young, and what were you doing then
00:03:55.700 --> 00:04:00.940
as a community of hackers and makers and
builders? Well, this was the full-on
00:04:00.940 --> 00:04:07.130
industrial revolution. You were building
stuff. You were building beautiful bridges
00:04:07.130 --> 00:04:14.739
of cast-iron and stone with numerous
arches and pillars. And every so often
00:04:14.739 --> 00:04:22.300
they would fall down. I believe someone
miscalculated the wind loading. And we
00:04:22.300 --> 00:04:30.630
would build magical steam engines to whisk
us away. And every so often, they would
00:04:30.630 --> 00:04:35.770
blow up. And that kind of thing, as you
may have noticed in the last hundred years
00:04:35.770 --> 00:04:41.810
- sorry - that kind of thing doesn't
happen very often any more, right? We have
00:04:41.810 --> 00:04:46.990
pretty good civil and mechanical
engineering. And what does it mean to have
00:04:46.990 --> 00:04:52.620
good civil and mechanical engineering? It
means that we know enough thermodynamics
00:04:52.620 --> 00:04:56.750
and materials science and quality control
management and all that kind of thing to
00:04:56.750 --> 00:05:01.830
model our designs before we build them and
predict how they're going to behave pretty
00:05:01.830 --> 00:05:10.750
well and with pretty high confidence,
right? I see an optimisation in this talk.
00:05:10.750 --> 00:05:15.340
For computer science, for computing
systems we can predict with 100 per cent
00:05:15.340 --> 00:05:20.450
certainty that they will not work.
Ah, time for lunch!
00:05:20.450 --> 00:05:26.509
applause
00:05:26.509 --> 00:05:33.330
But it's not a very satisfying answer,
really. So why is it so hard? Let me
00:05:33.330 --> 00:05:38.500
discuss a couple of the reasons why
computing is tricky. In some sense,
00:05:38.500 --> 00:05:45.430
trickier than steam engines. The first
reason is that there is just too much
00:05:45.430 --> 00:05:50.360
code. Right? So this is one of the more
scary graphs I've found on the internet
00:05:50.360 --> 00:05:55.470
recently. This is a measure of code in
hundreds of millions of lines. Each of
00:05:55.470 --> 00:06:00.250
these blocks I think is 10 million lines
of code. And in the little letters - you
00:06:00.250 --> 00:06:04.860
might not be able to see we have Android
and the Large Hadron Collider and Windows
00:06:04.860 --> 00:06:12.820
Vista and Facebook, US Army Future Combat
System, Mac OS and software in a high-end
00:06:12.820 --> 00:06:15.010
car. Right?
00:06:15.010 --> 00:06:23.230
laughterapplause
00:06:23.230 --> 00:06:25.320
I'm flying home.
00:06:25.320 --> 00:06:27.810
laughter
00:06:27.810 --> 00:06:33.520
So we that's a bit of a problem, really.
We're never going to get that right. Then
00:06:33.520 --> 00:06:37.410
there's too much legacy complexity there,
right? Really too much. So as
00:06:37.410 --> 00:06:41.840
right-minded, pure-minded people you might
think that software and hardware should be
00:06:41.840 --> 00:06:46.490
architected beautifully with regular
structure a bit like a Mies van der Rohe
00:06:46.490 --> 00:06:53.660
skyscraper or something. Clean. Okay. You
know that's not realistic, right? You
00:06:53.660 --> 00:06:59.470
might expect a bit of baroqueness. You
might expect maybe something more like
00:06:59.470 --> 00:07:02.680
this. You know, it's a bit twiddly but it
still hangs together. It's still got
00:07:02.680 --> 00:07:08.160
structure. It still works. But then you
stop and you think and you realise that
00:07:08.160 --> 00:07:15.770
software and hardware is built by very
smart people and usually in big groups and
00:07:15.770 --> 00:07:20.300
subject to a whole assortment of rather
intense commercial pressure and related
00:07:20.300 --> 00:07:25.210
open-source pressure and using the best
components and tools that they know and
00:07:25.210 --> 00:07:27.169
they like.
00:07:27.169 --> 00:07:34.520
laughterapplause
00:07:34.520 --> 00:07:38.990
Someone from the room: (Indistinct)
You forgot management!
00:07:38.990 --> 00:07:42.480
And the best management that
human beings can arrange.
00:07:42.480 --> 00:07:49.680
laughterapplause
00:07:49.680 --> 00:07:54.910
So we end up with something
spectacularly ingenious.
00:07:54.910 --> 00:08:05.490
laughterapplause
00:08:05.490 --> 00:08:09.730
You can see here, if you look
closely, you can see all the parts.
00:08:09.730 --> 00:08:11.000
laughter
00:08:11.000 --> 00:08:15.770
This was just a found picture from the
internet. But you can see there's a bunch
00:08:15.770 --> 00:08:19.060
of, you know, transparent sort of
silicon-like stuff down here. Let's say
00:08:19.060 --> 00:08:25.079
that's our hardware. And over here, I see
a C compiler with all of its phases.
00:08:25.079 --> 00:08:28.599
laughter
00:08:28.599 --> 00:08:36.930
And these bottles, that's the file system,
I reckon. And there's a TCP stack in here
00:08:36.930 --> 00:08:43.690
somewhere. And there's a piece that
belongs to the NSA, obviously. And up
00:08:43.690 --> 00:08:50.259
here, I think up here - yeah, there's a
JavaScript engine and a TLS stack and a
00:08:50.259 --> 00:08:56.050
web browser. And then perched at the top
is the software that most people use to
00:08:56.050 --> 00:08:57.910
talk to their banks.
00:08:57.910 --> 00:09:01.489
laughter
00:09:01.489 --> 00:09:09.489
It's just - it's just insane, right? And
then we have to look at how we build these
00:09:09.489 --> 00:09:14.089
pieces, right? So we build them
fundamentally by a process of trial and
00:09:14.089 --> 00:09:20.649
error, right? Just occasionally we write
some specifications in text. And then we
00:09:20.649 --> 00:09:26.029
write some code and maybe we write a few
ad hoc texts and then we test and fix
00:09:26.029 --> 00:09:30.610
until the thing is marketable and then we
test and fix and extend until the thing is
00:09:30.610 --> 00:09:36.069
no longer marketable and then we keep on
using it until we can't anymore. Right? So
00:09:36.069 --> 00:09:40.259
the fundamental process that we're using
to develop these systems is trial and
00:09:40.259 --> 00:09:47.930
error by exploring particular execution
paths of the code on particular inputs.
00:09:47.930 --> 00:09:54.459
But - you know this, but I want to
highlight - there are too many. The number
00:09:54.459 --> 00:09:59.649
of execution paths scales typically at
least exponentially with the size of the
00:09:59.649 --> 00:10:04.490
code and the number of states scales
exponentially with the size of the data
00:10:04.490 --> 00:10:08.790
that you're dealing with and the number of
possible inputs is also shockingly large.
00:10:08.790 --> 00:10:14.920
There is no way that we can do this and we
should perhaps stop deluding ourselves
00:10:14.920 --> 00:10:21.129
that there is. And then the most
fundamental reason that computers are hard
00:10:21.129 --> 00:10:27.899
is that they are too discrete, right? If
you take a bridge or, I don't know, a
00:10:27.899 --> 00:10:36.519
chair or something - where's something
slightly bendable? You take a chair and
00:10:36.519 --> 00:10:44.199
you exert a bit more force on it than it's
used to - I'm not very strong - and it
00:10:44.199 --> 00:10:51.299
bends a bit. It deforms continuously. If
you take a bridge and you tighten up one
00:10:51.299 --> 00:10:55.239
of the bolts a bit too much or you
underspecify one of the girders by 10 per
00:10:55.239 --> 00:11:01.019
cent, occasionally you'll reach a point of
catastrophic failure but usually it will
00:11:01.019 --> 00:11:04.089
just be a little bit weaker or a little
bit stronger or a little bit wibbly.
00:11:04.089 --> 00:11:08.960
Right? But computer systems, you change a
bit - sometimes it doesn't make a
00:11:08.960 --> 00:11:13.179
difference but disturbingly often you
change a line of code - as in system of
00:11:13.179 --> 00:11:19.600
those bugs that we were talking about -
the whole thing becomes broken, right? So
00:11:19.600 --> 00:11:26.929
it's quite different from that traditional
engineering. It's really different. Okay.
00:11:26.929 --> 00:11:34.850
So, "A New Dawn", this thing is titled.
What can we do? What might we do? So I'm
00:11:34.850 --> 00:11:40.879
going to talk about several different
possibilities, all right? One thing, we
00:11:40.879 --> 00:11:46.490
can do more of the stuff which is
traditionally called software engineering.
00:11:46.490 --> 00:11:50.910
It's not really engineering in my book,
but some of it's good. I mean, it's
00:11:50.910 --> 00:11:55.119
useful, you know? We can do more testing
and have more assertions in our code and
00:11:55.119 --> 00:11:58.920
better management maybe and all that kind
of stuff. We can do all that - we should
00:11:58.920 --> 00:12:04.339
do all that, but it's not going to make a
very big change, all right? It's not
00:12:04.339 --> 00:12:13.989
addressing any of those root causes of the
problem. What else could we do? So as an
00:12:13.989 --> 00:12:18.570
academic computer scientist, I've devoted
several years of my life to working in the
00:12:18.570 --> 00:12:24.509
area of programming languages and I look
at the programming languages used out
00:12:24.509 --> 00:12:36.919
there in the world today. Oh, it's just
disgusting! It's shocking.
00:12:36.919 --> 00:12:43.399
applause
00:12:43.399 --> 00:12:50.420
The reasons that languages become popular
and built into our shared infrastructure,
00:12:50.420 --> 00:12:54.790
the reasons for that have almost nothing
to do with how well those languages are
00:12:54.790 --> 00:13:00.600
designed and in fact whether
they are designed at all, right?
00:13:00.600 --> 00:13:04.110
applause
00:13:04.110 --> 00:13:09.040
So I didn't really want to get too much
hate mail after this talk so I'm going to
00:13:09.040 --> 00:13:13.999
try and avoid naming particular languages
as much as I can, but I encourage you to
00:13:13.999 --> 00:13:20.279
think away as I speak and imagine the
examples. No, that was - I didn't want to
00:13:20.279 --> 00:13:23.730
get much hate mail. Sorry.
00:13:23.730 --> 00:13:25.160
laughter
00:13:25.160 --> 00:13:26.579
So...
00:13:26.579 --> 00:13:32.069
laughterapplause
00:13:32.069 --> 00:13:36.540
So at the very least, we could use
programming languages based on ideas from
00:13:36.540 --> 00:13:47.389
the 1970s instead of the 1960s. I mean,
really, come on. Right? You know, back in
00:13:47.389 --> 00:13:54.120
'67 or '69 or whatever we had ECPL and C
and only a couple of years later we had
00:13:54.120 --> 00:14:00.069
languages that guaranteed type and memory
safety and enforcement of abstraction
00:14:00.069 --> 00:14:05.089
boundaries and gave us rich mechanisms for
programming so we could say what we mean
00:14:05.089 --> 00:14:11.350
and not futz about with 23 million
pointers, about 1 per cent of which were
00:14:11.350 --> 00:14:16.629
completely wrong, right? So for any
particular - in the context of any
00:14:16.629 --> 00:14:20.459
particular project or any particular
existing legacy infrastructure there may
00:14:20.459 --> 00:14:25.789
be very good reasons why one just can't
make a switch. Right? But for us
00:14:25.789 --> 00:14:31.860
collectively, there is no excuse. It's
just wretched, right?
00:14:31.860 --> 00:14:37.040
applause
00:14:37.040 --> 00:14:40.969
It's completely bonkers. And then the
other thing that you see as a programming
00:14:40.969 --> 00:14:49.239
language researcher is that - er, well, to
make another analogy, it looks as if
00:14:49.239 --> 00:14:55.529
anyone who was capable of driving a car
considered themselves an expert on car
00:14:55.529 --> 00:15:03.559
design, right? There is in fact a rather
well-established subject of, you know,
00:15:03.559 --> 00:15:08.239
programming language design and we know
how to specify completely precisely,
00:15:08.239 --> 00:15:13.019
mathematically precisely not just the
syntax like we had in these BNF grammars
00:15:13.019 --> 00:15:18.160
from the 1960s and still have, but also
the behaviour of these things. We know how
00:15:18.160 --> 00:15:22.699
to specify a few operational semantics and
the type systems. And we know how to do
00:15:22.699 --> 00:15:27.360
mathematical proofs that our language
designs are self-consistent, that they do
00:15:27.360 --> 00:15:35.059
guarantee good properties of arbitrary
well-typed programs that you write, right?
00:15:35.059 --> 00:15:38.759
We can do this now. We can do this, if we
have to, post hoc. And people are doing
00:15:38.759 --> 00:15:45.669
this for JavaScript and C and god help
them, PHP and other such languages. But
00:15:45.669 --> 00:15:50.829
even better, we can do it at design time.
And if you do this at design time, you see
00:15:50.829 --> 00:15:55.509
the structure. You have to - the act of
writing that specification forces you to
00:15:55.509 --> 00:15:59.939
consider all the cases and all the
interactions between features. You can
00:15:59.939 --> 00:16:07.861
still get it wrong, but you get it wrong
in more interesting ways. Right? So people
00:16:07.861 --> 00:16:12.119
are starting to do this now. So I think
there was a talk on the first day of this
00:16:12.119 --> 00:16:17.429
by Hannes and David who are building
system software in maybe not the best sys
00:16:17.429 --> 00:16:21.800
language you could possibly imagine, but
something which compared with C is as
00:16:21.800 --> 00:16:27.759
manna from heaven. Anil Madhavapeddy and
his colleagues in Cambridge have been
00:16:27.759 --> 00:16:32.759
working hard to build system software in
at least moderately modern languages. And
00:16:32.759 --> 00:16:40.780
it works. Sometimes you have to use C but
surprisingly often you really don't. Yeah,
00:16:40.780 --> 00:16:47.049
so I teach, as my introducer said, in the
University of Cambridge. And I teach
00:16:47.049 --> 00:16:51.269
semantics. And one of my fears is that I
will accidentally let someone out in the
00:16:51.269 --> 00:16:54.970
world who will accidentally find
themselves over a weekend inventing a
00:16:54.970 --> 00:16:58.149
little scripting language which will
accidentally take over the world and
00:16:58.149 --> 00:17:05.359
become popular and I won't have explained
to them what they have to do or even that
00:17:05.359 --> 00:17:08.189
there is a subject they have to
understand. And if you want to understand
00:17:08.189 --> 00:17:18.740
it, there's lots of stuff you can look at.
Okay. Let's go to the uppermost extreme.
00:17:18.740 --> 00:17:23.829
If we want software and hardware that
actually works, we should prove it
00:17:23.829 --> 00:17:30.700
correct. Right? So this is something which
academics, especially in the domain of
00:17:30.700 --> 00:17:36.830
labelled "formal methods" in the '70s and
'80s, they've been pushing this, promoting
00:17:36.830 --> 00:17:44.789
this as an idea and a promise and an
exhortation for decades, right? Why don't
00:17:44.789 --> 00:17:50.169
we just prove our programs correct? And
for some of those decades it wasn't really
00:17:50.169 --> 00:17:54.570
plausible. I remember back when I was
considerably younger, it was a big thing
00:17:54.570 --> 00:17:59.540
to proof that if you took a linked list
and reversed it twice, you got back the
00:17:59.540 --> 00:18:08.059
same thing. Right? That was hard, then.
But now, well, we can't do this for the
00:18:08.059 --> 00:18:14.419
Linux kernel, let's say, but we can
surprising often do this. And I just give
00:18:14.419 --> 00:18:19.590
you a couple of the examples of modern
day, state-of-the-art academic
00:18:19.590 --> 00:18:25.720
verification projects. People have
verified compilers all the way from C-like
00:18:25.720 --> 00:18:31.769
languages down to assembly or from ML-like
languages. You know, higher-order
00:18:31.769 --> 00:18:36.959
functional imperative languages all the
way down to binary models of how x86
00:18:36.959 --> 00:18:44.470
behaves. And people have verified LLVM
optimisation paths. There's verified
00:18:44.470 --> 00:18:49.880
software fault isolation which I believe
goes faster than the original non-verified
00:18:49.880 --> 00:18:56.130
form. Win for them. There's a verified
hypervisor from Nectar group in Australia.
00:18:56.130 --> 00:19:00.570
Right? Verified secure hypervisor with
proofs of security properties. There's any
00:19:00.570 --> 00:19:06.690
amount of work verifying crypto protocols,
sometimes with respect to assumptions - I
00:19:06.690 --> 00:19:12.510
think reasonable assumption on what the
underlying mathematics is giving you. So
00:19:12.510 --> 00:19:22.919
we can do this. I maybe have to explain a
bit what I mean by "prove". To some
00:19:22.919 --> 00:19:30.990
computery people, "I proved it works"
means "I tested it a little bit".
00:19:30.990 --> 00:19:32.510
laughter
00:19:32.510 --> 00:19:34.960
In extremis, "it compiled".
00:19:34.960 --> 00:19:44.120
laughterapplause
00:19:44.120 --> 00:19:50.770
To a program analysis person, you might
run some sophisticated tool but written in
00:19:50.770 --> 00:19:54.649
untrusted code and possibly doing an
approximate and maybe unsound analysis and
00:19:54.649 --> 00:19:59.720
you might find an absence of certain kinds
of bugs. That's tremendously useful; it's
00:19:59.720 --> 00:20:04.640
not what we would really call "proved".
I mean, nor would they, to be fair.
00:20:04.640 --> 00:20:08.970
Scientists generally don't use the word
because they know they're not doing it,
00:20:08.970 --> 00:20:13.559
right? They're doing controlled
experiments, which gives them, you know,
00:20:13.559 --> 00:20:20.070
information for or against some
hypothesis. Mathematician write proofs.
00:20:20.070 --> 00:20:25.139
You probably did that when you were young.
And those proofs are careful mathematical
00:20:25.139 --> 00:20:30.139
arguments usually written on paper with
pencils and their aim is to convince a
00:20:30.139 --> 00:20:33.389
human being that if that mathematician was
really nailed up against the wall and
00:20:33.389 --> 00:20:38.980
pushed, they could expand that into a
completely detailed proof. But what we
00:20:38.980 --> 00:20:47.750
have in computing, we don't have the rich
mathematical structure that these people
00:20:47.750 --> 00:20:51.950
are working with, we have a tremendous
amount of ad hoc and stupid detail mixed
00:20:51.950 --> 00:20:57.769
in which a smidgin of rich mathematical
structure. So - and we have hundreds of
00:20:57.769 --> 00:21:03.399
millions of lines of code. So this is just
not going to cut it. And if we want say
00:21:03.399 --> 00:21:08.440
the word "prove" ever, then the only thing
which is legitimate is to do honest
00:21:08.440 --> 00:21:13.230
program proof and that means you have to
have some system that machine-checks that
00:21:13.230 --> 00:21:17.540
your proof is actually a proof. And
sometime we'll also have machines that
00:21:17.540 --> 00:21:22.250
will sort of help and sort of hinder that
process, right? And there's a variety of
00:21:22.250 --> 00:21:27.340
these systems. Coq and HOL4 and Isabelle4
- Isabelle/HOL and what have you that we
00:21:27.340 --> 00:21:36.799
can look up. So using these systems we can
prove nontrivial facts about these. Not
00:21:36.799 --> 00:21:41.530
necessarily that they do what you want
them to do, but that they meet the precise
00:21:41.530 --> 00:21:48.519
specification that we've written down. We
can do that. But it's tricky, right? So
00:21:48.519 --> 00:21:52.930
these projects, these are by academic
standards all quite big, you know? This is
00:21:52.930 --> 00:21:58.009
something like, I don't know, 10 people
for a few years or something. By industry
00:21:58.009 --> 00:22:03.399
scale, maybe not that big. By your scale,
maybe not that big. But still a lot of
00:22:03.399 --> 00:22:08.149
work for a research group to do. And quite
high-end work. It can take, you know,
00:22:08.149 --> 00:22:13.730
literally a few years to become really
fluent in using these tools, right? And
00:22:13.730 --> 00:22:20.139
there isn't as yet sort of an open-source,
collaborative movement doing this kind of
00:22:20.139 --> 00:22:26.710
stuff. Maybe it's the time to start. Maybe
in five years, ten years. I don't know. If
00:22:26.710 --> 00:22:35.680
you want a challenge, there's a challenge.
But it's really quite hard, right? It's
00:22:35.680 --> 00:22:40.830
not something that one can quite put
forward credibly as a solution for all
00:22:40.830 --> 00:22:46.529
software and hardware development, right?
So that leads me to, like an intermediate
00:22:46.529 --> 00:22:53.399
point. That should have been a four there.
What can we do which improves the quality
00:22:53.399 --> 00:22:58.340
of our system, which injects some
mathematical rigour but which involves
00:22:58.340 --> 00:23:06.370
less work and is kind of easy? And what we
can do is take some of these interfaces
00:23:06.370 --> 00:23:11.679
and be precise about what they are. And
initially, we have to do that after the
00:23:11.679 --> 00:23:16.941
fact. We have to reverse-engineer good
specs of existing stuff. But then we can
00:23:16.941 --> 00:23:23.470
use the same techniques to make much
cleaner and better-tested and things which
00:23:23.470 --> 00:23:30.670
are easier to test in the future. That's
the idea. So my colleagues and I have been
00:23:30.670 --> 00:23:35.360
doing this kind of stuff for the last
quite a few years and I just want to give
00:23:35.360 --> 00:23:41.570
you just a - two little vignettes just to
sort of show you how it goes. So I can't
00:23:41.570 --> 00:23:46.759
give you much detail. And this is,
obviously, joint work with a whole bunch
00:23:46.759 --> 00:23:55.399
of very smart and good people, some of
which I name here. So this is not me, this
00:23:55.399 --> 00:24:00.840
is a whole community effort. So I'm going
to talk for a little bit about the
00:24:00.840 --> 00:24:05.799
behaviour of multiprocessors, so at that
hardware interface. And I'm going to talk
00:24:05.799 --> 00:24:11.090
a tiny little bit about the behaviour of
the TCP and socket API, right? Network
00:24:11.090 --> 00:24:17.169
protocols. So - and there'll be some
similarities and some differences between
00:24:17.169 --> 00:24:24.990
these two things. So multiprocessors. You
probably want your computers to go fast.
00:24:24.990 --> 00:24:31.090
Most people do. And it's an obvious idea
to glom together, because processors don't
00:24:31.090 --> 00:24:35.529
go that fast, let's glom together a whole
bunch of processors and make them all talk
00:24:35.529 --> 00:24:46.150
to the same memory. This is not a new
idea. It goes back at least to 1962 when
00:24:46.150 --> 00:24:51.380
the Burroughs D825 had, I think, two
processors talking to a single shared
00:24:51.380 --> 00:24:55.850
memory. It had outstanding features
including truly modular hardware with
00:24:55.850 --> 00:25:02.649
parallel processing throughout, and - some
things do not change very much - the
00:25:02.649 --> 00:25:11.640
complement of compiling languages was to
be expanded, right? 1962, so that'll be 52
00:25:11.640 --> 00:25:18.820
years ago. Deary me. Okay. Now, how do
these things behave? So let me show you
00:25:18.820 --> 00:25:22.899
some code, right? It's a hacker
conference. There should be code. Let me
00:25:22.899 --> 00:25:29.030
show you two example programs and these
will both be programs with two threads and
00:25:29.030 --> 00:25:34.570
they will both be acting on two shared
variables, X and Y. And in the initial
00:25:34.570 --> 00:25:42.899
state, X and Y are both zero. So, the
first program. On this thread we write X
00:25:42.899 --> 00:25:48.970
and then we read Y, and over here we write
Y and then we read X, right? Now, these
00:25:48.970 --> 00:25:53.629
are operating, you know, in an
interleaving concurrency, you might think,
00:25:53.629 --> 00:25:58.030
so there's no synchronisation between
those two threads so we might interleave
00:25:58.030 --> 00:26:03.269
them. We might run all of thread 0 before
all of thread 1 or all of thread 1 before
00:26:03.269 --> 00:26:10.340
all of thread 0 or they might be mixed in
like that or like that or like that or
00:26:10.340 --> 00:26:16.940
like that. There's six possible ways of
interleaving two lists of two things. And
00:26:16.940 --> 00:26:22.980
in all of those ways you never see in the
same execution that this reads from the
00:26:22.980 --> 00:26:30.169
initial state AND this reads from the
initial state. You never see that. So you
00:26:30.169 --> 00:26:35.479
might expect and you might assume in your
code that these are the possible outcomes.
00:26:35.479 --> 00:26:40.490
So what happens if you were to actually
run that code on my laptop in a particular
00:26:40.490 --> 00:26:46.970
test harness? Well, okay, you see
occasionally they're quite well
00:26:46.970 --> 00:26:52.499
synchronised and they both read the other
guy's right. Sorry, big call. They both
00:26:52.499 --> 00:26:57.510
read the other guy's right. And quite
often one thread comes completely before
00:26:57.510 --> 00:27:09.769
the other. But sometimes, sometimes they
both see 0. Huh. Rats. Well, that's
00:27:09.769 --> 00:27:18.840
interesting. Let me show you another
program. So now, thread 0 writes some
00:27:18.840 --> 00:27:23.440
stuff. Maybe it writes a big bunch of
data, maybe multi-word data. And then it
00:27:23.440 --> 00:27:28.249
writes, let's say that's a flag announcing
to some another thread that that data is
00:27:28.249 --> 00:27:33.249
ready to go now. And the other thread
busy-waits reading that value until it
00:27:33.249 --> 00:27:38.710
gets 1, until it sees the flag. And then
it reads the data. So you might want, as a
00:27:38.710 --> 00:27:43.610
programmer, you might want a guarantee
that this read will always see the data
00:27:43.610 --> 00:27:48.759
that you have initialised. Hey? That would
be like a message-passing kind of an
00:27:48.759 --> 00:27:56.229
idiom. So what happens if you run that?
Well, on the x86 processors that we've
00:27:56.229 --> 00:28:03.929
tested, you always see that value. Good.
This is a descent coding idiom on x86. All
00:28:03.929 --> 00:28:12.230
right. On Arma IBM Power processors,
however, you see sometimes you just ignore
00:28:12.230 --> 00:28:16.850
the value written and you see the initial
state. So this is not a good
00:28:16.850 --> 00:28:25.539
message-passing idiom, right? So what's
going on? Well, these behaviours, it might
00:28:25.539 --> 00:28:31.260
be surprising, eh? And when you see
surprising behaviour in hardware, you have
00:28:31.260 --> 00:28:36.100
two choices. Either it's a bug in the
hardware, and we have found a number of
00:28:36.100 --> 00:28:45.259
bugs in hardware. It's always - it's very
rewarding. Or it's a bug in your
00:28:45.259 --> 00:28:51.350
expectation. Or it's a bug in your test
harness. Right? So what you do is you walk
00:28:51.350 --> 00:28:58.050
along and you walk up to your friendly
processor architect in IBM or ARM or x86
00:28:58.050 --> 00:29:04.770
land, and we have worked with all of these
people. And you ask them, "Is this a bug?"
00:29:04.770 --> 00:29:12.360
And a processor architect is a person who
is - has the authority to decide whether
00:29:12.360 --> 00:29:16.240
some behaviour is intended or is a bug.
00:29:16.240 --> 00:29:17.750
laughter
00:29:17.750 --> 00:29:24.669
That's what they do, essentially. And for
these, they say, "Oh, we meant it to be
00:29:24.669 --> 00:29:29.239
like that". Right? No question. "We meant
it to be like that. That's perfectly
00:29:29.239 --> 00:29:36.559
proper. We have, because you and everybody
else wanted their computers to go fast, we
00:29:36.559 --> 00:29:41.690
as processor designers have introduced all
manner of sophisticated optimizations,
00:29:41.690 --> 00:29:45.200
which if you were running sequential code,
you would never notice, but if you start
00:29:45.200 --> 00:29:49.289
running fancy concurrent code like this,
fancy concurrent code that isn't just
00:29:49.289 --> 00:29:56.049
trivially locked, you can notice", so this
is intentional behaviour. And if you want
00:29:56.049 --> 00:30:00.129
to write that code, like a mutual
exclusion algorithm or a message-passing
00:30:00.129 --> 00:30:06.460
algorithm or something, then you need to
insert special instructions, memory
00:30:06.460 --> 00:30:11.180
barrier instructions. And so you go away
and you say, "What do they do?" And you
00:30:11.180 --> 00:30:18.949
look up in the vendor documentation. And
you get stuff like this. I'll read it out:
00:30:18.949 --> 00:30:22.190
(Reads quickly) "For any applicable pair
AB, the memory barrier ensures that a will
00:30:22.190 --> 00:30:24.740
be performed with respect to any processor
or mechanism, to the extent required by
00:30:24.740 --> 00:30:27.590
the associated memory coherence required
at beauties, before b is performed with
00:30:27.590 --> 00:30:30.360
respect to that prior or mechanism. A
includes all applicable storage accesses
00:30:30.360 --> 00:30:33.090
by any such processor or mechanism that
have been performed with respect to P1
00:30:33.090 --> 00:30:36.210
before the memory barrier is created. B
includes all applicable storage accesses
00:30:36.210 --> 00:30:38.670
by any such processor or mechanism that
are performed after a Load instruction
00:30:38.670 --> 00:30:41.290
executed by that processor or mechanism
has returned the value stored by a store
00:30:41.290 --> 00:30:42.700
that is in B."
00:30:42.700 --> 00:30:49.850
applause
00:30:49.850 --> 00:30:58.630
Hands up if you understand what that
means? Hands up if you think that if you
00:30:58.630 --> 00:31:02.139
thought about it and read the rest of the
book, you would understand or you could
00:31:02.139 --> 00:31:09.730
understand what that means? Hands up the
people who think that we're doomed forever?
00:31:09.730 --> 00:31:12.920
laughter
00:31:12.920 --> 00:31:16.979
So I'm sorry to the first few groups,
but the last ones are right.
00:31:16.979 --> 00:31:22.299
laughterapplause
00:31:22.299 --> 00:31:28.850
This looks like it's really intricate and
really carefully thought out stuff. And we
00:31:28.850 --> 00:31:32.220
thought that for a while and we spent
literally years trying to make precise
00:31:32.220 --> 00:31:37.159
mathematical models that give precise
meaning to these words, but actually it
00:31:37.159 --> 00:31:45.460
can't be done. So what do you have to do
in that circumstance? You have to go away
00:31:45.460 --> 00:31:52.120
and you have to invent some kind of a
model. And there's a - this is a really
00:31:52.120 --> 00:31:56.190
fundamental problem, that we - on the one
hand, we develop our software by this
00:31:56.190 --> 00:32:01.960
trial and error process, but on the other
hand are points like this. We have these
00:32:01.960 --> 00:32:06.470
loose specifications, supposedly, that
have to cover all manner of behaviour of
00:32:06.470 --> 00:32:10.730
many generations of processors that might
all behave the same way, written just in
00:32:10.730 --> 00:32:16.769
text. And we tell people they should write
"to the spec", but the only way they have
00:32:16.769 --> 00:32:22.450
of developing their code is to run it and
run particular executions on the
00:32:22.450 --> 00:32:26.960
particular hardware implementations that
they have, whose relationship to the spec
00:32:26.960 --> 00:32:34.740
is very elusive. We can't use those thick
books that you get or those quite weighty
00:32:34.740 --> 00:32:39.260
PDFs that you get these days from the
processor vendors to test programs. You
00:32:39.260 --> 00:32:43.970
can't feed a program into that thick book
and get output out. And you can't test
00:32:43.970 --> 00:32:47.429
processor implementations and you can't
prove anything and you can't even
00:32:47.429 --> 00:32:53.430
communicate between human beings, right?
So these things, really, they don't exist.
00:32:53.430 --> 00:32:58.500
So what can we do? Well, the best we can
do at this point in time is a bit of
00:32:58.500 --> 00:33:04.950
empirical science, right? So we can invent
some model off the top of our heads,
00:33:04.950 --> 00:33:09.649
trying to describe just the
programmer-visible behaviour, not the
00:33:09.649 --> 00:33:15.529
internal structure. And we can make a
tool. Because that's not prose now. Now we
00:33:15.529 --> 00:33:18.950
can - now it's stuff. It's real
mathematics and we can turn that into code
00:33:18.950 --> 00:33:23.200
and execute it. We can make tools that
take programs and execute it and - small
00:33:23.200 --> 00:33:28.280
programs - tell us the set of all
behaviours allowed by that model. And then
00:33:28.280 --> 00:33:33.759
we can compare those sets of behaviours
against the experimental observations. And
00:33:33.759 --> 00:33:38.360
you might have to fix the model and you
might find hard bugs. And then the model
00:33:38.360 --> 00:33:42.869
also has to make sense to the architect so
you can discuss with the architects what
00:33:42.869 --> 00:33:47.100
they intend and what their institution is,
and then you can also prove some other
00:33:47.100 --> 00:33:50.980
facts about it to give a bit more
assurance and then because you probably
00:33:50.980 --> 00:33:57.249
got this wrong the first couple of times,
you can go back. And this results or has
00:33:57.249 --> 00:34:04.509
resulted in models which are not
guaranteed to be correct, but they are
00:34:04.509 --> 00:34:08.639
effectively the de facto standard for
understanding the concurrency behaviour of
00:34:08.639 --> 00:34:14.300
these things, you know? And various people
use them. And just to give you a tiny
00:34:14.300 --> 00:34:18.570
snippet - I'll have to speed up a bit - a
tiny snippet of what the model looks like,
00:34:18.570 --> 00:34:22.969
it's basically just an abstract machine.
It's got some stuff for the threads that
00:34:22.969 --> 00:34:26.949
handles the programmer-visible speculative
execution, and some stuff for a storage
00:34:26.949 --> 00:34:32.739
subsystem that handles the fact that in
these architectures, memory rights can be
00:34:32.739 --> 00:34:38.270
propagated to different threads at
different times, right? And there's a -
00:34:38.270 --> 00:34:42.540
that model has a state which is just a
collection of some sets and lists and
00:34:42.540 --> 00:34:47.230
partial orders and a few other things
which I won't talk about. And it has some
00:34:47.230 --> 00:34:51.030
transitions, right? In any state, there
might be several possible transitions.
00:34:51.030 --> 00:34:54.699
This is just a sample. I'm not going to
explain all of this. But when you want to
00:34:54.699 --> 00:34:58.570
propagate a write to another thread, there
have to be some preconditions that you have to
00:34:58.570 --> 00:35:07.140
satisfy. And than there is an action. It's not
amazingly complicated, really. So this is text,
00:35:07.140 --> 00:35:13.319
but it's very precisely written text and it has
the great advantage that you can go through
00:35:13.319 --> 00:35:16.570
these bullet points with your friendly
architect and say, "Is this what you
00:35:16.570 --> 00:35:22.440
mean?" And they can think about it and say
yes or no. But for the actual definition,
00:35:22.440 --> 00:35:27.309
that's in mathematics but it's mathematics
that's quite close to code. I mean, it's
00:35:27.309 --> 00:35:32.380
terribly close to pure, functional code
with outside effects. And just to give you
00:35:32.380 --> 00:35:37.230
an idea, this is some of it and those are
three of those conditions in the real,
00:35:37.230 --> 00:35:42.660
honest, true version. Then we can take
this mathematics and because it's in a
00:35:42.660 --> 00:35:49.270
particular style, we can compile this into
actually OCaml and then OCaml byte code
00:35:49.270 --> 00:35:55.770
and then we can run it for batch jobs. But
then you can compile that OCaml byte code
00:35:55.770 --> 00:36:02.579
to JavaScript and glue on an user interface
and stick it into a web browser and then you
00:36:02.579 --> 00:36:07.760
have a web interface that lets people
explore this model. And that's also a
00:36:07.760 --> 00:36:16.190
necessary part of validating that it makes
sense. Okay. This is not rocket science.
00:36:16.190 --> 00:36:21.710
You've missed the talk for rocket science,
I'm sorry. All we're doing here is
00:36:21.710 --> 00:36:28.630
specifying the intended behaviour of a
system. Okay? That's - it's not very
00:36:28.630 --> 00:36:32.490
technical but it's unusual. And we happen
to be doing it in this mathematical
00:36:32.490 --> 00:36:37.049
language, but you could use in fact almost
any language so long as you understand
00:36:37.049 --> 00:36:40.170
what you're doing. What you have to
understand is that you're writing
00:36:40.170 --> 00:36:45.230
something which is not an implementation.
It is something which, given some observed
00:36:45.230 --> 00:36:49.810
behaviour, some arbitrary observed
behaviour, will be able to decide if it's
00:36:49.810 --> 00:36:54.580
allowed, if you want it to be allowed or
not, right? It has to be executed as a
00:36:54.580 --> 00:37:02.059
test oracle. The key question for getting
this to work is to understand how much
00:37:02.059 --> 00:37:05.730
non-determinism or loose specification
there is in the system that you're working
00:37:05.730 --> 00:37:08.890
with, right? So if everything is
completely deterministic, you can write a
00:37:08.890 --> 00:37:13.839
reference implementation in the cleanest
language of your choice and just compare
00:37:13.839 --> 00:37:19.270
the output of that and the output of the
real thing, right? But if there's more
00:37:19.270 --> 00:37:24.000
non-determinism, then you can't do that.
I'm going to have to abbreviate this a
00:37:24.000 --> 00:37:31.369
little tiny bit. So for the TCP network
protocols, there is more non-determinism.
00:37:31.369 --> 00:37:35.220
You know what the TCP is, yes? A protocol
that gives you sort of reliable
00:37:35.220 --> 00:37:41.720
connections, assuming that the world is
made of good people, right? You look at
00:37:41.720 --> 00:37:47.640
the standards for TCP, and they're the
same kind of wibbly text from the 1980s
00:37:47.640 --> 00:37:52.400
and you look at the implementations and
they are a ghastly mess. And you try and
00:37:52.400 --> 00:37:55.690
understand the relationship between the
two of them and you can't because the
00:37:55.690 --> 00:38:01.740
standard, again, is not the definition. It
doesn't define in all circumstances what
00:38:01.740 --> 00:38:07.010
behaviour is allowed and what isn't. So
again, we had to make up a specification
00:38:07.010 --> 00:38:14.310
in the first instance just of this host, a
single endpoint, and observing its
00:38:14.310 --> 00:38:18.510
sockets, API, calls and returns and its
behaviour on the wire, right? And when
00:38:18.510 --> 00:38:21.969
you've decided - and a few internal debug
events. And when you've decided what to
00:38:21.969 --> 00:38:26.619
observe, then observation is just a trace,
it's just a list of those events. And you
00:38:26.619 --> 00:38:34.750
have to be able to ask your spec "Is that
list admissible or not"? But there's a lot
00:38:34.750 --> 00:38:38.820
of non-determinism. Variation between the
implementations, variations within the
00:38:38.820 --> 00:38:45.650
implementations. And that's internal. It's
not announced in the API or on the wire
00:38:45.650 --> 00:38:50.980
maybe until much later when the
implementation picks a new window size or
00:38:50.980 --> 00:38:56.890
something. You can't tell until quite a
lot later in the trace what it's picked.
00:38:56.890 --> 00:39:02.740
And that makes the job of checking whether
a trace is admissible by the spec much
00:39:02.740 --> 00:39:08.559
harder than it has to be. Right? And if
you had designed the TCP protocol and
00:39:08.559 --> 00:39:14.470
those implementations for to be testable
in this very discriminating way, back in
00:39:14.470 --> 00:39:20.470
the 1980s when you were designing TCP
protocol, it would be much easier. And for
00:39:20.470 --> 00:39:25.650
new protocols, one should make sure that
you're doing this in this particular way.
00:39:25.650 --> 00:39:30.930
I don't think I want to talk about that
slide. That's just one of the rules of
00:39:30.930 --> 00:39:35.480
that specification. But the key fact about
that spec is the - well, we handled the
00:39:35.480 --> 00:39:40.210
real protocols for arbitrary inputs. But
it's structured not just for this
00:39:40.210 --> 00:39:46.240
executability as a test oracle, but it's
structured for clarity, not performance,
00:39:46.240 --> 00:39:49.750
which is scarcely ever what anyone ever
does, right? So it's a whole new kind of
00:39:49.750 --> 00:39:54.480
thing. And the testing is very
discriminating. So we found all manner of
00:39:54.480 --> 00:40:03.839
amusing and bizarre bugs which I think I
don't have time to talk about. Okay. So
00:40:03.839 --> 00:40:12.610
I've described three kinds of things that
we might do. The first thing - for
00:40:12.610 --> 00:40:18.800
goodness sake - I mean, it's just a
no-brainer. Just do it already. Everybody.
00:40:18.800 --> 00:40:29.670
All of you. All of you. Right? This middle
zone is a very interesting zone as far as
00:40:29.670 --> 00:40:33.920
I'm concerned, right? It's - out of what -
a place where we could get a very good
00:40:33.920 --> 00:40:39.220
benefit for not that much effort, right?
We can do it, if necessary, post hoc. We
00:40:39.220 --> 00:40:44.490
can also do it and even better, at design
time. We have to do this in a way that
00:40:44.490 --> 00:40:48.240
makes our executions testable as test
oracles, and another thing that that
00:40:48.240 --> 00:40:52.119
enables is completely random testing. When
you've got a test oracle, you can just
00:40:52.119 --> 00:40:58.210
feed in random goop. This is fuzzing, but
better fuzzing - feed in random goop and
00:40:58.210 --> 00:41:02.859
check at every point whether what the
behaviour that you're getting is allowable
00:41:02.859 --> 00:41:09.271
or not. And then they're written for
clarity, not for performance, and that
00:41:09.271 --> 00:41:13.619
means you can see what you're doing,
right? You can see the complexity. If you
00:41:13.619 --> 00:41:17.140
go ahead and you add some feature to your
protocol or your programming language or
00:41:17.140 --> 00:41:22.420
whatever and you're working just with text
specifications, then you can't see the
00:41:22.420 --> 00:41:24.759
interactions. You just chuck in a couple
of paragraphs and you think for a few
00:41:24.759 --> 00:41:30.210
minutes, right? And if you're lucky, you
make an implementation. But if you're
00:41:30.210 --> 00:41:35.309
writing a spec that has to cover all the
cases like this, the act of doing that
00:41:35.309 --> 00:41:41.150
encourages you to think or helps you think
about the excess complexity. And you might
00:41:41.150 --> 00:41:46.690
think that's too complex, I'm being silly,
I'll make it simpler. And it has to be
00:41:46.690 --> 00:41:51.630
usable for proof. So this, I think also is
pretty much a no-brainer. And it doesn't
00:41:51.630 --> 00:41:56.650
require great technical, you know,
mathematical skill either, right? Lots of
00:41:56.650 --> 00:42:01.760
people can do this. And then there's
option 3, best option, full-on formal
00:42:01.760 --> 00:42:07.930
verification of the key components. And
that is now also in reach. You can imagine
00:42:07.930 --> 00:42:16.309
secure systems made using actual verified
compilers and verified hypervisors with
00:42:16.309 --> 00:42:22.540
verified TLS stacks and you can imagine
making things out of those or making those
00:42:22.540 --> 00:42:27.870
and then making things out of those. And
maybe one should be thinking about that.
00:42:27.870 --> 00:42:34.070
So, maybe not appropriate for everything.
If you were writing Word, you would not
00:42:34.070 --> 00:42:38.599
wish to do these things. Probably, you're
not. But for this key infrastructure that
00:42:38.599 --> 00:42:46.420
we really depend on, we trust even though
it's not trustworthy, we have to do this.
00:42:46.420 --> 00:42:52.450
Is this a new dawn? I wonder if there's
anyone standing next to a light switch
00:42:52.450 --> 00:42:58.920
that can dim them for me. I didn't ask
them beforehand, so... If you think back
00:42:58.920 --> 00:43:05.540
the last 70-odd years, we started building
computers in around 1940. It's been pretty
00:43:05.540 --> 00:43:12.760
dark from the point of view of rigorous,
solid, reliable engineering, stuff that is
00:43:12.760 --> 00:43:21.760
actually trustworthy. Pretty dark, I would
say. Maybe, just maybe there's a tiny
00:43:21.760 --> 00:43:26.339
smidgen of light coming through the doors.
And we start to have a little bit of a
00:43:26.339 --> 00:43:31.109
clue and we start to get reusable models
of processor architectures and programming
00:43:31.109 --> 00:43:35.540
languages and network protocols and what
have you. It's just the beginnings of the
00:43:35.540 --> 00:43:39.790
analogue of that thermodynamics and
materials science and quality control
00:43:39.790 --> 00:43:47.819
management and what have you that we have
from mechanical engineering. And we've got
00:43:47.819 --> 00:43:57.200
no choice. If we don't, the next 75 years
is going to be a lot worse. You can just
00:43:57.200 --> 00:44:02.400
imagine, right? So I went to this - as I'm
sure many of you do - this great talk on
00:44:02.400 --> 00:44:08.370
some Apple boot loader exploit yesterday
which was relying on some feature that had
00:44:08.370 --> 00:44:14.900
been introduced in the early '80s and was
still present. And you can imagine in 100
00:44:14.900 --> 00:44:22.800
years from now, you can imagine as long as
human civilisation endures, the x86
00:44:22.800 --> 00:44:28.869
architecture and the socket API and all of
this stuff, it will be embedded in some
00:44:28.869 --> 00:44:34.099
monumental ghastly stack of
virtualisation layers forever.
00:44:34.099 --> 00:44:45.760
laughterapplause
00:44:45.760 --> 00:44:50.720
So I'd like to thank especially all of my
colleagues that have been working with me
00:44:50.720 --> 00:44:56.550
or not with me in these directions. I'd
like to thank our generous funders who
00:44:56.550 --> 00:45:01.389
support this stuff. I'd like to thank you
for your attention and I exhort you,
00:45:01.389 --> 00:45:04.850
sort it out.
00:45:04.850 --> 00:45:20.800
applause
00:45:20.800 --> 00:45:25.300
Herald: Thank you very much, Peter, for
those interesting insights to our
00:45:25.300 --> 00:45:31.269
programming. We have now time for a Q & A,
so those people who have to leave the
00:45:31.269 --> 00:45:39.970
room, please do so quietly and as fast as
possible, so we can go on and hear what
00:45:39.970 --> 00:45:51.270
the questions are. So we
start with microphone 4, please.
00:45:51.270 --> 00:45:58.790
Mic 4: Hello. Thanks for the talk and my
question is if you got an oracle of the
00:45:58.790 --> 00:46:04.270
software behaviour which can translate
every possible input to a correct output,
00:46:04.270 --> 00:46:08.410
how can this be less complex and
error-prone than the reference
00:46:08.410 --> 00:46:10.510
implementation?
00:46:10.510 --> 00:46:15.620
Peter: Good question. So the first point
is that this oracle doesn't have to
00:46:15.620 --> 00:46:22.010
produce the outputs. It only has to check
that the inputs and the outputs match up.
00:46:22.010 --> 00:46:26.560
And then the second point is that in
general it might have to have much of the
00:46:26.560 --> 00:46:33.221
same complexity, but by writing it to be
clear as opposed to to be fast, you may
00:46:33.221 --> 00:46:37.730
adopt a completely different structure,
right? So the structure of our TCP spec is
00:46:37.730 --> 00:46:43.540
organised by the behaviour that you see
for particular transitions. The structure
00:46:43.540 --> 00:46:49.160
of a real implementation has fast-path
code and lots of complicated intertwined
00:46:49.160 --> 00:46:55.309
branching and all manner of excess
complexity, of implementation complexity,
00:46:55.309 --> 00:46:56.879
if you will.
00:46:56.879 --> 00:47:01.450
Mic 4: Thanks.
Herald: So microphone 3, please?
00:47:01.450 --> 00:47:06.710
Mic 3: I wanted to ask you what you
thought about programming by manipulating
00:47:06.710 --> 00:47:13.859
the abstract syntax tree directly so as to
not allow incompilable programs because of
00:47:13.859 --> 00:47:18.410
some, like, you're missing a semicolon or
something like that. What's your thoughts
00:47:18.410 --> 00:47:20.220
on that?
00:47:20.220 --> 00:47:23.700
Peter: So that's - in the grand scheme of
things, I think that's not a very big
00:47:23.700 --> 00:47:28.890
deal, right? So there's - people have
worked on structured editors for lordy
00:47:28.890 --> 00:47:33.309
knows how many years. It's not a big deal
because it's very easy for a compiler to
00:47:33.309 --> 00:47:38.569
detect that kind of thing. And even more,
it's very easy for a compiler of a
00:47:38.569 --> 00:47:43.140
sensibly designed language to detect type
errors, even for a very sophisticated type
00:47:43.140 --> 00:47:49.589
system. So I don't think that - that's not
- I mean, some people might find it
00:47:49.589 --> 00:47:52.930
helpful, but I don't think it's getting to
the heart of the matter.
00:47:52.930 --> 00:47:54.250
Mic 3: Thanks.
00:47:54.250 --> 00:47:59.850
Herald: So we've got some questions
from our signal angels from the IRC.
00:47:59.850 --> 00:48:06.150
Signal angel: Yes. There's one question.
It's about the repository for Isabelle and
00:48:06.150 --> 00:48:11.300
HOL proofs. It should be on source forge,
and you said there are no open source
00:48:11.300 --> 00:48:15.480
repositories. What about them?
00:48:15.480 --> 00:48:20.589
Peter: I'm not quite sure what the
question is, really. So there's a
00:48:20.589 --> 00:48:28.230
repository of Isabelle formal proofs which
is the archival form of proofs, it's
00:48:28.230 --> 00:48:34.790
called. I didn't really mean to say there
are no open source repositories and in
00:48:34.790 --> 00:48:38.500
fact I don't know under what conditions
most of those proofs are licensed. They
00:48:38.500 --> 00:48:43.210
probably are open. But there isn't an open
source community building these things,
00:48:43.210 --> 00:48:47.710
right? It's still pretty
much an academic pursuit.
00:48:47.710 --> 00:48:50.329
Herald: Microphone 2, please.
00:48:50.329 --> 00:48:55.530
Mic 2: Hello. Thanks again for your talk.
I just want to add something that you
00:48:55.530 --> 00:49:01.970
didn't address, and that's that we
actually need more good studies in
00:49:01.970 --> 00:49:07.970
software engineering. We often hear a lot
of experts and also very well-known
00:49:07.970 --> 00:49:11.740
computer scientists say things like,
"Well, we just need to do functional
00:49:11.740 --> 00:49:17.839
programming and OOP is bad and stuff like
that", which I think there's a lot of
00:49:17.839 --> 00:49:24.109
truth to it, but we actually need studies
where we test these kinds of claims that
00:49:24.109 --> 00:49:29.530
we make, because when you look at other
fields which it also did compare to, like,
00:49:29.530 --> 00:49:34.250
medicine, if somebody comes around and
is well-known and says things like,
00:49:34.250 --> 00:49:42.230
"homeopathy works", we don't believe them.
We do trials, we do good trials. And
00:49:42.230 --> 00:49:50.430
there's a lot of myths out there, like, or
not well tested things, like "hire the
00:49:50.430 --> 00:49:54.739
best programmers, they are 100 times
more productive", "do steady type
00:49:54.739 --> 00:50:00.430
systems", and stuff like that. And we need
to verify that this is true, that this
00:50:00.430 --> 00:50:02.210
helps. And it's...
00:50:02.210 --> 00:50:07.490
Peter: Okay. So in the grand scheme of
things, arguably you're right. In the
00:50:07.490 --> 00:50:12.739
grandest scheme of things, computer
science is actually a branch of psychology
00:50:12.739 --> 00:50:18.609
or really sociology. We are trying to
understand how large groups of people can
00:50:18.609 --> 00:50:25.839
combine to make things which are insanely
complicated. Now, for - would it be good
00:50:25.839 --> 00:50:31.329
if we had, you know, solid evidence that
programming in this language was better
00:50:31.329 --> 00:50:38.450
than programming in that language? Well,
yes, but it's staggeringly difficult and
00:50:38.450 --> 00:50:45.079
expensive to do honest experiments of that
nature and they're scarcely ever done,
00:50:45.079 --> 00:50:49.040
right? You might do little experiments on
little groups of students but it's really
00:50:49.040 --> 00:50:55.930
difficult. And then some of these things
which I'm saying, when one is familiar
00:50:55.930 --> 00:51:01.740
with the different possible options, are
just blindingly obvious. I mean, there are
00:51:01.740 --> 00:51:09.161
reasons why these people are using OCaml
for their system programming, right? It's
00:51:09.161 --> 00:51:16.609
not - you know, it's not - "Homeopathy is
right", but "Homeopathy is wrong" which is
00:51:16.609 --> 00:51:21.230
the kind of argument being put forward.
00:51:21.230 --> 00:51:24.260
Herald: Question from
microphone 5, please.
00:51:24.260 --> 00:51:29.609
Mic 5: So where are you using ECC
memory for your testing - up here.
00:51:29.609 --> 00:51:33.510
Peter: When you say up here,
there's a bit of ambiguity.
00:51:33.510 --> 00:51:34.819
Mic 5: Here.
00:51:34.819 --> 00:51:37.860
Peter: Thank you. Say again?
00:51:37.860 --> 00:51:43.119
Mic 5: So where are you using ECC memory
for the testing you showed about the
00:51:43.119 --> 00:51:51.070
results of the multithreaded results due
to memory barriers and memory reorderings?
00:51:51.070 --> 00:51:55.450
Peter: Well...
00:51:55.450 --> 00:52:00.130
Mic 5: Okay, but even if you were or you
were not, the point I want to make is that
00:52:00.130 --> 00:52:05.950
you can expect about 1 bit flip each
minute in a modern computer system that
00:52:05.950 --> 00:52:13.670
may completely change what your software
is doing, so perhaps we also have to look
00:52:13.670 --> 00:52:19.180
in ways how we can work if something goes
wrong at the very low level so that we
00:52:19.180 --> 00:52:25.010
kind of have a check against our
specification on a more higher level of
00:52:25.010 --> 00:52:31.940
our software. So it is valuable to do good
engineering on the low levels, but still I
00:52:31.940 --> 00:52:37.670
think we will not solve the problems of
computation and computer engineering just
00:52:37.670 --> 00:52:42.520
by proving things in the mathematical
domain because computers are physical
00:52:42.520 --> 00:52:47.739
entities. They have errors and so on and
we really have to deal with them as well.
00:52:47.739 --> 00:52:54.340
Peter: So it's certainly true that there
are such random errors. In fact, I would
00:52:54.340 --> 00:52:59.480
put the point that I would argue that you
have to be able to model the physics well
00:52:59.480 --> 00:53:03.680
enough and you have to be able to model
the statistics of those errors well
00:53:03.680 --> 00:53:07.750
enough, so that's a different kind of
mathematics. And it's certainly valuable
00:53:07.750 --> 00:53:13.290
and necessary. But if you look at the
statistic you just quoted, is that the
00:53:13.290 --> 00:53:22.109
main cause of why systems go wrong? Except
possibly for space hardware, no. So
00:53:22.109 --> 00:53:26.499
important, yes. The most important thing
that we should be paying attention to? I
00:53:26.499 --> 00:53:29.940
don't really think so.
00:53:29.940 --> 00:53:32.980
Herald: Microphone 6, please?
00:53:32.980 --> 00:53:41.200
Mic 6: Hi. I really think that what you're
proposing to specify and verify more key
00:53:41.200 --> 00:53:47.160
components is - would be a valuable
addition, but how do you make sure that
00:53:47.160 --> 00:53:51.880
your specification actually matches the
behaviour that you want to do or to have?
00:53:51.880 --> 00:53:58.300
Peter: So as I described, we do as partial
validation of those specifications, we do
00:53:58.300 --> 00:54:03.050
a lot of testing against the
implementations, against a range of
00:54:03.050 --> 00:54:07.060
existing implementations. That's one
source of validation. Another source of
00:54:07.060 --> 00:54:11.520
validation is that you talk to the
architects and the designers. You want the
00:54:11.520 --> 00:54:16.929
internal structure to match their intent.
You want it to be comprehensible to them.
00:54:16.929 --> 00:54:21.510
Another kind of validation that we do is
prove properties about them. So we proved
00:54:21.510 --> 00:54:28.750
that you can correctly compile from a C11
concurrency, a mathematical model of that,
00:54:28.750 --> 00:54:35.432
to a IBM Power concurrency. And that kind
of proof gives you a bit more level of
00:54:35.432 --> 00:54:39.450
assurance in both. So none of this is
giving you a total guarantee. You
00:54:39.450 --> 00:54:43.980
certainly don't claim a total guarantee.
But it gives you pretty good levels of
00:54:43.980 --> 00:54:47.880
assurance by normal standards.
00:54:47.880 --> 00:54:49.580
Herald: Mic 4?
00:54:49.580 --> 00:54:56.309
Mic 4: Yes. Thanks again. You proposed
random tests, and with my expertise, it's
00:54:56.309 --> 00:55:02.859
very annoying if you have tests where the
outcome is sometimes a failure and you
00:55:02.859 --> 00:55:10.499
want to have reproducible tests to fix a
bug. So how do we bring this aspects to
00:55:10.499 --> 00:55:17.850
test more variety in data and to have it
reproducible together?
00:55:17.850 --> 00:55:24.800
Peter: Well, if - as I was mentioning, for
the TCP thing, one of the - so the problem
00:55:24.800 --> 00:55:29.150
is reproducibility is exactly this
internal non-determinism, right? The
00:55:29.150 --> 00:55:33.389
system makes this scheduling choice or
what have you and you can't see what it
00:55:33.389 --> 00:55:39.599
is, or the checker can't see what it is.
So one way to do that is to really design
00:55:39.599 --> 00:55:45.369
the protocols in a different way to expose
that non-determinism. The other fact about
00:55:45.369 --> 00:55:53.421
this kind of general purpose specification
as test oracle idea is that in some sense,
00:55:53.421 --> 00:55:59.240
it doesn't have to be reproducible. Right?
The specification is giving a yes-no
00:55:59.240 --> 00:56:05.510
answer for an arbitrary test. And that
means that you can use arbitrary tests.
00:56:05.510 --> 00:56:09.730
Figuring out the root causes of the
differences afterwards may be tricky, but
00:56:09.730 --> 00:56:13.200
you can use them for assurance.
00:56:13.200 --> 00:56:18.489
Herald: So we now got time for two
more questions. Microphone 1, please?
00:56:18.489 --> 00:56:23.069
Mic 1: Hiya. Thanks for a great talk. So
what you've described seems to be a middle
00:56:23.069 --> 00:56:28.259
ground between a full-on, formal
verification and more practical testing,
00:56:28.259 --> 00:56:34.309
like, in between. So where do you see in
the future where formal verifications will
00:56:34.309 --> 00:56:39.831
go? Will they converge to the middle or
whether it will still be there just to
00:56:39.831 --> 00:56:44.299
verify something that's more
well-specified?
00:56:44.299 --> 00:56:49.940
Peter: So the progress of sort of the
whole subject of formal verification in
00:56:49.940 --> 00:56:56.930
general, if you look at that over the last
10 years or so, it's been very - it's been
00:56:56.930 --> 00:57:01.349
really amazing compared with what we could
do in the 80s and 90s and early 2000s,
00:57:01.349 --> 00:57:06.480
right? So the scope of things, the scale
of things which are - for which it is
00:57:06.480 --> 00:57:11.180
becoming feasible to do verification is
getting bigger. And I think that will
00:57:11.180 --> 00:57:16.339
continue. You know, I don't know when you
might see a completely verified TOR
00:57:16.339 --> 00:57:23.980
client, let's say, but that's not
inconceivable now. And 20 years ago, that
00:57:23.980 --> 00:57:29.470
would have been completely inconceivable.
So that is expanding and at the same time,
00:57:29.470 --> 00:57:36.829
I hope we see more of this interface, text
based oracle specification - and these -
00:57:36.829 --> 00:57:41.000
when you're doing formal verification of a
piece of the system, you have to have
00:57:41.000 --> 00:57:47.810
these indicators already defined, all
right? So these two fit very well together.
00:57:47.810 --> 00:57:51.489
Herald: So the last question
from microphone 2, please.
00:57:51.489 --> 00:57:56.040
Mic 2: Hi. You mentioned that you often
find bugs in hardware. Is there any effort
00:57:56.040 --> 00:58:00.310
to verify the transistors on chips
themselves as well?
00:58:00.310 --> 00:58:03.440
Peter: So there's a whole world of
hardware verification. That wasn't my
00:58:03.440 --> 00:58:10.640
topic today. And most of the big processor
vendors have teams working on this.
00:58:10.640 --> 00:58:14.440
Unsurprisingly, if you remember your
history, many of them have teams focusing
00:58:14.440 --> 00:58:21.059
on the floating-point behaviour of their
processors. And they're doing - they're
00:58:21.059 --> 00:58:26.981
also doing by the standards of ten years
ago, pretty spectacularly well. So there
00:58:26.981 --> 00:58:33.181
are companies that have the whole of some
execution unit formally verified. There's
00:58:33.181 --> 00:58:36.979
been a lot of work over the years on
verification of cache protocols and such
00:58:36.979 --> 00:58:43.100
like. Right? There's a lot of work on not
verifying the hardware as a whole, but
00:58:43.100 --> 00:58:47.519
verifying that, you know, the RTL-level
description of the hardware matches some
00:58:47.519 --> 00:58:54.060
lower level description. So
there is a lot of that going on.
00:58:54.060 --> 00:58:57.619
Herald: Thank you very much
again for this great talk.
00:58:57.619 --> 00:59:00.480
Give him another applause.
Thank you, Peter.
00:59:00.480 --> 00:59:01.790
Peter: Thank you.
00:59:01.790 --> 00:59:04.219
applause
00:59:04.219 --> 00:59:15.000
subtitles created by c3subtitles.de
Join, and help us!