-
applause
-
Peter Sewell: Thank you for that
introduction and to the organisers for
-
this opportunity to speak to you guys. So
my name is Peter Sewell. I'm an academic
-
computer scientist and as such, I bear a
heavy burden of guilt for what we have to
-
work with. Shared guilt, to be fair, but
guilt all the same. In many, many years of
-
higher education, I know two things about
computers. Maybe you've figured them out
-
slightly faster. The first thing, there's
an awful lot of them about. Shocking. And
-
the second is that they go wrong. They go
wrong a lot. Really a lot. A lot. A lot!
-
And they go wrong in many interestingly
different ways. So sometimes they go wrong
-
in spectacular ways, right? With
fireworks. And here, we see...
-
laughter and applause
-
Here we see half a billion dollars worth
of sparkle. You don't get that every day.
-
And so what was going on? So as I
understand it, what was going on was that
-
the - this was the first flight of the
Ariane 5 launcher and it reused some of
-
the guidance software from Ariane 4.
That's fine. Good, tested code. Should
-
reuse it. But the launch profile for
Ariane 5 went sideways a bit more - on
-
purpose - than the Ariane 4 launch. And as
a result, one of the sideways values in
-
the guidance control software overflowed.
Bad. And hence, the guidance software
-
decided that something bad was going wrong
and in order to not land rockets on
-
people's heads, it kindly blew itself up.
Okay. So that's a spectacular one. Sometimes
-
they go wrong in insidious ways. And you
people probably know much more about that
-
than me. Very insidious ways. Exploited by
bad people to our detriment. And they go
-
wrong in insidious ways, exploited by our
kindly governments for our protection.
-
Right? And these - there are many, many,
many different causes of these things but
-
many of these causes are completely
trivial. You know, you have one line of
-
code in the wrong place - something like
that, right? So that's very, very bad and
-
you're all conscious of how bad that is.
But from my point of view, this is as
-
nothing with the real disaster that we
have been facing and continue to face.
-
Right? The real disaster is the enormous
waste of human life and energy and effort
-
and emotion dealing with these crap
machines that we've built. Right? And it's
-
hard to quantify that. I did a little
experiment a couple of months ago. I
-
Googled up "Android problem" and "Windows
problem" and got about 300 million hits
-
for each, right? Somehow indicative of the
amount of wrongness. So this should
-
seem kind of familiar to you. If you think
back - think back to when you were young.
-
Maybe back in the 19th century when you
were young, and what were you doing then
-
as a community of hackers and makers and
builders? Well, this was the full-on
-
industrial revolution. You were building
stuff. You were building beautiful bridges
-
of cast-iron and stone with numerous
arches and pillars. And every so often
-
they would fall down. I believe someone
miscalculated the wind loading. And we
-
would build magical steam engines to whisk
us away. And every so often, they would
-
blow up. And that kind of thing, as you
may have noticed in the last hundred years
-
- sorry - that kind of thing doesn't
happen very often any more, right? We have
-
pretty good civil and mechanical
engineering. And what does it mean to have
-
good civil and mechanical engineering? It
means that we know enough thermodynamics
-
and materials science and quality control
management and all that kind of thing to
-
model our designs before we build them and
predict how they're going to behave pretty
-
well and with pretty high confidence,
right? I see an optimisation of this talk.
-
For computer science, for computing
systems we can predict with 100 per cent
-
certainty that they will not work.
Ah, time for lunch!
-
applause
-
But it's not a very satisfying answer,
really. So why is it so hard? Let me
-
discuss a couple of the reasons why
computing is tricky. In some sense,
-
trickier than steam engines. The first
reason is that there is just too much
-
code. Right? So this is one of the more
scary graphs I've found on the internet
-
recently. This is a measure of code in
hundreds of millions of lines. Each of
-
these blocks I think is 10 million lines
of code. And in the little letters - you
-
might not be able to see we have Android
and the Large Hadron Collider and Windows
-
Vista and Facebook, US Army Future Combat
System, Mac OS and software in a high-end
-
car. Right?
-
laughter and applause
-
I'm flying home.
-
laughter
-
So that's a bit of a problem, really.
We're never going to get that right. Then
-
there's too much legacy complexity there,
right? Really too much. So as
-
right-minded, pure-minded people you might
think that software and hardware should be
-
architected beautifully with regular
structure a bit like a Mies van der Rohe
-
skyscraper or something. Clean. Okay. You
know that's not realistic, right? You
-
might expect a bit of baroqueness. You
might expect maybe something more like
-
this. You know, it's a bit twiddly but it
still hangs together. It's still got
-
structure. It still works. But then you
stop and you think and you realise that
-
software and hardware is built by very
smart people and usually in big groups and
-
subject to a whole assortment of rather
intense commercial pressure and related
-
open-source pressure and using the best
components and tools that they know and
-
they like.
-
laughter and applause
-
Someone from the room: (Indistinct)
You forgot management!
-
And the best management that
human beings can arrange.
-
laughter and applause
-
So we end up with something
spectacularly ingenious.
-
laughter and applause
-
You can see here, if you look
closely, you can see all the parts.
-
laughter
-
This was just a found picture from the
internet. But you can see there's a bunch
-
of, you know, transparent sort of
silicon-like stuff down here. Let's say
-
that's our hardware. And over here, I see
a C compiler with all of its phases.
-
laughter
-
And these bottles, that's the file system,
I reckon. And there's a TCP stack in here
-
somewhere. And there's a piece that
belongs to the NSA, obviously. And up
-
here, I think up here - yeah, there's a
JavaScript engine and a TLS stack and a
-
web browser. And then perched at the top
is the software that most people use to
-
talk to their banks.
-
laughter
-
It's just - it's just insane, right? And
then we have to look at how we build these
-
pieces, right? So we build them
fundamentally by a process of trial and
-
error, right? Just occasionally we write
some specifications in text. And then we
-
write some code and maybe we write a few
ad hoc tests and then we test and fix
-
until the thing is marketable and then we
test and fix and extend until the thing is
-
no longer marketable and then we keep on
using it until we can't anymore. Right? So
-
the fundamental process that we're using
to develop these systems is trial and
-
error by exploring particular execution
paths of the code on particular inputs.
-
But - you know this, but I want to
highlight - there are too many. The number
-
of execution paths scales typically at
least exponentially with the size of the
-
code and the number of states scales
exponentially with the size of the data
-
that you're dealing with and the number of
possible inputs is also shockingly large.
-
There is no way that we can do this and we
should perhaps stop deluding ourselves
-
that there is. And then the most
fundamental reason that computers are hard
-
is that they are too discrete, right? If
you take a bridge or, I don't know, a
-
chair or something - where's something
slightly bendable? You take a chair and
-
you exert a bit more force on it than it's
used to - I'm not very strong - and it
-
bends a bit. It deforms continuously. If
you take a bridge and you tighten up one
-
of the bolts a bit too much or you
underspecify one of the girders by 10 per
-
cent, occasionally you'll reach a point of
catastrophic failure but usually it will
-
just be a little bit weaker or a little
bit stronger or a little bit wibbly.
-
Right? But computer systems, you change a
bit - sometimes it doesn't make a
-
difference but disturbingly often you
change a line of code - as in some of
-
those bugs that we were talking about -
the whole thing becomes broken, right? So
-
it's quite different from that traditional
engineering. It's really different. Okay.
-
So, "A New Dawn", this thing is titled.
What can we do? What might we do? So I'm
-
going to talk about several different
possibilities, all right? One thing, we
-
can do more of the stuff which is
traditionally called software engineering.
-
It's not really engineering in my book,
but some of it's good. I mean, it's
-
useful, you know? We can do more testing
and have more assertions in our code and
-
better management maybe and all that kind
of stuff. We can do all that - we should
-
do all that, but it's not going to make a
very big change, all right? It's not
-
addressing any of those root causes of the
problem. What else could we do? So as an
-
academic computer scientist, I've devoted
several years of my life to working in the
-
area of programming languages and I look
at the programming languages used out
-
there in the world today. Oh, it's just
disgusting! It's shocking.
-
applause
-
The reasons that languages become popular
and built into our shared infrastructure,
-
the reasons for that have almost nothing
to do with how well those languages are
-
designed and in fact whether
they are designed at all, right?
-
applause
-
So I didn't really want to get too much
hate mail after this talk so I'm going to
-
try and avoid naming particular languages
as much as I can, but I encourage you to
-
think away as I speak and imagine the
examples. No, that was - I didn't want to
-
get much hate mail. Sorry.
-
laughter
-
So...
-
laughter and applause
-
So at the very least, we could use
programming languages based on ideas from
-
the 1970s instead of the 1960s. I mean,
really, come on. Right? You know, back in
-
'67 or '69 or whatever we had BCPL and C
and only a couple of years later we had
-
languages that guaranteed type and memory
safety and enforcement of abstraction
-
boundaries and gave us rich mechanisms for
programming so we could say what we mean
-
and not futz about with 23 million
pointers, about 1 per cent of which were
-
completely wrong, right? So for any
particular - in the context of any
-
particular project or any particular
existing legacy infrastructure there may
-
be very good reasons why one just can't
make a switch. Right? But for us
-
collectively, there is no excuse. It's
just wretched, right?
-
applause
-
It's completely bonkers. And then the
other thing that you see as a programming
-
language researcher is that - er, well, to
make another analogy, it looks as if
-
anyone who was capable of driving a car
considered themselves an expert on car
-
design, right? There is in fact a rather
well-established subject of, you know,
-
programming language design and we know
how to specify completely precisely,
-
mathematically precisely not just the
syntax like we had in these BNF grammars
-
from the 1960s and still have, but also
the behaviour of these things. We know how
-
to specify a few operational semantics and
the type systems. And we know how to do
-
mathematical proofs that our language
designs are self-consistent, that they do
-
guarantee good properties of arbitrary
well-typed programs that you write, right?
-
We can do this now. We can do this, if we
have to, post hoc. And people are doing
-
this for JavaScript and C and god help
them, PHP and other such languages. But
-
even better, we can do it at design time.
And if you do this at design time, you see
-
the structure. You have to - the act of
writing that specification forces you to
-
consider all the cases and all the
interactions between features. You can
-
still get it wrong, but you get it wrong
in more interesting ways. Right? So people
-
are starting to do this now. So I think
there was a talk on the first day of this
-
by Hannes and David who are building
system software in maybe not the best
-
language you could possibly imagine, but
something which compared with C is as
-
manna from heaven. Anil Madhavapeddy and
his colleagues in Cambridge have been
-
working hard to build system software in
at least moderately modern languages. And
-
it works. Sometimes you have to use C but
surprisingly often you really don't. Yeah,
-
so I teach, as my introducer said, in the
University of Cambridge. And I teach
-
semantics. And one of my fears is that I
will accidentally let someone out in the
-
world who will accidentally find
themselves over a weekend inventing a
-
little scripting language which will
accidentally take over the world and
-
become popular and I won't have explained
to them what they have to do or even that
-
there is a subject they have to
understand. And if you want to understand
-
it, there's lots of stuff you can look at.
Okay. Let's go to the uppermost extreme.
-
If we want software and hardware that
actually works, we should prove it
-
correct. Right? So this is something which
academics, especially in the domain
-
labelled "formal methods" in the '70s and
'80s, they've been pushing this, promoting
-
this as an idea and a promise and an
exhortation for decades, right? Why don't
-
we just prove our programs correct? And
for some of those decades it wasn't really
-
plausible. I remember back when I was
considerably younger, it was a big thing
-
to prove that if you took a linked list
and reversed it twice, you got back the
-
same thing. Right? That was hard, then.
But now, well, we can't do this for the
-
Linux kernel, let's say, but we can
surprisingly often do this. And I just give
-
you a couple of the examples of modern
day, state-of-the-art academic
-
verification projects. People have
verified compilers all the way from C-like
-
languages down to assembly or from ML-like
languages. You know, higher-order
-
functional imperative languages all the
way down to binary models of how x86
-
behaves. And people have verified LLVM
optimisation passes. There's verified
-
software fault isolation which I believe
goes faster than the original non-verified
-
form. Win for them. There's a verified
hypervisor from the NICTA group in Australia.
-
Right? Verified secure hypervisor with
proofs of security properties. There's any
-
amount of work verifying crypto protocols,
sometimes with respect to assumptions - I
-
think reasonable assumptions about what the
underlying mathematics is giving you. So
-
we can do this. I maybe have to explain a
bit what I mean by "prove". To some
-
computery people, "I proved it works"
means "I tested it a little bit".
-
laughter
-
In extremis, "it compiled".
-
laughter and applause
-
To a program analysis person, you might
run some sophisticated tool but written in
-
untrusted code and possibly doing an
approximate and maybe unsound analysis and
-
you might find an absence of certain kinds
of bugs. That's tremendously useful; it's
-
not what we would really call "proved".
I mean, nor would they, to be fair.
-
Scientists generally don't use the word
because they know they're not doing it,
-
right? They're doing controlled
experiments, which gives them, you know,
-
information for or against some
hypothesis. Mathematicians write proofs.
-
You probably did that when you were young.
And those proofs are careful mathematical
-
arguments usually written on paper with
pencils and their aim is to convince a
-
human being that if that mathematician was
really nailed up against the wall and
-
pushed, they could expand that into a
completely detailed proof. But what we
-
have in computing, we don't have the rich
mathematical structure that these people
-
are working with, we have a tremendous
amount of ad hoc and stupid detail mixed
-
in with a smidgin of rich mathematical
structure. So - and we have hundreds of
-
millions of lines of code. So this is just
not going to cut it. And if we want to say
-
the word "prove" ever, then the only thing
which is legitimate is to do honest
-
program proof and that means you have to
have some system that machine-checks that
-
your proof is actually a proof. And
sometimes we'll also have machines that
-
will sort of help and sort of hinder that
process, right? And there's a variety of
-
these systems: Coq and HOL4 and
Isabelle/HOL and what have you that we
-
can look up. So using these systems we can
prove nontrivial facts about these. Not
-
necessarily that they do what you want
them to do, but that they meet the precise
-
specification that we've written down. We
can do that. But it's tricky, right? So
-
these projects, these are by academic
standards all quite big, you know? This is
-
something like, I don't know, 10 people
for a few years or something. By industry
-
scale, maybe not that big. By your scale,
maybe not that big. But still a lot of
-
work for a research group to do. And quite
high-end work. It can take, you know,
-
literally a few years to become really
fluent in using these tools, right? And
-
there isn't as yet sort of an open-source,
collaborative movement doing this kind of
-
stuff. Maybe it's the time to start. Maybe
in five years, ten years. I don't know. If
-
you want a challenge, there's a challenge.
But it's really quite hard, right? It's
-
not something that one can quite put
forward credibly as a solution for all
-
software and hardware development, right?
So that leads me to, like an intermediate
-
point. That should have been a four there.
What can we do which improves the quality
-
of our system, which injects some
mathematical rigour but which involves
-
less work and is kind of easy? And what we
can do is take some of these interfaces
-
and be precise about what they are. And
initially, we have to do that after the
-
fact. We have to reverse-engineer good
specs of existing stuff. But then we can
-
use the same techniques to make much
cleaner and better-tested things which
-
are easier to test in the future. That's
the idea. So my colleagues and I have been
-
doing this kind of stuff for the last
quite a few years and I just want to give
-
you just a - two little vignettes just to
sort of show you how it goes. So I can't
-
give you much detail. And this is,
obviously, joint work with a whole bunch
-
of very smart and good people, some of
whom I name here. So this is not me, this
-
is a whole community effort. So I'm going
to talk for a little bit about the
-
behaviour of multiprocessors, so at that
hardware interface. And I'm going to talk
-
a tiny little bit about the behaviour of
the TCP and socket API, right? Network
-
protocols. So - and there'll be some
similarities and some differences between
-
these two things. So multiprocessors. You
probably want your computers to go fast.
-
Most people do. And it's an obvious idea
to glom together, because processors don't
-
go that fast, let's glom together a whole
bunch of processors and make them all talk
-
to the same memory. This is not a new
idea. It goes back at least to 1962 when
-
the Burroughs D825 had, I think, two
processors talking to a single shared
-
memory. It had outstanding features
including truly modular hardware with
-
parallel processing throughout, and - some
things do not change very much - the
-
complement of compiling languages was to
be expanded, right? 1962, so that'll be 52
-
years ago. Deary me. Okay. Now, how do
these things behave? So let me show you
-
some code, right? It's a hacker
conference. There should be code. Let me
-
show you two example programs and these
will both be programs with two threads and
-
they will both be acting on two shared
variables, X and Y. And in the initial
-
state, X and Y are both zero. So, the
first program. On this thread we write X
-
and then we read Y, and over here we write
Y and then we read X, right? Now, these
-
are operating, you know, in an
interleaving concurrency, you might think,
-
so there's no synchronisation between
those two threads so we might interleave
-
them. We might run all of thread 0 before
all of thread 1 or all of thread 1 before
-
all of thread 0 or they might be mixed in
like that or like that or like that or
-
like that. There's six possible ways of
interleaving two lists of two things. And
-
in all of those ways you never see in the
same execution that this reads from the
-
initial state AND this reads from the
initial state. You never see that. So you
-
might expect and you might assume in your
code that these are the possible outcomes.
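As a rough sketch of that first program - a hypothetical harness, not the speaker's actual test code; C11 relaxed atomics with POSIX threads, so the compiler emits plain stores and loads - it might look like this:

    /* Store-buffering litmus test: sketch only. A single run will almost
     * never hit the interesting case; a real harness repeats this millions
     * of times with careful timing. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int x, y;        /* shared, both initially 0 */
    static int r0, r1;             /* per-thread results */

    static void *thread0(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);   /* write x */
        r0 = atomic_load_explicit(&y, memory_order_relaxed);  /* read y  */
        return NULL;
    }

    static void *thread1(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);   /* write y */
        r1 = atomic_load_explicit(&x, memory_order_relaxed);  /* read x  */
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        /* No interleaving of the four statements allows r0 == 0 && r1 == 0,
         * yet real x86, ARM and POWER machines do sometimes produce it. */
        printf("r0=%d r1=%d\n", r0, r1);
        return 0;
    }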
-
So what happens if you were to actually
run that code on my laptop in a particular
-
test harness? Well, okay, you see
occasionally they're quite well
-
synchronised and they both read the other
guy's write. Sorry, big call. They both
-
read the other guy's write. And quite
often one thread comes completely before
-
the other. But sometimes, sometimes they
both see 0. Huh. Rats. Well, that's
-
interesting. Let me show you another
program. So now, thread 0 writes some
-
stuff. Maybe it writes a big bunch of
data, maybe multi-word data. And then it
-
writes, let's say that's a flag announcing
to another thread that that data is
-
ready to go now. And the other thread
busy-waits reading that value until it
-
gets 1, until it sees the flag. And then
it reads the data. So you might want, as a
-
programmer, you might want a guarantee
that this read will always see the data
-
that you have initialised. Hey? That would
be like a message-passing kind of an
-
idiom. So what happens if you run that?
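Again as a rough sketch - same caveats, a hypothetical harness, with relaxed atomics standing in for the plain accesses just described - the message-passing program might look like this:

    /* Message-passing litmus test: sketch only. Note that C11 relaxed
     * atomics also permit the *compiler* to reorder these accesses, so a
     * portable version would use memory_order_release for the flag store
     * and memory_order_acquire for the flag load (or explicit barriers);
     * here everything is left relaxed to expose the raw hardware question. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int data, flag;   /* shared, both initially 0 */
    static int r;

    static void *writer(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&data, 42, memory_order_relaxed); /* write the data    */
        atomic_store_explicit(&flag, 1, memory_order_relaxed);  /* then raise a flag */
        return NULL;
    }

    static void *reader(void *arg)
    {
        (void)arg;
        while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
            ;                                                    /* busy-wait on the flag */
        r = atomic_load_explicit(&data, memory_order_relaxed);  /* then read the data    */
        return NULL;
    }

    int main(void)
    {
        pthread_t tw, tr;
        pthread_create(&tw, NULL, writer, NULL);
        pthread_create(&tr, NULL, reader, NULL);
        pthread_join(tw, NULL);
        pthread_join(tr, NULL);
        printf("r=%d\n", r);   /* is r guaranteed to be 42? */
        return 0;
    }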
Well, on the x86 processors that we've
-
tested, you always see that value. Good.
This is a decent coding idiom on x86. All
-
right. On ARM and IBM Power processors,
however, you see sometimes you just ignore
-
the value written and you see the initial
state. So this is not a good
-
message-passing idiom, right? So what's
going on? Well, these behaviours, it might
-
be surprising, eh? And when you see
surprising behaviour in hardware, you have
-
two choices. Either it's a bug in the
hardware, and we have found a number of
-
bugs in hardware. It's always - it's very
rewarding. Or it's a bug in your
-
expectation. Or it's a bug in your test
harness. Right? So what you do is you walk
-
along and you walk up to your friendly
processor architect in IBM or ARM or x86
-
land, and we have worked with all of these
people. And you ask them, "Is this a bug?"
-
And a processor architect is a person who
is - has the authority to decide whether
-
some behaviour is intended or is a bug.
-
laughter
-
That's what they do, essentially. And for
these, they say, "Oh, we meant it to be
-
like that". Right? No question. "We meant
it to be like that. That's perfectly
-
proper. We have, because you and everybody
else wanted their computers to go fast, we
-
as processor designers have introduced all
manner of sophisticated optimisations,
-
which if you were running sequential code,
you would never notice, but if you start
-
running fancy concurrent code like this,
fancy concurrent code that isn't just
-
trivially locked, you can notice", so this
is intentional behaviour. And if you want
-
to write that code, like a mutual
exclusion algorithm or a message-passing
-
algorithm or something, then you need to
insert special instructions, memory
-
barrier instructions. And so you go away
and you say, "What do they do?" And you
-
look up in the vendor documentation. And
you get stuff like this. I'll read it out:
-
(Reads quickly) "For any applicable pair
AB, the memory barrier ensures that a will
-
be performed with respect to any processor
or mechanism, to the extent required by
-
the associated memory coherence required
attributes, before b is performed with
-
respect to that processor or mechanism. A
includes all applicable storage accesses
-
by any such processor or mechanism that
have been performed with respect to P1
-
before the memory barrier is created. B
includes all applicable storage accesses
-
by any such processor or mechanism that
are performed after a Load instruction
-
executed by that processor or mechanism
has returned the value stored by a store
-
that is in B."
-
applause
-
Hands up if you understand what that
means? Hands up if you think that if you
-
thought about it and read the rest of the
book, you would understand or you could
-
understand what that means? Hands up the
people who think that we're doomed forever?
-
laughter
-
So I'm sorry to the first few groups,
but the last ones are right.
-
laughter and applause
-
This looks like it's really intricate and
really carefully thought out stuff. And we
-
thought that for a while and we spent
literally years trying to make precise
-
mathematical models that give precise
meaning to these words, but actually it
-
can't be done. So what do you have to do
in that circumstance? You have to go away
-
and you have to invent some kind of a
model. And there's a - this is a really
-
fundamental problem, that we - on the one
hand, we develop our software by this
-
trial and error process, but on the other
hand there are points like this. We have these
-
loose specifications, supposedly, that
have to cover all manner of behaviour of
-
many generations of processors that might
not all behave the same way, written just in
-
text. And we tell people they should write
"to the spec", but the only way they have
-
of developing their code is to run it and
run particular executions on the
-
particular hardware implementations that
they have, whose relationship to the spec
-
is very elusive. We can't use those thick
books that you get or those quite weighty
-
PDFs that you get these days from the
processor vendors to test programs. You
-
can't feed a program into that thick book
and get output out. And you can't test
-
processor implementations and you can't
prove anything and you can't even
-
communicate between human beings, right?
So these things, really, they don't exist.
-
So what can we do? Well, the best we can
do at this point in time is a bit of
-
empirical science, right? So we can invent
some model off the top of our heads,
-
trying to describe just the
programmer-visible behaviour, not the
-
internal structure. And we can make a
tool. Because that's not prose now. Now we
-
can - now it's stuff. It's real
mathematics and we can turn that into code
-
and execute it. We can make tools that
take programs and execute it and - small
-
programs - tell us the set of all
behaviours allowed by that model. And then
-
we can compare those sets of behaviours
against the experimental observations. And
-
you might have to fix the model and you
might find hardware bugs. And then the model
-
also has to make sense to the architect so
you can discuss with the architects what
-
they intend and what their intuition is,
and then you can also prove some other
-
facts about it to give a bit more
assurance and then because you probably
-
got this wrong the first couple of times,
you can go back. And this results or has
-
resulted in models which are not
guaranteed to be correct, but they are
-
effectively the de facto standard for
understanding the concurrency behaviour of
-
these things, you know? And various people
use them. And just to give you a tiny
-
snippet - I'll have to speed up a bit - a
tiny snippet of what the model looks like,
-
it's basically just an abstract machine.
It's got some stuff for the threads that
-
handles the programmer-visible speculative
execution, and some stuff for a storage
-
subsystem that handles the fact that in
these architectures, memory writes can be
-
propagated to different threads at
different times, right? And there's a -
-
that model has a state which is just a
collection of some sets and lists and
-
partial orders and a few other things
which I won't talk about. And it has some
-
transitions, right? In any state, there
might be several possible transitions.
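Purely to fix the shape in your mind - a toy sketch with hypothetical names, nothing like the real model - one such transition, written as an explicit precondition plus an action over an explicit state, might look like this:

    /* Toy "propagate write to thread" transition: sketch only. */
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_WRITES  64
    #define MAX_THREADS 8

    typedef struct {
        size_t n_writes;
        bool   propagated[MAX_WRITES][MAX_THREADS]; /* which writes each thread has seen      */
        bool   coh_before[MAX_WRITES][MAX_WRITES];  /* coherence order on same-address writes */
    } storage_state;

    /* Precondition: may write w be propagated to thread t in this state? */
    static bool can_propagate_write(const storage_state *s, size_t w, size_t t)
    {
        if (s->propagated[w][t])
            return false;                           /* not already propagated to t       */
        for (size_t v = 0; v < s->n_writes; v++)
            if (s->coh_before[v][w] && !s->propagated[v][t])
                return false;                       /* coherence-earlier writes go first */
        return true;
    }

    /* Action: the state change itself is tiny; the content is in the precondition. */
    static void propagate_write(storage_state *s, size_t w, size_t t)
    {
        s->propagated[w][t] = true;
    }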
-
This is just a sample. I'm not going to
explain all of this. But when you want to
-
propagate a write to another thread, there
have to be some preconditions that you have to
-
satisfy. And then there is an action. It's not
amazingly complicated, really. So this is text,
-
but it's very precisely written text and it has
the great advantage that you can go through
-
these bullet points with your friendly
architect and say, "Is this what you
-
mean?" And they can think about it and say
yes or no. But for the actual definition,
-
that's in mathematics but it's mathematics
that's quite close to code. I mean, it's
-
terribly close to pure, functional code
without side effects. And just to give you
-
an idea, this is some of it and those are
three of those conditions in the real,
-
honest, true version. Then we can take
this mathematics and because it's in a
-
particular style, we can compile this into
actually OCaml and then OCaml byte code
-
and then we can run it for batch jobs. But
then you can compile that OCaml byte code
-
to JavaScript and glue on a user interface
and stick it into a web browser and then you
-
have a web interface that lets people
explore this model. And that's also a
-
necessary part of validating that it makes
sense. Okay. This is not rocket science.
-
You've missed the talk for rocket science,
I'm sorry. All we're doing here is
-
specifying the intended behaviour of a
system. Okay? That's - it's not very
-
technical but it's unusual. And we happen
to be doing it in this mathematical
-
language, but you could use in fact almost
any language so long as you understand
-
what you're doing. What you have to
understand is that you're writing
-
something which is not an implementation.
It is something which, given some observed
-
behaviour, some arbitrary observed
behaviour, will be able to decide if it's
-
allowed, if you want it to be allowed or
not, right? It has to be executed as a
-
test oracle. The key question for getting
this to work is to understand how much
-
non-determinism or loose specification
there is in the system that you're working
-
with, right? So if everything is
completely deterministic, you can write a
-
reference implementation in the cleanest
language of your choice and just compare
-
the output of that and the output of the
real thing, right? But if there's more
-
non-determinism, then you can't do that.
I'm going to have to abbreviate this a
-
little tiny bit. So for the TCP network
protocols, there is more non-determinism.
-
You know what the TCP is, yes? A protocol
that gives you sort of reliable
-
connections, assuming that the world is
made of good people, right? You look at
-
the standards for TCP, and they're the
same kind of wibbly text from the 1980s
-
and you look at the implementations and
they are a ghastly mess. And you try and
-
understand the relationship between the
two of them and you can't because the
-
standard, again, is not the definition. It
doesn't define in all circumstances what
-
behaviour is allowed and what isn't. So
again, we had to make up a specification
-
in the first instance just of this host, a
single endpoint, and observing its
-
sockets API calls and returns and its
behaviour on the wire, right? And when
-
you've decided - and a few internal debug
events. And when you've decided what to
-
observe, then observation is just a trace,
it's just a list of those events. And you
-
have to be able to ask your spec "Is that
list admissible or not"? But there's a lot
-
of non-determinism. Variation between the
implementations, variations within the
-
implementations. And that's internal. It's
not announced in the API or on the wire
-
maybe until much later when the
implementation picks a new window size or
-
something. You can't tell until quite a
lot later in the trace what it's picked.
-
And that makes the job of checking whether
a trace is admissible by the spec much
-
harder than it has to be. Right? And if
you had designed the TCP protocol and
-
those implementations to be testable
in this very discriminating way, back in
-
the 1980s when you were designing TCP
protocol, it would be much easier. And for
-
new protocols, one should make sure that
you're doing this in this particular way.
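To make the spec-as-test-oracle idea concrete, here is a toy sketch - a made-up event type and a made-up two-rule spec, nothing to do with the real TCP model - showing how a checker resolves internal non-determinism by search:

    /* Toy test oracle: a trace is admissible iff SOME resolution of the
     * spec's internal choices accepts every observed event. Sketch only. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { EV_CONNECT, EV_SEND } ev_kind;
    typedef struct { ev_kind kind; int arg; } event_t;   /* one observed event         */
    typedef struct { int window; } spec_state;           /* internal, unobserved state */

    /* The "spec": on connect, the implementation may pick any window from a
     * small set (internal non-determinism); a send must fit in that window. */
    static size_t spec_step(spec_state s, event_t e, spec_state out[], size_t max)
    {
        static const int windows[] = { 1024, 2048, 4096 };
        size_t n = 0;
        switch (e.kind) {
        case EV_CONNECT:
            for (size_t i = 0; i < 3 && n < max; i++)
                out[n++] = (spec_state){ .window = windows[i] };
            break;
        case EV_SEND:
            if (e.arg <= s.window && n < max)
                out[n++] = s;
            break;
        }
        return n;   /* zero candidates means the spec forbids this event here */
    }

    /* Backtracking search over the spec's internal choices. */
    static bool admissible(spec_state s, const event_t *trace, size_t len)
    {
        if (len == 0)
            return true;
        spec_state next[8];
        size_t n = spec_step(s, trace[0], next, 8);
        for (size_t i = 0; i < n; i++)
            if (admissible(next[i], trace + 1, len - 1))
                return true;
        return false;
    }

    int main(void)
    {
        event_t trace[] = { { EV_CONNECT, 0 }, { EV_SEND, 3000 } };
        spec_state init = { .window = 0 };
        printf("%s\n", admissible(init, trace, 2) ? "admissible" : "spec violation");
        return 0;
    }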
-
I don't think I want to talk about that
slide. That's just one of the rules of
-
that specification. But the key fact about
that spec is that - well, we handled the
-
real protocols for arbitrary inputs. But
it's structured not just for this
-
executability as a test oracle, but it's
structured for clarity, not performance,
-
which is scarcely ever what anyone ever
does, right? So it's a whole new kind of
-
thing. And the testing is very
discriminating. So we found all manner of
-
amusing and bizarre bugs which I think I
don't have time to talk about. Okay. So
-
I've described three kinds of things that
we might do. The first thing - for
-
goodness sake - I mean, it's just a
no-brainer. Just do it already. Everybody.
-
All of you. All of you. Right? This middle
zone is a very interesting zone as far as
-
I'm concerned, right? It's
a place where we could get a very good
-
benefit for not that much effort, right?
We can do it, if necessary, post hoc. We
-
can also do it and even better, at design
time. We have to do this in a way that
-
makes our specifications executable as test
oracles, and another thing that that
-
enables is completely random testing. When
you've got a test oracle, you can just
-
feed in random goop. This is fuzzing, but
better fuzzing - feed in random goop and
-
check at every point whether the
behaviour that you're getting is allowable
-
or not. And then they're written for
clarity, not for performance, and that
-
means you can see what you're doing,
right? You can see the complexity. If you
-
go ahead and you add some feature to your
protocol or your programming language or
-
whatever and you're working just with text
specifications, then you can't see the
-
interactions. You just chuck in a couple
of paragraphs and you think for a few
-
minutes, right? And if you're lucky, you
make an implementation. But if you're
-
writing a spec that has to cover all the
cases like this, the act of doing that
-
encourages you to think or helps you think
about the excess complexity. And you might
-
think that's too complex, I'm being silly,
I'll make it simpler. And it has to be
-
usable for proof. So this, I think also is
pretty much a no-brainer. And it doesn't
-
require great technical, you know,
mathematical skill either, right? Lots of
-
people can do this. And then there's
option 3, best option, full-on formal
-
verification of the key components. And
that is now also in reach. You can imagine
-
secure systems made using actual verified
compilers and verified hypervisors with
-
verified TLS stacks and you can imagine
making things out of those or making those
-
and then making things out of those. And
maybe one should be thinking about that.
-
So, maybe not appropriate for everything.
If you were writing Word, you would not
-
wish to do these things. Probably, you're
not. But for this key infrastructure that
-
we really depend on, we trust even though
it's not trustworthy, we have to do this.
-
Is this a new dawn? I wonder if there's
anyone standing next to a light switch
-
that can dim them for me. I didn't ask
them beforehand, so... If you think back
-
the last 70-odd years, we started building
computers in around 1940. It's been pretty
-
dark from the point of view of rigorous,
solid, reliable engineering, stuff that is
-
actually trustworthy. Pretty dark, I would
say. Maybe, just maybe there's a tiny
-
smidgen of light coming through the doors.
And we start to have a little bit of a
-
clue and we start to get reusable models
of processor architectures and programming
-
languages and network protocols and what
have you. It's just the beginnings of the
-
analogue of that thermodynamics and
materials science and quality control
-
management and what have you that we have
from mechanical engineering. And we've got
-
no choice. If we don't, the next 75 years
is going to be a lot worse. You can just
-
imagine, right? So I went to this - as I'm
sure many of you did - this great talk on
-
some Apple boot loader exploit yesterday
which was relying on some feature that had
-
been introduced in the early '80s and was
still present. And you can imagine in 100
-
years from now, you can imagine as long as
human civilisation endures, the x86
-
architecture and the socket API and all of
this stuff, it will be embedded in some
-
monumental ghastly stack of
virtualisation layers forever.
-
laughter and applause
-
So I'd like to thank especially all of my
colleagues that have been working with me
-
or not with me in these directions. I'd
like to thank our generous funders who
-
support this stuff. I'd like to thank you
for your attention and I exhort you,
-
sort it out.
-
applause
-
Herald: Thank you very much, Peter, for
those interesting insights to our
-
programming. We have now time for a Q & A,
so those people who have to leave the
-
room, please do so quietly and as fast as
possible, so we can go on and hear what
-
the questions are. So we
start with microphone 4, please.
-
Mic 4: Hello. Thanks for the talk and my
question is if you got an oracle of the
-
software behaviour which can translate
every possible input to a correct output,
-
how can this be less complex and
error-prone than the reference
-
implementation?
-
Peter: Good question. So the first point
is that this oracle doesn't have to
-
produce the outputs. It only has to check
that the inputs and the outputs match up.
-
And then the second point is that in
general it might have to have much of the
-
same complexity, but by writing it to be
clear as opposed to fast, you may
-
adopt a completely different structure,
right? So the structure of our TCP spec is
-
organised by the behaviour that you see
for particular transitions. The structure
-
of a real implementation has fast-path
code and lots of complicated intertwined
-
branching and all manner of excess
complexity, of implementation complexity,
-
if you will.
-
Mic 4: Thanks.
Herald: So microphone 3, please?
-
Mic 3: I wanted to ask you what you
thought about programming by manipulating
-
the abstract syntax tree directly so as to
not allow incompilable programs because of
-
some, like, you're missing a semicolon or
something like that. What's your thoughts
-
on that?
-
Peter: So that's - in the grand scheme of
things, I think that's not a very big
-
deal, right? So there's - people have
worked on structured editors for lordy
-
knows how many years. It's not a big deal
because it's very easy for a compiler to
-
detect that kind of thing. And even more,
it's very easy for a compiler of a
-
sensibly designed language to detect type
errors, even for a very sophisticated type
-
system. So I don't think that - that's not
- I mean, some people might find it
-
helpful, but I don't think it's getting to
the heart of the matter.
-
Mic 3: Thanks.
-
Herald: So we've got some questions
from our signal angels from the IRC.
-
Signal angel: Yes. There's one question.
It's about the repository for Isabelle and
-
HOL proofs. It should be on SourceForge,
and you said there are no open source
-
repositories. What about them?
-
Peter: I'm not quite sure what the
question is, really. So there's a
-
repository of Isabelle formal proofs which
is the archival form of proofs, it's
-
called. I didn't really mean to say there
are no open source repositories and in
-
fact I don't know under what conditions
most of those proofs are licensed. They
-
probably are open. But there isn't an open
source community building these things,
-
right? It's still pretty
much an academic pursuit.
-
Herald: Microphone 2, please.
-
Mic 2: Hello. Thanks again for your talk.
I just want to add something that you
-
didn't address, and that's that we
actually need more good studies in
-
software engineering. We often hear a lot
of experts and also very well-known
-
computer scientists say things like,
"Well, we just need to do functional
-
programming and OOP is bad and stuff like
that", which I think there's a lot of
-
truth to it, but we actually need studies
where we test these kinds of claims that
-
we make, because when you look at other
fields - which you also did compare to - like,
-
medicine, if somebody comes around and
is well-known and says things like,
-
"homeopathy works", we don't believe them.
We do trials, we do good trials. And
-
there's a lot of myths out there, like, or
not well tested things, like "hire the
-
best programmers, they are 100 times
more productive", "do steady type
-
systems", and stuff like that. And we need
to verify that this is true, that this
-
helps. And it's...
-
Peter: Okay. So in the grand scheme of
things, arguably you're right. In the
-
grandest scheme of things, computer
science is actually a branch of psychology
-
or really sociology. We are trying to
understand how large groups of people can
-
combine to make things which are insanely
complicated. Now, for - would it be good
-
if we had, you know, solid evidence that
programming in this language was better
-
than programming in that language? Well,
yes, but it's staggeringly difficult and
-
expensive to do honest experiments of that
nature and they're scarcely ever done,
-
right? You might do little experiments on
little groups of students but it's really
-
difficult. And then some of these things
which I'm saying, when one is familiar
-
with the different possible options, are
just blindingly obvious. I mean, there are
-
reasons why these people are using OCaml
for their system programming, right? It's
-
not - you know, it's not - "Homeopathy is
right", but "Homeopathy is wrong" which is
-
the kind of argument being put forward.
-
Herald: Question from
microphone 5, please.
-
Mic 5: So, were you using ECC
memory for your testing? - up here.
-
Peter: When you say up here,
there's a bit of ambiguity.
-
Mic 5: Here.
-
Peter: Thank you. Say again?
-
Mic 5: So, were you using ECC memory
for the testing you showed about the
-
multithreaded results due
to memory barriers and memory reorderings?
-
Peter: Well...
-
Mic 5: Okay, but even if you were or you
were not, the point I want to make is that
-
you can expect about 1 bit flip each
minute in a modern computer system that
-
may completely change what your software
is doing, so perhaps we also have to look
-
in ways how we can work if something goes
wrong at the very low level so that we
-
kind of have a check against our
specification on a higher level of
-
our software. So it is valuable to do good
engineering on the low levels, but still I
-
think we will not solve the problems of
computation and computer engineering just
-
by proving things in the mathematical
domain because computers are physical
-
entities. They have errors and so on and
we really have to deal with them as well.
-
Peter: So it's certainly true that there
are such random errors. In fact, I would
-
argue that you
have to be able to model the physics well
-
enough and you have to be able to model
the statistics of those errors well
-
enough, so that's a different kind of
mathematics. And it's certainly valuable
-
and necessary. But if you look at the
statistic you just quoted, is that the
-
main cause of why systems go wrong? Except
possibly for space hardware, no. So
-
important, yes. The most important thing
that we should be paying attention to? I
-
don't really think so.
-
Herald: Microphone 6, please?
-
Mic 6: Hi. I really think that what you're
proposing to specify and verify more key
-
components is - would be a valuable
addition, but how do you make sure that
-
your specification actually matches the
behaviour that you want to do or to have?
-
Peter: So as I described, as partial
validation of those specifications, we do
-
a lot of testing against the
implementations, against a range of
-
existing implementations. That's one
source of validation. Another source of
-
validation is that you talk to the
architects and the designers. You want the
-
internal structure to match their intent.
You want it to be comprehensible to them.
-
Another kind of validation that we do is
prove properties about them. So we proved
-
that you can correctly compile from a C11
concurrency, a mathematical model of that,
-
to IBM Power concurrency. And that kind
of proof gives you a bit more
-
assurance in both. So none of this is
giving you a total guarantee. You
-
certainly don't claim a total guarantee.
But it gives you pretty good levels of
-
assurance by normal standards.
-
Herald: Mic 4?
-
Mic 4: Yes. Thanks again. You proposed
random tests, and in my experience, it's
-
very annoying if you have tests where the
outcome is sometimes a failure and you
-
want to have reproducible tests to fix a
bug. So how do we bring these aspects
-
together - testing more variety in data
and having it reproducible?
-
Peter: Well, if - as I was mentioning, for
the TCP thing, one of the - so the problem
-
with reproducibility is exactly this
internal non-determinism, right? The
-
system makes this scheduling choice or
what have you and you can't see what it
-
is, or the checker can't see what it is.
So one way to do that is to really design
-
the protocols in a different way to expose
that non-determinism. The other fact about
-
this kind of general purpose specification
as test oracle idea is that in some sense,
-
it doesn't have to be reproducible. Right?
The specification is giving a yes-no
-
answer for an arbitrary test. And that
means that you can use arbitrary tests.
-
Figuring out the root causes of the
differences afterwards may be tricky, but
-
you can use them for assurance.
-
Herald: So we now got time for two
more questions. Microphone 1, please?
-
Mic 1: Hiya. Thanks for a great talk. So
what you've described seems to be a middle
-
ground between a full-on, formal
verification and more practical testing,
-
like, in between. So where do you see
formal verification going in the future?
-
Will it converge to the middle or will
it still be there just to
-
verify something that's more
well-specified?
-
Peter: So the progress of sort of the
whole subject of formal verification in
-
general, if you look at that over the last
10 years or so, it's been very - it's been
-
really amazing compared with what we could
do in the 80s and 90s and early 2000s,
-
right? So the scope of things, the scale
of things which are - for which it is
-
becoming feasible to do verification is
getting bigger. And I think that will
-
continue. You know, I don't know when you
might see a completely verified TOR
-
client, let's say, but that's not
inconceivable now. And 20 years ago, that
-
would have been completely inconceivable.
So that is expanding and at the same time,
-
I hope we see more of this interface, text
based oracle specification - and these -
-
when you're doing formal verification of a
piece of the system, you have to have
-
these indicators already defined, all
right? So these two fit very well together.
-
Herald: So the last question
from microphone 2, please.
-
Mic 2: Hi. You mentioned that you often
find bugs in hardware. Is there any effort
-
to verify the transistors on chips
themselves as well?
-
Peter: So there's a whole world of
hardware verification. That wasn't my
-
topic today. And most of the big processor
vendors have teams working on this.
-
Unsurprisingly, if you remember your
history, many of them have teams focusing
-
on the floating-point behaviour of their
processors. And they're doing - they're
-
also doing by the standards of ten years
ago, pretty spectacularly well. So there
-
are companies that have the whole of some
execution unit formally verified. There's
-
been a lot of work over the years on
verification of cache protocols and such
-
like. Right? There's a lot of work on not
verifying the hardware as a whole, but
-
verifying that, you know, the RTL-level
description of the hardware matches some
-
lower level description. So
there is a lot of that going on.
-
Herald: Thank you very much
again for this great talk.
-
Give him another applause.
Thank you, Peter.
-
Peter: Thank you.
-
applause
-
subtitles created by c3subtitles.de