applause
Peter Sewell: Thank you for that
introduction and to the organisers for
this opportunity to speak to you guys. So
my name is Peter Sewell. I'm an academic
computer scientist and as such, I bear a
heavy burden of guilt for what we have to
work with. Shared guilt, to be fair, but
guilt all the same. After many, many years
of higher education, I know two things about
computers. Maybe you've figured them out
slightly faster. The first thing, there's
an awful lot of them about. Shocking. And
the second is that they go wrong. They go
wrong a lot. Really a lot. A lot. A lot!
And they go wrong in many interestingly
different ways. So sometimes they go wrong
in spectacular ways, right? With
fireworks. And here, we see...
laughter and applause
Here we see half a billion dollars worth
of sparkle. You don't get that every day.
And so what was going on? So as I
understand it, what was going on was that
the - this was the first flight of the
Ariane 5 launcher and it reused some of
the guidance software from Ariane 4.
That's fine. Good, tested code. Should
reuse it. But the launch profile for
Ariane 5 went sideways a bit more - on
purpose - than the Ariane 4 launch. And as
a result, one of the sideways values in
the guidance control software overflowed.
Bad. And hence, the guidance software
decided that something bad was going wrong
and in order to not land rockets on
people's heads, it kindly blew itself up.
Okay. So that's a spectacular one. Sometimes
they go wrong in insidious ways. And you
people probably know much more about that
than me. Very insidious ways. Exploited by
bad people to our detriment. And they go
wrong in insidious ways, exploited by our
kindly governments for our protection.
Right? And these - there are many, many,
many different causes of these things but
many of these causes are completely
trivial. You know, you have one line of
code in the wrong place - something like
that, right? So that's very, very bad and
you're all conscious of how bad that is.
But from my point of view, this is as
nothing compared with the real disaster that we
have been facing and continue to face.
Right? The real disaster is the enormous
waste of human life and energy and effort
and emotion dealing with these crap
machines that we've built. Right? And it's
hard to quantify that. I did a little
experiment a couple of months ago. I
Googled up "Android problem" and "Windows
problem" and got about 300 million hits
for each, right? Somehow indicative of the
amount of wrongness. So this should seem
kind of familiar to you. If you think
back - think back to when you were young.
Maybe back in the 19th century when you
were young, and what were you doing then
as a community of hackers and makers and
builders? Well, this was the full-on
industrial revolution. You were building
stuff. You were building beautiful bridges
of cast-iron and stone with numerous
arches and pillars. And every so often
they would fall down. I believe someone
miscalculated the wind loading. And we
would build magical steam engines to whisk
us away. And every so often, they would
blow up. And that kind of thing, as you
may have noticed in the last hundred years
- sorry - that kind of thing doesn't
happen very often any more, right? We have
pretty good civil and mechanical
engineering. And what does it mean to have
good civil and mechanical engineering? It
means that we know enough thermodynamics
and materials science and quality control
management and all that kind of thing to
model our designs before we build them and
predict how they're going to behave pretty
well and with pretty high confidence,
right? I see an optimisation of this talk:
for computer science, for computing
systems, we can predict with 100 per cent
certainty that they will not work.
Ah, time for lunch!
applause
But it's not a very satisfying answer,
really. So why is it so hard? Let me
discuss a couple of the reasons why
computing is tricky. In some sense,
trickier than steam engines. The first
reason is that there is just too much
code. Right? So this is one of the more
scary graphs I've found on the internet
recently. This is a measure of code in
hundreds of millions of lines. Each of
these blocks I think is 10 million lines
of code. And in the little letters - you
might not be able to see them - we have Android
and the Large Hadron Collider and Windows
Vista and Facebook, US Army Future Combat
System, Mac OS and software in a high-end
car. Right?
laughter and applause
I'm flying home.
laughter
So that's a bit of a problem, really.
We're never going to get that right. Then
there's too much legacy complexity there,
right? Really too much. So as
right-minded, pure-minded people you might
think that software and hardware should be
architected beautifully with regular
structure a bit like a Mies van der Rohe
skyscraper or something. Clean. Okay. You
know that's not realistic, right? You
might expect a bit of baroqueness. You
might expect maybe something more like
this. You know, it's a bit twiddly but it
still hangs together. It's still got
structure. It still works. But then you
stop and you think and you realise that
software and hardware is built by very
smart people and usually in big groups and
subject to a whole assortment of rather
intense commercial pressure and related
open-source pressure and using the best
components and tools that they know and
they like.
laughter and applause
Someone from the room: (Indistinct)
Peter: You forgot management!
And the best management that
human beings can arrange.
laughter and applause
So we end up with something
spectacularly ingenious.
laughter and applause
You can see here, if you look
closely, you can see all the parts.
laughter
This was just a found picture from the
internet. But you can see there's a bunch
of, you know, transparent sort of
silicon-like stuff down here. Let's say
that's our hardware. And over here, I see
a C compiler with all of its phases.
laughter
And these bottles, that's the file system,
I reckon. And there's a TCP stack in here
somewhere. And there's a piece that
belongs to the NSA, obviously. And up
here, I think up here - yeah, there's a
JavaScript engine and a TLS stack and a
web browser. And then perched at the top
is the software that most people use to
talk to their banks.
laughter
It's just - it's just insane, right? And
then we have to look at how we build these
pieces, right? So we build them
fundamentally by a process of trial and
error, right? Just occasionally we write
some specifications in text. And then we
write some code and maybe we write a few
ad hoc tests and then we test and fix
until the thing is marketable and then we
test and fix and extend until the thing is
no longer marketable and then we keep on
using it until we can't anymore. Right? So
the fundamental process that we're using
to develop these systems is trial and
error by exploring particular execution
paths of the code on particular inputs.
But - you know this, but I want to
highlight - there are too many. The number
of execution paths scales typically at
least exponentially with the size of the
code and the number of states scales
exponentially with the size of the data
that you're dealing with and the number of
possible inputs is also shockingly large.
There is no way that we can do this and we
should perhaps stop deluding ourselves
that there is. And then the most
fundamental reason that computers are hard
is that they are too discrete, right? If
you take a bridge or, I don't know, a
chair or something - where's something
slightly bendable? You take a chair and
you exert a bit more force on it than it's
used to - I'm not very strong - and it
bends a bit. It deforms continuously. If
you take a bridge and you tighten up one
of the bolts a bit too much or you
underspecify one of the girders by 10 per
cent, occasionally you'll reach a point of
catastrophic failure but usually it will
just be a little bit weaker or a little
bit stronger or a little bit wibbly.
Right? But computer systems, you change a
bit - sometimes it doesn't make a
difference but disturbingly often you
change a line of code - as in some of
those bugs that we were talking about -
the whole thing becomes broken, right? So
it's quite different from that traditional
engineering. It's really different. Okay.
So, "A New Dawn", this thing is titled.
What can we do? What might we do? So I'm
going to talk about several different
possibilities, all right? One thing, we
can do more of the stuff which is
traditionally called software engineering.
It's not really engineering in my book,
but some of it's good. I mean, it's
useful, you know? We can do more testing
and have more assertions in our code and
better management maybe and all that kind
of stuff. We can do all that - we should
do all that, but it's not going to make a
very big change, all right? It's not
addressing any of those root causes of the
problem. What else could we do? So as an
academic computer scientist, I've devoted
several years of my life to working in the
area of programming languages and I look
at the programming languages used out
there in the world today. Oh, it's just
disgusting! It's shocking.
applause
The reasons that languages become popular
and built into our shared infrastructure,
the reasons for that have almost nothing
to do with how well those languages are
designed and in fact whether
they are designed at all, right?
applause
So I didn't really want to get too much
hate mail after this talk so I'm going to
try and avoid naming particular languages
as much as I can, but I encourage you to
think away as I speak and imagine the
examples. No, that was - I didn't want to
get much hate mail. Sorry.
laughter
So...
laughter and applause
So at the very least, we could use
programming languages based on ideas from
the 1970s instead of the 1960s. I mean,
really, come on. Right? You know, back in
'67 or '69 or whatever we had BCPL and C
and only a couple of years later we had
languages that guaranteed type and memory
safety and enforcement of abstraction
boundaries and gave us rich mechanisms for
programming so we could say what we mean
and not futz about with 23 million
pointers, about 1 per cent of which were
completely wrong, right? So for any
particular - in the context of any
particular project or any particular
existing legacy infrastructure there may
be very good reasons why one just can't
make a switch. Right? But for us
collectively, there is no excuse. It's
just wretched, right?
applause
It's completely bonkers. And then the
other thing that you see as a programming
language researcher is that - er, well, to
make another analogy, it looks as if
anyone who was capable of driving a car
considered themselves an expert on car
design, right? There is in fact a rather
well-established subject of, you know,
programming language design and we know
how to specify completely precisely,
mathematically precisely not just the
syntax like we had in these BNF grammars
from the 1960s and still have, but also
the behaviour of these things. We know how
to specify it via operational semantics and
type systems. And we know how to do
mathematical proofs that our language
designs are self-consistent, that they do
guarantee good properties of arbitrary
well-typed programs that you write, right?
We can do this now. We can do this, if we
have to, post hoc. And people are doing
this for JavaScript and C and god help
them, PHP and other such languages. But
even better, we can do it at design time.
And if you do this at design time, you see
the structure. You have to - the act of
writing that specification forces you to
consider all the cases and all the
interactions between features. You can
still get it wrong, but you get it wrong
in more interesting ways. Right? So people
are starting to do this now. So I think
there was a talk on the first day of this
by Hannes and David who are building
system software in maybe not the best systems
language you could possibly imagine, but
something which compared with C is as
manna from heaven. Anil Madhavapeddy and
his colleagues in Cambridge have been
working hard to build system software in
at least moderately modern languages. And
it works. Sometimes you have to use C but
surprisingly often you really don't. Yeah,
so I teach, as my introducer said, in the
University of Cambridge. And I teach
semantics. And one of my fears is that I
will accidentally let someone out in the
world who will accidentally find
themselves over a weekend inventing a
little scripting language which will
accidentally take over the world and
become popular and I won't have explained
to them what they have to do or even that
there is a subject they have to
understand. And if you want to understand
it, there's lots of stuff you can look at.
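To give a flavour of what an executable definition
looks like - a toy illustration only, nothing from
the talk - here is a tiny expression language whose
type system and evaluator are written out as code,
so that every case and every interaction between
features has to be considered:

    /* A toy language: integer literals, addition, comparison, if-then-else.
       Writing the type checker and evaluator forces every case to be handled. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { T_INT, T_BOOL } type_t;
    typedef enum { E_LIT, E_ADD, E_LESS, E_IF } tag_t;

    typedef struct expr {
        tag_t tag;
        int lit;                        /* for E_LIT */
        const struct expr *a, *b, *c;   /* subexpressions */
    } expr;

    /* Type system: returns the type of e, or rejects the program. */
    static type_t typecheck(const expr *e) {
        switch (e->tag) {
        case E_LIT:  return T_INT;
        case E_ADD:  if (typecheck(e->a) == T_INT && typecheck(e->b) == T_INT)
                         return T_INT;
                     break;
        case E_LESS: if (typecheck(e->a) == T_INT && typecheck(e->b) == T_INT)
                         return T_BOOL;
                     break;
        case E_IF:   if (typecheck(e->a) == T_BOOL
                         && typecheck(e->b) == typecheck(e->c))
                         return typecheck(e->b);
                     break;
        }
        fprintf(stderr, "ill-typed\n");
        exit(1);
    }

    /* Semantics: every well-typed expression evaluates without getting stuck. */
    static int eval(const expr *e) {
        switch (e->tag) {
        case E_LIT:  return e->lit;
        case E_ADD:  return eval(e->a) + eval(e->b);
        case E_LESS: return eval(e->a) < eval(e->b);
        case E_IF:   return eval(e->a) ? eval(e->b) : eval(e->c);
        }
        abort();
    }

    int main(void) {
        /* if (1 < 2) then 3 + 4 else 0 */
        expr one = {E_LIT, 1, 0, 0, 0}, two = {E_LIT, 2, 0, 0, 0};
        expr three = {E_LIT, 3, 0, 0, 0}, four = {E_LIT, 4, 0, 0, 0};
        expr zero = {E_LIT, 0, 0, 0, 0};
        expr cond = {E_LESS, 0, &one, &two, 0};
        expr sum  = {E_ADD, 0, &three, &four, 0};
        expr prog = {E_IF, 0, &cond, &sum, &zero};
        typecheck(&prog);
        printf("%d\n", eval(&prog));    /* prints 7 */
        return 0;
    }

A real language definition is of course vastly
bigger, and is usually written in a proof
assistant or a semantics tool rather than in C;
the point is only that a definition can be
something you run and prove things about, not
just prose.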
Okay. Let's go to the uppermost extreme.
If we want software and hardware that
actually works, we should prove it
correct. Right? So this is something which
academics, especially in the area
labelled "formal methods" in the '70s and
'80s, have been pushing and promoting
as an idea and a promise and an
exhortation for decades, right? Why don't
we just prove our programs correct? And
for some of those decades it wasn't really
plausible. I remember back when I was
considerably younger, it was a big thing
to prove that if you took a linked list
and reversed it twice, you got back the
same thing. Right? That was hard, then.
But now, well, we can't do this for the
Linux kernel, let's say, but we can
surprisingly often do this. And let me
just give you a couple of examples of
modern-day, state-of-the-art academic
verification projects. People have
verified compilers all the way from C-like
languages down to assembly or from ML-like
languages. You know, higher-order
functional imperative languages all the
way down to binary models of how x86
behaves. And people have verified LLVM
optimisation passes. There's verified
software fault isolation which I believe
goes faster than the original non-verified
form. Win for them. There's a verified
hypervisor from the NICTA group in Australia.
Right? Verified secure hypervisor with
proofs of security properties. There's any
amount of work verifying crypto protocols,
sometimes with respect to assumptions - I
think reasonable assumptions about what the
underlying mathematics is giving you. So
we can do this. I maybe have to explain a
bit what I mean by "prove". To some
computery people, "I proved it works"
means "I tested it a little bit".
laughter
In extremis, "it compiled".
laughter and applause
To a program analysis person, you might
run some sophisticated tool - itself
untrusted code - and possibly doing an
approximate and maybe unsound analysis and
you might find an absence of certain kinds
of bugs. That's tremendously useful; it's
not what we would really call "proved".
I mean, nor would they, to be fair.
Scientists generally don't use the word
because they know they're not doing it,
right? They're doing controlled
experiments, which gives them, you know,
information for or against some
hypothesis. Mathematicians write proofs.
You probably did that when you were young.
And those proofs are careful mathematical
arguments usually written on paper with
pencils and their aim is to convince a
human being that if that mathematician was
really nailed up against the wall and
pushed, they could expand that into a
completely detailed proof. But what we
have in computing, we don't have the rich
mathematical structure that these people
are working with, we have a tremendous
amount of ad hoc and stupid detail mixed
in with a smidgen of rich mathematical
structure. So - and we have hundreds of
millions of lines of code. So this is just
not going to cut it. And if we ever want to
say the word "prove", then the only thing
which is legitimate is to do honest
program proof and that means you have to
have some system that machine-checks that
your proof is actually a proof. And
sometimes we also have machines that
will sort of help and sort of hinder that
process, right? And there's a variety of
these systems - Coq and HOL4 and
Isabelle/HOL and what have you - that we
can look up. So using these systems we can
prove nontrivial facts about these. Not
necessarily that they do what you want
them to do, but that they meet the precise
specification that we've written down. We
can do that. But it's tricky, right? So
these projects, these are by academic
standards all quite big, you know? This is
something like, I don't know, 10 people
for a few years or something. By industry
scale, maybe not that big. By your scale,
maybe not that big. But still a lot of
work for a research group to do. And quite
high-end work. It can take, you know,
literally a few years to become really
fluent in using these tools, right? And
there isn't as yet sort of an open-source,
collaborative movement doing this kind of
stuff. Maybe it's the time to start. Maybe
in five years, ten years. I don't know. If
you want a challenge, there's a challenge.
But it's really quite hard, right? It's
not something that one can quite put
forward credibly as a solution for all
software and hardware development, right?
So that leads me to, like an intermediate
point. That should have been a four there.
What can we do which improves the quality
of our systems, which injects some
mathematical rigour but which involves
less work and is kind of easy? And what we
can do is take some of these interfaces
and be precise about what they are. And
initially, we have to do that after the
fact. We have to reverse-engineer good
specs of existing stuff. But then we can
use the same techniques to make much
cleaner, better-tested things which are
easier to test, in the future. That's
the idea. So my colleagues and I have been
doing this kind of stuff for the last
quite a few years and I just want to give
you just a - two little vignettes just to
sort of show you how it goes. So I can't
give you much detail. And this is,
obviously, joint work with a whole bunch
of very smart and good people, some of
whom I name here. So this is not me, this
is a whole community effort. So I'm going
to talk for a little bit about the
behaviour of multiprocessors, so at that
hardware interface. And I'm going to talk
a tiny little bit about the behaviour of
the TCP and socket API, right? Network
protocols. So - and there'll be some
similarities and some differences between
these two things. So multiprocessors. You
probably want your computers to go fast.
Most people do. And it's an obvious idea
- because processors don't
go that fast - to glom together a whole
bunch of processors and make them all talk
to the same memory. This is not a new
idea. It goes back at least to 1962 when
the Burroughs D825 had, I think, two
processors talking to a single shared
memory. It had outstanding features
including truly modular hardware with
parallel processing throughout, and - some
things do not change very much - the
complement of compiling languages was to
be expanded, right? 1962, so that'll be 52
years ago. Deary me. Okay. Now, how do
these things behave? So let me show you
some code, right? It's a hacker
conference. There should be code. Let me
show you two example programs and these
will both be programs with two threads and
they will both be acting on two shared
variables, X and Y. And in the initial
state, X and Y are both zero. So, the
first program. On this thread we write X
and then we read Y, and over here we write
Y and then we read X, right? Now, these
are operating, you know, in an
interleaving concurrency, you might think,
so there's no synchronisation between
those two threads so we might interleave
them. We might run all of thread 0 before
all of thread 1 or all of thread 1 before
all of thread 0 or they might be mixed in
like that or like that or like that or
like that. There's six possible ways of
interleaving two lists of two things. And
in all of those ways you never see in the
same execution that this reads from the
initial state AND this reads from the
initial state. You never see that. So you
might expect and you might assume in your
code that these are the possible outcomes.
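A rough sketch of that first test in C - not the
talk's actual harness; strictly speaking the plain
shared accesses below are a data race, and a real
harness repeats the experiment millions of times
with careful timing to make the odd outcomes show
up - might look like this:

    /* Store-buffering test: two threads, two shared variables, both initially 0. */
    #include <pthread.h>
    #include <stdio.h>

    volatile int x = 0, y = 0;   /* the shared variables */
    int r0, r1;                  /* what each thread read */

    void *thread0(void *arg) { (void)arg; x = 1; r0 = y; return NULL; }
    void *thread1(void *arg) { (void)arg; y = 1; r1 = x; return NULL; }

    int main(void) {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        /* Under simple interleaving you could never see r0 == 0 && r1 == 0. */
        printf("r0=%d r1=%d\n", r0, r1);
        return 0;
    }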
So what happens if you were to actually
run that code on my laptop in a particular
test harness? Well, okay, you see
occasionally they're quite well
synchronised and they both read the other
guy's write. Sorry, big call. They both
read the other guy's write. And quite
often one thread comes completely before
the other. But sometimes, sometimes they
both see 0. Huh. Rats. Well, that's
interesting. Let me show you another
program. So now, thread 0 writes some
stuff. Maybe it writes a big bunch of
data, maybe multi-word data. And then it
writes, let's say that's a flag announcing
to another thread that that data is
ready to go now. And the other thread
busy-waits reading that value until it
gets 1, until it sees the flag. And then
it reads the data. So you might want, as a
programmer, you might want a guarantee
that this read will always see the data
that you have initialised. Hey? That would
be like a message-passing kind of an
idiom. So what happens if you run that?
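Sketched the same way - again just an
illustration, not the talk's harness - the second
test might be:

    /* Message-passing test: one thread publishes data and raises a flag,
       the other waits for the flag and then reads the data. */
    #include <pthread.h>
    #include <stdio.h>

    volatile int data = 0, flag = 0;
    int r;

    void *writer(void *arg) { (void)arg; data = 42; flag = 1; return NULL; }
    void *reader(void *arg) {
        (void)arg;
        while (flag == 0) { }    /* busy-wait until the flag is set */
        r = data;                /* do we always see 42 here?       */
        return NULL;
    }

    int main(void) {
        pthread_t tw, tr;
        pthread_create(&tw, NULL, writer, NULL);
        pthread_create(&tr, NULL, reader, NULL);
        pthread_join(tw, NULL);
        pthread_join(tr, NULL);
        printf("reader saw data=%d\n", r);
        return 0;
    }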
Well, on the x86 processors that we've
tested, you always see that value. Good.
This is a decent coding idiom on x86. All
right. On ARM and IBM Power processors,
however, you see that sometimes the value
written is just ignored and you see the initial
state. So this is not a good
message-passing idiom, right? So what's
going on? Well, these behaviours might
be surprising, eh? And when you see
surprising behaviour in hardware, you have
two choices. Either it's a bug in the
hardware, and we have found a number of
bugs in hardware. It's always - it's very
rewarding. Or it's a bug in your
expectation. Or it's a bug in your test
harness. Right? So what you do is you walk
along and you walk up to your friendly
processor architect in IBM or ARM or x86
land, and we have worked with all of these
people. And you ask them, "Is this a bug?"
And a processor architect is a person who
is - has the authority to decide whether
some behaviour is intended or is a bug.
laughter
That's what they do, essentially. And for
these, they say, "Oh, we meant it to be
like that". Right? No question. "We meant
it to be like that. That's perfectly
proper. We have, because you and everybody
else wanted their computers to go fast, we
as processor designers have introduced all
manner of sophisticated optimizations,
which if you were running sequential code,
you would never notice, but if you start
running fancy concurrent code like this,
fancy concurrent code that isn't just
trivially locked, you can notice", so this
is intentional behaviour. And if you want
to write that code, like a mutual
exclusion algorithm or a message-passing
algorithm or something, then you need to
insert special instructions, memory
barrier instructions. And so you go away
and you say, "What do they do?" And you
look up in the vendor documentation. And
you get stuff like this. I'll read it out:
(Reads quickly) "For any applicable pair
a, b, the memory barrier ensures that a will
be performed with respect to any processor
or mechanism, to the extent required by
the associated memory coherence required
attributes, before b is performed with
respect to that processor or mechanism. A
includes all applicable storage accesses
by any such processor or mechanism that
have been performed with respect to P1
before the memory barrier is created. B
includes all applicable storage accesses
by any such processor or mechanism that
are performed after a Load instruction
executed by that processor or mechanism
has returned the value stored by a store
that is in B."
applause
Hands up if you understand what that
means? Hands up if you think that if you
thought about it and read the rest of the
book, you would understand or you could
understand what that means? Hands up the
people who think that we're doomed forever?
laughter
So I'm sorry to the first few groups,
but the last ones are right.
laughter and applause
This looks like it's really intricate and
really carefully thought out stuff. And we
thought that for a while and we spent
literally years trying to make precise
mathematical models that give precise
meaning to these words, but actually it
can't be done. So what do you have to do
in that circumstance? You have to go away
and you have to invent some kind of a
model. And there's a - this is a really
fundamental problem, that we - on the one
hand, we develop our software by this
trial and error process, but on the other
hand are points like this. We have these
loose specifications, supposedly, that
have to cover all manner of behaviour of
many generations of processors that might
all behave the same way, written just in
text. And we tell people they should write
"to the spec", but the only way they have
of developing their code is to run it and
run particular executions on the
particular hardware implementations that
they have, whose relationship to the spec
is very elusive. We can't use those thick
books that you get or those quite weighty
PDFs that you get these days from the
processor vendors to test programs. You
can't feed a program into that thick book
and get output out. And you can't test
processor implementations and you can't
prove anything and you can't even
communicate between human beings, right?
So these things, really, they don't exist.
So what can we do? Well, the best we can
do at this point in time is a bit of
empirical science, right? So we can invent
some model off the top of our heads,
trying to describe just the
programmer-visible behaviour, not the
internal structure. And we can make a
tool. Because that's not prose now. Now we
can - now it's stuff. It's real
mathematics and we can turn that into code
and execute it. We can make tools that
take programs - small programs - and
tell us the set of all
behaviours allowed by that model. And then
we can compare those sets of behaviours
against the experimental observations. And
you might have to fix the model and you
might find hard bugs. And then the model
also has to make sense to the architect so
you can discuss with the architects what
they intend and what their intuition is,
and then you can also prove some other
facts about it to give a bit more
assurance and then because you probably
got this wrong the first couple of times,
you can go back. And this results or has
resulted in models which are not
guaranteed to be correct, but they are
effectively the de facto standard for
understanding the concurrency behaviour of
these things, you know? And various people
use them. And just to give you a tiny
snippet - I'll have to speed up a bit - a
tiny snippet of what the model looks like,
it's basically just an abstract machine.
It's got some stuff for the threads that
handles the programmer-visible speculative
execution, and some stuff for a storage
subsystem that handles the fact that in
these architectures, memory writes can be
propagated to different threads at
different times, right? And there's a -
that model has a state which is just a
collection of some sets and lists and
partial orders and a few other things
which I won't talk about. And it has some
transitions, right? In any state, there
might be several possible transitions.
This is just a sample. I'm not going to
explain all of this. But when you want to
propagate a write to another thread, there
have to be some preconditions that you have to
satisfy. And then there is an action. It's not
amazingly complicated, really.
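Purely as a toy analogue - the real models are
written in mathematics close to pure functional
code, not in C, and track far more - such a
guarded transition might be rendered along these
lines, with a hypothetical, much-simplified state:

    /* Toy analogue of one transition: "propagate write i to thread t". */
    #include <stdbool.h>
    #include <stdio.h>

    #define NTHREADS 2
    #define MAXWRITES 16

    typedef struct { char var; int value; } write_t;

    typedef struct {
        write_t writes[MAXWRITES];                 /* all writes, in coherence order */
        int     nwrites;
        bool    propagated[MAXWRITES][NTHREADS];   /* which threads each write has reached */
    } state_t;

    /* Precondition: write i has not reached thread t yet, and every earlier
       write to the same variable already has. */
    static bool can_propagate(const state_t *s, int i, int t) {
        if (i >= s->nwrites || s->propagated[i][t]) return false;
        for (int j = 0; j < i; j++)
            if (s->writes[j].var == s->writes[i].var && !s->propagated[j][t])
                return false;
        return true;
    }

    /* Action: the new state records that thread t has now seen write i. */
    static void propagate(state_t *s, int i, int t) {
        s->propagated[i][t] = true;
    }

    int main(void) {
        state_t s = { .writes = { {'x', 1}, {'y', 1} }, .nwrites = 2 };
        /* In any state there may be several enabled transitions: */
        for (int i = 0; i < s.nwrites; i++)
            for (int t = 0; t < NTHREADS; t++)
                if (can_propagate(&s, i, t))
                    printf("can propagate write %d (%c=%d) to thread %d\n",
                           i, s.writes[i].var, s.writes[i].value, t);
        propagate(&s, 0, 1);   /* take one of them */
        return 0;
    }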
So the rule itself is text, but it's very precisely written text and it has
the great advantage that you can go through
these bullet points with your friendly
architect and say, "Is this what you
mean?" And they can think about it and say
yes or no. But for the actual definition,
that's in mathematics but it's mathematics
that's quite close to code. I mean, it's
terribly close to pure, functional code
without side effects. And just to give you
an idea, this is some of it and those are
three of those conditions in the real,
honest, true version. Then we can take
this mathematics and because it's in a
particular style, we can compile this into
actually OCaml and then OCaml byte code
and then we can run it for batch jobs. But
then you can compile that OCaml byte code
to JavaScript and glue on a user interface
and stick it into a web browser and then you
have a web interface that lets people
explore this model. And that's also a
necessary part of validating that it makes
sense. Okay. This is not rocket science.
You've missed the talk for rocket science,
I'm sorry. All we're doing here is
specifying the intended behaviour of a
system. Okay? That's - it's not very
technical but it's unusual. And we happen
to be doing it in this mathematical
language, but you could use in fact almost
any language so long as you understand
what you're doing. What you have to
understand is that you're writing
something which is not an implementation.
It is something which, given some observed
behaviour, some arbitrary observed
behaviour, will be able to decide whether
it's allowed or not, right? It has to be
executable as a test oracle. The key
question for getting
this to work is to understand how much
non-determinism or loose specification
there is in the system that you're working
with, right? So if everything is
completely deterministic, you can write a
reference implementation in the cleanest
language of your choice and just compare
the output of that and the output of the
real thing, right? But if there's more
non-determinism, then you can't do that.
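To make that contrast concrete, here is a toy
sketch - hypothetical, not from the talk: for a
deterministic system you can just compare against
a clean reference implementation; with
non-determinism, the check has to accept any of a
set of allowed outcomes instead.

    #include <stdbool.h>
    #include <stdio.h>

    /* Deterministic case: a clean reference implementation... */
    static int reference_add(int a, int b) { return a + b; }

    /* ...lets you check the real system's output by simple equality. */
    static bool check_deterministic(int a, int b, int observed) {
        return observed == reference_add(a, b);
    }

    /* Non-deterministic case: the spec describes the set of allowed outputs -
       here, say, any window size within bounds that is a multiple of 512
       (an invented rule, purely for illustration). */
    static bool check_allowed_window(int advertised) {
        return advertised >= 0 && advertised <= 65535 && advertised % 512 == 0;
    }

    int main(void) {
        printf("%d\n", check_deterministic(2, 3, 5));   /* 1: matches the reference */
        printf("%d\n", check_allowed_window(4096));     /* 1: one of many allowed   */
        printf("%d\n", check_allowed_window(4095));     /* 0: outside the spec      */
        return 0;
    }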
I'm going to have to abbreviate this a
little tiny bit. So for the TCP network
protocols, there is more non-determinism.
You know what the TCP is, yes? A protocol
that gives you sort of reliable
connections, assuming that the world is
made of good people, right? You look at
the standards for TCP, and they're the
same kind of wibbly text from the 1980s
and you look at the implementations and
they are a ghastly mess. And you try and
understand the relationship between the
two of them and you can't because the
standard, again, is not the definition. It
doesn't define in all circumstances what
behaviour is allowed and what isn't. So
again, we had to make up a specification
in the first instance just of this host, a
single endpoint, and observing its
sockets API calls and returns and its
behaviour on the wire, right? And when
you've decided - and a few internal debug
events. And when you've decided what to
observe, then observation is just a trace,
it's just a list of those events. And you
have to be able to ask your spec "Is that
list admissible or not"? But there's a lot
of non-determinism. Variation between the
implementations, variations within the
implementations. And that's internal. It's
not announced in the API or on the wire
maybe until much later when the
implementation picks a new window size or
something. You can't tell until quite a
lot later in the trace what it's picked.
And that makes the job of checking whether
a trace is admissible by the spec much
harder than it has to be. Right? And if
you had designed the TCP protocol and
those implementations to be testable
in this very discriminating way, back in
the 1980s when you were designing the TCP
protocol, it would be much easier. And for
new protocols, one should make sure that
you're doing this in this particular way.
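As a toy illustration of why that internal
non-determinism hurts - entirely hypothetical, not
the real TCP specification - a checker has to
carry forward every internal choice that is still
consistent with the trace seen so far, and reject
the trace only when no candidate survives:

    /* Toy trace checker: the implementation secretly picked a maximum window
       from a small allowed set; each observed segment must fit under it. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NCAND 3
    static const int allowed_max[NCAND] = { 1024, 2048, 4096 };

    typedef struct { bool alive[NCAND]; } checker_t;

    static void checker_init(checker_t *c) {
        for (int i = 0; i < NCAND; i++) c->alive[i] = true;  /* all still possible */
    }

    /* One observable event: a segment advertising some window went on the wire.
       Returns false if no remaining candidate explains it: trace inadmissible. */
    static bool checker_step(checker_t *c, int observed_window) {
        bool any = false;
        for (int i = 0; i < NCAND; i++) {
            c->alive[i] = c->alive[i] && observed_window <= allowed_max[i];
            any = any || c->alive[i];
        }
        return any;
    }

    int main(void) {
        checker_t c;
        checker_init(&c);
        int trace[] = { 900, 1800, 5000 };   /* observed advertised windows */
        for (int k = 0; k < 3; k++)
            printf("event %d admissible: %d\n", k, checker_step(&c, trace[k]));
        /* prints 1, 1, 0: no remaining candidate explains the third event */
        return 0;
    }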
I don't think I want to talk about that
slide. That's just one of the rules of
that specification. But the key fact about
that spec is the - well, we handled the
real protocols for arbitrary inputs. But
it's structured not just for this
executability as a test oracle, but it's
structured for clarity, not performance,
which is scarcely ever what anyone ever
does, right? So it's a whole new kind of
thing. And the testing is very
discriminating. So we found all manner of
amusing and bizarre bugs which I think I
don't have time to talk about. Okay. So
I've described three kinds of things that
we might do. The first thing - for
goodness sake - I mean, it's just a
no-brainer. Just do it already. Everybody.
All of you. All of you. Right? This middle
zone is a very interesting zone as far as
I'm concerned, right? It's a place where
we could get a very good benefit for not
that much effort, right? We can do it, if
necessary, post hoc. We can also do it,
even better, at design time. We have to do
this in a way that makes our specifications
executable as test oracles, and another
thing that that enables is completely
random testing. When you've got a test
oracle, you can just feed in random goop.
This is fuzzing, but better fuzzing: feed
in random goop and check at every point
whether the behaviour you're getting is
allowed or not.
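A minimal sketch of that idea, with hypothetical
stand-ins for the system under test and for the
executable spec:

    /* Oracle-driven random testing.  system_step() and spec_allows() are
       invented stand-ins for "poke the real system" and "ask the executable
       spec whether the observed behaviour is allowed". */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int  system_step(int input) { return input % 7; }
    static bool spec_allows(int input, int out) { (void)input; return out >= 0 && out < 7; }

    int main(void) {
        srand(12345);                       /* fixed seed, so a failure is replayable */
        for (int i = 0; i < 1000000; i++) {
            int input = rand();             /* random goop in...                   */
            int out = system_step(input);   /* ...through the system under test... */
            if (!spec_allows(input, out)) { /* ...checked against the oracle       */
                printf("counterexample: input=%d output=%d\n", input, out);
                return 1;
            }
        }
        printf("no violations found\n");
        return 0;
    }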
And then these specifications are written
for clarity, not for performance, and that
means you can see what you're doing,
right? You can see the complexity. If you
go ahead and you add some feature to your
protocol or your programming language or
whatever and you're working just with text
specifications, then you can't see the
interactions. You just chuck in a couple
of paragraphs and you think for a few
minutes, right? And if you're lucky, you
make an implementation. But if you're
writing a spec that has to cover all the
cases like this, the act of doing that
encourages you to think or helps you think
about the excess complexity. And you might
think that's too complex, I'm being silly,
I'll make it simpler. And it has to be
usable for proof. So this, I think also is
pretty much a no-brainer. And it doesn't
require great technical, you know,
mathematical skill either, right? Lots of
people can do this. And then there's
option 3, best option, full-on formal
verification of the key components. And
that is now also in reach. You can imagine
secure systems made using actual verified
compilers and verified hypervisors with
verified TLS stacks and you can imagine
making things out of those or making those
and then making things out of those. And
maybe one should be thinking about that.
So, maybe not appropriate for everything.
If you were writing Word, you would not
wish to do these things. Probably, you're
not. But for this key infrastructure that
we really depend on, we trust even though
it's not trustworthy, we have to do this.
Is this a new dawn? I wonder if there's
anyone standing next to a light switch
that can dim them for me. I didn't ask
them beforehand, so... If you think back
over the last 70-odd years, we started building
computers in around 1940. It's been pretty
dark from the point of view of rigorous,
solid, reliable engineering, stuff that is
actually trustworthy. Pretty dark, I would
say. Maybe, just maybe there's a tiny
smidgen of light coming through the doors.
And we start to have a little bit of a
clue and we start to get reusable models
of processor architectures and programming
languages and network protocols and what
have you. It's just the beginnings of the
analogue of that thermodynamics and
materials science and quality control
management and what have you that we have
from mechanical engineering. And we've got
no choice. If we don't, the next 75 years
is going to be a lot worse. You can just
imagine, right? So I went to this - as I'm
sure many of you did - this great talk on
some Apple boot loader exploit yesterday
which was relying on some feature that had
been introduced in the early '80s and was
still present. And you can imagine in 100
years from now, you can imagine as long as
human civilisation endures, the x86
architecture and the socket API and all of
this stuff, it will be embedded in some
monumental ghastly stack of
virtualisation layers forever.
laughter and applause
So I'd like to thank especially all of my
colleagues that have been working with me
or not with me in these directions. I'd
like to thank our generous funders who
support this stuff. I'd like to thank you
for your attention and I exhort you,
sort it out.
applause
Herald: Thank you very much, Peter, for
those interesting insights into our
programming. We have now time for a Q & A,
so those people who have to leave the
room, please do so quietly and as fast as
possible, so we can go on and hear what
the questions are. So we
start with microphone 4, please.
Mic 4: Hello. Thanks for the talk and my
question is if you got an oracle of the
software behaviour which can translate
every possible input to a correct output,
how can this be less complex and
error-prone than the reference
implementation?
Peter: Good question. So the first point
is that this oracle doesn't have to
produce the outputs. It only has to check
that the inputs and the outputs match up.
And then the second point is that in
general it might have to have much of the
same complexity, but by writing it to be
clear as opposed to fast, you may
adopt a completely different structure,
right? So the structure of our TCP spec is
organised by the behaviour that you see
for particular transitions. The structure
of a real implementation has fast-path
code and lots of complicated intertwined
branching and all manner of excess
complexity, of implementation complexity,
if you will.
Mic 4: Thanks.
Herald: So microphone 3, please?
Mic 3: I wanted to ask you what you
thought about programming by manipulating
the abstract syntax tree directly so as to
not allow incompilable programs because of
some, like, you're missing a semicolon or
something like that. What's your thoughts
on that?
Peter: So that's - in the grand scheme of
things, I think that's not a very big
deal, right? So there's - people have
worked on structured editors for lordy
knows how many years. It's not a big deal
because it's very easy for a compiler to
detect that kind of thing. And even more,
it's very easy for a compiler of a
sensibly designed language to detect type
errors, even for a very sophisticated type
system. So I don't think that - that's not
- I mean, some people might find it
helpful, but I don't think it's getting to
the heart of the matter.
Mic 3: Thanks.
Herald: So we've got some questions
from our signal angels from the IRC.
Signal angel: Yes. There's one question.
It's about the repository for Isabelle and
HOL proofs. It should be on SourceForge,
and you said there are no open source
repositories. What about them?
Peter: I'm not quite sure what the
question is, really. So there's a
repository of Isabelle formal proofs - the
Archive of Formal Proofs, it's called. I
didn't really mean to say there
are no open source repositories and in
fact I don't know under what conditions
most of those proofs are licensed. They
probably are open. But there isn't an open
source community building these things,
right? It's still pretty
much an academic pursuit.
Herald: Microphone 2, please.
Mic 2: Hello. Thanks again for your talk.
I just want to add something that you
didn't address, and that's that we
actually need more good studies in
software engineering. We often hear a lot
of experts and also very well-known
computer scientists say things like,
"Well, we just need to do functional
programming and OOP is bad and stuff like
that", which I think there's a lot of
truth to it, but we actually need studies
where we test these kinds of claims that
we make, because when you look at other
fields which it is also compared to, like,
medicine, if somebody comes around and
is well-known and says things like,
"homeopathy works", we don't believe them.
We do trials, we do good trials. And
there's a lot of myths out there, like, or
not well tested things, like "hire the
best programmers, they are 100 times
more productive", "do steady type
systems", and stuff like that. And we need
to verify that this is true, that this
helps. And it's...
Peter: Okay. So in the grand scheme of
things, arguably you're right. In the
grandest scheme of things, computer
science is actually a branch of psychology
or really sociology. We are trying to
understand how large groups of people can
combine to make things which are insanely
complicated. Now, for - would it be good
if we had, you know, solid evidence that
programming in this language was better
than programming in that language? Well,
yes, but it's staggeringly difficult and
expensive to do honest experiments of that
nature and they're scarcely ever done,
right? You might do little experiments on
little groups of students but it's really
difficult. And then some of these things
which I'm saying, when one is familiar
with the different possible options, are
just blindingly obvious. I mean, there are
reasons why these people are using OCaml
for their system programming, right? It's
not - you know, it's not "homeopathy is
right"; it's "homeopathy is wrong" that's
the kind of argument being put forward.
Herald: Question from
microphone 5, please.
Mic 5: So were you using ECC
memory for your testing - up here.
Peter: When you say up here,
there's a bit of ambiguity.
Mic 5: Here.
Peter: Thank you. Say again?
Mic 5: So were you using ECC memory
for the testing you showed - the
multithreaded results due
to memory barriers and memory reorderings?
Peter: Well...
Mic 5: Okay, but even if you were or you
were not, the point I want to make is that
you can expect about 1 bit flip each
minute in a modern computer system that
may completely change what your software
is doing, so perhaps we also have to look
in ways how we can work if something goes
wrong at the very low level so that we
kind of have a check against our
specification at a higher level of
our software. So it is valuable to do good
engineering on the low levels, but still I
think we will not solve the problems of
computation and computer engineering just
by proving things in the mathematical
domain because computers are physical
entities. They have errors and so on and
we really have to deal with them as well.
Peter: So it's certainly true that there
are such random errors. In fact, I would
argue that you
have to be able to model the physics well
enough and you have to be able to model
the statistics of those errors well
enough, so that's a different kind of
mathematics. And it's certainly valuable
and necessary. But if you look at the
statistic you just quoted, is that the
main cause of why systems go wrong? Except
possibly for space hardware, no. So
important, yes. The most important thing
that we should be paying attention to? I
don't really think so.
Herald: Microphone 6, please?
Mic 6: Hi. I really think that what you're
proposing to specify and verify more key
components is - would be a valuable
addition, but how do you make sure that
your specification actually matches the
behaviour that you want to do or to have?
Peter: So as I described, we do, as partial
validation of those specifications, a lot
of testing against the
implementations, against a range of
existing implementations. That's one
source of validation. Another source of
validation is that you talk to the
architects and the designers. You want the
internal structure to match their intent.
You want it to be comprehensible to them.
Another kind of validation that we do is
prove properties about them. So we proved
that you can correctly compile from C11
concurrency - a mathematical model of that -
to IBM Power concurrency. And that kind
of proof gives you a bit more
assurance in both. So none of this is
giving you a total guarantee. You
certainly don't claim a total guarantee.
But it gives you pretty good levels of
assurance by normal standards.
Herald: Mic 4?
Mic 4: Yes. Thanks again. You proposed
random tests, and in my experience, it's
very annoying if you have tests where the
outcome is sometimes a failure and you
want to have reproducible tests to fix a
bug. So how do we bring these aspects
together - testing more variety in data
and having it reproducible?
Peter: Well, if - as I was mentioning, for
the TCP thing, one of the - so the problem
with reproducibility is exactly this
internal non-determinism, right? The
system makes this scheduling choice or
what have you and you can't see what it
is, or the checker can't see what it is.
So one way to do that is to really design
the protocols in a different way to expose
that non-determinism. The other fact about
this kind of general purpose specification
as test oracle idea is that in some sense,
it doesn't have to be reproducible. Right?
The specification is giving a yes-no
answer for an arbitrary test. And that
means that you can use arbitrary tests.
Figuring out the root causes of the
differences afterwards may be tricky, but
you can use them for assurance.
Herald: So we've now got time for two
more questions. Microphone 1, please?
Mic 1: Hiya. Thanks for a great talk. So
what you've described seems to be a middle
ground between a full-on, formal
verification and more practical testing,
like, in between. So where do you see
formal verification going in the future?
Will it converge to the middle, or will it
still be there just to verify something
that's more well-specified?
Peter: So the progress of sort of the
whole subject of formal verification in
general, if you look at that over the last
10 years or so, it's been very - it's been
really amazing compared with what we could
do in the 80s and 90s and early 2000s,
right? So the scope of things, the scale
of things which are - for which it is
becoming feasible to do verification is
getting bigger. And I think that will
continue. You know, I don't know when you
might see a completely verified Tor
client, let's say, but that's not
inconceivable now. And 20 years ago, that
would have been completely inconceivable.
So that is expanding and at the same time,
I hope we see more of this interface
test-oracle specification - and when
you're doing formal verification of a
piece of the system, you have to have
these interfaces already defined, all
right? So these two fit very well together.
Herald: So the last question
from microphone 2, please.
Mic 2: Hi. You mentioned that you often
find bugs in hardware. Is there any effort
to verify the transistors on chips
themselves as well?
Peter: So there's a whole world of
hardware verification. That wasn't my
topic today. And most of the big processor
vendors have teams working on this.
Unsurprisingly, if you remember your
history, many of them have teams focusing
on the floating-point behaviour of their
processors. And they're doing - they're
also doing by the standards of ten years
ago, pretty spectacularly well. So there
are companies that have the whole of some
execution unit formally verified. There's
been a lot of work over the years on
verification of cache protocols and such
like. Right? There's a lot of work on not
verifying the hardware as a whole, but
verifying that, you know, the RTL-level
description of the hardware matches some
lower level description. So
there is a lot of that going on.
Herald: Thank you very much
again for this great talk.
Give him another applause.
Thank you, Peter.
Peter: Thank you.
applause
subtitles created by c3subtitles.de
Join, and help us!