applause

Peter Sewell: Thank you for that introduction and to the organisers for this opportunity to speak to you guys. So my name is Peter Sewell. I'm an academic computer scientist and as such, I bear a heavy burden of guilt for what we have to work with. Shared guilt, to be fair, but guilt all the same. In many, many years of higher education, I've learned two things about computers. Maybe you've figured them out slightly faster. The first thing: there's an awful lot of them about. Shocking. And the second is that they go wrong. They go wrong a lot. Really a lot. A lot. A lot! And they go wrong in many interestingly different ways.

So sometimes they go wrong in spectacular ways, right? With fireworks. And here, we see... laughter applause Here we see half a billion dollars worth of sparkle. You don't get that every day. And so what was going on? So as I understand it, what was going on was that the - this was the first flight of the Ariane 5 launcher and it reused some of the guidance software from Ariane 4. That's fine. Good, tested code. Should reuse it. But the launch profile for Ariane 5 went sideways a bit more - on purpose - than the Ariane 4 launch. And as a result, one of the sideways values in the guidance control software overflowed. Bad. And hence, the guidance software decided that something bad was going wrong and, in order to not land rockets on people's heads, it kindly blew itself up. Okay. So that's a spectacular one.

Sometimes they go wrong in insidious ways. And you people probably know much more about that than me. Very insidious ways. Exploited by bad people to our detriment. And they go wrong in insidious ways, exploited by our kindly governments for our protection. Right? And these - there are many, many, many different causes of these things, but many of these causes are completely trivial. You know, you have one line of code in the wrong place - something like that, right? So that's very, very bad and you're all conscious of how bad that is. But from my point of view, this is as nothing compared with the real disaster that we have been facing and continue to face. Right? The real disaster is the enormous waste of human life and energy and effort and emotion dealing with these crap machines that we've built. Right? And it's hard to quantify that. I did a little experiment a couple of months ago. I Googled up "Android problem" and "Windows problem" and got about 300 million hits for each, right? Somehow indicative of the amount of wrongness.

So this should seem kind of familiar to you. If you think back - think back to when you were young. Maybe back in the 19th century when you were young, and what were you doing then as a community of hackers and makers and builders? Well, this was the full-on industrial revolution. You were building stuff. You were building beautiful bridges of cast-iron and stone with numerous arches and pillars. And every so often they would fall down. I believe someone miscalculated the wind loading. And we would build magical steam engines to whisk us away. And every so often, they would blow up. And that kind of thing, as you may have noticed in the last hundred years - sorry - that kind of thing doesn't happen very often any more, right? We have pretty good civil and mechanical engineering. And what does it mean to have good civil and mechanical engineering?
It means that we know enough thermodynamics and materials science and quality control management and all that kind of thing to model our designs before we build them and predict how they're going to behave pretty well and with pretty high confidence, right? I see an optimisation of this talk. For computer science, for computing systems, we can predict with 100 per cent certainty that they will not work. Ah, time for lunch! applause But it's not a very satisfying answer, really.

So why is it so hard? Let me discuss a couple of the reasons why computing is tricky. In some sense, trickier than steam engines. The first reason is that there is just too much code. Right? So this is one of the more scary graphs I've found on the internet recently. This is a measure of code in hundreds of millions of lines. Each of these blocks I think is 10 million lines of code. And in the little letters - you might not be able to see - we have Android and the Large Hadron Collider and Windows Vista and Facebook, the US Army Future Combat System, Mac OS and the software in a high-end car. Right? laughter applause I'm flying home. laughter So that's a bit of a problem, really. We're never going to get that right.

Then there's too much legacy complexity there, right? Really too much. So as right-minded, pure-minded people you might think that software and hardware should be architected beautifully, with regular structure, a bit like a Mies van der Rohe skyscraper or something. Clean. Okay. You know that's not realistic, right? You might expect a bit of baroqueness. You might expect maybe something more like this. You know, it's a bit twiddly but it still hangs together. It's still got structure. It still works. But then you stop and you think and you realise that software and hardware is built by very smart people and usually in big groups and subject to a whole assortment of rather intense commercial pressure and related open-source pressure and using the best components and tools that they know and they like. laughter applause

Someone from the room: (Indistinct) You forgot management!

Peter: And the best management that human beings can arrange. laughter applause So we end up with something spectacularly ingenious. laughter applause You can see here, if you look closely, you can see all the parts. laughter This was just a found picture from the internet. But you can see there's a bunch of, you know, transparent sort of silicon-like stuff down here. Let's say that's our hardware. And over here, I see a C compiler with all of its phases. laughter And these bottles, that's the file system, I reckon. And there's a TCP stack in here somewhere. And there's a piece that belongs to the NSA, obviously. And up here, I think up here - yeah, there's a JavaScript engine and a TLS stack and a web browser. And then perched at the top is the software that most people use to talk to their banks. laughter It's just - it's just insane, right?

And then we have to look at how we build these pieces, right? So we build them fundamentally by a process of trial and error, right? Just occasionally we write some specifications in text. And then we write some code and maybe we write a few ad hoc tests and then we test and fix until the thing is marketable and then we test and fix and extend until the thing is no longer marketable and then we keep on using it until we can't any more. Right? So the fundamental process that we're using to develop these systems is trial and error by exploring particular execution paths of the code on particular inputs.
But - you know this, but I want to highlight - there are too many. The number of execution paths scales typically at least exponentially with the size of the code, and the number of states scales exponentially with the size of the data that you're dealing with, and the number of possible inputs is also shockingly large. There is no way that we can do this and we should perhaps stop deluding ourselves that there is.

And then the most fundamental reason that computers are hard is that they are too discrete, right? If you take a bridge or, I don't know, a chair or something - where's something slightly bendable? You take a chair and you exert a bit more force on it than it's used to - I'm not very strong - and it bends a bit. It deforms continuously. If you take a bridge and you tighten up one of the bolts a bit too much or you underspecify one of the girders by 10 per cent, occasionally you'll reach a point of catastrophic failure, but usually it will just be a little bit weaker or a little bit stronger or a little bit wibbly. Right? But computer systems - you change a bit, and sometimes it doesn't make a difference, but disturbingly often you change a line of code - as in some of those bugs that we were talking about - and the whole thing becomes broken, right? So it's quite different from that traditional engineering. It's really different.

Okay. So, "A New Dawn", this thing is titled. What can we do? What might we do? So I'm going to talk about several different possibilities, all right? One thing, we can do more of the stuff which is traditionally called software engineering. It's not really engineering in my book, but some of it's good. I mean, it's useful, you know? We can do more testing and have more assertions in our code and better management maybe and all that kind of stuff. We can do all that - we should do all that, but it's not going to make a very big change, all right? It's not addressing any of those root causes of the problem.

What else could we do? So as an academic computer scientist, I've devoted several years of my life to working in the area of programming languages and I look at the programming languages used out there in the world today. Oh, it's just disgusting! It's shocking. applause The reasons that languages become popular and built into our shared infrastructure, the reasons for that have almost nothing to do with how well those languages are designed and in fact whether they are designed at all, right? applause So I didn't really want to get too much hate mail after this talk so I'm going to try and avoid naming particular languages as much as I can, but I encourage you to think away as I speak and imagine the examples. No, that was - I didn't want to get much hate mail. Sorry. laughter So... laughter applause

So at the very least, we could use programming languages based on ideas from the 1970s instead of the 1960s. I mean, really, come on. Right? You know, back in '67 or '69 or whatever we had BCPL and C, and only a couple of years later we had languages that guaranteed type and memory safety and enforcement of abstraction boundaries and gave us rich mechanisms for programming, so we could say what we mean and not futz about with 23 million pointers, about 1 per cent of which were completely wrong, right? So for any particular - in the context of any particular project or any particular existing legacy infrastructure there may be very good reasons why one just can't make a switch. Right? But for us collectively, there is no excuse. It's just wretched, right?
applause It's completely bonkers. And then the other thing that you see as a programming language researcher is that - er, well, to make another analogy, it looks as if anyone who is capable of driving a car considers themselves an expert on car design, right? There is in fact a rather well-established subject of, you know, programming language design, and we know how to specify completely precisely, mathematically precisely, not just the syntax, like we had in these BNF grammars from the 1960s and still have, but also the behaviour of these things. We know how to specify that with operational semantics and type systems. And we know how to do mathematical proofs that our language designs are self-consistent, that they do guarantee good properties of arbitrary well-typed programs that you write, right? We can do this now. We can do this, if we have to, post hoc. And people are doing this for JavaScript and C and, god help them, PHP and other such languages. But even better, we can do it at design time. And if you do this at design time, you see the structure. You have to - the act of writing that specification forces you to consider all the cases and all the interactions between features. You can still get it wrong, but you get it wrong in more interesting ways. Right?

So people are starting to do this now. So I think there was a talk on the first day of this by Hannes and David, who are building system software in maybe not the best systems language you could possibly imagine, but something which compared with C is as manna from heaven. Anil Madhavapeddy and his colleagues in Cambridge have been working hard to build system software in at least moderately modern languages. And it works. Sometimes you have to use C, but surprisingly often you really don't. Yeah, so I teach, as my introducer said, in the University of Cambridge. And I teach semantics. And one of my fears is that I will accidentally let someone out into the world who will accidentally find themselves over a weekend inventing a little scripting language which will accidentally take over the world and become popular, and I won't have explained to them what they have to do or even that there is a subject they have to understand. And if you want to understand it, there's lots of stuff you can look at.

Okay. Let's go to the uppermost extreme. If we want software and hardware that actually works, we should prove it correct. Right? So this is something which academics, especially in the domain labelled "formal methods" in the '70s and '80s, have been pushing and promoting as an idea and a promise and an exhortation for decades, right? Why don't we just prove our programs correct? And for some of those decades it wasn't really plausible. I remember back when I was considerably younger, it was a big thing to prove that if you took a linked list and reversed it twice, you got back the same thing. Right? That was hard, then. But now, well, we can't do this for the Linux kernel, let's say, but we can surprisingly often do this. And I'll just give you a couple of examples of modern-day, state-of-the-art academic verification projects. People have verified compilers all the way from C-like languages down to assembly, or from ML-like languages - you know, higher-order functional imperative languages - all the way down to binary models of how x86 behaves. And people have verified LLVM optimisation passes. There's verified software fault isolation, which I believe goes faster than the original non-verified form. Win for them.
There's a verified hypervisor from the NICTA group in Australia. Right? A verified secure hypervisor with proofs of security properties. There's any amount of work verifying crypto protocols, sometimes with respect to assumptions - I think reasonable assumptions - on what the underlying mathematics is giving you. So we can do this.

I maybe have to explain a bit what I mean by "prove". To some computery people, "I proved it works" means "I tested it a little bit". laughter In extremis, "it compiled". laughter applause To a program analysis person, you might run some sophisticated tool - but one written in untrusted code and possibly doing an approximate and maybe unsound analysis - and you might find an absence of certain kinds of bugs. That's tremendously useful; it's not what we would really call "proved". I mean, nor would they, to be fair. Scientists generally don't use the word because they know they're not doing it, right? They're doing controlled experiments, which gives them, you know, information for or against some hypothesis. Mathematicians write proofs. You probably did that when you were young. And those proofs are careful mathematical arguments, usually written on paper with pencils, and their aim is to convince a human being that if that mathematician was really nailed up against the wall and pushed, they could expand that into a completely detailed proof. But what we have in computing - we don't have the rich mathematical structure that these people are working with; we have a tremendous amount of ad hoc and stupid detail mixed in with a smidgen of rich mathematical structure. So - and we have hundreds of millions of lines of code. So this is just not going to cut it. And if we want to say the word "prove" ever, then the only thing which is legitimate is to do honest program proof, and that means you have to have some system that machine-checks that your proof is actually a proof. And sometimes we'll also have machines that will sort of help and sort of hinder that process, right? And there's a variety of these systems - Coq and HOL4 and Isabelle/HOL and what have you - that we can look up.

So using these systems we can prove nontrivial facts about these things. Not necessarily that they do what you want them to do, but that they meet the precise specification that we've written down. We can do that. But it's tricky, right? So these projects, these are by academic standards all quite big, you know? This is something like, I don't know, 10 people for a few years or something. By industry scale, maybe not that big. By your scale, maybe not that big. But still a lot of work for a research group to do. And quite high-end work. It can take, you know, literally a few years to become really fluent in using these tools, right? And there isn't as yet sort of an open-source, collaborative movement doing this kind of stuff. Maybe it's the time to start. Maybe in five years, ten years. I don't know. If you want a challenge, there's a challenge. But it's really quite hard, right? It's not something that one can quite put forward credibly as a solution for all software and hardware development, right?

So that leads me to, like, an intermediate point. That should have been a four there. What can we do which improves the quality of our systems, which injects some mathematical rigour, but which involves less work and is kind of easy? And what we can do is take some of these interfaces and be precise about what they are. And initially, we have to do that after the fact.
We have to reverse-engineer good specs of existing stuff. But then we can use the same techniques to make things which are much cleaner and better-tested and easier to test in the future. That's the idea. So my colleagues and I have been doing this kind of stuff for the last quite a few years and I just want to give you two little vignettes, just to sort of show you how it goes. So I can't give you much detail. And this is, obviously, joint work with a whole bunch of very smart and good people, some of whom I name here. So this is not me, this is a whole community effort. So I'm going to talk for a little bit about the behaviour of multiprocessors, so at that hardware interface. And I'm going to talk a tiny little bit about the behaviour of the TCP and sockets API, right? Network protocols. So - and there'll be some similarities and some differences between these two things.

So, multiprocessors. You probably want your computers to go fast. Most people do. And it's an obvious idea to glom together - because processors don't go that fast, let's glom together a whole bunch of processors and make them all talk to the same memory. This is not a new idea. It goes back at least to 1962, when the Burroughs D825 had, I think, two processors talking to a single shared memory. It had outstanding features including truly modular hardware with parallel processing throughout, and - some things do not change very much - the complement of compiling languages was to be expanded, right? 1962, so that'll be 52 years ago. Deary me.

Okay. Now, how do these things behave? So let me show you some code, right? It's a hacker conference. There should be code. Let me show you two example programs, and these will both be programs with two threads and they will both be acting on two shared variables, X and Y. And in the initial state, X and Y are both zero. So, the first program. On this thread we write X and then we read Y, and over here we write Y and then we read X, right? Now, these are operating, you know, in an interleaving concurrency, you might think, so there's no synchronisation between those two threads, so we might interleave them. We might run all of thread 0 before all of thread 1 or all of thread 1 before all of thread 0, or they might be mixed in like that or like that or like that or like that. There are six possible ways of interleaving two lists of two things. And in all of those ways you never see, in the same execution, that this reads from the initial state AND this reads from the initial state. You never see that. So you might expect and you might assume in your code that these are the possible outcomes. So what happens if you were to actually run that code on my laptop in a particular test harness? Well, okay, you see occasionally they're quite well synchronised and they both read the other guy's write. Sorry, big call. They both read the other guy's write. And quite often one thread comes completely before the other. But sometimes, sometimes they both see 0. Huh. Rats. Well, that's interesting.

Let me show you another program. So now, thread 0 writes some stuff. Maybe it writes a big bunch of data, maybe multi-word data. And then it writes - let's say that's a flag announcing to some other thread that that data is ready to go now. And the other thread busy-waits reading that value until it gets 1, until it sees the flag. And then it reads the data. So you might want, as a programmer, you might want a guarantee that this read will always see the data that you have initialised. Hey?
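For concreteness, the shapes of those two programs, transcribed into OCaml 5 Domains, look roughly as below. This is purely illustrative: the experiments described here are native litmus tests run on real x86, ARM and POWER hardware, not this OCaml transcription, and x, y, data and flag are deliberately left as plain, racy refs to match the programs as just described (real OCaml code would use Atomic.t for the flag).

```ocaml
(* Illustrative transcription of the two example programs into OCaml 5
   Domains; the actual experiments are native litmus tests on real hardware. *)

(* Program 1, "store buffering": write x then read y || write y then read x.
   Under pure interleaving you never get both reads returning 0. *)
let store_buffering () =
  let x = ref 0 and y = ref 0 in
  let t0 = Domain.spawn (fun () -> x := 1; !y) in
  let t1 = Domain.spawn (fun () -> y := 1; !x) in
  (Domain.join t0, Domain.join t1)

(* Program 2, "message passing": write the data, then raise a flag; the
   other thread busy-waits on the flag and then reads the data.  The
   question is whether that final read is guaranteed to see 42. *)
let message_passing () =
  let data = ref 0 and flag = ref 0 in
  let writer = Domain.spawn (fun () -> data := 42; flag := 1) in
  let reader =
    Domain.spawn (fun () ->
        while !flag = 0 do Domain.cpu_relax () done;
        !data)
  in
  Domain.join writer;
  Domain.join reader

let () =
  let r0, r1 = store_buffering () in
  Printf.printf "SB: r0=%d r1=%d  MP: data seen = %d\n"
    r0 r1 (message_passing ())
```

A harness runs shapes like these many times and tallies the final values; that is roughly how the observations just described, and those discussed next, are obtained.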
That would be like a message-passing kind of an idiom. So what happens if you run that? Well, on the x86 processors that we've tested, you always see that value. Good. This is a decent coding idiom on x86. All right. On ARM and IBM POWER processors, however, you see that sometimes the read just ignores the value written and you see the initial state. So this is not a good message-passing idiom, right?

So what's going on? Well, these behaviours - they might be surprising, eh? And when you see surprising behaviour in hardware, you have two choices. Either it's a bug in the hardware - and we have found a number of bugs in hardware. It's always very rewarding. Or it's a bug in your expectation. Or it's a bug in your test harness. Right? So what you do is you walk along and you walk up to your friendly processor architect in IBM or ARM or x86 land - and we have worked with all of these people. And you ask them, "Is this a bug?" And a processor architect is a person who has the authority to decide whether some behaviour is intended or is a bug. laughter That's what they do, essentially. And for these, they say, "Oh, we meant it to be like that". Right? No question. "We meant it to be like that. That's perfectly proper. Because you and everybody else wanted their computers to go fast, we as processor designers have introduced all manner of sophisticated optimisations which, if you were running sequential code, you would never notice, but if you start running fancy concurrent code like this - fancy concurrent code that isn't just trivially locked - you can notice." So this is intentional behaviour. And if you want to write that code, like a mutual exclusion algorithm or a message-passing algorithm or something, then you need to insert special instructions, memory barrier instructions.

And so you go away and you say, "What do they do?" And you look it up in the vendor documentation. And you get stuff like this. I'll read it out: (Reads quickly) "For any applicable pair a,b, the memory barrier ensures that a will be performed with respect to any processor or mechanism, to the extent required by the associated memory coherence required attributes, before b is performed with respect to that processor or mechanism. A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created. B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B." applause

Hands up if you understand what that means? Hands up if you think that if you thought about it and read the rest of the book, you would understand, or you could understand, what that means? Hands up the people who think that we're doomed forever? laughter So I'm sorry to the first few groups, but the last ones are right. laughter applause This looks like it's really intricate and really carefully thought-out stuff. And we thought that for a while, and we spent literally years trying to make precise mathematical models that give precise meaning to these words, but actually it can't be done. So what do you have to do in that circumstance? You have to go away and you have to invent some kind of a model. And there's a - this is a really fundamental problem: on the one hand, we develop our software by this trial and error process, but on the other hand, our reference points are specifications like this.
We have these loose specifications, supposedly, that have to cover all manner of behaviour of many generations of processors that might not all behave the same way, written just in text. And we tell people they should write "to the spec", but the only way they have of developing their code is to run it - to run particular executions on the particular hardware implementations that they have, whose relationship to the spec is very elusive. We can't use those thick books that you get, or those quite weighty PDFs that you get these days from the processor vendors, to test programs. You can't feed a program into that thick book and get output out. And you can't test processor implementations against them and you can't prove anything and you can't even communicate between human beings, right? So as specifications, these things really don't exist.

So what can we do? Well, the best we can do at this point in time is a bit of empirical science, right? So we can invent some model off the top of our heads, trying to describe just the programmer-visible behaviour, not the internal structure. And we can make a tool. Because that's not prose now. Now it's stuff - it's real mathematics - and we can turn that into code and execute it. We can make tools that take programs - small programs - and tell us the set of all behaviours allowed by that model. And then we can compare those sets of behaviours against the experimental observations. And you might have to fix the model and you might find hardware bugs. And then the model also has to make sense to the architects, so you can discuss with the architects what they intend and what their intuition is, and then you can also prove some other facts about it to give a bit more assurance, and then, because you probably got this wrong the first couple of times, you can go back around. And this results, or has resulted, in models which are not guaranteed to be correct, but they are effectively the de facto standard for understanding the concurrency behaviour of these things, you know? And various people use them.

And just to give you a tiny snippet - I'll have to speed up a bit - a tiny snippet of what the model looks like: it's basically just an abstract machine. It's got some stuff for the threads that handles the programmer-visible speculative execution, and some stuff for a storage subsystem that handles the fact that in these architectures, memory writes can be propagated to different threads at different times, right? And that model has a state which is just a collection of some sets and lists and partial orders and a few other things which I won't talk about. And it has some transitions, right? In any state, there might be several possible transitions. This is just a sample. I'm not going to explain all of this. But when you want to propagate a write to another thread, there have to be some preconditions that you have to satisfy. And then there is an action. It's not amazingly complicated, really. So this is text, but it's very precisely written text, and it has the great advantage that you can go through these bullet points with your friendly architect and say, "Is this what you mean?" And they can think about it and say yes or no. But for the actual definition, that's in mathematics, but it's mathematics that's quite close to code. I mean, it's terribly close to pure functional code without side effects. And just to give you an idea, this is some of it, and those are three of those conditions in the real, honest, true version.
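To make "a model you can actually execute" concrete, here is a deliberately naive sketch in the same spirit: a sequentially consistent abstract machine for tiny straight-line programs of stores and loads, explored exhaustively so that it reports the full set of outcomes it allows. Everything in it is invented for illustration; the real models additionally describe the speculation and per-thread write propagation just mentioned, and they are generated from the mathematical definitions rather than hand-written like this.

```ocaml
(* A toy executable model, for illustration only: a sequentially consistent
   abstract machine for straight-line programs of stores and loads, explored
   exhaustively.  The real models also handle speculation and out-of-order
   write propagation; this sketch just shows the style: a state, a set of
   transitions, and a search that yields the set of allowed outcomes. *)

type instr = Store of string * int | Load of string

type thread = { prog : instr list; reads : int list }

type state = (string * int) list * thread list (* shared memory, threads *)

let read mem x = try List.assoc x mem with Not_found -> 0

(* One transition of the machine: pick any thread, run its next instruction. *)
let step ((mem, threads) : state) : state list =
  List.concat
    (List.mapi
       (fun i t ->
         let set_thread t' =
           List.mapi (fun j tj -> if i = j then t' else tj) threads
         in
         match t.prog with
         | [] -> []
         | Store (x, v) :: rest ->
             [ ((x, v) :: mem, set_thread { t with prog = rest }) ]
         | Load x :: rest ->
             [ (mem, set_thread { prog = rest; reads = t.reads @ [ read mem x ] }) ])
       threads)

(* Explore every interleaving; collect the per-thread load results. *)
let rec outcomes (s : state) : int list list list =
  match step s with
  | [] -> [ List.map (fun t -> t.reads) (snd s) ]
  | next -> List.concat_map outcomes next

let () =
  (* The store-buffering example: x:=1; read y  ||  y:=1; read x *)
  let sb : state =
    ( [],
      [ { prog = [ Store ("x", 1); Load "y" ]; reads = [] };
        { prog = [ Store ("y", 1); Load "x" ]; reads = [] } ] )
  in
  List.iter
    (function
      | [ [ r0 ]; [ r1 ] ] -> Printf.printf "allowed: r0=%d r1=%d\n" r0 r1
      | _ -> ())
    (List.sort_uniq compare (outcomes sb))
```

Run on the store-buffering example from earlier, it prints r0=0 r1=1, r0=1 r1=0 and r0=1 r1=1, and never r0=0 r1=0 - which is exactly why the 0/0 outcome observed on real hardware forces a weaker model than plain interleaving.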
Then we can take this mathematics and, because it's in a particular style, we can compile it into OCaml, and then into OCaml bytecode, and then we can run it for batch jobs. But then you can compile that OCaml bytecode to JavaScript and glue on a user interface and stick it into a web browser, and then you have a web interface that lets people explore this model. And that's also a necessary part of validating that it makes sense.

Okay. This is not rocket science. You've missed the talk for rocket science, I'm sorry. All we're doing here is specifying the intended behaviour of a system. Okay? It's not very technical, but it's unusual. And we happen to be doing it in this mathematical language, but you could in fact use almost any language, so long as you understand what you're doing. What you have to understand is that you're writing something which is not an implementation. It is something which, given some observed behaviour - some arbitrary observed behaviour - will be able to decide whether it's allowed, whether you want it to be allowed or not, right? It has to be executable as a test oracle. The key question for getting this to work is to understand how much non-determinism, or loose specification, there is in the system that you're working with, right? So if everything is completely deterministic, you can write a reference implementation in the cleanest language of your choice and just compare the output of that and the output of the real thing, right? But if there's more non-determinism, then you can't do that.

I'm going to have to abbreviate this a little tiny bit. So for the TCP network protocols, there is more non-determinism. You know what TCP is, yes? A protocol that gives you sort of reliable connections, assuming that the world is made of good people, right? You look at the standards for TCP, and they're the same kind of wibbly text from the 1980s, and you look at the implementations and they are a ghastly mess. And you try and understand the relationship between the two of them and you can't, because the standard, again, is not the definition. It doesn't define in all circumstances what behaviour is allowed and what isn't. So again, we had to make up a specification, in the first instance just of the host - a single endpoint - observing its sockets API calls and returns and its behaviour on the wire, right? And a few internal debug events. And when you've decided what to observe, then an observation is just a trace - it's just a list of those events. And you have to be able to ask your spec: "Is that list admissible or not?" But there's a lot of non-determinism. Variation between the implementations, variation within the implementations. And that's internal. It's not announced in the API or on the wire, maybe until much later, when the implementation picks a new window size or something. You can't tell until quite a lot later in the trace what it's picked. And that makes the job of checking whether a trace is admissible by the spec much harder than it has to be. Right? And if you had designed the TCP protocol and those implementations to be testable in this very discriminating way, back in the 1980s when you were designing the TCP protocol, it would be much easier. And for new protocols, one should make sure that you're doing this in this particular way. I don't think I want to talk about that slide. That's just one of the rules of that specification. But the key fact about that spec is that we handle the real protocols for arbitrary inputs.
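As a cartoon of that specification-as-test-oracle idea, here is a tiny sketch. The event type, the finite set of candidate window sizes and the rules are all invented for illustration; the real TCP specification is vastly larger, with hundreds of rules, and is checked with a symbolic evaluator inside a theorem prover rather than by finite enumeration like this. The shape is the point: the spec is a nondeterministic transition relation over observable events, and a captured trace is admissible if some resolution of the internal choices - here, the window size the host silently picked - allows every event in order.

```ocaml
(* Illustration only: a toy "specification as test oracle" with internal
   nondeterminism.  The event type, candidate window sizes and rules are all
   invented; the real TCP specification is far larger and is checked
   symbolically rather than by this finite enumeration. *)

type event =
  | Connect            (* observed socket-API call *)
  | Segment_out of int (* observed segment of this many bytes on the wire *)

(* Internal state: whether we are connected, and a window size the host
   picked internally and never announces directly. *)
type host = { connected : bool; window : int }

let candidate_windows = [ 512; 1024; 4096 ] (* the unobservable choice *)

(* The transition relation: all possible successor states for one event.
   Nondeterminism shows up as a list with more than one element. *)
let step (h : host) (e : event) : host list =
  match h, e with
  | { connected = false; _ }, Connect ->
      List.map (fun w -> { connected = true; window = w }) candidate_windows
  | { connected = true; window = w }, Segment_out n ->
      if 0 < n && n <= w then [ h ] else []
  | _ -> []

(* A trace is admissible iff SOME sequence of internal choices accepts
   every observed event in order. *)
let admissible (trace : event list) : bool =
  let initial = { connected = false; window = 0 } in
  [] <> List.fold_left
          (fun states e -> List.concat_map (fun s -> step s e) states)
          [ initial ] trace

let () =
  assert (admissible [ Connect; Segment_out 800; Segment_out 1000 ]);
  assert (not (admissible [ Segment_out 800 ]));           (* send before connect *)
  assert (not (admissible [ Connect; Segment_out 9000 ])); (* bigger than any window *)
  print_endline "example traces checked"
```

With an oracle like this you can replay traces captured from real implementations, or drive random traffic at an endpoint and check every observation as you go - which is the "better fuzzing" point made below.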
But it's structured not just for this executability as a test oracle; it's structured for clarity, not performance, which is scarcely ever what anyone ever does, right? So it's a whole new kind of thing. And the testing is very discriminating. So we found all manner of amusing and bizarre bugs, which I think I don't have time to talk about.

Okay. So I've described three kinds of things that we might do. The first thing - for goodness sake - I mean, it's just a no-brainer. Just do it already. Everybody. All of you. All of you. Right? This middle zone is a very interesting zone as far as I'm concerned, right? It's a place where we could get a very good benefit for not that much effort, right? We can do it, if necessary, post hoc. We can also do it, even better, at design time. We have to do this in a way that makes our specifications executable as test oracles, and another thing that that enables is completely random testing. When you've got a test oracle, you can just feed in random goop. This is fuzzing, but better fuzzing - feed in random goop and check at every point whether the behaviour that you're getting is allowable or not. And then they're written for clarity, not for performance, and that means you can see what you're doing, right? You can see the complexity. If you go ahead and you add some feature to your protocol or your programming language or whatever and you're working just with text specifications, then you can't see the interactions. You just chuck in a couple of paragraphs and you think for a few minutes, right? And if you're lucky, you make an implementation. But if you're writing a spec that has to cover all the cases like this, the act of doing that encourages you to think, or helps you think, about the excess complexity. And you might think: that's too complex, I'm being silly, I'll make it simpler. And it has to be usable for proof. So this, I think, also is pretty much a no-brainer. And it doesn't require great technical, you know, mathematical skill either, right? Lots of people can do this.

And then there's option 3, the best option: full-on formal verification of the key components. And that is now also in reach. You can imagine secure systems made using actual verified compilers and verified hypervisors with verified TLS stacks, and you can imagine making things out of those, or making those and then making things out of those. And maybe one should be thinking about that. So, maybe not appropriate for everything. If you were writing Word, you would not wish to do these things. Probably, you're not. But for this key infrastructure that we really depend on - that we trust even though it's not trustworthy - we have to do this.

Is this a new dawn? I wonder if there's anyone standing next to a light switch that can dim them for me. I didn't ask them beforehand, so... If you think back over the last 70-odd years - we started building computers in around 1940 - it's been pretty dark from the point of view of rigorous, solid, reliable engineering, stuff that is actually trustworthy. Pretty dark, I would say. Maybe, just maybe, there's a tiny smidgen of light coming through the doors. And we start to have a little bit of a clue, and we start to get reusable models of processor architectures and programming languages and network protocols and what have you. It's just the beginnings of the analogue of that thermodynamics and materials science and quality control management and what have you that we have from mechanical engineering. And we've got no choice.
If we don't, the next 75 years are going to be a lot worse. You can just imagine, right? So I went - as I'm sure many of you did - to this great talk on some Apple boot loader exploit yesterday, which was relying on some feature that had been introduced in the early '80s and was still present. And you can imagine, 100 years from now - you can imagine, as long as human civilisation endures, the x86 architecture and the socket API and all of this stuff will be embedded in some monumental ghastly stack of virtualisation layers forever. laughter applause

So I'd like to thank especially all of my colleagues that have been working with me, or not with me, in these directions. I'd like to thank our generous funders who support this stuff. I'd like to thank you for your attention, and I exhort you: sort it out. applause

Herald: Thank you very much, Peter, for those interesting insights into our programming. We have now time for a Q & A, so those people who have to leave the room, please do so quietly and as fast as possible, so we can go on and hear what the questions are. So we start with microphone 4, please.

Mic 4: Hello. Thanks for the talk. My question is: if you've got an oracle of the software behaviour which can translate every possible input to a correct output, how can this be less complex and error-prone than the reference implementation?

Peter: Good question. So the first point is that this oracle doesn't have to produce the outputs. It only has to check that the inputs and the outputs match up. And then the second point is that in general it might have to have much of the same complexity, but by writing it to be clear, as opposed to fast, you may adopt a completely different structure, right? So the structure of our TCP spec is organised by the behaviour that you see for particular transitions. The structure of a real implementation has fast-path code and lots of complicated intertwined branching and all manner of excess complexity - of implementation complexity, if you will.

Mic 4: Thanks.

Herald: So microphone 3, please?

Mic 3: I wanted to ask you what you thought about programming by manipulating the abstract syntax tree directly, so as to not allow incompilable programs because of some, like, missing semicolon or something like that. What's your thoughts on that?

Peter: So that's - in the grand scheme of things, I think that's not a very big deal, right? So there's - people have worked on structured editors for lordy knows how many years. It's not a big deal because it's very easy for a compiler to detect that kind of thing. And even more, it's very easy for a compiler of a sensibly designed language to detect type errors, even for a very sophisticated type system. So I don't think that - that's not - I mean, some people might find it helpful, but I don't think it's getting to the heart of the matter.

Mic 3: Thanks.

Herald: So we've got some questions from our signal angels from the IRC.

Signal angel: Yes. There's one question. It's about the repository for Isabelle and HOL proofs. It should be on SourceForge, and you said there are no open-source repositories. What about them?

Peter: I'm not quite sure what the question is, really. So there's a repository of Isabelle formal proofs - the Archive of Formal Proofs, it's called. I didn't really mean to say there are no open-source repositories, and in fact I don't know under what conditions most of those proofs are licensed. They probably are open.
But there isn't an open-source community building these things, right? It's still pretty much an academic pursuit.

Herald: Microphone 2, please.

Mic 2: Hello. Thanks again for your talk. I just want to add something that you didn't address, and that's that we actually need more good studies in software engineering. We often hear a lot of experts and also very well-known computer scientists say things like, "Well, we just need to do functional programming and OOP is bad" and stuff like that, which I think there's a lot of truth to, but we actually need studies where we test these kinds of claims that we make. Because when you look at other fields that it also gets compared to, like medicine, if somebody comes around and is well-known and says things like "homeopathy works", we don't believe them. We do trials, we do good trials. And there's a lot of myths out there, or not well-tested things, like "hire the best programmers, they are 100 times more productive", "use static type systems", and stuff like that. And we need to verify that this is true, that this helps. And it's...

Peter: Okay. So in the grand scheme of things, arguably you're right. In the grandest scheme of things, computer science is actually a branch of psychology, or really sociology. We are trying to understand how large groups of people can combine to make things which are insanely complicated. Now, would it be good if we had, you know, solid evidence that programming in this language was better than programming in that language? Well, yes, but it's staggeringly difficult and expensive to do honest experiments of that nature, and they're scarcely ever done, right? You might do little experiments on little groups of students, but it's really difficult. And then some of these things which I'm saying, when one is familiar with the different possible options, are just blindingly obvious. I mean, there are reasons why these people are using OCaml for their system programming, right? It's not, you know, a "homeopathy is right" versus "homeopathy is wrong" kind of argument that's being put forward.

Herald: Question from microphone 5, please.

Mic 5: So were you using ECC memory for your testing - up here.

Peter: When you say up here, there's a bit of ambiguity.

Mic 5: Here.

Peter: Thank you. Say again?

Mic 5: So were you using ECC memory for the testing you showed - the multithreaded results due to memory barriers and memory reorderings?

Peter: Well...

Mic 5: Okay, but even if you were, or you were not, the point I want to make is that you can expect about one bit flip each minute in a modern computer system, which may completely change what your software is doing. So perhaps we also have to look at ways we can keep working if something goes wrong at the very low level, so that we kind of have a check against our specification at a higher level of our software. So it is valuable to do good engineering on the low levels, but still I think we will not solve the problems of computation and computer engineering just by proving things in the mathematical domain, because computers are physical entities. They have errors and so on, and we really have to deal with them as well.

Peter: So it's certainly true that there are such random errors. In fact, I would argue that you have to be able to model the physics well enough and you have to be able to model the statistics of those errors well enough - so that's a different kind of mathematics.
And it's certainly valuable and necessary. But if you look at the statistic you just quoted - is that the main cause of why systems go wrong? Except possibly for space hardware, no. So important, yes. The most important thing that we should be paying attention to? I don't really think so.

Herald: Microphone 6, please?

Mic 6: Hi. I really think that what you're proposing - to specify and verify more key components - would be a valuable addition, but how do you make sure that your specification actually matches the behaviour that you want to have?

Peter: So as I described, as partial validation of those specifications we do a lot of testing against the implementations, against a range of existing implementations. That's one source of validation. Another source of validation is that you talk to the architects and the designers. You want the internal structure to match their intent. You want it to be comprehensible to them. Another kind of validation that we do is prove properties about them. So we proved that you can correctly compile from C11 concurrency - a mathematical model of that - to IBM POWER concurrency. And that kind of proof gives you a bit more assurance in both. So none of this is giving you a total guarantee. We certainly don't claim a total guarantee. But it gives you pretty good levels of assurance by normal standards.

Herald: Mic 4?

Mic 4: Yes. Thanks again. You proposed random tests, and in my experience it's very annoying if you have tests where the outcome is sometimes a failure, and you want to have reproducible tests to fix a bug. So how do we bring these aspects together - testing more variety in the data, and having it reproducible?

Peter: Well, as I was mentioning, for the TCP thing, the problem with reproducibility is exactly this internal non-determinism, right? The system makes this scheduling choice or what have you and you can't see what it is, or the checker can't see what it is. So one way to do that is to really design the protocols in a different way, to expose that non-determinism. The other fact about this kind of general-purpose specification-as-test-oracle idea is that in some sense, it doesn't have to be reproducible. Right? The specification is giving a yes-no answer for an arbitrary test. And that means that you can use arbitrary tests. Figuring out the root causes of the differences afterwards may be tricky, but you can use them for assurance.

Herald: So we now have time for two more questions. Microphone 1, please?

Mic 1: Hiya. Thanks for a great talk. So what you've described seems to be a middle ground between full-on formal verification and more practical testing - like, in between. So where do you see formal verification going in the future? Will it converge towards the middle, or will it still be there just to verify something that's more well-specified?

Peter: So the progress of sort of the whole subject of formal verification in general, if you look at that over the last 10 years or so, it's been really amazing compared with what we could do in the 80s and 90s and early 2000s, right? So the scope of things, the scale of things for which it is becoming feasible to do verification, is getting bigger. And I think that will continue. You know, I don't know when you might see a completely verified Tor client, let's say, but that's not inconceivable now. And 20 years ago, that would have been completely inconceivable.
So that is expanding, and at the same time, I hope we see more of this interface specification - this test-oracle specification - and when you're doing formal verification of a piece of a system, you have to have those interfaces already defined, all right? So these two fit very well together.

Herald: So the last question from microphone 2, please.

Mic 2: Hi. You mentioned that you often find bugs in hardware. Is there any effort to verify the transistors on chips themselves as well?

Peter: So there's a whole world of hardware verification. That wasn't my topic today. And most of the big processor vendors have teams working on this. Unsurprisingly, if you remember your history, many of them have teams focusing on the floating-point behaviour of their processors. And they're also doing, by the standards of ten years ago, pretty spectacularly well. So there are companies that have the whole of some execution unit formally verified. There's been a lot of work over the years on verification of cache protocols and suchlike. Right? There's a lot of work on not verifying the hardware as a whole, but verifying that, you know, the RTL-level description of the hardware matches some lower-level description. So there is a lot of that going on.

Herald: Thank you very much again for this great talk. Give him another applause. Thank you, Peter.

Peter: Thank you. applause

subtitles created by c3subtitles.de. Join, and help us!