applause

Peter Sewell: Thank you for that introduction and to the organisers for this opportunity to speak to you guys. So my name is Peter Sewell. I'm an academic computer scientist and as such, I bear a heavy burden of guilt for what we have to work with. Shared guilt, to be fair, but guilt all the same. In many, many years of higher education, I've learned two things about computers. Maybe you've figured them out slightly faster. The first thing: there's an awful lot of them about. Shocking. And the second is that they go wrong. They go wrong a lot. Really a lot. A lot. A lot! And they go wrong in many interestingly different ways.

So sometimes they go wrong in spectacular ways, right? With fireworks. And here, we see... laughter applause Here we see half a billion dollars worth of sparkle. You don't get that every day. And so what was going on? So as I understand it, what was going on was that the - this was the first flight of the Ariane 5 launcher and it reused some of the guidance software from Ariane 4. That's fine. Good, tested code. Should reuse it. But the launch profile for Ariane 5 went sideways a bit more - on purpose - than the Ariane 4 launch. And as a result, one of the sideways values in the guidance control software overflowed. Bad. And hence, the guidance software decided that something bad was going wrong and, in order to not land rockets on people's heads, it kindly blew itself up. Okay. So that's a spectacular one.

Sometimes they go wrong in insidious ways. And you people probably know much more about that than me. Very insidious ways. Exploited by bad people to our detriment. And they go wrong in insidious ways, exploited by our kindly governments for our protection. Right? And these - there are many, many, many different causes of these things, but many of these causes are completely trivial. You know, you have one line of code in the wrong place - something like that, right? So that's very, very bad and you're all conscious of how bad that is. But from my point of view, this is as nothing compared with the real disaster that we have been facing and continue to face. Right? The real disaster is the enormous waste of human life and energy and effort and emotion dealing with these crap machines that we've built. Right? And it's hard to quantify that. I did a little experiment a couple of months ago. I Googled up "Android problem" and "Windows problem" and got about 300 million hits for each, right? Somehow indicative of the amount of wrongness.

So this should seem kind of familiar to you. If you think back - think back to when you were young. Maybe back in the 19th century when you were young, and what were you doing then as a community of hackers and makers and builders? Well, this was the full-on industrial revolution. You were building stuff. You were building beautiful bridges of cast-iron and stone with numerous arches and pillars. And every so often they would fall down. I believe someone miscalculated the wind loading. And we would build magical steam engines to whisk us away. And every so often, they would blow up. And that kind of thing, as you may have noticed in the last hundred years - sorry - that kind of thing doesn't happen very often any more, right? We have pretty good civil and mechanical engineering. And what does it mean to have good civil and mechanical engineering?
It means that we know enough thermodynamics and materials science and quality control management and all that kind of thing to model our designs before we build them and predict how they're going to behave pretty well and with pretty high confidence, right? I see an optimisation of this talk. For computer science, for computing systems, we can predict with 100 per cent certainty that they will not work. Ah, time for lunch! applause But it's not a very satisfying answer, really.

So why is it so hard? Let me discuss a couple of the reasons why computing is tricky. In some sense, trickier than steam engines. The first reason is that there is just too much code. Right? So this is one of the more scary graphs I've found on the internet recently. This is a measure of code in hundreds of millions of lines. Each of these blocks I think is 10 million lines of code. And in the little letters - you might not be able to see - we have Android and the Large Hadron Collider and Windows Vista and Facebook, the US Army Future Combat System, Mac OS and the software in a high-end car. Right? laughter applause I'm flying home. laughter So that's a bit of a problem, really. We're never going to get that right.

Then there's too much legacy complexity there, right? Really too much. So as right-minded, pure-minded people you might think that software and hardware should be architected beautifully, with regular structure, a bit like a Mies van der Rohe skyscraper or something. Clean. Okay. You know that's not realistic, right? You might expect a bit of baroqueness. You might expect maybe something more like this. You know, it's a bit twiddly but it still hangs together. It's still got structure. It still works. But then you stop and you think and you realise that software and hardware is built by very smart people and usually in big groups and subject to a whole assortment of rather intense commercial pressure and related open-source pressure and using the best components and tools that they know and they like. laughter applause

Someone from the room: (Indistinct) You forgot management!

Peter: And the best management that human beings can arrange. laughter applause So we end up with something spectacularly ingenious. laughter applause You can see here, if you look closely, you can see all the parts. laughter This was just a found picture from the internet. But you can see there's a bunch of, you know, transparent sort of silicon-like stuff down here. Let's say that's our hardware. And over here, I see a C compiler with all of its phases. laughter And these bottles, that's the file system, I reckon. And there's a TCP stack in here somewhere. And there's a piece that belongs to the NSA, obviously. And up here, I think up here - yeah, there's a JavaScript engine and a TLS stack and a web browser. And then perched at the top is the software that most people use to talk to their banks. laughter It's just - it's just insane, right?

And then we have to look at how we build these pieces, right? So we build them fundamentally by a process of trial and error, right? Just occasionally we write some specifications in text. And then we write some code and maybe we write a few ad hoc tests and then we test and fix until the thing is marketable and then we test and fix and extend until the thing is no longer marketable and then we keep on using it until we can't any more. Right? So the fundamental process that we're using to develop these systems is trial and error by exploring particular execution paths of the code on particular inputs.
But - you know this, but I want to highlight - there are too many. The number of execution paths scales typically at least exponentially with the size of the code, and the number of states scales exponentially with the size of the data that you're dealing with, and the number of possible inputs is also shockingly large. There is no way that we can do this and we should perhaps stop deluding ourselves that there is.

And then the most fundamental reason that computers are hard is that they are too discrete, right? If you take a bridge or, I don't know, a chair or something - where's something slightly bendable? You take a chair and you exert a bit more force on it than it's used to - I'm not very strong - and it bends a bit. It deforms continuously. If you take a bridge and you tighten up one of the bolts a bit too much or you underspecify one of the girders by 10 per cent, occasionally you'll reach a point of catastrophic failure, but usually it will just be a little bit weaker or a little bit stronger or a little bit wibbly. Right? But computer systems - you change a bit, and sometimes it doesn't make a difference, but disturbingly often you change a line of code - as in some of those bugs that we were talking about - and the whole thing becomes broken, right? So it's quite different from that traditional engineering. It's really different.

Okay. So, "A New Dawn", this thing is titled. What can we do? What might we do? So I'm going to talk about several different possibilities, all right? One thing, we can do more of the stuff which is traditionally called software engineering. It's not really engineering in my book, but some of it's good. I mean, it's useful, you know? We can do more testing and have more assertions in our code and better management maybe and all that kind of stuff. We can do all that - we should do all that, but it's not going to make a very big change, all right? It's not addressing any of those root causes of the problem.

What else could we do? So as an academic computer scientist, I've devoted several years of my life to working in the area of programming languages and I look at the programming languages used out there in the world today. Oh, it's just disgusting! It's shocking. applause The reasons that languages become popular and built into our shared infrastructure, the reasons for that have almost nothing to do with how well those languages are designed and in fact whether they are designed at all, right? applause So I didn't really want to get too much hate mail after this talk so I'm going to try and avoid naming particular languages as much as I can, but I encourage you to think away as I speak and imagine the examples. No, that was - I didn't want to get much hate mail. Sorry. laughter So... laughter applause

So at the very least, we could use programming languages based on ideas from the 1970s instead of the 1960s. I mean, really, come on. Right? You know, back in '67 or '69 or whatever we had BCPL and C, and only a couple of years later we had languages that guaranteed type and memory safety and enforcement of abstraction boundaries and gave us rich mechanisms for programming, so we could say what we mean and not futz about with 23 million pointers, about 1 per cent of which were completely wrong, right? So for any particular - in the context of any particular project or any particular existing legacy infrastructure there may be very good reasons why one just can't make a switch. Right? But for us collectively, there is no excuse. It's just wretched, right?
applause It's completely bonkers. And then the other thing that you see as a programming language researcher is that - er, well, to make another analogy, it looks as if anyone who is capable of driving a car considers themselves an expert on car design, right? There is in fact a rather well-established subject of, you know, programming language design, and we know how to specify completely precisely, mathematically precisely, not just the syntax, like we had in these BNF grammars from the 1960s and still have, but also the behaviour of these things. We know how to specify that with operational semantics and type systems. And we know how to do mathematical proofs that our language designs are self-consistent, that they do guarantee good properties of arbitrary well-typed programs that you write, right? We can do this now. We can do this, if we have to, post hoc. And people are doing this for JavaScript and C and, god help them, PHP and other such languages. But even better, we can do it at design time. And if you do this at design time, you see the structure. You have to - the act of writing that specification forces you to consider all the cases and all the interactions between features. You can still get it wrong, but you get it wrong in more interesting ways. Right?

So people are starting to do this now. So I think there was a talk on the first day of this by Hannes and David, who are building system software in maybe not the best systems language you could possibly imagine, but something which compared with C is as manna from heaven. Anil Madhavapeddy and his colleagues in Cambridge have been working hard to build system software in at least moderately modern languages. And it works. Sometimes you have to use C, but surprisingly often you really don't. Yeah, so I teach, as my introducer said, in the University of Cambridge. And I teach semantics. And one of my fears is that I will accidentally let someone out into the world who will accidentally find themselves over a weekend inventing a little scripting language which will accidentally take over the world and become popular, and I won't have explained to them what they have to do or even that there is a subject they have to understand. And if you want to understand it, there's lots of stuff you can look at.

Okay. Let's go to the uppermost extreme. If we want software and hardware that actually works, we should prove it correct. Right? So this is something which academics, especially in the domain labelled "formal methods" in the '70s and '80s, have been pushing and promoting as an idea and a promise and an exhortation for decades, right? Why don't we just prove our programs correct? And for some of those decades it wasn't really plausible. I remember back when I was considerably younger, it was a big thing to prove that if you took a linked list and reversed it twice, you got back the same thing. Right? That was hard, then. But now, well, we can't do this for the Linux kernel, let's say, but we can surprisingly often do this. And I'll just give you a couple of examples of modern-day, state-of-the-art academic verification projects. People have verified compilers all the way from C-like languages down to assembly, or from ML-like languages - you know, higher-order functional imperative languages - all the way down to binary models of how x86 behaves. And people have verified LLVM optimisation passes. There's verified software fault isolation, which I believe goes faster than the original non-verified form. Win for them.
There's a verified hypervisor from the NICTA group in Australia. Right? A verified secure hypervisor with proofs of security properties. There's any amount of work verifying crypto protocols, sometimes with respect to assumptions - I think reasonable assumptions - on what the underlying mathematics is giving you. So we can do this.

I maybe have to explain a bit what I mean by "prove". To some computery people, "I proved it works" means "I tested it a little bit". laughter In extremis, "it compiled". laughter applause To a program analysis person, you might run some sophisticated tool - but one written in untrusted code and possibly doing an approximate and maybe unsound analysis - and you might find an absence of certain kinds of bugs. That's tremendously useful; it's not what we would really call "proved". I mean, nor would they, to be fair. Scientists generally don't use the word because they know they're not doing it, right? They're doing controlled experiments, which gives them, you know, information for or against some hypothesis. Mathematicians write proofs. You probably did that when you were young. And those proofs are careful mathematical arguments, usually written on paper with pencils, and their aim is to convince a human being that if that mathematician was really nailed up against the wall and pushed, they could expand that into a completely detailed proof. But what we have in computing - we don't have the rich mathematical structure that these people are working with; we have a tremendous amount of ad hoc and stupid detail mixed in with a smidgen of rich mathematical structure. So - and we have hundreds of millions of lines of code. So this is just not going to cut it. And if we want to say the word "prove" ever, then the only thing which is legitimate is to do honest program proof, and that means you have to have some system that machine-checks that your proof is actually a proof. And sometimes we'll also have machines that will sort of help and sort of hinder that process, right? And there's a variety of these systems - Coq and HOL4 and Isabelle/HOL and what have you - that we can look up.

So using these systems we can prove nontrivial facts about these things. Not necessarily that they do what you want them to do, but that they meet the precise specification that we've written down. We can do that. But it's tricky, right? So these projects, these are by academic standards all quite big, you know? This is something like, I don't know, 10 people for a few years or something. By industry scale, maybe not that big. By your scale, maybe not that big. But still a lot of work for a research group to do. And quite high-end work. It can take, you know, literally a few years to become really fluent in using these tools, right? And there isn't as yet sort of an open-source, collaborative movement doing this kind of stuff. Maybe it's the time to start. Maybe in five years, ten years. I don't know. If you want a challenge, there's a challenge. But it's really quite hard, right? It's not something that one can quite put forward credibly as a solution for all software and hardware development, right?

So that leads me to, like, an intermediate point. That should have been a four there. What can we do which improves the quality of our systems, which injects some mathematical rigour, but which involves less work and is kind of easy? And what we can do is take some of these interfaces and be precise about what they are. And initially, we have to do that after the fact.
We have to reverse-engineer good specs of existing stuff. But then we can use the same techniques to make things which are much cleaner and better-tested and easier to test in the future. That's the idea. So my colleagues and I have been doing this kind of stuff for the last quite a few years and I just want to give you two little vignettes, just to sort of show you how it goes. So I can't give you much detail. And this is, obviously, joint work with a whole bunch of very smart and good people, some of whom I name here. So this is not me, this is a whole community effort. So I'm going to talk for a little bit about the behaviour of multiprocessors, so at that hardware interface. And I'm going to talk a tiny little bit about the behaviour of the TCP and sockets API, right? Network protocols. So - and there'll be some similarities and some differences between these two things.

So, multiprocessors. You probably want your computers to go fast. Most people do. And it's an obvious idea to glom together - because processors don't go that fast, let's glom together a whole bunch of processors and make them all talk to the same memory. This is not a new idea. It goes back at least to 1962, when the Burroughs D825 had, I think, two processors talking to a single shared memory. It had outstanding features including truly modular hardware with parallel processing throughout, and - some things do not change very much - the complement of compiling languages was to be expanded, right? 1962, so that'll be 52 years ago. Deary me.

Okay. Now, how do these things behave? So let me show you some code, right? It's a hacker conference. There should be code. Let me show you two example programs, and these will both be programs with two threads and they will both be acting on two shared variables, X and Y. And in the initial state, X and Y are both zero. So, the first program. On this thread we write X and then we read Y, and over here we write Y and then we read X, right? Now, these are operating, you know, in an interleaving concurrency, you might think, so there's no synchronisation between those two threads, so we might interleave them. We might run all of thread 0 before all of thread 1 or all of thread 1 before all of thread 0, or they might be mixed in like that or like that or like that or like that. There are six possible ways of interleaving two lists of two things. And in all of those ways you never see, in the same execution, that this reads from the initial state AND this reads from the initial state. You never see that. So you might expect and you might assume in your code that these are the possible outcomes. So what happens if you were to actually run that code on my laptop in a particular test harness? Well, okay, you see occasionally they're quite well synchronised and they both read the other guy's write. Sorry, big call. They both read the other guy's write. And quite often one thread comes completely before the other. But sometimes, sometimes they both see 0. Huh. Rats. Well, that's interesting.

Let me show you another program. So now, thread 0 writes some stuff. Maybe it writes a big bunch of data, maybe multi-word data. And then it writes - let's say that's a flag announcing to some other thread that that data is ready to go now. And the other thread busy-waits reading that value until it gets 1, until it sees the flag. And then it reads the data. So you might want, as a programmer, you might want a guarantee that this read will always see the data that you have initialised. Hey?
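For concreteness, the shapes of those two programs, transcribed into OCaml 5 Domains, look roughly as below. This is purely illustrative: the experiments described here are native litmus tests run on real x86, ARM and POWER hardware, not this OCaml transcription, and x, y, data and flag are deliberately left as plain, racy refs to match the programs as just described (real OCaml code would use Atomic.t for the flag).

```ocaml
(* Illustrative transcription of the two example programs into OCaml 5
   Domains; the actual experiments are native litmus tests on real hardware. *)

(* Program 1, "store buffering": write x then read y || write y then read x.
   Under pure interleaving you never get both reads returning 0. *)
let store_buffering () =
  let x = ref 0 and y = ref 0 in
  let t0 = Domain.spawn (fun () -> x := 1; !y) in
  let t1 = Domain.spawn (fun () -> y := 1; !x) in
  (Domain.join t0, Domain.join t1)

(* Program 2, "message passing": write the data, then raise a flag; the
   other thread busy-waits on the flag and then reads the data.  The
   question is whether that final read is guaranteed to see 42. *)
let message_passing () =
  let data = ref 0 and flag = ref 0 in
  let writer = Domain.spawn (fun () -> data := 42; flag := 1) in
  let reader =
    Domain.spawn (fun () ->
        while !flag = 0 do Domain.cpu_relax () done;
        !data)
  in
  Domain.join writer;
  Domain.join reader

let () =
  let r0, r1 = store_buffering () in
  Printf.printf "SB: r0=%d r1=%d  MP: data seen = %d\n"
    r0 r1 (message_passing ())
```

A harness runs shapes like these many times and tallies the final values; that is roughly how the observations just described, and those discussed next, are obtained.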
That would be like a message-passing kind of an idiom. So what happens if you run that? Well, on the x86 processors that we've tested, you always see that value. Good. This is a decent coding idiom on x86. All right. On ARM and IBM POWER processors, however, you see that sometimes the read just ignores the value written and you see the initial state. So this is not a good message-passing idiom, right?

So what's going on? Well, these behaviours - they might be surprising, eh? And when you see surprising behaviour in hardware, you have two choices. Either it's a bug in the hardware - and we have found a number of bugs in hardware. It's always very rewarding. Or it's a bug in your expectation. Or it's a bug in your test harness. Right? So what you do is you walk along and you walk up to your friendly processor architect in IBM or ARM or x86 land - and we have worked with all of these people. And you ask them, "Is this a bug?" And a processor architect is a person who has the authority to decide whether some behaviour is intended or is a bug. laughter That's what they do, essentially. And for these, they say, "Oh, we meant it to be like that". Right? No question. "We meant it to be like that. That's perfectly proper. Because you and everybody else wanted their computers to go fast, we as processor designers have introduced all manner of sophisticated optimisations which, if you were running sequential code, you would never notice, but if you start running fancy concurrent code like this - fancy concurrent code that isn't just trivially locked - you can notice." So this is intentional behaviour. And if you want to write that code, like a mutual exclusion algorithm or a message-passing algorithm or something, then you need to insert special instructions, memory barrier instructions.

And so you go away and you say, "What do they do?" And you look it up in the vendor documentation. And you get stuff like this. I'll read it out: (Reads quickly) "For any applicable pair a,b, the memory barrier ensures that a will be performed with respect to any processor or mechanism, to the extent required by the associated memory coherence required attributes, before b is performed with respect to that processor or mechanism. A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created. B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B." applause

Hands up if you understand what that means? Hands up if you think that if you thought about it and read the rest of the book, you would understand, or you could understand, what that means? Hands up the people who think that we're doomed forever? laughter So I'm sorry to the first few groups, but the last ones are right. laughter applause This looks like it's really intricate and really carefully thought-out stuff. And we thought that for a while, and we spent literally years trying to make precise mathematical models that give precise meaning to these words, but actually it can't be done. So what do you have to do in that circumstance? You have to go away and you have to invent some kind of a model. And there's a - this is a really fundamental problem: on the one hand, we develop our software by this trial and error process, but on the other hand, our reference points are specifications like this.
We have these loose specifications, supposedly, that have to cover all manner of behaviour of many generations of processors that might not all behave the same way, written just in text. And we tell people they should write "to the spec", but the only way they have of developing their code is to run it - to run particular executions on the particular hardware implementations that they have, whose relationship to the spec is very elusive. We can't use those thick books that you get, or those quite weighty PDFs that you get these days from the processor vendors, to test programs. You can't feed a program into that thick book and get output out. And you can't test processor implementations against them and you can't prove anything and you can't even communicate between human beings, right? So as specifications, these things really don't exist.

So what can we do? Well, the best we can do at this point in time is a bit of empirical science, right? So we can invent some model off the top of our heads, trying to describe just the programmer-visible behaviour, not the internal structure. And we can make a tool. Because that's not prose now. Now it's stuff - it's real mathematics - and we can turn that into code and execute it. We can make tools that take programs - small programs - and tell us the set of all behaviours allowed by that model. And then we can compare those sets of behaviours against the experimental observations. And you might have to fix the model and you might find hardware bugs. And then the model also has to make sense to the architects, so you can discuss with the architects what they intend and what their intuition is, and then you can also prove some other facts about it to give a bit more assurance, and then, because you probably got this wrong the first couple of times, you can go back around. And this results, or has resulted, in models which are not guaranteed to be correct, but they are effectively the de facto standard for understanding the concurrency behaviour of these things, you know? And various people use them.

And just to give you a tiny snippet - I'll have to speed up a bit - a tiny snippet of what the model looks like: it's basically just an abstract machine. It's got some stuff for the threads that handles the programmer-visible speculative execution, and some stuff for a storage subsystem that handles the fact that in these architectures, memory writes can be propagated to different threads at different times, right? And that model has a state which is just a collection of some sets and lists and partial orders and a few other things which I won't talk about. And it has some transitions, right? In any state, there might be several possible transitions. This is just a sample. I'm not going to explain all of this. But when you want to propagate a write to another thread, there have to be some preconditions that you have to satisfy. And then there is an action. It's not amazingly complicated, really. So this is text, but it's very precisely written text, and it has the great advantage that you can go through these bullet points with your friendly architect and say, "Is this what you mean?" And they can think about it and say yes or no. But for the actual definition, that's in mathematics, but it's mathematics that's quite close to code. I mean, it's terribly close to pure functional code without side effects. And just to give you an idea, this is some of it, and those are three of those conditions in the real, honest, true version.
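To make "a model you can actually execute" concrete, here is a deliberately naive sketch in the same spirit: a sequentially consistent abstract machine for tiny straight-line programs of stores and loads, explored exhaustively so that it reports the full set of outcomes it allows. Everything in it is invented for illustration; the real models additionally describe the speculation and per-thread write propagation just mentioned, and they are generated from the mathematical definitions rather than hand-written like this.

```ocaml
(* A toy executable model, for illustration only: a sequentially consistent
   abstract machine for straight-line programs of stores and loads, explored
   exhaustively.  The real models also handle speculation and out-of-order
   write propagation; this sketch just shows the style: a state, a set of
   transitions, and a search that yields the set of allowed outcomes. *)

type instr = Store of string * int | Load of string

type thread = { prog : instr list; reads : int list }

type state = (string * int) list * thread list (* shared memory, threads *)

let read mem x = try List.assoc x mem with Not_found -> 0

(* One transition of the machine: pick any thread, run its next instruction. *)
let step ((mem, threads) : state) : state list =
  List.concat
    (List.mapi
       (fun i t ->
         let set_thread t' =
           List.mapi (fun j tj -> if i = j then t' else tj) threads
         in
         match t.prog with
         | [] -> []
         | Store (x, v) :: rest ->
             [ ((x, v) :: mem, set_thread { t with prog = rest }) ]
         | Load x :: rest ->
             [ (mem, set_thread { prog = rest; reads = t.reads @ [ read mem x ] }) ])
       threads)

(* Explore every interleaving; collect the per-thread load results. *)
let rec outcomes (s : state) : int list list list =
  match step s with
  | [] -> [ List.map (fun t -> t.reads) (snd s) ]
  | next -> List.concat_map outcomes next

let () =
  (* The store-buffering example: x:=1; read y  ||  y:=1; read x *)
  let sb : state =
    ( [],
      [ { prog = [ Store ("x", 1); Load "y" ]; reads = [] };
        { prog = [ Store ("y", 1); Load "x" ]; reads = [] } ] )
  in
  List.iter
    (function
      | [ [ r0 ]; [ r1 ] ] -> Printf.printf "allowed: r0=%d r1=%d\n" r0 r1
      | _ -> ())
    (List.sort_uniq compare (outcomes sb))
```

Run on the store-buffering example from earlier, it prints r0=0 r1=1, r0=1 r1=0 and r0=1 r1=1, and never r0=0 r1=0 - which is exactly why the 0/0 outcome observed on real hardware forces a weaker model than plain interleaving.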
Then we can take this mathematics and, because it's in a particular style, we can compile it into OCaml, and then into OCaml bytecode, and then we can run it for batch jobs. But then you can compile that OCaml bytecode to JavaScript and glue on a user interface and stick it into a web browser, and then you have a web interface that lets people explore this model. And that's also a necessary part of validating that it makes sense.

Okay. This is not rocket science. You've missed the talk for rocket science, I'm sorry. All we're doing here is specifying the intended behaviour of a system. Okay? It's not very technical, but it's unusual. And we happen to be doing it in this mathematical language, but you could in fact use almost any language, so long as you understand what you're doing. What you have to understand is that you're writing something which is not an implementation. It is something which, given some observed behaviour - some arbitrary observed behaviour - will be able to decide whether it's allowed, whether you want it to be allowed or not, right? It has to be executable as a test oracle. The key question for getting this to work is to understand how much non-determinism, or loose specification, there is in the system that you're working with, right? So if everything is completely deterministic, you can write a reference implementation in the cleanest language of your choice and just compare the output of that and the output of the real thing, right? But if there's more non-determinism, then you can't do that.

I'm going to have to abbreviate this a little tiny bit. So for the TCP network protocols, there is more non-determinism. You know what TCP is, yes? A protocol that gives you sort of reliable connections, assuming that the world is made of good people, right? You look at the standards for TCP, and they're the same kind of wibbly text from the 1980s, and you look at the implementations and they are a ghastly mess. And you try and understand the relationship between the two of them and you can't, because the standard, again, is not the definition. It doesn't define in all circumstances what behaviour is allowed and what isn't. So again, we had to make up a specification, in the first instance just of the host - a single endpoint - observing its sockets API calls and returns and its behaviour on the wire, right? And a few internal debug events. And when you've decided what to observe, then an observation is just a trace - it's just a list of those events. And you have to be able to ask your spec: "Is that list admissible or not?" But there's a lot of non-determinism. Variation between the implementations, variation within the implementations. And that's internal. It's not announced in the API or on the wire, maybe until much later, when the implementation picks a new window size or something. You can't tell until quite a lot later in the trace what it's picked. And that makes the job of checking whether a trace is admissible by the spec much harder than it has to be. Right? And if you had designed the TCP protocol and those implementations to be testable in this very discriminating way, back in the 1980s when you were designing the TCP protocol, it would be much easier. And for new protocols, one should make sure that you're doing this in this particular way. I don't think I want to talk about that slide. That's just one of the rules of that specification. But the key fact about that spec is that we handle the real protocols for arbitrary inputs.
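As a cartoon of that specification-as-test-oracle idea, here is a tiny sketch. The event type, the finite set of candidate window sizes and the rules are all invented for illustration; the real TCP specification is vastly larger, with hundreds of rules, and is checked with a symbolic evaluator inside a theorem prover rather than by finite enumeration like this. The shape is the point: the spec is a nondeterministic transition relation over observable events, and a captured trace is admissible if some resolution of the internal choices - here, the window size the host silently picked - allows every event in order.

```ocaml
(* Illustration only: a toy "specification as test oracle" with internal
   nondeterminism.  The event type, candidate window sizes and rules are all
   invented; the real TCP specification is far larger and is checked
   symbolically rather than by this finite enumeration. *)

type event =
  | Connect            (* observed socket-API call *)
  | Segment_out of int (* observed segment of this many bytes on the wire *)

(* Internal state: whether we are connected, and a window size the host
   picked internally and never announces directly. *)
type host = { connected : bool; window : int }

let candidate_windows = [ 512; 1024; 4096 ] (* the unobservable choice *)

(* The transition relation: all possible successor states for one event.
   Nondeterminism shows up as a list with more than one element. *)
let step (h : host) (e : event) : host list =
  match h, e with
  | { connected = false; _ }, Connect ->
      List.map (fun w -> { connected = true; window = w }) candidate_windows
  | { connected = true; window = w }, Segment_out n ->
      if 0 < n && n <= w then [ h ] else []
  | _ -> []

(* A trace is admissible iff SOME sequence of internal choices accepts
   every observed event in order. *)
let admissible (trace : event list) : bool =
  let initial = { connected = false; window = 0 } in
  [] <> List.fold_left
          (fun states e -> List.concat_map (fun s -> step s e) states)
          [ initial ] trace

let () =
  assert (admissible [ Connect; Segment_out 800; Segment_out 1000 ]);
  assert (not (admissible [ Segment_out 800 ]));           (* send before connect *)
  assert (not (admissible [ Connect; Segment_out 9000 ])); (* bigger than any window *)
  print_endline "example traces checked"
```

With an oracle like this you can replay traces captured from real implementations, or drive random traffic at an endpoint and check every observation as you go - which is the "better fuzzing" point made below.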
But it's structured not just for this executability as a test oracle; it's structured for clarity, not performance, which is scarcely ever what anyone ever does, right? So it's a whole new kind of thing. And the testing is very discriminating. So we found all manner of amusing and bizarre bugs, which I think I don't have time to talk about.

Okay. So I've described three kinds of things that we might do. The first thing - for goodness sake - I mean, it's just a no-brainer. Just do it already. Everybody. All of you. All of you. Right? This middle zone is a very interesting zone as far as I'm concerned, right? It's a place where we could get a very good benefit for not that much effort, right? We can do it, if necessary, post hoc. We can also do it, even better, at design time. We have to do this in a way that makes our specifications executable as test oracles, and another thing that that enables is completely random testing. When you've got a test oracle, you can just feed in random goop. This is fuzzing, but better fuzzing - feed in random goop and check at every point whether the behaviour that you're getting is allowable or not. And then they're written for clarity, not for performance, and that means you can see what you're doing, right? You can see the complexity. If you go ahead and you add some feature to your protocol or your programming language or whatever and you're working just with text specifications, then you can't see the interactions. You just chuck in a couple of paragraphs and you think for a few minutes, right? And if you're lucky, you make an implementation. But if you're writing a spec that has to cover all the cases like this, the act of doing that encourages you to think, or helps you think, about the excess complexity. And you might think: that's too complex, I'm being silly, I'll make it simpler. And it has to be usable for proof. So this, I think, also is pretty much a no-brainer. And it doesn't require great technical, you know, mathematical skill either, right? Lots of people can do this.

And then there's option 3, the best option: full-on formal verification of the key components. And that is now also in reach. You can imagine secure systems made using actual verified compilers and verified hypervisors with verified TLS stacks, and you can imagine making things out of those, or making those and then making things out of those. And maybe one should be thinking about that. So, maybe not appropriate for everything. If you were writing Word, you would not wish to do these things. Probably, you're not. But for this key infrastructure that we really depend on - that we trust even though it's not trustworthy - we have to do this.

Is this a new dawn? I wonder if there's anyone standing next to a light switch that can dim them for me. I didn't ask them beforehand, so... If you think back over the last 70-odd years - we started building computers in around 1940 - it's been pretty dark from the point of view of rigorous, solid, reliable engineering, stuff that is actually trustworthy. Pretty dark, I would say. Maybe, just maybe, there's a tiny smidgen of light coming through the doors. And we start to have a little bit of a clue, and we start to get reusable models of processor architectures and programming languages and network protocols and what have you. It's just the beginnings of the analogue of that thermodynamics and materials science and quality control management and what have you that we have from mechanical engineering. And we've got no choice.
If we don't, the next 75 years are going to be a lot worse. You can just imagine, right? So I went - as I'm sure many of you did - to this great talk on some Apple boot loader exploit yesterday, which was relying on some feature that had been introduced in the early '80s and was still present. And you can imagine, 100 years from now - you can imagine, as long as human civilisation endures, the x86 architecture and the socket API and all of this stuff will be embedded in some monumental ghastly stack of virtualisation layers forever. laughter applause

So I'd like to thank especially all of my colleagues that have been working with me, or not with me, in these directions. I'd like to thank our generous funders who support this stuff. I'd like to thank you for your attention, and I exhort you: sort it out. applause

Herald: Thank you very much, Peter, for those interesting insights into our programming. We have now time for a Q & A, so those people who have to leave the room, please do so quietly and as fast as possible, so we can go on and hear what the questions are. So we start with microphone 4, please.

Mic 4: Hello. Thanks for the talk. My question is: if you've got an oracle of the software behaviour which can translate every possible input to a correct output, how can this be less complex and error-prone than the reference implementation?

Peter: Good question. So the first point is that this oracle doesn't have to produce the outputs. It only has to check that the inputs and the outputs match up. And then the second point is that in general it might have to have much of the same complexity, but by writing it to be clear, as opposed to fast, you may adopt a completely different structure, right? So the structure of our TCP spec is organised by the behaviour that you see for particular transitions. The structure of a real implementation has fast-path code and lots of complicated intertwined branching and all manner of excess complexity - of implementation complexity, if you will.

Mic 4: Thanks.

Herald: So microphone 3, please?

Mic 3: I wanted to ask you what you thought about programming by manipulating the abstract syntax tree directly, so as to not allow incompilable programs because of some, like, missing semicolon or something like that. What's your thoughts on that?

Peter: So that's - in the grand scheme of things, I think that's not a very big deal, right? So there's - people have worked on structured editors for lordy knows how many years. It's not a big deal because it's very easy for a compiler to detect that kind of thing. And even more, it's very easy for a compiler of a sensibly designed language to detect type errors, even for a very sophisticated type system. So I don't think that - that's not - I mean, some people might find it helpful, but I don't think it's getting to the heart of the matter.

Mic 3: Thanks.

Herald: So we've got some questions from our signal angels from the IRC.

Signal angel: Yes. There's one question. It's about the repository for Isabelle and HOL proofs. It should be on SourceForge, and you said there are no open-source repositories. What about them?

Peter: I'm not quite sure what the question is, really. So there's a repository of Isabelle formal proofs - the Archive of Formal Proofs, it's called. I didn't really mean to say there are no open-source repositories, and in fact I don't know under what conditions most of those proofs are licensed. They probably are open.
But there isn't an open-source community building these things, right? It's still pretty much an academic pursuit.

Herald: Microphone 2, please.

Mic 2: Hello. Thanks again for your talk. I just want to add something that you didn't address, and that's that we actually need more good studies in software engineering. We often hear a lot of experts and also very well-known computer scientists say things like, "Well, we just need to do functional programming and OOP is bad" and stuff like that, which I think there's a lot of truth to, but we actually need studies where we test these kinds of claims that we make. Because when you look at other fields that it also gets compared to, like medicine, if somebody comes around and is well-known and says things like "homeopathy works", we don't believe them. We do trials, we do good trials. And there's a lot of myths out there, or not well-tested things, like "hire the best programmers, they are 100 times more productive", "use static type systems", and stuff like that. And we need to verify that this is true, that this helps. And it's...

Peter: Okay. So in the grand scheme of things, arguably you're right. In the grandest scheme of things, computer science is actually a branch of psychology, or really sociology. We are trying to understand how large groups of people can combine to make things which are insanely complicated. Now, would it be good if we had, you know, solid evidence that programming in this language was better than programming in that language? Well, yes, but it's staggeringly difficult and expensive to do honest experiments of that nature, and they're scarcely ever done, right? You might do little experiments on little groups of students, but it's really difficult. And then some of these things which I'm saying, when one is familiar with the different possible options, are just blindingly obvious. I mean, there are reasons why these people are using OCaml for their system programming, right? It's not, you know, a "homeopathy is right" versus "homeopathy is wrong" kind of argument that's being put forward.

Herald: Question from microphone 5, please.

Mic 5: So were you using ECC memory for your testing - up here.

Peter: When you say up here, there's a bit of ambiguity.

Mic 5: Here.

Peter: Thank you. Say again?

Mic 5: So were you using ECC memory for the testing you showed - the multithreaded results due to memory barriers and memory reorderings?

Peter: Well...

Mic 5: Okay, but even if you were, or you were not, the point I want to make is that you can expect about one bit flip each minute in a modern computer system, which may completely change what your software is doing. So perhaps we also have to look at ways we can keep working if something goes wrong at the very low level, so that we kind of have a check against our specification at a higher level of our software. So it is valuable to do good engineering on the low levels, but still I think we will not solve the problems of computation and computer engineering just by proving things in the mathematical domain, because computers are physical entities. They have errors and so on, and we really have to deal with them as well.

Peter: So it's certainly true that there are such random errors. In fact, I would argue that you have to be able to model the physics well enough and you have to be able to model the statistics of those errors well enough - so that's a different kind of mathematics.
And it's certainly valuable and necessary. But if you look at the statistic you just quoted - is that the main cause of why systems go wrong? Except possibly for space hardware, no. So important, yes. The most important thing that we should be paying attention to? I don't really think so.

Herald: Microphone 6, please?

Mic 6: Hi. I really think that what you're proposing - to specify and verify more key components - would be a valuable addition, but how do you make sure that your specification actually matches the behaviour that you want to have?

Peter: So as I described, as partial validation of those specifications we do a lot of testing against the implementations, against a range of existing implementations. That's one source of validation. Another source of validation is that you talk to the architects and the designers. You want the internal structure to match their intent. You want it to be comprehensible to them. Another kind of validation that we do is prove properties about them. So we proved that you can correctly compile from C11 concurrency - a mathematical model of that - to IBM POWER concurrency. And that kind of proof gives you a bit more assurance in both. So none of this is giving you a total guarantee. We certainly don't claim a total guarantee. But it gives you pretty good levels of assurance by normal standards.

Herald: Mic 4?

Mic 4: Yes. Thanks again. You proposed random tests, and in my experience it's very annoying if you have tests where the outcome is sometimes a failure, and you want to have reproducible tests to fix a bug. So how do we bring these aspects together - testing more variety in the data, and having it reproducible?

Peter: Well, as I was mentioning, for the TCP thing, the problem with reproducibility is exactly this internal non-determinism, right? The system makes this scheduling choice or what have you and you can't see what it is, or the checker can't see what it is. So one way to do that is to really design the protocols in a different way, to expose that non-determinism. The other fact about this kind of general-purpose specification-as-test-oracle idea is that in some sense, it doesn't have to be reproducible. Right? The specification is giving a yes-no answer for an arbitrary test. And that means that you can use arbitrary tests. Figuring out the root causes of the differences afterwards may be tricky, but you can use them for assurance.

Herald: So we now have time for two more questions. Microphone 1, please?

Mic 1: Hiya. Thanks for a great talk. So what you've described seems to be a middle ground between full-on formal verification and more practical testing - like, in between. So where do you see formal verification going in the future? Will it converge towards the middle, or will it still be there just to verify something that's more well-specified?

Peter: So the progress of sort of the whole subject of formal verification in general, if you look at that over the last 10 years or so, it's been really amazing compared with what we could do in the 80s and 90s and early 2000s, right? So the scope of things, the scale of things for which it is becoming feasible to do verification, is getting bigger. And I think that will continue. You know, I don't know when you might see a completely verified Tor client, let's say, but that's not inconceivable now. And 20 years ago, that would have been completely inconceivable.
So that is expanding, and at the same time, I hope we see more of this interface specification - this test-oracle specification - and when you're doing formal verification of a piece of a system, you have to have those interfaces already defined, all right? So these two fit very well together.

Herald: So the last question from microphone 2, please.

Mic 2: Hi. You mentioned that you often find bugs in hardware. Is there any effort to verify the transistors on chips themselves as well?

Peter: So there's a whole world of hardware verification. That wasn't my topic today. And most of the big processor vendors have teams working on this. Unsurprisingly, if you remember your history, many of them have teams focusing on the floating-point behaviour of their processors. And they're also doing, by the standards of ten years ago, pretty spectacularly well. So there are companies that have the whole of some execution unit formally verified. There's been a lot of work over the years on verification of cache protocols and suchlike. Right? There's a lot of work on not verifying the hardware as a whole, but verifying that, you know, the RTL-level description of the hardware matches some lower-level description. So there is a lot of that going on.

Herald: Thank you very much again for this great talk. Give him another applause. Thank you, Peter.

Peter: Thank you. applause

subtitles created by c3subtitles.de. Join, and help us!