WEBVTT 00:00:09.519 --> 00:00:14.479 applause 00:00:14.479 --> 00:00:16.930 Peter Sewell: Thank you for that introduction and to the organisers for 00:00:16.930 --> 00:00:23.459 this opportunity to speak to you guys. So my name is Peter Sewell. I'm an academic 00:00:23.459 --> 00:00:30.609 computer scientist and as such, I bear a heavy burden of guilt for what we have to 00:00:30.609 --> 00:00:38.040 work with. Shared guilt, to be fair, but guilt all the same. In many, many years of 00:00:38.040 --> 00:00:44.670 higher education, I know two things about computers. Maybe you've figured them out 00:00:44.670 --> 00:00:55.000 slightly faster. The first thing, there's an awful lot of them about. Shocking. And 00:00:55.000 --> 00:01:03.300 the second is that they go wrong. They go wrong a lot. Really a lot. A lot. A lot! 00:01:03.300 --> 00:01:09.009 And they go wrong in many interestingly different ways. So sometimes they go wrong 00:01:09.009 --> 00:01:17.850 in spectacular ways, right? With fireworks. And here, we see... 00:01:17.850 --> 00:01:24.030 laughterapplause 00:01:24.030 --> 00:01:31.260 Here we see half a billion dollars worth of sparkle. You don't get that every day. 00:01:31.260 --> 00:01:34.970 And so what was going on? So as I understand it, what was going on was that 00:01:34.970 --> 00:01:40.600 the - this was the first flight of the Ariane 5 launcher and it reused some of 00:01:40.600 --> 00:01:45.851 the guidance software from Ariane 4. That's fine. Good, tested code. Should 00:01:45.851 --> 00:01:53.030 reuse it. But the launch profile for Ariane 5 went sideways a bit more - on 00:01:53.030 --> 00:01:58.830 purpose - than the Ariane 5 launch. And as a result, one of the sideways values in 00:01:58.830 --> 00:02:04.740 the guidance control software overflowed. Bad. And hence, the guidance software 00:02:04.740 --> 00:02:08.449 decided that something bad was going wrong and in order to not land rockets on 00:02:08.449 --> 00:02:15.841 people's heads, it kindly blew itself up. Okay. So that's a spectacular. Sometimes 00:02:15.841 --> 00:02:21.200 they go wrong in insidious ways. And you people probably know much more about that 00:02:21.200 --> 00:02:29.370 than me. Very insidious ways. Exploited by bad people to our detriment. And they go 00:02:29.370 --> 00:02:36.990 wrong in insidious ways, exploited by our kindly governments for our protection. 00:02:36.990 --> 00:02:44.959 Right? And these - there are many, many, many different causes of these things but 00:02:44.959 --> 00:02:49.420 many of these causes are completely trivial. You know, you have one line of 00:02:49.420 --> 00:02:56.150 code in the wrong place - something like that, right? So that's very, very bad and 00:02:56.150 --> 00:03:01.040 you're all conscious of how bad that is. But from my point of view, this is as 00:03:01.040 --> 00:03:06.959 nothing with the real disaster that we have been facing and continue to face. 00:03:06.959 --> 00:03:14.950 Right? The real disaster is the enormous waste of human life and energy and effort 00:03:14.950 --> 00:03:21.360 and emotion dealing with these crap machines that we've built. Right? And it's 00:03:21.360 --> 00:03:25.489 hard to quantify that. I did a little experiment a couple of months ago. I 00:03:25.489 --> 00:03:34.030 Googled up "Android problem" and "Windows problem" and got about 300 million hits 00:03:34.030 --> 00:03:43.840 for each, right? Somehow indicative of the amount of wrongness. So this should you 00:03:43.840 --> 00:03:49.790 seem kind of familiar to you. If you think back - think back to when you were young. 00:03:49.790 --> 00:03:55.700 Maybe back in the 19th century when you were young, and what were you doing then 00:03:55.700 --> 00:04:00.940 as a community of hackers and makers and builders? Well, this was the full-on 00:04:00.940 --> 00:04:07.130 industrial revolution. You were building stuff. You were building beautiful bridges 00:04:07.130 --> 00:04:14.739 of cast-iron and stone with numerous arches and pillars. And every so often 00:04:14.739 --> 00:04:22.300 they would fall down. I believe someone miscalculated the wind loading. And we 00:04:22.300 --> 00:04:30.630 would build magical steam engines to whisk us away. And every so often, they would 00:04:30.630 --> 00:04:35.770 blow up. And that kind of thing, as you may have noticed in the last hundred years 00:04:35.770 --> 00:04:41.810 - sorry - that kind of thing doesn't happen very often any more, right? We have 00:04:41.810 --> 00:04:46.990 pretty good civil and mechanical engineering. And what does it mean to have 00:04:46.990 --> 00:04:52.620 good civil and mechanical engineering? It means that we know enough thermodynamics 00:04:52.620 --> 00:04:56.750 and materials science and quality control management and all that kind of thing to 00:04:56.750 --> 00:05:01.830 model our designs before we build them and predict how they're going to behave pretty 00:05:01.830 --> 00:05:10.750 well and with pretty high confidence, right? I see an optimisation in this talk. 00:05:10.750 --> 00:05:15.340 For computer science, for computing systems we can predict with 100 per cent 00:05:15.340 --> 00:05:20.450 certainty that they will not work. Ah, time for lunch! 00:05:20.450 --> 00:05:26.509 applause 00:05:26.509 --> 00:05:33.330 But it's not a very satisfying answer, really. So why is it so hard? Let me 00:05:33.330 --> 00:05:38.500 discuss a couple of the reasons why computing is tricky. In some sense, 00:05:38.500 --> 00:05:45.430 trickier than steam engines. The first reason is that there is just too much 00:05:45.430 --> 00:05:50.360 code. Right? So this is one of the more scary graphs I've found on the internet 00:05:50.360 --> 00:05:55.470 recently. This is a measure of code in hundreds of millions of lines. Each of 00:05:55.470 --> 00:06:00.250 these blocks I think is 10 million lines of code. And in the little letters - you 00:06:00.250 --> 00:06:04.860 might not be able to see we have Android and the Large Hadron Collider and Windows 00:06:04.860 --> 00:06:12.820 Vista and Facebook, US Army Future Combat System, Mac OS and software in a high-end 00:06:12.820 --> 00:06:15.010 car. Right? 00:06:15.010 --> 00:06:23.230 laughterapplause 00:06:23.230 --> 00:06:25.320 I'm flying home. 00:06:25.320 --> 00:06:27.810 laughter 00:06:27.810 --> 00:06:33.520 So we that's a bit of a problem, really. We're never going to get that right. Then 00:06:33.520 --> 00:06:37.410 there's too much legacy complexity there, right? Really too much. So as 00:06:37.410 --> 00:06:41.840 right-minded, pure-minded people you might think that software and hardware should be 00:06:41.840 --> 00:06:46.490 architected beautifully with regular structure a bit like a Mies van der Rohe 00:06:46.490 --> 00:06:53.660 skyscraper or something. Clean. Okay. You know that's not realistic, right? You 00:06:53.660 --> 00:06:59.470 might expect a bit of baroqueness. You might expect maybe something more like 00:06:59.470 --> 00:07:02.680 this. You know, it's a bit twiddly but it still hangs together. It's still got 00:07:02.680 --> 00:07:08.160 structure. It still works. But then you stop and you think and you realise that 00:07:08.160 --> 00:07:15.770 software and hardware is built by very smart people and usually in big groups and 00:07:15.770 --> 00:07:20.300 subject to a whole assortment of rather intense commercial pressure and related 00:07:20.300 --> 00:07:25.210 open-source pressure and using the best components and tools that they know and 00:07:25.210 --> 00:07:27.169 they like. 00:07:27.169 --> 00:07:34.520 laughterapplause 00:07:34.520 --> 00:07:38.990 Someone from the room: (Indistinct) You forgot management! 00:07:38.990 --> 00:07:42.480 And the best management that human beings can arrange. 00:07:42.480 --> 00:07:49.680 laughterapplause 00:07:49.680 --> 00:07:54.910 So we end up with something spectacularly ingenious. 00:07:54.910 --> 00:08:05.490 laughterapplause 00:08:05.490 --> 00:08:09.730 You can see here, if you look closely, you can see all the parts. 00:08:09.730 --> 00:08:11.000 laughter 00:08:11.000 --> 00:08:15.770 This was just a found picture from the internet. But you can see there's a bunch 00:08:15.770 --> 00:08:19.060 of, you know, transparent sort of silicon-like stuff down here. Let's say 00:08:19.060 --> 00:08:25.079 that's our hardware. And over here, I see a C compiler with all of its phases. 00:08:25.079 --> 00:08:28.599 laughter 00:08:28.599 --> 00:08:36.930 And these bottles, that's the file system, I reckon. And there's a TCP stack in here 00:08:36.930 --> 00:08:43.690 somewhere. And there's a piece that belongs to the NSA, obviously. And up 00:08:43.690 --> 00:08:50.259 here, I think up here - yeah, there's a JavaScript engine and a TLS stack and a 00:08:50.259 --> 00:08:56.050 web browser. And then perched at the top is the software that most people use to 00:08:56.050 --> 00:08:57.910 talk to their banks. 00:08:57.910 --> 00:09:01.489 laughter 00:09:01.489 --> 00:09:09.489 It's just - it's just insane, right? And then we have to look at how we build these 00:09:09.489 --> 00:09:14.089 pieces, right? So we build them fundamentally by a process of trial and 00:09:14.089 --> 00:09:20.649 error, right? Just occasionally we write some specifications in text. And then we 00:09:20.649 --> 00:09:26.029 write some code and maybe we write a few ad hoc texts and then we test and fix 00:09:26.029 --> 00:09:30.610 until the thing is marketable and then we test and fix and extend until the thing is 00:09:30.610 --> 00:09:36.069 no longer marketable and then we keep on using it until we can't anymore. Right? So 00:09:36.069 --> 00:09:40.259 the fundamental process that we're using to develop these systems is trial and 00:09:40.259 --> 00:09:47.930 error by exploring particular execution paths of the code on particular inputs. 00:09:47.930 --> 00:09:54.459 But - you know this, but I want to highlight - there are too many. The number 00:09:54.459 --> 00:09:59.649 of execution paths scales typically at least exponentially with the size of the 00:09:59.649 --> 00:10:04.490 code and the number of states scales exponentially with the size of the data 00:10:04.490 --> 00:10:08.790 that you're dealing with and the number of possible inputs is also shockingly large. 00:10:08.790 --> 00:10:14.920 There is no way that we can do this and we should perhaps stop deluding ourselves 00:10:14.920 --> 00:10:21.129 that there is. And then the most fundamental reason that computers are hard 00:10:21.129 --> 00:10:27.899 is that they are too discrete, right? If you take a bridge or, I don't know, a 00:10:27.899 --> 00:10:36.519 chair or something - where's something slightly bendable? You take a chair and 00:10:36.519 --> 00:10:44.199 you exert a bit more force on it than it's used to - I'm not very strong - and it 00:10:44.199 --> 00:10:51.299 bends a bit. It deforms continuously. If you take a bridge and you tighten up one 00:10:51.299 --> 00:10:55.239 of the bolts a bit too much or you underspecify one of the girders by 10 per 00:10:55.239 --> 00:11:01.019 cent, occasionally you'll reach a point of catastrophic failure but usually it will 00:11:01.019 --> 00:11:04.089 just be a little bit weaker or a little bit stronger or a little bit wibbly. 00:11:04.089 --> 00:11:08.960 Right? But computer systems, you change a bit - sometimes it doesn't make a 00:11:08.960 --> 00:11:13.179 difference but disturbingly often you change a line of code - as in system of 00:11:13.179 --> 00:11:19.600 those bugs that we were talking about - the whole thing becomes broken, right? So 00:11:19.600 --> 00:11:26.929 it's quite different from that traditional engineering. It's really different. Okay. 00:11:26.929 --> 00:11:34.850 So, "A New Dawn", this thing is titled. What can we do? What might we do? So I'm 00:11:34.850 --> 00:11:40.879 going to talk about several different possibilities, all right? One thing, we 00:11:40.879 --> 00:11:46.490 can do more of the stuff which is traditionally called software engineering. 00:11:46.490 --> 00:11:50.910 It's not really engineering in my book, but some of it's good. I mean, it's 00:11:50.910 --> 00:11:55.119 useful, you know? We can do more testing and have more assertions in our code and 00:11:55.119 --> 00:11:58.920 better management maybe and all that kind of stuff. We can do all that - we should 00:11:58.920 --> 00:12:04.339 do all that, but it's not going to make a very big change, all right? It's not 00:12:04.339 --> 00:12:13.989 addressing any of those root causes of the problem. What else could we do? So as an 00:12:13.989 --> 00:12:18.570 academic computer scientist, I've devoted several years of my life to working in the 00:12:18.570 --> 00:12:24.509 area of programming languages and I look at the programming languages used out 00:12:24.509 --> 00:12:36.919 there in the world today. Oh, it's just disgusting! It's shocking. 00:12:36.919 --> 00:12:43.399 applause 00:12:43.399 --> 00:12:50.420 The reasons that languages become popular and built into our shared infrastructure, 00:12:50.420 --> 00:12:54.790 the reasons for that have almost nothing to do with how well those languages are 00:12:54.790 --> 00:13:00.600 designed and in fact whether they are designed at all, right? 00:13:00.600 --> 00:13:04.110 applause 00:13:04.110 --> 00:13:09.040 So I didn't really want to get too much hate mail after this talk so I'm going to 00:13:09.040 --> 00:13:13.999 try and avoid naming particular languages as much as I can, but I encourage you to 00:13:13.999 --> 00:13:20.279 think away as I speak and imagine the examples. No, that was - I didn't want to 00:13:20.279 --> 00:13:23.730 get much hate mail. Sorry. 00:13:23.730 --> 00:13:25.160 laughter 00:13:25.160 --> 00:13:26.579 So... 00:13:26.579 --> 00:13:32.069 laughterapplause 00:13:32.069 --> 00:13:36.540 So at the very least, we could use programming languages based on ideas from 00:13:36.540 --> 00:13:47.389 the 1970s instead of the 1960s. I mean, really, come on. Right? You know, back in 00:13:47.389 --> 00:13:54.120 '67 or '69 or whatever we had ECPL and C and only a couple of years later we had 00:13:54.120 --> 00:14:00.069 languages that guaranteed type and memory safety and enforcement of abstraction 00:14:00.069 --> 00:14:05.089 boundaries and gave us rich mechanisms for programming so we could say what we mean 00:14:05.089 --> 00:14:11.350 and not futz about with 23 million pointers, about 1 per cent of which were 00:14:11.350 --> 00:14:16.629 completely wrong, right? So for any particular - in the context of any 00:14:16.629 --> 00:14:20.459 particular project or any particular existing legacy infrastructure there may 00:14:20.459 --> 00:14:25.789 be very good reasons why one just can't make a switch. Right? But for us 00:14:25.789 --> 00:14:31.860 collectively, there is no excuse. It's just wretched, right? 00:14:31.860 --> 00:14:37.040 applause 00:14:37.040 --> 00:14:40.969 It's completely bonkers. And then the other thing that you see as a programming 00:14:40.969 --> 00:14:49.239 language researcher is that - er, well, to make another analogy, it looks as if 00:14:49.239 --> 00:14:55.529 anyone who was capable of driving a car considered themselves an expert on car 00:14:55.529 --> 00:15:03.559 design, right? There is in fact a rather well-established subject of, you know, 00:15:03.559 --> 00:15:08.239 programming language design and we know how to specify completely precisely, 00:15:08.239 --> 00:15:13.019 mathematically precisely not just the syntax like we had in these BNF grammars 00:15:13.019 --> 00:15:18.160 from the 1960s and still have, but also the behaviour of these things. We know how 00:15:18.160 --> 00:15:22.699 to specify a few operational semantics and the type systems. And we know how to do 00:15:22.699 --> 00:15:27.360 mathematical proofs that our language designs are self-consistent, that they do 00:15:27.360 --> 00:15:35.059 guarantee good properties of arbitrary well-typed programs that you write, right? 00:15:35.059 --> 00:15:38.759 We can do this now. We can do this, if we have to, post hoc. And people are doing 00:15:38.759 --> 00:15:45.669 this for JavaScript and C and god help them, PHP and other such languages. But 00:15:45.669 --> 00:15:50.829 even better, we can do it at design time. And if you do this at design time, you see 00:15:50.829 --> 00:15:55.509 the structure. You have to - the act of writing that specification forces you to 00:15:55.509 --> 00:15:59.939 consider all the cases and all the interactions between features. You can 00:15:59.939 --> 00:16:07.861 still get it wrong, but you get it wrong in more interesting ways. Right? So people 00:16:07.861 --> 00:16:12.119 are starting to do this now. So I think there was a talk on the first day of this 00:16:12.119 --> 00:16:17.429 by Hannes and David who are building system software in maybe not the best sys 00:16:17.429 --> 00:16:21.800 language you could possibly imagine, but something which compared with C is as 00:16:21.800 --> 00:16:27.759 manna from heaven. Anil Madhavapeddy and his colleagues in Cambridge have been 00:16:27.759 --> 00:16:32.759 working hard to build system software in at least moderately modern languages. And 00:16:32.759 --> 00:16:40.780 it works. Sometimes you have to use C but surprisingly often you really don't. Yeah, 00:16:40.780 --> 00:16:47.049 so I teach, as my introducer said, in the University of Cambridge. And I teach 00:16:47.049 --> 00:16:51.269 semantics. And one of my fears is that I will accidentally let someone out in the 00:16:51.269 --> 00:16:54.970 world who will accidentally find themselves over a weekend inventing a 00:16:54.970 --> 00:16:58.149 little scripting language which will accidentally take over the world and 00:16:58.149 --> 00:17:05.359 become popular and I won't have explained to them what they have to do or even that 00:17:05.359 --> 00:17:08.189 there is a subject they have to understand. And if you want to understand 00:17:08.189 --> 00:17:18.740 it, there's lots of stuff you can look at. Okay. Let's go to the uppermost extreme. 00:17:18.740 --> 00:17:23.829 If we want software and hardware that actually works, we should prove it 00:17:23.829 --> 00:17:30.700 correct. Right? So this is something which academics, especially in the domain of 00:17:30.700 --> 00:17:36.830 labelled "formal methods" in the '70s and '80s, they've been pushing this, promoting 00:17:36.830 --> 00:17:44.789 this as an idea and a promise and an exhortation for decades, right? Why don't 00:17:44.789 --> 00:17:50.169 we just prove our programs correct? And for some of those decades it wasn't really 00:17:50.169 --> 00:17:54.570 plausible. I remember back when I was considerably younger, it was a big thing 00:17:54.570 --> 00:17:59.540 to proof that if you took a linked list and reversed it twice, you got back the 00:17:59.540 --> 00:18:08.059 same thing. Right? That was hard, then. But now, well, we can't do this for the 00:18:08.059 --> 00:18:14.419 Linux kernel, let's say, but we can surprising often do this. And I just give 00:18:14.419 --> 00:18:19.590 you a couple of the examples of modern day, state-of-the-art academic 00:18:19.590 --> 00:18:25.720 verification projects. People have verified compilers all the way from C-like 00:18:25.720 --> 00:18:31.769 languages down to assembly or from ML-like languages. You know, higher-order 00:18:31.769 --> 00:18:36.959 functional imperative languages all the way down to binary models of how x86 00:18:36.959 --> 00:18:44.470 behaves. And people have verified LLVM optimisation paths. There's verified 00:18:44.470 --> 00:18:49.880 software fault isolation which I believe goes faster than the original non-verified 00:18:49.880 --> 00:18:56.130 form. Win for them. There's a verified hypervisor from Nectar group in Australia. 00:18:56.130 --> 00:19:00.570 Right? Verified secure hypervisor with proofs of security properties. There's any 00:19:00.570 --> 00:19:06.690 amount of work verifying crypto protocols, sometimes with respect to assumptions - I 00:19:06.690 --> 00:19:12.510 think reasonable assumption on what the underlying mathematics is giving you. So 00:19:12.510 --> 00:19:22.919 we can do this. I maybe have to explain a bit what I mean by "prove". To some 00:19:22.919 --> 00:19:30.990 computery people, "I proved it works" means "I tested it a little bit". 00:19:30.990 --> 00:19:32.510 laughter 00:19:32.510 --> 00:19:34.960 In extremis, "it compiled". 00:19:34.960 --> 00:19:44.120 laughterapplause 00:19:44.120 --> 00:19:50.770 To a program analysis person, you might run some sophisticated tool but written in 00:19:50.770 --> 00:19:54.649 untrusted code and possibly doing an approximate and maybe unsound analysis and 00:19:54.649 --> 00:19:59.720 you might find an absence of certain kinds of bugs. That's tremendously useful; it's 00:19:59.720 --> 00:20:04.640 not what we would really call "proved". I mean, nor would they, to be fair. 00:20:04.640 --> 00:20:08.970 Scientists generally don't use the word because they know they're not doing it, 00:20:08.970 --> 00:20:13.559 right? They're doing controlled experiments, which gives them, you know, 00:20:13.559 --> 00:20:20.070 information for or against some hypothesis. Mathematician write proofs. 00:20:20.070 --> 00:20:25.139 You probably did that when you were young. And those proofs are careful mathematical 00:20:25.139 --> 00:20:30.139 arguments usually written on paper with pencils and their aim is to convince a 00:20:30.139 --> 00:20:33.389 human being that if that mathematician was really nailed up against the wall and 00:20:33.389 --> 00:20:38.980 pushed, they could expand that into a completely detailed proof. But what we 00:20:38.980 --> 00:20:47.750 have in computing, we don't have the rich mathematical structure that these people 00:20:47.750 --> 00:20:51.950 are working with, we have a tremendous amount of ad hoc and stupid detail mixed 00:20:51.950 --> 00:20:57.769 in which a smidgin of rich mathematical structure. So - and we have hundreds of 00:20:57.769 --> 00:21:03.399 millions of lines of code. So this is just not going to cut it. And if we want say 00:21:03.399 --> 00:21:08.440 the word "prove" ever, then the only thing which is legitimate is to do honest 00:21:08.440 --> 00:21:13.230 program proof and that means you have to have some system that machine-checks that 00:21:13.230 --> 00:21:17.540 your proof is actually a proof. And sometime we'll also have machines that 00:21:17.540 --> 00:21:22.250 will sort of help and sort of hinder that process, right? And there's a variety of 00:21:22.250 --> 00:21:27.340 these systems. Coq and HOL4 and Isabelle4 - Isabelle/HOL and what have you that we 00:21:27.340 --> 00:21:36.799 can look up. So using these systems we can prove nontrivial facts about these. Not 00:21:36.799 --> 00:21:41.530 necessarily that they do what you want them to do, but that they meet the precise 00:21:41.530 --> 00:21:48.519 specification that we've written down. We can do that. But it's tricky, right? So 00:21:48.519 --> 00:21:52.930 these projects, these are by academic standards all quite big, you know? This is 00:21:52.930 --> 00:21:58.009 something like, I don't know, 10 people for a few years or something. By industry 00:21:58.009 --> 00:22:03.399 scale, maybe not that big. By your scale, maybe not that big. But still a lot of 00:22:03.399 --> 00:22:08.149 work for a research group to do. And quite high-end work. It can take, you know, 00:22:08.149 --> 00:22:13.730 literally a few years to become really fluent in using these tools, right? And 00:22:13.730 --> 00:22:20.139 there isn't as yet sort of an open-source, collaborative movement doing this kind of 00:22:20.139 --> 00:22:26.710 stuff. Maybe it's the time to start. Maybe in five years, ten years. I don't know. If 00:22:26.710 --> 00:22:35.680 you want a challenge, there's a challenge. But it's really quite hard, right? It's 00:22:35.680 --> 00:22:40.830 not something that one can quite put forward credibly as a solution for all 00:22:40.830 --> 00:22:46.529 software and hardware development, right? So that leads me to, like an intermediate 00:22:46.529 --> 00:22:53.399 point. That should have been a four there. What can we do which improves the quality 00:22:53.399 --> 00:22:58.340 of our system, which injects some mathematical rigour but which involves 00:22:58.340 --> 00:23:06.370 less work and is kind of easy? And what we can do is take some of these interfaces 00:23:06.370 --> 00:23:11.679 and be precise about what they are. And initially, we have to do that after the 00:23:11.679 --> 00:23:16.941 fact. We have to reverse-engineer good specs of existing stuff. But then we can 00:23:16.941 --> 00:23:23.470 use the same techniques to make much cleaner and better-tested and things which 00:23:23.470 --> 00:23:30.670 are easier to test in the future. That's the idea. So my colleagues and I have been 00:23:30.670 --> 00:23:35.360 doing this kind of stuff for the last quite a few years and I just want to give 00:23:35.360 --> 00:23:41.570 you just a - two little vignettes just to sort of show you how it goes. So I can't 00:23:41.570 --> 00:23:46.759 give you much detail. And this is, obviously, joint work with a whole bunch 00:23:46.759 --> 00:23:55.399 of very smart and good people, some of which I name here. So this is not me, this 00:23:55.399 --> 00:24:00.840 is a whole community effort. So I'm going to talk for a little bit about the 00:24:00.840 --> 00:24:05.799 behaviour of multiprocessors, so at that hardware interface. And I'm going to talk 00:24:05.799 --> 00:24:11.090 a tiny little bit about the behaviour of the TCP and socket API, right? Network 00:24:11.090 --> 00:24:17.169 protocols. So - and there'll be some similarities and some differences between 00:24:17.169 --> 00:24:24.990 these two things. So multiprocessors. You probably want your computers to go fast. 00:24:24.990 --> 00:24:31.090 Most people do. And it's an obvious idea to glom together, because processors don't 00:24:31.090 --> 00:24:35.529 go that fast, let's glom together a whole bunch of processors and make them all talk 00:24:35.529 --> 00:24:46.150 to the same memory. This is not a new idea. It goes back at least to 1962 when 00:24:46.150 --> 00:24:51.380 the Burroughs D825 had, I think, two processors talking to a single shared 00:24:51.380 --> 00:24:55.850 memory. It had outstanding features including truly modular hardware with 00:24:55.850 --> 00:25:02.649 parallel processing throughout, and - some things do not change very much - the 00:25:02.649 --> 00:25:11.640 complement of compiling languages was to be expanded, right? 1962, so that'll be 52 00:25:11.640 --> 00:25:18.820 years ago. Deary me. Okay. Now, how do these things behave? So let me show you 00:25:18.820 --> 00:25:22.899 some code, right? It's a hacker conference. There should be code. Let me 00:25:22.899 --> 00:25:29.030 show you two example programs and these will both be programs with two threads and 00:25:29.030 --> 00:25:34.570 they will both be acting on two shared variables, X and Y. And in the initial 00:25:34.570 --> 00:25:42.899 state, X and Y are both zero. So, the first program. On this thread we write X 00:25:42.899 --> 00:25:48.970 and then we read Y, and over here we write Y and then we read X, right? Now, these 00:25:48.970 --> 00:25:53.629 are operating, you know, in an interleaving concurrency, you might think, 00:25:53.629 --> 00:25:58.030 so there's no synchronisation between those two threads so we might interleave 00:25:58.030 --> 00:26:03.269 them. We might run all of thread 0 before all of thread 1 or all of thread 1 before 00:26:03.269 --> 00:26:10.340 all of thread 0 or they might be mixed in like that or like that or like that or 00:26:10.340 --> 00:26:16.940 like that. There's six possible ways of interleaving two lists of two things. And 00:26:16.940 --> 00:26:22.980 in all of those ways you never see in the same execution that this reads from the 00:26:22.980 --> 00:26:30.169 initial state AND this reads from the initial state. You never see that. So you 00:26:30.169 --> 00:26:35.479 might expect and you might assume in your code that these are the possible outcomes. 00:26:35.479 --> 00:26:40.490 So what happens if you were to actually run that code on my laptop in a particular 00:26:40.490 --> 00:26:46.970 test harness? Well, okay, you see occasionally they're quite well 00:26:46.970 --> 00:26:52.499 synchronised and they both read the other guy's right. Sorry, big call. They both 00:26:52.499 --> 00:26:57.510 read the other guy's right. And quite often one thread comes completely before 00:26:57.510 --> 00:27:09.769 the other. But sometimes, sometimes they both see 0. Huh. Rats. Well, that's 00:27:09.769 --> 00:27:18.840 interesting. Let me show you another program. So now, thread 0 writes some 00:27:18.840 --> 00:27:23.440 stuff. Maybe it writes a big bunch of data, maybe multi-word data. And then it 00:27:23.440 --> 00:27:28.249 writes, let's say that's a flag announcing to some another thread that that data is 00:27:28.249 --> 00:27:33.249 ready to go now. And the other thread busy-waits reading that value until it 00:27:33.249 --> 00:27:38.710 gets 1, until it sees the flag. And then it reads the data. So you might want, as a 00:27:38.710 --> 00:27:43.610 programmer, you might want a guarantee that this read will always see the data 00:27:43.610 --> 00:27:48.759 that you have initialised. Hey? That would be like a message-passing kind of an 00:27:48.759 --> 00:27:56.229 idiom. So what happens if you run that? Well, on the x86 processors that we've 00:27:56.229 --> 00:28:03.929 tested, you always see that value. Good. This is a descent coding idiom on x86. All 00:28:03.929 --> 00:28:12.230 right. On Arma IBM Power processors, however, you see sometimes you just ignore 00:28:12.230 --> 00:28:16.850 the value written and you see the initial state. So this is not a good 00:28:16.850 --> 00:28:25.539 message-passing idiom, right? So what's going on? Well, these behaviours, it might 00:28:25.539 --> 00:28:31.260 be surprising, eh? And when you see surprising behaviour in hardware, you have 00:28:31.260 --> 00:28:36.100 two choices. Either it's a bug in the hardware, and we have found a number of 00:28:36.100 --> 00:28:45.259 bugs in hardware. It's always - it's very rewarding. Or it's a bug in your 00:28:45.259 --> 00:28:51.350 expectation. Or it's a bug in your test harness. Right? So what you do is you walk 00:28:51.350 --> 00:28:58.050 along and you walk up to your friendly processor architect in IBM or ARM or x86 00:28:58.050 --> 00:29:04.770 land, and we have worked with all of these people. And you ask them, "Is this a bug?" 00:29:04.770 --> 00:29:12.360 And a processor architect is a person who is - has the authority to decide whether 00:29:12.360 --> 00:29:16.240 some behaviour is intended or is a bug. 00:29:16.240 --> 00:29:17.750 laughter 00:29:17.750 --> 00:29:24.669 That's what they do, essentially. And for these, they say, "Oh, we meant it to be 00:29:24.669 --> 00:29:29.239 like that". Right? No question. "We meant it to be like that. That's perfectly 00:29:29.239 --> 00:29:36.559 proper. We have, because you and everybody else wanted their computers to go fast, we 00:29:36.559 --> 00:29:41.690 as processor designers have introduced all manner of sophisticated optimizations, 00:29:41.690 --> 00:29:45.200 which if you were running sequential code, you would never notice, but if you start 00:29:45.200 --> 00:29:49.289 running fancy concurrent code like this, fancy concurrent code that isn't just 00:29:49.289 --> 00:29:56.049 trivially locked, you can notice", so this is intentional behaviour. And if you want 00:29:56.049 --> 00:30:00.129 to write that code, like a mutual exclusion algorithm or a message-passing 00:30:00.129 --> 00:30:06.460 algorithm or something, then you need to insert special instructions, memory 00:30:06.460 --> 00:30:11.180 barrier instructions. And so you go away and you say, "What do they do?" And you 00:30:11.180 --> 00:30:18.949 look up in the vendor documentation. And you get stuff like this. I'll read it out: 00:30:18.949 --> 00:30:22.190 (Reads quickly) "For any applicable pair AB, the memory barrier ensures that a will 00:30:22.190 --> 00:30:24.740 be performed with respect to any processor or mechanism, to the extent required by 00:30:24.740 --> 00:30:27.590 the associated memory coherence required at beauties, before b is performed with 00:30:27.590 --> 00:30:30.360 respect to that prior or mechanism. A includes all applicable storage accesses 00:30:30.360 --> 00:30:33.090 by any such processor or mechanism that have been performed with respect to P1 00:30:33.090 --> 00:30:36.210 before the memory barrier is created. B includes all applicable storage accesses 00:30:36.210 --> 00:30:38.670 by any such processor or mechanism that are performed after a Load instruction 00:30:38.670 --> 00:30:41.290 executed by that processor or mechanism has returned the value stored by a store 00:30:41.290 --> 00:30:42.700 that is in B." 00:30:42.700 --> 00:30:49.850 applause 00:30:49.850 --> 00:30:58.630 Hands up if you understand what that means? Hands up if you think that if you 00:30:58.630 --> 00:31:02.139 thought about it and read the rest of the book, you would understand or you could 00:31:02.139 --> 00:31:09.730 understand what that means? Hands up the people who think that we're doomed forever? 00:31:09.730 --> 00:31:12.920 laughter 00:31:12.920 --> 00:31:16.979 So I'm sorry to the first few groups, but the last ones are right. 00:31:16.979 --> 00:31:22.299 laughterapplause 00:31:22.299 --> 00:31:28.850 This looks like it's really intricate and really carefully thought out stuff. And we 00:31:28.850 --> 00:31:32.220 thought that for a while and we spent literally years trying to make precise 00:31:32.220 --> 00:31:37.159 mathematical models that give precise meaning to these words, but actually it 00:31:37.159 --> 00:31:45.460 can't be done. So what do you have to do in that circumstance? You have to go away 00:31:45.460 --> 00:31:52.120 and you have to invent some kind of a model. And there's a - this is a really 00:31:52.120 --> 00:31:56.190 fundamental problem, that we - on the one hand, we develop our software by this 00:31:56.190 --> 00:32:01.960 trial and error process, but on the other hand are points like this. We have these 00:32:01.960 --> 00:32:06.470 loose specifications, supposedly, that have to cover all manner of behaviour of 00:32:06.470 --> 00:32:10.730 many generations of processors that might all behave the same way, written just in 00:32:10.730 --> 00:32:16.769 text. And we tell people they should write "to the spec", but the only way they have 00:32:16.769 --> 00:32:22.450 of developing their code is to run it and run particular executions on the 00:32:22.450 --> 00:32:26.960 particular hardware implementations that they have, whose relationship to the spec 00:32:26.960 --> 00:32:34.740 is very elusive. We can't use those thick books that you get or those quite weighty 00:32:34.740 --> 00:32:39.260 PDFs that you get these days from the processor vendors to test programs. You 00:32:39.260 --> 00:32:43.970 can't feed a program into that thick book and get output out. And you can't test 00:32:43.970 --> 00:32:47.429 processor implementations and you can't prove anything and you can't even 00:32:47.429 --> 00:32:53.430 communicate between human beings, right? So these things, really, they don't exist. 00:32:53.430 --> 00:32:58.500 So what can we do? Well, the best we can do at this point in time is a bit of 00:32:58.500 --> 00:33:04.950 empirical science, right? So we can invent some model off the top of our heads, 00:33:04.950 --> 00:33:09.649 trying to describe just the programmer-visible behaviour, not the 00:33:09.649 --> 00:33:15.529 internal structure. And we can make a tool. Because that's not prose now. Now we 00:33:15.529 --> 00:33:18.950 can - now it's stuff. It's real mathematics and we can turn that into code 00:33:18.950 --> 00:33:23.200 and execute it. We can make tools that take programs and execute it and - small 00:33:23.200 --> 00:33:28.280 programs - tell us the set of all behaviours allowed by that model. And then 00:33:28.280 --> 00:33:33.759 we can compare those sets of behaviours against the experimental observations. And 00:33:33.759 --> 00:33:38.360 you might have to fix the model and you might find hard bugs. And then the model 00:33:38.360 --> 00:33:42.869 also has to make sense to the architect so you can discuss with the architects what 00:33:42.869 --> 00:33:47.100 they intend and what their institution is, and then you can also prove some other 00:33:47.100 --> 00:33:50.980 facts about it to give a bit more assurance and then because you probably 00:33:50.980 --> 00:33:57.249 got this wrong the first couple of times, you can go back. And this results or has 00:33:57.249 --> 00:34:04.509 resulted in models which are not guaranteed to be correct, but they are 00:34:04.509 --> 00:34:08.639 effectively the de facto standard for understanding the concurrency behaviour of 00:34:08.639 --> 00:34:14.300 these things, you know? And various people use them. And just to give you a tiny 00:34:14.300 --> 00:34:18.570 snippet - I'll have to speed up a bit - a tiny snippet of what the model looks like, 00:34:18.570 --> 00:34:22.969 it's basically just an abstract machine. It's got some stuff for the threads that 00:34:22.969 --> 00:34:26.949 handles the programmer-visible speculative execution, and some stuff for a storage 00:34:26.949 --> 00:34:32.739 subsystem that handles the fact that in these architectures, memory rights can be 00:34:32.739 --> 00:34:38.270 propagated to different threads at different times, right? And there's a - 00:34:38.270 --> 00:34:42.540 that model has a state which is just a collection of some sets and lists and 00:34:42.540 --> 00:34:47.230 partial orders and a few other things which I won't talk about. And it has some 00:34:47.230 --> 00:34:51.030 transitions, right? In any state, there might be several possible transitions. 00:34:51.030 --> 00:34:54.699 This is just a sample. I'm not going to explain all of this. But when you want to 00:34:54.699 --> 00:34:58.570 propagate a write to another thread, there have to be some preconditions that you have to 00:34:58.570 --> 00:35:07.140 satisfy. And than there is an action. It's not amazingly complicated, really. So this is text, 00:35:07.140 --> 00:35:13.319 but it's very precisely written text and it has the great advantage that you can go through 00:35:13.319 --> 00:35:16.570 these bullet points with your friendly architect and say, "Is this what you 00:35:16.570 --> 00:35:22.440 mean?" And they can think about it and say yes or no. But for the actual definition, 00:35:22.440 --> 00:35:27.309 that's in mathematics but it's mathematics that's quite close to code. I mean, it's 00:35:27.309 --> 00:35:32.380 terribly close to pure, functional code with outside effects. And just to give you 00:35:32.380 --> 00:35:37.230 an idea, this is some of it and those are three of those conditions in the real, 00:35:37.230 --> 00:35:42.660 honest, true version. Then we can take this mathematics and because it's in a 00:35:42.660 --> 00:35:49.270 particular style, we can compile this into actually OCaml and then OCaml byte code 00:35:49.270 --> 00:35:55.770 and then we can run it for batch jobs. But then you can compile that OCaml byte code 00:35:55.770 --> 00:36:02.579 to JavaScript and glue on an user interface and stick it into a web browser and then you 00:36:02.579 --> 00:36:07.760 have a web interface that lets people explore this model. And that's also a 00:36:07.760 --> 00:36:16.190 necessary part of validating that it makes sense. Okay. This is not rocket science. 00:36:16.190 --> 00:36:21.710 You've missed the talk for rocket science, I'm sorry. All we're doing here is 00:36:21.710 --> 00:36:28.630 specifying the intended behaviour of a system. Okay? That's - it's not very 00:36:28.630 --> 00:36:32.490 technical but it's unusual. And we happen to be doing it in this mathematical 00:36:32.490 --> 00:36:37.049 language, but you could use in fact almost any language so long as you understand 00:36:37.049 --> 00:36:40.170 what you're doing. What you have to understand is that you're writing 00:36:40.170 --> 00:36:45.230 something which is not an implementation. It is something which, given some observed 00:36:45.230 --> 00:36:49.810 behaviour, some arbitrary observed behaviour, will be able to decide if it's 00:36:49.810 --> 00:36:54.580 allowed, if you want it to be allowed or not, right? It has to be executed as a 00:36:54.580 --> 00:37:02.059 test oracle. The key question for getting this to work is to understand how much 00:37:02.059 --> 00:37:05.730 non-determinism or loose specification there is in the system that you're working 00:37:05.730 --> 00:37:08.890 with, right? So if everything is completely deterministic, you can write a 00:37:08.890 --> 00:37:13.839 reference implementation in the cleanest language of your choice and just compare 00:37:13.839 --> 00:37:19.270 the output of that and the output of the real thing, right? But if there's more 00:37:19.270 --> 00:37:24.000 non-determinism, then you can't do that. I'm going to have to abbreviate this a 00:37:24.000 --> 00:37:31.369 little tiny bit. So for the TCP network protocols, there is more non-determinism. 00:37:31.369 --> 00:37:35.220 You know what the TCP is, yes? A protocol that gives you sort of reliable 00:37:35.220 --> 00:37:41.720 connections, assuming that the world is made of good people, right? You look at 00:37:41.720 --> 00:37:47.640 the standards for TCP, and they're the same kind of wibbly text from the 1980s 00:37:47.640 --> 00:37:52.400 and you look at the implementations and they are a ghastly mess. And you try and 00:37:52.400 --> 00:37:55.690 understand the relationship between the two of them and you can't because the 00:37:55.690 --> 00:38:01.740 standard, again, is not the definition. It doesn't define in all circumstances what 00:38:01.740 --> 00:38:07.010 behaviour is allowed and what isn't. So again, we had to make up a specification 00:38:07.010 --> 00:38:14.310 in the first instance just of this host, a single endpoint, and observing its 00:38:14.310 --> 00:38:18.510 sockets, API, calls and returns and its behaviour on the wire, right? And when 00:38:18.510 --> 00:38:21.969 you've decided - and a few internal debug events. And when you've decided what to 00:38:21.969 --> 00:38:26.619 observe, then observation is just a trace, it's just a list of those events. And you 00:38:26.619 --> 00:38:34.750 have to be able to ask your spec "Is that list admissible or not"? But there's a lot 00:38:34.750 --> 00:38:38.820 of non-determinism. Variation between the implementations, variations within the 00:38:38.820 --> 00:38:45.650 implementations. And that's internal. It's not announced in the API or on the wire 00:38:45.650 --> 00:38:50.980 maybe until much later when the implementation picks a new window size or 00:38:50.980 --> 00:38:56.890 something. You can't tell until quite a lot later in the trace what it's picked. 00:38:56.890 --> 00:39:02.740 And that makes the job of checking whether a trace is admissible by the spec much 00:39:02.740 --> 00:39:08.559 harder than it has to be. Right? And if you had designed the TCP protocol and 00:39:08.559 --> 00:39:14.470 those implementations for to be testable in this very discriminating way, back in 00:39:14.470 --> 00:39:20.470 the 1980s when you were designing TCP protocol, it would be much easier. And for 00:39:20.470 --> 00:39:25.650 new protocols, one should make sure that you're doing this in this particular way. 00:39:25.650 --> 00:39:30.930 I don't think I want to talk about that slide. That's just one of the rules of 00:39:30.930 --> 00:39:35.480 that specification. But the key fact about that spec is the - well, we handled the 00:39:35.480 --> 00:39:40.210 real protocols for arbitrary inputs. But it's structured not just for this 00:39:40.210 --> 00:39:46.240 executability as a test oracle, but it's structured for clarity, not performance, 00:39:46.240 --> 00:39:49.750 which is scarcely ever what anyone ever does, right? So it's a whole new kind of 00:39:49.750 --> 00:39:54.480 thing. And the testing is very discriminating. So we found all manner of 00:39:54.480 --> 00:40:03.839 amusing and bizarre bugs which I think I don't have time to talk about. Okay. So 00:40:03.839 --> 00:40:12.610 I've described three kinds of things that we might do. The first thing - for 00:40:12.610 --> 00:40:18.800 goodness sake - I mean, it's just a no-brainer. Just do it already. Everybody. 00:40:18.800 --> 00:40:29.670 All of you. All of you. Right? This middle zone is a very interesting zone as far as 00:40:29.670 --> 00:40:33.920 I'm concerned, right? It's - out of what - a place where we could get a very good 00:40:33.920 --> 00:40:39.220 benefit for not that much effort, right? We can do it, if necessary, post hoc. We 00:40:39.220 --> 00:40:44.490 can also do it and even better, at design time. We have to do this in a way that 00:40:44.490 --> 00:40:48.240 makes our executions testable as test oracles, and another thing that that 00:40:48.240 --> 00:40:52.119 enables is completely random testing. When you've got a test oracle, you can just 00:40:52.119 --> 00:40:58.210 feed in random goop. This is fuzzing, but better fuzzing - feed in random goop and 00:40:58.210 --> 00:41:02.859 check at every point whether what the behaviour that you're getting is allowable 00:41:02.859 --> 00:41:09.271 or not. And then they're written for clarity, not for performance, and that 00:41:09.271 --> 00:41:13.619 means you can see what you're doing, right? You can see the complexity. If you 00:41:13.619 --> 00:41:17.140 go ahead and you add some feature to your protocol or your programming language or 00:41:17.140 --> 00:41:22.420 whatever and you're working just with text specifications, then you can't see the 00:41:22.420 --> 00:41:24.759 interactions. You just chuck in a couple of paragraphs and you think for a few 00:41:24.759 --> 00:41:30.210 minutes, right? And if you're lucky, you make an implementation. But if you're 00:41:30.210 --> 00:41:35.309 writing a spec that has to cover all the cases like this, the act of doing that 00:41:35.309 --> 00:41:41.150 encourages you to think or helps you think about the excess complexity. And you might 00:41:41.150 --> 00:41:46.690 think that's too complex, I'm being silly, I'll make it simpler. And it has to be 00:41:46.690 --> 00:41:51.630 usable for proof. So this, I think also is pretty much a no-brainer. And it doesn't 00:41:51.630 --> 00:41:56.650 require great technical, you know, mathematical skill either, right? Lots of 00:41:56.650 --> 00:42:01.760 people can do this. And then there's option 3, best option, full-on formal 00:42:01.760 --> 00:42:07.930 verification of the key components. And that is now also in reach. You can imagine 00:42:07.930 --> 00:42:16.309 secure systems made using actual verified compilers and verified hypervisors with 00:42:16.309 --> 00:42:22.540 verified TLS stacks and you can imagine making things out of those or making those 00:42:22.540 --> 00:42:27.870 and then making things out of those. And maybe one should be thinking about that. 00:42:27.870 --> 00:42:34.070 So, maybe not appropriate for everything. If you were writing Word, you would not 00:42:34.070 --> 00:42:38.599 wish to do these things. Probably, you're not. But for this key infrastructure that 00:42:38.599 --> 00:42:46.420 we really depend on, we trust even though it's not trustworthy, we have to do this. 00:42:46.420 --> 00:42:52.450 Is this a new dawn? I wonder if there's anyone standing next to a light switch 00:42:52.450 --> 00:42:58.920 that can dim them for me. I didn't ask them beforehand, so... If you think back 00:42:58.920 --> 00:43:05.540 the last 70-odd years, we started building computers in around 1940. It's been pretty 00:43:05.540 --> 00:43:12.760 dark from the point of view of rigorous, solid, reliable engineering, stuff that is 00:43:12.760 --> 00:43:21.760 actually trustworthy. Pretty dark, I would say. Maybe, just maybe there's a tiny 00:43:21.760 --> 00:43:26.339 smidgen of light coming through the doors. And we start to have a little bit of a 00:43:26.339 --> 00:43:31.109 clue and we start to get reusable models of processor architectures and programming 00:43:31.109 --> 00:43:35.540 languages and network protocols and what have you. It's just the beginnings of the 00:43:35.540 --> 00:43:39.790 analogue of that thermodynamics and materials science and quality control 00:43:39.790 --> 00:43:47.819 management and what have you that we have from mechanical engineering. And we've got 00:43:47.819 --> 00:43:57.200 no choice. If we don't, the next 75 years is going to be a lot worse. You can just 00:43:57.200 --> 00:44:02.400 imagine, right? So I went to this - as I'm sure many of you do - this great talk on 00:44:02.400 --> 00:44:08.370 some Apple boot loader exploit yesterday which was relying on some feature that had 00:44:08.370 --> 00:44:14.900 been introduced in the early '80s and was still present. And you can imagine in 100 00:44:14.900 --> 00:44:22.800 years from now, you can imagine as long as human civilisation endures, the x86 00:44:22.800 --> 00:44:28.869 architecture and the socket API and all of this stuff, it will be embedded in some 00:44:28.869 --> 00:44:34.099 monumental ghastly stack of virtualisation layers forever. 00:44:34.099 --> 00:44:45.760 laughterapplause 00:44:45.760 --> 00:44:50.720 So I'd like to thank especially all of my colleagues that have been working with me 00:44:50.720 --> 00:44:56.550 or not with me in these directions. I'd like to thank our generous funders who 00:44:56.550 --> 00:45:01.389 support this stuff. I'd like to thank you for your attention and I exhort you, 00:45:01.389 --> 00:45:04.850 sort it out. 00:45:04.850 --> 00:45:20.800 applause 00:45:20.800 --> 00:45:25.300 Herald: Thank you very much, Peter, for those interesting insights to our 00:45:25.300 --> 00:45:31.269 programming. We have now time for a Q & A, so those people who have to leave the 00:45:31.269 --> 00:45:39.970 room, please do so quietly and as fast as possible, so we can go on and hear what 00:45:39.970 --> 00:45:51.270 the questions are. So we start with microphone 4, please. 00:45:51.270 --> 00:45:58.790 Mic 4: Hello. Thanks for the talk and my question is if you got an oracle of the 00:45:58.790 --> 00:46:04.270 software behaviour which can translate every possible input to a correct output, 00:46:04.270 --> 00:46:08.410 how can this be less complex and error-prone than the reference 00:46:08.410 --> 00:46:10.510 implementation? 00:46:10.510 --> 00:46:15.620 Peter: Good question. So the first point is that this oracle doesn't have to 00:46:15.620 --> 00:46:22.010 produce the outputs. It only has to check that the inputs and the outputs match up. 00:46:22.010 --> 00:46:26.560 And then the second point is that in general it might have to have much of the 00:46:26.560 --> 00:46:33.221 same complexity, but by writing it to be clear as opposed to to be fast, you may 00:46:33.221 --> 00:46:37.730 adopt a completely different structure, right? So the structure of our TCP spec is 00:46:37.730 --> 00:46:43.540 organised by the behaviour that you see for particular transitions. The structure 00:46:43.540 --> 00:46:49.160 of a real implementation has fast-path code and lots of complicated intertwined 00:46:49.160 --> 00:46:55.309 branching and all manner of excess complexity, of implementation complexity, 00:46:55.309 --> 00:46:56.879 if you will. 00:46:56.879 --> 00:47:01.450 Mic 4: Thanks. Herald: So microphone 3, please? 00:47:01.450 --> 00:47:06.710 Mic 3: I wanted to ask you what you thought about programming by manipulating 00:47:06.710 --> 00:47:13.859 the abstract syntax tree directly so as to not allow incompilable programs because of 00:47:13.859 --> 00:47:18.410 some, like, you're missing a semicolon or something like that. What's your thoughts 00:47:18.410 --> 00:47:20.220 on that? 00:47:20.220 --> 00:47:23.700 Peter: So that's - in the grand scheme of things, I think that's not a very big 00:47:23.700 --> 00:47:28.890 deal, right? So there's - people have worked on structured editors for lordy 00:47:28.890 --> 00:47:33.309 knows how many years. It's not a big deal because it's very easy for a compiler to 00:47:33.309 --> 00:47:38.569 detect that kind of thing. And even more, it's very easy for a compiler of a 00:47:38.569 --> 00:47:43.140 sensibly designed language to detect type errors, even for a very sophisticated type 00:47:43.140 --> 00:47:49.589 system. So I don't think that - that's not - I mean, some people might find it 00:47:49.589 --> 00:47:52.930 helpful, but I don't think it's getting to the heart of the matter. 00:47:52.930 --> 00:47:54.250 Mic 3: Thanks. 00:47:54.250 --> 00:47:59.850 Herald: So we've got some questions from our signal angels from the IRC. 00:47:59.850 --> 00:48:06.150 Signal angel: Yes. There's one question. It's about the repository for Isabelle and 00:48:06.150 --> 00:48:11.300 HOL proofs. It should be on source forge, and you said there are no open source 00:48:11.300 --> 00:48:15.480 repositories. What about them? 00:48:15.480 --> 00:48:20.589 Peter: I'm not quite sure what the question is, really. So there's a 00:48:20.589 --> 00:48:28.230 repository of Isabelle formal proofs which is the archival form of proofs, it's 00:48:28.230 --> 00:48:34.790 called. I didn't really mean to say there are no open source repositories and in 00:48:34.790 --> 00:48:38.500 fact I don't know under what conditions most of those proofs are licensed. They 00:48:38.500 --> 00:48:43.210 probably are open. But there isn't an open source community building these things, 00:48:43.210 --> 00:48:47.710 right? It's still pretty much an academic pursuit. 00:48:47.710 --> 00:48:50.329 Herald: Microphone 2, please. 00:48:50.329 --> 00:48:55.530 Mic 2: Hello. Thanks again for your talk. I just want to add something that you 00:48:55.530 --> 00:49:01.970 didn't address, and that's that we actually need more good studies in 00:49:01.970 --> 00:49:07.970 software engineering. We often hear a lot of experts and also very well-known 00:49:07.970 --> 00:49:11.740 computer scientists say things like, "Well, we just need to do functional 00:49:11.740 --> 00:49:17.839 programming and OOP is bad and stuff like that", which I think there's a lot of 00:49:17.839 --> 00:49:24.109 truth to it, but we actually need studies where we test these kinds of claims that 00:49:24.109 --> 00:49:29.530 we make, because when you look at other fields which it also did compare to, like, 00:49:29.530 --> 00:49:34.250 medicine, if somebody comes around and is well-known and says things like, 00:49:34.250 --> 00:49:42.230 "homeopathy works", we don't believe them. We do trials, we do good trials. And 00:49:42.230 --> 00:49:50.430 there's a lot of myths out there, like, or not well tested things, like "hire the 00:49:50.430 --> 00:49:54.739 best programmers, they are 100 times more productive", "do steady type 00:49:54.739 --> 00:50:00.430 systems", and stuff like that. And we need to verify that this is true, that this 00:50:00.430 --> 00:50:02.210 helps. And it's... 00:50:02.210 --> 00:50:07.490 Peter: Okay. So in the grand scheme of things, arguably you're right. In the 00:50:07.490 --> 00:50:12.739 grandest scheme of things, computer science is actually a branch of psychology 00:50:12.739 --> 00:50:18.609 or really sociology. We are trying to understand how large groups of people can 00:50:18.609 --> 00:50:25.839 combine to make things which are insanely complicated. Now, for - would it be good 00:50:25.839 --> 00:50:31.329 if we had, you know, solid evidence that programming in this language was better 00:50:31.329 --> 00:50:38.450 than programming in that language? Well, yes, but it's staggeringly difficult and 00:50:38.450 --> 00:50:45.079 expensive to do honest experiments of that nature and they're scarcely ever done, 00:50:45.079 --> 00:50:49.040 right? You might do little experiments on little groups of students but it's really 00:50:49.040 --> 00:50:55.930 difficult. And then some of these things which I'm saying, when one is familiar 00:50:55.930 --> 00:51:01.740 with the different possible options, are just blindingly obvious. I mean, there are 00:51:01.740 --> 00:51:09.161 reasons why these people are using OCaml for their system programming, right? It's 00:51:09.161 --> 00:51:16.609 not - you know, it's not - "Homeopathy is right", but "Homeopathy is wrong" which is 00:51:16.609 --> 00:51:21.230 the kind of argument being put forward. 00:51:21.230 --> 00:51:24.260 Herald: Question from microphone 5, please. 00:51:24.260 --> 00:51:29.609 Mic 5: So where are you using ECC memory for your testing - up here. 00:51:29.609 --> 00:51:33.510 Peter: When you say up here, there's a bit of ambiguity. 00:51:33.510 --> 00:51:34.819 Mic 5: Here. 00:51:34.819 --> 00:51:37.860 Peter: Thank you. Say again? 00:51:37.860 --> 00:51:43.119 Mic 5: So where are you using ECC memory for the testing you showed about the 00:51:43.119 --> 00:51:51.070 results of the multithreaded results due to memory barriers and memory reorderings? 00:51:51.070 --> 00:51:55.450 Peter: Well... 00:51:55.450 --> 00:52:00.130 Mic 5: Okay, but even if you were or you were not, the point I want to make is that 00:52:00.130 --> 00:52:05.950 you can expect about 1 bit flip each minute in a modern computer system that 00:52:05.950 --> 00:52:13.670 may completely change what your software is doing, so perhaps we also have to look 00:52:13.670 --> 00:52:19.180 in ways how we can work if something goes wrong at the very low level so that we 00:52:19.180 --> 00:52:25.010 kind of have a check against our specification on a more higher level of 00:52:25.010 --> 00:52:31.940 our software. So it is valuable to do good engineering on the low levels, but still I 00:52:31.940 --> 00:52:37.670 think we will not solve the problems of computation and computer engineering just 00:52:37.670 --> 00:52:42.520 by proving things in the mathematical domain because computers are physical 00:52:42.520 --> 00:52:47.739 entities. They have errors and so on and we really have to deal with them as well. 00:52:47.739 --> 00:52:54.340 Peter: So it's certainly true that there are such random errors. In fact, I would 00:52:54.340 --> 00:52:59.480 put the point that I would argue that you have to be able to model the physics well 00:52:59.480 --> 00:53:03.680 enough and you have to be able to model the statistics of those errors well 00:53:03.680 --> 00:53:07.750 enough, so that's a different kind of mathematics. And it's certainly valuable 00:53:07.750 --> 00:53:13.290 and necessary. But if you look at the statistic you just quoted, is that the 00:53:13.290 --> 00:53:22.109 main cause of why systems go wrong? Except possibly for space hardware, no. So 00:53:22.109 --> 00:53:26.499 important, yes. The most important thing that we should be paying attention to? I 00:53:26.499 --> 00:53:29.940 don't really think so. 00:53:29.940 --> 00:53:32.980 Herald: Microphone 6, please? 00:53:32.980 --> 00:53:41.200 Mic 6: Hi. I really think that what you're proposing to specify and verify more key 00:53:41.200 --> 00:53:47.160 components is - would be a valuable addition, but how do you make sure that 00:53:47.160 --> 00:53:51.880 your specification actually matches the behaviour that you want to do or to have? 00:53:51.880 --> 00:53:58.300 Peter: So as I described, we do as partial validation of those specifications, we do 00:53:58.300 --> 00:54:03.050 a lot of testing against the implementations, against a range of 00:54:03.050 --> 00:54:07.060 existing implementations. That's one source of validation. Another source of 00:54:07.060 --> 00:54:11.520 validation is that you talk to the architects and the designers. You want the 00:54:11.520 --> 00:54:16.929 internal structure to match their intent. You want it to be comprehensible to them. 00:54:16.929 --> 00:54:21.510 Another kind of validation that we do is prove properties about them. So we proved 00:54:21.510 --> 00:54:28.750 that you can correctly compile from a C11 concurrency, a mathematical model of that, 00:54:28.750 --> 00:54:35.432 to a IBM Power concurrency. And that kind of proof gives you a bit more level of 00:54:35.432 --> 00:54:39.450 assurance in both. So none of this is giving you a total guarantee. You 00:54:39.450 --> 00:54:43.980 certainly don't claim a total guarantee. But it gives you pretty good levels of 00:54:43.980 --> 00:54:47.880 assurance by normal standards. 00:54:47.880 --> 00:54:49.580 Herald: Mic 4? 00:54:49.580 --> 00:54:56.309 Mic 4: Yes. Thanks again. You proposed random tests, and with my expertise, it's 00:54:56.309 --> 00:55:02.859 very annoying if you have tests where the outcome is sometimes a failure and you 00:55:02.859 --> 00:55:10.499 want to have reproducible tests to fix a bug. So how do we bring this aspects to 00:55:10.499 --> 00:55:17.850 test more variety in data and to have it reproducible together? 00:55:17.850 --> 00:55:24.800 Peter: Well, if - as I was mentioning, for the TCP thing, one of the - so the problem 00:55:24.800 --> 00:55:29.150 is reproducibility is exactly this internal non-determinism, right? The 00:55:29.150 --> 00:55:33.389 system makes this scheduling choice or what have you and you can't see what it 00:55:33.389 --> 00:55:39.599 is, or the checker can't see what it is. So one way to do that is to really design 00:55:39.599 --> 00:55:45.369 the protocols in a different way to expose that non-determinism. The other fact about 00:55:45.369 --> 00:55:53.421 this kind of general purpose specification as test oracle idea is that in some sense, 00:55:53.421 --> 00:55:59.240 it doesn't have to be reproducible. Right? The specification is giving a yes-no 00:55:59.240 --> 00:56:05.510 answer for an arbitrary test. And that means that you can use arbitrary tests. 00:56:05.510 --> 00:56:09.730 Figuring out the root causes of the differences afterwards may be tricky, but 00:56:09.730 --> 00:56:13.200 you can use them for assurance. 00:56:13.200 --> 00:56:18.489 Herald: So we now got time for two more questions. Microphone 1, please? 00:56:18.489 --> 00:56:23.069 Mic 1: Hiya. Thanks for a great talk. So what you've described seems to be a middle 00:56:23.069 --> 00:56:28.259 ground between a full-on, formal verification and more practical testing, 00:56:28.259 --> 00:56:34.309 like, in between. So where do you see in the future where formal verifications will 00:56:34.309 --> 00:56:39.831 go? Will they converge to the middle or whether it will still be there just to 00:56:39.831 --> 00:56:44.299 verify something that's more well-specified? 00:56:44.299 --> 00:56:49.940 Peter: So the progress of sort of the whole subject of formal verification in 00:56:49.940 --> 00:56:56.930 general, if you look at that over the last 10 years or so, it's been very - it's been 00:56:56.930 --> 00:57:01.349 really amazing compared with what we could do in the 80s and 90s and early 2000s, 00:57:01.349 --> 00:57:06.480 right? So the scope of things, the scale of things which are - for which it is 00:57:06.480 --> 00:57:11.180 becoming feasible to do verification is getting bigger. And I think that will 00:57:11.180 --> 00:57:16.339 continue. You know, I don't know when you might see a completely verified TOR 00:57:16.339 --> 00:57:23.980 client, let's say, but that's not inconceivable now. And 20 years ago, that 00:57:23.980 --> 00:57:29.470 would have been completely inconceivable. So that is expanding and at the same time, 00:57:29.470 --> 00:57:36.829 I hope we see more of this interface, text based oracle specification - and these - 00:57:36.829 --> 00:57:41.000 when you're doing formal verification of a piece of the system, you have to have 00:57:41.000 --> 00:57:47.810 these indicators already defined, all right? So these two fit very well together. 00:57:47.810 --> 00:57:51.489 Herald: So the last question from microphone 2, please. 00:57:51.489 --> 00:57:56.040 Mic 2: Hi. You mentioned that you often find bugs in hardware. Is there any effort 00:57:56.040 --> 00:58:00.310 to verify the transistors on chips themselves as well? 00:58:00.310 --> 00:58:03.440 Peter: So there's a whole world of hardware verification. That wasn't my 00:58:03.440 --> 00:58:10.640 topic today. And most of the big processor vendors have teams working on this. 00:58:10.640 --> 00:58:14.440 Unsurprisingly, if you remember your history, many of them have teams focusing 00:58:14.440 --> 00:58:21.059 on the floating-point behaviour of their processors. And they're doing - they're 00:58:21.059 --> 00:58:26.981 also doing by the standards of ten years ago, pretty spectacularly well. So there 00:58:26.981 --> 00:58:33.181 are companies that have the whole of some execution unit formally verified. There's 00:58:33.181 --> 00:58:36.979 been a lot of work over the years on verification of cache protocols and such 00:58:36.979 --> 00:58:43.100 like. Right? There's a lot of work on not verifying the hardware as a whole, but 00:58:43.100 --> 00:58:47.519 verifying that, you know, the RTL-level description of the hardware matches some 00:58:47.519 --> 00:58:54.060 lower level description. So there is a lot of that going on. 00:58:54.060 --> 00:58:57.619 Herald: Thank you very much again for this great talk. 00:58:57.619 --> 00:59:00.480 Give him another applause. Thank you, Peter. 00:59:00.480 --> 00:59:01.790 Peter: Thank you. 00:59:01.790 --> 00:59:04.219 applause 00:59:04.219 --> 00:59:15.000 subtitles created by c3subtitles.de Join, and help us!