WEBVTT 00:00:00.000 --> 00:00:14.710 34c3 intro 00:00:14.710 --> 00:00:16.290 Herald: He's been publishing in academia 00:00:16.290 --> 00:00:22.520 for almost 30 years. Please join me in giving him a warm welcome to 34c3. 00:00:22.520 --> 00:00:30.367 Applause Alastiar Reid: Thank you very much for 00:00:30.367 --> 00:00:35.797 your introduction. So I'm going to be talking about the ARM processors and 00:00:35.797 --> 00:00:41.300 they're incredibly widely used. You find them in phones, tablets, IOT devices, 00:00:41.300 --> 00:00:46.449 hard-disk drives; they're all over. And if you think about it, what I'm saying is: 00:00:46.449 --> 00:00:50.500 They're in all the things, which contain your most private and personal data, so 00:00:50.500 --> 00:00:54.819 it's really, really important that they do exactly what they should be doing and 00:00:54.819 --> 00:00:59.469 nothing else. We need to make sure we really understand them and that means it's 00:00:59.469 --> 00:01:05.309 important that we can analyze them for any malware, look for vulnerabilities and so 00:01:05.309 --> 00:01:11.990 on. So I'm going to be talking about some work I started about six years ago, 00:01:11.990 --> 00:01:19.560 creating a very precise specification of what an ARM processor does and I'm going 00:01:19.560 --> 00:01:26.670 to be talking also about back in April I'm released this their specification in a 00:01:26.670 --> 00:01:34.189 machine readable form. And I should say that I'm working with Kimbridge University 00:01:34.189 --> 00:01:42.539 to turn that in something you can use. So so this talk I'm going to mostly actually 00:01:42.539 --> 00:01:47.560 talk about this executable processor specification, that's going to be the bulk 00:01:47.560 --> 00:01:53.249 of the talk but at the end that sets me up very nicely to talk about a formally 00:01:53.249 --> 00:01:57.979 verified software. So I thought, given the theme of the Congress, it would be more 00:01:57.979 --> 00:02:05.539 useful to emphasize things you could actually do. So the specification that ARM 00:02:05.539 --> 00:02:10.660 released, what's in it? Well, there's lots of instruction descriptions of course but 00:02:10.660 --> 00:02:14.690 there's also lots of really interesting security features; things to do with 00:02:14.690 --> 00:02:20.380 memory protection, exceptions, privilege checks and so on. 00:02:20.380 --> 00:02:24.440 So there's lots of really interesting stuff ... of your interest in their how 00:02:24.440 --> 00:02:30.690 secure your devices and how to make sure it really is secure. Throughout the talk 00:02:30.690 --> 00:02:36.080 I'll be giving a bunch of links; you can go and download the specs right now from 00:02:36.080 --> 00:02:42.580 the first link but please wait to the end of the talk and there's also the 00:02:42.580 --> 00:02:47.790 specification rendered as HTML, as tools that can take the specification release 00:02:47.790 --> 00:02:52.560 apart and find useful information in it; I've written blogs and papers about it as 00:02:52.560 --> 00:03:00.910 well. And so let's just dive into, look at the bits of the actual specification. So 00:03:00.910 --> 00:03:07.150 the first thing is all the really important security features in the specif- 00:03:07.150 --> 00:03:13.070 in our processor are controlled by, what I call, the system control registers and the 00:03:13.070 --> 00:03:19.570 top dog among all those control registers is this one here called SCTLR. Just trips 00:03:19.570 --> 00:03:26.370 off the tongue, doesn't it? So SCTLR is where - it's full of lots of individual 00:03:26.370 --> 00:03:32.120 control bits, which affect either optimizations, the processor's doing, or 00:03:32.120 --> 00:03:36.950 also security features. And so let's just zoom in and one of them, to give you an 00:03:36.950 --> 00:03:40.210 idea of what kind of information the spec contains. 00:03:40.210 --> 00:03:46.770 So here's some documentation, telling you about a WXN bit. What does that do? It 00:03:46.770 --> 00:03:53.190 makes sure that any code, any, that your stack cannot contain code. you can't boot 00:03:53.190 --> 00:03:58.250 instructions on the code, and on the stack, because they're proce- if you set 00:03:58.250 --> 00:04:04.180 this bit the processor won't execute them. In other words: This is the bit that 00:04:04.180 --> 00:04:12.220 triggered the requirement for things like return-oriented programming. Okay, so what 00:04:12.220 --> 00:04:18.380 can you do with this fact? Well, suppose you're in the habit of reverse engineering 00:04:18.380 --> 00:04:24.310 some code. You might, your disassemble may show you something like this. And you're 00:04:24.310 --> 00:04:28.120 probably all staring at this going: "What on earth does that do?", because it's 00:04:28.120 --> 00:04:33.131 extremely cryptic. But using the information that's in the XML version of 00:04:33.131 --> 00:04:40.710 the release you could easily figure out how to build, how to decode some of the 00:04:40.710 --> 00:04:45.710 more cryptic parts and go "Oh actually, it's that SCTLR register", right, so you 00:04:45.710 --> 00:04:50.490 could decode the cryptic name for the number for the register into that. But you 00:04:50.490 --> 00:04:54.720 could do a bit more than that. You can see it's setting one of the bits in the 00:04:54.720 --> 00:05:00.990 register so - it is of course the bit I just told you about WXN, so we could dig 00:05:00.990 --> 00:05:08.740 into that and, so we could kind of use the information from the XML to render it, 00:05:08.740 --> 00:05:13.560 perhaps, as like this. So you can make things a bit more useful 00:05:13.560 --> 00:05:17.550 and you can also take that documentation that was there, that told you what the WXN 00:05:17.550 --> 00:05:23.470 bit does, and maybe, if you're a feeling really energetic, you could, when you 00:05:23.470 --> 00:05:27.320 click on it, mouse over or whatever, it could bring up some of that documentation. 00:05:27.320 --> 00:05:31.870 And and that makes specf- that makes it much easier to understand what the cryptic 00:05:31.870 --> 00:05:36.240 piece of code at the top is doing. Okay, so that's just a very shallow thing you 00:05:36.240 --> 00:05:43.720 can get from the specification. If we dig into the instruction descriptions there's 00:05:43.720 --> 00:05:52.570 also things like the assembly language of - the specification of the assembly syntax 00:05:52.570 --> 00:05:59.030 and. So, something I did a few years ago and which I just wrote a blog article 00:05:59.030 --> 00:06:05.190 about over the weekend was was it's possible to actually take that 00:06:05.190 --> 00:06:09.900 specification, turn it into a disassembler. So I first of all 00:06:09.900 --> 00:06:16.170 transformed it into the code I'm showing at the bottom. What this is showing is how 00:06:16.170 --> 00:06:18.250 to take a binary description of an 00:06:18.250 --> 00:06:25.710 instruction so that's the the top line of typewriter font and and then turn that 00:06:25.710 --> 00:06:31.470 into strings, which is what this the code at the bottom is describing how to do. So 00:06:31.470 --> 00:06:35.242 so you can use this as a disassembler. It's actually possible to run it in 00:06:35.242 --> 00:06:41.940 reverse and use it as an assembler; you basically run the code from bottom to top 00:06:41.940 --> 00:06:47.080 and you can turn strings into binary instructions. And we'll see more of this 00:06:47.080 --> 00:06:57.430 running things in reverse in a few slides time. So the main thing about instructions 00:06:57.430 --> 00:07:03.330 is of course they do something. So the specification contains a description of 00:07:03.330 --> 00:07:08.650 exactly what an instruction does and that description is as code, which, as a 00:07:08.650 --> 00:07:13.150 programmer, makes me very happy, right. I don't understand the English words in the 00:07:13.150 --> 00:07:17.900 specification but I do understand what the code does. So one thing you can do with 00:07:17.900 --> 00:07:22.221 code is execute it, so let's just walk through - let's suppose ... take an 00:07:22.221 --> 00:07:29.300 instruction and I match against that diagram there. I might get some values for 00:07:29.300 --> 00:07:35.680 for some of the variables from that and then I can walk through, step by step, 00:07:35.680 --> 00:07:40.380 evaluating this code, until I eventually realize that register five is having some 00:07:40.380 --> 00:07:43.889 value assigned to it. Okay, so that's a fairly basic thing you 00:07:43.889 --> 00:07:47.380 can do with the specification; interpreters are fairly easy thing to 00:07:47.380 --> 00:07:53.210 implement but once you have it there's a lot you can build on top of it. And one 00:07:53.210 --> 00:07:57.820 thing that's surprisingly easy to implement is to extract a symbolic 00:07:57.820 --> 00:08:01.680 representation of what the instruction does. So I'll just show you quickly how 00:08:01.680 --> 00:08:07.400 you do that, using the interpreter. So let's take those same concrete values but 00:08:07.400 --> 00:08:11.600 I'm also, I've added three other variables at the side there, which I'm going to 00:08:11.600 --> 00:08:16.620 treat as symbolic variables. And as I walk through the code I won't just calculate 00:08:16.620 --> 00:08:24.230 some concrete values, like the value five or 32, I'll also build up a graph, 00:08:24.230 --> 00:08:29.710 representing exactly how I came up with these values at five and so on. So as I'm 00:08:29.710 --> 00:08:34.830 executing the code I can build a graph representing exactly what this code is 00:08:34.830 --> 00:08:41.188 doing. And what I can do now is just throw away the code and focus on what that graph 00:08:41.188 --> 00:08:46.259 is telling me. So I now have a symbolic representation of 00:08:46.259 --> 00:08:51.540 one slice through that, through that instruction and I can feed that to a 00:08:51.540 --> 00:08:56.369 constraint solver, so if any of you have heard of Z3 or SMT solvers, that's what 00:08:56.369 --> 00:09:05.129 I'm talking about here. And a constraint solver is a really useful tool, because 00:09:05.129 --> 00:09:09.940 you can run code forwards through it, so given some input values you can say what 00:09:09.940 --> 00:09:16.930 are the outputs from this function or from this instruction but you can also run 00:09:16.930 --> 00:09:22.279 them backwards, given some output value, some final result you want 00:09:22.279 --> 00:09:26.790 to see, you can ask what inputs would cause this to happen. And this is just 00:09:26.790 --> 00:09:31.070 tremendously useful if you're trying to figure out what instructions you want to 00:09:31.070 --> 00:09:36.389 use to generate some particular effect. All right, so if you're trying to figure 00:09:36.389 --> 00:09:41.800 out how some particular register could be accessed, how some particular thing could 00:09:41.800 --> 00:09:47.449 be turned on or off, being able to ask what inputs will cause this to happen is 00:09:47.449 --> 00:09:53.569 incredibly useful. And you can also use the constraint solver to ask for 00:09:53.569 --> 00:09:58.069 intermediate values, in the middle of the calculation. You know if you know some 00:09:58.069 --> 00:10:02.859 value you want to see there you can ask what the inputs that would cause that to 00:10:02.859 --> 00:10:07.260 happen. So, if any of you are familiar with tools 00:10:07.260 --> 00:10:13.949 like KLEE, which is a symbolic execution tool for based on LLVM, then they use 00:10:13.949 --> 00:10:19.999 something similar to this. So, I've shown you are fairly simple draft, something I 00:10:19.999 --> 00:10:26.300 could show you how you build it, kind of on the fly. This is the actual graph for 00:10:26.300 --> 00:10:31.259 that same instruction. Here I've added in a lot more nodes to do with some of the 00:10:31.259 --> 00:10:35.980 functions that were being called and to do with the calculating, whether to take a 00:10:35.980 --> 00:10:46.279 branch or not. So this is about 80 or 90 nodes. And I've been experimenting with 00:10:46.279 --> 00:10:52.360 extending this in two ways. So this is just one path through the execution of an 00:10:52.360 --> 00:10:56.779 instruction, so one way to improve on that is to build a graph that represents all 00:10:56.779 --> 00:11:00.499 possible paths through the instruction. And that's much more useful, because you 00:11:00.499 --> 00:11:04.990 can you then can point at something and say "how can I make that happen?" and it 00:11:04.990 --> 00:11:11.920 will tell you exactly what inputs will cause the path that will make that happen. 00:11:11.920 --> 00:11:16.351 I've also been experimenting with taking the entire specification, right, so that 00:11:16.351 --> 00:11:20.509 stuff that handles exceptions, that fetches instructions or execute 00:11:20.509 --> 00:11:24.699 instructions. It contains all instructions. And I've been experimenting 00:11:24.699 --> 00:11:29.790 with building a graph representing the entire specification all at once. And 00:11:29.790 --> 00:11:33.470 that's even more useful, because now pretty much any question you have about 00:11:33.470 --> 00:11:36.720 the specification, you can ask the constraint solver and it will come back 00:11:36.720 --> 00:11:43.490 and give you an answer. And this graph for the full specification is quite large. 00:11:43.490 --> 00:11:48.540 It's about half a million nodes and that's for the smallest specification that our 00:11:48.540 --> 00:11:53.050 major uses. So it's about half a million nodes. But the great thing is modern 00:11:53.050 --> 00:11:58.009 constraint solvers can read in that half million nodes and still give you answers. 00:11:58.009 --> 00:12:04.040 Typically in about 1 to 10 seconds for most of the questions posed to them. So, 00:12:04.040 --> 00:12:08.259 these are just tremendously useful tools, if you're wanting to be able to understand 00:12:08.259 --> 00:12:15.670 exactly what the specification does, and exactly how some program is going to 00:12:15.670 --> 00:12:20.689 behave or figure out what program you want to write to make it behave some particular 00:12:20.689 --> 00:12:28.850 way. Okay so I've talked a lot about instructions, but most of us actually run 00:12:28.850 --> 00:12:33.990 programs, right? So to turn this the specification into something in execute 00:12:33.990 --> 00:12:38.260 programs, in other words turn it into a simulator for the ARM architecture, you 00:12:38.260 --> 00:12:43.839 need to add a little bit of a loop that will handle interrupts, fake instructions 00:12:43.839 --> 00:12:49.980 and then it can execute them and handle any exceptions. So, I... So I did this. I 00:12:49.980 --> 00:12:55.860 added this loop to this specification. And then, I thought I'd better try testing the 00:12:55.860 --> 00:13:03.610 specification. And, so the good news for me, because I work for ARM I have access 00:13:03.610 --> 00:13:08.259 to ARM's internal test suite. Which is something that ARM has been working on 00:13:08.259 --> 00:13:12.720 basically since the company started 25, 30 years ago. So it's quite a large test 00:13:12.720 --> 00:13:17.579 suite. It is extremely thorough, has tens of thousands of test programs in it, runs 00:13:17.579 --> 00:13:25.140 billions of instructions. And so I set about testing the, testing the 00:13:25.140 --> 00:13:30.119 specification using this test suite. And if any of you have tested software you'll 00:13:30.119 --> 00:13:33.620 be familiar with a graph that looks like this, right? The start things don't work 00:13:33.620 --> 00:13:39.529 all that well. Gradually you get things working pretty well but then there's a 00:13:39.529 --> 00:13:45.899 heavy tail of difficult to find bugs. Okay, so, and that's basically what I 00:13:45.899 --> 00:13:49.869 found when I was testing the specification. But you should all be 00:13:49.869 --> 00:13:56.379 shocked by what I just said. Because what I'm seeing is ARM's official specification 00:13:56.379 --> 00:14:03.290 could not pass ARM's official test suite when I started, right. I mean that's 00:14:03.290 --> 00:14:10.269 that's pretty shocking. And I'm telling you this and emphasizing it, not because 00:14:10.269 --> 00:14:16.850 I think ARM's specification was especially bad. I think it was just typically bad. I 00:14:16.850 --> 00:14:22.589 think if you were to take any specification for, you know, any real- 00:14:22.589 --> 00:14:27.819 world system and actually test it against the test suite, well first of all you'd 00:14:27.819 --> 00:14:32.389 find it's not an executable specification most the time and secondly you'd find it 00:14:32.389 --> 00:14:38.499 wouldn't work. And you'd probably find it would work as badly as ARM's did. So it's 00:14:38.499 --> 00:14:42.850 not just that it didn't pass all the tests. It didn't pass any tests. In fact 00:14:42.850 --> 00:14:47.809 it took me weeks to get the processor or the specification to come out of reset. 00:14:47.809 --> 00:14:54.009 Right? Just to get the starting fix to get the first instruction took weeks. So and I 00:14:54.009 --> 00:15:00.379 think so I think you'd find this if you were to try any other specification. 00:15:00.379 --> 00:15:11.129 What's my next slide? So, moving on to useful thing you can do with the 00:15:11.129 --> 00:15:16.959 specification, something we tried last summer was using it for fuzz testing. So, 00:15:16.959 --> 00:15:21.490 fuzz testing consists of taking a program and throwing random inputs at the 00:15:21.490 --> 00:15:27.959 program and just seeing what breaks and it pretty much always breaks and a modern 00:15:27.959 --> 00:15:34.329 fuzztester like maybe AFL to make it more effective. It monitors something 00:15:34.329 --> 00:15:39.351 about how the program is executing and uses that to guide its choice of what to 00:15:39.351 --> 00:15:45.181 change next. So, if it's finding...in particular monitor whether an 00:15:45.181 --> 00:15:50.179 instruction... whether the program is taking a branch or not and if it sees it's 00:15:50.179 --> 00:15:54.240 taking lots of new branches then it goes: "Ok I'll keep trying more of what I'm 00:15:54.240 --> 00:15:58.379 doing at the moment because it seems to be effective, and if it's and not finding any 00:15:58.379 --> 00:16:03.910 new branches, then it will look for something else to try changing. So, that's 00:16:03.910 --> 00:16:06.970 kind of a normal fuzzing. What you're doing is basically trying to kind of 00:16:06.970 --> 00:16:15.230 maximize your branch coverage. So, what we did though, when we did this with the 00:16:15.230 --> 00:16:20.389 specification, was, we actually monitored branches not just in the in the binary 00:16:20.389 --> 00:16:28.399 that we were analyzing but also in the specification. And what this gave us was 00:16:28.399 --> 00:16:33.040 basically: if you got.. Suppose, your the binary you're analyzing is just straight 00:16:33.040 --> 00:16:37.459 line code. There's no branches in it at all. Then there's nothing really for your 00:16:37.459 --> 00:16:41.519 fuzzer to work with right. So it doesn't see that the code is interesting, it 00:16:41.519 --> 00:16:47.199 quickly moves on to something else. But maybe your straight line cord is doing 00:16:47.199 --> 00:16:50.940 something really interesting, like accessing a privileged register. Well, 00:16:50.940 --> 00:16:54.899 there will be a branch around that to check that you do have the privilege you 00:16:54.899 --> 00:16:58.250 require. Or maybe it's accessing a memory in which 00:16:58.250 --> 00:17:03.649 cases memory protection checks. There's also checks for: Is this a device 00:17:03.649 --> 00:17:09.750 register? Or is this a kind of RAM or ROM? So, there's a number of different branches 00:17:09.750 --> 00:17:13.429 there and that gives the fuzzer interesting things to... interesting 00:17:13.429 --> 00:17:20.320 feedback to play with. So, when we set this going on one of our microkernel, 00:17:22.770 --> 00:17:30.130 we analyzed the system call interface for that microkernel. And one of the things 00:17:30.130 --> 00:17:35.330 the fuzzer surprised us with was it said: Suppose I take the stack pointer and point 00:17:35.330 --> 00:17:39.940 it into the middle of this device space and then take an exception immediately, 00:17:39.940 --> 00:17:45.530 what happens? And the answer was: there was a bug in the microkernel and what it 00:17:45.530 --> 00:17:48.679 did was: it first thing it would do is read from the stack, where the stack 00:17:48.679 --> 00:17:55.320 pointer was pointing, so we do a read from one of the devices, which doesn't wasn't 00:17:55.320 --> 00:18:00.060 really what was intended. In fact it completely breaks a security model. So, 00:18:00.060 --> 00:18:06.610 fuzztesting using not just coverage of the monitoring branches in the binary but 00:18:06.610 --> 00:18:10.940 also the specification can find you a bunch of really interesting things. And I 00:18:10.940 --> 00:18:20.110 hope some of you will I pick this idea up and run with it. So, the reason that I was 00:18:20.110 --> 00:18:29.290 doing all this work was I wanted to be able to formally verify ARM processors and 00:18:29.290 --> 00:18:34.769 so I needed to create a specification before I could do that. So, sorry, let me 00:18:34.769 --> 00:18:42.690 just take a drink here... So, this is an overly simplified picture of a processor, 00:18:42.690 --> 00:18:48.159 it's more or less what processors looked like 25, 30 years ago, in fact. 00:18:48.159 --> 00:18:53.170 But it's good enough for the example. So, if you want to check a processor, is 00:18:53.170 --> 00:18:59.770 correct, then what you can do is: add an extra logic, which will monitor the values 00:18:59.770 --> 00:19:03.140 at the start of an instruction executing the values that are finally produced at 00:19:03.140 --> 00:19:07.700 the end of an instruction executing, and then if you feed those the specification 00:19:07.700 --> 00:19:11.830 you can see what the right answer should have been. You can compare that with what 00:19:11.830 --> 00:19:17.620 the processor actually did. So, you can do this using a test-based approach, right: 00:19:17.620 --> 00:19:21.200 just feed in inputs and check that everything matches. 00:19:21.200 --> 00:19:25.570 But you can also do it using a formal verification tool called a "bounded model 00:19:25.570 --> 00:19:31.250 checker". And a bounded model checker is like one of those constraint solvers I 00:19:31.250 --> 00:19:36.760 mentioned earlier on crack cocaine. So, what it will do is: it won't just try kind 00:19:36.760 --> 00:19:41.799 of one step for what can happen but will actually try multiple steps: all possible 00:19:41.799 --> 00:19:46.130 combinations of inputs for one instruction, two instructions, three 00:19:46.130 --> 00:19:49.809 instructions, longer and longer sequences of instructions, looking for ways they can 00:19:49.809 --> 00:19:56.279 fail the property. So, and this turned out to be an incredibly effective way of 00:19:56.279 --> 00:20:00.530 finding bugs in our processors. We've actually, we're going to be using 00:20:00.530 --> 00:20:07.720 this on all future processor developments. So, there's a paper that we wrote about 00:20:07.720 --> 00:20:12.299 this but, I recommend that you go find the video for the top by Clifford 00:20:12.299 --> 00:20:20.620 Wolf from a couple of hours ago, which describes a very similar process. Okay, so 00:20:20.620 --> 00:20:25.779 I'm encouraging you to see the specification and find something awesome 00:20:25.779 --> 00:20:30.070 to do with it. There's a bunch of ideas I have suggested there, and there's a few 00:20:30.070 --> 00:20:36.850 extras which I didn't have time to describe here. But now, let me turn to 00:20:36.850 --> 00:20:43.350 this title of the talk: How can you trust formally verified software? So, what I'm 00:20:43.350 --> 00:20:50.200 talking about here is: verifying a program against some specification. But, of 00:20:50.200 --> 00:20:54.510 course, programs don't just run in a vacuum. They run into some operating 00:20:54.510 --> 00:21:01.559 system that, they use some libraries and they're also written in some programming 00:21:01.559 --> 00:21:08.210 language. And, so, when you verify your program against your specification, you're 00:21:08.210 --> 00:21:15.309 also verifying them against the specifications of Linux, glibc and ISO-C, 00:21:15.309 --> 00:21:21.940 none of which have good specifications. In fact, they're just terrible specifications 00:21:21.940 --> 00:21:27.550 which I bear little resemblance to what compilers and operating systems actually 00:21:27.550 --> 00:21:34.559 do in practice. So, if you... the current state of things is: if you do verify your 00:21:34.559 --> 00:21:39.580 program against these specifications, you will find lots of bugs. You will make your 00:21:39.580 --> 00:21:47.961 software a lot more reliable, but you'll be doing it against specifications, which 00:21:47.961 --> 00:21:51.639 are probably not very good. Just as ARM's specification was not 00:21:51.639 --> 00:21:57.850 very good before I tested it really thoroughly. And so: Do I trust formally 00:21:57.850 --> 00:22:03.350 verified software? No, not really. It's a lot better for being formally verified, 00:22:03.350 --> 00:22:06.591 but I'd also want to test it quite thoroughly and maybe to use a bit of fuzz 00:22:06.591 --> 00:22:15.549 testing as well. Okay, so this is my final slide, by the way. So, I'm encouraging you 00:22:15.549 --> 00:22:19.500 to go out and do something with the specification. If you're interested in 00:22:19.500 --> 00:22:25.779 this, then do contact me! Do ask me questions, if you have trouble. I'm also 00:22:25.779 --> 00:22:31.759 going to mention: if there's any white hat hackers out there in the... white hat 00:22:31.759 --> 00:22:36.639 hackers in the audience, then do please talk to me or Milisch Meriac who's here in 00:22:36.639 --> 00:22:44.750 the front row, if you're interested in working at ARM and I like to thank a whole 00:22:44.750 --> 00:22:50.150 lot of people at ARM and elsewhere, who've helped me in this work and also I'd like 00:22:50.150 --> 00:22:54.149 to thank you for giving me your attention for the last half hour. 00:22:54.149 --> 00:22:59.789 Applause 00:22:59.789 --> 00:23:03.330 Herald: So, we have time for some 00:23:03.330 --> 00:23:06.340 questions. I would ask anyone that has a question to line up at one of the 00:23:06.340 --> 00:23:10.559 microphones that are in the aisles here one through eight. Questions for Alastair 00:23:10.559 --> 00:23:14.019 there about formal verification, about working at ARM, how is the weather in 00:23:14.019 --> 00:23:19.539 Cambridge. Try to keep it on topic. And signal angel: do we have already questions 00:23:19.539 --> 00:23:22.590 from online? Signal Angel: No questions yet. 00:23:22.590 --> 00:23:25.409 Herald: Okay, then let's go to microphone number one. 00:23:25.409 --> 00:23:33.150 Microphone 1: Hi... I was just AR: Maybe tiptoes. 00:23:33.150 --> 00:23:38.009 MIC 1: Yes, I was just curious how are you actually using the machine specification 00:23:38.009 --> 00:23:42.360 in order to generate VCs for the SMG solver. Because you didn't really get a 00:23:42.360 --> 00:23:47.759 chance to talk about that I guess. AR: Trying to think how I can describe 00:23:47.759 --> 00:23:56.639 that briefly... My basic idea is to take the specification, which basically... the 00:23:56.639 --> 00:24:00.630 specification is describing a piece of hardware. And so, I try to do what a 00:24:00.630 --> 00:24:04.610 hardware engineer would do, if they were actually building a machine that 00:24:04.610 --> 00:24:09.360 implemented it. So I end up with something that's essentially a giant circuit. So, 00:24:09.360 --> 00:24:14.289 that was the graph I was describing. So, whenever this control flow, whenever the 00:24:14.289 --> 00:24:18.960 control flow joins back up, I have to introduce things to slide between the 00:24:18.960 --> 00:24:22.259 value of what was calculated in the left or the right. And so on. 00:24:22.259 --> 00:24:27.270 MIC 1: Yeah, I guess I'm mostly curious about, like, what logics you're using for, 00:24:27.270 --> 00:24:31.240 like, the solvers and stuff like that. AR: I see. Just very basic solvers because 00:24:31.240 --> 00:24:34.440 they run fastest. MIC 1: Then ugh 00:24:34.440 --> 00:24:40.539 H: Thank you. So, microphone 6 please. MIC 6: I was just wondering, if you could 00:24:40.539 --> 00:24:46.730 talk some little bit about the language you used to write your specification. 00:24:46.730 --> 00:24:54.309 AR: So this is a language which basically I inherited from ARM's documentation. So, 00:24:54.309 --> 00:24:58.440 all processors are described using pseudocode and what I realized was that 00:24:58.440 --> 00:25:02.789 the pseudocode the ARM had started writing was actually very close to being a 00:25:02.789 --> 00:25:06.879 language. And so, I sort of reverse engineered the language hiding inside the 00:25:06.879 --> 00:25:14.200 pseudocode, built some tools for it, and just kind of figured out what the language 00:25:14.200 --> 00:25:20.370 could possibly mean, given what I thought processors actually did. 00:25:20.370 --> 00:25:27.850 And the language itself is... it's just a very simple imperative language. It's 00:25:27.850 --> 00:25:33.740 actually got inherits from BBC basic for anyone who's about the same age as me and 00:25:33.740 --> 00:25:39.620 remembers BBC basic or it's a bit like Pascal... It's it's not a complicated 00:25:39.620 --> 00:25:44.059 language. It's actually designed so that as many people as possible can read it and 00:25:44.059 --> 00:25:47.540 know exactly what it says without any doubt, without having to consult a 00:25:47.540 --> 00:25:53.239 language lawyer. H: Signal angel? Yet anything? 00:25:53.239 --> 00:25:58.990 Signal Angel: Yes, now we've got a question: "Has ARM its own form of the 00:25:58.999 --> 00:26:01.099 Intel Management engine?" 00:26:08.950 --> 00:26:12.220 AR: No is the short answer. Yeah, I don't 00:26:12.220 --> 00:26:20.160 think we have anything quite like that. If you... yeah, I'll just say no. 00:26:20.160 --> 00:26:22.929 Laughter H: Microphone one. 00:26:22.929 --> 00:26:27.970 MIC 1: Hi! On that question that we had before about the specification language: 00:26:27.970 --> 00:26:33.779 Have you considered using Z3's own language for expressing the instructions? 00:26:33.779 --> 00:26:41.380 AR: So, Z3's own language is basically write kind of s-expressions, which, if you 00:26:41.380 --> 00:26:46.690 like lisp is wonderful but for anybody who doesn't like Lisp it's a bit of a turn-off 00:26:46.690 --> 00:26:50.169 and a bit of a barrier to understanding it. So, again the specification is 00:26:50.169 --> 00:26:54.410 designed so that people can look at it and go: "Oh, I see what's going on here", and 00:26:54.410 --> 00:26:57.259 not have... and I just try to minimize the barriers. 00:26:57.259 --> 00:27:03.230 MIC 1: Fair enough! H: Last call, signal angel! 00:27:04.684 --> 00:27:07.923 SA: How long does the complete test of the arms specification take? 00:27:09.540 --> 00:27:20.929 AR: About two years. So, yeah, so a modern processor team about 80% of that team will 00:27:20.929 --> 00:27:26.679 be verification engineers. And, so, they're basically writing new tests, 00:27:26.679 --> 00:27:30.150 running old tests, diagnosing them, just doing that continuously for the entire 00:27:30.150 --> 00:27:34.960 life of a project, which is probably about three years. And after about the first 00:27:34.960 --> 00:27:38.860 year, you basically have a processor that it mostly works, and after that it's kind of 00:27:38.860 --> 00:27:47.590 debugging it and it's, you know, kind of fine-tuning the performance. So, yeah, it 00:27:47.590 --> 00:27:53.929 takes a really long time. To run the actual tests, I don't actually know 00:27:53.929 --> 00:28:00.400 because actually one of my colleagues in the audience, they've actually run the 00:28:00.400 --> 00:28:07.960 tests. But, yeah, I don't know... and we use a lot of processors in parallel, so I 00:28:07.960 --> 00:28:12.620 really don't have an idea. H: Thank you so much, Alastair! Let's give 00:28:12.620 --> 00:28:18.476 him another warm round of applause! Applause 00:28:18.476 --> 00:28:24.017 34c3 outro 00:28:24.017 --> 00:28:40.000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!