35C3 preroll music

Herald Angel: OK. Our next speaker will be trying to tame the chaos: Peter Sewell, from the University of Cambridge. A warm round of applause please. applause

Sewell: Thank you! Okay. So here we are, C3 again, with a program full of fascinating and exciting and cool stuff, and if we look just at the security talks, all manner of interesting things. I'm going to go to a lot of these. You should too. Let's see, though, if we read some of the titles: exploiting kernel memory corruptions, attacking end-to-end email encryption. What else have we got: modern Windows userspace exploitation, compromising online accounts. OK, so a lot of these talks, as usual, are explaining to us that our computers are not doing what we would like, and this, as usual, is the merest tip of a tiny iceberg. We have hundreds of thousands of known vulnerabilities, and many unknown vulnerabilities. It's a bit dark in here, but let me try some audience participation. How many of you have in the last year written at least a hundred lines of code? And of those people, keep your hands up if you are completely confident that code does the right thing. laughter I would like to talk to you later laughter and so would the people in the self-driving car that you're probably engineering. So how many unknown vulnerabilities, then? Well, you can take what's in your mind right now and multiply it by... oh, this doesn't work very well. No. No, no. Computers, they do the wrong thing again and again and again. laughter and some applause We can multiply by this estimate of about 100 billion lines of new code each year. So if we take all of this — these talks, these numbers — these should make us not happy and excited; these should make us sad, very sad, and frankly rather scared of what is going on in the world. So what can we do? We can, option one, give up using computers altogether. applause In most audiences this would be a hard sell, but here I'm sure you're all happy to just turn them off now. Or we can throw up our hands in the air and collapse into some kind of pit of despair. Well, maybe, maybe not. My task today is to give you a tiny smidgen, a very, very small amount of hope that it may be possible to do slightly better. But if we want to do better, we first have to understand why things are so bad. If we look at our aeronautical engineering colleagues, for example, they can make planes which very rarely fall out of the sky. Why is it that they can do that, and we are so shit at making computers? Well, I'm going to reuse a picture that I used at 31C3, which is still the best description I can find of the stack of hardware and software that we live above. Here we go. There's so much information in this, it's just ace. So we see down here all of this transparent silicon: that's the hardware we live above. We see a stack of phases of a C compiler; we see all kinds of other bits and pieces. There might be a slightly hostile wireless environment in this room, for some reason. If we look at this and think about the root causes for why our systems are so bad, we can see terrible things. The first is, obviously, that there's a lot of legacy complexity that we're really stuck with. If you pull out one of those pieces from the middle and try and replace it with something which is not of exactly the same shape, the whole pile will collapse. So we somehow need to find an incremental path to a better world.
And then — and this is the most fundamental reason that these are not like aeroplanes — these systems are discrete, not continuous. If you take an honest girder made out of a piece of steel and you push on it a little bit, it moves a little bit, basically in proportion. If it is 1 percent too strong or 1 percent too weak, it basically doesn't matter. But in these things, one line of code can mean you're open to a catastrophic exploit, and that's one line in many, many million. OK. And next thing... this really doesn't work. They're very complicated. But the scary thing is not the static complexity of those lines of code and the number of components, although that's pretty intimidating; the really scary thing is the exponential number of states and execution paths. These two together mean that testing — and testing is the only way we have to build systems which are not instantly broken — testing can never save us from these exploitable errors. And that means ultimately we need to do proof: honest, machine-checked mathematical proof. And this also tells us that we need to arrange for our systems to be secure for simple reasons, reasons that do not depend on the correctness of all of those hundred billion lines of code. Then another thing that we have here: all these interfaces — the architecture interface, the C language interface, the sockets API interface, the TCP interface. All of these we rely on to let different parts of the system be engineered by different organizations. But they're all really badly described and badly defined. So what you find is, for each of these, typically a prose book, varying in thickness between about that and about that, full of text. We still rely on testing — limited though it is — but you can never test anything against those books. So we need instead interface descriptions, definitions, specifications that are mathematically rigorous and that are executable — not in the normal sense of executable as an implementation, but executable as a test oracle. So you can compute whether some observed behaviour is allowed or not, and not have to read the book and argue on the Internet. And we also need interface definitions that support this proof that we need. And then, all of this stuff was made up in the 1970s or the sixties, and in the 70s and the 60s the world was a happy place, and people walked around with flowers in their hair, and nothing bad ever happened. That is no longer the case. So we need defensive design, but we need defensive design that is somehow compatible, or can be made compatible enough, with this legacy investment. That's quite hard. And then — I can't say very much about this, but we have at the moment very bad incentives, because we have a very strong incentive to make things that are more complex than we can possibly handle or understand. We make things, we add features, we make a thing which is just about shippable, and then we ship it. And then we go on to the next thing. So we need economic and legal incentives for simplicity and for security that we do not yet have. But it's hard to talk about those until we have the technical means to react to those incentives. OK, so what I'm going to do now is talk about two of these interfaces, the architecture interface and the C language interface, and see what we can do to make things better. A lot of this has to do with memory. So whoever it was that picked the subtitle for this meeting, "refreshing memories": thank you, because it's great, I love it.
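To make the test-oracle idea concrete, here is a minimal sketch in C: a toy executable specification of a sequential memory, where a read must return the last value written to that address. The event type and this single property are invented purely for illustration; the real REMS specifications are far richer, but the principle — compute whether an observed behaviour is allowed, rather than arguing over a prose book — is the same.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A toy executable spec: an observed trace of memory events is allowed
   iff every READ returns the last value WRITTEN to its address
   (memory assumed zero-initialised). Names are ours, for illustration. */
typedef enum { WRITE, READ } kind;
typedef struct { kind k; uint64_t addr, value; } event;

bool spec_allows(const event *trace, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (trace[i].k != READ) continue;
        uint64_t last = 0;
        for (size_t j = 0; j < i; j++)
            if (trace[j].k == WRITE && trace[j].addr == trace[i].addr)
                last = trace[j].value;
        if (trace[i].value != last) return false;  /* behaviour not allowed */
    }
    return true;
}

int main(void) {
    /* The third event reads 9 where 7 was written: the oracle rejects it. */
    event t[] = { {WRITE, 16, 7}, {READ, 16, 7}, {READ, 16, 9} };
    printf("%s\n", spec_allows(t, 3) ? "allowed" : "not allowed");
    return 0;
}
```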
Let's refresh your memory quite a way. I invite you to think back to when you were very, very young, in about 1837, when cool hacking was going on: Charles Babbage was designing the Analytical Engine. And even then, you see, there was this dichotomy between a mill performing operations and a store holding numbers. This is a plan view of the Analytical Engine — well, it was vaporware, but a design for the Analytical Engine — and you see here these circles: these are columns of numbers, each made out of a stack of cogs, maybe a 50-digit decimal number in each of those columns. And this array, really, he imagined it going on out to over there somewhere, with about 1000 numbers. So even then you have a memory which is an array of numbers. I think — I don't think you could do address computation on these things, I think addresses were only constants — but still, basically an array of numbers. So, okay, what have we got now? Let's look a bit at C. How many of those people who have programmed 100 lines of code, how many of you were C programmers? Quite a few. Or maybe you're just embarrassed. I forgot to say: those that didn't put your hands up for that first question, you should feel proud, and you should glory in your innocence while you still have it. You are not yet complicit in this trainwreck. It worked then. Okay, so here's a little piece of C code, which I shall explain, in several different ways. So we start out. It creates two locations, x and secret... Don't do that. This is so bad. It creates x, storing 1, and secret_key, which stores — I might get this wrong — your PIN. And then there's some computation which is supposed to only operate on x, but maybe it's a teensy bit buggy, or corrupted by somebody. We make a pointer to x, and in this execution x just happened to be allocated at 14, so this pointer contains the number 14, and then we add one to that pointer. It's C, so adding one to the pointer really means adding the four bytes which are the size of the thing it points to. So we add 4 to 14 and we get 18. And then we try and read the thing which is pointed to. What's going to happen? Let me compile this and run it in a conventional way. It printed the secret key. Bad. This program — rather distressingly, this is characteristic of, by no means all security flaws, but a very disturbingly large fraction of them: they are basically doing this. So does C really let you do that? Yes and no. If you look at the C standard, which is one of these beautiful books, it says — you have to read moderately carefully to understand this — but it says that this program has undefined behaviour. Many of you will know what that means, but others won't. So that means that, as far as the standard is concerned, for programs like that there is no constraint on the behaviour of the implementation at all. Or, said another way, and maybe a more useful way: the standard lets implementations — so, C compilers — assume that programs don't do this kind of stuff, and it's the programmer's responsibility to ensure that. Now, in about 1970, '75, maybe 1980, this was a really good idea; it gives compilers a lot of flexibility to do efficient implementation. Now, not so much. How does this work in the definition? So the standard somehow has to be able to look at this program and identify it as having this undefined behaviour.
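A reconstruction of the kind of code on the slide — the concrete value of the secret and the addresses are illustrative assumptions, not taken from the talk:

```c
#include <stdio.h>

int main(void) {
    int x = 1;             /* suppose x happens to be allocated at address 14 */
    int secret_key = 4091; /* ...and secret_key at 18, immediately after x    */
    int *p = &x;           /* p holds the address of x: 14                    */
    p = p + 1;             /* pointer + 1 means + sizeof(int): 14 + 4 = 18    */
    printf("%d\n", *p);    /* undefined behaviour: compiled conventionally,   */
                           /* this reads and prints the secret key            */
    return 0;
}
```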
And indeed, if we just think about the numbers in memory, this 18 points to a perfectly legitimate storage location, and that plus-one was also something that you're definitely allowed to do in C. So the only way that we can know that this is undefined behaviour is to keep track of the original allocation of this pointer. Here I've got these allocation identifiers, @3, @4, @5, and you see, on this pointer, despite the fact that I added 4 to it, I've still remembered the allocation ID. So I can check — or the standard can check — when we try and read from this, whether that address is within the footprint of that original allocation, i.e. is within there. And in fact it is not: it's over here; it is not within there at all. So this program is deemed to have undefined behaviour. Just to clarify something people often get confused about: we detect undefined behaviour here, but it isn't really a temporal thing. The fact that there is undefined behaviour anywhere in the execution means the whole program is toast. Okay, but this is really interesting, because we're relying, for this critical part of the standard, on information which is not there at run time in a conventional implementation. Just to emphasize that point: if I compile this program, let's say to ARM assembly language, I get a sequence of store and load and add instructions — store, load, add, store, load, load. And if I look at what the ARM architecture says can happen, these blue transitions... one thing to notice is that we can perfectly well do some things out of order: at this point we could either do this load, or we could do that store. That would be a whole other talk; let's stick with the sequential execution for now. What I want to emphasize is that this load of this address really was loading a 64-bit number, which was the address of x, and adding 4 to it, and then loading from that address the secret key. So there is no trace of these allocation IDs. No trace at all. Okay, let me step back a little bit. I've been showing you some screenshots of C behaviour and ARM behaviour, and I claim that these are actually showing you all of the behaviours allowed by what one would like the standards to be for these two things. Really, what I've been showing you are GUIs attached to mathematical models that we've built in a big research project over the last many years: REMS, "Rigorous Engineering of Mainstream Systems". Sounds good — an aspirational title, I would say. And in that we've built semantics — mathematical models defining the envelope of all allowed behaviour — for a quite big fragment of C, and for the concurrency architectures of major processors, ARM and x86 and RISC-V and IBM POWER, and also for the instruction set architectures of these processors, the details of how individual instructions are executed. In each case these are specs that are executable as test oracles: you can compare, algorithmically, some observed behaviour against whether the spec says it's allowed or not, which you can't do with those books. It's just an idiot-simple idea, but for some reason the industry has not taken it on board at any time in the last five decades. It's not that hard to do. These are also mathematical models — I'll come back to that later. But another thing I want to emphasize here is that in each of these cases, especially the first two, when you start looking really closely, then you learn that what you have to do is not build a nice clean mathematical model of something which is well understood.
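A sketch of the extra bookkeeping such a semantics carries around: a pointer value is an (allocation ID, address) pair, not a bare number, and a load is defined only if it lands within the footprint of the allocation its provenance names. The types and names here are invented for illustration; they are not the actual REMS definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* In the semantics (not in a conventional implementation!), a pointer
   value carries its provenance as well as its numeric address. */
typedef struct {
    unsigned alloc_id;  /* which allocation it came from: @3, @4, @5, ... */
    size_t   addr;      /* the numeric address, e.g. 18 */
} sem_pointer;

typedef struct {
    unsigned alloc_id;
    size_t   base, size;  /* the footprint of the allocation */
} allocation;

/* A load of nbytes via p is defined only if it falls within the
   footprint of the allocation that p's provenance names. */
bool load_is_defined(sem_pointer p, allocation a, size_t nbytes) {
    return p.alloc_id == a.alloc_id
        && p.addr >= a.base
        && p.addr + nbytes <= a.base + a.size;
}
```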
What you learn is that there are real open questions — for example, within the C standards committee, and within ARM a few years ago — about exactly what these things even were. And that is a bit disturbing, given that these are the foundations for all of our information infrastructure. There is also a lot of other work going on with other people within this REMS project, on WebAssembly and JavaScript and file systems and other concurrency. I don't have time to talk about those, but Hannes Mehnert is going to talk to us a little bit about TCP later today. You should go to that talk too, if you like this one. If you don't like this one, you should still go to that talk. Okay, so this is doing somewhat better — certainly more rigorous engineering of specifications — but so far only for existing infrastructure: for this C language, the ARM architecture, what have you. So what if we try and do better? What if we try and build better security in? The architectures that we have really date back to the 1970s, or even the 60s, with the idea of virtual memory. That gives you — I won't go into exactly what it is — but that gives you very coarse-grained protection. You can have your separate processes isolated from each other, and on a good day you might have separate browser tabs isolated from each other, in some browsers, sometimes. But it doesn't scale: it's just too expensive to have lots of little compartments. One thing we can do is certainly design much better programming languages than C, but all of that legacy code has a massive inertia. So an obvious question is whether we can build better security into the hardware, in a way that doesn't need some kind of massive pervasive change to all the software we ever wrote. And that's the question that a different large project, the CHERI project, has been addressing. This is something that's been going on for the last 8 years or so, mostly at SRI and at Cambridge, funded by DARPA and the EPSRC and ARM and other people, mostly led by Robert Watson, Simon Moore, Peter Neumann, and a cast of thousands. And me, a tiny bit. So... oh, don't do that. I should learn to stop pushing this button. It's very hard to not push the button. So here's the question, in a more focused way, that CHERI is asking: can we build hardware support which is both efficient enough and deployable enough that gives us both fine-grained memory protection — to prevent the kind of bug that we were talking about earlier, well, that kind of leak at least — and also scalable compartmentalization, so you can take bits of untrusted code and put them in safe boxes and know that nothing bad is going to get out? That's the question. The answer... the basic idea of the answer is pretty simple, and it also dates back to the 70s: add hardware support for what people call capabilities, and we'll see that working in a few moments. The idea is that this will let programmers exercise the principle of least privilege — with each little bit of code having only the permissions it needs to do its job — and also the principle of intentional use: each little bit of code, when it uses one of those capabilities, will be required to use it explicitly rather than implicitly, so the intention of the code has to be made plain. So let's see how this works. These capabilities are basically replacing some, or maybe all, of the pointers in your code.
So if we take this example again: in ISO C we had an address and — the standard is not very clear about this, but we're trying to make it more clear — an allocation ID, or something like it. Now in CHERI C we can run the same code, but we might represent this pointer at runtime not just with that number, but with something that contains that number, and the base of the original allocation, and the length of the original allocation, and some permissions. And having all of those in the pointer means that the hardware can do, at run time, at access time, a very efficient check that this is within this region of memory. And if it's not, it can fail-stop and trap. No information leak. I need a bit more machinery to make this actually work. It would be nice if all of these were 64-bit numbers, but then you get a 256-bit pointer, and that's a bit expensive, so there's a clever compression scheme that squeezes all this down into 128 bits, with enough accuracy. And then you need to design the instruction set of the CPU carefully to ensure that you can never forge one of these capabilities with a normal instruction. You can never just magic one up out of a bunch of bits that you happen to find lying around. So all the capability-manipulating instructions will take a valid capability and construct a new capability, possibly smaller in memory extent, or possibly with fewer permissions. They're never going to grow the memory extent or add permissions. And when the hardware starts, at hardware initialization time, the hardware will hand the software a capability to everything, and then, as the operating system works and the linker works and the C language allocator works, these capabilities will be chopped up into ever finer and smaller pieces, to be handed out to the right places. And then I need one more little thing. In this C language world we're living in here — or really a machine-code world — there's nothing stopping code writing bytes of stuff. So you have to somehow protect these capabilities against being forged by the code, either on purpose or accidentally, writing bytes over the top of them. You need to preserve their integrity. That's done by adding, for each of these 128-bit-sized, aligned units of memory, just one extra bit. That holds a tag, and that tag records whether this memory holds a currently valid capability or not. So for all the capability-manipulating instructions, if they're given a valid capability with the tag set, then the result will still have the tag set. But if you write — if you just wrote that byte there, which might change the base — then the hardware will clear that tag. And these tags are not conventionally addressable; they don't have addresses. They just sit there in the memory system of the hardware. So there is no way that software can inaudible these. So this is really cool. We've taken what in ISO, in the standard, was undefined behaviour, and in implementations was an information leak, and we've turned it into something that, in CHERI C, in both the specification and the implementation, is guaranteed to trap, to fail-stop, and not leak that information. So, another few things about the design that I should mention. All these blue things, I think I've talked about. But a lot of care has gone in to make this very flexible, to make it possible to adopt it. So you can use capabilities for all pointers, if you want to, or just at some interfaces.
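A conceptual sketch of the uncompressed capability just described, together with the bounds-and-permissions check the hardware performs on each access. The field names and permission constants are invented for illustration; real CHERI compresses base and length into a 128-bit format, with the one-bit validity tag held out of band.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERM_LOAD  (1u << 0)   /* illustrative permission bits */
#define PERM_STORE (1u << 1)

/* The conceptual, uncompressed view of a capability. The validity
   tag lives alongside it, in memory that has no address. */
typedef struct {
    uint64_t cursor;   /* the address being pointed at, e.g. 18 */
    uint64_t base;     /* base of the original allocation       */
    uint64_t length;   /* length of the original allocation     */
    uint32_t perms;    /* load/store/execute/... permissions    */
    bool     tag;      /* valid-capability tag (out of band)    */
} capability;

/* Roughly what the hardware checks on every load via a capability;
   on failure it fail-stops with a trap rather than leaking data. */
bool load_permitted(capability c, size_t nbytes) {
    return c.tag                                   /* not forged          */
        && (c.perms & PERM_LOAD)                   /* allowed to load     */
        && c.cursor >= c.base                      /* within the original */
        && c.cursor + nbytes <= c.base + c.length; /* ...allocation       */
}
```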
You might, for example, use them at the kernel–userspace interface. It coexists very nicely with existing C and C++: you don't need to change the source code very much at all to start using these — and we'll see a tiny bit of data about this. It coexists with the existing virtual memory machinery, so you can use both capabilities and virtual memory if you want, or just capabilities, or just virtual memory. And then there's some more machinery, which I'm not going to talk about, for doing this secure encapsulation stuff. What I've talked about so far is basically the fine-grained memory protection story. And I should say that the focus so far is on spatial memory safety — so that's not, in the first instance, protecting against reuse of an old capability in a bad way — but there are various schemes for supporting temporal memory safety with basically the same setup. Okay, so how do we... Okay, this is some kind of a high-level idea. How do we know that it can be made to work? Well, the only way is to actually build it and see. And this has been a really interesting process, because mostly, when you're building something, either in academia or in industry, you have to keep most of the parts fixed; I mean, you are not routinely changing both the processor and the operating system, because that would just be too scary. But here we have been doing that, and indeed we've really been doing a three-way codesign: building hardware, and building and adapting software to run above it, and also building these mathematical models, all at the same time. This is — well, A, it's a lot of fun, because you can't often get groups of people together that can do all of those things, but B, it's really effective. So what we've produced, then, is an architecture specification — in fact extending MIPS, because it was conveniently free of IP concerns — and that specification is one of these mathematically rigorous things, expressed in a domain-specific language specifically for writing instruction set architecture specifications; we've designed and implemented actually two of those. Then there are hardware processor implementations that actually run on FPGAs, and a lot of hardware research devoted to making this go fast and be efficient despite these extra tags and whatnot. And then there's a big software stack adapted to run over this stuff, including Clang and a FreeBSD kernel and the FreeBSD userspace and WebKit and all kinds of pieces. So this is a very major evaluation effort, of a kind that one is not normally able to do. This — this combination of things — is why that list of people up there was about 40 people. I say this is based on MIPS, but the ideas are not specific to MIPS; there are ongoing research projects to see if this can be adapted to the ARM application-class architecture and the ARM microcontroller architecture and to RISC-V. We'll see how that goes, in due course. Okay, so then, with all of these pieces, we can evaluate whether it actually works. That's still kind of an ongoing process, but the data that we've got so far is pretty encouraging. So we see here, first, performance: you see maybe a percent or two of runtime overhead in many cases. There are workloads where it performs badly — if you have something that's very pointer-heavy, then the extra pressure from those larger pointers will be a bit annoying — but really it seems to be surprisingly good. Then: is it something that can work without massive adaptation of the existing code?
Well, in this port of the FreeBSD userspace — so, all the programs that run — there were something like 20,000 files, and only around 200 of those needed a change. And that's been done by one or two people in not all that large an amount of time. Some of the other code, that's more like an operating system, needs a bit more invasive change, but still, it seems to be viable. Is it actually more secure? How are you going to measure that? We like measuring things; on the whole we try and do science where we can. Can we measure that? Not really. But it certainly does — the whole setup certainly does — mitigate against quite a large family of known attacks. I mean, if you try and run Heartbleed, you'll get a trap. It will say "trap". I think it will say "signal 34", in fact. So that's pretty good. And if you think about attacks: very many of them involve a chain of multiple flaws, put together by an ingenious member of the C3 community or somebody else, and very often at least one of those is some kind of memory-access or permission flaw. Okay, so this is nice. But I was talking a lot, in the first part of the talk, about rigorous engineering. Here we've got formal definitions of the architecture, and that's been really useful for testing against, and for doing test generation on the basis of, and for doing specification coverage in an automated way. But surely we should be able to do more than that. So this formal definition of the architecture: what does it look like? For each instruction of this CHERI MIPS processor there's a definition, and that definition looks a bit like normal code. Here's the definition of how the instruction for "capability increment cursor immediate" goes. This is a thing that takes a capability register and an immediate value, and it tries to add something onto the cursor of that capability. And what you see here is something that looks like code, that defines exactly what happens, and also defines exactly what happens if something goes wrong — you can see there's some kind of an exception case there; yeah, that's raising a processor exception. So it looks like code, but from this we can generate both reasonably high performance emulators, in C and in OCaml, and also mathematical models in theorem provers. Three of the main theorem provers in the world are called Isabelle and HOL and Coq, and we generate definitions in all of those. So then we've got a mathematical definition of the architecture. Then, finally, we can start stating some properties. So CHERI was designed with some kind of vague security goals in mind from the beginning, but now we can take, let's say, one of those goals and make it into a precise thing. This is a kind of secure-encapsulation kind of thing, not exactly the memory leak that we were looking at before — sorry, the secret leak that we were looking at. So let's imagine some CHERI compartment. I haven't told you exactly what that means, but let's suppose that that's understood, and that inside that compartment there is a piece of software running. That might be maybe a video codec, or a web browser, or even a data structure implementation, or a C++ class and all of its objects, maybe. So imagine that compartment running, and imagine the initial capabilities that it has access to, via registers and memory, and all the capabilities it has access to via the memory that it has access to, and so on, recursively. So imagine all of those capabilities.
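The actual definitions are written in the project's ISA-specification language, not in C; but as a rough illustration of their shape, here is a transliteration into C of what an instruction like "capability increment cursor immediate" might look like. The helper names, exception codes, and the exact checks are our assumptions, not the real CHERI-MIPS definition.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    bool     tag;      /* valid-capability tag              */
    bool     sealed;   /* sealed capabilities are immutable */
    uint64_t cursor, base, length;
    uint32_t perms;
} capability;

/* Hypothetical exception code and helper, for illustration only. */
enum cap_exn { CAPEX_SEAL_VIOLATION };

void raise_capability_exception(enum cap_exn e, unsigned reg) {
    fprintf(stderr, "capability exception %d on c%u\n", (int)e, reg);
    exit(1);
}

/* Roughly the shape of "capability increment cursor immediate":
   cd := cb with its cursor moved by imm. Note that bounds are checked
   when the capability is used to access memory, not here. */
void cincoffset_immediate(capability *cd, capability cb,
                          unsigned cb_reg, int64_t imm) {
    if (cb.tag && cb.sealed) {
        raise_capability_exception(CAPEX_SEAL_VIOLATION, cb_reg);
    } else {
        *cd = cb;
        cd->cursor = cb.cursor + (uint64_t)imm;
    }
}
```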
So the theorem says: no matter how this code executes, whatever it does, however compromised or buggy it was, in any execution up until the point at which it raises an exception or makes an inter-compartment call, it cannot make any access which is not allowed by those initial capabilities. You have to exclude exceptions and inter-compartment calls because, by design, they do introduce new capabilities into the execution. But until that point, you get this guarantee. And this is something that we've proved — actually [...] has proved — with a machine-checked proof, in fact in the Isabelle theorem prover. So this is a fact which is about as solidly known as any fact the human race knows. These provers check a mathematical proof in exceedingly great detail, with great care and attention, and they're structured in such a way that the soundness of the prover depends only on some tiny little kernel. So conceivably cosmic rays hit the transistors at all the wrong points, but basically we know this for sure. I emphasize this is a security property; we have not proved — we certainly wouldn't claim to have proved — that CHERI is secure, and there are all kinds of other kinds of attack that this statement doesn't even address. But at least this one property we know for real. So that's kind of comforting, and not a thing that conventional computer science and engineering has been able to do, on the whole. So, are we taming the chaos, then? Well, no. Sorry. But I've shown you two kinds of things. The first was showing how we can do somewhat more rigorous engineering for that existing infrastructure. It's now feasible to do that, and in fact I believe it has been feasible to build specifications which you can use as test oracles for many decades; we haven't needed any super-fancy new tech for that, we've just needed to focus on that idea. So that's for existing infrastructure. And then CHERI is a relatively lightweight change that does build in improved security, and we think it's demonstrably deployable, at least in principle. Is it deployable in practice? Are we going to have this in all of our phones five years from now? Will those all be CHERI-enabled? I can't say. I think it is plausible that that should happen, and we're exploring various routes that might mean that that happens, and we'll see how that goes. Okay, so with that, I'm aiming to have given you, not a whole lot of hope, but just a little bit of hope that the world can be made a better place, and I encourage you to think about ways to do it, because Lord knows we need it. But maybe you can at least dream, for a few moments, of living in a better world. And with that, I thank you for your attention. applause

Herald Angel: Thanks very much. And with that we'll head straight into the Q&A. We'll just start with mic number one, which I can see right now.

Q: Yes, thanks for the very encouraging talk. I have a question, not about the C code, but about the part below that. The theorem that you stated assumes that the machine description matches the hardware, at least describes the hardware accurately. But are there any attempts to check that the hardware actually works like that? That there are no holes in the hardware? That no combination of machine commands works differently, or allows memory access that is not in the model? So, is it possible to take hardware designs and check that they match the spec, and will Intel or AMD try to check their models against that spec?

Sewell: OK.
So the question is basically: can we do provably correct hardware underneath this architectural abstraction? And the answer is... So it's a good question. People are working on that, and it can be done, I think, at least on a modest academic scale. Trying to do that in its full glory, for an industrial superscalar processor design, is right now out of reach. But it's certainly something one would like to be able to do. I think we will get there, maybe in a decade or so. A decade is quick.

Herald Angel: Thanks. Mic number four is empty again, so we'll take the Internet.

Sewell: Where is the Internet?

Signal Angel: The Internet is right here, and everywhere. So "ruminate" would like to know: is the effort and cost of building security into hardware, as you've described in your talk, significantly greater or less than the effort and cost of porting software to more secure languages?

Sewell: So, is the effort of building new hardware more or less than the cost of porting existing software to better languages? I think the answer has to be yes, it is less. The difficulty with porting software is that all of you, and all your colleagues, and all of your friends, and all your enemies, have been beavering away writing lines of code for 50 years — really, I don't know what to say, effectively; really numerously. There's an awful lot of it. It's a really terrifying amount of code out there. It's very hard to get people to rewrite it.

Signal Angel: OK, mic two.

Q: OK, given that cost of ... of today on the software level, not on the hardware level: is it possible to go halfway, using the entire system as an oracle? So during development the code runs on the secure hardware, but then you essentially ship the same software to insecure hardware and inaudible

Sewell: So this question, if I paraphrase it, is: can we use this hardware implementation really as the perfect sanitizer? And I think there is scope for doing that. You can even imagine running this not on actual silicon hardware but in, like, a QEMU implementation, which we have, in order to do that. And I think for that purpose it could already be a pretty effective bug-finding tool. It would be worth doing. But, as for any testing, it will only explore a very tiny fraction of the possible paths. So we would like to have it always on, if we possibly can.

Herald Angel: OK. The Internet again.

Signal Angel: OK, someone would like to know: how does your technique compare to Intel MPX, which failed miserably, with inaudible ignoring it. Will it be better or faster? Could you talk a little bit about that?

Sewell: I think for that question — for a real answer to that question — you would need one of my more hardware-oriented colleagues. What I can say is they make nasty faces when you say MPX.

Herald Angel: Well, that's that. Mic number one.

Q: Somewhat inherent to your whole construction was this idea of having an unchangeable bit which describes whether your pointer contents have been changed. Is this even something that can be done in software? Or do you have to construct a whole extra RAM or something?

Sewell: So, you can't do it exclusively in software, because you need to protect it. In the hardware implementations that my colleagues have built, it's done in a reasonably efficient way: it's cached and managed in clever ways to ensure that it's not terribly expensive. But you do need slightly wider data paths in your cache protocol, and things like that, to manage it.

Herald Angel: OK. Mic seven.
Q: How well does this system help with securing control-flow integrity, compared to other systems, like hardware data-flow isolation or code-pointer integrity in ARM inaudible?

Sewell: Ah, well, that's another question which is a bit hard to answer, because it depends somewhat on how you're using the capabilities. So you can arrange here that each C language allocation is independently protected, and indeed you can also arrange — and this is sensible — that each subobject is independently protected. And those protections are going to give you protection against trashing bits of the stack, because they'll restrict the capability to just those objects.

Herald Angel: OK, the Internet again.

Signal Angel: Twitter would like to know: is it possible to run CHERI C without custom hardware?

Sewell: Can you run CHERI C without custom hardware? The answer is: you can run it in emulation, above this QEMU. That works pretty fast — I mean, fast enough for debugging, as the earlier question was talking about. You can imagine, you know, somewhat faster emulations of that. It's not clear how much it's worth going down that route. The real potential big win is to have this be always on, and that's what we would like to have.

Herald Angel: OK. Mic one.

Q: Do you have some examples of the kinds of code changes that you need in the operating system and userland that you mentioned?

Sewell: So, okay, what kind of changes do you need to adapt code to CHERI? If you look at C code, there is the C standard, which deems a whole lot of things to be undefined behaviour, and then if you look at actual C code, you find that very, very much of it depends on some idiom that the C standard does not approve of. So there are, for example, pause there are quite a lot of instances of code that constructs a pointer by doing more than a plus-one offset, going out of range, and then brings it back within range before using it. In fact, in CHERI C many of those cases are allowed, but not all of them. There are more exotic cases, where there's really some crazy intra-object stuff going on, or pointer arithmetic between objects — which the C standard is certainly quite down on, but which does happen — and code which is, say, manipulating the low-order bits of a pointer to store some data in them. That's pretty common; you can do it in CHERI C, but you might have to adapt the code a little bit — see the sketch after this exchange. Other cases are more fundamental. So, say, in operating system code, if you swap out a page to disk and then you swap it back in, then somehow the operating system has to reconstruct new capabilities, from a larger capability which it kept for the purpose. So that needs a bit more work. So it's kind of a combination: some things you would regard as good changes to the code anyway, and others are more radical.

Herald Angel: OK, another one from the Internet.

Signal Angel: I lost one pause

Sewell: The Internet has gone quiet. Yes.

Signal Angel: Last question from the Internet: are there any plans to test the impact on Linux, or just FreeBSD?

Sewell: If anyone has a good number of spare minutes, then that would be lovely to try — the impact on Linux, that is, not just FreeBSD. The BSD code base is simpler, and maybe cleaner, and my systemsy colleagues were already familiar with it, so it was the obvious target for an academic project, and doing it was already a humongous amount of work. So I think we would love to know how Linux, and especially how Android, could be adapted, but it's not something that we have the resources to do at the moment.
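As a concrete illustration of that last idiom — entirely our own example, not code from the talk — here is the classic trick of storing a few bits of data in the low-order bits of an aligned pointer. Under CHERI C, round-tripping through uintptr_t (which becomes a capability-sized type) largely still works, but code that manufactures pointers from bare integers does not; the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node { struct node *next; } node;

/* An 8-byte-aligned pointer has three spare low bits, so code often
   stashes a small "colour" in them (e.g. in red-black trees). */
static node *tag_ptr(node *p, unsigned colour) {
    assert(colour < 8);
    return (node *)((uintptr_t)p | colour);
}
static node *untag_ptr(node *p) {
    return (node *)((uintptr_t)p & ~(uintptr_t)7);
}
static unsigned colour_of(node *p) {
    return (unsigned)((uintptr_t)p & 7);
}

int main(void) {
    node *n = malloc(sizeof *n);        /* malloc results are aligned */
    node *t = tag_ptr(n, 5);
    assert(colour_of(t) == 5 && untag_ptr(t) == n);
    free(n);
    return 0;
}
```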
Herald Angel: Mic number one again.

Q: How likely or dangerous...

Herald Angel: Mic number one is not working.

Q: How likely or dangerous do you think it is for a programmer to screw up the capabilities he specifies?

Sewell: So, how often will the programmer screw up the capabilities? In many cases the programmer is just doing an access with a C or C++ pointer, or a C++ object, in the normal way; they're not manually constructing these things. So a lot of that is going to just work. If you're building some particular secure encapsulation setup, you have to be a bit careful. But on the whole, I think this is a small risk compared to the state of software as we have it now.

Herald Angel: OK. Mic eight.

Q: So, existing memory protection techniques are quite patchworky — like canaries and address layout randomization. Is this a system designed to replace or to supplement those measures?

Sewell: I think, if you pause So again, it depends a bit on how you use it. If you have capabilities really everywhere, then there's not much need for address space layout randomization or canaries to protect against explicit information leaks. I imagine that you might still want randomization to protect against some side-channel flows; whether you'd actually want canaries for side-channel flows — I don't know. That's a good question.

Herald Angel: Mic four, please.

Q: Thanks for the very interesting talk. You presented, with CHERI, a system of verifying existing C software, sort of post hoc, and improving its security via a run-time mechanism...?

Sewell: Improving or proving? Improving yes, proving no, yes.

Q: So what's your opinion on the viability of using static analysis and static verification for that? Would it be possible to somehow analyse software that already exists, and is written in these unsafe languages, and show that it nevertheless has some security properties, without writing all the proofs manually in, like, HOL or Isabelle?

Sewell: Well, so, if you want to analyse existing software... Static analysis is already useful for finding bugs in existing software. If you want to have assurance from static analysis, that's really very tough. You certainly wouldn't want to manually write proofs in terms of the definitional C semantics; you would need intermediate infrastructure. And if you look at the people who have done verified software — so, verified compilers and verified hypervisors and verified operating systems; all of those are now possible on significant scales — they use whole layers of proof and verification infrastructure for doing that. They're not writing proofs by hand for every little detail. And some of those verification methods, you can imagine them being basically like static analysis: you know, you might use a static analysis to prove some relatively simple facts about the code and then stitch those together into a larger assembly. Any kind of more complete thing, I think, is hard to imagine. I mean, these analysis tools mostly rely on making some approximation in order to be able to do their thing at all, so it's hard to get assurance out of them.

Herald Angel: Okay, mic one.

Q: You said you modified a MIPS architecture to add some logic to check the capabilities. Do you know what the cost of this is, regarding computational power and energy, power supply?

Sewell: Sorry, can you repeat the last part of that?

Q: What is the cost of this modification, regarding computational power and power consumption?

Sewell: Ah, so what's the energy cost? So, I gave you a performance cost.
I didn't give you — I carefully didn't give you — an energy cost estimate, because it's really hard to do that in a scientifically credible way without making a more or less production superscalar implementation. And we are sadly not in a position to do that, although if you have 10 or 20 million pounds, then I would be happy to accept it.

Herald Angel: Mic number seven, please.

Q: How does the class of problems that you can address with CHERI compare to the class of problems that are excluded by, for example, the Rust programming language?

Sewell: So, how do the problems excluded by CHERI relate to the problems excluded by Rust? If you are happy to write all of your code in Rust, without ever using the word "unsafe", then maybe there would be no point in CHERI at all. Are you happy to do that?

Q: Probably not, no.

Sewell: I think someone shakes their head sideways, yeah.

Herald Angel: Mic number one.

Q: What do you think about the following thesis: we are building a whole system of things, with artificial intelligence and the like, that sits above this technical level, and it's building another chaos that isn't healed by those things.

Sewell: It's dreadful, isn't it? laughter There are — so, in fact some of my colleagues are interested in this question — you might well want to bound what your artificial intelligence can access or touch, and for that you might want this kind of technology. But with machine learning systems we are intrinsically building things that we, on the whole, don't understand, and that will have edge cases that go wrong. And this is not speaking to that in any way.

Herald Angel: OK. Does the Internet have a question? No, good. I don't see anyone else, so let's conclude this. Thank you very much.

Sewell: Thank you. applause

35C3 postroll music

subtitles created by c3subtitles.de in the year 2019. Join, and help us!