35C3 preroll music

Herald Angel: OK. Our next speaker will be trying to tame the chaos: Peter Sewell from the University of Cambridge. A warm round of applause, please.

applause
Sewell: Thank you! Okay. So here we are, C3 again, with a program full of fascinating and exciting and cool stuff. And if we look just at the security talks: all manner of interesting things. I'm going to go to a lot of these; you should too. Let's see, though, if we read some of the titles: exploiting kernel memory corruptions, attacking end-to-end e-mail encryption. What else have we got: modern Windows userspace exploitation, compromising online accounts. OK, so a lot of these talks, as usual, are explaining to us that our computers are not doing what we would like, and this, as usual, is the merest tip of a tiny iceberg. We have hundreds of thousands of known vulnerabilities and many unknown vulnerabilities. It's a bit dark in here, but let me try some audience participation. How many of you have in the last year written at least a hundred lines of code? And of those people, keep your hands up if you are completely confident that code does the right thing. laughter I would like to talk to you later laughter and so would the people in the self-driving car that you're probably engineering. So how many unknown vulnerabilities are there, then? Well, you can take what's in your mind right now and multiply it by... oh, this doesn't work very well. No. No, no. Computers, they do the wrong thing again and again and again. laughter and some applause We can multiply by this estimate of about 100 billion lines of new code each year. So if we take all of this, these talks, these numbers, it should make us not happy and excited; it should make us sad, very sad, and frankly rather scared of what is going on in the world. So what can we do? We can, option one, give up using computers altogether. applause In most audiences this would be a hard sell, but here I'm sure you're all happy to just turn them off now. Or we can throw up our hands in the air and collapse into some kind of pit of despair. Well, maybe, maybe not. My task today is to give you a tiny smidgen, a very small amount, of hope that it may be possible to do slightly better.

But if we want to do better, we first have to understand why things are so bad. If we look at our aeronautical engineering colleagues, for example: they can make planes which very rarely fall out of the sky. Why is it that they can do that and we are so shit at making computers? Well, I'm going to reuse a picture that I used at 31C3, which is still the best description I can find of the stack of hardware and software that we live above. Here we go. There's so much information in this; it's just ace. So we see down here all of this transparent silicon: the hardware we live above. We see a stack of phases of a C compiler; we see all kinds of other bits and pieces. There might be a slightly hostile wireless environment in this room, for some reason. If we look at this and think about the root causes of why our systems are so bad, we can see terrible things. The first is that, obviously, there's a lot of legacy complexity that we're really stuck with. If you pull out one of those pieces from the middle and try to replace it with something which is not of exactly the same shape, the whole pile will collapse. So we somehow need to find an incremental path to a better world.
And then, and this is the most fundamental reason that these systems are not like aeroplanes: they are discrete, not continuous. If you take an honest girder made out of a piece of steel and you push on it a little bit, it moves a little bit, basically in proportion. If it is 1 percent too strong or 1 percent too weak, it basically doesn't matter. But in these things, one line of code can mean you're open to a catastrophic exploit, and that's one line in many, many million. OK. And the next thing... this really doesn't work. They're very complicated. But the scary thing is not the static complexity of those lines of code and the number of components, although that's pretty intimidating; the really scary thing is the exponential number of states and execution paths. These two together mean that testing, which is the only way we have to build systems that are not instantly broken, can never save us from these exploitable errors. And that means that ultimately we need proof: honest, machine-checked mathematical proof. It also tells us that we need to arrange for our systems to be secure for simple reasons, reasons that do not depend on the correctness of all of those hundred billion lines of code.

Then another thing that we have here: all these interfaces, the architecture interface, the C language interface, the sockets API interface, the TCP interface. All of these we rely on to let different parts of the system be engineered by different organizations, but they're all really badly described and badly defined. What you find, for each of these, is typically a prose book, varying in thickness between about that and about that, full of text. Well, we still rely on testing, limited though it is, but you can never test anything against those books. So we need instead interface descriptions, definitions, specifications that are mathematically rigorous, and that are executable: not in the normal sense of executable as an implementation, but executable as a test oracle, so that you can compute whether some observed behaviour is allowed or not, and not have to read the book and argue on the Internet.
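To make "executable as a test oracle" concrete: the point is a decision procedure over observed behaviour, not a reference implementation that produces behaviour. Here is a minimal sketch in C; the trace representation and the "allowed" condition are invented for illustration and are not from any of the models mentioned in the talk.

```c
#include <stdbool.h>
#include <stdio.h>

/* A toy observed behaviour: just a sequence of addresses that were read.
   Real traces of C or processor behaviour would be much richer. */
typedef struct {
    const unsigned long *reads;
    int n;
} trace;

/* Stand-in for a decision procedure derived from a mathematical model:
   it judges behaviour rather than producing it. */
static bool spec_allows(const trace *t) {
    for (int i = 0; i < t->n; i++)
        if (t->reads[i] >= 0x1000)   /* illustrative allowed region */
            return false;
    return true;
}

int main(void) {
    unsigned long observed[] = { 0x10, 0x2000 };
    trace t = { observed, 2 };
    printf(spec_allows(&t) ? "allowed by spec\n" : "spec violation\n");
    return 0;
}
```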
And we also need interface definitions that support the proof that we need. Then: all of this stuff was made up in the 1970s, or the sixties, and in the 70s and the 60s the world was a happy place, and people walked around with flowers in their hair, and nothing bad ever happened. That is no longer the case. So we need defensive design, but defensive design that is somehow compatible, or can be made compatible enough, with this legacy investment. That's quite hard. And then, and I can't say very much about this, we have at the moment very bad incentives, because we have a very strong incentive to make things that are more complex than we can possibly handle or understand. We make things, we add features, we make a thing which is just about shippable, and then we ship it. And then we go on to the next thing. So we need economic and legal incentives for simplicity and for security that we do not yet have. But it's hard to talk about those until we have the technical means to react to those incentives.

OK, so what I'm going to do now is talk about two of these interfaces, the architecture interface and the C language interface, and see what we can do to make things better. A lot of this has to do with memory. So, whoever it was that picked the subtitle for this meeting, "refreshing memories": thank you, because it's great, I love it. Let's refresh your memory quite a way. I invite you to think back to when you were very, very young, in about 1837, when cool hacking was going on: Charles Babbage was designing the Analytical Engine, and even then you see there was this dichotomy between a mill performing operations and a store holding numbers. This is a plan view of the Analytical Engine. Well, it was vaporware, but it is a design for the Analytical Engine, and you see here these circles: these are columns of numbers, each made out of a stack of cogs, maybe a 50-digit decimal number in each of those columns. And this array, really, he imagined it going on out to over there somewhere, with about 1000 numbers. So even then you have a memory which is an array of numbers. I think these were not... I don't think you could do address computation on these things, I think addresses were only constants, but, still, it is basically an array of numbers.
So, okay, what have we got now? Let's look a bit at C. Of those people who have written a hundred lines of code, how many of you were C programmers? Quite a few. Or maybe you're just embarrassed. I forgot to say: those of you that didn't put your hands up for that first question, you should feel proud, and you should glory in your innocence while you still have it. You are not yet complicit in this trainwreck. It worked then. Okay, so here's a little piece of C code, which I shall explain, in several different ways. So we start out: it creates two locations, x and secret... don't do that. This is so bad. It creates x, storing 1, and secret_key, which stores, I might get this wrong, your PIN. Then there's some computation which is supposed to operate only on x, but maybe it's a teensy bit buggy, or corrupted by somebody. We make a pointer to x, and in this execution x just happened to be allocated at 14, so this pointer contains the number 14. Then we add one to that pointer. It's C, so adding one to the pointer actually means adding the four bytes which are the size of the thing it points to. So we add 4 to 14 and we get 18. And then we try to read the thing which is pointed to. What's going to happen? Let me compile this and run it in a conventional way. It printed the secret key. Bad.
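The code on the slide is not reproduced in the transcript; purely as an illustration, a program of the kind being described might look like the following. The names and the PIN value are invented, and whether the secret actually leaks depends on how the compiler happens to lay out memory, so treat this as a sketch of the bug pattern, not a reliable exploit.

```c
#include <stdio.h>

int main(void) {
    int x = 1;
    int secret_key = 5179;   /* stands in for your PIN */

    int *p = &x;             /* suppose x happens to live at "address 14" */
    p = p + 1;               /* pointer arithmetic adds sizeof(int): 14 + 4 = 18 */
    printf("%d\n", *p);      /* undefined behaviour: may print secret_key */
    return 0;
}
```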
Rather distressingly, this is characteristic of, by no means all, but a disturbingly large fraction of, security flaws: they are basically doing this. So does C really let you do that? Yes and no. If you look at the C standard, which is one of these beautiful books, it says, and you have to read moderately carefully to understand this, but it says that this program has undefined behavior. Many of you will know what that means, but others won't. It means that, as far as the standard is concerned, for programs like that, there is no constraint on the behavior of the implementation at all. Or, said another way, and maybe a more useful way: the standard lets implementations, that is, C compilers, assume that programs don't do this kind of thing, and it's the programmer's responsibility to ensure that they don't. Now, in about 1970, '75, maybe 1980, this was a really good idea; it gives compilers a lot of flexibility to do efficient implementation. Now, not so much. How does this work in the definition? The standard somehow has to be able to look at this program and identify it as having this undefined behavior. And indeed, if we just think about the numbers in memory, this 18 points to a perfectly legitimate storage location, and that plus one was also something you're definitely allowed to do in C. So the only way we can know that this is undefined behavior is to keep track of the original allocation of the pointer. Here I've got these allocation identifiers, @3, @4, @5, and you see that I've still got this pointer, and despite the fact that I added 4 to it, I've still remembered the allocation ID. So I can check, or the standard can check, when we try to read from this, whether that address is within the footprint of the original allocation, i.e. is within there. And in fact it is not: it's over here, not within there at all. So this program is deemed to have undefined behavior.
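As a rough illustration of the bookkeeping this view of the standard requires, here is a sketch in C of a pointer that carries its allocation identity, with a load check against the original allocation's footprint. The types and names are made up for exposition; they are not taken from the actual C semantics work described here.

```c
#include <stdbool.h>
#include <stdint.h>

/* One allocation: its identifier, base address, and size. */
typedef struct {
    int      id;     /* e.g. the @3, @4, @5 of the slides */
    uint64_t base;
    uint64_t size;
} allocation;

/* An "abstract pointer": the concrete address plus its provenance,
   the allocation it was originally derived from. */
typedef struct {
    uint64_t   addr;
    allocation origin;
} abstract_pointer;

/* A read of n bytes is defined only if the whole range lies within
   the footprint of the pointer's original allocation. */
static bool load_ok(abstract_pointer p, uint64_t n) {
    return p.addr >= p.origin.base
        && p.addr + n <= p.origin.base + p.origin.size;
}
```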
Just to clarify something people often get confused about: we detect the undefined behavior here, but it isn't really a temporal thing. The fact that there is undefined behavior anywhere in the execution means the whole program is toast. Okay, but this is really interesting, because for this critical part of the standard we're relying on information which is not there at run time in a conventional implementation. Just to emphasize that point: if I compile this program, let's say to ARM assembly language, I get a sequence of store and load and add instructions: store, load, add, store, load, load. And if I look at what the ARM architecture says can happen, these blue transitions... one thing to notice is that we can perfectly well do some things out of order. At this point we could either do this load or we could do that store. That would be a whole other talk; let's stick with the sequential execution for now. What I want to emphasize is that this load of this address really was loading a 64-bit number, which was the address of x, adding 4 to it, and then loading from that address: the secret key. So there is no trace of these allocation IDs. No trace at all.

Okay, let me step back a little bit. I've been showing you some screenshots of C behaviour and ARM behaviour, and I claim that these are actually showing you all of the behaviours allowed by what one would like the standards to be for these two things. Really, what I've been showing you are GUIs attached to mathematical models that we've built in a big research project over the last many years: REMS, "Rigorous Engineering of Mainstream Systems". Sounds good; an aspirational title, I would say. In that project we've built semantics, mathematical models defining the envelope of all allowed behaviour, for a quite big fragment of C, for the concurrency architecture of major processors (ARM, x86, RISC-V, and IBM POWER), and also for the instruction set architecture of these processors, the details of how individual instructions are executed. And in each case these are specs that are executable as test oracles: you can compare, algorithmically, some observed behaviour against whether the spec says it's allowed or not, which you can't do with those books. So it's an idiot-simple idea, but for some reason the industry has not taken it on board at any time in the last five decades. It's not that hard to do.
These are also mathematical models; I'll come back to that later. Another thing I want to emphasize here is that in each of these cases, especially the first two, when you start looking really closely, you learn that what you have to do is not to build a nice clean mathematical model of something which is well understood. What you learn is that there are real open questions, for example within the C standards committee, and within ARM a few years ago, about exactly what these things even were. And that is a bit disturbing, given that these are the foundations for all of our information infrastructure. There is also a lot of other work going on, with other people within this REMS project, on WebAssembly and JavaScript and file systems and other concurrency. I don't have time to talk about those, but Hannes Mehnert is going to talk a little bit about TCP later today. You should go to that talk too, if you like this one. If you don't like this one, you should still go to that talk.

Okay, so this is doing somewhat better: certainly more rigorous engineering of specifications, but so far only for existing infrastructure, for this C language, the ARM architecture, what have you. So what if we try to do better? What if we try to build better security in?
So, the architectures that we have really date back to the 1970s, or even the 60s, with the idea of virtual memory. That gives you (I don't need to go into what it is) very coarse-grained protection. You can have your separate processes isolated from each other, and on a good day you might have separate browser tabs isolated from each other, in some browsers, sometimes. But it doesn't scale: it's just too expensive to have lots of little compartments. One thing we can certainly do is design much better programming languages than C, but all of that legacy code has massive inertia. So an obvious question is whether we can build better security into the hardware, in a way that doesn't need some kind of massive pervasive change to all the software we ever wrote. And that's the question that a different large project, the CHERI project, has been addressing. This is something that's been going on for the last 8 years or so, mostly at SRI and at Cambridge, funded by DARPA and the EPSRC and ARM and other people, mostly led by Robert Watson, Simon Moore, and Peter Neumann, and a cast of thousands. And me, a tiny bit. So... how... don't do that. I should learn to stop pushing this button. It's very hard not to push the button.

So here's the question, in a more focused way, that CHERI is asking: can we build hardware support which is both efficient enough and deployable enough, and that gives us both fine-grained memory protection, to prevent the kind of bug we were talking about earlier, well, that kind of leak at least, and also scalable compartmentalization, so you can take bits of untrusted code, put them in safe boxes, and know that nothing bad is going to get out? That's the question. The answer... the basic idea of the answer is pretty simple, and it also dates back to the 70s: add hardware support for what people call capabilities, and we'll see that working in a few moments. The idea is that this will let programmers exercise the principle of least privilege, with each little bit of code having only the permissions it needs to do its job, and also the principle of intentional use: each little bit of code, when it uses one of those capabilities, will be required to use it explicitly rather than implicitly. The intention of the code has to be made plain.

So let's see how this works. These capabilities basically replace some, or maybe all, of the pointers in your code. If we take this example again: in ISO C we had an address and, in the standard (the standard is not very clear about this, but we're trying to make it more clear), an allocation ID, or something like it. Now, in CHERI C, we can run the same code, but we might represent this pointer at runtime not just with that number, but with something that contains that number, the base of the original allocation, the length of the original allocation, and some permissions. Having all of those in the pointer means that the hardware can do, at run time, at access time, a very efficient check that the access is within this region of memory. And if it's not, it can fail-stop and trap. No information leak.
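As a sketch of the information a CHERI capability carries, here is an illustrative rendering in C. The field widths and names are invented for exposition, and, as described just below, real implementations compress all of this rather than storing separate words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative, uncompressed view of a capability's contents. */
typedef struct {
    uint64_t cursor;       /* the address, as in an ordinary pointer */
    uint64_t base;         /* base of the original allocation        */
    uint64_t length;       /* length of the original allocation      */
    uint32_t permissions;  /* e.g. load / store / execute bits       */
    bool     tag;          /* validity bit, held out of band         */
} capability;

/* The check the hardware does at access time: the access must be
   within bounds and the capability must still be valid; otherwise
   the hardware fail-stops with a trap. */
static bool access_ok(capability c, uint64_t access_size) {
    return c.tag
        && c.cursor >= c.base
        && c.cursor + access_size <= c.base + c.length;
}
```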
I need a bit more machinery to make this actually work. It would be nice if all of these fields were 64-bit numbers, but then you would get a 256-bit pointer, and that's a bit expensive. So there's a clever compression scheme that squeezes all of this down into 128 bits, with enough accuracy. Then you need to design the instruction set of the CPU carefully, to ensure that you can never forge one of these capabilities with a normal instruction. You can never just magic one up out of a bunch of bits that you happen to find lying around. So all the capability-manipulating instructions take a valid capability and construct a new capability, possibly smaller in memory extent, or possibly with fewer permissions. They're never going to grow the memory extent or add permissions.
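Continuing the illustrative C sketch from above, a capability-narrowing operation might look like the following. The point is the monotonicity just described: extents and permissions only ever shrink, and an attempt to widen yields an invalid capability rather than a forged one. This is an exposition aid, not the real instruction semantics.

```c
/* Derive a capability with smaller bounds from an existing one,
   reusing the illustrative capability struct sketched above.
   Monotonic: if the requested bounds are not contained in the old
   ones, or the input is invalid, the result's tag is cleared. */
static capability set_bounds(capability c, uint64_t new_base, uint64_t new_length) {
    capability r = c;
    if (!c.tag
        || new_base < c.base
        || new_base + new_length > c.base + c.length) {
        r.tag = false;   /* never grow: invalidate instead */
    } else {
        r.base = new_base;
        r.length = new_length;
    }
    return r;
}

/* Permissions can only be removed, never added. */
static capability and_perms(capability c, uint32_t mask) {
    c.permissions &= mask;
    return c;
}
```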
And when the hardware starts, at hardware initialization time, the hardware will hand the software a capability to everything, and then, as the operating system works, and the linker works, and the C language allocator works, these capabilities will be chopped up into ever finer and smaller pieces, to be handed out to the right places. Then we need one more little thing. This is a C language world we're living in here, or really a machine-code world, so there's nothing stopping code writing bytes of stuff. So you have to somehow protect these capabilities against being forged by the code, either on purpose or accidentally, writing bytes over the top of them. You need to preserve their integrity. That's done by adding, for each of these 128-bit-sized, aligned units of memory, just one extra bit. That bit holds a tag, and the tag records whether this memory currently holds a valid capability or not. All the capability-manipulating instructions, if they're given a valid capability with the tag set, produce a result that still has the tag set. But if you write, say, a byte there, which might change the base, then the hardware will clear the tag. And these tags are not conventionally addressable: they don't have addresses. They just sit there in the memory system of the hardware. So there is no way that software can inaudible through these.
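Here is a tiny sketch of that tag behaviour, again in illustrative C. In the real hardware the tags live in the memory system and have no address at all, so modelling them as an ordinary array, as here, is exactly what software could never do to them.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_BYTES 4096
#define CAP_BYTES 16    /* one tag per 128-bit aligned unit */

static uint8_t mem[MEM_BYTES];
static bool    tag[MEM_BYTES / CAP_BYTES];  /* not addressable by code */

/* An ordinary data write: the data goes through, but the tag of the
   enclosing capability-sized unit is cleared, so any capability that
   was stored there is no longer valid. */
static void write_byte(uint64_t addr, uint8_t b) {
    mem[addr] = b;
    tag[addr / CAP_BYTES] = false;
}
```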
So this is really cool. We've taken what in ISO C was undefined behavior in the standard, and in implementations was an information leak, and we've turned it into something that, in CHERI C, in both the specification and the implementation, is guaranteed to trap, to fail-stop, and not leak that information.

A few more things about the design that I should mention. All these blue things I think I've talked about. But a lot of care has gone into making this very flexible, to make it possible to adopt. You can use capabilities for all pointers, if you want to, or just at some interfaces; you might, for example, use them at the kernel/userspace interface. It coexists very nicely with existing C and C++: you don't need to change the source code very much at all to start using these, and we'll see a tiny bit of data about that. It coexists with the existing virtual memory machinery, so you can use both capabilities and virtual memory, if you want, or just capabilities, or just virtual memory. And then there's some more machinery, which I'm not going to talk about, for doing this secure encapsulation stuff. What I've talked about so far is basically the fine-grained memory protection story. And I should say that the focus so far is on spatial memory safety, so that's not, in the first instance, protecting against reuse of an old capability in a bad way; but there are various schemes for supporting temporal memory safety with basically the same setup.
Okay, so. How do we... okay, this is some kind of high-level idea. How do we know that it can be made to work? Well, the only way is to actually build it and see. And this has been a really interesting process, because mostly, when you're building something, either in academia or in industry, you have to keep most of the parts fixed. I mean, you are not routinely changing both the processor and the operating system, because that would just be too scary. But here we have been doing that, and indeed we've really been doing a three-way codesign: building hardware, building and adapting software to run above it, and also building these mathematical models, all at the same time. This is, well, A, all the fun, because you can often get groups of people together that can do all of those things, but B, it's really effective. So what we've produced is an architecture specification, in fact extending MIPS, because it was conveniently free of IP concerns. That specification is one of these mathematically rigorous things, expressed in a domain-specific language specifically for writing instruction set architecture specifications, and we've designed and implemented actually two of those. Then there are hardware processor implementations that actually run on FPGAs, and a lot of hardware research devoted to making this go fast and be efficient despite these extra tags and whatnot. And then there's a big software stack adapted to run over this stuff, including Clang and a FreeBSD kernel and FreeBSD userspace and WebKit and all kinds of pieces. So this is a very major evaluation effort, of a kind that one is not normally able to do. This combination of things is why that list of people up there was about 40 people. I say this is based on MIPS, but the ideas are not specific to MIPS: there are ongoing research projects to see if this can be adapted to the ARM application-class architecture, the ARM microcontroller architecture, and RISC-V. We'll see how that goes, in due course.

Okay, so then, with all of these pieces, we can evaluate whether it actually works. And that's still kind of an ongoing process, but the data that we've got so far is pretty encouraging.
First, performance. You see maybe a percent or two of runtime overhead in many cases. There are workloads where it performs badly: if you have something that's very pointer-heavy, then the extra pressure from those larger pointers will be a bit annoying, but really it seems to be surprisingly good. Then, is it something that can work without massive adaptation of the existing code? Well, in this port of the FreeBSD userspace, so all the programs that run: there were something like 20000 files, and only 200 of those needed a change. And that's been done by one or two people in not all that large an amount of time. Some of the other code, the code that's more like an operating system, needs more invasive change, but still, it seems to be viable. Is it actually more secure? How are you going to measure that? We like measuring things; on the whole we try to do science where we can. Can we measure that? Not really. But the whole setup certainly does mitigate quite a large family of known attacks. I mean, if you try to run Heartbleed, you'll get a trap. It will say trap; I think it will say signal 34, in fact. So that's pretty good. And if you think about attacks, very many of them involve a chain of multiple flaws put together by an ingenious member of the C3 community or somebody else, and very often at least one of those is some kind of memory access permission flaw.
Okay, so this is nice, but I was talking a lot in the first part of the talk about rigorous engineering. Here we've got formal definitions of the architecture, and those have been really useful for testing against, for doing test generation from, and for doing specification coverage in an automated way. But surely we should be able to do more than that. So this formal definition of the architecture: what does it look like? For each instruction of this CHERI MIPS processor there's a definition, and that definition looks a bit like normal code. Here's the definition of the instruction for "capability increment cursor immediate". This is a thing that takes a capability register and an immediate value, and it tries to add something onto the cursor of that capability. What you see here is something that looks like code, and it defines exactly what happens, and also exactly what happens if something goes wrong. You can see there's some kind of exception case there: that's a "raise a processor exception" case.
So it looks like code, but from this we can generate both reasonably high-performance emulators, in C and in OCaml, and also mathematical models in the theorem provers. Three of the main theorem provers in the world are called Isabelle and HOL and Coq, and we generate definitions in all of those. So then we have a mathematical definition of the architecture. Then, finally, we can start stating some properties. CHERI was designed with some kind of vague security goals in mind from the beginning, but now we can take, let's say, one of those goals and make it into a precise thing. This is a kind of secure encapsulation property, not exactly the secret leak we were looking at before. So imagine some CHERI compartment. I haven't told you exactly what that means, but let's suppose that it's understood, and that inside that compartment there is a piece of software running. That might be maybe a video codec, or a web browser, or even a data structure implementation, or a C++ class and all of its objects. So imagine that compartment running, and imagine the initial capabilities it has access to, via registers and memory, and all the capabilities it has access to via the memory that it has access to, and so on, recursively. Imagine all of those capabilities. The theorem says: no matter how this code executes, whatever it does, however compromised or buggy it was, in an execution up until any point at which it raises an exception or makes an inter-compartment call, it can't make any access which is not allowed by those initial capabilities. You have to exclude exceptions and inter-compartment calls because, by design, they do introduce new capabilities into the execution. But until that point, you get this guarantee.
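Schematically, and only as a paraphrase of the property as stated in the talk rather than the formal statement in the machine-checked development, the theorem has roughly this shape:

```latex
\forall \text{ executions } t \text{ of the compartment from initial state } s_0,\;
\forall \text{ accesses } a \text{ in } t \text{ before the first exception
or inter-compartment call}:\quad
a \in \mathit{allowed}\bigl(\mathit{reachableCaps}(s_0)\bigr)
```

where reachableCaps(s_0) is the set of capabilities in the compartment's initial registers and memory, closed recursively under the capabilities reachable through the memory they grant access to.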
And this is something that we've proved, actually [...] has proved, with a machine-checked proof, in fact in the Isabelle theorem prover. So this is a fact which is about as solidly known as any fact the human race knows. These provers check a mathematical proof in exceedingly great detail, with great care and attention, and they're structured in such a way that the soundness of the prover depends only on some tiny little kernel. So conceivably cosmic rays hit the transistors at all the wrong points, but basically we know this for sure. I emphasize that this is one security property: we have not proved, and we certainly wouldn't claim to have proved, that CHERI is secure, and there are all kinds of other kinds of attack that this statement doesn't even address. But at least this one property we know for real. So that's kind of comforting, and not a thing that conventional computer science and engineering has been able to do, on the whole.

So, are we taming the chaos, then? Well, no. Sorry. But I've shown you two kinds of things. The first was how we can do somewhat more rigorous engineering for existing infrastructure. It's now feasible to do that, and in fact I believe it has been feasible to build specifications which you can use as test oracles for many decades; we haven't needed any super fancy new tech for that, we've just needed to focus on that idea. So that's for existing infrastructure. And then CHERI is a relatively lightweight change that does build in improved security, and we think it's demonstrably deployable, at least in principle. Is it deployable in practice? Are we going to have it in all of our phones five years from now? Will they all be CHERI-enabled? I can't say. I think it is plausible that that should happen, and we're exploring various routes that might mean it happens; we'll see how that goes. Okay, so with that, I was aiming to have given you, not a whole lot of hope, but just a little bit of hope that the world can be made a better place, and I encourage you to think about ways to do it, because Lord knows we need it. But maybe you can at least dream, for a few moments, of living in a better world. And with that, I thank you for your attention.
applause

Herald Angel: Thanks very much. And with that we'll head straight into the Q and A. We'll just start with mic number one, which I can see right now.
Q: Yes, thanks for the very encouraging talk. I have a question about how the C code relates to the part below it. The theorem that you stated assumes that the machine description matches the hardware, or at least describes the hardware accurately. But are there any attempts to check that the hardware actually works like that? That there are no holes in the hardware, that no combination of machine instructions works differently or allows some memory access that is not in the model? So, is it possible to verify hardware designs and check that they match the spec, and will Intel or AMD try to check their models against that spec?
Sewell: OK. So the question is basically: can we do provably correct hardware underneath this architectural abstraction? And the answer is... so, it's a good question. People are working on that, and it can be done, I think, on at least a modest academic scale. Trying to do it in its full glory, for an industrial superscalar processor design, is right now out of reach. But it's certainly something one would like to be able to do. I think we will get there, maybe in a decade or so. A decade is quick.
Herald Angel: Thanks. Mic number four is empty again, so we'll take the Internet.
Sewell: Where is the internet?
Signal Angel: The internet is right here and everywhere. ruminate would like to know: is the effort and cost of building security into hardware, as you've described in your talk, significantly greater or less than the effort and cost of porting software to more secure languages?
Sewell: So, is the effort of building new hardware more or less than the cost of porting existing software to better languages? I think the answer has to be yes, it is less. The difficulty with porting software is that all of you, and all your colleagues, and all of your friends, and all your enemies, have been beavering away writing lines of code for 50 years, really effectively, really numerously. There's an awful lot of it; it's a really terrifying amount of code out there. It's very hard to get people to rewrite it.
Signal Angel: OK, mic two.
Q: OK, given that the cost of ... is today at the software level, not at the hardware level, is it possible to go halfway, using the entire system as an oracle? So during development you run the code base with the secure hardware, but then essentially ship the same software to insecure hardware and inaudible
Sewell: So this question, if I paraphrase it, is: can we use this hardware implementation really as the perfect sanitiser? I think there is scope for doing that. You can even imagine running this not on actual silicon hardware but in something like a QEMU implementation, which we have, in order to do that. And I think for that purpose it could already be a pretty effective bug-finding tool; it would be worth doing. But, as for any testing, it will only explore a very tiny fraction of the possible paths. So we would like to have it always on, if we possibly can.
Herald Angel: OK. The internet again.
Signal Angel: OK, someone would like to know: how does your technique compare to Intel MPX, which failed miserably, with a inaudible ignoring it? Will it be better or faster? Could you talk a little bit about that?
Sewell: I think for that question, for a real answer to that question, you would need one of my more hardware colleagues. What I can say is that they make nasty faces when you say MPX.
Herald Angel: Well, that's that. Mic number one.
Q: Somewhat inherent to your whole construction was this idea of having an unchangeable bit which describes whether your pointer contents have been changed. Is this even something that can be done in software? Or do you have to construct a whole extra RAM or something?
Sewell: So, you can't do it exclusively in software, because you need to protect it. In the hardware implementations that my colleagues have built, it's done in a reasonably efficient way: it's cached and managed in clever ways, to ensure that it's not terribly expensive to do. But you do need slightly wider data paths in your cache protocol and things like that to manage it.
Herald Angel: OK. Mic seven.
Q: How well does this system help secure control flow integrity, compared to other systems, like hardware data-flow isolation or code pointer integrity in ARM inaudible?
Sewell: Ah well, that's another question which is a bit hard to answer, because it depends somewhat on how you're using the capabilities. You can arrange here that each C language allocation is independently protected, and indeed you can also arrange, where this is sensible, that each subobject is independently protected. And those protections are going to give you protection against trashing bits of the stack, because they'll restrict the capability to just those objects.
Herald Angel: OK, the internet again.
Signal Angel: Twitter would like to know: is it possible to run CHERI C without custom hardware?
Sewell: Can you run CHERI C without custom hardware? The answer is that you can run it in emulation, above this QEMU. That works pretty fast, I mean fast enough for debugging, as the earlier question was talking about. You can imagine, you know, somewhat faster emulations of that. It's not clear how much it's worth going down that route: the real potential big win is to have this be always on, and that's what we would like to have.
Herald Angel: OK. Mic one.
Q: Do you have some examples of the kinds of code changes that you need in the operating system and userland that you mentioned?
Sewell: So, okay, what kind of changes do you need to adapt code to CHERI? If you look at C code, there is the C standard, which deems a whole lot of things to be undefined behavior, and then, if you look at actual C code, you find that very, very much of it depends on some idiom that the C standard does not approve of. So there are, for example, pause quite a lot of instances of code that constructs a pointer by doing more than a plus-one offset, taking it out of range, and then brings it back within range before it's used. In fact, in CHERI C, many of those cases are allowed, but not all of them. There are more exotic cases, where there's really some crazy intra-object stuff going on, or where there's pointer arithmetic between objects, which the C standard is certainly quite down on, but which does happen. And there's code which is, say, manipulating the low-order bits of a pointer to store some data in them; that's pretty common. You can do it in CHERI C, but you might have to adapt the code a little bit.
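To make that last idiom concrete, here is a sketch of the kind of code in question; how much adaptation it needs under CHERI C depends on details this illustration doesn't capture, so take it only as an example of the pattern.

```c
#include <stdint.h>

/* Common idiom: stash a flag in the low bit of a pointer, relying on
   the pointee's alignment leaving that bit spare. On conventional
   hardware the round-trip through uintptr_t is harmless; under CHERI
   the casts and arithmetic must preserve the capability's validity. */
static void *mark(void *p)      { return (void *)((uintptr_t)p | 1u); }
static void *unmark(void *p)    { return (void *)((uintptr_t)p & ~(uintptr_t)1u); }
static int   is_marked(void *p) { return (int)((uintptr_t)p & 1u); }
```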
Other cases are more fundamental. Say, in operating system code, if you swap a page out to disk and then swap it back in, then somehow the operating system has to reconstruct new capabilities from a large capability which it kept for the purpose. So that needs a bit more work. So it's kind of a combination: some things you would regard as good changes to the code anyway, and others are more radical.
Herald Angel: OK, another one from the internet.
Signal Angel: I lost one pause
Sewell: The Internet has gone quiet. Yes.
Signal Angel: Last question from the internet: are there any plans to test the impact on Linux, or just FreeBSD?
Sewell: If anyone has a good number of spare minutes, then that would be lovely to try. The reason it's FreeBSD rather than Linux is that the BSD code base is simpler, and maybe cleaner, and my systemsy colleagues are already familiar with it; it's the obvious target for an academic project, and doing it was already a humungous amount of work. So I think we would love to know how Linux, and especially how Android, could be adapted, but it's not something that we have the resources to do at the moment.
Herald Angel: Mic number one again.
Q: How likely or dangerous...
Herald Angel: Mic number one is not working.
Q: How likely or dangerous do you think it is for a programmer to screw up the capabilities they specify?
Sewell: So, how often will the programmer screw up the capabilities? In many cases the programmer is just doing an access with a C or C++ pointer, or a C++ object, in a normal way; they're not manually constructing these things. So a lot of that is going to just work. If you're building some particular secure encapsulation setup, you have to be a bit careful. But on the whole I think this is a small risk compared to the state of software as we have it now.
Herald Angel: OK. Mic eight.
Q: So, existing memory protection techniques are quite patchworky, like canaries and address layout randomisation. Is this a system designed to replace or supplement those measures?
Sewell: I think, if you pause so again, it depends a bit on how you use it. If you have capabilities really everywhere, then there's not much need for address space layout randomization or canaries to protect against explicit information leaks. I imagine that you might still want randomisation to protect against some side-channel flows, but canaries not so much. Whether you actually do, for side-channel flows, I don't know. That's a good question.
Herald Angel: Mic four, please.
Q: Thanks for the very interesting talk. You presented, with CHERI, a system for verifying existing C software, sort of post hoc, and improving its security via a run-time mechanism...?
A: Improving or proving? Improving yes, proving no, yes.
Q: So what's your opinion on the viability of using static analysis and static verification for that? Would it be possible to somehow analyse software that already exists and is written in these unsafe languages, and show that it nevertheless has some security properties, without writing all the proofs manually in, like, HOL or Isabelle?
A: Well, so, if you want to analyse existing software... Static analysis is already useful for finding bugs in existing software. If you want to have assurance from static analysis, that's really very tough. You certainly wouldn't want to manually write proofs in terms of the definitional C semantics; you would need intermediate infrastructure. And if you look at the people who have done verified software, so verified compilers and verified hypervisors and verified operating systems, all of those are now possible on significant scales, but they use whole layers of proof and verification infrastructure to do it. They're not writing proofs by hand for every little detail. And some of those verification methods, you can imagine them being basically like static analysis: you might use a static analysis to prove some relatively simple facts about the code, and then stitch those together into a larger assembly. Any kind of more complete thing, I think, is hard to imagine. I mean, these analysis tools mostly rely on making some approximation in order to be able to do their thing at all. So it's hard to get assurance out of them.
Herald Angel: Okay, mic one.
Q: You said you modified a MIPS architecture to add some logic to check the capabilities. Do you know the cost of this in terms of computational power and energy consumption?
A: Sorry, can you repeat the last part of that?
Q: What's the cost of this modification in terms of computational power and power consumption?
A: Ah, so what's the energy cost? So, I gave you a performance cost. I carefully didn't give you an energy cost estimate, because that's really hard to do in a scientifically credible way without making a more or less production superscalar implementation. And we are sadly not in a position to do that, although if you have 10 or 20 million pounds, then I would be happy to accept it.
Herald Angel: Mic number 7, please.
Q: How does the class of problems that you can address with CHERI compare to the class of problems that are excluded by, for example, the Rust programming language?
A: So, how do the problems excluded by CHERI relate to the problems excluded by Rust? If you are happy to write all of your code in Rust, without ever using the word "unsafe", then maybe there would be no point in CHERI at all. Are you happy to do that?
Q: Probably not, no.
A: I think someone is shaking their head sideways, yeah.
Herald Angel: Mic number 1.
Q: What do you think about the following thesis: we are building a whole system of things, with artificial intelligence and the like, that sits above this technical level, and it's building another chaos that isn't healing with those things.
A: It's dreadful, isn't it?
Laughter
A: So, in fact, some of my colleagues are interested in this question. You might well want to bound what your artificial intelligence can access or touch, and for that you might want this kind of technology. But with machine learning systems we are intrinsically building things that we, on the whole, don't understand, and that will have edge cases that go wrong. And this is not speaking to that in any way.
Herald Angel: OK. Does the internet have a question? No, good. I don't see anyone else, so let's conclude this. Thank you very much.
A: Thank you.

applause

35c3 postroll music

subtitles created by c3subtitles.de
in the year 2019. Join, and help us!