34c3 intro
Herald: The next talk is called "End-to-
end formal ISA verification of RISC-V
processors with riscv-formal". I have no
idea what this means, but I'm very excited
to understand what it means and Clifford
promised he's going to make sure everyone
will. Clifford has been very known in the
open-source and free-software... free-
source... oh my gosh... free-and-open-
source community and especially he's known
for the project IceStorm. Please help me
welcome Clifford!
applause
inaudible
Clifford: RISC-V is an open instruction
set architecture. It's an open ISA, so
it's not a processor, but it's a processor
specification -- an ISA specification --
that is free to use for everyone and if
you happen to have already implemented
your own processor at one point in time,
you might know that it's actually much
easier to implement a processor than to
implement all the tools that you need to
compile programs for your processor. And
if you use something like RISC-V, then you
can reuse the tools that are already out
there, so that's a great benefit. However,
for this endeavor we need processors that
actually really compatible to each other:
Processors that implement the RISC-V ISA
correctly. So, with many other ISAs, we
start with one processor and we say "Oh,
that's the processor" and later on, we
figure out there was a bug, and what
people sometimes do is, they just change
the specification, so that the
specification all fits the hardware they
actually have. We can't do something like
that with RISC-V, but there are many
implementations out there, all being
developed in parallel to fit the same
specification, so we want to have some
kind of means to make sure that all these
processors actually agree about what the
ISA specification is. So, what's formal
verification? Formal verification is a
super-broad term. In the context of this
talk, I'm talking about hardware model
checking. More specifically, I'm talking
about checking of so-called "safety
properties". So, we have some hardware
design and we have an initial state and we
would like to know if this hardware design
can reach a bad state from the initial
state. That's formally the problem that we
are trying to solve here. And there are
two means to do that, two different
categories of proofs that are bounded and
unbounded proofs, and with the bounded
proofs we only prove that it's impossible
to reach a bad state within a certain
number of cycles.
So, we give a maximum bound for the length
of a counterexample and with unbounded
proofs, we prove that a bad state can
actually never be reached. So, unbounded
proofs are, of course, better -- if you
can make an unbounded proof -- but in many
cases, this is very hard to achieve, but
bounded proofs is something that we can
do, so I'm talking about bounded proofs
here for the most part. So, what's end-to-
end formal verification? Because that's
also in my title. So, historically when
you formally verify something like a
processor, you break down the processor in
many small components and then you write
properties for each component and prove
for each component individually that they
adhere to the properties and then you make
a more abstract proof, that if you put in
system together from components that have
this property, then this system will have
the properties that you want. With end-to-
end verification, we treat the processors
one huge black box and just ask the
question "Does this one huge thing fit our
specification? Have the properties that we
want?" That has a couple of advantages:
It's much, much easier this way to take
one specification and port it from one
processor to another, because we don't
care about how the processor is built
internally, and it's much easier to take
the specification that we have and
actually match it to other specifications
of the ISA, because we have a
specification that says, what is the
overall behavior we expect from our
processor. But the big disadvantage, of
cause, is that it's computationally much
more expensive to do end-to-end formal
verifications and doing this end-to-end
verification of a processor against an ISA
specification is something that
historically was always fueled as the
textbook example of things that you can't
do with formal methods, but fortunately
the solvers... they became much better in
the last couple of years and now if we use
the right tricks, we can do stuff like
that with the solvers we have nowadays.
So, that's riscv-formal. riscv-formal is a
framework that allows us to do end-to-end
formal verification of RISC-V processors
against a formal version of the ISA
specification. So, riscv-formal is not a
formally verified processor. Instead, if
you happen to have a RISC-V processor, you
can use riscv-formal to prove that your
processor confirms to the ISA
specification. For the most part, this is
using bounded methods. Theoretically, you
could do unbounded proofs with riscv-
formal, but it's not the main use case.
So, it's good for what we call "bug
hunting", because maybe there is a
counterexample that would show, that the
processor could diverge from the desired
behavior with 1000 or 5000 cycles, but
usually, when you have something like a
processor and you can't reach a bad state
within the very short bounds, you have
high confidence that actually your
processor implements the ISA correctly.
So, if you have a processor and you would
like to integrate it with riscv-formal,
you need to do 2 things: You need to add a
special trace port to your processor; it's
called the RVFI trace port -- riscv-formal
interface trace port. And you have to
configure riscv-formal so that riscv-
formal understands the attributes of your
processor. So, for example, RISC-V is
available in a 32-bit and a 64-bit
version. You have to tell riscv-formal, if
you want to verify a 32-bit or 64-bit
processor. RISC-V is a modular ISA, so
there're a couple of extensions and you
have to tell riscv-formal, which
extensions your processor
actually implements.
And then there're a couple of other things
that are transparent for a userland
process, like if unaligned loads or stores
are supported by the hardware natively,
because RISC-V... the spec only says, that
when you do an unaligned load or store,
then a userspace program can expect this
load or store to succeed, but it might
take a long time, because there might be a
machine interrupt handler that is
emulating an unaligned load store by doing
alligned loads and stores, but if we do
this formal verification of the processor,
then the riscv-formal framework must be
aware: "What is the expected behavior for
your course?", "Should it trap when it
sees an unaligned load store or should it
just perform the load store unaligned?"
So, what does this interface look like
that you need to implement in your
processor if you would like to use riscv-
formal? This is the current version of the
riscv-formal interface. Right now, there
is no support for floating-point
instructions and there is no support for
CSRs, but this is on the to-do list, so
this interface will grow larger and larger
when we add these additional features, but
all these additional features will be
optional. And one of the reasons is that
some might implement just small
microcontrollers that actually don't have
floating-point cores or that don't have
support for the privileged specification,
so they don't have CSRs. Through this
interface, whenever the core retires an
instruction, it documents which
instruction it retired; so it tells us
"This is the instruction where I retired;
this was the program counter where I found
the instruction; this is the program
counter for the next instruction; these
are the registers that I read and these
are the values that I've observed in the
register file; this is the register that
I've written and this is the value that I
have written to the register file"
All that stuff.
So, in short, what we document through the
riscv-formal interface, is the part of the
processor state that is observed by an
instruction and the change to the state of
the processor that is performed by an
instruction -- like changes to the
register file or changes to the program
counter. And of course, most processors
actually are superscalar, even those
processors that say they're non-
superscalar. In-order pipelines usually
can do stuff like retire memory load
instructions out of order, in parallel to
another instruction that does not write
the register, things like that. So, even
with processors we usually don't think of
as superscalar processors, even with those
processors, we need the capability to
retire more than one instruction each
cycle and this can be done with this
"NRET" parameter. And we see all the ports
5 times wider if NRET is 5. Okay, so then
we have a processor that implements this
interface. What is the verification
strategy that riscv-formal follows in
order to do this proof to formally verify
that our processor's actually correct? So,
there is not one big proof that we run.
Instead, there is a large number of very
small proofs that we run. This is the most
important trick when it comes to this and
there are 2 categories of proofs: One
category is what I call the "instruction
checks". We have one of those proofs for
each instruction in the ISA specification
and each of the channels in the riscv-
formal interface. So, this is easily a
couple of hundred proofs right there
because you easily have 100 instructions
and if you have 2 channels, you already
have 200 proofs that you have to run. And
what this instruction checks do, they
reset the processor or they started a
symbolic state, if you would like to run a
unbounded proof, let the process run for a
certain number of cycles and then it
assumes that in the last cycle, the
processor will retire a certain
instruction. So, if this
check checks if the "add"
instruction works correctly, it assumes
that the last instruction retired in the
last cycle of this bounder checked will be
an "add" instruction and then it looks at
all the interfaces on the riscv-formal
interface, to make sure that this is
compliant with an "add" instruction. It
checks if the instruction has been decoded
correctly; it checks if the register value
we write to the register file is actually
the sum of the values we read from the
register file. All that kind of stuff.
But, of course, if you just have these
instruction checks, there is still a
certain verification gap, because the core
might lie to us: The core might say "Oh, I
write this value to the register file",
but then not write the value to the
register file, so we have to have a
separate set of proofs that do not look at
the entire riscv-formal interface in one
cycle, but look at only a small fraction
of the riscv- formal interface, but over a
span of cycles. So, for example, there is
one check that says "If I write the
register and then later I read the
register, I better read back the value
that I've written to the register file"
and this I call "consistency checks". So,
that's I think what I said already... So
for each instruction with riscv-formal, we
have a instruction model that looks like
that. So, these are 2 slides: The first
slides is just the interface there we have
a couple of signals from this riscv-formal
interface that we read like the
instruction that we are executing, the
program counter where we found this
instruction, the register values we read,
and then we have a couple of signals that
are generated by our specification that
are output of this specification module.
Like, which registers should we read?
Which registers should we write? What
values should we write to that register?
Stuff like that. So, that's the interface.
It's the same all the instructions and
then we have a body that looks more
like that. For all the instructions that
just decodes the instruction, checks if
this is actually the instruction the check
is for. So, in this case it's an "add
immediate" instruction. And then you have
things like the line near the bottom above
"default assignments": "assign
spec_pc_wdata", for example, says "Okay,
the next PC must be 4 bytes later than the
PC for this instruction. We must increment
the program counter by a value of 4 when
we execute this instruction." Things like
that. So, you might see, there is no
assert here, there are no assertions,
because this is just the model of what
kind of behavior we would expect. And then
there is a wrapper that instantiates this
and instantiates the call and builds the
proof and there are the assertions. The
main reason why we don't have assertions
here, but instead we output the desired
behavior here, is because I can also
generate monitor cores that can run
alongside your core and check in
simulation or an emulation, an FPGA, if
your core is doing the right thing. That
can be very, very helpful if you have a
situation where you run your core for
maybe days and then you have some
observable behavior that's not right, but
maybe there are thousands, even million,
cycles between the point where you can
observe that something is wrong and the
point where the process actually started
diverging from what the specification said
and if you can use a monitor core like
that, then it's much easier to find bugs
like this. Okay, so some examples of those
consistency checks.; the list is actually
not complete and it varies a little bit
from processor to processor what kind of
consistency checks we can actually run
with the processor we are looking at.
There is a check if the program counter
for one instruction - so I have an
instruction it says "This is the program
counter for the instruction and this is
the program counter for the next
instruction" -- and then we can look at
the next instruction and we can see is...
the program counter for that instruction
actually approved the next program counter
value for the previous instruction and
they must link together like that, but the
core might retire instructions out of
order. So it might be, that we see the
first instruction for us and then the
second instruction later, but it's also
possible that we will see the second
instruction first and then the first
instruction later and because of that,
there're 2 different checks: One for a
pair in the non-reversed order and for a
pair of instruction in the reversed order.
There is one check that checks, if
register value reads and writes are
consistent. There is one check that sees,
if the processor is alive, so when I give
the processor certain fairness constrains,
that the memory will always return, memory
reads at number of cycles, things like
that, then I can use this to prove that
the process will not just suddenly freeze.
This is very important and this will also
prove that the processor is not skipping
instruction indices, which is very
important, because some of the other
checks actually depend on the processor
behaving in this way. And so forth. So,
there are couple of these consistency
checks and it's a nice exercise to sit
down in a group of people and go through
the list of consistency checks and see
which which set of them actually is
meaningful or which set of them actually
leaves an interesting verification gap and
we still need to add checks for this or
that processor then. Okay, so what kind of
bugs can it find? That's a super-hard
question, because it's really hard to give
a complete list. It can definitely find
incorrect single-threaded instruction
semantics. So, if you
just implement a instruction
incorrectly in your core, then this will
find it; no question about it. It can find
a lot of bugs and things like bypassing
and forwarding and pipeline interlocks,
things like that. Things where you reorder
stuff in a way you shouldn't reorder them,
freezes if you have this lifeness check,
some bugs related to memory interfaces and
load/store consistency and things like
that, but that depends on things like the
size of your cache lines, if this is a
feasible proof or not. Bugs that we can't
find yet with riscv-formal are things that
are not yet covered with the riscv-formal
interface, like the floating-point stuff
or CSRs, but this is all on the to-do
list, so they are actively working on that
and a year from now, this stuff will be
included. And anything related to
concurrency between multiple heats. So
far, my excuse for that was, that the
RISC-V memory model is not completely
specified yet, so I would not actually
know what to check exactly, but right now
the RISC-V memory model is in the process
of being finalized, so I won't have this
excuse for much, much longer. So, the
processors currently supported PicoRV32,
which is my own processor, then RISC-V
Rocket, which is probably the most famous
RISC-V implementation, and VexRiscv. And
there are also a couple of others, but
they are not part of the open source
release of riscv-formal. So, if you would
like to add support to riscv-formal for
your RISC-V processor, then just check out
the riscv-formal repository, look at the
"cores" directory, see which of the
supported cores's most closely to the code
that you actually have and then just copy
that directory and make a couple of small
modifications. So, I have a few minutes
left to talk about things like cut-points
and blackboxes and other abstractions. So,
the title of this slide could just be
"abstractions", because cut-points and
blackboxes are just abstractions.
The idea behind an abstraction and formal
methods is that I switch out part of my
design with a different part with a
different circuit that is less
constrained, it includes the behavior of
the original circuit, but might do other
stuff as well. So, the textbook example
would be, I have a design with a counter
and usually the counter would just
increment in steps of 1, but now I create
an abstraction that can skip numbers and
will just increment in strictly increasing
steps. And this, of course, includes the
behavior of the original design, so if I
can prove a property with this abstraction
in place instead of the just-increment-
by-1 counter, then we have proven even a
stronger property and that includes the
same property for the thing in the
original design. And actually this idea of
abstractions works very well with riscv-
formal. So, the main reason why we do
abstractions is because it leads to easier
proofs. So, for example, consider an
instruction checker that just checks if
the core implements the "add" instruction
correctly. For this checker, we don't
actually need a register file that's
working. We could replace the register
file by something that just ignores all
writes to it and whenever we read
something from the register file, it
returns an arbitrary value. That would
still include the behavior of a core with
a functional register file, but, because
the instruction checker does not care
about consistency between register file
writes and register file reads, we can
still prove that the instruction is
implemented correctly, and therefore we
get an easier proof. Of course, we can't
use this abstraction for all those proofs,
because the other proofs that actually
check if my register file works as I would
expect it to work, but if we go through
the list of proofs and we run all these
proofs independently, then you will see
that for each of them, it's possible to
abstract away a large portion of your
processor and therefore yield
and an easier proof.
Depending on what kind of solvers you use,
some solvers're actually very capable of
finding these kind of abstractions
themselves. So in that cases, it doesn't
really help by adding these abstractions
manually, but just realizing that the
potential for these abstractions is there,
is something that's very useful when
guiding your decisions how to split up a
large verification problem into smaller
verification problems, because you would
like to split up the problem in a way so
that the solver is always capable of
finding useful abstractions that actually
lead to easier circuits to prove. With a
bounded check, we also have the questions
of what bounds do we use. Of course,
larger bounds are better, but larger
bounds also yield something that is harder
to compute, and if you have a small bound,
then you have a proof that runs very, very
quickly, but maybe you're not very
confident that it actually has proven
something that's relevant for you. So, I
propose 2 solutions for this: The first
solution is, you can use the same solvers
to find traces that cover certain events
and you could write a list and say "I
would like to see 1 memory read and 1
memory write and at least one ALU
instruction executed" and things like
that. And then, you can ask the solver
"What is the shortest trace that would
actually satisfy all this stuff?" And when
that's a trace of, say 25 cycles, then you
know "Okay, when I look at a prove that's
25 cyclic deep, I know at least these are
the cases that are going to be covered."
But more important, I think, is: Usually
when you have a processor, you already
found bugs and it's a good idea to not
just fix the bugs and forget about them,
but preserve some way of re-introducing
the bugs, just to see if your testing
framework works. So,
if you have already a couple of bugs
and you know "It took me a week to find
that and took me a month to find that",
the best thing is to just add the bugs to
your design again and see "What are the
bounds that are necessary for riscv-formal
to actually discover those bugs" and then
you will have some degree of confidence
that other similar bugs would also have
been found with the same bounds. So,
"Results": I have found bugs in pretty much
every implementation I looked at; I found
bugs in all 3 processors; we found bugs in
Spike, which is the official implementation
of RISC-V in C and I found
a way to formally verify my specification
against Spike and in some cases where I found
the difference between my specification
and Spike, it turned out, it
was actually a bug in the English language
specification. So, because of that, I also
found bugs in the English language
specification with riscv-formal. "Future
work": Multipliers already supported,
floating-point is still on the to-do list,
64-bit is half done. We would like to add
support for CSRs. We would like to add
support for more cores, but this is
something that I would like to do slowly,
because adding more cores also means we
have less flexibility with changing
things. And better integration with non-
free tools, because right now all of that
runs with open source tools that I also
happen to write -- so, I wrote those
tools, but some people actually don't want
to use my open source tools; they would
like to use the commercial tools -- and
it's the to-do list that I have better
integration with those tools. Maybe,
because I don't get licenses to those
tools, so we will see how this works.
That's it. Do we have still time for
questions?
applause
Clifford: So, I'd say we start with
questions at 1.
Herald: I'm sorry; here we go. We have 2
questions; we have time for 2 questions
and we're going to start with microphone
number 1, please.
M1: Hello, thanks for your talk and for
your work. First question: You told about
riscv-formal interface.
Clifford: Yes.
M1: So, does vendor ship the final
processor with this interface available?
Clifford: Oh, that's a great question,
thank you. This interface has only output
ports and actually when you implement this
interface, you should not add something to
your design that's not needed to generate
those output ports. So, what you can do
is, you can take the version of your core
with that interface, the version of the
core without that interface. Then, in your
synthesis script, just remove those output
ports and then run a formal equivalence
check between that version and the version
that you actually deploy on your ASIC.
M1: Thanks; and one short question: When
people say formal verification, usually
others think "Oh, if it is verified, it is
excellent, absolutely excellent. Do you
plan to say that it will find all the
erato for the processor?
Clifford: Well, it depends on what kind of
proof you run. The most work I do is with
bounded proofs and there you only get a
certain degree of confidence, because you
only see bugs that can occur within a
certain number of cycles from reset, but
if you want, you can also run a complete
proof, where you start with a symbolic
state instead of a reset state and then
you can can make sure that you actually
check the entire reachable state space,
but that's a very, very hard thing to do
So, that's not a weekend project. Just
adding the riscv-formal interface and
running some bounded proofs is probably a
weekend project if you already have
your RISC-V processor.
M1: Thank you.
Herald: Thank you. We actually do not have
time for any more questions, but there
will be time after the talk to ask you
questions... maybe?
Clifford: Yeah. So, maybe you can find me
at the open FPGA assembly, which is part
of the hardware hacking area.
Herald: Super. Very great job to put that
much information into 30 minutes. Please
help me thank Cliff
for his wonderful talk.
applause
34c3 outro
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!