
34c3 intro

Herald: The next talk is called "Endto
end formal ISA verification of RISCV

processors with riscvformal". I have no
idea what this means, but I'm very excited

to understand what it means and Clifford
promised he's going to make sure everyone

will. Clifford has been very known in the
opensource and freesoftware... free

source... oh my gosh... freeandopen
source community and especially he's known

for the project IceStorm. Please help me
welcome Clifford!

applause

inaudible
Clifford: RISCV is an open instruction

set architecture. It's an open ISA, so
it's not a processor, but it's a processor

specification  an ISA specification 
that is free to use for everyone and if

you happen to have already implemented
your own processor at one point in time,

you might know that it's actually much
easier to implement a processor than to

implement all the tools that you need to
compile programs for your processor. And

if you use something like RISCV, then you
can reuse the tools that are already out

there, so that's a great benefit. However,
for this endeavor we need processors that

actually really compatible to each other:
Processors that implement the RISCV ISA

correctly. So, with many other ISAs, we
start with one processor and we say "Oh,

that's the processor" and later on, we
figure out there was a bug, and what

people sometimes do is, they just change
the specification, so that the

specification all fits the hardware they
actually have. We can't do something like

that with RISCV, but there are many
implementations out there, all being

developed in parallel to fit the same
specification, so we want to have some

kind of means to make sure that all these
processors actually agree about what the

ISA specification is. So, what's formal
verification? Formal verification is a

superbroad term. In the context of this
talk, I'm talking about hardware model

checking. More specifically, I'm talking
about checking of socalled "safety

properties". So, we have some hardware
design and we have an initial state and we

would like to know if this hardware design
can reach a bad state from the initial

state. That's formally the problem that we
are trying to solve here. And there are

two means to do that, two different
categories of proofs that are bounded and

unbounded proofs, and with the bounded
proofs we only prove that it's impossible

to reach a bad state within a certain
number of cycles.

So, we give a maximum bound for the length
of a counterexample and with unbounded

proofs, we prove that a bad state can
actually never be reached. So, unbounded

proofs are, of course, better  if you
can make an unbounded proof  but in many

cases, this is very hard to achieve, but
bounded proofs is something that we can

do, so I'm talking about bounded proofs
here for the most part. So, what's endto

end formal verification? Because that's
also in my title. So, historically when

you formally verify something like a
processor, you break down the processor in

many small components and then you write
properties for each component and prove

for each component individually that they
adhere to the properties and then you make

a more abstract proof, that if you put in
system together from components that have

this property, then this system will have
the properties that you want. With endto

end verification, we treat the processors
one huge black box and just ask the

question "Does this one huge thing fit our
specification? Have the properties that we

want?" That has a couple of advantages:
It's much, much easier this way to take

one specification and port it from one
processor to another, because we don't

care about how the processor is built
internally, and it's much easier to take

the specification that we have and
actually match it to other specifications

of the ISA, because we have a
specification that says, what is the

overall behavior we expect from our
processor. But the big disadvantage, of

cause, is that it's computationally much
more expensive to do endtoend formal

verifications and doing this endtoend
verification of a processor against an ISA

specification is something that
historically was always fueled as the

textbook example of things that you can't
do with formal methods, but fortunately

the solvers... they became much better in
the last couple of years and now if we use

the right tricks, we can do stuff like
that with the solvers we have nowadays.

So, that's riscvformal. riscvformal is a
framework that allows us to do endtoend

formal verification of RISCV processors
against a formal version of the ISA

specification. So, riscvformal is not a
formally verified processor. Instead, if

you happen to have a RISCV processor, you
can use riscvformal to prove that your

processor confirms to the ISA
specification. For the most part, this is

using bounded methods. Theoretically, you
could do unbounded proofs with riscv

formal, but it's not the main use case.
So, it's good for what we call "bug

hunting", because maybe there is a
counterexample that would show, that the

processor could diverge from the desired
behavior with 1000 or 5000 cycles, but

usually, when you have something like a
processor and you can't reach a bad state

within the very short bounds, you have
high confidence that actually your

processor implements the ISA correctly.
So, if you have a processor and you would

like to integrate it with riscvformal,
you need to do 2 things: You need to add a

special trace port to your processor; it's
called the RVFI trace port  riscvformal

interface trace port. And you have to
configure riscvformal so that riscv

formal understands the attributes of your
processor. So, for example, RISCV is

available in a 32bit and a 64bit
version. You have to tell riscvformal, if

you want to verify a 32bit or 64bit
processor. RISCV is a modular ISA, so

there're a couple of extensions and you
have to tell riscvformal, which

extensions your processor
actually implements.

And then there're a couple of other things
that are transparent for a userland

process, like if unaligned loads or stores
are supported by the hardware natively,

because RISCV... the spec only says, that
when you do an unaligned load or store,

then a userspace program can expect this
load or store to succeed, but it might

take a long time, because there might be a
machine interrupt handler that is

emulating an unaligned load store by doing
alligned loads and stores, but if we do

this formal verification of the processor,
then the riscvformal framework must be

aware: "What is the expected behavior for
your course?", "Should it trap when it

sees an unaligned load store or should it
just perform the load store unaligned?"

So, what does this interface look like
that you need to implement in your

processor if you would like to use riscv
formal? This is the current version of the

riscvformal interface. Right now, there
is no support for floatingpoint

instructions and there is no support for
CSRs, but this is on the todo list, so

this interface will grow larger and larger
when we add these additional features, but

all these additional features will be
optional. And one of the reasons is that

some might implement just small
microcontrollers that actually don't have

floatingpoint cores or that don't have
support for the privileged specification,

so they don't have CSRs. Through this
interface, whenever the core retires an

instruction, it documents which
instruction it retired; so it tells us

"This is the instruction where I retired;
this was the program counter where I found

the instruction; this is the program
counter for the next instruction; these

are the registers that I read and these
are the values that I've observed in the

register file; this is the register that
I've written and this is the value that I

have written to the register file"
All that stuff.

So, in short, what we document through the
riscvformal interface, is the part of the

processor state that is observed by an
instruction and the change to the state of

the processor that is performed by an
instruction  like changes to the

register file or changes to the program
counter. And of course, most processors

actually are superscalar, even those
processors that say they're non

superscalar. Inorder pipelines usually
can do stuff like retire memory load

instructions out of order, in parallel to
another instruction that does not write

the register, things like that. So, even
with processors we usually don't think of

as superscalar processors, even with those
processors, we need the capability to

retire more than one instruction each
cycle and this can be done with this

"NRET" parameter. And we see all the ports
5 times wider if NRET is 5. Okay, so then

we have a processor that implements this
interface. What is the verification

strategy that riscvformal follows in
order to do this proof to formally verify

that our processor's actually correct? So,
there is not one big proof that we run.

Instead, there is a large number of very
small proofs that we run. This is the most

important trick when it comes to this and
there are 2 categories of proofs: One

category is what I call the "instruction
checks". We have one of those proofs for

each instruction in the ISA specification
and each of the channels in the riscv

formal interface. So, this is easily a
couple of hundred proofs right there

because you easily have 100 instructions
and if you have 2 channels, you already

have 200 proofs that you have to run. And
what this instruction checks do, they

reset the processor or they started a
symbolic state, if you would like to run a

unbounded proof, let the process run for a
certain number of cycles and then it

assumes that in the last cycle, the
processor will retire a certain

instruction. So, if this
check checks if the "add"

instruction works correctly, it assumes
that the last instruction retired in the

last cycle of this bounder checked will be
an "add" instruction and then it looks at

all the interfaces on the riscvformal
interface, to make sure that this is

compliant with an "add" instruction. It
checks if the instruction has been decoded

correctly; it checks if the register value
we write to the register file is actually

the sum of the values we read from the
register file. All that kind of stuff.

But, of course, if you just have these
instruction checks, there is still a

certain verification gap, because the core
might lie to us: The core might say "Oh, I

write this value to the register file",
but then not write the value to the

register file, so we have to have a
separate set of proofs that do not look at

the entire riscvformal interface in one
cycle, but look at only a small fraction

of the riscv formal interface, but over a
span of cycles. So, for example, there is

one check that says "If I write the
register and then later I read the

register, I better read back the value
that I've written to the register file"

and this I call "consistency checks". So,
that's I think what I said already... So

for each instruction with riscvformal, we
have a instruction model that looks like

that. So, these are 2 slides: The first
slides is just the interface there we have

a couple of signals from this riscvformal
interface that we read like the

instruction that we are executing, the
program counter where we found this

instruction, the register values we read,
and then we have a couple of signals that

are generated by our specification that
are output of this specification module.

Like, which registers should we read?
Which registers should we write? What

values should we write to that register?
Stuff like that. So, that's the interface.

It's the same all the instructions and
then we have a body that looks more

like that. For all the instructions that
just decodes the instruction, checks if

this is actually the instruction the check
is for. So, in this case it's an "add

immediate" instruction. And then you have
things like the line near the bottom above

"default assignments": "assign
spec_pc_wdata", for example, says "Okay,

the next PC must be 4 bytes later than the
PC for this instruction. We must increment

the program counter by a value of 4 when
we execute this instruction." Things like

that. So, you might see, there is no
assert here, there are no assertions,

because this is just the model of what
kind of behavior we would expect. And then

there is a wrapper that instantiates this
and instantiates the call and builds the

proof and there are the assertions. The
main reason why we don't have assertions

here, but instead we output the desired
behavior here, is because I can also

generate monitor cores that can run
alongside your core and check in

simulation or an emulation, an FPGA, if
your core is doing the right thing. That

can be very, very helpful if you have a
situation where you run your core for

maybe days and then you have some
observable behavior that's not right, but

maybe there are thousands, even million,
cycles between the point where you can

observe that something is wrong and the
point where the process actually started

diverging from what the specification said
and if you can use a monitor core like

that, then it's much easier to find bugs
like this. Okay, so some examples of those

consistency checks.; the list is actually
not complete and it varies a little bit

from processor to processor what kind of
consistency checks we can actually run

with the processor we are looking at.
There is a check if the program counter

for one instruction  so I have an
instruction it says "This is the program

counter for the instruction and this is
the program counter for the next

instruction"  and then we can look at
the next instruction and we can see is...

the program counter for that instruction
actually approved the next program counter

value for the previous instruction and
they must link together like that, but the

core might retire instructions out of
order. So it might be, that we see the

first instruction for us and then the
second instruction later, but it's also

possible that we will see the second
instruction first and then the first

instruction later and because of that,
there're 2 different checks: One for a

pair in the nonreversed order and for a
pair of instruction in the reversed order.

There is one check that checks, if
register value reads and writes are

consistent. There is one check that sees,
if the processor is alive, so when I give

the processor certain fairness constrains,
that the memory will always return, memory

reads at number of cycles, things like
that, then I can use this to prove that

the process will not just suddenly freeze.
This is very important and this will also

prove that the processor is not skipping
instruction indices, which is very

important, because some of the other
checks actually depend on the processor

behaving in this way. And so forth. So,
there are couple of these consistency

checks and it's a nice exercise to sit
down in a group of people and go through

the list of consistency checks and see
which which set of them actually is

meaningful or which set of them actually
leaves an interesting verification gap and

we still need to add checks for this or
that processor then. Okay, so what kind of

bugs can it find? That's a superhard
question, because it's really hard to give

a complete list. It can definitely find
incorrect singlethreaded instruction

semantics. So, if you
just implement a instruction

incorrectly in your core, then this will
find it; no question about it. It can find

a lot of bugs and things like bypassing
and forwarding and pipeline interlocks,

things like that. Things where you reorder
stuff in a way you shouldn't reorder them,

freezes if you have this lifeness check,
some bugs related to memory interfaces and

load/store consistency and things like
that, but that depends on things like the

size of your cache lines, if this is a
feasible proof or not. Bugs that we can't

find yet with riscvformal are things that
are not yet covered with the riscvformal

interface, like the floatingpoint stuff
or CSRs, but this is all on the todo

list, so they are actively working on that
and a year from now, this stuff will be

included. And anything related to
concurrency between multiple heats. So

far, my excuse for that was, that the
RISCV memory model is not completely

specified yet, so I would not actually
know what to check exactly, but right now

the RISCV memory model is in the process
of being finalized, so I won't have this

excuse for much, much longer. So, the
processors currently supported PicoRV32,

which is my own processor, then RISCV
Rocket, which is probably the most famous

RISCV implementation, and VexRiscv. And
there are also a couple of others, but

they are not part of the open source
release of riscvformal. So, if you would

like to add support to riscvformal for
your RISCV processor, then just check out

the riscvformal repository, look at the
"cores" directory, see which of the

supported cores's most closely to the code
that you actually have and then just copy

that directory and make a couple of small
modifications. So, I have a few minutes

left to talk about things like cutpoints
and blackboxes and other abstractions. So,

the title of this slide could just be
"abstractions", because cutpoints and

blackboxes are just abstractions.
The idea behind an abstraction and formal

methods is that I switch out part of my
design with a different part with a

different circuit that is less
constrained, it includes the behavior of

the original circuit, but might do other
stuff as well. So, the textbook example

would be, I have a design with a counter
and usually the counter would just

increment in steps of 1, but now I create
an abstraction that can skip numbers and

will just increment in strictly increasing
steps. And this, of course, includes the

behavior of the original design, so if I
can prove a property with this abstraction

in place instead of the justincrement
by1 counter, then we have proven even a

stronger property and that includes the
same property for the thing in the

original design. And actually this idea of
abstractions works very well with riscv

formal. So, the main reason why we do
abstractions is because it leads to easier

proofs. So, for example, consider an
instruction checker that just checks if

the core implements the "add" instruction
correctly. For this checker, we don't

actually need a register file that's
working. We could replace the register

file by something that just ignores all
writes to it and whenever we read

something from the register file, it
returns an arbitrary value. That would

still include the behavior of a core with
a functional register file, but, because

the instruction checker does not care
about consistency between register file

writes and register file reads, we can
still prove that the instruction is

implemented correctly, and therefore we
get an easier proof. Of course, we can't

use this abstraction for all those proofs,
because the other proofs that actually

check if my register file works as I would
expect it to work, but if we go through

the list of proofs and we run all these
proofs independently, then you will see

that for each of them, it's possible to
abstract away a large portion of your

processor and therefore yield
and an easier proof.

Depending on what kind of solvers you use,
some solvers're actually very capable of

finding these kind of abstractions
themselves. So in that cases, it doesn't

really help by adding these abstractions
manually, but just realizing that the

potential for these abstractions is there,
is something that's very useful when

guiding your decisions how to split up a
large verification problem into smaller

verification problems, because you would
like to split up the problem in a way so

that the solver is always capable of
finding useful abstractions that actually

lead to easier circuits to prove. With a
bounded check, we also have the questions

of what bounds do we use. Of course,
larger bounds are better, but larger

bounds also yield something that is harder
to compute, and if you have a small bound,

then you have a proof that runs very, very
quickly, but maybe you're not very

confident that it actually has proven
something that's relevant for you. So, I

propose 2 solutions for this: The first
solution is, you can use the same solvers

to find traces that cover certain events
and you could write a list and say "I

would like to see 1 memory read and 1
memory write and at least one ALU

instruction executed" and things like
that. And then, you can ask the solver

"What is the shortest trace that would
actually satisfy all this stuff?" And when

that's a trace of, say 25 cycles, then you
know "Okay, when I look at a prove that's

25 cyclic deep, I know at least these are
the cases that are going to be covered."

But more important, I think, is: Usually
when you have a processor, you already

found bugs and it's a good idea to not
just fix the bugs and forget about them,

but preserve some way of reintroducing
the bugs, just to see if your testing

framework works. So,
if you have already a couple of bugs

and you know "It took me a week to find
that and took me a month to find that",

the best thing is to just add the bugs to
your design again and see "What are the

bounds that are necessary for riscvformal
to actually discover those bugs" and then

you will have some degree of confidence
that other similar bugs would also have

been found with the same bounds. So,
"Results": I have found bugs in pretty much

every implementation I looked at; I found
bugs in all 3 processors; we found bugs in

Spike, which is the official implementation
of RISCV in C and I found

a way to formally verify my specification
against Spike and in some cases where I found

the difference between my specification
and Spike, it turned out, it

was actually a bug in the English language
specification. So, because of that, I also

found bugs in the English language
specification with riscvformal. "Future

work": Multipliers already supported,
floatingpoint is still on the todo list,

64bit is half done. We would like to add
support for CSRs. We would like to add

support for more cores, but this is
something that I would like to do slowly,

because adding more cores also means we
have less flexibility with changing

things. And better integration with non
free tools, because right now all of that

runs with open source tools that I also
happen to write  so, I wrote those

tools, but some people actually don't want
to use my open source tools; they would

like to use the commercial tools  and
it's the todo list that I have better

integration with those tools. Maybe,
because I don't get licenses to those

tools, so we will see how this works.
That's it. Do we have still time for

questions?
applause

Clifford: So, I'd say we start with
questions at 1.

Herald: I'm sorry; here we go. We have 2
questions; we have time for 2 questions

and we're going to start with microphone
number 1, please.

M1: Hello, thanks for your talk and for
your work. First question: You told about

riscvformal interface.
Clifford: Yes.

M1: So, does vendor ship the final
processor with this interface available?

Clifford: Oh, that's a great question,
thank you. This interface has only output

ports and actually when you implement this
interface, you should not add something to

your design that's not needed to generate
those output ports. So, what you can do

is, you can take the version of your core
with that interface, the version of the

core without that interface. Then, in your
synthesis script, just remove those output

ports and then run a formal equivalence
check between that version and the version

that you actually deploy on your ASIC.
M1: Thanks; and one short question: When

people say formal verification, usually
others think "Oh, if it is verified, it is

excellent, absolutely excellent. Do you
plan to say that it will find all the

erato for the processor?
Clifford: Well, it depends on what kind of

proof you run. The most work I do is with
bounded proofs and there you only get a

certain degree of confidence, because you
only see bugs that can occur within a

certain number of cycles from reset, but
if you want, you can also run a complete

proof, where you start with a symbolic
state instead of a reset state and then

you can can make sure that you actually
check the entire reachable state space,

but that's a very, very hard thing to do
So, that's not a weekend project. Just

adding the riscvformal interface and
running some bounded proofs is probably a

weekend project if you already have
your RISCV processor.

M1: Thank you.
Herald: Thank you. We actually do not have

time for any more questions, but there
will be time after the talk to ask you

questions... maybe?
Clifford: Yeah. So, maybe you can find me

at the open FPGA assembly, which is part
of the hardware hacking area.

Herald: Super. Very great job to put that
much information into 30 minutes. Please

help me thank Cliff
for his wonderful talk.

applause

34c3 outro

subtitles created by c3subtitles.de
in the year 2018. Join, and help us!