36C3 preroll music
Herald: Our next talk is held by Mike
Sperber, and he is already very ready for
that. He's head of a software company in
Tübingen in Germany and he's going to talk
about "Getting software right with
properties, generator tests and proofs".
And the main thing here is functional
programming. One tiny thing you might not
know about him is that 1986 he won a
federal competition on IT and so give him
a warm applause for that and also for his
talk.
applause
Mike Sperber: Thank you very much. So is
anybody actively using the induction loop
feature in the first couple of rows? Cuz
I know somebody who would like to know.
Not right now. Okay. Anyway, so let me get
one shameless plug of advertising out of
the way. If you find the contents of this
talk interesting, we're running a
developer conference in Berlin in February
called Bob, which is very friendly and
very nice, very tiny compared to this one.
And we'd love to see you there. Another
thing, this is an introductory talk. So if
you were expecting the latest developments
on proof tactic, in fact, if you know what
proof tactic is, then all you might get
from this talk is sort of mild amusement.
And I won't be mad at you at all if you go
for one of the more exciting talks. Ok?
So. Or leave at any time. That's perfectly
fine, if this material is not for you.
Speaking of introductory talks, here's a
piece of code written in the language that
I will use for this talk and it's called
Idris. Who has written an Idris program
before? Very good. Ok. Oh, there's one
person back there. That means if any part
of this program, as soon as I'm done
explaining, is not clear to you, it's also
not clear to two or three hundred other
people in this room. And I would love to
have your help. Interrupt me, ask a
question anytime in the talk if there's
anything here not clear. It's going to
get, even though it's meant to be
introductory, will get quite technical at
times. So let me try explaining this one.
So this is a classic example in functional
programming that I use often in my talks,
about animals on the Texas Highway. And if
you can see there, the central definition
says data Animal. That's the data
definition of animals. And in this
particular version of the Texas Highway,
there's two different kinds of animals.
There is Armadillo, it's where it says
Dillo there. And there's Parrots, for some
reason, on the Texas Highway. Does that
make sense? Two different kinds of
animals. And you see that definition.
Yeah, nod, that greatly helps me. And if
you see those two definitions for Dillo
and Parrot, you can see, while the arrows
are kind of funny, but you can see that
Dillo and Parrots have two properties
each, and it says their Liveness. That's
one of the properties of an armadillo. And
up there at the very top, you see the
definition of Liveness, it says Liveness
means dead or alive. It's an armadillo,
can be dead or alive on the Texas Highway.
And there's also the Weight. And well, you
see, this colon thing is the type
signature for the constructor for
armadillos. So it says there's a liveness
going on, there's a weight going on, and
then it constructs an animal. And for a
Parrot, there's a string. Every parrot
speaks, right? And so it's the sentence
the Parrot says, and also the Weight. And
it also produces an animal. And, up there,
you can see the definition of Weight is
for simplicity's sake, I'm saying that
Weight is a type. So that's kind of
unusual for Idris, but you don't have to
worry about it. But you can see there,
Weight is just the same thing as an
integer. And if you look down there, where
it says a1, a2 and a3, that has three
examples for animals. So it says a1:
Animal, just to say that a1 is an animal.
So, Idris is a language that always has
type declarations. And it says a1 is Dillo
Alive 10. And that means it's an
armadillo, it's still alive, and it
weighs, let's say, ten kilograms. The
second one is dead, a little bit heavier,
weighs twelve kilograms. And the third
animal is a parrot that knows, well, it's
a pirate's parrot, obviously, and maybe
weighs three kilograms. Ok, so far? Ok. So
if you have any question about any of
that, then please ask away. So, what
happens to animals on the Texas Highway
is, you know, people drive cars, they run
them over. So there's a function down
here, and, well, we're doing functional
programing, shouldn't worry you at all.
But what's important here is that it says
there is an animal going in, an animal
going out. And really what this means is
that this animal object up there is not
really the animal. It is the state of the
animal at a given time. So, runOverAnimal.
you can see the type signature that says
an animal goes in, an animal goes out. And
what it really means is, the state of the
animal goes in before it gets run over and
the state of the animal after it gets run
over comes out. And then while we know
there's two different kinds of animals.
And that means that for the definition of
runOverAnimal, we need what's called
equations. There's two different
equations. And the first equation says
what happens to armadillos when they get
run over. So an armadillo has a liveness
and a weight. Here's something going on
called pattern matching. And the second
equation says when there's a parrot going
on it has a sentence and a weight, and on
the right hand side, you can see, well,
when an armadillo gets run over. Well, all
that means is, the liveness sort of turns
to Dead. We're constructing a new
armadillo object and it's dead and it has
the same weight as before. And the
function, the equation at the bottom says,
well, when we run over a parrot, it turns
really, really quiet. Ok? So, classic
example. Ok so far. We're going to return
to that example at the very end. Right now
it's just to illustrate the language that
we're doing things in and we're going to
do a lot of things without complicated
programs. So, well. So, I'm going to jump
around a little bit. So, one thing. So,
just the other day, two weeks ago, I was
teaching a course on architecture and
somebody said: Well, there's this problem.
I'm building a domain model. I'm putting
the domain model in a database. And, you
know, customer comes in, has new
requirement or somebody comes in, has new
requirements. And that always ends the
same way. I put a new call in the database
and, you know, seven, eight, nine, it just
goes on and on. As the software gets older
and older and older, more columns that
create the old ones seem a little stale.
And so, yes, well, how can we build models
that are flexible? And so, here's
something completely different, you might
think. So, here's sort of the key to that,
to building flexible models. Does anybody
recognize this? Does anybody associate a
word with this? laughter Very good. So,
you might remember, depending on what
state you went to school in, you might
remember that this is a property called
associativity. Right? And it means that we
can associate either the A and the B first
with the parentheses or the B and the C.
So, and this is, if you take away one
thing from this talk, it's associativity.
Knowing what that is is one of the most
useful things in software development. So,
of course it's just a generic equation, we
really need to be more specific, namely
that we're dealing with numbers and
addition. And you might know that it's not
just addition that's associative. Also,
multiplication, for example, is
associative. So here's a little mathy
stuff there at the beginning. So, you see
that upside down A. That says "for all".
We just say for all. What that means is
"for all A, B and C". And then, this funny
epsilon-shape letter kind of thing, it
means "element of". And then that funky N
means the natural numbers. So all the
numbers from zero, one, two, three, the
whole numbers from zero on up. So, what
that means is, for all natural numbers A,
B and C, the associative property holds
when you add them up. But while it says
numbers in addition, it doesn't just hold
for numbers and, addition, in fact,
associativity is everywhere around us.
Specifically, it's everywhere around us
when we program. So here's another
example. When you're dealing with lists
and that funky, the two pluses that you
see there, they are just list
concatenation. So you concatenate two
lists and well, of course you can
concatenate three lists by just using that
double plus in any order. And that's also
associative. So, it doesn't matter if you
first concatenate the B and the C and then
tack the A onto the front, or if you
concatenate the A and B and tack the C on
at the end. Doesn't matter, you always get
the same result. So lists and
concatenation also have this associative
property. And here's something that I
always find very, very enlightening is
that you can construct images that way.
Well, you don't see it here. So here's an
image. Well, it's from a cool researcher
of mine and functional programing, Brent
Yorgey, and he has a great library out
called diagrams, for constructing diagrams
out of parts. And so this really is what
associativity is about. It's about
operators that construct things out of
parts. And so, as you can see here, well
there's different shapes here, there's
sort of the black rectangles, there's a
different rectangle set, that denote the
towers of Hanoi. We're not really going to
deal with the towers of Hanoi here,
really. The important thing that the image
consists of several parts. And well, in
normal or sort of in classic object-
oriented programming, when you do
graphics, you have a canvas and you might
draw pixels on that canvas. You know,
might be square shaped or a circle shaped
canvas pixels. But what we're doing here
is, we are treating an image as a data
type and the definition is not important.
What is important is that there are a
couple of functions that construct sort of
simple images. So here's a function that
you might imagine called star and it
constructs stars. And well, you can see up
there there's a type declaration and it
says, well, the star function, it accepts
an integer, it accepts a Mode, whatever
that is, accepts a Color and it produces
an Image. And we can call that star
function with the arguments 200 and Solid
and Gold. So Mode is Solid or Outline. And
then we have a Color and we get Image and
that image is an object. Not particularly
exciting. But while we might have another
function called Polygon, Polygon takes two
integers that denote the size of the
polygon and the number of vertices, and
also whether it's an outline or whether
it's solid and a color. And for example,
if we call it with 180, again, that's the
size and 5 we get a five corner polygon
and we get that as an outline and it's in
red. Now, the idea here is that we can
combine. Just as we can combine two
numbers or we can combine two lists, we
can combine two images. Maybe the most
intuitive way of combining two images is
just sticking them beside each other. So
there's a function there called beside.
And it takes an image and it takes another
image and produces an image. Right. And
this is exactly what we're thinking about
when we talk about associativity. We're
talking about a sort of a binary operator
that produces the same thing that went in.
And so, for example, we could stick those
two images next to each other. We could
also imagine an operator called above that
just puts one image above the other image.
And we can combine these two things. Here
it really is important that the same thing
comes out so that it's image goes in,
another image goes in, an image goes out.
So we could again call above on the result
of beside and make arrangements. So here's
a tiling arrangement for your bathroom or
something like that. Now, beside and above
are are two possible operators and you
might already think about associativity,
but really the more fundamental one is
overlay. You put two images on top of each
other. And so again, overlay has the right
type. An image goes in, another image goes
in, and an image comes out. And if we take
the gold star and the pentagon and put
them on top of each other, then it looks
like this. And we can then formulate an
associativity property. It might not quite
look the same because I wrote overlay in
front rather than between the operators.
We could also write it between. But just
to show you that it's the same idea. So,
it doesn't really matter if we first take
two images, A and B and superimpose those
two and then put those two on top of C or
if we do it in another order. Does that
make sense so far? Ok. No? Do you have a
question?
Anonymous: Mumbling
Mike: Yeah, so ahh, good point, good
point. So this implies that there must be
some kind of, that there's probably some
notion of transparency involved. Yes. Yes,
there is. But then you have associativity.
And really what it means. Very good
question. So, if you think of this image
in terms of the color at certain
coordinates, Right, Well, you need to
think about how to combine those two
colors that are in the constituent images.
And you can imagine that there also has to
be a combination operation for the colour.
And that also needs to be associative as a
prerequisite for the for the overlay
operation to be associative. Does that
make sense now? Thank you. Good question.
Great question. So anyway, so since this
associativity property is something that
is not just restricted to numbers, as we
may have learned in school, it really
makes sense to get. And that means that
when we talk about associativity, we
always have to name two things. We have to
say what set we're operating on and what
the operation is. And the combination
those two things has a name in mathematics
and it's not the best name, but it's
called a semigroup. Right. And, but, you
know, if you drop it in certain circles,
they'll think that you're an expert on
mathematics, you might try that. So, just
to go over that: So, you have some subset
S, and that S might be Image, it might be
the natural numbers or something like
that. And we have an operation that I'm
just gonna call circle here, then take any
a, b and c from that set S. We can use
circle as an operator and we have that
associativity property and for that circle
you can put in overlay, you can put in
beside, you can put in above, you can put
in + you can put in times or you can put
in the list concatenation operator, the
++. Okay? And that associativity is great.
It's really my favorite property because
it means when we have a whole lot of
things that we combine, we can
parenthesise in any way we want. We will
get the same result no matter which way we
parenthesise them. And that really means,
we can leave out the parentheses when we
write an expression that involves only the
circle operator if it's associative, if we
can just leave out all the parentheses
because the parentheses don't matter. And
that makes it well, that makes it
instantly easier to read, I think. Also it
has practical uses. So if you do big data
processing associativity means that if you
have large datasets that span several
machines or several hard drives or several
data sources, and you're combining them
and you have an associative combination
operation, it just means you can rearrange
that combination operation according to
the load in your compute cluster. And that
makes it a very useful property when
you're doing big data processing in sort
of MapReduce based frameworks. But, I
mean, that's a practical application, but
I think it's much more useful,
associativity is much more useful when you
use it for designing your domain model.
And I talked in the beginning how, well,
you want to avoid always adding more
database columns. And one way of doing
that is to view your domain model, not as
something that has more and more
properties, but your domain model as
building blocks that you combine into a
larger building blocks the same way that
we combine images from simpler images. So
here's one of the great papers from
functional programing, one of my two or
three favorites from Brent Yorgey. And
it's called "Something Something: Theme
and Variations". And you can see that it
is about images. And these images get
superimposed with an operation that is
just like overlay and that is, that title
is eminently googleable. Now, it has a
funny word there. It says, it doesn't say
semigroup, could say"Semigroup: Theme and
Variation", it says "Monoid: Theme and
Variation". And a monoid, well, it's also
not something, even though it sounds kind
of fancy, it's actually not much more
complicated than a semigroup. It's a
semigroup. And also the semigroup has a
special element called the neutral
element. And whenever we combine something
with a neutral element, it doesn't matter
if we do it in front or at the back, we
get the same thing back. So, of course,
the neutral element with respect to
numbers, in addition, would be zero. The
neutral element with respect to lists and
and concatenation would be the empty list.
I always hear several voices. That's
wonderful. Thank you. And the same thing
for the overlay and beside and above, you
can imagine that you have just an empty
image that has only, that consists only of
transparency, that can work as the neutral
element. So all of these things that I
showed you that are associative, they're
not just associative. They're not just
semigroups, they're also monoids. And so,
as I said, as long as you remember
associativity, that's the important thing.
But often you also find a monoid, and
monoids in the wild they're just
everywhere. We've seen them for numbers
and lists and images, music forms a
natural monoid. You can you can describe
musical structure with monoid operations.
You can treat animations, the time axis.
You can define monoidal combination of
animations. A famous example in functional
programming is with financial contracts.
If you were here last year for a talk of
mine, we talked about semiconductor-
fabrication routes, which sounds very
concrete, but also they form a monoid. The
properties themselves that we'll see for a
monoid are all kinds of things. They're
everywhere around you. And these are
really the key towards making flexible
domain models because in almost any domain
model you can find a monoid just by
looking for building blocks and for ways
of combining those building blocks into a
larger building blocks. So let me get
back. So I said, well, you can use
associativity or you can use this monoid
thing to guide your design. And I haven't
really made that concrete yet. And so I
stole a couple of pictures from Brent's
paper. So you remember the beside and the
above operations. And those are fine for
arranging things sort of in the vertical
and the horizontal axis. The way that they
work is, they make, they put a bounding
box around every picture and then they
arrange the bounding boxes either beside
each other or above each other. So it's a
slightly more involved thought. And that
works great when you're, when, you know,
your picture is, happens to be a square
that's aligned with the axes. It doesn't
work so well if your picture is rotated,
right. Because the bounding box, the
bounding box then is too big. And if you
want to attach anything about, just about
in any direction, then there's going to be
a gap in your picture. And so beside and
above are not particularly good operations
as the basis for an image library. The
overlay operation is much better. But that
leaves open the question how you can
arrange pictures, several pictures so that
they are beside each other or that they
just touch. And Brent came up with this
idea of an envelope, technical idea. So
the idea is that, well, if you give me, so
the red dot there, that's the origin. If
you give me a vector starting at the
origin, I will tell you how far you have
to go along that vector so that I can draw
that blue perpendicular line that's just
outside the shape. And that's called an
envelope. And envelopes are wonderful. So
if you ship each picture not just with
sort of the visuals that you see, but also
with a function that describes the
envelope, then you can use that envelope
to arrange pictures both in the horizontal
and the vertical, but also in the diagonal
by just drawing vectors so that they
touch. So, that's a slightly more
complicated idea. Does it make sense? And
Brent goes through the motions of using
that inspiration from the monoid that he
is getting. He's saying "Everything must
be a monoid! Absolutely.", and uses that
as a guiding principle through the
library. So I'm not going to go into
technical detail on how that works, but
it's a very pleasing paper to read on
that. And it results in a beautiful
library that's great fun to use. So that
means, though, that you also have to find
a monoidal combination operation for the
envelopes. You can't just, we've already
seen how we can combine the pictures
themselves, but we also need to combine
the envelopes. And fortunately, that's
pretty easy. If somebody sets a vector in
a certain direction, then that envelope is
just a maximum. Those two pictures, right,
if you combine that ellipse and that
square, you can see that I'm just going to
have to go to the maximum of those two
numbers in order to just be outside the
composite shape that that comes up
superimposing those two things. So that's
great. Now, I sort of introduced these
properties as a mathematical thing, right.
I said, well, there's this fancy, fancy
upside down operator says for all images
and we might say for all images. Now, we
can also formulate these properties as
code. And that's really where additional
magic is. So, for example, the
associativity property, well, there's not
much of a difference except that the
image1 and image2, they are now in
typewriter font. So we could put those in
the program. But there's still that
mathematical stuff on top. But in a
functional language, in a lot of other
languages too by now, we could also put
the top line and translate that into code.
And it might look like this. So that's
what it looks like in Idris. So, it's not
quite the same, but maybe we recognize the
structure. So, we say, well, there's a
property called and the property is just
called overlayAssociative. So we give it a
name. So, Idris is an ASCII language,
still so, primarily. So, we say just
forAll there instead of the upside down
all. And then it says arbTriple arbImage
arbImage arbImage. And that means for all
arbitrary triples of arbitrary images and
other arbitrary image and another
arbitrary image, so, triples, three
things. And we're going to call those
three images image1, image2 and image3.
That funky backslash there, that's a
Lambda in Idris. And then the overlay prop
means that while, if we overlay one way
and we overlay another way, according to
associativity, we get the same result. Do
you recognize that structure? Right. That
it's the same thing. So that we're writing
structurally the same thing that we wrote
in mathematical notation. Now as a piece
of code. And now the great thing is once
we've written it as a piece of code, we
can manipulate it in a program.
So, one way, there is different ways
of manipulating it. But one of the most
useful ones is, again by another great
researcher in functional programming,
John Hughes, came up with something called
QuickCheck. So if there's another thing
you take away from this talk is: google
QuickCheck. And whatever language you use,
it doesn't have to be Idris. In fact, I
had to hack together a QuickCheck for this
talk, but basically any other language is
going to have a QuickCheck, whether that
language be a functional language or
whether it's Java or Python or R or
something like that. You can always get a
QuickCheck for that. And I'm going to try
and demonstrate this QuickCheck thing not
by thinking about the design so much, but
by demonstrating a property of something
that's very error prone. So, here's this
idea, we want to have a representation for
sets of natural numbers. And we're going
to represent those sets of natural numbers
by a list of intervals. So, by a list of
ranges, if you will, between two numbers.
Now, I'll try to explain that. So, up
there at the top, it has a type
definition. It says, ISet, interval set,
is a type. And that type is defined to be
just a synonym for a list of pairs of
natural numbers. That's what those round
parentheses with a comma in the middle
mean. OK. And just to see what that means
is, there's a function there. I haven't,
I've lighted the definition, but what's
important about it is its type signature.
It takes an interval set and it produces a
list of all of the members of that set.
And you can see sort of a demo thing here
that I typed in before the talk. So, if I
apply iToList so that, the brackets there
they just mean the list, and we feed in a
list of intervals and those intervals are
from zero to three, from five to seven,
and from nine to ten, respectively.
They're all inclusive. And you can see
down there is a list of all of the
members. So, the first interval is from 0
to 3. So, it has the numbers 0, 1, 2 and
3. The next one goes from five to seven.
So it has the three numbers 5, 6, 7. And
the last one goes from nine to ten. So it
has the two numbers, 9 and 10 there. Does
that make any sense again? Slightly more
complicated example. So let's see. So, of
course, well, not of course, but the way
we want to do it, the way I want to do it
is, I want to have the interval set
structured in a certain way. I don't just
want any list of any pair of numbers to
denote an interval set. And therefore,
here is a function that describes what it
needs to be a valid interval set. Right.
So, for example, we don't really, in order
to have efficient processing, we don't
really want two intervals in one interval
set to overlap. Right. We want them to be
disjoint and we also want them to be
ordered so we can have efficient
operations for certain things. Right. And
so, let's go through this. So, there is an
isValid function that just tells you
whether that interval set is valid or not.
It says, well, if that set, and there's
three different cases here, which is why
there's three different equations, in the
first equation says the empty interval
set, the empty brackets mean the empty
list, and if the intervals, the list
representing the interval set is empty,
then we're going to say True. Empty set -
perfectly fine. The next case says, our
interval set consists only of a single
interval and that single interval goes
from low to high. Well, we kind of
interpret that there, but, and, well, that
interval set is valid, if low comes in
front of high. Right, they shouldn't be
the other way around. So, does that make
sense? Somebody in, can you nod at the
back, a little bit? You're still there?
OK. Thank you. Great. So, then it becomes
a little bit more complicated and it says,
well, this is the third case, when there's
at least two intervals in the interval
set. And those two intervals are, the
first one goes from lo1 to hi1. The second
one goes from lo2 to hi2. So, those ::,
they separate the first element of a list
from the rest. And then there's the rest
of the list. And it says, well, again, we
want the interval to be ordered so that
the lower numbers are on the left. That's
where it says lo1 is less or equal hi1.
And then it says, well, that there should
be a gap between two consecutive
intervals. Otherwise, they should be one
interval, which is why the high from one
interval should be separated from the low
of the next interval by at least one. And
then we're going to say, well, also we
want all the rest of the list, including
lo2 and hi2 to be valid too. So far so
good? OK, so this is probably, well, the
second most complicated piece of code. So,
anyway, so, here's, so, we might imagine a
union function. And the union function,
guess what, it forms a monoid, with
respect to interval sets. So, it takes,
two internal sets go in and another one
comes out. And if you've written that kind
of thing before, you might notice it's
probably a little tricky with that fancy
validity condition that's there. So how
can we get this right? Well, what we do is
we write down properties. Of course, we
could write down associativity. I'll leave
that as an exercise. Another one is just
very simple. Just a very simple property
that says for all pairs of two arbitrary
interval sets, we want the union of those
two interval sets to be valid, a valid
data structure. We want the union function
to preserve validity. OK? Makes sense? So
here's another property that says, well, I
already gave you this function or I told
you that there is this function iToList,
which just gives us a list of elements of
an interval set. And what we can do is, we
can use sort of that representation,
that's also a representation for sets. We
can use that representation sort of as a
model and say, well, if we take the
unions, you see there for all pairs,
again, of arbitrary interval sets, we take
the union. It says iUnion, iset1 and
iset2. And we convert that to a list. And,
we could also do, we could instead convert
each individual set to a list and then
just merge those two lists. And that
should yield the same result. So, in a
way, we're just giving a very simple model
for our interval sets, right, and that
would, so those two criteria would be kind
of nice to have in order to get our
implementation correct. And I already got
started before the talk on this. Looks
like this. No. Doesn't look like this.
We'll get to that later. But like this.
So, here's what I came up with. So, you
see there is that, while there's all this
other code there, ignore that. But there
is iUnion says ISet -> ISet -> ISet, do
you see that? And then, there's two
equations that say, well, the first set is
empty, then I'm just going to give you the
second one. And if the second one is
empty, I'm just going to give you the
first one. Right? Classic things when you
have union or concatenation operations or
something like that. And now you can see
the third case. It gets tricky, right?
Again, you don't need, I mean, main thing
is you need to understand it's tricky.
Well, the third one is such that, well,
says so that both have at least one
element and that element is in the
interval lo1 and hi1 in the first case and
lo2 and hi2 in the second case. And then
there's the rest. And I already put in a
little bit of code, and I said, well if
lo1 comes after hi1 (means hi2), then we
want to start with lo2 to hi2 and then
continue with the union. In the other
case, if lo2 comes after hi2 (means hi1),
then we're gonna start with lo1 and hi1.
And in the other case, it means, that no
interval comes before the other, and
therefore we need to merge the two
intervals at the beginning. Does that make
remote sense? Right. Don't worry. We'll
get back on solid track. So, we just take
the minimum of those two intervals and
maximum of those two intervals and we do
this. Now, the great thing is, I told you
about this tool by John Hughes called
QuickCheck. And the great thing is, we can
load this into Idris. And then here comes
a REPL, and we can say, I hope I'm doing
this right. So, we want QuickCheck, and we
want, what was it called? It was called
prop_unionCorrect. I hope I'm doing this
right. And, well, very small font. But you
can see here it says "100 tests". And that
is what QuickCheck does, as, it takes your
code version off the property and
automatically generates a lot of tests for
them. And that is super effective at
weeding out bugs. So it says, well, the
thing that you wrote is correct. It always
produces interval sets that when you take
the list, it gives you the right result.
But there was that other criterion called
unionValid. And there it says, and this is
really the better part, of course, of
QuickCheck is, when it fails, it says it's
falsifiable. It says there is a counter
example. And so, here it says, I did nine
tests, I generated nine random tests, and
I found one where the result is not valid.
And the great thing is that we can then go
and cut and paste this example. So I could
say iUnion, this, remove the comma in the
middle, and call this. And well, what
happens here is, what we can see is, we
can see 2 and 4, 1 and 1, and 3 and 5, and
what's not valid about. So, by the way,
this is randomized. So, this always goes
differently. So I have to look at it, too.
So, then it says, well, those two
intervals, they should really just be
merged and they should just be one
interval. Right? And so, it didn't do that
correctly. And the reason for that, maybe
you saw it. So, and, what happened is that
it ran into one of those two cases here
where it says if lo1 greater than hi2 or
lo2 greater than hi1. Remember that I told
you there needs to be a gap of at least
one between them. Remember? And here's an
off-by-one error that says, well. So this
says, they can, lo1 greater than hi2 says
they can still be right next to each
other. Right? And this is what happened
here. We need to make sure that there is
that gap in here. So, I can fix it like
this. Loaded again. Oh, no. There's still
a counterexample. So, and we can try that
out, so, and that's great. We get test
cases that sort of show where the bugs
are. And in this case, well, what happened
here? They still overlap. And what
happened here? So can you see it? So, you
can see that the first two intervals, they
must run into that last case. Right.
Because they overlap. Zero is the interval
from 0 to 3 and the interval from 0 to 5.
They overlap. So we need to get to that
case. And so it merges them. And then it
went and and somehow didn't merge it with
the 6 and the 7 that's there. And, so,
well, if you look at it. So it must have
done this. And and what it did is, it then
went on with the rest there. Let's have
one more look. What actually happened? So
there it is. So, it merged those and then
you can see that it went into a symmetry
problem here. Well, maybe you don't see.
But, you know, this is tricky stuff. I
couldn't do this by myself. So you can see
here that it just tacks the result onto
iset1Rest, whereas the maximum of hi1 and
hi2 could, might violate the consistency
criteria if it's the wrong one, and then
it runs into one of the other cases. Now
I've never seen this tricky one. Does it
make sense? But, can you see that it
should be symmetrical? The last one. Can
you see it? OK, so we'll try and make it
symmetrical. Do it like this. So we'll
say, well, if so, this only works. So if
hi1 is less than hi2. So we really need to
make sure, then it is perfectly. And then
the maximum of those two numbers is hi1.
Does that make sense? And so the max of
those two numbers is hi1 and then it's
perfectly valid to tack it onto iset1Rest.
In the other case, hi2 is greater and we
need to go and do something different and
rip this out here. Stick it in front here
and then. And then. And now it's
symmetrical. OK. So, load this. And, ahh!
It has passed the test. OK, live great.
applause
Thank you. I did practice getting it
correct, right. But you can, you know,
this kind of stuff. It always gets me. I
mean, you know, with old age especially,
this kind of stuff, it always drives the
sweat on my forehead, right? You know,
there's off-by-one. There is, you know, I
don't know how many cases there need to
be. And QuickCheck is the kind of thing
that weeds out the bugs. And even though
it weeds out the bugs in a different order
each time, it always weeds them all out.
OK. So it's a great tool. Now, I recommend
that you try that. It generates tests from
properties. OK, where are we? So let me
let me give you a couple of real world
examples. So if you're using X windows,
there's a there's a tiling, a window
manager, xmonad. It's already a couple of
years old and they don't do much
development on it anymore. That's because
it's correct. Right. laughter Right. And
why is it correct? Well, it's because they
wrote down a lot of properties for the
geometry and the tiling algorithms and
verified them using QuickCheck. And so I
sort of loosely translated. So, Don
Stewart, one of the authors of xmonad
graciously wrote a couple of blog posts on
a simplified version of xmonad and I
translated them into Idris. So, here's a
very simple idea of just a stacking window
manager. So, it doesn't do geometry, it
just has stacks of windows and it has
several workspaces. In each workspace is a
stack of windows. So here's a data type
called a StackSet, its parameterized by a
type called window. We'll see later why
there's a type parameter and why it just
doesn't say what the windows are. And then
it says there's a constructor StackSet
and there's two fields in there. One is
called "current", that's the number of the
workspace that's currently
active. And then there's "stacks",
which is a map from the number of the
workspace to the stack of, to the list of
windows that sit in that workspace. Again,
so here, really the technicalities are
not particularly important, but there is a
bunch of operations that operate on this
window manager configuration. And again,
here, really the details aren't important.
So you could create an empty stack set.
You could say, well, you know, I have the
number of a window that I want to get to
the front. And please make me, please
rotate me, the stack set around so that I
can see it. "peek" means, you know, maybe
I can get the topmost window that the user
is currently looking at. "rotate" means
I'm just going to rotate the workspaces
around in either left or right direction.
That's what that ordering argument. "push"
is, I push a new window onto the current
workspace. "insert" means insert a window
into one of the other workspaces. "delete"
means I delete a window. "shift" means,
also means I shift something with the
windows. Not really important what they
do. But you can imagine again, just as we
did with the interval sets is validity
criterion or an invariant that should hold
for these operations. And it's very
simple. Well, it says well, if you have a
stack set with some windows in it, I'm
just going to tell you whether that stack
set is consistent. And by doing that, I'm
just going to say, well, the current, the
number of the current stack
should not be higher than the number of
window stacks that there are. Right. So,
the number of stacks that there are. And
the other one, that just says a window
should not be in several of the
workspaces. Right? And then I can go and
maybe with this definition, all
those function definitions aren't very
complicated. But, I can go and write a
whole bunch of properties. And if you just
understand, well, maybe the second one,
"prop_view_I", you understand all of them.
It just says, well, for all pairs of a
natural number and a stack set that are a
"stackIndex" and "stackSet", I want, if I
call the "view" function, which is one of
the operations, I want the view function
to produce a consistent stack set. And
then it goes on to do all of that for all
the other ones. At the bottom here, you
can see some prerequisites that need to
hold for the property so that invariant
only needs to hold if the window, if the
number of the window is actually smaller
than the size of the stack set. Otherwise,
I think the function just returns what
would go with it, what went in there, So
that's a very, that's just a very
efficient way to invent properties, to
think of some invariant that shall hold in
your data structure. And if you know
Idris, you can sometimes encode that in
the types, but often that's kind of
tedious. And you can just write it down as
a property and then have QuickCheck check
it for you. And it's not particularly
exciting for the simple definition, but
you can imagine that the actual definition
when you have tiling window management
going on is much more complicated than the
one that you just saw. But you can keep
those same properties, right? There still
needs to be some consistency invariant
that, if you have tilings, the windows
don't overlap, and things like that. That
should be obvious. Write those properties
down, check them using QuickCheck and that
will weed out a lot of the bugs.
Here's an example from our
practice. We, couple months ago, we were
tasked with migrating a giant Visual Basic
6 application. It had a password checking
function there. You can see here a Visual
Basic 6 type signature. And the property
that we wrote was, well, if we create the
hash from the password and we compare it
with the hash that's in the database, then
they should all come out the same. And to
our surprise, that function, that test,
that property, failed when we ran it for
QuickCheck and we had to correct it
because that password hash is restricted
to 11 characters by some restriction in
the database schema. And so that means
that you can use QuickCheck not just to
sort of check the correctness of things
that you already know, but to actually
develop a model for what goes on in your
software, which you don't always know very
well. So that's what we did there. Another
example is, we wrote, for a large
industrial client, we needed to write a
synchronization application. So when you
had two mobile devices and they would sort
of meet as strangers, they would exchange
data and they all needed to look at the
same sort of device configuration data.
And we didn't want them to exchange all
the data every single time. We just wanted
to exchange them, the data blocks that the
other side was missing. And again, there's
great algorithms for this based on Merkle
trees. They're pretty complicated. You
have to do a lot of bit fiddling with
that. But fortunately, the property for
that is pretty easy to write. So here's
the property that says, well, so the
synchronization algorithm works on sets of
blocks, whatever a block is. So you can
see the property here for all pairs of
sets of blocks and more sets of blocks. So
they're called bs1 and bs2. Block set one
and block set two. What we can do is, we
want, if we union those two, then we get
all the blocks in the system. We call that
all, or we can call the synchronization
algorithm and that will give us two new
block sets, block set bs1' and bs2'. And
those block sets are the ones that get
transferred to the other side. OK. And the
criterion here just says if we take the
ones that we have, if we union them with
the ones that we get, we should get all of
them. That should be all of them. And that
should be the same for both sides. And
also, we want the algorithm to be
efficient so we don't want it to transfer
blocks. So we want to make sure that the
blocks that we have and the blocks that we
get are disjoint. That they don't have any
elements in common. Otherwise, we could
make that algorithm trivially correct by
just transferring all the blocks every
single time. And I can tell you, I
sweated. You know, I sweated one or two
weeks over this algorithm and it was
really hard to write. But this one test
weeded out all the bugs that I found along
the way. So that is just super, super
effective. John Hughes has a couple of
papers on hard bugs that he found. So he
found a bug in a distributed database
called mnesia. And that bug was dependent
on opening the database, closing it and
opening it again. So this is not the kind
of bug that you find by just writing a
bunch of smart unit tests. Right? So, if
you did anything shorter in the beginning,
so if you just open the file and then did
some lookups there, that would not
manifest the bug. You really needed to
close and then open again. Have you
turned, have you tried turning it off and
on again? But then the database breaks in
this case. And here's another example
called The Mysteries of Dropbox. So you
can imagine that with Dropbox you really
want certain properties to hold. Right?
And it turns out they didn't. They never
worried about writing properties down. But
John Hughes did it and found a couple of
bugs. So here's one. It's kind of hard to
read where it says client 1 writes a into
a file that was previously empty. So that
funky turnstile there is empty. So writes
a into a file and then deletes the file
and another client writes, replaces, sees
the a in the file replaces it with a b.
And then client 1 goes and writes c into
the file that it previously thought to be
empty. And then unfortunately, even though
you can imagine that you should see either
b or c in that file, but Dropbox deleted
it. So I think they fixed that bug now.
But. so you go. So it goes. Oscar Wikstrom
has a couple of great, pretty recent blog
posts on properties in a screencasts
editor that I highly recommend. So this is
a great tool for finding bugs, but it's
not the same as having a proof. Right? So,
you can still imagine that you can find
very subtle bugs that are not covered by
QuickCheck. QuickCheck just randomizes,
just generates randomized tests. So, that
is not the same thing as making sure that
there aren't any bugs. So the great thing
about Idris and the reason I chose it for
this talk is that Idris allows you to not
just encode properties in the language. It
also allows you to encode proofs in the
language. So here is the associative
property for the list concatenation
operation. And if you look at the top that
has the definition of that function from
the Idris standard library, it says ++, in
goes a list, in goes another list, out
comes a list. Then it says, well, if you
concatenate the empty list with any list,
that is just that list "right". Do you
see that? The second one says, well, if we
concatenate a list that starts with the
element x and goes on with xs, then we
just sort of pull the x in front and
concatenate the rest with "right". So
that's a classic recursive definition of
list concatenation in functional
programming. And now here's something
really strange in Idris. Here's the type
declaration for a definition, again in the
standard library, called appendAssoc. And
it says, if you have a list a, you have a
list b, and you have a list c, and in the
type it says, oh, the associative property
should hold. Right. And so this is a
statement of that property. That's
wonderful. It's not the same as a proof.
So, but writing proofs, who loved that in
math? Oh. Oh, you're good! I didn't. I'm
sorry. So. So the great thing about Idris
is, it helps you write down the proofs.
I'll show you how that works just really,
really briefly. So here's that. So here's
just what I showed you on that slide. So I
can load that in there and it says, well,
you're not done. You didn't write a proof
for that property, but in Idris, you can
just push a bunch of buttons. Now, I love
that. So I can push one button and it
says, oh, well, you should write a proof
of that form. You have lists a, b and c.
Well, now and I can push another button
that says, well, you're doing this on
lists and if you're writing anything on
lists, you always need to distinguish
between the two cases of the empty list
and the list that consists of the first
element x and further element xs. And then
it says, well, write down something, but
then I can tell Idris: Well, I'm too lazy.
I'm not going to write anything so I can
just push a button. And Idris wrote this
so you can see me, but I didn't type this
right. I just pushed a button and it says,
Refl. What is Refl? What could that be?
Well, you can ask it what Refl is. It says
Refl. Oh, you can see here, landing here.
Refl is just a proof sort of a built in
proof that says that if two things are
true, two things are equal, if they're
identical, if they're the same. Right. And
that kind of makes sense in the first
equation, because the first equation of
appendAssoc corresponds to the first
equation of ++. Can you see that, how it
corresponds? Can you see that? The first
list is empty. Can you see that? Can you
see how the first list is empty with the
first equation of appendAssoc and the
first list is empty up there with ++. Can
you see that? OK. And then it just says,
well, then obviously. Well, not quite
obviously, but then sort of the the way
that the definition works, it comes out
just right. So what's really important is
that Idris accepts that proof with the
first. The second one is slightly more
tricky. But again, we can get help because
we know that appendAssoc is this recursive
function. It recurses on the first
argument. So we're just going to do the
same thing in the proof. And you can tell
Idris that it should use that, that it
should use that fact, if you will. So
here's the recursive call. And again, I'm
too lazy to push a button. But if I push
that button, it also puts in Refl and
there's loads. So this it might be a
mystery to you how it works, but this is a
proof of the associate of property of the
list concatenation in Idris. And since
Idris helps you write it, it's kind of fun
to do that. Oddly enough, even for
somebody who doesn't, usually who doesn't
usually doesn't enjoy proofs. So the way
that you program in Idris, we haven't done
that a lot in this talk, is that you put a
lot of information in the types and the
more information you put in the types, the
better Idris will get at figuring out the
correct definition. And you don't have to
do it by yourself. OK. So that's really
nice. OK. So we got that and and sort of
these kinds of proof assisting systems
such as Idris have been used in a lot of
real world systems. One one prominent
example is SEL4, a version of the L4
micro kernel, has a long history, but
important properties of that kernel have
been verified. It runs in the security
enclave on iOS and even though it's
written in C, it provably does not have
buffer overflows or a lot of the nasty
things that are responsible for a lot of
security exploits. Compcert is another
example, which is a verified, I should
mention this has been verified with the
help of a tool called Isabel. Also, great
fun to use. There's a project called
called Compcert, which is a verified C
compiler, which is important for a lot of
certified software where, you know, the
source code might be certified. But how do
you know that the compiler generates
correct code? And you know, because it's
been proven to be correct. And even there,
you can shoot, you can cheat sometimes. So
for example, register allocators, very
complicated, very hard to prove right.
But what you can do is you can write a
checker that the register allocator did
its job, did its job well and you can
verify the checker. And so you can cheat a
bit. So there's tools for that. We've seen
Idris and there's a number of other tools
and they're getting more and more mature.
And they're great fun, really. They really
are great fun. But, you know, going back,
switching down a gear a little bit,
there's lots of useful properties that you
can look for in your programs. So
commutativity might be useful that you can
switch the two arguments for an operation.
Also, if you have relations, you might
remember that from some math class,
there's some properties here like
reflexivity, symmetry, antisymmetry and
transitivity. Reflexivity says that a is
always related to a. Symmetry says if it's
one way, if a and b are one way related,
they need to be related the other way too.
Antisymmetry intuitively would seem kind
of the opposite. That doesn't make sense.
It's just says: if two things are related
in both both ways around. So, for example,
you know, orders like less or equal are
antisymmetrical, then they must be the
same. And transitivity just says that you
can form chains of your relation. So those
are a little dictionary of useful
properties that you can look for. Let me
close with one fancy property that you've
probably seen somewhere and that property
is called Functor. And you might have seen
in your programing language, in your list
library or in your stream library. There's
a function called map, right? And you
know, even Java has that and has had it
for many years. And what map does is, if
you have some, you know, in Java, for
example, it says "Stream", or it might be
"List", right. It says, well, if I have a
list of As, I can apply a function to each
element of that list. But you can
generalize that, it doesn't have to be
lists. It could be an Optional of As, for
example. You could also apply a function
to the value that's in there. So, you can
generalize that notion, and then it's a
functor. And, of course, in Idris, you can
write down equations for functors. And,
please ignore the technicalities here,
(stammers) but, if you sort of pick out
where it says "functorIdentity",
the middle row says g v equals to v,
which means g is the identity
function. When you feed in v, you always
get v back. And when you use map with the
identity function, you apply the identity
function on each element of your list or
whatever it is. Then then you always get
back the same list. And here just says you
get function composition. So if you apply
one function and then another function and
you do that either inside or outside the
map, you should also get the same results.
So there's also just as there is
associativity with monoids, with functors.
There's these laws and you might think,
well, where would I look for a functor?
I've never seen a functor except for the
ones on streams. A couple weeks ago in a
training, somebody said, well, you always
start with that animal example. Shouldn't
you look for a functor there? And I was
kind of, you know, sweat broke out on my
forehead, I was like, where's that gonna
go? But, we came up with this. So, if you
go back, you can see that this is obvious.
So, what you need for functors is, you
need a type parameter. Right. And so you
just look for a place to stick a type
parameter, any place here at all. And
if you look at Dillo and Parrot, they
both prominently have this weight thing.
Right. And so that seems more important
than the other two properties, which are
specific to particular kind of animal. And
so the weight, the thing to do is just to,
well, you can see I replaced upper case
Weight by lower case weight and made that
into a type parameter, and, I can then
provide a functor implementation down
there. And you might think, what is that
good for? Well, I don't know. Well, one
thing that you could do is you could
provide a different representation for
weights. Another thing that you could do,
if you look at the type for runOverAnimal,
it says animal weight -> animal weight and
weight is a type variable. What that type
signature tells you is that runOverAnimal
does not know what weight is. And that
means that the weight cannot change as a
result of that function. And you see that
in the type signature you get immediate,
small benefit, but you get a benefit even
with silly examples such as this one. And
that really brings me to the end. So in
your software, in your domain model, look
for a Combinator, look for a function that
will combine two things into a bigger
thing. See if you can make that thing
associative and look for a neutral
element. And very often you will find one;
make it a monoid, you know, say monoid a
couple of times. You'll remember it.
You'll remember it. Generally, write
properties for the things, for the
operations in your software, test those
properties using QuickCheck. You know, if
you feel like you have a lot of time,
prove them correct. Find the functor. If
you found, if you found the monoid, you
know, find the functor next. You know, and
it takes, it might take time. I'm
very old. As you noticed at the beginning.
So. So it gets easier over the years and
it will just seem like a regular staple of
your of your arsenal when you program. And
of course, when the important properties
in your program have either been written
down, if they've been tested with
QuickCheck or even proven, then you can
sleep much more soundly than maybe you
currently can. Thank you very much.
applause
Herald: Thank you, Mike. So I see we have
three minutes for questions. Maybe that's
two or three questions. If you have any
come to the microphones, please. Do we
have a question from the Internet? No, not
yet. So, microphone two. Right.
Question: Hi. So, QuickCheck generated
hundred tests. Yes. What can we say about
the quality of this test? Could we say
your program was correct using thoese
tests? Are these tests good?
Answer: Yeah. Very good question. So what
would you say about the quality of the
tests? And indeed, if you really do serve
industrial strength applications, a quick
QuickCheck comes with a bunch of tools
that let you look, for example, at the
distribution of the individual example
rated and while you didn't quite see me do
that, but I mean, for your domain objects,
you will typically write generators that
will generate those examples and you can
reason about the distribution of those.
And you absolutely should do that because
otherwise you might miss large areas of
your test space. So, but there
are tools and they help you do that. But
even if you don't do that, you know, it's,
you find a lot of, I found a lot of
bugs in my software even without worrying
about that. But if you go beyond that,
look at the distribution thing.
Herald: Thank you, next one, please.
Number two.
Q: Let's say I've hacked a program, for
example, in Java or C# or whatever. How do
I, how do I apply what I learned so
far? So, where do I start when I have
already completed the C# program with,
yeah, how do I apply QuickCheck on that?
A: So, just pragmatically because it's
written in C#? That's the question?
Q: Yes
A: So, well, I have to be very concrete
here, I mean, so, if you can think
properties, right, one way to do, so,
for example, so, C# you can link with F#
and there is a QuickCheck version for F#
called "FsCheck". And FsCheck, actually,
even though it's itself written in F#, you
can also use it from C#. So, you have two
options. You can write your tests in a
slightly more awkward fashion in C#, or
you could just link your code with F# test
suite and write it down there. And there
is a fairly reasonable Java QuickCheck, I
hear. Another idea would be to use the
slightly more fancier, the slightly
fancier QuickChecks that exist for Scala
and Enclosure. I'm sure there's one for
Kotlin as well, and link that against your
Java code. Does that answer your question?
Q: So whatever language I use, I have to
find out what the correct implementation
of QuickCkeck?
A: Yeah. Yeah. But as I said, I mean
usually, a fun thing I do in training is,
I chat "QuickCheck" and somebody calls on
the language, you know. Quick, QuickCheck
PHP or something like that. And there is
one, sure enough, I didn't know about before.
Q: Thank you.
Herald: All right. Thank you. And thank
you, Mike, again, for showing us a way to
sleeping soundly.
A: Thank you.
applause
postroll music
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!