-
36C3 preroll music
-
Herald: Our next talk is held by Mike
Sperber, and he is already very ready for
-
that. He's head of a software company in
Tübingen in Germany and he's going to talk
-
about "Getting software right with
properties, generator tests and proofs".
-
And the main thing here is functional
programming. One tiny thing you might not
-
know about him is that 1986 he won a
federal competition on IT and so give him
-
a warm applause for that and also for his
talk.
-
applause
-
Mike Sperber: Thank you very much. So is
anybody actively using the induction loop
-
feature in the first couple of rows? Cuz
I know somebody who would like to know.
-
Not right now. Okay. Anyway, so let me get
one shameless plug of advertising out of
-
the way. If you find the contents of this
talk interesting, we're running a
-
developer conference in Berlin in February
called Bob, which is very friendly and
-
very nice, very tiny compared to this one.
And we'd love to see you there. Another
-
thing, this is an introductory talk. So if
you were expecting the latest developments
-
on proof tactic, in fact, if you know what
proof tactic is, then all you might get
-
from this talk is sort of mild amusement.
And I won't be mad at you at all if you go
-
for one of the more exciting talks. Ok?
So. Or leave at any time. That's perfectly
-
fine, if this material is not for you.
Speaking of introductory talks, here's a
-
piece of code written in the language that
I will use for this talk and it's called
-
Idris. Who has written an Idris program
before? Very good. Ok. Oh, there's one
-
person back there. That means if any part
of this program, as soon as I'm done
-
explaining, is not clear to you, it's also
not clear to two or three hundred other
-
people in this room. And I would love to
have your help. Interrupt me, ask a
-
question anytime in the talk if there's
anything here not clear. It's going to
-
get, even though it's meant to be
introductory, will get quite technical at
-
times. So let me try explaining this one.
So this is a classic example in functional
-
programming that I use often in my talks,
about animals on the Texas Highway. And if
-
you can see there, the central definition
says data Animal. That's the data
-
definition of animals. And in this
particular version of the Texas Highway,
-
there's two different kinds of animals.
There is Armadillo, it's where it says
-
Dillo there. And there's Parrots, for some
reason, on the Texas Highway. Does that
-
make sense? Two different kinds of
animals. And you see that definition.
-
Yeah, nod, that greatly helps me. And if
you see those two definitions for Dillo
-
and Parrot, you can see, while the arrows
are kind of funny, but you can see that
-
Dillo and Parrots have two properties
each, and it says their Liveness. That's
-
one of the properties of an armadillo. And
up there at the very top, you see the
-
definition of Liveness, it says Liveness
means dead or alive. It's an armadillo,
-
can be dead or alive on the Texas Highway.
And there's also the Weight. And well, you
-
see, this colon thing is the type
signature for the constructor for
-
armadillos. So it says there's a liveness
going on, there's a weight going on, and
-
then it constructs an animal. And for a
Parrot, there's a string. Every parrot
-
speaks, right? And so it's the sentence
the Parrot says, and also the Weight. And
-
it also produces an animal. And, up there,
you can see the definition of Weight is
-
for simplicity's sake, I'm saying that
Weight is a type. So that's kind of
-
unusual for Idris, but you don't have to
worry about it. But you can see there,
-
Weight is just the same thing as an
integer. And if you look down there, where
-
it says a1, a2 and a3, that has three
examples for animals. So it says a1:
-
Animal, just to say that a1 is an animal.
So, Idris is a language that always has
-
type declarations. And it says a1 is Dillo
Alive 10. And that means it's an
-
armadillo, it's still alive, and it
weighs, let's say, ten kilograms. The
-
second one is dead, a little bit heavier,
weighs twelve kilograms. And the third
-
animal is a parrot that knows, well, it's
a pirate's parrot, obviously, and maybe
-
weighs three kilograms. Ok, so far? Ok. So
if you have any question about any of
-
that, then please ask away. So, what
happens to animals on the Texas Highway
-
is, you know, people drive cars, they run
them over. So there's a function down
-
here, and, well, we're doing functional
programing, shouldn't worry you at all.
-
But what's important here is that it says
there is an animal going in, an animal
-
going out. And really what this means is
that this animal object up there is not
-
really the animal. It is the state of the
animal at a given time. So, runOverAnimal.
-
you can see the type signature that says
an animal goes in, an animal goes out. And
-
what it really means is, the state of the
animal goes in before it gets run over and
-
the state of the animal after it gets run
over comes out. And then while we know
-
there's two different kinds of animals.
And that means that for the definition of
-
runOverAnimal, we need what's called
equations. There's two different
-
equations. And the first equation says
what happens to armadillos when they get
-
run over. So an armadillo has a liveness
and a weight. Here's something going on
-
called pattern matching. And the second
equation says when there's a parrot going
-
on it has a sentence and a weight, and on
the right hand side, you can see, well,
-
when an armadillo gets run over. Well, all
that means is, the liveness sort of turns
-
to Dead. We're constructing a new
armadillo object and it's dead and it has
-
the same weight as before. And the
function, the equation at the bottom says,
-
well, when we run over a parrot, it turns
really, really quiet. Ok? So, classic
-
example. Ok so far. We're going to return
to that example at the very end. Right now
-
it's just to illustrate the language that
we're doing things in and we're going to
-
do a lot of things without complicated
programs. So, well. So, I'm going to jump
-
around a little bit. So, one thing. So,
just the other day, two weeks ago, I was
-
teaching a course on architecture and
somebody said: Well, there's this problem.
-
I'm building a domain model. I'm putting
the domain model in a database. And, you
-
know, customer comes in, has new
requirement or somebody comes in, has new
-
requirements. And that always ends the
same way. I put a new call in the database
-
and, you know, seven, eight, nine, it just
goes on and on. As the software gets older
-
and older and older, more columns that
create the old ones seem a little stale.
-
And so, yes, well, how can we build models
that are flexible? And so, here's
-
something completely different, you might
think. So, here's sort of the key to that,
-
to building flexible models. Does anybody
recognize this? Does anybody associate a
-
word with this? laughter Very good. So,
you might remember, depending on what
-
state you went to school in, you might
remember that this is a property called
-
associativity. Right? And it means that we
can associate either the A and the B first
-
with the parentheses or the B and the C.
So, and this is, if you take away one
-
thing from this talk, it's associativity.
Knowing what that is is one of the most
-
useful things in software development. So,
of course it's just a generic equation, we
-
really need to be more specific, namely
that we're dealing with numbers and
-
addition. And you might know that it's not
just addition that's associative. Also,
-
multiplication, for example, is
associative. So here's a little mathy
-
stuff there at the beginning. So, you see
that upside down A. That says "for all".
-
We just say for all. What that means is
"for all A, B and C". And then, this funny
-
epsilon-shape letter kind of thing, it
means "element of". And then that funky N
-
means the natural numbers. So all the
numbers from zero, one, two, three, the
-
whole numbers from zero on up. So, what
that means is, for all natural numbers A,
-
B and C, the associative property holds
when you add them up. But while it says
-
numbers in addition, it doesn't just hold
for numbers and, addition, in fact,
-
associativity is everywhere around us.
Specifically, it's everywhere around us
-
when we program. So here's another
example. When you're dealing with lists
-
and that funky, the two pluses that you
see there, they are just list
-
concatenation. So you concatenate two
lists and well, of course you can
-
concatenate three lists by just using that
double plus in any order. And that's also
-
associative. So, it doesn't matter if you
first concatenate the B and the C and then
-
tack the A onto the front, or if you
concatenate the A and B and tack the C on
-
at the end. Doesn't matter, you always get
the same result. So lists and
-
concatenation also have this associative
property. And here's something that I
-
always find very, very enlightening is
that you can construct images that way.
-
Well, you don't see it here. So here's an
image. Well, it's from a cool researcher
-
of mine and functional programing, Brent
Yorgey, and he has a great library out
-
called diagrams, for constructing diagrams
out of parts. And so this really is what
-
associativity is about. It's about
operators that construct things out of
-
parts. And so, as you can see here, well
there's different shapes here, there's
-
sort of the black rectangles, there's a
different rectangle set, that denote the
-
towers of Hanoi. We're not really going to
deal with the towers of Hanoi here,
-
really. The important thing that the image
consists of several parts. And well, in
-
normal or sort of in classic object-
oriented programming, when you do
-
graphics, you have a canvas and you might
draw pixels on that canvas. You know,
-
might be square shaped or a circle shaped
canvas pixels. But what we're doing here
-
is, we are treating an image as a data
type and the definition is not important.
-
What is important is that there are a
couple of functions that construct sort of
-
simple images. So here's a function that
you might imagine called star and it
-
constructs stars. And well, you can see up
there there's a type declaration and it
-
says, well, the star function, it accepts
an integer, it accepts a Mode, whatever
-
that is, accepts a Color and it produces
an Image. And we can call that star
-
function with the arguments 200 and Solid
and Gold. So Mode is Solid or Outline. And
-
then we have a Color and we get Image and
that image is an object. Not particularly
-
exciting. But while we might have another
function called Polygon, Polygon takes two
-
integers that denote the size of the
polygon and the number of vertices, and
-
also whether it's an outline or whether
it's solid and a color. And for example,
-
if we call it with 180, again, that's the
size and 5 we get a five corner polygon
-
and we get that as an outline and it's in
red. Now, the idea here is that we can
-
combine. Just as we can combine two
numbers or we can combine two lists, we
-
can combine two images. Maybe the most
intuitive way of combining two images is
-
just sticking them beside each other. So
there's a function there called beside.
-
And it takes an image and it takes another
image and produces an image. Right. And
-
this is exactly what we're thinking about
when we talk about associativity. We're
-
talking about a sort of a binary operator
that produces the same thing that went in.
-
And so, for example, we could stick those
two images next to each other. We could
-
also imagine an operator called above that
just puts one image above the other image.
-
And we can combine these two things. Here
it really is important that the same thing
-
comes out so that it's image goes in,
another image goes in, an image goes out.
-
So we could again call above on the result
of beside and make arrangements. So here's
-
a tiling arrangement for your bathroom or
something like that. Now, beside and above
-
are are two possible operators and you
might already think about associativity,
-
but really the more fundamental one is
overlay. You put two images on top of each
-
other. And so again, overlay has the right
type. An image goes in, another image goes
-
in, and an image comes out. And if we take
the gold star and the pentagon and put
-
them on top of each other, then it looks
like this. And we can then formulate an
-
associativity property. It might not quite
look the same because I wrote overlay in
-
front rather than between the operators.
We could also write it between. But just
-
to show you that it's the same idea. So,
it doesn't really matter if we first take
-
two images, A and B and superimpose those
two and then put those two on top of C or
-
if we do it in another order. Does that
make sense so far? Ok. No? Do you have a
-
question?
Anonymous: Mumbling
-
Mike: Yeah, so ahh, good point, good
point. So this implies that there must be
-
some kind of, that there's probably some
notion of transparency involved. Yes. Yes,
-
there is. But then you have associativity.
And really what it means. Very good
-
question. So, if you think of this image
in terms of the color at certain
-
coordinates, Right, Well, you need to
think about how to combine those two
-
colors that are in the constituent images.
And you can imagine that there also has to
-
be a combination operation for the colour.
And that also needs to be associative as a
-
prerequisite for the for the overlay
operation to be associative. Does that
-
make sense now? Thank you. Good question.
Great question. So anyway, so since this
-
associativity property is something that
is not just restricted to numbers, as we
-
may have learned in school, it really
makes sense to get. And that means that
-
when we talk about associativity, we
always have to name two things. We have to
-
say what set we're operating on and what
the operation is. And the combination
-
those two things has a name in mathematics
and it's not the best name, but it's
-
called a semigroup. Right. And, but, you
know, if you drop it in certain circles,
-
they'll think that you're an expert on
mathematics, you might try that. So, just
-
to go over that: So, you have some subset
S, and that S might be Image, it might be
-
the natural numbers or something like
that. And we have an operation that I'm
-
just gonna call circle here, then take any
a, b and c from that set S. We can use
-
circle as an operator and we have that
associativity property and for that circle
-
you can put in overlay, you can put in
beside, you can put in above, you can put
-
in + you can put in times or you can put
in the list concatenation operator, the
-
++. Okay? And that associativity is great.
It's really my favorite property because
-
it means when we have a whole lot of
things that we combine, we can
-
parenthesise in any way we want. We will
get the same result no matter which way we
-
parenthesise them. And that really means,
we can leave out the parentheses when we
-
write an expression that involves only the
circle operator if it's associative, if we
-
can just leave out all the parentheses
because the parentheses don't matter. And
-
that makes it well, that makes it
instantly easier to read, I think. Also it
-
has practical uses. So if you do big data
processing associativity means that if you
-
have large datasets that span several
machines or several hard drives or several
-
data sources, and you're combining them
and you have an associative combination
-
operation, it just means you can rearrange
that combination operation according to
-
the load in your compute cluster. And that
makes it a very useful property when
-
you're doing big data processing in sort
of MapReduce based frameworks. But, I
-
mean, that's a practical application, but
I think it's much more useful,
-
associativity is much more useful when you
use it for designing your domain model.
-
And I talked in the beginning how, well,
you want to avoid always adding more
-
database columns. And one way of doing
that is to view your domain model, not as
-
something that has more and more
properties, but your domain model as
-
building blocks that you combine into a
larger building blocks the same way that
-
we combine images from simpler images. So
here's one of the great papers from
-
functional programing, one of my two or
three favorites from Brent Yorgey. And
-
it's called "Something Something: Theme
and Variations". And you can see that it
-
is about images. And these images get
superimposed with an operation that is
-
just like overlay and that is, that title
is eminently googleable. Now, it has a
-
funny word there. It says, it doesn't say
semigroup, could say"Semigroup: Theme and
-
Variation", it says "Monoid: Theme and
Variation". And a monoid, well, it's also
-
not something, even though it sounds kind
of fancy, it's actually not much more
-
complicated than a semigroup. It's a
semigroup. And also the semigroup has a
-
special element called the neutral
element. And whenever we combine something
-
with a neutral element, it doesn't matter
if we do it in front or at the back, we
-
get the same thing back. So, of course,
the neutral element with respect to
-
numbers, in addition, would be zero. The
neutral element with respect to lists and
-
and concatenation would be the empty list.
I always hear several voices. That's
-
wonderful. Thank you. And the same thing
for the overlay and beside and above, you
-
can imagine that you have just an empty
image that has only, that consists only of
-
transparency, that can work as the neutral
element. So all of these things that I
-
showed you that are associative, they're
not just associative. They're not just
-
semigroups, they're also monoids. And so,
as I said, as long as you remember
-
associativity, that's the important thing.
But often you also find a monoid, and
-
monoids in the wild they're just
everywhere. We've seen them for numbers
-
and lists and images, music forms a
natural monoid. You can you can describe
-
musical structure with monoid operations.
You can treat animations, the time axis.
-
You can define monoidal combination of
animations. A famous example in functional
-
programming is with financial contracts.
If you were here last year for a talk of
-
mine, we talked about semiconductor-
fabrication routes, which sounds very
-
concrete, but also they form a monoid. The
properties themselves that we'll see for a
-
monoid are all kinds of things. They're
everywhere around you. And these are
-
really the key towards making flexible
domain models because in almost any domain
-
model you can find a monoid just by
looking for building blocks and for ways
-
of combining those building blocks into a
larger building blocks. So let me get
-
back. So I said, well, you can use
associativity or you can use this monoid
-
thing to guide your design. And I haven't
really made that concrete yet. And so I
-
stole a couple of pictures from Brent's
paper. So you remember the beside and the
-
above operations. And those are fine for
arranging things sort of in the vertical
-
and the horizontal axis. The way that they
work is, they make, they put a bounding
-
box around every picture and then they
arrange the bounding boxes either beside
-
each other or above each other. So it's a
slightly more involved thought. And that
-
works great when you're, when, you know,
your picture is, happens to be a square
-
that's aligned with the axes. It doesn't
work so well if your picture is rotated,
-
right. Because the bounding box, the
bounding box then is too big. And if you
-
want to attach anything about, just about
in any direction, then there's going to be
-
a gap in your picture. And so beside and
above are not particularly good operations
-
as the basis for an image library. The
overlay operation is much better. But that
-
leaves open the question how you can
arrange pictures, several pictures so that
-
they are beside each other or that they
just touch. And Brent came up with this
-
idea of an envelope, technical idea. So
the idea is that, well, if you give me, so
-
the red dot there, that's the origin. If
you give me a vector starting at the
-
origin, I will tell you how far you have
to go along that vector so that I can draw
-
that blue perpendicular line that's just
outside the shape. And that's called an
-
envelope. And envelopes are wonderful. So
if you ship each picture not just with
-
sort of the visuals that you see, but also
with a function that describes the
-
envelope, then you can use that envelope
to arrange pictures both in the horizontal
-
and the vertical, but also in the diagonal
by just drawing vectors so that they
-
touch. So, that's a slightly more
complicated idea. Does it make sense? And
-
Brent goes through the motions of using
that inspiration from the monoid that he
-
is getting. He's saying "Everything must
be a monoid! Absolutely.", and uses that
-
as a guiding principle through the
library. So I'm not going to go into
-
technical detail on how that works, but
it's a very pleasing paper to read on
-
that. And it results in a beautiful
library that's great fun to use. So that
-
means, though, that you also have to find
a monoidal combination operation for the
-
envelopes. You can't just, we've already
seen how we can combine the pictures
-
themselves, but we also need to combine
the envelopes. And fortunately, that's
-
pretty easy. If somebody sets a vector in
a certain direction, then that envelope is
-
just a maximum. Those two pictures, right,
if you combine that ellipse and that
-
square, you can see that I'm just going to
have to go to the maximum of those two
-
numbers in order to just be outside the
composite shape that that comes up
-
superimposing those two things. So that's
great. Now, I sort of introduced these
-
properties as a mathematical thing, right.
I said, well, there's this fancy, fancy
-
upside down operator says for all images
and we might say for all images. Now, we
-
can also formulate these properties as
code. And that's really where additional
-
magic is. So, for example, the
associativity property, well, there's not
-
much of a difference except that the
image1 and image2, they are now in
-
typewriter font. So we could put those in
the program. But there's still that
-
mathematical stuff on top. But in a
functional language, in a lot of other
-
languages too by now, we could also put
the top line and translate that into code.
-
And it might look like this. So that's
what it looks like in Idris. So, it's not
-
quite the same, but maybe we recognize the
structure. So, we say, well, there's a
-
property called and the property is just
called overlayAssociative. So we give it a
-
name. So, Idris is an ASCII language,
still so, primarily. So, we say just
-
forAll there instead of the upside down
all. And then it says arbTriple arbImage
-
arbImage arbImage. And that means for all
arbitrary triples of arbitrary images and
-
other arbitrary image and another
arbitrary image, so, triples, three
-
things. And we're going to call those
three images image1, image2 and image3.
-
That funky backslash there, that's a
Lambda in Idris. And then the overlay prop
-
means that while, if we overlay one way
and we overlay another way, according to
-
associativity, we get the same result. Do
you recognize that structure? Right. That
-
it's the same thing. So that we're writing
structurally the same thing that we wrote
-
in mathematical notation. Now as a piece
of code. And now the great thing is once
-
we've written it as a piece of code, we
can manipulate it in a program.
-
So, one way, there is different ways
of manipulating it. But one of the most
-
useful ones is, again by another great
researcher in functional programming,
-
John Hughes, came up with something called
QuickCheck. So if there's another thing
-
you take away from this talk is: google
QuickCheck. And whatever language you use,
-
it doesn't have to be Idris. In fact, I
had to hack together a QuickCheck for this
-
talk, but basically any other language is
going to have a QuickCheck, whether that
-
language be a functional language or
whether it's Java or Python or R or
-
something like that. You can always get a
QuickCheck for that. And I'm going to try
-
and demonstrate this QuickCheck thing not
by thinking about the design so much, but
-
by demonstrating a property of something
that's very error prone. So, here's this
-
idea, we want to have a representation for
sets of natural numbers. And we're going
-
to represent those sets of natural numbers
by a list of intervals. So, by a list of
-
ranges, if you will, between two numbers.
Now, I'll try to explain that. So, up
-
there at the top, it has a type
definition. It says, ISet, interval set,
-
is a type. And that type is defined to be
just a synonym for a list of pairs of
-
natural numbers. That's what those round
parentheses with a comma in the middle
-
mean. OK. And just to see what that means
is, there's a function there. I haven't,
-
I've lighted the definition, but what's
important about it is its type signature.
-
It takes an interval set and it produces a
list of all of the members of that set.
-
And you can see sort of a demo thing here
that I typed in before the talk. So, if I
-
apply iToList so that, the brackets there
they just mean the list, and we feed in a
-
list of intervals and those intervals are
from zero to three, from five to seven,
-
and from nine to ten, respectively.
They're all inclusive. And you can see
-
down there is a list of all of the
members. So, the first interval is from 0
-
to 3. So, it has the numbers 0, 1, 2 and
3. The next one goes from five to seven.
-
So it has the three numbers 5, 6, 7. And
the last one goes from nine to ten. So it
-
has the two numbers, 9 and 10 there. Does
that make any sense again? Slightly more
-
complicated example. So let's see. So, of
course, well, not of course, but the way
-
we want to do it, the way I want to do it
is, I want to have the interval set
-
structured in a certain way. I don't just
want any list of any pair of numbers to
-
denote an interval set. And therefore,
here is a function that describes what it
-
needs to be a valid interval set. Right.
So, for example, we don't really, in order
-
to have efficient processing, we don't
really want two intervals in one interval
-
set to overlap. Right. We want them to be
disjoint and we also want them to be
-
ordered so we can have efficient
operations for certain things. Right. And
-
so, let's go through this. So, there is an
isValid function that just tells you
-
whether that interval set is valid or not.
It says, well, if that set, and there's
-
three different cases here, which is why
there's three different equations, in the
-
first equation says the empty interval
set, the empty brackets mean the empty
-
list, and if the intervals, the list
representing the interval set is empty,
-
then we're going to say True. Empty set -
perfectly fine. The next case says, our
-
interval set consists only of a single
interval and that single interval goes
-
from low to high. Well, we kind of
interpret that there, but, and, well, that
-
interval set is valid, if low comes in
front of high. Right, they shouldn't be
-
the other way around. So, does that make
sense? Somebody in, can you nod at the
-
back, a little bit? You're still there?
OK. Thank you. Great. So, then it becomes
-
a little bit more complicated and it says,
well, this is the third case, when there's
-
at least two intervals in the interval
set. And those two intervals are, the
-
first one goes from lo1 to hi1. The second
one goes from lo2 to hi2. So, those ::,
-
they separate the first element of a list
from the rest. And then there's the rest
-
of the list. And it says, well, again, we
want the interval to be ordered so that
-
the lower numbers are on the left. That's
where it says lo1 is less or equal hi1.
-
And then it says, well, that there should
be a gap between two consecutive
-
intervals. Otherwise, they should be one
interval, which is why the high from one
-
interval should be separated from the low
of the next interval by at least one. And
-
then we're going to say, well, also we
want all the rest of the list, including
-
lo2 and hi2 to be valid too. So far so
good? OK, so this is probably, well, the
-
second most complicated piece of code. So,
anyway, so, here's, so, we might imagine a
-
union function. And the union function,
guess what, it forms a monoid, with
-
respect to interval sets. So, it takes,
two internal sets go in and another one
-
comes out. And if you've written that kind
of thing before, you might notice it's
-
probably a little tricky with that fancy
validity condition that's there. So how
-
can we get this right? Well, what we do is
we write down properties. Of course, we
-
could write down associativity. I'll leave
that as an exercise. Another one is just
-
very simple. Just a very simple property
that says for all pairs of two arbitrary
-
interval sets, we want the union of those
two interval sets to be valid, a valid
-
data structure. We want the union function
to preserve validity. OK? Makes sense? So
-
here's another property that says, well, I
already gave you this function or I told
-
you that there is this function iToList,
which just gives us a list of elements of
-
an interval set. And what we can do is, we
can use sort of that representation,
-
that's also a representation for sets. We
can use that representation sort of as a
-
model and say, well, if we take the
unions, you see there for all pairs,
-
again, of arbitrary interval sets, we take
the union. It says iUnion, iset1 and
-
iset2. And we convert that to a list. And,
we could also do, we could instead convert
-
each individual set to a list and then
just merge those two lists. And that
-
should yield the same result. So, in a
way, we're just giving a very simple model
-
for our interval sets, right, and that
would, so those two criteria would be kind
-
of nice to have in order to get our
implementation correct. And I already got
-
started before the talk on this. Looks
like this. No. Doesn't look like this.
-
We'll get to that later. But like this.
So, here's what I came up with. So, you
-
see there is that, while there's all this
other code there, ignore that. But there
-
is iUnion says ISet -> ISet -> ISet, do
you see that? And then, there's two
-
equations that say, well, the first set is
empty, then I'm just going to give you the
-
second one. And if the second one is
empty, I'm just going to give you the
-
first one. Right? Classic things when you
have union or concatenation operations or
-
something like that. And now you can see
the third case. It gets tricky, right?
-
Again, you don't need, I mean, main thing
is you need to understand it's tricky.
-
Well, the third one is such that, well,
says so that both have at least one
-
element and that element is in the
interval lo1 and hi1 in the first case and
-
lo2 and hi2 in the second case. And then
there's the rest. And I already put in a
-
little bit of code, and I said, well if
lo1 comes after hi1 (means hi2), then we
-
want to start with lo2 to hi2 and then
continue with the union. In the other
-
case, if lo2 comes after hi2 (means hi1),
then we're gonna start with lo1 and hi1.
-
And in the other case, it means, that no
interval comes before the other, and
-
therefore we need to merge the two
intervals at the beginning. Does that make
-
remote sense? Right. Don't worry. We'll
get back on solid track. So, we just take
-
the minimum of those two intervals and
maximum of those two intervals and we do
-
this. Now, the great thing is, I told you
about this tool by John Hughes called
-
QuickCheck. And the great thing is, we can
load this into Idris. And then here comes
-
a REPL, and we can say, I hope I'm doing
this right. So, we want QuickCheck, and we
-
want, what was it called? It was called
prop_unionCorrect. I hope I'm doing this
-
right. And, well, very small font. But you
can see here it says "100 tests". And that
-
is what QuickCheck does, as, it takes your
code version off the property and
-
automatically generates a lot of tests for
them. And that is super effective at
-
weeding out bugs. So it says, well, the
thing that you wrote is correct. It always
-
produces interval sets that when you take
the list, it gives you the right result.
-
But there was that other criterion called
unionValid. And there it says, and this is
-
really the better part, of course, of
QuickCheck is, when it fails, it says it's
-
falsifiable. It says there is a counter
example. And so, here it says, I did nine
-
tests, I generated nine random tests, and
I found one where the result is not valid.
-
And the great thing is that we can then go
and cut and paste this example. So I could
-
say iUnion, this, remove the comma in the
middle, and call this. And well, what
-
happens here is, what we can see is, we
can see 2 and 4, 1 and 1, and 3 and 5, and
-
what's not valid about. So, by the way,
this is randomized. So, this always goes
-
differently. So I have to look at it, too.
So, then it says, well, those two
-
intervals, they should really just be
merged and they should just be one
-
interval. Right? And so, it didn't do that
correctly. And the reason for that, maybe
-
you saw it. So, and, what happened is that
it ran into one of those two cases here
-
where it says if lo1 greater than hi2 or
lo2 greater than hi1. Remember that I told
-
you there needs to be a gap of at least
one between them. Remember? And here's an
-
off-by-one error that says, well. So this
says, they can, lo1 greater than hi2 says
-
they can still be right next to each
other. Right? And this is what happened
-
here. We need to make sure that there is
that gap in here. So, I can fix it like
-
this. Loaded again. Oh, no. There's still
a counterexample. So, and we can try that
-
out, so, and that's great. We get test
cases that sort of show where the bugs
-
are. And in this case, well, what happened
here? They still overlap. And what
-
happened here? So can you see it? So, you
can see that the first two intervals, they
-
must run into that last case. Right.
Because they overlap. Zero is the interval
-
from 0 to 3 and the interval from 0 to 5.
They overlap. So we need to get to that
-
case. And so it merges them. And then it
went and and somehow didn't merge it with
-
the 6 and the 7 that's there. And, so,
well, if you look at it. So it must have
-
done this. And and what it did is, it then
went on with the rest there. Let's have
-
one more look. What actually happened? So
there it is. So, it merged those and then
-
you can see that it went into a symmetry
problem here. Well, maybe you don't see.
-
But, you know, this is tricky stuff. I
couldn't do this by myself. So you can see
-
here that it just tacks the result onto
iset1Rest, whereas the maximum of hi1 and
-
hi2 could, might violate the consistency
criteria if it's the wrong one, and then
-
it runs into one of the other cases. Now
I've never seen this tricky one. Does it
-
make sense? But, can you see that it
should be symmetrical? The last one. Can
-
you see it? OK, so we'll try and make it
symmetrical. Do it like this. So we'll
-
say, well, if so, this only works. So if
hi1 is less than hi2. So we really need to
-
make sure, then it is perfectly. And then
the maximum of those two numbers is hi1.
-
Does that make sense? And so the max of
those two numbers is hi1 and then it's
-
perfectly valid to tack it onto iset1Rest.
In the other case, hi2 is greater and we
-
need to go and do something different and
rip this out here. Stick it in front here
-
and then. And then. And now it's
symmetrical. OK. So, load this. And, ahh!
-
It has passed the test. OK, live great.
applause
-
Thank you. I did practice getting it
correct, right. But you can, you know,
-
this kind of stuff. It always gets me. I
mean, you know, with old age especially,
-
this kind of stuff, it always drives the
sweat on my forehead, right? You know,
-
there's off-by-one. There is, you know, I
don't know how many cases there need to
-
be. And QuickCheck is the kind of thing
that weeds out the bugs. And even though
-
it weeds out the bugs in a different order
each time, it always weeds them all out.
-
OK. So it's a great tool. Now, I recommend
that you try that. It generates tests from
-
properties. OK, where are we? So let me
let me give you a couple of real world
-
examples. So if you're using X windows,
there's a there's a tiling, a window
-
manager, xmonad. It's already a couple of
years old and they don't do much
-
development on it anymore. That's because
it's correct. Right. laughter Right. And
-
why is it correct? Well, it's because they
wrote down a lot of properties for the
-
geometry and the tiling algorithms and
verified them using QuickCheck. And so I
-
sort of loosely translated. So, Don
Stewart, one of the authors of xmonad
-
graciously wrote a couple of blog posts on
a simplified version of xmonad and I
-
translated them into Idris. So, here's a
very simple idea of just a stacking window
-
manager. So, it doesn't do geometry, it
just has stacks of windows and it has
-
several workspaces. In each workspace is a
stack of windows. So here's a data type
-
called a StackSet, its parameterized by a
type called window. We'll see later why
-
there's a type parameter and why it just
doesn't say what the windows are. And then
-
it says there's a constructor StackSet
and there's two fields in there. One is
-
called "current", that's the number of the
workspace that's currently
-
active. And then there's "stacks",
which is a map from the number of the
-
workspace to the stack of, to the list of
windows that sit in that workspace. Again,
-
so here, really the technicalities are
not particularly important, but there is a
-
bunch of operations that operate on this
window manager configuration. And again,
-
here, really the details aren't important.
So you could create an empty stack set.
-
You could say, well, you know, I have the
number of a window that I want to get to
-
the front. And please make me, please
rotate me, the stack set around so that I
-
can see it. "peek" means, you know, maybe
I can get the topmost window that the user
-
is currently looking at. "rotate" means
I'm just going to rotate the workspaces
-
around in either left or right direction.
That's what that ordering argument. "push"
-
is, I push a new window onto the current
workspace. "insert" means insert a window
-
into one of the other workspaces. "delete"
means I delete a window. "shift" means,
-
also means I shift something with the
windows. Not really important what they
-
do. But you can imagine again, just as we
did with the interval sets is validity
-
criterion or an invariant that should hold
for these operations. And it's very
-
simple. Well, it says well, if you have a
stack set with some windows in it, I'm
-
just going to tell you whether that stack
set is consistent. And by doing that, I'm
-
just going to say, well, the current, the
number of the current stack
-
should not be higher than the number of
window stacks that there are. Right. So,
-
the number of stacks that there are. And
the other one, that just says a window
-
should not be in several of the
workspaces. Right? And then I can go and
-
maybe with this definition, all
those function definitions aren't very
-
complicated. But, I can go and write a
whole bunch of properties. And if you just
-
understand, well, maybe the second one,
"prop_view_I", you understand all of them.
-
It just says, well, for all pairs of a
natural number and a stack set that are a
-
"stackIndex" and "stackSet", I want, if I
call the "view" function, which is one of
-
the operations, I want the view function
to produce a consistent stack set. And
-
then it goes on to do all of that for all
the other ones. At the bottom here, you
-
can see some prerequisites that need to
hold for the property so that invariant
-
only needs to hold if the window, if the
number of the window is actually smaller
-
than the size of the stack set. Otherwise,
I think the function just returns what
-
would go with it, what went in there, So
that's a very, that's just a very
-
efficient way to invent properties, to
think of some invariant that shall hold in
-
your data structure. And if you know
Idris, you can sometimes encode that in
-
the types, but often that's kind of
tedious. And you can just write it down as
-
a property and then have QuickCheck check
it for you. And it's not particularly
-
exciting for the simple definition, but
you can imagine that the actual definition
-
when you have tiling window management
going on is much more complicated than the
-
one that you just saw. But you can keep
those same properties, right? There still
-
needs to be some consistency invariant
that, if you have tilings, the windows
-
don't overlap, and things like that. That
should be obvious. Write those properties
-
down, check them using QuickCheck and that
will weed out a lot of the bugs.
-
Here's an example from our
practice. We, couple months ago, we were
-
tasked with migrating a giant Visual Basic
6 application. It had a password checking
-
function there. You can see here a Visual
Basic 6 type signature. And the property
-
that we wrote was, well, if we create the
hash from the password and we compare it
-
with the hash that's in the database, then
they should all come out the same. And to
-
our surprise, that function, that test,
that property, failed when we ran it for
-
QuickCheck and we had to correct it
because that password hash is restricted
-
to 11 characters by some restriction in
the database schema. And so that means
-
that you can use QuickCheck not just to
sort of check the correctness of things
-
that you already know, but to actually
develop a model for what goes on in your
-
software, which you don't always know very
well. So that's what we did there. Another
-
example is, we wrote, for a large
industrial client, we needed to write a
-
synchronization application. So when you
had two mobile devices and they would sort
-
of meet as strangers, they would exchange
data and they all needed to look at the
-
same sort of device configuration data.
And we didn't want them to exchange all
-
the data every single time. We just wanted
to exchange them, the data blocks that the
-
other side was missing. And again, there's
great algorithms for this based on Merkle
-
trees. They're pretty complicated. You
have to do a lot of bit fiddling with
-
that. But fortunately, the property for
that is pretty easy to write. So here's
-
the property that says, well, so the
synchronization algorithm works on sets of
-
blocks, whatever a block is. So you can
see the property here for all pairs of
-
sets of blocks and more sets of blocks. So
they're called bs1 and bs2. Block set one
-
and block set two. What we can do is, we
want, if we union those two, then we get
-
all the blocks in the system. We call that
all, or we can call the synchronization
-
algorithm and that will give us two new
block sets, block set bs1' and bs2'. And
-
those block sets are the ones that get
transferred to the other side. OK. And the
-
criterion here just says if we take the
ones that we have, if we union them with
-
the ones that we get, we should get all of
them. That should be all of them. And that
-
should be the same for both sides. And
also, we want the algorithm to be
-
efficient so we don't want it to transfer
blocks. So we want to make sure that the
-
blocks that we have and the blocks that we
get are disjoint. That they don't have any
-
elements in common. Otherwise, we could
make that algorithm trivially correct by
-
just transferring all the blocks every
single time. And I can tell you, I
-
sweated. You know, I sweated one or two
weeks over this algorithm and it was
-
really hard to write. But this one test
weeded out all the bugs that I found along
-
the way. So that is just super, super
effective. John Hughes has a couple of
-
papers on hard bugs that he found. So he
found a bug in a distributed database
-
called mnesia. And that bug was dependent
on opening the database, closing it and
-
opening it again. So this is not the kind
of bug that you find by just writing a
-
bunch of smart unit tests. Right? So, if
you did anything shorter in the beginning,
-
so if you just open the file and then did
some lookups there, that would not
-
manifest the bug. You really needed to
close and then open again. Have you
-
turned, have you tried turning it off and
on again? But then the database breaks in
-
this case. And here's another example
called The Mysteries of Dropbox. So you
-
can imagine that with Dropbox you really
want certain properties to hold. Right?
-
And it turns out they didn't. They never
worried about writing properties down. But
-
John Hughes did it and found a couple of
bugs. So here's one. It's kind of hard to
-
read where it says client 1 writes a into
a file that was previously empty. So that
-
funky turnstile there is empty. So writes
a into a file and then deletes the file
-
and another client writes, replaces, sees
the a in the file replaces it with a b.
-
And then client 1 goes and writes c into
the file that it previously thought to be
-
empty. And then unfortunately, even though
you can imagine that you should see either
-
b or c in that file, but Dropbox deleted
it. So I think they fixed that bug now.
-
But. so you go. So it goes. Oscar Wikstrom
has a couple of great, pretty recent blog
-
posts on properties in a screencasts
editor that I highly recommend. So this is
-
a great tool for finding bugs, but it's
not the same as having a proof. Right? So,
-
you can still imagine that you can find
very subtle bugs that are not covered by
-
QuickCheck. QuickCheck just randomizes,
just generates randomized tests. So, that
-
is not the same thing as making sure that
there aren't any bugs. So the great thing
-
about Idris and the reason I chose it for
this talk is that Idris allows you to not
-
just encode properties in the language. It
also allows you to encode proofs in the
-
language. So here is the associative
property for the list concatenation
-
operation. And if you look at the top that
has the definition of that function from
-
the Idris standard library, it says ++, in
goes a list, in goes another list, out
-
comes a list. Then it says, well, if you
concatenate the empty list with any list,
-
that is just that list "right". Do you
see that? The second one says, well, if we
-
concatenate a list that starts with the
element x and goes on with xs, then we
-
just sort of pull the x in front and
concatenate the rest with "right". So
-
that's a classic recursive definition of
list concatenation in functional
-
programming. And now here's something
really strange in Idris. Here's the type
-
declaration for a definition, again in the
standard library, called appendAssoc. And
-
it says, if you have a list a, you have a
list b, and you have a list c, and in the
-
type it says, oh, the associative property
should hold. Right. And so this is a
-
statement of that property. That's
wonderful. It's not the same as a proof.
-
So, but writing proofs, who loved that in
math? Oh. Oh, you're good! I didn't. I'm
-
sorry. So. So the great thing about Idris
is, it helps you write down the proofs.
-
I'll show you how that works just really,
really briefly. So here's that. So here's
-
just what I showed you on that slide. So I
can load that in there and it says, well,
-
you're not done. You didn't write a proof
for that property, but in Idris, you can
-
just push a bunch of buttons. Now, I love
that. So I can push one button and it
-
says, oh, well, you should write a proof
of that form. You have lists a, b and c.
-
Well, now and I can push another button
that says, well, you're doing this on
-
lists and if you're writing anything on
lists, you always need to distinguish
-
between the two cases of the empty list
and the list that consists of the first
-
element x and further element xs. And then
it says, well, write down something, but
-
then I can tell Idris: Well, I'm too lazy.
I'm not going to write anything so I can
-
just push a button. And Idris wrote this
so you can see me, but I didn't type this
-
right. I just pushed a button and it says,
Refl. What is Refl? What could that be?
-
Well, you can ask it what Refl is. It says
Refl. Oh, you can see here, landing here.
-
Refl is just a proof sort of a built in
proof that says that if two things are
-
true, two things are equal, if they're
identical, if they're the same. Right. And
-
that kind of makes sense in the first
equation, because the first equation of
-
appendAssoc corresponds to the first
equation of ++. Can you see that, how it
-
corresponds? Can you see that? The first
list is empty. Can you see that? Can you
-
see how the first list is empty with the
first equation of appendAssoc and the
-
first list is empty up there with ++. Can
you see that? OK. And then it just says,
-
well, then obviously. Well, not quite
obviously, but then sort of the the way
-
that the definition works, it comes out
just right. So what's really important is
-
that Idris accepts that proof with the
first. The second one is slightly more
-
tricky. But again, we can get help because
we know that appendAssoc is this recursive
-
function. It recurses on the first
argument. So we're just going to do the
-
same thing in the proof. And you can tell
Idris that it should use that, that it
-
should use that fact, if you will. So
here's the recursive call. And again, I'm
-
too lazy to push a button. But if I push
that button, it also puts in Refl and
-
there's loads. So this it might be a
mystery to you how it works, but this is a
-
proof of the associate of property of the
list concatenation in Idris. And since
-
Idris helps you write it, it's kind of fun
to do that. Oddly enough, even for
-
somebody who doesn't, usually who doesn't
usually doesn't enjoy proofs. So the way
-
that you program in Idris, we haven't done
that a lot in this talk, is that you put a
-
lot of information in the types and the
more information you put in the types, the
-
better Idris will get at figuring out the
correct definition. And you don't have to
-
do it by yourself. OK. So that's really
nice. OK. So we got that and and sort of
-
these kinds of proof assisting systems
such as Idris have been used in a lot of
-
real world systems. One one prominent
example is SEL4, a version of the L4
-
micro kernel, has a long history, but
important properties of that kernel have
-
been verified. It runs in the security
enclave on iOS and even though it's
-
written in C, it provably does not have
buffer overflows or a lot of the nasty
-
things that are responsible for a lot of
security exploits. Compcert is another
-
example, which is a verified, I should
mention this has been verified with the
-
help of a tool called Isabel. Also, great
fun to use. There's a project called
-
called Compcert, which is a verified C
compiler, which is important for a lot of
-
certified software where, you know, the
source code might be certified. But how do
-
you know that the compiler generates
correct code? And you know, because it's
-
been proven to be correct. And even there,
you can shoot, you can cheat sometimes. So
-
for example, register allocators, very
complicated, very hard to prove right.
-
But what you can do is you can write a
checker that the register allocator did
-
its job, did its job well and you can
verify the checker. And so you can cheat a
-
bit. So there's tools for that. We've seen
Idris and there's a number of other tools
-
and they're getting more and more mature.
And they're great fun, really. They really
-
are great fun. But, you know, going back,
switching down a gear a little bit,
-
there's lots of useful properties that you
can look for in your programs. So
-
commutativity might be useful that you can
switch the two arguments for an operation.
-
Also, if you have relations, you might
remember that from some math class,
-
there's some properties here like
reflexivity, symmetry, antisymmetry and
-
transitivity. Reflexivity says that a is
always related to a. Symmetry says if it's
-
one way, if a and b are one way related,
they need to be related the other way too.
-
Antisymmetry intuitively would seem kind
of the opposite. That doesn't make sense.
-
It's just says: if two things are related
in both both ways around. So, for example,
-
you know, orders like less or equal are
antisymmetrical, then they must be the
-
same. And transitivity just says that you
can form chains of your relation. So those
-
are a little dictionary of useful
properties that you can look for. Let me
-
close with one fancy property that you've
probably seen somewhere and that property
-
is called Functor. And you might have seen
in your programing language, in your list
-
library or in your stream library. There's
a function called map, right? And you
-
know, even Java has that and has had it
for many years. And what map does is, if
-
you have some, you know, in Java, for
example, it says "Stream", or it might be
-
"List", right. It says, well, if I have a
list of As, I can apply a function to each
-
element of that list. But you can
generalize that, it doesn't have to be
-
lists. It could be an Optional of As, for
example. You could also apply a function
-
to the value that's in there. So, you can
generalize that notion, and then it's a
-
functor. And, of course, in Idris, you can
write down equations for functors. And,
-
please ignore the technicalities here,
(stammers) but, if you sort of pick out
-
where it says "functorIdentity",
the middle row says g v equals to v,
-
which means g is the identity
function. When you feed in v, you always
-
get v back. And when you use map with the
identity function, you apply the identity
-
function on each element of your list or
whatever it is. Then then you always get
-
back the same list. And here just says you
get function composition. So if you apply
-
one function and then another function and
you do that either inside or outside the
-
map, you should also get the same results.
So there's also just as there is
-
associativity with monoids, with functors.
There's these laws and you might think,
-
well, where would I look for a functor?
I've never seen a functor except for the
-
ones on streams. A couple weeks ago in a
training, somebody said, well, you always
-
start with that animal example. Shouldn't
you look for a functor there? And I was
-
kind of, you know, sweat broke out on my
forehead, I was like, where's that gonna
-
go? But, we came up with this. So, if you
go back, you can see that this is obvious.
-
So, what you need for functors is, you
need a type parameter. Right. And so you
-
just look for a place to stick a type
parameter, any place here at all. And
-
if you look at Dillo and Parrot, they
both prominently have this weight thing.
-
Right. And so that seems more important
than the other two properties, which are
-
specific to particular kind of animal. And
so the weight, the thing to do is just to,
-
well, you can see I replaced upper case
Weight by lower case weight and made that
-
into a type parameter, and, I can then
provide a functor implementation down
-
there. And you might think, what is that
good for? Well, I don't know. Well, one
-
thing that you could do is you could
provide a different representation for
-
weights. Another thing that you could do,
if you look at the type for runOverAnimal,
-
it says animal weight -> animal weight and
weight is a type variable. What that type
-
signature tells you is that runOverAnimal
does not know what weight is. And that
-
means that the weight cannot change as a
result of that function. And you see that
-
in the type signature you get immediate,
small benefit, but you get a benefit even
-
with silly examples such as this one. And
that really brings me to the end. So in
-
your software, in your domain model, look
for a Combinator, look for a function that
-
will combine two things into a bigger
thing. See if you can make that thing
-
associative and look for a neutral
element. And very often you will find one;
-
make it a monoid, you know, say monoid a
couple of times. You'll remember it.
-
You'll remember it. Generally, write
properties for the things, for the
-
operations in your software, test those
properties using QuickCheck. You know, if
-
you feel like you have a lot of time,
prove them correct. Find the functor. If
-
you found, if you found the monoid, you
know, find the functor next. You know, and
-
it takes, it might take time. I'm
very old. As you noticed at the beginning.
-
So. So it gets easier over the years and
it will just seem like a regular staple of
-
your of your arsenal when you program. And
of course, when the important properties
-
in your program have either been written
down, if they've been tested with
-
QuickCheck or even proven, then you can
sleep much more soundly than maybe you
-
currently can. Thank you very much.
applause
-
Herald: Thank you, Mike. So I see we have
three minutes for questions. Maybe that's
-
two or three questions. If you have any
come to the microphones, please. Do we
-
have a question from the Internet? No, not
yet. So, microphone two. Right.
-
Question: Hi. So, QuickCheck generated
hundred tests. Yes. What can we say about
-
the quality of this test? Could we say
your program was correct using thoese
-
tests? Are these tests good?
Answer: Yeah. Very good question. So what
-
would you say about the quality of the
tests? And indeed, if you really do serve
-
industrial strength applications, a quick
QuickCheck comes with a bunch of tools
-
that let you look, for example, at the
distribution of the individual example
-
rated and while you didn't quite see me do
that, but I mean, for your domain objects,
-
you will typically write generators that
will generate those examples and you can
-
reason about the distribution of those.
And you absolutely should do that because
-
otherwise you might miss large areas of
your test space. So, but there
-
are tools and they help you do that. But
even if you don't do that, you know, it's,
-
you find a lot of, I found a lot of
bugs in my software even without worrying
-
about that. But if you go beyond that,
look at the distribution thing.
-
Herald: Thank you, next one, please.
Number two.
-
Q: Let's say I've hacked a program, for
example, in Java or C# or whatever. How do
-
I, how do I apply what I learned so
far? So, where do I start when I have
-
already completed the C# program with,
yeah, how do I apply QuickCheck on that?
-
A: So, just pragmatically because it's
written in C#? That's the question?
-
Q: Yes
A: So, well, I have to be very concrete
-
here, I mean, so, if you can think
properties, right, one way to do, so,
-
for example, so, C# you can link with F#
and there is a QuickCheck version for F#
-
called "FsCheck". And FsCheck, actually,
even though it's itself written in F#, you
-
can also use it from C#. So, you have two
options. You can write your tests in a
-
slightly more awkward fashion in C#, or
you could just link your code with F# test
-
suite and write it down there. And there
is a fairly reasonable Java QuickCheck, I
-
hear. Another idea would be to use the
slightly more fancier, the slightly
-
fancier QuickChecks that exist for Scala
and Enclosure. I'm sure there's one for
-
Kotlin as well, and link that against your
Java code. Does that answer your question?
-
Q: So whatever language I use, I have to
find out what the correct implementation
-
of QuickCkeck?
A: Yeah. Yeah. But as I said, I mean
-
usually, a fun thing I do in training is,
I chat "QuickCheck" and somebody calls on
-
the language, you know. Quick, QuickCheck
PHP or something like that. And there is
-
one, sure enough, I didn't know about before.
Q: Thank you.
-
Herald: All right. Thank you. And thank
you, Mike, again, for showing us a way to
-
sleeping soundly.
A: Thank you.
-
applause
-
postroll music
-
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!