36C3 preroll music Herald: Our next talk is held by Mike Sperber, and he is already very ready for that. He's head of a software company in Tübingen in Germany and he's going to talk about "Getting software right with properties, generator tests and proofs". And the main thing here is functional programming. One tiny thing you might not know about him is that 1986 he won a federal competition on IT and so give him a warm applause for that and also for his talk. applause Mike Sperber: Thank you very much. So is anybody actively using the induction loop feature in the first couple of rows? Cuz I know somebody who would like to know. Not right now. Okay. Anyway, so let me get one shameless plug of advertising out of the way. If you find the contents of this talk interesting, we're running a developer conference in Berlin in February called Bob, which is very friendly and very nice, very tiny compared to this one. And we'd love to see you there. Another thing, this is an introductory talk. So if you were expecting the latest developments on proof tactic, in fact, if you know what proof tactic is, then all you might get from this talk is sort of mild amusement. And I won't be mad at you at all if you go for one of the more exciting talks. Ok? So. Or leave at any time. That's perfectly fine, if this material is not for you. Speaking of introductory talks, here's a piece of code written in the language that I will use for this talk and it's called Idris. Who has written an Idris program before? Very good. Ok. Oh, there's one person back there. That means if any part of this program, as soon as I'm done explaining, is not clear to you, it's also not clear to two or three hundred other people in this room. And I would love to have your help. Interrupt me, ask a question anytime in the talk if there's anything here not clear. It's going to get, even though it's meant to be introductory, will get quite technical at times. So let me try explaining this one. So this is a classic example in functional programming that I use often in my talks, about animals on the Texas Highway. And if you can see there, the central definition says data Animal. That's the data definition of animals. And in this particular version of the Texas Highway, there's two different kinds of animals. There is Armadillo, it's where it says Dillo there. And there's Parrots, for some reason, on the Texas Highway. Does that make sense? Two different kinds of animals. And you see that definition. Yeah, nod, that greatly helps me. And if you see those two definitions for Dillo and Parrot, you can see, while the arrows are kind of funny, but you can see that Dillo and Parrots have two properties each, and it says their Liveness. That's one of the properties of an armadillo. And up there at the very top, you see the definition of Liveness, it says Liveness means dead or alive. It's an armadillo, can be dead or alive on the Texas Highway. And there's also the Weight. And well, you see, this colon thing is the type signature for the constructor for armadillos. So it says there's a liveness going on, there's a weight going on, and then it constructs an animal. And for a Parrot, there's a string. Every parrot speaks, right? And so it's the sentence the Parrot says, and also the Weight. And it also produces an animal. And, up there, you can see the definition of Weight is for simplicity's sake, I'm saying that Weight is a type. So that's kind of unusual for Idris, but you don't have to worry about it. But you can see there, Weight is just the same thing as an integer. And if you look down there, where it says a1, a2 and a3, that has three examples for animals. So it says a1: Animal, just to say that a1 is an animal. So, Idris is a language that always has type declarations. And it says a1 is Dillo Alive 10. And that means it's an armadillo, it's still alive, and it weighs, let's say, ten kilograms. The second one is dead, a little bit heavier, weighs twelve kilograms. And the third animal is a parrot that knows, well, it's a pirate's parrot, obviously, and maybe weighs three kilograms. Ok, so far? Ok. So if you have any question about any of that, then please ask away. So, what happens to animals on the Texas Highway is, you know, people drive cars, they run them over. So there's a function down here, and, well, we're doing functional programing, shouldn't worry you at all. But what's important here is that it says there is an animal going in, an animal going out. And really what this means is that this animal object up there is not really the animal. It is the state of the animal at a given time. So, runOverAnimal. you can see the type signature that says an animal goes in, an animal goes out. And what it really means is, the state of the animal goes in before it gets run over and the state of the animal after it gets run over comes out. And then while we know there's two different kinds of animals. And that means that for the definition of runOverAnimal, we need what's called equations. There's two different equations. And the first equation says what happens to armadillos when they get run over. So an armadillo has a liveness and a weight. Here's something going on called pattern matching. And the second equation says when there's a parrot going on it has a sentence and a weight, and on the right hand side, you can see, well, when an armadillo gets run over. Well, all that means is, the liveness sort of turns to Dead. We're constructing a new armadillo object and it's dead and it has the same weight as before. And the function, the equation at the bottom says, well, when we run over a parrot, it turns really, really quiet. Ok? So, classic example. Ok so far. We're going to return to that example at the very end. Right now it's just to illustrate the language that we're doing things in and we're going to do a lot of things without complicated programs. So, well. So, I'm going to jump around a little bit. So, one thing. So, just the other day, two weeks ago, I was teaching a course on architecture and somebody said: Well, there's this problem. I'm building a domain model. I'm putting the domain model in a database. And, you know, customer comes in, has new requirement or somebody comes in, has new requirements. And that always ends the same way. I put a new call in the database and, you know, seven, eight, nine, it just goes on and on. As the software gets older and older and older, more columns that create the old ones seem a little stale. And so, yes, well, how can we build models that are flexible? And so, here's something completely different, you might think. So, here's sort of the key to that, to building flexible models. Does anybody recognize this? Does anybody associate a word with this? laughter Very good. So, you might remember, depending on what state you went to school in, you might remember that this is a property called associativity. Right? And it means that we can associate either the A and the B first with the parentheses or the B and the C. So, and this is, if you take away one thing from this talk, it's associativity. Knowing what that is is one of the most useful things in software development. So, of course it's just a generic equation, we really need to be more specific, namely that we're dealing with numbers and addition. And you might know that it's not just addition that's associative. Also, multiplication, for example, is associative. So here's a little mathy stuff there at the beginning. So, you see that upside down A. That says "for all". We just say for all. What that means is "for all A, B and C". And then, this funny epsilon-shape letter kind of thing, it means "element of". And then that funky N means the natural numbers. So all the numbers from zero, one, two, three, the whole numbers from zero on up. So, what that means is, for all natural numbers A, B and C, the associative property holds when you add them up. But while it says numbers in addition, it doesn't just hold for numbers and, addition, in fact, associativity is everywhere around us. Specifically, it's everywhere around us when we program. So here's another example. When you're dealing with lists and that funky, the two pluses that you see there, they are just list concatenation. So you concatenate two lists and well, of course you can concatenate three lists by just using that double plus in any order. And that's also associative. So, it doesn't matter if you first concatenate the B and the C and then tack the A onto the front, or if you concatenate the A and B and tack the C on at the end. Doesn't matter, you always get the same result. So lists and concatenation also have this associative property. And here's something that I always find very, very enlightening is that you can construct images that way. Well, you don't see it here. So here's an image. Well, it's from a cool researcher of mine and functional programing, Brent Yorgey, and he has a great library out called diagrams, for constructing diagrams out of parts. And so this really is what associativity is about. It's about operators that construct things out of parts. And so, as you can see here, well there's different shapes here, there's sort of the black rectangles, there's a different rectangle set, that denote the towers of Hanoi. We're not really going to deal with the towers of Hanoi here, really. The important thing that the image consists of several parts. And well, in normal or sort of in classic object- oriented programming, when you do graphics, you have a canvas and you might draw pixels on that canvas. You know, might be square shaped or a circle shaped canvas pixels. But what we're doing here is, we are treating an image as a data type and the definition is not important. What is important is that there are a couple of functions that construct sort of simple images. So here's a function that you might imagine called star and it constructs stars. And well, you can see up there there's a type declaration and it says, well, the star function, it accepts an integer, it accepts a Mode, whatever that is, accepts a Color and it produces an Image. And we can call that star function with the arguments 200 and Solid and Gold. So Mode is Solid or Outline. And then we have a Color and we get Image and that image is an object. Not particularly exciting. But while we might have another function called Polygon, Polygon takes two integers that denote the size of the polygon and the number of vertices, and also whether it's an outline or whether it's solid and a color. And for example, if we call it with 180, again, that's the size and 5 we get a five corner polygon and we get that as an outline and it's in red. Now, the idea here is that we can combine. Just as we can combine two numbers or we can combine two lists, we can combine two images. Maybe the most intuitive way of combining two images is just sticking them beside each other. So there's a function there called beside. And it takes an image and it takes another image and produces an image. Right. And this is exactly what we're thinking about when we talk about associativity. We're talking about a sort of a binary operator that produces the same thing that went in. And so, for example, we could stick those two images next to each other. We could also imagine an operator called above that just puts one image above the other image. And we can combine these two things. Here it really is important that the same thing comes out so that it's image goes in, another image goes in, an image goes out. So we could again call above on the result of beside and make arrangements. So here's a tiling arrangement for your bathroom or something like that. Now, beside and above are are two possible operators and you might already think about associativity, but really the more fundamental one is overlay. You put two images on top of each other. And so again, overlay has the right type. An image goes in, another image goes in, and an image comes out. And if we take the gold star and the pentagon and put them on top of each other, then it looks like this. And we can then formulate an associativity property. It might not quite look the same because I wrote overlay in front rather than between the operators. We could also write it between. But just to show you that it's the same idea. So, it doesn't really matter if we first take two images, A and B and superimpose those two and then put those two on top of C or if we do it in another order. Does that make sense so far? Ok. No? Do you have a question? Anonymous: Mumbling Mike: Yeah, so ahh, good point, good point. So this implies that there must be some kind of, that there's probably some notion of transparency involved. Yes. Yes, there is. But then you have associativity. And really what it means. Very good question. So, if you think of this image in terms of the color at certain coordinates, Right, Well, you need to think about how to combine those two colors that are in the constituent images. And you can imagine that there also has to be a combination operation for the colour. And that also needs to be associative as a prerequisite for the for the overlay operation to be associative. Does that make sense now? Thank you. Good question. Great question. So anyway, so since this associativity property is something that is not just restricted to numbers, as we may have learned in school, it really makes sense to get. And that means that when we talk about associativity, we always have to name two things. We have to say what set we're operating on and what the operation is. And the combination those two things has a name in mathematics and it's not the best name, but it's called a semigroup. Right. And, but, you know, if you drop it in certain circles, they'll think that you're an expert on mathematics, you might try that. So, just to go over that: So, you have some subset S, and that S might be Image, it might be the natural numbers or something like that. And we have an operation that I'm just gonna call circle here, then take any a, b and c from that set S. We can use circle as an operator and we have that associativity property and for that circle you can put in overlay, you can put in beside, you can put in above, you can put in + you can put in times or you can put in the list concatenation operator, the ++. Okay? And that associativity is great. It's really my favorite property because it means when we have a whole lot of things that we combine, we can parenthesise in any way we want. We will get the same result no matter which way we parenthesise them. And that really means, we can leave out the parentheses when we write an expression that involves only the circle operator if it's associative, if we can just leave out all the parentheses because the parentheses don't matter. And that makes it well, that makes it instantly easier to read, I think. Also it has practical uses. So if you do big data processing associativity means that if you have large datasets that span several machines or several hard drives or several data sources, and you're combining them and you have an associative combination operation, it just means you can rearrange that combination operation according to the load in your compute cluster. And that makes it a very useful property when you're doing big data processing in sort of MapReduce based frameworks. But, I mean, that's a practical application, but I think it's much more useful, associativity is much more useful when you use it for designing your domain model. And I talked in the beginning how, well, you want to avoid always adding more database columns. And one way of doing that is to view your domain model, not as something that has more and more properties, but your domain model as building blocks that you combine into a larger building blocks the same way that we combine images from simpler images. So here's one of the great papers from functional programing, one of my two or three favorites from Brent Yorgey. And it's called "Something Something: Theme and Variations". And you can see that it is about images. And these images get superimposed with an operation that is just like overlay and that is, that title is eminently googleable. Now, it has a funny word there. It says, it doesn't say semigroup, could say"Semigroup: Theme and Variation", it says "Monoid: Theme and Variation". And a monoid, well, it's also not something, even though it sounds kind of fancy, it's actually not much more complicated than a semigroup. It's a semigroup. And also the semigroup has a special element called the neutral element. And whenever we combine something with a neutral element, it doesn't matter if we do it in front or at the back, we get the same thing back. So, of course, the neutral element with respect to numbers, in addition, would be zero. The neutral element with respect to lists and and concatenation would be the empty list. I always hear several voices. That's wonderful. Thank you. And the same thing for the overlay and beside and above, you can imagine that you have just an empty image that has only, that consists only of transparency, that can work as the neutral element. So all of these things that I showed you that are associative, they're not just associative. They're not just semigroups, they're also monoids. And so, as I said, as long as you remember associativity, that's the important thing. But often you also find a monoid, and monoids in the wild they're just everywhere. We've seen them for numbers and lists and images, music forms a natural monoid. You can you can describe musical structure with monoid operations. You can treat animations, the time axis. You can define monoidal combination of animations. A famous example in functional programming is with financial contracts. If you were here last year for a talk of mine, we talked about semiconductor- fabrication routes, which sounds very concrete, but also they form a monoid. The properties themselves that we'll see for a monoid are all kinds of things. They're everywhere around you. And these are really the key towards making flexible domain models because in almost any domain model you can find a monoid just by looking for building blocks and for ways of combining those building blocks into a larger building blocks. So let me get back. So I said, well, you can use associativity or you can use this monoid thing to guide your design. And I haven't really made that concrete yet. And so I stole a couple of pictures from Brent's paper. So you remember the beside and the above operations. And those are fine for arranging things sort of in the vertical and the horizontal axis. The way that they work is, they make, they put a bounding box around every picture and then they arrange the bounding boxes either beside each other or above each other. So it's a slightly more involved thought. And that works great when you're, when, you know, your picture is, happens to be a square that's aligned with the axes. It doesn't work so well if your picture is rotated, right. Because the bounding box, the bounding box then is too big. And if you want to attach anything about, just about in any direction, then there's going to be a gap in your picture. And so beside and above are not particularly good operations as the basis for an image library. The overlay operation is much better. But that leaves open the question how you can arrange pictures, several pictures so that they are beside each other or that they just touch. And Brent came up with this idea of an envelope, technical idea. So the idea is that, well, if you give me, so the red dot there, that's the origin. If you give me a vector starting at the origin, I will tell you how far you have to go along that vector so that I can draw that blue perpendicular line that's just outside the shape. And that's called an envelope. And envelopes are wonderful. So if you ship each picture not just with sort of the visuals that you see, but also with a function that describes the envelope, then you can use that envelope to arrange pictures both in the horizontal and the vertical, but also in the diagonal by just drawing vectors so that they touch. So, that's a slightly more complicated idea. Does it make sense? And Brent goes through the motions of using that inspiration from the monoid that he is getting. He's saying "Everything must be a monoid! Absolutely.", and uses that as a guiding principle through the library. So I'm not going to go into technical detail on how that works, but it's a very pleasing paper to read on that. And it results in a beautiful library that's great fun to use. So that means, though, that you also have to find a monoidal combination operation for the envelopes. You can't just, we've already seen how we can combine the pictures themselves, but we also need to combine the envelopes. And fortunately, that's pretty easy. If somebody sets a vector in a certain direction, then that envelope is just a maximum. Those two pictures, right, if you combine that ellipse and that square, you can see that I'm just going to have to go to the maximum of those two numbers in order to just be outside the composite shape that that comes up superimposing those two things. So that's great. Now, I sort of introduced these properties as a mathematical thing, right. I said, well, there's this fancy, fancy upside down operator says for all images and we might say for all images. Now, we can also formulate these properties as code. And that's really where additional magic is. So, for example, the associativity property, well, there's not much of a difference except that the image1 and image2, they are now in typewriter font. So we could put those in the program. But there's still that mathematical stuff on top. But in a functional language, in a lot of other languages too by now, we could also put the top line and translate that into code. And it might look like this. So that's what it looks like in Idris. So, it's not quite the same, but maybe we recognize the structure. So, we say, well, there's a property called and the property is just called overlayAssociative. So we give it a name. So, Idris is an ASCII language, still so, primarily. So, we say just forAll there instead of the upside down all. And then it says arbTriple arbImage arbImage arbImage. And that means for all arbitrary triples of arbitrary images and other arbitrary image and another arbitrary image, so, triples, three things. And we're going to call those three images image1, image2 and image3. That funky backslash there, that's a Lambda in Idris. And then the overlay prop means that while, if we overlay one way and we overlay another way, according to associativity, we get the same result. Do you recognize that structure? Right. That it's the same thing. So that we're writing structurally the same thing that we wrote in mathematical notation. Now as a piece of code. And now the great thing is once we've written it as a piece of code, we can manipulate it in a program. So, one way, there is different ways of manipulating it. But one of the most useful ones is, again by another great researcher in functional programming, John Hughes, came up with something called QuickCheck. So if there's another thing you take away from this talk is: google QuickCheck. And whatever language you use, it doesn't have to be Idris. In fact, I had to hack together a QuickCheck for this talk, but basically any other language is going to have a QuickCheck, whether that language be a functional language or whether it's Java or Python or R or something like that. You can always get a QuickCheck for that. And I'm going to try and demonstrate this QuickCheck thing not by thinking about the design so much, but by demonstrating a property of something that's very error prone. So, here's this idea, we want to have a representation for sets of natural numbers. And we're going to represent those sets of natural numbers by a list of intervals. So, by a list of ranges, if you will, between two numbers. Now, I'll try to explain that. So, up there at the top, it has a type definition. It says, ISet, interval set, is a type. And that type is defined to be just a synonym for a list of pairs of natural numbers. That's what those round parentheses with a comma in the middle mean. OK. And just to see what that means is, there's a function there. I haven't, I've lighted the definition, but what's important about it is its type signature. It takes an interval set and it produces a list of all of the members of that set. And you can see sort of a demo thing here that I typed in before the talk. So, if I apply iToList so that, the brackets there they just mean the list, and we feed in a list of intervals and those intervals are from zero to three, from five to seven, and from nine to ten, respectively. They're all inclusive. And you can see down there is a list of all of the members. So, the first interval is from 0 to 3. So, it has the numbers 0, 1, 2 and 3. The next one goes from five to seven. So it has the three numbers 5, 6, 7. And the last one goes from nine to ten. So it has the two numbers, 9 and 10 there. Does that make any sense again? Slightly more complicated example. So let's see. So, of course, well, not of course, but the way we want to do it, the way I want to do it is, I want to have the interval set structured in a certain way. I don't just want any list of any pair of numbers to denote an interval set. And therefore, here is a function that describes what it needs to be a valid interval set. Right. So, for example, we don't really, in order to have efficient processing, we don't really want two intervals in one interval set to overlap. Right. We want them to be disjoint and we also want them to be ordered so we can have efficient operations for certain things. Right. And so, let's go through this. So, there is an isValid function that just tells you whether that interval set is valid or not. It says, well, if that set, and there's three different cases here, which is why there's three different equations, in the first equation says the empty interval set, the empty brackets mean the empty list, and if the intervals, the list representing the interval set is empty, then we're going to say True. Empty set - perfectly fine. The next case says, our interval set consists only of a single interval and that single interval goes from low to high. Well, we kind of interpret that there, but, and, well, that interval set is valid, if low comes in front of high. Right, they shouldn't be the other way around. So, does that make sense? Somebody in, can you nod at the back, a little bit? You're still there? OK. Thank you. Great. So, then it becomes a little bit more complicated and it says, well, this is the third case, when there's at least two intervals in the interval set. And those two intervals are, the first one goes from lo1 to hi1. The second one goes from lo2 to hi2. So, those ::, they separate the first element of a list from the rest. And then there's the rest of the list. And it says, well, again, we want the interval to be ordered so that the lower numbers are on the left. That's where it says lo1 is less or equal hi1. And then it says, well, that there should be a gap between two consecutive intervals. Otherwise, they should be one interval, which is why the high from one interval should be separated from the low of the next interval by at least one. And then we're going to say, well, also we want all the rest of the list, including lo2 and hi2 to be valid too. So far so good? OK, so this is probably, well, the second most complicated piece of code. So, anyway, so, here's, so, we might imagine a union function. And the union function, guess what, it forms a monoid, with respect to interval sets. So, it takes, two internal sets go in and another one comes out. And if you've written that kind of thing before, you might notice it's probably a little tricky with that fancy validity condition that's there. So how can we get this right? Well, what we do is we write down properties. Of course, we could write down associativity. I'll leave that as an exercise. Another one is just very simple. Just a very simple property that says for all pairs of two arbitrary interval sets, we want the union of those two interval sets to be valid, a valid data structure. We want the union function to preserve validity. OK? Makes sense? So here's another property that says, well, I already gave you this function or I told you that there is this function iToList, which just gives us a list of elements of an interval set. And what we can do is, we can use sort of that representation, that's also a representation for sets. We can use that representation sort of as a model and say, well, if we take the unions, you see there for all pairs, again, of arbitrary interval sets, we take the union. It says iUnion, iset1 and iset2. And we convert that to a list. And, we could also do, we could instead convert each individual set to a list and then just merge those two lists. And that should yield the same result. So, in a way, we're just giving a very simple model for our interval sets, right, and that would, so those two criteria would be kind of nice to have in order to get our implementation correct. And I already got started before the talk on this. Looks like this. No. Doesn't look like this. We'll get to that later. But like this. So, here's what I came up with. So, you see there is that, while there's all this other code there, ignore that. But there is iUnion says ISet -> ISet -> ISet, do you see that? And then, there's two equations that say, well, the first set is empty, then I'm just going to give you the second one. And if the second one is empty, I'm just going to give you the first one. Right? Classic things when you have union or concatenation operations or something like that. And now you can see the third case. It gets tricky, right? Again, you don't need, I mean, main thing is you need to understand it's tricky. Well, the third one is such that, well, says so that both have at least one element and that element is in the interval lo1 and hi1 in the first case and lo2 and hi2 in the second case. And then there's the rest. And I already put in a little bit of code, and I said, well if lo1 comes after hi1 (means hi2), then we want to start with lo2 to hi2 and then continue with the union. In the other case, if lo2 comes after hi2 (means hi1), then we're gonna start with lo1 and hi1. And in the other case, it means, that no interval comes before the other, and therefore we need to merge the two intervals at the beginning. Does that make remote sense? Right. Don't worry. We'll get back on solid track. So, we just take the minimum of those two intervals and maximum of those two intervals and we do this. Now, the great thing is, I told you about this tool by John Hughes called QuickCheck. And the great thing is, we can load this into Idris. And then here comes a REPL, and we can say, I hope I'm doing this right. So, we want QuickCheck, and we want, what was it called? It was called prop_unionCorrect. I hope I'm doing this right. And, well, very small font. But you can see here it says "100 tests". And that is what QuickCheck does, as, it takes your code version off the property and automatically generates a lot of tests for them. And that is super effective at weeding out bugs. So it says, well, the thing that you wrote is correct. It always produces interval sets that when you take the list, it gives you the right result. But there was that other criterion called unionValid. And there it says, and this is really the better part, of course, of QuickCheck is, when it fails, it says it's falsifiable. It says there is a counter example. And so, here it says, I did nine tests, I generated nine random tests, and I found one where the result is not valid. And the great thing is that we can then go and cut and paste this example. So I could say iUnion, this, remove the comma in the middle, and call this. And well, what happens here is, what we can see is, we can see 2 and 4, 1 and 1, and 3 and 5, and what's not valid about. So, by the way, this is randomized. So, this always goes differently. So I have to look at it, too. So, then it says, well, those two intervals, they should really just be merged and they should just be one interval. Right? And so, it didn't do that correctly. And the reason for that, maybe you saw it. So, and, what happened is that it ran into one of those two cases here where it says if lo1 greater than hi2 or lo2 greater than hi1. Remember that I told you there needs to be a gap of at least one between them. Remember? And here's an off-by-one error that says, well. So this says, they can, lo1 greater than hi2 says they can still be right next to each other. Right? And this is what happened here. We need to make sure that there is that gap in here. So, I can fix it like this. Loaded again. Oh, no. There's still a counterexample. So, and we can try that out, so, and that's great. We get test cases that sort of show where the bugs are. And in this case, well, what happened here? They still overlap. And what happened here? So can you see it? So, you can see that the first two intervals, they must run into that last case. Right. Because they overlap. Zero is the interval from 0 to 3 and the interval from 0 to 5. They overlap. So we need to get to that case. And so it merges them. And then it went and and somehow didn't merge it with the 6 and the 7 that's there. And, so, well, if you look at it. So it must have done this. And and what it did is, it then went on with the rest there. Let's have one more look. What actually happened? So there it is. So, it merged those and then you can see that it went into a symmetry problem here. Well, maybe you don't see. But, you know, this is tricky stuff. I couldn't do this by myself. So you can see here that it just tacks the result onto iset1Rest, whereas the maximum of hi1 and hi2 could, might violate the consistency criteria if it's the wrong one, and then it runs into one of the other cases. Now I've never seen this tricky one. Does it make sense? But, can you see that it should be symmetrical? The last one. Can you see it? OK, so we'll try and make it symmetrical. Do it like this. So we'll say, well, if so, this only works. So if hi1 is less than hi2. So we really need to make sure, then it is perfectly. And then the maximum of those two numbers is hi1. Does that make sense? And so the max of those two numbers is hi1 and then it's perfectly valid to tack it onto iset1Rest. In the other case, hi2 is greater and we need to go and do something different and rip this out here. Stick it in front here and then. And then. And now it's symmetrical. OK. So, load this. And, ahh! It has passed the test. OK, live great. applause Thank you. I did practice getting it correct, right. But you can, you know, this kind of stuff. It always gets me. I mean, you know, with old age especially, this kind of stuff, it always drives the sweat on my forehead, right? You know, there's off-by-one. There is, you know, I don't know how many cases there need to be. And QuickCheck is the kind of thing that weeds out the bugs. And even though it weeds out the bugs in a different order each time, it always weeds them all out. OK. So it's a great tool. Now, I recommend that you try that. It generates tests from properties. OK, where are we? So let me let me give you a couple of real world examples. So if you're using X windows, there's a there's a tiling, a window manager, xmonad. It's already a couple of years old and they don't do much development on it anymore. That's because it's correct. Right. laughter Right. And why is it correct? Well, it's because they wrote down a lot of properties for the geometry and the tiling algorithms and verified them using QuickCheck. And so I sort of loosely translated. So, Don Stewart, one of the authors of xmonad graciously wrote a couple of blog posts on a simplified version of xmonad and I translated them into Idris. So, here's a very simple idea of just a stacking window manager. So, it doesn't do geometry, it just has stacks of windows and it has several workspaces. In each workspace is a stack of windows. So here's a data type called a StackSet, its parameterized by a type called window. We'll see later why there's a type parameter and why it just doesn't say what the windows are. And then it says there's a constructor StackSet and there's two fields in there. One is called "current", that's the number of the workspace that's currently active. And then there's "stacks", which is a map from the number of the workspace to the stack of, to the list of windows that sit in that workspace. Again, so here, really the technicalities are not particularly important, but there is a bunch of operations that operate on this window manager configuration. And again, here, really the details aren't important. So you could create an empty stack set. You could say, well, you know, I have the number of a window that I want to get to the front. And please make me, please rotate me, the stack set around so that I can see it. "peek" means, you know, maybe I can get the topmost window that the user is currently looking at. "rotate" means I'm just going to rotate the workspaces around in either left or right direction. That's what that ordering argument. "push" is, I push a new window onto the current workspace. "insert" means insert a window into one of the other workspaces. "delete" means I delete a window. "shift" means, also means I shift something with the windows. Not really important what they do. But you can imagine again, just as we did with the interval sets is validity criterion or an invariant that should hold for these operations. And it's very simple. Well, it says well, if you have a stack set with some windows in it, I'm just going to tell you whether that stack set is consistent. And by doing that, I'm just going to say, well, the current, the number of the current stack should not be higher than the number of window stacks that there are. Right. So, the number of stacks that there are. And the other one, that just says a window should not be in several of the workspaces. Right? And then I can go and maybe with this definition, all those function definitions aren't very complicated. But, I can go and write a whole bunch of properties. And if you just understand, well, maybe the second one, "prop_view_I", you understand all of them. It just says, well, for all pairs of a natural number and a stack set that are a "stackIndex" and "stackSet", I want, if I call the "view" function, which is one of the operations, I want the view function to produce a consistent stack set. And then it goes on to do all of that for all the other ones. At the bottom here, you can see some prerequisites that need to hold for the property so that invariant only needs to hold if the window, if the number of the window is actually smaller than the size of the stack set. Otherwise, I think the function just returns what would go with it, what went in there, So that's a very, that's just a very efficient way to invent properties, to think of some invariant that shall hold in your data structure. And if you know Idris, you can sometimes encode that in the types, but often that's kind of tedious. And you can just write it down as a property and then have QuickCheck check it for you. And it's not particularly exciting for the simple definition, but you can imagine that the actual definition when you have tiling window management going on is much more complicated than the one that you just saw. But you can keep those same properties, right? There still needs to be some consistency invariant that, if you have tilings, the windows don't overlap, and things like that. That should be obvious. Write those properties down, check them using QuickCheck and that will weed out a lot of the bugs. Here's an example from our practice. We, couple months ago, we were tasked with migrating a giant Visual Basic 6 application. It had a password checking function there. You can see here a Visual Basic 6 type signature. And the property that we wrote was, well, if we create the hash from the password and we compare it with the hash that's in the database, then they should all come out the same. And to our surprise, that function, that test, that property, failed when we ran it for QuickCheck and we had to correct it because that password hash is restricted to 11 characters by some restriction in the database schema. And so that means that you can use QuickCheck not just to sort of check the correctness of things that you already know, but to actually develop a model for what goes on in your software, which you don't always know very well. So that's what we did there. Another example is, we wrote, for a large industrial client, we needed to write a synchronization application. So when you had two mobile devices and they would sort of meet as strangers, they would exchange data and they all needed to look at the same sort of device configuration data. And we didn't want them to exchange all the data every single time. We just wanted to exchange them, the data blocks that the other side was missing. And again, there's great algorithms for this based on Merkle trees. They're pretty complicated. You have to do a lot of bit fiddling with that. But fortunately, the property for that is pretty easy to write. So here's the property that says, well, so the synchronization algorithm works on sets of blocks, whatever a block is. So you can see the property here for all pairs of sets of blocks and more sets of blocks. So they're called bs1 and bs2. Block set one and block set two. What we can do is, we want, if we union those two, then we get all the blocks in the system. We call that all, or we can call the synchronization algorithm and that will give us two new block sets, block set bs1' and bs2'. And those block sets are the ones that get transferred to the other side. OK. And the criterion here just says if we take the ones that we have, if we union them with the ones that we get, we should get all of them. That should be all of them. And that should be the same for both sides. And also, we want the algorithm to be efficient so we don't want it to transfer blocks. So we want to make sure that the blocks that we have and the blocks that we get are disjoint. That they don't have any elements in common. Otherwise, we could make that algorithm trivially correct by just transferring all the blocks every single time. And I can tell you, I sweated. You know, I sweated one or two weeks over this algorithm and it was really hard to write. But this one test weeded out all the bugs that I found along the way. So that is just super, super effective. John Hughes has a couple of papers on hard bugs that he found. So he found a bug in a distributed database called mnesia. And that bug was dependent on opening the database, closing it and opening it again. So this is not the kind of bug that you find by just writing a bunch of smart unit tests. Right? So, if you did anything shorter in the beginning, so if you just open the file and then did some lookups there, that would not manifest the bug. You really needed to close and then open again. Have you turned, have you tried turning it off and on again? But then the database breaks in this case. And here's another example called The Mysteries of Dropbox. So you can imagine that with Dropbox you really want certain properties to hold. Right? And it turns out they didn't. They never worried about writing properties down. But John Hughes did it and found a couple of bugs. So here's one. It's kind of hard to read where it says client 1 writes a into a file that was previously empty. So that funky turnstile there is empty. So writes a into a file and then deletes the file and another client writes, replaces, sees the a in the file replaces it with a b. And then client 1 goes and writes c into the file that it previously thought to be empty. And then unfortunately, even though you can imagine that you should see either b or c in that file, but Dropbox deleted it. So I think they fixed that bug now. But. so you go. So it goes. Oscar Wikstrom has a couple of great, pretty recent blog posts on properties in a screencasts editor that I highly recommend. So this is a great tool for finding bugs, but it's not the same as having a proof. Right? So, you can still imagine that you can find very subtle bugs that are not covered by QuickCheck. QuickCheck just randomizes, just generates randomized tests. So, that is not the same thing as making sure that there aren't any bugs. So the great thing about Idris and the reason I chose it for this talk is that Idris allows you to not just encode properties in the language. It also allows you to encode proofs in the language. So here is the associative property for the list concatenation operation. And if you look at the top that has the definition of that function from the Idris standard library, it says ++, in goes a list, in goes another list, out comes a list. Then it says, well, if you concatenate the empty list with any list, that is just that list "right". Do you see that? The second one says, well, if we concatenate a list that starts with the element x and goes on with xs, then we just sort of pull the x in front and concatenate the rest with "right". So that's a classic recursive definition of list concatenation in functional programming. And now here's something really strange in Idris. Here's the type declaration for a definition, again in the standard library, called appendAssoc. And it says, if you have a list a, you have a list b, and you have a list c, and in the type it says, oh, the associative property should hold. Right. And so this is a statement of that property. That's wonderful. It's not the same as a proof. So, but writing proofs, who loved that in math? Oh. Oh, you're good! I didn't. I'm sorry. So. So the great thing about Idris is, it helps you write down the proofs. I'll show you how that works just really, really briefly. So here's that. So here's just what I showed you on that slide. So I can load that in there and it says, well, you're not done. You didn't write a proof for that property, but in Idris, you can just push a bunch of buttons. Now, I love that. So I can push one button and it says, oh, well, you should write a proof of that form. You have lists a, b and c. Well, now and I can push another button that says, well, you're doing this on lists and if you're writing anything on lists, you always need to distinguish between the two cases of the empty list and the list that consists of the first element x and further element xs. And then it says, well, write down something, but then I can tell Idris: Well, I'm too lazy. I'm not going to write anything so I can just push a button. And Idris wrote this so you can see me, but I didn't type this right. I just pushed a button and it says, Refl. What is Refl? What could that be? Well, you can ask it what Refl is. It says Refl. Oh, you can see here, landing here. Refl is just a proof sort of a built in proof that says that if two things are true, two things are equal, if they're identical, if they're the same. Right. And that kind of makes sense in the first equation, because the first equation of appendAssoc corresponds to the first equation of ++. Can you see that, how it corresponds? Can you see that? The first list is empty. Can you see that? Can you see how the first list is empty with the first equation of appendAssoc and the first list is empty up there with ++. Can you see that? OK. And then it just says, well, then obviously. Well, not quite obviously, but then sort of the the way that the definition works, it comes out just right. So what's really important is that Idris accepts that proof with the first. The second one is slightly more tricky. But again, we can get help because we know that appendAssoc is this recursive function. It recurses on the first argument. So we're just going to do the same thing in the proof. And you can tell Idris that it should use that, that it should use that fact, if you will. So here's the recursive call. And again, I'm too lazy to push a button. But if I push that button, it also puts in Refl and there's loads. So this it might be a mystery to you how it works, but this is a proof of the associate of property of the list concatenation in Idris. And since Idris helps you write it, it's kind of fun to do that. Oddly enough, even for somebody who doesn't, usually who doesn't usually doesn't enjoy proofs. So the way that you program in Idris, we haven't done that a lot in this talk, is that you put a lot of information in the types and the more information you put in the types, the better Idris will get at figuring out the correct definition. And you don't have to do it by yourself. OK. So that's really nice. OK. So we got that and and sort of these kinds of proof assisting systems such as Idris have been used in a lot of real world systems. One one prominent example is SEL4, a version of the L4 micro kernel, has a long history, but important properties of that kernel have been verified. It runs in the security enclave on iOS and even though it's written in C, it provably does not have buffer overflows or a lot of the nasty things that are responsible for a lot of security exploits. Compcert is another example, which is a verified, I should mention this has been verified with the help of a tool called Isabel. Also, great fun to use. There's a project called called Compcert, which is a verified C compiler, which is important for a lot of certified software where, you know, the source code might be certified. But how do you know that the compiler generates correct code? And you know, because it's been proven to be correct. And even there, you can shoot, you can cheat sometimes. So for example, register allocators, very complicated, very hard to prove right. But what you can do is you can write a checker that the register allocator did its job, did its job well and you can verify the checker. And so you can cheat a bit. So there's tools for that. We've seen Idris and there's a number of other tools and they're getting more and more mature. And they're great fun, really. They really are great fun. But, you know, going back, switching down a gear a little bit, there's lots of useful properties that you can look for in your programs. So commutativity might be useful that you can switch the two arguments for an operation. Also, if you have relations, you might remember that from some math class, there's some properties here like reflexivity, symmetry, antisymmetry and transitivity. Reflexivity says that a is always related to a. Symmetry says if it's one way, if a and b are one way related, they need to be related the other way too. Antisymmetry intuitively would seem kind of the opposite. That doesn't make sense. It's just says: if two things are related in both both ways around. So, for example, you know, orders like less or equal are antisymmetrical, then they must be the same. And transitivity just says that you can form chains of your relation. So those are a little dictionary of useful properties that you can look for. Let me close with one fancy property that you've probably seen somewhere and that property is called Functor. And you might have seen in your programing language, in your list library or in your stream library. There's a function called map, right? And you know, even Java has that and has had it for many years. And what map does is, if you have some, you know, in Java, for example, it says "Stream", or it might be "List", right. It says, well, if I have a list of As, I can apply a function to each element of that list. But you can generalize that, it doesn't have to be lists. It could be an Optional of As, for example. You could also apply a function to the value that's in there. So, you can generalize that notion, and then it's a functor. And, of course, in Idris, you can write down equations for functors. And, please ignore the technicalities here, (stammers) but, if you sort of pick out where it says "functorIdentity", the middle row says g v equals to v, which means g is the identity function. When you feed in v, you always get v back. And when you use map with the identity function, you apply the identity function on each element of your list or whatever it is. Then then you always get back the same list. And here just says you get function composition. So if you apply one function and then another function and you do that either inside or outside the map, you should also get the same results. So there's also just as there is associativity with monoids, with functors. There's these laws and you might think, well, where would I look for a functor? I've never seen a functor except for the ones on streams. A couple weeks ago in a training, somebody said, well, you always start with that animal example. Shouldn't you look for a functor there? And I was kind of, you know, sweat broke out on my forehead, I was like, where's that gonna go? But, we came up with this. So, if you go back, you can see that this is obvious. So, what you need for functors is, you need a type parameter. Right. And so you just look for a place to stick a type parameter, any place here at all. And if you look at Dillo and Parrot, they both prominently have this weight thing. Right. And so that seems more important than the other two properties, which are specific to particular kind of animal. And so the weight, the thing to do is just to, well, you can see I replaced upper case Weight by lower case weight and made that into a type parameter, and, I can then provide a functor implementation down there. And you might think, what is that good for? Well, I don't know. Well, one thing that you could do is you could provide a different representation for weights. Another thing that you could do, if you look at the type for runOverAnimal, it says animal weight -> animal weight and weight is a type variable. What that type signature tells you is that runOverAnimal does not know what weight is. And that means that the weight cannot change as a result of that function. And you see that in the type signature you get immediate, small benefit, but you get a benefit even with silly examples such as this one. And that really brings me to the end. So in your software, in your domain model, look for a Combinator, look for a function that will combine two things into a bigger thing. See if you can make that thing associative and look for a neutral element. And very often you will find one; make it a monoid, you know, say monoid a couple of times. You'll remember it. You'll remember it. Generally, write properties for the things, for the operations in your software, test those properties using QuickCheck. You know, if you feel like you have a lot of time, prove them correct. Find the functor. If you found, if you found the monoid, you know, find the functor next. You know, and it takes, it might take time. I'm very old. As you noticed at the beginning. So. So it gets easier over the years and it will just seem like a regular staple of your of your arsenal when you program. And of course, when the important properties in your program have either been written down, if they've been tested with QuickCheck or even proven, then you can sleep much more soundly than maybe you currently can. Thank you very much. applause Herald: Thank you, Mike. So I see we have three minutes for questions. Maybe that's two or three questions. If you have any come to the microphones, please. Do we have a question from the Internet? No, not yet. So, microphone two. Right. Question: Hi. So, QuickCheck generated hundred tests. Yes. What can we say about the quality of this test? Could we say your program was correct using thoese tests? Are these tests good? Answer: Yeah. Very good question. So what would you say about the quality of the tests? And indeed, if you really do serve industrial strength applications, a quick QuickCheck comes with a bunch of tools that let you look, for example, at the distribution of the individual example rated and while you didn't quite see me do that, but I mean, for your domain objects, you will typically write generators that will generate those examples and you can reason about the distribution of those. And you absolutely should do that because otherwise you might miss large areas of your test space. So, but there are tools and they help you do that. But even if you don't do that, you know, it's, you find a lot of, I found a lot of bugs in my software even without worrying about that. But if you go beyond that, look at the distribution thing. Herald: Thank you, next one, please. Number two. Q: Let's say I've hacked a program, for example, in Java or C# or whatever. How do I, how do I apply what I learned so far? So, where do I start when I have already completed the C# program with, yeah, how do I apply QuickCheck on that? A: So, just pragmatically because it's written in C#? That's the question? Q: Yes A: So, well, I have to be very concrete here, I mean, so, if you can think properties, right, one way to do, so, for example, so, C# you can link with F# and there is a QuickCheck version for F# called "FsCheck". And FsCheck, actually, even though it's itself written in F#, you can also use it from C#. So, you have two options. You can write your tests in a slightly more awkward fashion in C#, or you could just link your code with F# test suite and write it down there. And there is a fairly reasonable Java QuickCheck, I hear. Another idea would be to use the slightly more fancier, the slightly fancier QuickChecks that exist for Scala and Enclosure. I'm sure there's one for Kotlin as well, and link that against your Java code. Does that answer your question? Q: So whatever language I use, I have to find out what the correct implementation of QuickCkeck? A: Yeah. Yeah. But as I said, I mean usually, a fun thing I do in training is, I chat "QuickCheck" and somebody calls on the language, you know. Quick, QuickCheck PHP or something like that. And there is one, sure enough, I didn't know about before. Q: Thank you. Herald: All right. Thank you. And thank you, Mike, again, for showing us a way to sleeping soundly. A: Thank you. applause postroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!