This presentation is delivered by the Stanford Center for Professional Development.

Okay. So the last thing we had talked about at the end of merge sort was comparing the quadratic, N-squared sorts to the linearithmic, N log N kind of running time that merge sort has, showing that the N log N algorithm really outperforms them even on small inputs, and on large inputs the gap just gets enormous. What you can accomplish with an N log N algorithm is vastly superior to what you're going to get out of an N-squared algorithm. I told you, without proof, and you'll have to take it on faith that I wouldn't lie to you, that N log N is the best we can do for a comparison-based sort. But what we're going to look for is a way of maybe improving on merge sort anyway. One thing we might want to do is avoid that copying of the data out and back that merge sort does, and maybe get lower constant factors in the process, so we can bring our times down even further.

So the sort we're looking at is quicksort. And I think if you were going to come up with an algorithm name, naming it quicksort is good marketing; it inspires you to believe it's actually going to live up to its name. It's a divide-and-conquer algorithm that takes a strategy like merge sort in the abstract: divide into two pieces, recursively sort those pieces, and then join them back together. But whereas merge sort does an easy split and a hard join, this one flips that on its head and says: what if we did some work up front, in the splitting step, so that there's less to do in the joining step? The strategy for quicksort, run on that pile of papers or whatever we're trying to sort, is that a partitioning step is the first thing it does. That's the splitting step, and the partitioning is designed to move stuff into a lower half and an upper half: having some idea of what the middle-most value would be, it walks through the pile of papers and distributes them to the left and right portions based on their comparison to that middle-most element.
Once I have those two stacks - I've got A through M over here and N through Z over there - then recursively sorting each of those takes place; I kind of assume my delegates do their work. And the process of re-joining them is really simple: they just get put back together with no extra work, no looking, no processing there. So the one place where all the trickiness comes in is this idea of how we do the partitioning.

I was being a little bit disingenuous when I said, well, if I had this stack of exam papers and I know the names range over the alphabet from A to Z, then I know where M is and that's a good midpoint to divide around. Given that we want to make this work for any sort of data, we're not going to magically know what the middle-most value is. If we have a bunch of numbers, are they test scores, in which case maybe 50 is a good place to split? Are they instead, say, population counts of states, in which case millions would be a better number to pick to divide them up? So we have to come up with a strategy that's going to work no matter what the input is, something that picks a reasonable dividing value. That choice is called picking a pivot. So given this region, this set of things to sort, how do I know what's a reasonable pivot? Something close to the median is actually ideal. If we could walk through the elements and compute the median, that would be the best we could do, but we're going to try to get around doing that much work. We're going to make an approximation, and we're going to make a really very simplistic choice for that approximation; we'll come back and revisit it a little bit later. For now I'll just take the first paper off the stack. If they're in random order, I look at the first one and say, okay, it's somebody named King. Well, everybody less than King goes over here, everybody greater than King goes over there. Now King may not be perfectly in the middle - and we're going to come back to see what that does to the analysis - but at least we know it's in the range of the values we're looking at, so it at least did that much for us. And it means that no matter what our data is, we can always use it; it's a strategy that will always work.
Let's just take the thing that we first see, use it as the pivot, and then slot everything else to its left and right.

So let me go through what the code actually does. This is another of those cases where the code for partition is a little bit frightening, and for the analysis we're doing you shouldn't get really, really worried about the exact details of the less-than versus the less-than-or-equal-to. I'm going to talk you through what it does, but the more important take-away is the algorithm behind quicksort: what strategy it's using to divide and conquer, and how that works out.

The main quicksort algorithm looks like this. We're given the vector of things we're trying to sort, and we have a start index and a stop index that identify the sub-region of the vector we're currently trying to sort. I'm going to use those in subsequent calls to identify what piece we're recursively focusing on. As long as there's a positive difference between the start and the stop - so there are at least two elements to sort - we go through the process of doing the division and the sorting. It makes the call to partition, saying: on this sub-region of the array, do the partition. We'll come back and talk about that code in a second; the thing we get back from partition is the index at which the pivot was placed. So after partition did all its work, it moved everything less than the pivot to the left of that index and everything greater than the pivot to the right, and the pivot's position tells us where the division is. Then we make two recursive calls: sort everything up to but not including the pivot, and sort everything past the pivot to the end. Those calls operate on the array in place, so we're not copying it out and copying it back. In fact, there really isn't any join step here at all. We said sort the four things on the left, sort the seven things on the right, and the joining of them is nothing - they're already where they need to be - so we don't have anything to do in the join at all.
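Before getting into partition itself, here's a minimal sketch of that top-level driver, assuming a std::vector<int> and a Partition helper with the behavior just described; the partition code is sketched after the walkthrough below, and the names and signatures here are illustrative, not the exact lecture code.

    #include <vector>

    // Hypothetical helper, sketched after the partition walkthrough below:
    // places the pivot of v[start..stop] into its final slot and returns that index.
    int Partition(std::vector<int> &v, int start, int stop);

    void QuickSort(std::vector<int> &v, int start, int stop) {
        if (start >= stop) return;                    // 0 or 1 element: already sorted
        int pivotIndex = Partition(v, start, stop);   // split around the pivot, in place
        QuickSort(v, start, pivotIndex - 1);          // sort everything left of the pivot
        QuickSort(v, pivotIndex + 1, stop);           // sort everything right of the pivot
        // no join step: both halves were sorted in place around the pivot
    }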
The tricky part is certainly in partition. The version of partition we're using here takes two indices - two fingers, you might imagine - and walks one in from the stop position downward and one from the start position over, looking for elements that are on the wrong side.

So we start with our right-hand finger on the very end of the array. That first loop is designed to move the right hand down, looking for something that doesn't belong on the right side. We've identified 45 as the pivot value, so it's looking for something that's not greater than 45 - the first thing coming in from the right that doesn't belong. It turns out it takes just one iteration to find that: okay, this guy doesn't belong on the right-hand side. Now it does the inverse on the left-hand side: try to find something that doesn't belong on the left-hand side, so the left hand moves up. In this case it took two iterations to get to that 92. So now we have a perfect opportunity to fix both our problems with one swap: exchange what's at the current position of the left hand with what's at the right hand, and both of our problems go away; we have more things on the left and the right that belong there. Then we can keep walking inward from there to find subsequent ones that are out of place. So those two get exchanged, the right hand moves in again to find that 8, the left hand moves over to find that 74, and we swap again. We just keep doing this until the fingers cross. Once they've crossed, we know that everything the right hand scanned over is greater than the pivot and everything the left hand scanned over is less than it, and at that point we have divided it: everything that's small is in the left side of the vector, everything that's large is in the right side. The last thing we do is move the pivot into the slot right between them; it ends up located exactly where the fingers crossed. Now I have two sub-arrays to work on. The return from partition in this case is the index 4, and it says, okay, go ahead and quicksort this front-most part, and then come back and do the second part. In the recursion, think of that second call as being postponed; it's waiting, and we'll get back to it in a minute, but while we work on this piece we do the same thing.
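For concreteness, here's one way that two-finger partition might be written; this is a sketch consistent with the description above (first element as pivot, fingers scanning inward, pivot swapped into the crossing point), not necessarily the exact code on the slide.

    #include <utility>   // std::swap
    #include <vector>

    // Two-finger partition of v[start..stop], using v[start] as the pivot.
    // Returns the index where the pivot ends up; everything to its left is
    // less than the pivot, everything to its right is at least as large.
    int Partition(std::vector<int> &v, int start, int stop) {
        int pivot = v[start];
        int lh = start + 1;                            // left finger
        int rh = stop;                                 // right finger
        while (true) {
            while (lh < rh && v[rh] >= pivot) rh--;    // right hand: find something small
            while (lh < rh && v[lh] < pivot)  lh++;    // left hand: find something large
            if (lh == rh) break;                       // fingers have met
            std::swap(v[lh], v[rh]);                   // fix both problems with one swap
        }
        if (v[lh] >= pivot) return start;              // pivot was already the smallest
        std::swap(v[start], v[lh]);                    // drop the pivot between the halves
        return lh;
    }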
It's a little bit harder to see what's going on in the small, but in this case it divided that piece - because the pivot was 8 - into 8 and the three things greater than it. In the next case, 41 is the pivot, so it moves 41 into place and works on what's to its left. As it keeps working its way down, the pieces get smaller and smaller and it's harder to see the workings of the algorithm rearranging such a small case, but we'll get back to the big left half in a second. And now that that left side has all been sorted, we come back and revisit the side above the pivot to get those remaining five or six elements in order, taking the same strategy: pivot around 67, do some swapping, and so on.

We'll see it in a slightly bigger case to get the idea, but very quickly there's something interesting about the way quicksort works: that partition step very quickly gets things close to the right place. And so now that we've done both the left and the right, we're done; the whole thing is done. Everything got moved into the right position as part of the recursive sorting and doesn't need to be moved further to solve the whole problem. That first partition step does a lot of throwing things left and right, but it's quickly moving big things and small things that are out of place closer to where they'll eventually need to go, and that turns out to be a real advantage in the running time of this sort, because it does that quick movement close to the final destination, and the recursive calls that examine the smaller sub-sections fix things up from there.

Let me show you what it sounds like, because that's really what you want to know. You can see the big partitioning happening, then it jumbles things up and comes back to revisit them, but notice that quite quickly all the small elements get thrown to the left and all the large elements to the right, and then the process comes back - the big partition that you can see working across them, and then the noodling down in the details. I'll turn the sound on for it, and I'll probably take the volume down a little bit.
So the big partition steps make a lot of noise and move things around quickly, and it almost sounds random; it's hard to identify what's going on during the big partition. But then as it makes its way down the recursive tree, it's focusing on smaller and smaller regions where the numbers have all been gathered together because they're similar in value. You hear a lot of noodling in the same pitch range as it works on the smaller and smaller sub-arrays, and it definitely goes from low to high, because the recursion chooses the left side to operate on before the right side, so you hear the work being done in the lower tones before it gets to finishing up the high tones. Kind of cool. Let's come back here and talk about how to analyze this guy a little bit.

The main part of the algorithm is very simple - deceptively simple, because all the hard work is actually done in partition. The partition step is linear. If you can just go along with me conceptually, you'll see that we move the left finger in, we move the right finger in, and we stop when they cross, which means every element was looked at as we worked our way inward, doing some swaps along the way. So that process is linear in the number of elements: if there are a thousand of them, one finger might advance 400 and the other 600, but together we look at all the elements in the process of partitioning them into the lower and upper portions. Then it makes two calls, quicksort on the left and on the right portion. Well, we don't know exactly what that division was - was it even, was it a little lopsided? - and that's certainly going to come back as something we'll look at. If we assume the ideal 50/50 split - in an ideal world, the choice we had for the pivot happened to be the median, or close enough to the median that we effectively got an even division at every level of recursion - then the recurrence relation for the whole process is: the time required to sort an input of size N is N to partition it, plus the time to sort two halves that are each half as big as the original input.
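Written out, that best-case recurrence and its solution look like this (it's the same recurrence we solved for merge sort):

    T(N) = N + 2\,T\!\left(\tfrac{N}{2}\right), \qquad T(1) = O(1)
    \;\;\Longrightarrow\;\; T(N) = O(N \log N)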
We saw that recurrence relation for merge sort, and we solved it. Think in terms of the tree: you have a branching factor of two at each level, the partitioning work across each level totals N, and it goes to a depth of log N - the number of times you can divide N by two before you get down to single-element arrays that are easily handled. So it is an N log N sort in this perfect best case. That should mean that in terms of Big-O growth curves, merge sort and quicksort look about the same. Quicksort does have better constant factors, though. Let me show you: I'll go back and just run them against each other. If I put quicksort versus merge sort here - and make it stop making noise - quicksort pretty handily manages to beat merge sort in this case; merge sort was on top, quicksort underneath. And in terms of what was going on, it looks like quicksort does more comparisons - that partition step does a lot of comparing - but fewer moves, and that actually makes intuitive sense. Quicksort very quickly moves stuff close to where it goes and then does a little bit of noodling. Merge sort does a lot of moving things away and moving them back; as you watch it grow, you'll see a lot of large elements get placed up near the front but then have to be moved again, because that wasn't their final resting place. So merge sort does a lot of movement away and back that tends to cost it overall - a higher constant factor on the moves than something like quicksort, which more quickly gets things to the right place and then doesn't move them again.

So that was all well and good, but it assumed that everything went perfectly. Given our simplistic choice of the pivot, that's really assuming a lot went our way. Let's look at a particularly bad split. Imagine the value we happen to pick is pretty close to one of the extremes. If I were sorting papers by alphabet and the one on top happened to be a C, then I'm going to divide pretty lopsidedly: just the As and the Bs on one side, and everything from C to the end of the alphabet on the other side.
If I got a split that was about 90/10 - ninety percent on one side, ten percent on the other - you'd expect that to change the running time of what's going on here. Say I get one tenth of them over here in the low half and nine tenths in the upper half, and let's just say every time it takes nine tenths and moves it to the upper half, because I kept picking something artificially close to the front. The small branches will peter out fairly quickly - dividing by 10, and 10, and 10 again, those bottom out very soon - but that one arm where I keep taking 90 percent of what remains: I had 90 percent, then 81 percent; I keep multiplying by nine tenths and I'm still retaining a lot of elements in that one big portion. So the depth of this tree isn't going to bottom out quite as quickly as it used to. Before, the number of times you could divide N by 2 - log base 2 of N - was where we landed. In this case I have to keep multiplying N by nine tenths, so instead of one half to the K it's nine tenths to the K, and the question is how many levels deep this gets before it reaches the simple case. Solving N = (10/9) to the K means taking the log base 10/9 of both sides, so K, the number of levels, is log base 10/9 of N. We're relying on a fact of mathematics here: all logarithms of the same value are within a constant factor of one another - log base A of N and log base B of N differ only by some constant you can compute if you rearrange the terms. So in effect there's just a constant out front - the ratio between the log base 10/9 and the log base 2 - that distinguishes them, but it's still log base 2 of N in terms of Big O, where we throw those constants away. So even though experimentally you'd expect that getting this kind of split would cause it to work more slowly than it would in a perfect-split situation, the Big O is still going to look linearithmic.
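As a quick worked version of that depth argument: the large branch shrinks by a factor of 9/10 at each level, so the depth k satisfies

    N\left(\tfrac{9}{10}\right)^{k} \approx 1
    \;\;\Longrightarrow\;\;
    k = \log_{10/9} N = \frac{\log_{2} N}{\log_{2}(10/9)} \approx 6.6\,\log_{2} N

which is only a constant factor deeper than the log base 2 of N levels of the perfect case, so the total work stays O(N log N).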
So that was pretty good to know. It makes you feel a little bit confident: sometimes it might get an even split, sometimes one-third/two-thirds, sometimes nine-tenths, but if it's somewhere in that mix, still dividing off some fraction, it's still doing okay.

Now let's look at the really, really worst-case split. The really worst-case split doesn't even take off a fraction; it just manages to separate one element. Somehow the pivot did a terribly crummy job of dividing at all - in effect all the elements were greater than the pivot, or all less than it, so they all landed on one side and the pivot was off by itself. Starting with an input of size N, we sectioned off one element and had N minus 1 remaining. The next time we sectioned off another one, and this happened again and again and again; we just got really bad luck all the way through. Each of these levels is making progress toward the base case at a snail's pace: one element is being subtracted at each subsequent level, and that's going to go to a depth of N. And now we're doing roughly N work across each level - it isn't quite N work, because some of these bottom out; it's really the Gaussian sum in the end - but we've got N levels doing on the order of N work each. What was a linearithmic algorithm is suddenly performing quadratically, in the same class as selection sort and insertion sort in their worst-case behavior. So quite a range: whereas merge sort is a very reliable performer - merge sort doesn't care, no matter what the order is, no matter what's going on, it's the same amount of work - there are some inputs that cause quicksort to vary between being linearithmic and being quadratic, and there's a huge difference, as we saw in our earlier charts, in how those run for large values.
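In recurrence form, that degenerate case is

    T(N) = N + T(N-1) = N + (N-1) + (N-2) + \cdots + 1 = \frac{N(N+1)}{2} = O(N^{2})

which is why it lands in the same quadratic class as selection sort.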
So what makes the worst case? Given how we're choosing a pivot right now - take the first thing in the subsection as the pivot - what's the example input that gives you this? Sorted. If it's already sorted. Isn't that a charmer? Here's a sorting algorithm: if you ask it to sort something that's already sorted - in fact, if you accidentally sort twice, you'd already sorted the data and then you pass it back to the sort - it suddenly degenerates into its very worst case. It's already sorted, so it says, I've got this stack of exam papers; I look at the first one and say, oh look, it's Adams. Okay, let me go find all the ones that belong on the left side. None of them do. Okay, put Adams over here, recursively sort that - that was easy. Now let me look at what I've got left: oh, Baker is on top. Okay, Baker goes by itself; can we find any others that come before it? No, no more. It just continues like that, looking at all N, then all N minus 1, then N minus 2, all the way down to the bottom, recognizing nothing about how beautifully the data was already arranged, and getting its worst case.

There are actually a couple of other inputs that do this too. The most obvious one to think of is data coming in in increasing order, but it's also true of decreasing order: if I happen to have the largest value on top, then I'll split everything to the left side and nothing to the right. And there are other variations that also produce this: if, at any given stage, the first value of the sub-array we're looking at happens to be the extreme of what remains - whether the smallest or the largest - and it could even alternate, which would be kind of a funny pattern: the largest, then the smallest of what's left, then the largest again. Those are a little harder to describe, but there are other arrangements that produce this bad result. All of these we could call degenerate cases. There's a small number of them relative to the N factorial ways your data could come in - that's a huge number, so there are lots and lots of ways it could be arranged, and a very small fraction of them are arranged in such a way as to trigger this very worst-case behavior. Some are close to worst case - if the data were almost sorted, it might every now and then get a good split but mostly bad splits - but the number of arrangements that degenerate to the absolute worst case is actually quite small. The real question, though, is: do we expect those inputs to be somehow hard to imagine us getting?
Are they so unusual and weird that you'd be willing to say it's okay that my algorithm has one or a couple of bad cases, because they never come up? In this case, no: the situation where your data comes in sorted, or almost sorted, or reverse sorted, is probably not at all unlikely. It might be that you happen to be reading from a file on disk and somebody sorted the data before they gave it to you. If that suddenly caused your program to behave really badly, that would be a pretty questionable outcome.

So let's go back and look, just to see it happening. This is quicksort versus merge sort again. If I change the data to be partially sorted and let it run again, quicksort still manages to beat it, doing a little bit more work. But if I change it to totally sorted - or reverse sorted, for that matter - they look like they're doing a lot of work, a lot of work about nothing, and you can see that quicksort really is still way back there. It has looked at the first 10 percent or so and is doing a lot of frantic partitioning that never moves anything, visibly traipsing its way along. Merge sort, meanwhile, is on the beach in Jamaica with a daiquiri, mocking quicksort for all those times it said it was so much better: duh, it was already sorted, you doofus. Apparently quicksort wasn't listening. Merge sort almost looks like it did nothing, and that's because it took the left half, the smaller half, moved it off, copied it back, and ended up copying each element back to the same location it was already in. There you go - quicksort taking its time. And if I run it against, say, insertion sort and selection sort - why not, they've got nothing better to do than sort for you all day long - insertion sort actually finishes very early because it does nothing: it looks at them all and says they're already in the right order. Merge sort does a little bit more work to move everything back and forth into the same place.
But in this case selection sort and quicksort - quicksort here at the bottom and selection sort here at the top - are doing really roughly the same work, if you look at the sum total of the comparisons and moves and the time spent on them. On this input, quicksort is performing like an N-squared sort, with constant factors very similar to those of selection sort, so it's behaving totally quadratically on that worst-case input. If I make the data reverse sorted, it's a little more interesting because you can see something going on - everybody doing their thing. Insertion sort now hits its worst case, so it actually comes in dead last because it did so much extra moving, but it still shows the same kind of quadratic behavior that selection sort and quicksort are both having, just with higher constant factors, taking almost twice as long because it does a little extra work in that inverted situation. Let's do something big, just because. We'll put it back to random data, back to full speed, and watch. So quicksort is done in a flash, merge sort finishes behind it, and as usual the quadratic sorts take quite a bit longer than our two recursive sorts; it's the random input that's really buying quicksort its speed. Is it going to win - oh, come on. Come on. Don't you want selection sort to come from behind? I always want it to come from behind; it's kind of like rooting for the underdog. Yes. And yet, remember: don't mock insertion sort. When the data is sorted, it's the only one that recognizes that and does something clever about it; quicksort is anti-clever about it. So insertion sort can still hold its head up and be proud.

That's part of why there are four sorts we've talked about, and actually a dozen more out there: different situations produce different outcomes for different sorts, and even though an algorithm might not be the best sort for all purposes, there are some situations where it's the right one. So if you had data that you expected to be mostly sorted, but with a few things out of place -
like if you had a pile of exam papers that were sorted and you dropped them on the floor and they got scuffed up a little bit but are still largely sorted, insertion sort is the way to go. It will take advantage of the work that was already done and not redo it, or create havoc the way quicksort would. But for quicksort, the last thing we need to say is that we can't tolerate this. Quicksort in its classic form, with the first element as the pivot, would be an unacceptable way to ship this algorithm. Quicksort is typically the standard sort offered in a programming language's library, so it has to have some strategy for dealing with this that does not degenerate. The idea is to make a different choice for the pivot: be a little more clever, spend a little more time, do something less predictable than just picking that first element.

To take it to the far extreme, one thing you could do is just compute the median: analyze the data and compute the true median. It turns out there is a linear-time algorithm for this that looks at every element of the data set once and tells you what the median is, and that would guarantee the 50/50 split - if you find it and use it at every stage, you guarantee it. Most implementations don't do that; it's actually overkill for the problem. What we want is to be pretty much guaranteed never to get the worst case; we're not concerned with getting exactly 50/50. If it got 60/40, or 70/30, and was bopping around in that range, it would be fine. So the two approaches that are much more commonly used are approximations of the median, with a little bit of guessing in them. For example, median-of-three takes three elements, typically from specified positions - the middle-most element, the last element, and the front element - and says: given those three, arrange them to find which is the middle of the three, and use that as the pivot. If the data was already sorted, it turns out you've got the true median, because you took the middle-most element.
If the data is in random order, then with median-of-three you at least know there's an element on either side of the pivot, so the odds that every single stage produces a very bad split are pretty low. There are some inputs that could still make it degenerate a bit, but it's pretty foolproof in most ordinary situations. Even more unpredictable would be to just choose a random element: look at the start-to-stop range you have, pick an index at random in there, swap that element to the front, and use it as the pivot. If you don't know ahead of time how the random numbers will come out, it's impossible to construct an input that forces it into the worst-case behavior. So the idea is that you're counting on randomness and the probabilistic outcome: for the random choices to consistently pick the extreme of what remains, every single time, is just - the odds are astronomically against it. It's a very simple fix that still leaves the possibility of the worst case in there, but far, far removed in a probability sense. Just simple things - and from there the algorithm operates the same way it always has: pick the pivot however you want, move it into that front slot, and then carry on as before with the left and right hands moving inward, swapping, recursing, and all that.
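As a rough illustration of those two pivot strategies, here is a sketch of what the "pick a better pivot and move it into the front slot" step might look like; the helper names and the use of std::vector are my own stand-ins, not the lecture's code, and the partition logic that follows would be unchanged.

    #include <cstdlib>   // rand
    #include <utility>   // std::swap
    #include <vector>

    // Move a randomly chosen element of v[start..stop] into the front slot,
    // so the usual first-element partition can proceed as before.
    void ChooseRandomPivot(std::vector<int> &v, int start, int stop) {
        int index = start + rand() % (stop - start + 1);
        std::swap(v[start], v[index]);
    }

    // Median-of-three: look at the front, middle, and last elements and move
    // the median of those three into the front slot to serve as the pivot.
    void ChooseMedianOfThreePivot(std::vector<int> &v, int start, int stop) {
        int mid = (start + stop) / 2;
        if (v[mid] > v[stop])   std::swap(v[mid], v[stop]);
        if (v[start] > v[stop]) std::swap(v[start], v[stop]);
        if (v[start] < v[mid])  std::swap(v[start], v[mid]);
        // v[start] now holds the median of the three sampled values
    }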
Any questions - anything about sorting, sorting algorithms? Then why don't we do some coding, which is the next thing I want to show you, because once you know how to write sorts, the other piece you need to go with it is how to write a sort that's generally useful in a large variety of situations. Knowing about these sorts, I might decide to build myself a really general-purpose, good, high-performance quicksort that I could use again and again. I want to make it work generically, so I could sort strings, or numbers, or students, or vectors of students, or whatever, and be sure that no matter what I needed to sort, this one sorting tool would solve all my problems. We need to know a little bit of C++ to make that work, by use of the function template. So if you want to ask about sorting algorithms, now is the time. [Student question, inaudible.]

We don't do bubble sort, which is kind of interesting. It will come up in the assignment I give you next week, because it's a bit of a historical artifact that's worth knowing about, but it turns out bubble sort is one of those sorts that's dominated by pretty much every other sort you'll know about, in every way; there's really nothing to recommend it. As I said, each of these four sorts has something about it that's a strength; bubble sort really doesn't. It's a little bit harder to write than insertion sort, it's a little bit slower than insertion sort, it's a little bit easier to get wrong than insertion sort, and it has higher constant factors. It does recognize data in sorted order, but so does insertion sort, so it's hard to come up with a reason to spend a lot of time on it. But I will expose you to it in the assignment, because it's part of our history as computer science students to be exposed to.

Anything else? I'm going to show you some things about function templates. Oh, these are some numbers I forgot to show you. Just to say, in the end - which we saw a little bit in the analysis - the constant factors on quicksort are noticeably better than merge sort's, by something like a factor of four, from moving things more quickly and not having to mess with them again.

[Student question, inaudible: were these quicksort numbers on data in totally random order?] Yes, they are. This is actually the classic quicksort, with no degenerate protection, on random data. If I had put sorted data in on that case, you would definitely see numbers comparable to selection sort's - eight hours in that last slot. Or, with degenerate protection added, you'd see a slowdown that's in the noise for the little bit of extra work being done, but then you'd be getting N log N performance reliably across all kinds of inputs.

[Student question, inaudible: with degenerate protection, does it start to even out time-wise with merge sort?] No, it doesn't. It's an almost imperceptible change in the time, though it depends on which form you use: with, for example, the random or median-of-three choice, the extra work is about two more operations per level of the tree, and over the space of log N levels that just isn't enough to even be noticeable.
Now, if you did the full median algorithm, you could actually start to see it, because then you'd have both a linear partition step and a linear median step, and that would raise the constant factors to where it would probably be closer in line with merge sort. Pretty much nobody writes it that way, in truth, but you could in theory.

What I'm going to show you next - let me motivate it by starting with something simple, and then we'll move toward building the fully generic, one-algorithm-does-it-all sorting function. The first thing I'll look at is swap, because it's a little bit easier to see the issue there. Sometimes you need to take two variables and exchange their values: you need to go through a temporary, copy one aside, and then copy the other back - swapping characters, swapping ints, swapping strings, swapping students, any kind of variables you wanted to swap. You could write a specific swap function that takes two variables by reference of the integer, string, or character type you're trying to exchange, copies one to a temporary, and exchanges the two values. Because of the way C++ functions work, you really do need to say: this is a string, this parameter is a string, what's being declared here is a string. And so you might say, well, if I need more than one swap - one that swaps strings and one that swaps characters - my choices are basically to duplicate the code, copy and paste, and so on. That's not good; we'd rather not do that.

What I want to do instead is discuss what it takes to write something in template form. We have been using templates: the Vector is a template, the Set is a template, the Stack and Queue and all these things are templates that are written in a very generic way using a placeholder. The code that holds onto your collection of things is written saying, I don't know what the thing is - is it a string, is it an int, is it a car? Vectors just hold onto things, and you as the client decide what you're going to store in your particular kind of vector.
And so what we're going to see is: if I take those swap functions and try to distill them down - what is it that any swap routine looks like? It needs to take two variables by reference, it needs to declare a temporary of that type, and it needs to do the assignments all around to exchange those values. Can I write a version that is type-unspecified, leaving a placeholder in there for what we're going to be swapping - that's really kind of amazing - and then let multiple swaps be generated from that one pattern?

We're going to do this in the compiler, because it's always nice to see a little bit of code happening. So I go over here - I've got some code up there that I'm just going to set aside for now because I don't want to deal with it; we'll see it in a second. Okay, so your usual swap looks like this, and it would only swap exactly integers; if I wanted characters, I'd change it all around. So what I'm going to change it into is a template. I add a template header at the top: the header starts with the keyword template and then angle brackets that say what kind of placeholders this template depends on. It depends on one type name, and I've chosen the name Type, with a capital T, for it. That's a choice I get to make, and it says that in the body of the function that's coming up, where I would ordinarily have fully committed on the type, I'm going to use this placeholder Type instead.

Now I have written Swap in such a way that it doesn't say for sure what's being exchanged - two strings, two ints, two doubles; they just both have to match the placeholder. The placeholder is going to be filled in by the client who uses this Swap, and on demand we can instantiate as many versions of the swap function as we need. If you need to swap integers, you can instantiate a Swap that fills in the placeholder Type with int all the way through, and then I can use that same template, that pattern, to generate one that swaps characters or strings or whatever. So if I go down to my main - maybe I'll just pull my main up here - I'll show you how I can use this. I can say int one = 54, two = 32, and then I say Swap(one, two).
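To make that concrete, here's roughly what the before-and-after looks like - the int-only version and the template form with the Type placeholder; this is a sketch of the idea, not necessarily the exact code on screen.

    // One fully committed version: swaps exactly two ints.
    void Swap(int &a, int &b) {
        int tmp = a;
        a = b;
        b = tmp;
    }

    // Template version: Type is a placeholder filled in by the client.
    template <typename Type>
    void Swap(Type &a, Type &b) {
        Type tmp = a;      // a temporary of the placeholder type
        a = b;
        b = tmp;
    }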
The usage of a function template is a little bit different than it was for class templates. With class templates we always did this thing where I wrote the name of the template and then, in angle brackets, what to fill in — in particular, "I want the version where the template parameter has been filled in with int." In the case of functions, it turns out it's a little easier for the compiler to know what you intended on the basis of the arguments: if I call Swap passing two integers, it knows there's only one possibility for which version of Swap I was interested in — it must be the one where Type has been filled in with int — and that's the one it generates for you. You can use the long form, Swap, angle bracket, int, but typically you will not; you let the compiler infer it from the usage. Based on the arguments you passed, it figures out how to match them to what Swap said it was taking, and if it can't match them — for example, if you pass one double and one int — you'll get compiler errors. But if I've done it correctly, it will generate for me a Swap where Type has been filled in with int, and if I call it again passing different kinds of things — say string s = "hello" and t = "world", and I write Swap(s, t) — then the compiler has generated for me two different versions of Swap, one for integers and one for strings. I didn't do anything with them and nothing prints, but it does compile and build. That's a pretty neat idea. It's kind of important, because it turns out there are a lot of pieces of code you're going to find yourself wanting to write that don't depend on the type. In a case like swap, swap doesn't care what it's exchanging — whether they're ints, strings, or doubles. The way you swap two things is the same no matter what they are: you copy one aside, you overwrite one, you overwrite the other. And the same thing applies to much more algorithmically interesting problems like searching, sorting, removing duplicates, and printing — where you want to take a vector and print all of its contents, and the steps you need to print a vector of ints look just like the steps you need to print a vector of strings.
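A short sketch of those usage cases, assuming the Swap template from the previous snippet is in scope:

    int x = 54, y = 32;
    Swap(x, y);          // Type inferred as int from the arguments
    Swap<int>(x, y);     // the long, explicit form; legal but rarely needed

    string s = "hello", t = "world";
    Swap(s, t);          // a second version is generated, with Type = string

    double d = 3.14;
    // Swap(x, d);       // compile error: no single Type matches both int and double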
And so it would be very annoying, every time I need to do that, to duplicate that code; there's a real appeal to writing it once, as a template, and using it in multiple situations. So this slide shows that when I call Swap, it infers what I meant, and it shows what happened: when I passed ints, the compiler used that pattern to generate the Swap for int, where the Type parameter — that placeholder — has been filled in and established as int for that particular version, which is distinct from the Swap for characters and the Swap for strings and whatnot. Now let me show you that version I talked about for printing, and for this one I'm going to go back to the compiler for just a second. Let's say I want to print a vector. Printing a vector is just iterating over all the members and using stream insertion to print them out, and so here it is, taking some vector where the elements are of the template's type name — in this case I've just used T as my short name. The iteration looks the same, the bracketing looks the same, and I'd like to be able to print vectors of strings, vectors of ints, and vectors of doubles with this one piece of code. So if I call PrintVector and pass it a vector of ints, all's good; a vector of doubles, all's good; a vector of strings, all's good. Here's where I could get into a little bit of trouble, though. I've made this little struct, coordT, that has an x and a y field, and I make a vector that contains these coords, and then I try to call PrintVector on it. When I do this, it will instantiate a version of PrintVector where T has been matched up to — oh, it's coords that are in there; you've got a vector of coords. So okay, we'll go through and iterate over the size of the vector, and then this line is trying to cout a coord type. And structs don't automatically know how to output themselves onto a stream, so at this point the instantiation runs into trouble. The code in PrintVector itself is actually fine, and it works for a large number of types — all the primitive types will be fine, string and things like that are fine.
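A sketch of a PrintVector along these lines, written here against std::vector so it's self-contained; the course's Vector class with size() and operator[] would look essentially the same:

    #include <iostream>
    #include <vector>
    #include <string>
    using namespace std;

    // Prints every element of a vector, whatever the element type is,
    // as long as that type can be inserted onto a stream.
    template <typename T>
    void PrintVector(const vector<T> &v) {
        for (size_t i = 0; i < v.size(); i++) {
            cout << v[i] << " ";   // this line is what constrains T
        }
        cout << endl;
    }

    int main() {
        vector<int> nums;
        nums.push_back(3);
        nums.push_back(1);
        nums.push_back(4);
        vector<string> words;
        words.push_back("hello");
        PrintVector(nums);    // fine: int knows how to go onto a stream
        PrintVector(words);   // fine: string does too
        return 0;
    }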
But if I try to use it for a type that doesn't have this output behavior, I will get a compiler error, and it may come as a surprise, because PrintVector appeared to be working fine in all the other situations and all of a sudden a new error seems to have cropped up. That error came from the instantiation of one particular version — this one — which suddenly ran afoul of what the template expected of the type. The message you get, say in Xcode, looks something like "no match for 'operator<<'" on the attempt to insert the vector element into cout. It's trying to give you a clue, in its very cryptic C++ way, about what you've done wrong. So it comes back to something the template has to be a little bit clear about from the client's point of view: what needs to be true? Will it really work for all types, or is there something special that actually has to be true about whatever gets filled in for the placeholder, for the rest of the code to work correctly? If it uses stream insertion, or compares two elements using ==, something like a struct doesn't have those behaviors by default, so you could have a template that works for primitive types, or for types that have these operations declared, but doesn't work when you give it a more complex type that lacks that support. So I have that piece of code over here; I can just show it to you. This is PrintVector, and it's a little bit of a template. If I have a vector of ints declared and I say PrintVector(v), this compiles — there's nothing to print, so it just prints an empty line — but if I change this to be coordT, my build fails, and the error message I'm getting is "no match" on trying to output that thing, and it lists all the things it does know how to output to a stream. You can output lots of things, but coordT isn't one of them; coordT doesn't have a built-in behavior for outputting to a stream.
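The failing case, roughly as described — the struct name coordT and its fields come from the lecture, while the exact declaration is my reconstruction; the commented-out line is where the instantiation breaks:

    struct coordT {
        double x, y;
    };

    vector<coordT> c;
    // PrintVector(c);   // error: no match for 'operator<<' when the template
    //                   // tries to do  cout << v[i]  with a coordT element.
    // One possible fix (not part of this lecture) would be to declare an
    // operator<< overload for coordT so the instantiated code has something to call.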
So this is going to come back to be kind of important as I keep going, because when we get to the sort — when we try to build this thing — we're definitely going to be depending on something being true about the elements, and that's going to constrain what the type can be. This is selection sort in its ordinary form, nothing special about it other than the fact that I've changed it so that instead of sorting a vector of int, it sorts a vector of the placeholder type. So it's prefixed with the template typename header, it takes a vector of Type, and in here, operations like this are really pulling one value of the type and another value of the type out of the vector that was passed and then comparing them — which is smaller — to decide which index to pick. It's also using a swap, so the generic Swap is necessary for this as well: we need a swap for any kind of things we want to exchange. We have a template up there for Swap, so the sort can build on top of that and use it as part of its workings, which means any two things can be swapped using the same strategy. And this is actually where templates really start to shine. Swap in itself isn't that stunning an example — you think, whatever, who cares, it's three lines of code, though every little bit does count — but once we get to things like sorting, where we could build a really killer, totally tuned, high-performance quicksort with protection against the worst case, we want to be sure we can use it in all the places we need to sort. I don't want to make it sort just strings, and then later, when I need to sort integers, copy and paste it and change string to int everywhere — and then if I find a bug in one and forget to change it in the other, the bug is still lurking there. So template functions are generally used for all kinds of algorithmically interesting things — sorting and searching and removing duplicates and permuting and finding the mode and shuffling — all these things where, no matter what type of thing you're working on, the algorithm for how you do it looks the same, whether it's ints or strings or whatever.
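A sketch of the templated selection sort being described, again written against std::vector for self-containment and building on the Swap template from earlier; the function name Sort is my choice:

    // Selection sort over a vector of any element type that supports operator<.
    template <typename Type>
    void Sort(vector<Type> &v) {
        for (size_t i = 0; i < v.size(); i++) {
            size_t minIndex = i;
            for (size_t j = i + 1; j < v.size(); j++) {
                if (v[j] < v[minIndex]) {   // this comparison constrains Type
                    minIndex = j;
                }
            }
            Swap(v[i], v[minIndex]);        // reuses the generic Swap template
        }
    }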
So we've got this guy, and as is, we could push vectors of ints in there, vectors of strings in there, and the right thing would work for those without any changes. But it does have a lurking constraint on what must be true about the kind of elements we're trying to sort here; not every type will work. If I go back to my coord example — this x, y struct — and I have a vector of coordT and pass it to Sort, and then I go back and look at the code and think about instantiating it with coordT all the way through: this part's fine, this part's fine, this part's fine, but suddenly you get to this line, and it's going to take two coordinate structures and ask, is this coordinate structure less than that one? And that code's not going to compile. Yes, question? [Student: Back when we were doing sets, we were able to explicitly say how that works.] Yeah — you're exactly where I'm going to be, though I'm probably not going to finish today and will have to finish up on Wednesday. We're going to do exactly what Set did. And what did Set do? A comparison function. That's right, it used a comparison function, so you have to have some coordination here where you say: instead of trying to use operator less-than on these things, how about the client tells me, as part of calling sort — give me the information in the form of a function I can call back, to ask whether one element comes before another and how to order them. That's exactly what we need to do, but I can't really show you the code right now, so I'll just flash it up there and we'll talk on Wednesday about what changes make that happen. In terms of reading, we'll pick up with Chapter 8, actually after the midterm.
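Since the code itself is deferred to Wednesday, here is only a rough sketch of the direction being described: the sort takes one extra parameter, a client-supplied comparison function, and calls it in place of operator<. The parameter name cmp, its exact signature, and the example comparison are my assumptions, not the lecture's:

    // Hypothetical sketch: the client supplies how two elements are ordered.
    template <typename Type>
    void Sort(vector<Type> &v, bool (*cmp)(Type, Type)) {
        for (size_t i = 0; i < v.size(); i++) {
            size_t minIndex = i;
            for (size_t j = i + 1; j < v.size(); j++) {
                if (cmp(v[j], v[minIndex])) {   // callback instead of operator<
                    minIndex = j;
                }
            }
            Swap(v[i], v[minIndex]);
        }
    }

    // An example client comparison for the coordT struct sketched earlier,
    // ordering points by distance from the origin.
    bool CloserToOrigin(coordT a, coordT b) {
        return (a.x * a.x + a.y * a.y) < (b.x * b.x + b.y * b.y);
    }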