This presentation is delivered by the Stanford Center for Professional Development.

Okay. So the last thing we talked about at the end of merge sort was comparing a quadratic, N-squared sort to the linearithmic, N log N kind that merge sort runs in, showing you that merge sort really outperforms it even on small values, and as the values get large the gap becomes enormous; what you can accomplish with an N log N algorithm is vastly superior to what you're going to get out of an N-squared algorithm. I told you, without proof, that N log N is the best we can do for a comparison-based sort; you'll have to take it on faith that I wouldn't lie to you. But what we're going to look for is a way of maybe improving on merge sort: one thing we might want to do is avoid the copying of the data out and back that merge sort does, and maybe get lower constant factors in the process that can bring our times down even further. So the sort we're looking at is quicksort.
And I think if you were going to come up with an algorithm name, naming it quicksort is good marketing; it inspires you to believe it's actually going to live up to its name. It's a divide-and-conquer algorithm that takes a strategy like merge sort's in the abstract: divide into two pieces, recursively sort those pieces, and then join them back together. But whereas merge sort does an easy split and a hard join, this one flips that on its head and says: what if we did some work up front, in the splitting process, so there may be less to do in the joining step? So the strategy for quicksort is to run through the pile of papers, or whatever we're working on, in a partitioning step; that's the first thing it does, and that's the splitting step. The partitioning is designed to move stuff into the lower half and the upper half: having some idea of what the middle-most value would be, it walks through the pile of papers and distributes them to the left and right portions based on their comparison to that middle-most element.
Once I have those two stacks, A through M over here and N through Z over there, recursively sorting those takes place, assuming my delegates do their work, and the process of rejoining them is really simple: they just come together with no extra work, no looking, no processing. So the one place where all the trickiness comes in is this idea of how we do the partitioning. I was being a little disingenuous when I said, well, if I had this stack of exam papers and I know the names range over the alphabet A through Z, then I know where M is and I can say, well, that's a good midpoint to divide around. Given that we want to make this work for any sort of data, we're not going to magically know the middle-most value. If we have a bunch of numbers, are they test scores, in which case maybe 50 is a good place to divide? Or are they instead population counts of states, in which case millions would be a better number to pick to divide them up? So we have to come up with a strategy that's going to work no matter what the input is, that somehow picks something reasonable for how we're going to do these divisions. The element we pick is called the pivot.
So given this region, this set of things to sort, how do I know what's a reasonable pivot? Something close to the median is actually ideal. If we could walk through the elements and compute the median, that would be the best we could do; we're going to try to get around doing that much work, though. We're going to make an approximation, and a really very simplistic choice of approximation at that; we'll come back and revisit this a little bit later. I'll just take the first paper off the stack. If the papers are in random order, I look at the first one and say, okay, it's somebody named King. Well, everybody less than King goes over here, everybody greater than King goes over there. Now King may not be perfectly in the middle, and we're going to come back to see what that does to the analysis, but at least we know it's in the range of the values we're looking at, so it at least did that for us. And it means that no matter what our data is, we can always use it; it's a strategy that will always work. Just take the first thing we see, use it as the pivot, and then slot everything else left and right.
So let me go through what the code actually does. This is another of those cases where the code for partition is a little bit frightening, and in the analysis we're doing, the point is not to get really, really worried about the exact details of the less-than versus the less-than-or-equal-to. I'm going to talk you through what it does, but the more important takeaway is the algorithm behind quicksort: what strategy it uses to divide and conquer, and how that works out. The main quicksort algorithm looks like this. We're given the vector of things we're trying to sort, plus a start index and a stop index that identify the subregion of the vector we're currently trying to sort; I'll use these in subsequent calls to identify which piece we're recursively focusing on. As long as there's a positive difference between the start and the stop, so there are at least two elements to sort, we go through the process of doing the division and the sorting. So it makes the call to partition, saying: partition this subregion of the array (we'll come back and talk about that code in a second), and the thing we get back from partition is the index at which the pivot was placed.
After partition has done all its work, it has moved everything less than the pivot to the left of that index and everything greater than the pivot to the right, so the pivot's position tells us where the division is. Then we make two recursive calls: one to sort everything up to but not including the pivot, and one to sort everything past the pivot to the end. Those calls operate on the array in place, so we're not copying it out and copying it back. In fact, there isn't really any join step here at all: we sort the four things on the left, sort the seven things on the right, and the joining is trivial because everything is already where it needs to be, so there's nothing left to do in the join. The tricky part is certainly in partition. The version of partition we're using here takes two indices, two fingers, you might imagine, and walks one in from the stop position downward and one from the start position up, looking for elements that are on the wrong side.
So we start with our right-hand finger on the very end of the array. That first loop in there is designed to move the right hand down, looking for something that doesn't belong on the right side. We identified 45 as the pivot value, so it says: find something that's not greater than 45. That's the first thing we find coming in from the right, moving downward, that doesn't belong, and in this case it takes just one iteration to find it. So okay, this guy doesn't belong on the right-hand side. Now it does the inverse on the left-hand side: try to find something that doesn't belong on the left. So the left hand moves up; in this case it took two iterations to get there, to this 92. So now we have a perfect opportunity to fix both our problems with one swap: exchange what's at the current position of the left hand with what's at the right hand, and both of our problems go away; we have more things on the left and on the right that belong there. Then we can walk inward from there to find subsequent ones that are out of place.
So those two get exchanged, and now the right hand moves in again to find that 8, the left hand moves over to find that 74, and we swap again. We just keep doing this until the fingers cross. Once they've crossed, we know that everything the right hand scanned over is greater than the pivot and everything the left hand scanned over is less than it, and at that point we've divided it: everything small is in the left side of the vector, everything large is in the right side. So I swap these last two, and now it's big things here, small things there. The last thing we do is move the pivot into the slot right between them, exactly where the fingers crossed, and we know it's located in its final position. Now I have two subarrays to work on. The return value from partition in this case is the index four, and then we say: okay, go ahead and quicksort this front-most part, and then come back and do the second part. In the recursion, think of that second part as postponed, waiting; we'll get back to it in a minute, but while we work on this piece we'll do the same thing.
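As a sketch, here is one plausible rendering in C++ of the driver and the two-finger partition just described. This is my reconstruction, not the lecture's exact code; the names `quickSort` and `partitionRange`, and the use of `std::vector<int>` with inclusive start/stop indices, are my assumptions.

```cpp
#include <algorithm>
#include <vector>

// Two-finger partition: pivot is the first element of the subregion.
// A left finger scans up past small elements, a right finger scans
// down past large ones; out-of-place pairs are swapped until the
// fingers cross, then the pivot drops into the gap between them.
int partitionRange(std::vector<int> &v, int start, int stop) {
    int pivot = v[start];
    int lh = start + 1;                          // left "finger"
    int rh = stop;                               // right "finger"
    while (true) {
        while (lh < rh && v[rh] >= pivot) rh--;  // scan down from the right
        while (lh < rh && v[lh] < pivot) lh++;   // scan up from the left
        if (lh == rh) break;                     // fingers met: scanning done
        std::swap(v[lh], v[rh]);                 // fix two problems with one swap
    }
    if (v[lh] >= pivot) return start;            // pivot was already the smallest
    v[start] = v[lh];                            // move boundary element out of the way
    v[lh] = pivot;                               // drop pivot where the fingers met
    return lh;                                   // pivot's final index
}

// The driver: partition first, then recursively sort each side in place.
// There is no join step; everything is already where it belongs.
void quickSort(std::vector<int> &v, int start, int stop) {
    if (stop > start) {                          // at least two elements remain
        int boundary = partitionRange(v, start, stop);
        quickSort(v, start, boundary - 1);       // everything left of the pivot
        quickSort(v, boundary + 1, stop);        // everything right of the pivot
    }
}
```

Calling `quickSort(v, 0, v.size() - 1)` sorts the whole vector in place; note the pivot is simply `v[start]`, the first element of the subregion, exactly the simplistic choice discussed above.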
It's a little bit harder to see what's going on at this small scale, but in this case, because the pivot was 8, it divided that piece into 8 and the three things greater than the pivot. Here the 41 is the pivot, so it moves 41 into place and works on what's to the left of it. As it keeps working its way down, the pieces get smaller and smaller, and it's harder to see the workings of the algorithm rearranging such small cases, but we'll get back to the big left half in a second. And now that that's all been sorted, we revisit the side above the pivot to get these six guys in order, taking the same strategy: pivot around 67, do some swapping, and so on. We'll see it in a slightly bigger case to get the idea, but already there's something very interesting about the way quicksort works: that partition step very quickly gets things close to the right place. And so now that we've done both the left and the right, we're done. The whole thing is done.
Everything got moved into the right position as part of the recursive sorting and doesn't need to be moved further to solve the whole problem. That first partition step does a lot of throwing things left and right, but it's actually quickly moving big things and small things that are out of place closer to where they're eventually going to need to go, and that turns out to be a real advantage in terms of the running time of this sort, because it does some quick movement close to the final position and then fixes things up in the recursive calls that examine the smaller subsections of the array. Let me show you what it sounds like, because that's really what you want to know. You can see the big partitioning happening, then it jumbles things up and comes back to revisit them, but you'll notice that quite quickly all the small elements get thrown to the left and all the large elements to the right, and then there's the process of coming back: the big partition that you can see working across them, and then the noodling down in the details. I'll turn the sound on for it, and I'll probably take the volume down a little bit.
So the big partition steps make a lot of noise, moving things quickly, and it almost sounds random; it's hard to identify what's going on during the big partition. But as you hear it make its way down the recursive tree, it focuses on smaller and smaller regions where the numbers have all been gathered because they're similar in value, and you hear a lot of noodling in the same pitch range as it works on the smaller and smaller subarrays. And they definitely go from low to high, because the recursion chooses the left side to operate on before the right side, so you hear the work being done in the lower tones before it gets to finishing up the high tones. Kind of cool. Let's come back here and talk about how to analyze this guy a little bit. The main part of the algorithm is very simple, deceptively simple, because all the hard work is actually done in partition. The partition step is linear: if you can just go along with me conceptually, you'll see that we're moving this left finger in, we're moving this right finger in, and we stop when they cross, so that means every element was looked at.
One finger or the other passes over each position, and eventually every element gets looked at as they work their way in, with some swaps along the way. So the partition process is linear in the number of elements: if there are a thousand of them, one finger might advance 400 and the other 600, but all the elements get looked at in the process of partitioning them into the lower or upper half. Then it makes two calls, quicksort on the left and the right portions of that. Well, we don't know exactly what that division was: was it even, was it a little lopsided? That's certainly going to come back as something we'll look at. If we assume the ideal 50/50 split, in an ideal world where the choice we had for the pivot happened to be the median, or close enough to the median that effectively we got an even division at every level of recursion, then the recurrence relation for the whole process is: the time required to sort an input of size N is N to partition it, plus twice the time to sort a half, since the two halves are each half as big as the original input. We saw that recurrence relationship for merge sort, and we solved it.
Think in terms of the tree: you have a branching of two, then a branching of two again, with the partitioning being done across each entire level, and it goes to a depth of log N, the number of times you can divide by two before you reach single-element arrays that are easily handled. So it is an N log N sort in this perfect best case. That should mean that, in terms of big-O growth curves, merge sort and quicksort look about the same. Quicksort does have better constant factors, though. Let me show you: I'll go back here and run them against each other. If I put quicksort versus merge sort here (stop making noise), quicksort pretty handily manages to beat merge sort; merge sort was on top, quicksort was underneath. And in terms of what was going on, quicksort was doing what looks like more comparisons (that partition step does a lot of comparing) but fewer moves, and that actually does make intuitive sense: quicksort very quickly moves stuff to where it goes and then does a little bit of noodling.
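Written out, the best-case recurrence just described, assuming the pivot gives an even split at every level, is:

```latex
T(N) = N + 2\,T\!\left(\tfrac{N}{2}\right), \qquad T(1) = 1
```

Expanding level by level, each of the $\log_2 N$ levels does a total of $N$ partitioning work, giving $T(N) = O(N \log N)$: the same recurrence, and the same result, as merge sort.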
Merge sort does a lot of moving things away and moving them back. As you watch it run, you'll see a lot of large elements get placed up near the front but then have to be moved again, because that was not their final resting place. So merge sort does a lot of movement away and back that tends to cost it overall: a higher constant factor on the moves than something like quicksort, which more quickly gets each element to the right place and then doesn't move it again. So that was all well and good, but that was assuming everything went perfectly. Given our simplistic choice of the pivot, you can imagine that's really assuming a lot went our way. Let's look at a particularly bad split. Imagine that the number we happen to pick is pretty close to one of the extremes. If I were sorting papers by alphabet and the one on top happened to be a C, well then I'm going to divide things pretty lopsidedly: just the As and the Bs on one side, and everything from C to the end on the other side.
Suppose I got a split that was about 90/10: ninety percent went on one side, ten percent on the other. You would expect that to change the running time of what's going on here. So I get one tenth of the elements over here in the low half and nine tenths in the right half, and let's just say every time, it always takes nine tenths and moves them to the upper half, as if I kept picking a pivot that's artificially close to the front. These small sides will peter out fairly quickly, dividing N by 10 and 10 and 10 again; those are going to bottom out very soon. But there's this one arm where I keep taking 90 percent of what remains: I had 90 percent, then 81 percent, and I keep multiplying by nine tenths, still retaining a lot of elements in that one big portion. So the depth of this tree isn't going to bottom out quite as quickly as it used to, when the number of times you could divide N by 2, the log base 2 of N, was where we landed.
In this case, I take N and multiply it by nine tenths at each level, so instead of one half to the K it's nine tenths to the K, and the question is how many iterations, how deep this gets, before it reaches the simple case. Solving N equals ten ninths to the K means taking the log base ten ninths of both sides, so K, the number of levels, is the log base ten ninths of N. We're relying on a fact of mathematics here, which is that all logarithms of the same value are related: log base A of N and log base B of N differ only by a constant that you can compute if you rearrange the terms. So in effect there's a constant in front of this, the ratio between the logs of ten ninths and of 2, that distinguishes the two, but it's still log base 2 of N in terms of big O, where we can throw those constants away. So even though, experimentally, you would expect this kind of split to make it work more slowly than in a perfect-split situation, the big-O behavior is still going to look linearithmic, N log N.
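To see the depth claim concretely: the long arm of the tree keeps nine tenths of the elements at each level, so after $k$ levels it holds $N(\tfrac{9}{10})^k$ elements, and it bottoms out when that reaches 1:

```latex
N\left(\tfrac{9}{10}\right)^{k} = 1
\;\Rightarrow\; \left(\tfrac{10}{9}\right)^{k} = N
\;\Rightarrow\; k = \log_{10/9} N = \frac{\log_2 N}{\log_2(10/9)} \approx 6.6\,\log_2 N
```

The depth is a constant multiple of $\log_2 N$, so the overall cost is still $O(N \log N)$; only the hidden constant grows.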
So that's pretty good to know. It makes you feel a little bit confident: sometimes it might be getting an even split, sometimes one third and two thirds, sometimes nine tenths, but if it's still dividing off some fraction each time, it's still doing okay. Now let's look at the really, really, really worst-case split. The really worst-case split didn't even take off a fraction; it just managed to separate off one element. Somehow the pivot did a terribly crummy job of dividing at all: in effect, all the elements were greater than the pivot, or all less than it, so they all landed on one side and the pivot sat by itself. So starting with an input of size N, we sectioned off 1 and had N minus 1 remaining. Well, the next time we sectioned off another one, and this happened again and again and again; we just got really bad luck all the way through. Each of these levels is only making progress toward the base case at a snail's pace: one element is being subtracted at each subsequent level, and that's going to go to a depth of N. And so now we're doing N work across the levels; it isn't quite N at every level, because some of them bottom out.
It's still the Gaussian sum in the end: we've got N levels doing on the order of N work each, and what was a linearithmic algorithm is suddenly performing quadratically, in the class of selection sort and insertion sort, in its worst-case behavior. So quite a range. Whereas merge sort is a very reliable performer (merge sort doesn't care; no matter what the order is, no matter what's going on, it's the same amount of work), there are some inputs that can cause quicksort to vary between being linearithmic and being quadratic, and there's a huge difference, as we saw in our earlier charts, in how those will run for large values. So what makes the worst case? Given how we're choosing a pivot right now, taking the first thing in the subsection we're looking at as the pivot, what's the example input that gives you this? Sorted. If it's already sorted. Isn't that a charmer? Here's a sorting algorithm where, if you accidentally sorted twice, having already sorted the data, passing it back to the sort would suddenly make it degenerate into its very worst case.
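The worst-case recurrence just described: each partition costs the size of the current subarray but peels off only the pivot, so

```latex
T(N) = N + T(N-1)
\;\Rightarrow\; T(N) = N + (N-1) + (N-2) + \cdots + 1 = \frac{N(N+1)}{2} = O(N^2)
```

the Gaussian sum, the same quadratic total of work that selection sort and insertion sort do.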
It's already 00:20:03.000 --> 00:20:05.000 sorted, so it would say, 00:20:05.000 --> 00:20:06.000 ''I've got this stack of exam papers.'' 00:20:06.000 --> 00:20:09.000 I look at the first one and I say, ''Oh, look it's Adam's.'' And I say, ''Okay, well, let me 00:20:09.000 --> 00:20:12.000 go find all the ones that belong in the left side.'' None of them do. I said, ''Okay, 00:20:12.000 --> 00:20:15.000 well put Adam's over here.'' I'll [inaudible] sort that. That was easy. 00:20:15.000 --> 00:20:18.000 Oh, let me look at what I've got left, oh with baker on the top. 00:20:18.000 --> 00:20:21.000 Okay, well let me put Baker by itself, but could we find the other ones? Oh, no, 00:20:21.000 --> 00:20:24.000 no more. It would just you 00:20:24.000 --> 00:20:27.000 know, continue because I'm doing this thing, looking at all N, looking at all N minus 1, looking 00:20:27.000 --> 00:20:31.000 at N minus 2, all the way down to the bottom making no, 00:20:31.000 --> 00:20:35.000 recognizing nothing about how beautifully the data already was arranged 00:20:35.000 --> 00:20:36.000 and 00:20:36.000 --> 00:20:38.000 you know, getting its worst case. 00:20:38.000 --> 00:20:42.000 There's actually a couple of others that come in, too. That's probably the most obvious one to think 00:20:42.000 --> 00:20:46.000 of is it's coming in in increasing order. It's also true if its in decreasing order. 00:20:46.000 --> 00:20:49.000 If I happen to have the largest value on the top, then 00:20:49.000 --> 00:20:52.000 I'll end up splitting it all into, everything to the left side and nothing 00:20:52.000 --> 00:20:53.000 to the right. 00:20:53.000 --> 00:20:58.000 There are actually other variations that will also produce this. 
If at any given 00:20:58.000 --> 00:21:01.000 stage that we're looking at a sub array, if the first value happens to be the extreme 00:21:01.000 --> 00:21:04.000 of what remains, whether it's the smallest or the largest, and it could alternate, which would 00:21:04.000 --> 00:21:06.000 be kind of a funny pattern, but if you 00:21:06.000 --> 00:21:10.000 have the tallest and the shortest and the tallest and then another tallest and then a shortest, 00:21:10.000 --> 00:21:13.000 so those would be a little bit harder to describe, but 00:21:13.000 --> 00:21:18.000 there are some other alternatives that produce this bad result. 00:21:18.000 --> 00:21:21.000 All of these we could call degenerate cases. 00:21:21.000 --> 00:21:25.000 There's a small number of these relative to the N factorial [inaudible] your 00:21:25.000 --> 00:21:28.000 data could come in. In fact, that's a huge number, and so there's lots and 00:21:28.000 --> 00:21:29.000 lots of ways it could be arranged. 00:21:29.000 --> 00:21:32.000 There's a very small number of them that would be arranged in such a way to 00:21:32.000 --> 00:21:35.000 trigger this very, very worst-case behavior. 00:21:35.000 --> 00:21:38.000 Some are close to worst case, like if they were almost sorted, they might every now and 00:21:38.000 --> 00:21:41.000 then get a good split, but mostly a bad split, 00:21:41.000 --> 00:21:44.000 but the number that actually degenerate to the absolute worst case is actually 00:21:44.000 --> 00:21:47.000 quite small. But 00:21:47.000 --> 00:21:49.000 the truth is, 00:21:49.000 --> 00:21:51.000 do we expect that those inputs are somehow 00:21:51.000 --> 00:21:56.000 hard to imagine us getting? Are they so unusual 00:21:56.000 --> 00:21:59.000 and weird that you might be willing to say it's okay that my algorithm has 00:21:59.000 --> 00:22:01.000 this one or a couple of bad cases in it 00:22:01.000 --> 00:22:03.000 because it never comes up?
00:22:03.000 --> 00:22:06.000 In this case, the notion that your data came in sorted or almost sorted or reverse sorted 00:22:06.000 --> 00:22:09.000 is probably not unlikely. 00:22:09.000 --> 00:22:11.000 It might be that you know, you happen to be reading from 00:22:11.000 --> 00:22:15.000 a file on disc and somebody has taken the [inaudible] and sorted it before they gave the 00:22:15.000 --> 00:22:19.000 data to you. If all of a sudden that caused your program to behave really badly, 00:22:19.000 --> 00:22:20.000 that would really be 00:22:20.000 --> 00:22:21.000 a 00:22:21.000 --> 00:22:23.000 pretty questionable outcome. 00:22:23.000 --> 00:22:27.000 So let's go back and look, just to see, though. It's kind of interesting to see 00:22:27.000 --> 00:22:30.000 it happening in 00:22:30.000 --> 00:22:33.000 - this is again quick sort versus merge sort. 00:22:33.000 --> 00:22:38.000 If I change the data to be partially sorted 00:22:38.000 --> 00:22:41.000 and let it run again, 00:22:41.000 --> 00:22:43.000 it still manages to beat it, 00:22:43.000 --> 00:22:44.000 doing a little bit more. 00:22:44.000 --> 00:22:47.000 If I change it to totally sorted 00:22:47.000 --> 00:22:53.000 or reverse sorted, for that matter, and 00:22:53.000 --> 00:22:54.000 so they look like they're doing a lot of work, 00:22:54.000 --> 00:22:56.000 a lot of work about nothing. 00:22:56.000 --> 00:22:57.000 And you can see that quick sort 00:22:57.000 --> 00:22:59.000 really is still way back there. 00:22:59.000 --> 00:23:02.000 It has looked at the first 10 percent or so and is doing a lot 00:23:02.000 --> 00:23:04.000 of frantic partitioning 00:23:04.000 --> 00:23:07.000 that never moves anything. 00:23:07.000 --> 00:23:11.000 And visibly kind of traipsing its way up there.
Merge sort meanwhile is actually 00:23:11.000 --> 00:23:13.000 on the beach in Jamaica with a 00:23:13.000 --> 00:23:15.000 daiquiri 00:23:15.000 --> 00:23:16.000 mocking quick sort 00:23:16.000 --> 00:23:21.000 for all those times that it had said it was so much better than it was. 00:23:21.000 --> 00:23:24.000 ''Duh, it was already sorted, you Doofus.'' 00:23:24.000 --> 00:23:25.000 Apparently it wasn't listening. Merge 00:23:25.000 --> 00:23:28.000 sort almost looked like it did nothing and that's because it took the left half, 00:23:28.000 --> 00:23:32.000 which holds the smallest elements, moved it off, right, kind of copied it back, and it ends up copying 00:23:32.000 --> 00:23:38.000 each element back to the same location it was already originally present in. There you go, quick sort 00:23:38.000 --> 00:23:39.000 taking its time. And if I 00:23:39.000 --> 00:23:42.000 ran it against, let's say, 00:23:42.000 --> 00:23:46.000 insertion sort and selection sort, why not, 00:23:46.000 --> 00:23:50.000 [inaudible] nothing better to do than to sort for you all day long. 00:23:50.000 --> 00:23:54.000 So there insertion sort actually finishing very early because it does nothing. It sort of 00:23:54.000 --> 00:23:57.000 looked at them all and said they were already in the right order, 00:23:57.000 --> 00:24:00.000 merge sort doing a little bit more work to move everything back and forth in the same place.
00:24:00.000 --> 00:24:02.000 But 00:24:02.000 --> 00:24:02.000 00:24:02.000 --> 00:24:04.000 in this case, selection sort 00:24:04.000 --> 00:24:06.000 and 00:24:06.000 --> 00:24:08.000 quick sort here at the bottom and selection sort here at the top 00:24:08.000 --> 00:24:11.000 doing really roughly about the same work if you look at the sum total of the comparisons 00:24:11.000 --> 00:24:13.000 and moves 00:24:13.000 --> 00:24:16.000 and then time spent on those suddenly shows that yeah, in this input 00:24:16.000 --> 00:24:19.000 situation quick sort is performing like an N-squared sort, 00:24:19.000 --> 00:24:23.000 and obviously very similar constant factors to those present in selection sort, 00:24:23.000 --> 00:24:24.000 so really 00:24:24.000 --> 00:24:28.000 behaving totally quadratically in that worst-case input. If I make 00:24:28.000 --> 00:24:38.000 it reverse sorted, it's a little more interesting because you can kind of see something going on. Everybody kind of doing their thing. 00:24:38.000 --> 00:24:40.000 Insertion sort now kind of hitting its worst case, 00:24:40.000 --> 00:24:43.000 so it actually coming in dead last because it 00:24:43.000 --> 00:24:45.000 did so much extra moving, but 00:24:45.000 --> 00:24:49.000 still showing the kind of quadratic terms, right, that selection 00:24:49.000 --> 00:24:53.000 sort and quick sort are both having, but higher constant factors there, so taking almost 00:24:53.000 --> 00:24:54.000 twice as long because 00:24:54.000 --> 00:24:55.000 actually doing a little 00:24:55.000 --> 00:25:02.000 bit extra work because of that inverted 00:25:02.000 --> 00:25:04.000 situation. [Inaudible] something big, just cause. We'll put it back 00:25:04.000 --> 00:25:10.000 into random, back into full speed and kind of watch some things happen. 
00:25:10.000 --> 00:25:11.000 So quick sort right, just 00:25:11.000 --> 00:25:14.000 sort of done in a flash, merge sort finishing behind it 00:25:14.000 --> 00:25:17.000 and ordinarily 00:25:17.000 --> 00:25:20.000 the quadratic sorts taking a much 00:25:20.000 --> 00:25:22.000 longer time than our two recursive sorts. But 00:25:22.000 --> 00:25:30.000 it's the random input really buying quick sort its speed. Is 00:25:30.000 --> 00:25:33.000 it gonna win - oh, come on. Oh, come on. Don't you want selection sort to come from behind? I always wanted it to 00:25:33.000 --> 00:25:37.000 come from behind. It's kind of like rooting for the underdog. 00:25:37.000 --> 00:25:38.000 Yes. 00:25:38.000 --> 00:25:42.000 And yet, just remember, don't mock insertion sort. When it's sorted, it's 00:25:42.000 --> 00:25:46.000 the only one that recognizes it and does something clever about it. [Inaudible] 00:25:46.000 --> 00:25:49.000 sort is anti-clever about it, so 00:25:49.000 --> 00:25:55.000 it still can hold its head up and be proud. This 00:25:55.000 --> 00:25:58.000 kind of [inaudible]. I was 00:25:58.000 --> 00:26:01.000 thinking about, like, what algorithms, right - there's a reason why 00:26:01.000 --> 00:26:04.000 there are four that we talked about and there are actually a dozen 00:26:04.000 --> 00:26:07.000 more. It's like different situations actually produce different 00:26:07.000 --> 00:26:09.000 outcomes for different 00:26:09.000 --> 00:26:12.000 sorts and there are reasons that even though they might not be the best sort for all 00:26:12.000 --> 00:26:15.000 purposes, there are some situations where it might be the right one. So if you had 00:26:15.000 --> 00:26:18.000 a data set that you expect to be mostly sorted, 00:26:18.000 --> 00:26:19.000 but had a few [inaudible] in it.
00:26:19.000 --> 00:26:22.000 Like if you had a pile of papers that were sorted and you dropped them on the floor and they 00:26:22.000 --> 00:26:25.000 got scuffed up a little bit, but they're still largely sorted, insertion sort is the 00:26:25.000 --> 00:26:26.000 way to 00:26:26.000 --> 00:26:29.000 go. It will take advantage of the work that was already done 00:26:29.000 --> 00:26:33.000 and not redo it or create kind of havoc the way quick sort 00:26:33.000 --> 00:26:36.000 would. But quick sort, the last thing we need to tell you, though, about it is we can't 00:26:36.000 --> 00:26:40.000 tolerate this. Like the way quick sort is in its classic form 00:26:40.000 --> 00:26:41.000 with this 00:26:41.000 --> 00:26:42.000 first element as pivot 00:26:42.000 --> 00:26:45.000 would be an unacceptable way to do this algorithm. 00:26:45.000 --> 00:26:49.000 So quick sort actually typically is one of the most standard things that's offered in 00:26:49.000 --> 00:26:52.000 the library of a programming language, the sort routine. 00:26:52.000 --> 00:26:53.000 Well, it 00:26:53.000 --> 00:26:57.000 has to have some strategy for how to deal with this in a way that does not 00:26:57.000 --> 00:26:59.000 degenerate, 00:26:59.000 --> 00:27:02.000 and so the idea is, what you need to do is just pick a different 00:27:02.000 --> 00:27:03.000 choice for the pivot, 00:27:03.000 --> 00:27:06.000 a little bit more clever, spend a little bit more time, 00:27:06.000 --> 00:27:09.000 do something that's a little less predictable than just picking that front- 00:27:09.000 --> 00:27:11.000 most element. 00:27:11.000 --> 00:27:12.000 So to take 00:27:12.000 --> 00:27:16.000 it to the far extreme, one thing you could do is just compute the median - analyze the data and compute the median.
00:27:16.000 --> 00:27:19.000 It turns out there is a linear time algorithm for this that 00:27:19.000 --> 00:27:22.000 would look at every element of the data set once and then be able to tell 00:27:22.000 --> 00:27:23.000 you what the median was, 00:27:23.000 --> 00:27:25.000 and that would guarantee you that 50/50 split. 00:27:25.000 --> 00:27:30.000 So if you go find it and use it at every stage, you will guarantee it. 00:27:30.000 --> 00:27:34.000 Most algorithms don't tend to do that. That's actually kind of overkill for the problem. We 00:27:34.000 --> 00:27:34.000 want to get it to where 00:27:34.000 --> 00:27:38.000 it's pretty much guaranteed to never get the worst case. But we're not 00:27:38.000 --> 00:27:41.000 concerned with it getting 50/50. If it got 60/40 or something close enough, that 00:27:41.000 --> 00:27:45.000 actually, you know, if it was 60/40, 70/30 and 00:27:45.000 --> 00:27:47.000 bopping around in that range, it'd be fine, 00:27:47.000 --> 00:27:50.000 so the other two that are much more commonly used 00:27:50.000 --> 00:27:53.000 are some kind of approximation of the median 00:27:53.000 --> 00:27:56.000 with a little bit of guessing or something in it. For example, median of three 00:27:56.000 --> 00:27:57.000 takes 00:27:57.000 --> 00:28:01.000 three elements, and typically it takes three from some specified positions: it takes 00:28:01.000 --> 00:28:02.000 the middle-most element, 00:28:02.000 --> 00:28:04.000 the last element and the front element and it says, 00:28:04.000 --> 00:28:06.000 ''Okay, given those three, 00:28:06.000 --> 00:28:10.000 we arrange them to find out what's the middle of those three, and use that.'' 00:28:10.000 --> 00:28:12.000 If the data was already sorted, it turns out you've got the median, right, 00:28:12.000 --> 00:28:14.000 because it was the middle-most.
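The median-of-three step just described can be sketched as a small helper. This is an illustrative version (the helper name and return-an-index design are assumptions, not the lecture's code): look at the first, middle, and last elements of the sub-array and return the index whose value is the middle of the three.

```cpp
#include <vector>

// Median-of-three pivot selection sketch: examine v[start], v[mid],
// and v[stop] and return the index holding the middle value of the three.
static int medianOfThreeIndex(const std::vector<int>& v, int start, int stop) {
    int mid = start + (stop - start) / 2;
    int a = v[start], b = v[mid], c = v[stop];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;   // b is the median
    if ((b <= a && a <= c) || (c <= a && a <= b)) return start; // a is the median
    return stop;                                                // otherwise c is
}
```

On already-sorted data this picks the true middle element, which is exactly why the degenerate case disappears.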
00:28:14.000 --> 00:28:17.000 If data was just in random position, then you've got one that you know, at least 00:28:17.000 --> 00:28:22.000 there's one element on one side, right, and so the odds that, every single time, 00:28:22.000 --> 00:28:25.000 it would produce a very bad split are pretty low. 00:28:25.000 --> 00:28:28.000 There are some inputs that could kind of get it to degenerate a little bit, but it's 00:28:28.000 --> 00:28:32.000 pretty foolproof in most ordinary situations. 00:28:32.000 --> 00:28:36.000 Even more unpredictable would be to just choose a random element. 00:28:36.000 --> 00:28:39.000 So look at the start to stop index you have, pick a random one there, 00:28:39.000 --> 00:28:44.000 swap it into the front and now use it as the pivot element. 00:28:44.000 --> 00:28:45.000 If you 00:28:45.000 --> 00:28:47.000 don't know ahead of time how your random number [inaudible], it's impossible 00:28:47.000 --> 00:28:51.000 to generate an input and force it into the worst-case behavior. And so the idea is that 00:28:51.000 --> 00:28:53.000 you're kind of counting on randomness 00:28:53.000 --> 00:28:56.000 and just the probabilistic outcome; for 00:28:56.000 --> 00:29:00.000 it to manage to be such that the way it chose the random element was to pick the 00:29:00.000 --> 00:29:02.000 extreme every time is just impossible, 00:29:02.000 --> 00:29:04.000 the odds are astronomically against it. 00:29:04.000 --> 00:29:06.000 And so it's 00:29:06.000 --> 00:29:08.000 a very simple fix, right, 00:29:08.000 --> 00:29:09.000 00:29:09.000 --> 00:29:12.000 that still leaves the possibility of the worst case in there, 00:29:12.000 --> 00:29:17.000 but, you know, much, much, much farther removed in a probability sense.
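The random-pivot idea is even shorter to sketch. A hypothetical helper (names are mine, and plain `rand()` stands in for whatever random source you would actually use): pick a random index in the start-to-stop range, swap that element into the front slot, and then partition exactly as before.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Random-pivot sketch: choose a random index in [start, stop] and move
// that element to the front so the usual first-element partition can run.
static void moveRandomPivotToFront(std::vector<int>& v, int start, int stop) {
    int pivotIndex = start + rand() % (stop - start + 1);
    std::swap(v[start], v[pivotIndex]);
}
```

Nothing about the data changes except which element sits in the front slot, so the rest of the algorithm carries on untouched.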
So, 00:29:17.000 --> 00:29:20.000 just simple things, right, but from there the algorithm operates the 00:29:20.000 --> 00:29:22.000 same way it always has, which is to say, 00:29:22.000 --> 00:29:24.000 ''Pick the pivot, however you want to do it, 00:29:24.000 --> 00:29:27.000 move it into that front slot and then carry on as before in terms of the left 00:29:27.000 --> 00:29:30.000 and the right hand and moving down and swapping and recursing and all that [inaudible].'' 00:29:30.000 --> 00:29:34.000 Any 00:29:34.000 --> 00:29:40.000 questions; anything about sorting, sorting algorithms? Why don't we 00:29:40.000 --> 00:29:42.000 do some coding actually, 00:29:42.000 --> 00:29:44.000 which is the next thing I want to show you, 00:29:44.000 --> 00:29:46.000 because once you know how to write sorts, 00:29:46.000 --> 00:29:47.000 00:29:47.000 --> 00:29:50.000 sort of the other piece I think you need to have to go with it is how would I write 00:29:50.000 --> 00:29:54.000 a sort to be generally useful in a large variety of situations. 00:29:54.000 --> 00:29:57.000 Knowing about these sorts, I might decide to build 00:29:57.000 --> 00:30:01.000 myself a really general purpose, good, high performance quick sort that I 00:30:01.000 --> 00:30:03.000 could use again, and again and again. 00:30:03.000 --> 00:30:07.000 I want to make it work generically so I could sort strings, I could sort 00:30:07.000 --> 00:30:10.000 numbers, or I could sort students, or I could sort 00:30:10.000 --> 00:30:11.000 vectors of students or whatever, 00:30:11.000 --> 00:30:14.000 and I'd want to make sure that no matter what I needed to do, it was gonna 00:30:14.000 --> 00:30:17.000 solve all my problems, this one kind of sorting tool, 00:30:17.000 --> 00:30:20.000 and we need to know a little bit of C++ to make that work 00:30:20.000 --> 00:30:22.000 by use of the function template. 00:30:22.000 --> 00:30:27.000 So if you want to ask about sorting algorithms, now is the time. [Inaudible].
We don't 00:30:27.000 --> 00:30:28.000 do bubble sort, 00:30:28.000 --> 00:30:31.000 which is kind of interesting. It will come up actually in the assignment 00:30:31.000 --> 00:30:34.000 I give you next week because it is a little bit of an historical artifact 00:30:34.000 --> 00:30:38.000 to know about it, but it turns out bubble sort is one of those sorts that 00:30:38.000 --> 00:30:43.000 is dominated by pretty much every other sort you'll know about in every way, and there's 00:30:43.000 --> 00:30:45.000 really nothing to recommend it. As I said, each of these four have something 00:30:45.000 --> 00:30:49.000 about them that actually has a strength or whatever. Bubble sort really doesn't. 00:30:49.000 --> 00:30:52.000 It's a little bit harder to write than insertion sort. It's a little bit slower than insertion 00:30:52.000 --> 00:30:56.000 sort; it's a little bit easier to get it wrong than insertion sort. It has higher constant 00:30:56.000 --> 00:30:57.000 factors. You know, 00:30:57.000 --> 00:31:01.000 it does recognize the data in sorted order, but so does insertion sort, so it's hard to come up with 00:31:01.000 --> 00:31:04.000 a reason to actually spend a lot of time on it, but I will expose you to it in the 00:31:04.000 --> 00:31:07.000 assignment because I think it's a little bit of a - 00:31:07.000 --> 00:31:08.000 it's part of 00:31:08.000 --> 00:31:10.000 our history, right, as a computer science 00:31:10.000 --> 00:31:14.000 student to be exposed to. 00:31:14.000 --> 00:31:16.000 Anything else? I'm gonna show 00:31:16.000 --> 00:31:19.000 you some things about function templates. Oh, these 00:31:19.000 --> 00:31:21.000 are some numbers; I forgot to give that.
00:31:21.000 --> 00:31:23.000 Just to say, in the end, right, 00:31:23.000 --> 00:31:26.000 which we saw a little bit in the analysis, that the constant factors on quick 00:31:26.000 --> 00:31:29.000 sort are 00:31:29.000 --> 00:31:33.000 noticeably better when compared to merge sort, by a factor of 4, 00:31:33.000 --> 00:31:35.000 moving things more quickly and not having to 00:31:35.000 --> 00:31:39.000 mess with them again. Quick 00:31:39.000 --> 00:31:41.000 sort [inaudible] 00:31:41.000 --> 00:31:44.000 in totally random order? 00:31:44.000 --> 00:31:48.000 Yes, they are. Yes - so it doesn't have them taking - so this is actually using a classic quick 00:31:48.000 --> 00:31:52.000 sort with no degenerate protection on random data. 00:31:52.000 --> 00:31:55.000 If I had put in sorted data in that case, you would definitely see 00:31:55.000 --> 00:31:58.000 numbers comparable to, like, selection sort's eight hours in that 00:31:58.000 --> 00:32:00.000 last slot, right. Or 00:32:00.000 --> 00:32:01.000 you 00:32:01.000 --> 00:32:03.000 [inaudible] degenerate protection, and it would probably 00:32:03.000 --> 00:32:07.000 have, in the noise, a small slowdown for that little extra work that's being 00:32:07.000 --> 00:32:11.000 done, but then you'd be getting N-log-N performance reliably across 00:32:11.000 --> 00:32:13.000 all states of inputs. So 00:32:13.000 --> 00:32:15.000 [inaudible] protection as it 00:32:15.000 --> 00:32:19.000 starts to even out time-wise with merge sort, does it still - No, it doesn't.
00:32:19.000 --> 00:32:23.000 It actually is an almost imperceptible change in the time because it depends on 00:32:23.000 --> 00:32:26.000 which form of it you use, because if you use, for example, the random or median of three, 00:32:26.000 --> 00:32:31.000 the amount of work you're doing is about two more things per level in the tree, and 00:32:31.000 --> 00:32:34.000 so it turns out that, yeah, 00:32:34.000 --> 00:32:37.000 over the space of log N it's just not enough to even be noticeable. 00:32:37.000 --> 00:32:41.000 Now if you did the full median algorithm, you could actually start to see it because then 00:32:41.000 --> 00:32:45.000 you'd see both a linear partition step and a linear median step and that 00:32:45.000 --> 00:32:47.000 would actually raise the constant factors to where it would probably be closer in line to 00:32:47.000 --> 00:32:49.000 merge sort. Pretty 00:32:49.000 --> 00:32:58.000 much nobody writes it that way is the truth, but you could kind of in theory; that's why we [inaudible]. What I'm 00:32:58.000 --> 00:32:59.000 going to show you, kind 00:32:59.000 --> 00:33:02.000 of motivate this by starting with something simple, and then we're gonna move towards kind of building 00:33:02.000 --> 00:33:04.000 the 00:33:04.000 --> 00:33:05.000 fully generic, 00:33:05.000 --> 00:33:06.000 00:33:06.000 --> 00:33:08.000 you know, one- 00:33:08.000 --> 00:33:10.000 algorithm-does-it-all 00:33:10.000 --> 00:33:12.000 sorting function. 00:33:12.000 --> 00:33:16.000 So what I'm gonna first look at is swap, because it's a little bit easier to see it in this case - 00:33:16.000 --> 00:33:19.000 you know, sometimes you need to take two variables and exchange their values.
I mean, 00:33:19.000 --> 00:33:23.000 you need to go through a temporary to copy one and then copy the other back and what 00:33:23.000 --> 00:33:26.000 not - swapping characters, swapping ints, swapping strings, swapping students, you 00:33:26.000 --> 00:33:28.000 know, any kind of variables you wanted to swap, 00:33:28.000 --> 00:33:29.000 you could write 00:33:29.000 --> 00:33:31.000 a specific swap function 00:33:31.000 --> 00:33:35.000 that takes two variables by reference of the integer, string or character type that 00:33:35.000 --> 00:33:36.000 you're trying to exchange 00:33:36.000 --> 00:33:40.000 and then copies one to a temporary and exchanges the two values. 00:33:40.000 --> 00:33:43.000 Because of the way C++ functions work, right, 00:33:43.000 --> 00:33:46.000 you really do need to say, ''It's a string, this is a string, you know, what's being 00:33:46.000 --> 00:33:49.000 declared here is a string,'' and so you might say, ''Well, if I need more than one swap, sometimes one that swaps strings and one for characters, you know, my choices 00:33:49.000 --> 00:33:52.000 are 00:33:52.000 --> 00:33:53.000 basically 00:33:53.000 --> 00:33:58.000 to duplicate and copy and paste and so on.'' 00:33:58.000 --> 00:34:02.000 That's not good; we'd rather not do that. 00:34:02.000 --> 00:34:04.000 But what I want to do is I want to 00:34:04.000 --> 00:34:08.000 discuss what it takes to write something in template form.
00:34:08.000 --> 00:34:12.000 We have been using templates, right - the vector is a template, the set is a 00:34:12.000 --> 00:34:15.000 template, the stack and queue and all these things are templates 00:34:15.000 --> 00:34:17.000 that are written 00:34:17.000 --> 00:34:20.000 in a very generic way using a 00:34:20.000 --> 00:34:24.000 placeholder, so the code that holds onto your collection of things is written 00:34:24.000 --> 00:34:27.000 saying, ''I don't know what the thing is; is it a string; is it an int; is it a car?'' Well, 00:34:27.000 --> 00:34:31.000 vectors just hold onto things and you as a client can decide what you're gonna 00:34:31.000 --> 00:34:33.000 be storing in this particular kind of vector. 00:34:33.000 --> 00:34:37.000 And so what we're gonna see is if I take those swap functions and I try to 00:34:37.000 --> 00:34:41.000 distill them down and say, ''Well, what is it that any swap routine looks like?'' 00:34:41.000 --> 00:34:43.000 It needs to take two variables by reference, it needs 00:34:43.000 --> 00:34:46.000 to declare a temporary of that type and it needs to do the assignment all around 00:34:46.000 --> 00:34:47.000 00:34:47.000 --> 00:34:49.000 to exchange those values. 00:34:49.000 --> 00:34:51.000 Can I write a version 00:34:51.000 --> 00:34:55.000 that is type-unspecified, leaving a placeholder in there for 00:34:55.000 --> 00:34:59.000 what - 00:34:59.000 --> 00:35:00.000 that's really 00:35:00.000 --> 00:35:01.000 kind of amazing - 00:35:01.000 --> 00:35:03.000 what we're gonna be swapping, 00:35:03.000 --> 00:35:05.000 and then 00:35:05.000 --> 00:35:10.000 let there be multiple swaps generated from that one pattern? 00:35:10.000 --> 00:35:11.000 So we're gonna do this actually in 00:35:11.000 --> 00:35:14.000 the compiler because it's always 00:35:14.000 --> 00:35:16.000 nice to see a little bit of code happening.
So I 00:35:16.000 --> 00:35:19.000 go over here 00:35:19.000 --> 00:35:20.000 and 00:35:20.000 --> 00:35:24.000 I got some code up there that 00:35:24.000 --> 00:35:26.000 I'm just currently gonna [inaudible] because I don't want to deal 00:35:26.000 --> 00:35:30.000 with it right now. We're gonna see it in a second. It doesn't [inaudible]. Okay, 00:35:30.000 --> 00:35:41.000 so your usual swap looks like this. 00:35:41.000 --> 00:35:44.000 And that would only swap exactly integers. If I wanted characters I'd 00:35:44.000 --> 00:35:49.000 change it all around. So what I'm gonna change it to is a template. I'm gonna add a template 00:35:49.000 --> 00:35:50.000 header at the top, 00:35:50.000 --> 00:35:53.000 and so the template header starts with the keyword template and then angle brackets that 00:35:53.000 --> 00:35:54.000 say, 00:35:54.000 --> 00:35:59.000 ''What kind of placeholders does this template depend on?'' It depends on one type name, 00:35:59.000 --> 00:36:00.000 and I've chosen the name 00:36:00.000 --> 00:36:02.000 capital-T Type for it. 00:36:02.000 --> 00:36:04.000 That's a choice that I'm going to make, and it says 00:36:04.000 --> 00:36:07.000 in the body of the function that's coming up, 00:36:07.000 --> 00:36:08.000 where I would have 00:36:08.000 --> 00:36:11.000 ordinarily fully committed on the type, I'm gonna use this placeholder 00:36:11.000 --> 00:36:13.000 capital-T Type 00:36:13.000 --> 00:36:22.000 instead. 00:36:22.000 --> 00:36:25.000 And now I have written swap in such a way that it doesn't say for sure 00:36:25.000 --> 00:36:29.000 what's being exchanged - two strings, two ints, two doubles.
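The swap template being described looks something like the sketch below (capitalized `Swap` here so it cannot collide with `std::swap`; the exact name in the lecture's code may differ): one template header, one placeholder type name, and the usual three-step exchange through a temporary.

```cpp
#include <string>

// The type-unspecified swap: the template header introduces the
// placeholder name Type, and the body uses it wherever a concrete
// type would ordinarily appear.
template <typename Type>
void Swap(Type& one, Type& two) {
    Type tmp = one;   // copy one value aside into a temporary
    one = two;        // overwrite the first with the second
    two = tmp;        // overwrite the second with the saved copy
}
```

Calling `Swap(a, b)` on two ints instantiates the `Type = int` version; calling it on two strings stamps out a second, separate version from the same pattern.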
They all 00:36:29.000 --> 00:36:33.000 have to be matching in this form, and I said, ''Well, there's a placeholder, and the placeholder 00:36:33.000 --> 00:36:36.000 is gonna be filled in by the client who uses this swap, 00:36:36.000 --> 00:36:41.000 and on demand, we can instantiate as many versions of the swap function as we need.'' 00:36:41.000 --> 00:36:45.000 If you need to swap integers, you can instantiate a swap that then fills in the 00:36:45.000 --> 00:36:48.000 placeholder type with ints all the way through, 00:36:48.000 --> 00:36:51.000 and then I can use that same template or pattern to generate one that will swap 00:36:51.000 --> 00:36:53.000 characters or swap strings or swap whatever. 00:36:53.000 --> 00:37:02.000 So if I go down to my main, maybe I'll just pull my main up [inaudible]. And 00:37:02.000 --> 00:37:05.000 I will show you how I can do this. I can say 00:37:05.000 --> 00:37:08.000 int one equals 00:37:08.000 --> 00:37:13.000 54, two equals 32, and I say swap of one and two. 00:37:13.000 --> 00:37:17.000 So the usage of a function template is a little bit different 00:37:17.000 --> 00:37:19.000 than it was for 00:37:19.000 --> 00:37:21.000 class templates; 00:37:21.000 --> 00:37:25.000 in class templates we always did this thing where I said 00:37:25.000 --> 00:37:29.000 the name of the template and then the angle brackets, as I said.
And in particular, I want 00:37:29.000 --> 00:37:33.000 the swap function that operates where the template parameter has been filled in 00:37:33.000 --> 00:37:35.000 with int. 00:37:35.000 --> 00:37:38.000 But in the case of functions, it turns out it's a little bit easier for the compiler 00:37:38.000 --> 00:37:39.000 to know what you intended, 00:37:39.000 --> 00:37:42.000 that on the basis of what my arguments are to this - 00:37:42.000 --> 00:37:45.000 if I called swap passing two integers, 00:37:45.000 --> 00:37:49.000 it knows that there's only one possibility for what version of swap you were interested 00:37:49.000 --> 00:37:50.000 in, 00:37:50.000 --> 00:37:54.000 that it must be the one where type has been filled in with int, and that's 00:37:54.000 --> 00:37:56.000 the one to generate for you. 00:37:56.000 --> 00:37:59.000 So you can use that long form of saying swap, angle bracket int, 00:37:59.000 --> 00:38:03.000 but typically you will not; you will let the compiler infer it for you on the usage, 00:38:03.000 --> 00:38:05.000 so based on the arguments you passed, 00:38:05.000 --> 00:38:10.000 it will figure out how to kind of match them to what swap said it was taking, and 00:38:10.000 --> 00:38:13.000 if it can't match them, for example, if you pass one double and one int, you'll get compiler 00:38:13.000 --> 00:38:14.000 errors.
00:38:14.000 --> 00:38:19.000 But if I've done it correctly, then it will generate for me a swap 00:38:19.000 --> 00:38:23.000 where type's been filled in with int and if I call it again, 00:38:23.000 --> 00:38:26.000 passing different kinds of things, so having 00:38:26.000 --> 00:38:29.000 string S equals hello and T 00:38:29.000 --> 00:38:31.000 equals 00:38:31.000 --> 00:38:33.000 world, 00:38:33.000 --> 00:38:37.000 that I can write swap S and T and now 00:38:37.000 --> 00:38:40.000 the compiler generated for me two different versions of swap, right, 00:38:40.000 --> 00:38:43.000 one for integers, one for 00:38:43.000 --> 00:38:46.000 strings on the [inaudible] and I didn't do anything with them. I put them [inaudible], so nothing to see there, 00:38:46.000 --> 00:38:47.000 but showing 00:38:47.000 --> 00:38:50.000 it does compile and build up. That's 00:38:50.000 --> 00:38:53.000 a pretty neat idea. 00:38:53.000 --> 00:38:55.000 Kind of important because it turns out there are 00:38:55.000 --> 00:38:59.000 a lot of pieces of code that you're gonna find yourself wanting to write that don't 00:38:59.000 --> 00:39:02.000 depend, in a case like swap, swap doesn't care what it's exchanging, that they're Ns, that 00:39:02.000 --> 00:39:06.000 they're strings, that they're doubles. The way you swap two things 00:39:06.000 --> 00:39:09.000 is the same no matter what they are. 
You copy one aside; you overwrite one; 00:39:09.000 --> 00:39:11.000 you overwrite the other, 00:39:11.000 --> 00:39:14.000 and the same thing applies to much more algorithmically interesting problems 00:39:14.000 --> 00:39:15.000 like 00:39:15.000 --> 00:39:19.000 searching, like sorting, like removing duplicates, like printing, 00:39:19.000 --> 00:39:23.000 where you want to take a vector and print all of its contents, right - that 00:39:23.000 --> 00:39:24.000 the 00:39:24.000 --> 00:39:27.000 steps you need to do to print a vector of ints look just like the steps you need to do to print a vector 00:39:27.000 --> 00:39:31.000 of strings. And so it would be very annoying every time I need to do that to 00:39:31.000 --> 00:39:33.000 duplicate that code, so there's a real appeal toward 00:39:33.000 --> 00:39:35.000 writing it once, as a template, 00:39:35.000 --> 00:39:37.000 and using it 00:39:37.000 --> 00:39:41.000 in multiple situations. So 00:39:41.000 --> 00:39:44.000 this shows that if I swap, right, 00:39:44.000 --> 00:39:47.000 it infers what I meant, 00:39:47.000 --> 00:39:50.000 and then this thing basically just shows what happened is when I said 00:39:50.000 --> 00:39:53.000 swap of one and two, 00:39:53.000 --> 00:39:56.000 it used that pattern to generate the swap for int 00:39:56.000 --> 00:39:59.000 where the type parameter, that placeholder, has been 00:39:59.000 --> 00:40:01.000 filled in 00:40:01.000 --> 00:40:05.000 and established that it's int for this particular version, which is distinct from 00:40:05.000 --> 00:40:10.000 the swap for characters and swap for strings and whatnot. So 00:40:10.000 --> 00:40:13.000 let me show you that version I talked about, print, and for this one I'm 00:40:13.000 --> 00:40:15.000 gonna go back to the 00:40:15.000 --> 00:40:16.000 compiler for just a second.
00:40:16.000 --> 00:40:19.000 Let's say I want to print a vector. Printing a vector is like iterating over all 00:40:19.000 --> 00:40:22.000 the members and using the 00:40:22.000 --> 00:40:23.000 stream insertion to print them out 00:40:23.000 --> 00:40:26.000 and so here it is taking some vector 00:40:26.000 --> 00:40:29.000 where the elements in it are of this type name. In this case, I've just used T as 00:40:29.000 --> 00:40:31.000 my short name here. 00:40:31.000 --> 00:40:31.000 The iterator 00:40:31.000 --> 00:40:35.000 looks the same; the bracketing looks the same. I put the endl at the end and I'd 00:40:35.000 --> 00:40:38.000 like to be able to print vectors of strings, vectors of ints, vectors of 00:40:38.000 --> 00:40:39.000 doubles 00:40:39.000 --> 00:40:43.000 with this one piece of code. So if I call print vector and I pass it a vector of ints, 00:40:43.000 --> 00:40:43.000 all's good; 00:40:43.000 --> 00:40:47.000 vector of doubles, all's good; vector of strings, all's good. 00:40:47.000 --> 00:40:50.000 Here's where I could get a little bit into trouble here. 00:40:50.000 --> 00:40:52.000 I've made this [inaudible] coord that has an X 00:40:52.000 --> 00:40:57.000 and a Y field. I make a vector that contains these coords. 00:40:57.000 --> 00:41:00.000 And then I try to call print vector of C. 00:41:00.000 --> 00:41:03.000 So when I do this, right, 00:41:03.000 --> 00:41:06.000 it will instantiate a version of print vector 00:41:06.000 --> 00:41:11.000 where the T has been matched up to, oh, it's coords that are in there; you've got a vector 00:41:11.000 --> 00:41:12.000 of coords. 00:41:12.000 --> 00:41:15.000 And so okay, we'll go through and iterate over the size of the vector 00:41:15.000 --> 00:41:18.000 and this line is trying to say cout 00:41:18.000 --> 00:41:21.000 of a coord type.
And 00:41:21.000 --> 00:41:22.000 structs 00:41:22.000 --> 00:41:26.000 don't automatically know how to output themselves onto a stream, 00:41:26.000 --> 00:41:30.000 and so at this point when I try to instantiate, so the code in print vector is 00:41:30.000 --> 00:41:33.000 actually fine and works for a large number of types, all the primitive types 00:41:33.000 --> 00:41:35.000 will be fine, 00:41:35.000 --> 00:41:36.000 string and things like that are fine. 00:41:36.000 --> 00:41:40.000 But if I try to use it for a type for which it doesn't have this output 00:41:40.000 --> 00:41:43.000 behavior, right, I will get a compiler error, 00:41:43.000 --> 00:41:46.000 and it may come as a surprise because print vector appeared to be working fine in all the other 00:41:46.000 --> 00:41:49.000 situations and all of a sudden it seems like a new error has cropped up, 00:41:49.000 --> 00:41:51.000 but that error came from 00:41:51.000 --> 00:41:54.000 the instantiation of a particular version, in this case, that suddenly ran 00:41:54.000 --> 00:41:58.000 afoul of what the template expected of the type. 00:41:58.000 --> 00:42:02.000 The message you get, let's say in Xcode, looks like this: no match for 00:42:02.000 --> 00:42:08.000 operator less-than less-than in standard cout, vector element, operator [inaudible]. 00:42:08.000 --> 00:42:12.000 So it's trying to give you a little clue, but in its very cryptic C++ way, of what 00:42:12.000 --> 00:42:15.000 you've done wrong. 00:42:15.000 --> 00:42:18.000 So it comes back to, what the template has to be a little bit clear about 00:42:18.000 --> 00:42:20.000 from a client point of view is to say, well, 00:42:20.000 --> 00:42:24.000 what is it that needs to be true.
Will it really work for all types or is there 00:42:24.000 --> 00:42:27.000 something special about the kind of things that actually need to be true 00:42:27.000 --> 00:42:29.000 about what's filled in there with a placeholder to make the rest of the code 00:42:29.000 --> 00:42:31.000 work correctly. 00:42:31.000 --> 00:42:33.000 So if it uses stream insertion 00:42:33.000 --> 00:42:36.000 or it compares two elements using equals equals, right, 00:42:36.000 --> 00:42:40.000 something like a struct doesn't by default have those behaviors, and so you 00:42:40.000 --> 00:42:44.000 could have a template that would work for primitive types or types that have these 00:42:44.000 --> 00:42:47.000 operations declared, but wouldn't work when 00:42:47.000 --> 00:42:51.000 you gave it a more complex type that didn't have that 00:42:51.000 --> 00:42:53.000 support in there. 00:42:53.000 --> 00:42:55.000 So I have that piece of code over here. I can just 00:42:55.000 --> 00:42:59.000 show it to you. I just took it down here. 00:42:59.000 --> 00:43:05.000 This is print vector and 00:43:05.000 --> 00:43:07.000 it's a little bit of 00:43:07.000 --> 00:43:10.000 template. 00:43:10.000 --> 00:43:11.000 And 00:43:11.000 --> 00:43:14.000 if I were to have a vector 00:43:14.000 --> 00:43:17.000 declared of ints, 00:43:17.000 --> 00:43:20.000 I say print vector, V, 00:43:20.000 --> 00:43:24.000 this compiles; there's nothing to print. It turns out it'll just print an empty line because 00:43:24.000 --> 00:43:32.000 there's nothing in there, but if I change this to be coordT, 00:43:32.000 --> 00:43:37.000 my build is failing, right, and the error message I'm getting here is no match for 00:43:37.000 --> 00:43:38.000 trying to 00:43:38.000 --> 00:43:41.000 output that thing and it tells me all the things I can output on a stream.
Well, it could be that you could output a 00:43:41.000 --> 00:43:44.000 [inaudible] but it's not one of those things, right; coordT 00:43:44.000 --> 00:43:48.000 doesn't have 00:43:48.000 --> 00:43:51.000 a built-in behavior for outputting to a stream. 00:43:51.000 --> 00:43:53.000 So this is going to come back to be kind of important because I'm gonna 00:43:53.000 --> 00:43:54.000 keep going here, 00:43:54.000 --> 00:43:58.000 is that 00:43:58.000 --> 00:44:00.000 when we get to the sort, right, when 00:44:00.000 --> 00:44:02.000 we try to build this thing, we're gonna definitely be depending on 00:44:02.000 --> 00:44:05.000 something being true about the elements, right, 00:44:05.000 --> 00:44:07.000 that's gonna constrain what the type could be. 00:44:07.000 --> 00:44:10.000 So this is selection sort in 00:44:10.000 --> 00:44:14.000 its ordinary form, nothing special about it other than the fact that I 00:44:14.000 --> 00:44:17.000 changed it so instead of sorting ints, it's sorting a vector with a 00:44:17.000 --> 00:44:18.000 placeholder type. So it's [inaudible] with 00:44:18.000 --> 00:44:23.000 the template typename header, vector of type, and then in here, 00:44:23.000 --> 00:44:26.000 operations like this really are accessing a type 00:44:26.000 --> 00:44:28.000 and another type 00:44:28.000 --> 00:44:30.000 variable out of 00:44:30.000 --> 00:44:33.000 the vector that was passed and then comparing them, which is smaller, to decide 00:44:33.000 --> 00:44:34.000 which index. 00:44:34.000 --> 00:44:36.000 It's using a swap 00:44:36.000 --> 00:44:39.000 and so the generic swap would also be necessary for this, so that we 00:44:39.000 --> 00:44:42.000 have a swap for any kind of things we want to exchange.
We have a template 00:44:42.000 --> 00:44:44.000 up there for swap, 00:44:44.000 --> 00:44:47.000 then the sort can build on top of that and use that as part of its workings, which is then able 00:44:47.000 --> 00:44:48.000 to count on 00:44:48.000 --> 00:44:52.000 swap, that any two things can be swapped using the same strategy. 00:44:52.000 --> 00:44:55.000 And this is actually where templates really start to shine. Like swap in itself isn't that 00:44:55.000 --> 00:44:56.000 00:44:56.000 --> 00:44:59.000 stunning of an example, because you think, whatever, who cares, it's three lines of code, but every 00:44:59.000 --> 00:45:00.000 00:45:00.000 --> 00:45:01.000 little bit does count, 00:45:01.000 --> 00:45:04.000 but once we get to things like sorting where we could build a really 00:45:04.000 --> 00:45:05.000 killer, 00:45:05.000 --> 00:45:09.000 totally tuned, really high performance, you know, protection against a [inaudible] 00:45:09.000 --> 00:45:09.000 quick sort, 00:45:09.000 --> 00:45:12.000 then we want to be sure that we can use it in all the places we need to sort. I don't want to 00:45:12.000 --> 00:45:15.000 make it sort just strings and later when I need to sort integers, have to 00:45:15.000 --> 00:45:18.000 copy and paste it and change string everywhere, 00:45:18.000 --> 00:45:21.000 and then if I find a bug in one and forget to change it in the other, I have 00:45:21.000 --> 00:45:23.000 the bug still lurking there.
00:45:23.000 --> 00:45:25.000 And so template functions are generally used for 00:45:25.000 --> 00:45:29.000 all kinds of algorithmically interesting things, sorting and searching and removing 00:45:29.000 --> 00:45:32.000 duplicates and permuting and finding the mode 00:45:32.000 --> 00:45:33.000 and 00:45:33.000 --> 00:45:36.000 shuffling and all these things where, okay, no matter what the type of 00:45:36.000 --> 00:45:38.000 thing you're working on, 00:45:38.000 --> 00:45:41.000 the algorithm for how you do it looks the same whether it's ints or 00:45:41.000 --> 00:45:43.000 strings or whatever. So 00:45:43.000 --> 00:45:46.000 we got this guy 00:45:46.000 --> 00:45:46.000 00:45:46.000 --> 00:45:49.000 and as is, right, we could 00:45:49.000 --> 00:45:53.000 push vectors of ints in there, vectors of strings in there, and the right thing 00:45:53.000 --> 00:45:57.000 would work for those without any changes to it, 00:45:57.000 --> 00:45:59.000 but it does have a lurking 00:45:59.000 --> 00:46:02.000 constraint on what must be true about the kind of elements that we're 00:46:02.000 --> 00:46:04.000 trying to sort here. 00:46:04.000 --> 00:46:05.000 Not every type will work. If 00:46:05.000 --> 00:46:07.000 I go back to my coord example, 00:46:07.000 --> 00:46:09.000 this X-Y struct, 00:46:09.000 --> 00:46:13.000 if I have a vector of coord and I pass it to sort, and if I go back and look at the 00:46:13.000 --> 00:46:14.000 code, 00:46:14.000 --> 00:46:17.000 you think about instantiating this with that vector, it's all coord in here, all the way 00:46:17.000 --> 00:46:18.000 through here, 00:46:18.000 --> 00:46:22.000 this part's fine; this part's fine; this part's fine, but suddenly you get to this line and 00:46:22.000 --> 00:46:25.000 that's gonna be taking two coordinate structures and saying is this coordinate 00:46:25.000 --> 00:46:27.000 structure less than another. 00:46:27.000 --> 00:46:30.000 And that code's not gonna compile. Yes.
Back when we were doing sets, we were able to explicitly 00:46:30.000 --> 00:46:36.000 say how this works. Yeah, so you're right where I'm gonna be, but I'm probably 00:46:36.000 --> 00:46:40.000 not gonna finish today, I'll have to finish up on Wednesday. We're gonna do exactly what 00:46:40.000 --> 00:46:43.000 Seth did, and what did Seth do? Comparison function. 00:46:43.000 --> 00:46:46.000 That's right, he used a comparison function, so you have to have some coordination here where you say, 00:46:46.000 --> 00:46:49.000 well, instead of trying to use the operator less than on these things, 00:46:49.000 --> 00:46:53.000 how about the client tell me, as part of calling sort, why don't you give me the 00:46:53.000 --> 00:46:57.000 information in the form of a function that says call me back and tell me are 00:46:57.000 --> 00:46:59.000 these things less than 00:46:59.000 --> 00:47:02.000 and so on and how to order them. So that's exactly what we need to do, but I can't 00:47:02.000 --> 00:47:05.000 really show you the code right now, but I'll just kind of like 00:47:05.000 --> 00:47:09.000 flash it up there and we will talk about it on Wednesday, what changes make 00:47:09.000 --> 00:47:12.000 that happen. We'll pick up 00:47:12.000 --> 00:47:22.000 with Chapter 8, actually after the midterm, in terms of 00:47:22.000 --> 00:47:23.000 reading.