This presentation is delivered by the Stanford Center for Professional Development.

Okay. So the last thing we talked about at the end of merge sort was comparing a quadratic, N-squared sort to the linearithmic, N log N kind that merge sort runs in, showing you that merge sort really outperforms it even on small values, and as the values get large the gap becomes enormous; what you can accomplish with an N log N algorithm is vastly superior to what you're going to get out of an N-squared algorithm. I told you, without proof, that N log N is the best we can do for a comparison-based sort; you'll have to take it on faith that I wouldn't lie to you. But what we're going to look for is a way of maybe improving on merge sort: one thing we might want to do is avoid the copying of the data out and back that merge sort does, and maybe get lower constant factors in the process that can bring our times down even further. So the sort we're looking at is quicksort.
And I think if you were going to come up with an algorithm name, naming it quicksort is good marketing; it inspires you to believe it's actually going to live up to its name. It's a divide-and-conquer algorithm that takes a strategy like merge sort's in the abstract: divide into two pieces, recursively sort those pieces, and then join them back together. But whereas merge sort does an easy split and a hard join, this one flips that on its head and says: what if we did some work up front, in the splitting process, so there may be less to do in the joining step? So the strategy for quicksort is to run through the pile of papers, or whatever we're working on, in a partitioning step; that's the first thing it does, and that's the splitting step. The partitioning is designed to move stuff into the lower half and the upper half: having some idea of what the middle-most value would be, it walks through the pile of papers and distributes them to the left and right portions based on their comparison to that middle-most element.
Once I have those two stacks, A through M over here and N through Z over there, recursively sorting those takes place, assuming my delegates do their work, and the process of rejoining them is really simple: they just come together with no extra work, no looking, no processing. So the one place where all the trickiness comes in is this idea of how we do the partitioning. I was being a little disingenuous when I said, well, if I had this stack of exam papers and I know the names range over the alphabet A through Z, then I know where M is and I can say, well, that's a good midpoint to divide around. Given that we want to make this work for any sort of data, we're not going to magically know the middle-most value. If we have a bunch of numbers, are they test scores, in which case maybe 50 is a good place to divide? Or are they instead population counts of states, in which case millions would be a better number to pick to divide them up? So we have to come up with a strategy that's going to work no matter what the input is, that somehow picks something reasonable for how we're going to do these divisions. The element we pick is called the pivot.
So given this region, this set of things to sort, how do I know what's a reasonable pivot? Something close to the median is actually ideal. If we could walk through the elements and compute the median, that would be the best we could do; we're going to try to get around doing that much work, though. We're going to make an approximation, and a really very simplistic choice of approximation at that; we'll come back and revisit this a little bit later. I'll just take the first paper off the stack. If the papers are in random order, I look at the first one and say, okay, it's somebody named King. Well, everybody less than King goes over here, everybody greater than King goes over there. Now King may not be perfectly in the middle, and we're going to come back to see what that does to the analysis, but at least we know it's in the range of the values we're looking at, so it at least did that for us. And it means that no matter what our data is, we can always use it; it's a strategy that will always work. Just take the first thing we see, use it as the pivot, and then slot everything else left and right.
So let me go through what the code actually does. This is another of those cases where the code for partition is a little bit frightening, and in the analysis we're doing, the point is not to get really, really worried about the exact details of the less-than versus the less-than-or-equal-to. I'm going to talk you through what it does, but the more important takeaway is the algorithm behind quicksort: what strategy it uses to divide and conquer, and how that works out. The main quicksort algorithm looks like this. We're given the vector of things we're trying to sort, plus a start index and a stop index that identify the subregion of the vector we're currently trying to sort; I'll use these in subsequent calls to identify which piece we're recursively focusing on. As long as there's a positive difference between the start and the stop, so there are at least two elements to sort, we go through the process of doing the division and the sorting. So it makes the call to partition, saying: partition this subregion of the array (we'll come back and talk about that code in a second), and the thing we get back from partition is the index at which the pivot was placed.
After partition has done all its work, it has moved everything less than the pivot to the left of that index and everything greater than the pivot to the right, so the pivot's position tells us where the division is. Then we make two recursive calls: one to sort everything up to but not including the pivot, and one to sort everything past the pivot to the end. Those calls operate on the array in place, so we're not copying it out and copying it back. In fact, there isn't really any join step here at all: we sort the four things on the left, sort the seven things on the right, and the joining is trivial because everything is already where it needs to be, so there's nothing left to do in the join. The tricky part is certainly in partition. The version of partition we're using here takes two indices, two fingers, you might imagine, and walks one in from the stop position downward and one from the start position up, looking for elements that are on the wrong side.
So we start with our right-hand finger on the very end of the array. That first loop in there is designed to move the right hand down, looking for something that doesn't belong on the right side. We identified 45 as the pivot value, so it says: find something that's not greater than 45. That's the first thing we find coming in from the right, moving downward, that doesn't belong, and in this case it takes just one iteration to find it. So okay, this guy doesn't belong on the right-hand side. Now it does the inverse on the left-hand side: try to find something that doesn't belong on the left. So the left hand moves up; in this case it took two iterations to get there, to this 92. So now we have a perfect opportunity to fix both our problems with one swap: exchange what's at the current position of the left hand with what's at the right hand, and both of our problems go away; we have more things on the left and on the right that belong there. Then we can walk inward from there to find subsequent ones that are out of place.
So those two get exchanged, and now the right hand moves in again to find that 8, the left hand moves over to find that 74, and we swap again. We just keep doing this until the fingers cross. Once they've crossed, we know that everything the right hand scanned over is greater than the pivot and everything the left hand scanned over is less than it, and at that point we've divided it: everything small is in the left side of the vector, everything large is in the right side. So I swap these last two, and now it's big things here, small things there. The last thing we do is move the pivot into the slot right between them, exactly where the fingers crossed, and we know it's located in its final position. Now I have two subarrays to work on. The return value from partition in this case is the index four, and then we say: okay, go ahead and quicksort this front-most part, and then come back and do the second part. In the recursion, think of that second part as postponed, waiting; we'll get back to it in a minute, but while we work on this piece we'll do the same thing.
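As a sketch, here is one plausible rendering in C++ of the driver and the two-finger partition just described. This is my reconstruction, not the lecture's exact code; the names `quickSort` and `partitionRange`, and the use of `std::vector<int>` with inclusive start/stop indices, are my assumptions.

```cpp
#include <algorithm>
#include <vector>

// Two-finger partition: pivot is the first element of the subregion.
// A left finger scans up past small elements, a right finger scans
// down past large ones; out-of-place pairs are swapped until the
// fingers cross, then the pivot drops into the gap between them.
int partitionRange(std::vector<int> &v, int start, int stop) {
    int pivot = v[start];
    int lh = start + 1;                          // left "finger"
    int rh = stop;                               // right "finger"
    while (true) {
        while (lh < rh && v[rh] >= pivot) rh--;  // scan down from the right
        while (lh < rh && v[lh] < pivot) lh++;   // scan up from the left
        if (lh == rh) break;                     // fingers met: scanning done
        std::swap(v[lh], v[rh]);                 // fix two problems with one swap
    }
    if (v[lh] >= pivot) return start;            // pivot was already the smallest
    v[start] = v[lh];                            // move boundary element out of the way
    v[lh] = pivot;                               // drop pivot where the fingers met
    return lh;                                   // pivot's final index
}

// The driver: partition first, then recursively sort each side in place.
// There is no join step; everything is already where it belongs.
void quickSort(std::vector<int> &v, int start, int stop) {
    if (stop > start) {                          // at least two elements remain
        int boundary = partitionRange(v, start, stop);
        quickSort(v, start, boundary - 1);       // everything left of the pivot
        quickSort(v, boundary + 1, stop);        // everything right of the pivot
    }
}
```

Calling `quickSort(v, 0, v.size() - 1)` sorts the whole vector in place; note the pivot is simply `v[start]`, the first element of the subregion, exactly the simplistic choice discussed above.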
It's a little bit harder to see what's going on at this small scale, but in this case, because the pivot was 8, it divided that piece into 8 and the three things greater than the pivot. Here the 41 is the pivot, so it moves 41 into place and works on what's to the left of it. As it keeps working its way down, the pieces get smaller and smaller, and it's harder to see the workings of the algorithm rearranging such small cases, but we'll get back to the big left half in a second. And now that that's all been sorted, we revisit the side above the pivot to get these six guys in order, taking the same strategy: pivot around 67, do some swapping, and so on. We'll see it in a slightly bigger case to get the idea, but already there's something very interesting about the way quicksort works: that partition step very quickly gets things close to the right place. And so now that we've done both the left and the right, we're done. The whole thing is done.
Everything got moved into the right position as part of the recursive sorting and doesn't need to be moved further to solve the whole problem. That first partition step does a lot of throwing things left and right, but it's actually quickly moving big things and small things that are out of place closer to where they're eventually going to need to go, and that turns out to be a real advantage in terms of the running time of this sort, because it does some quick movement close to the final position and then fixes things up in the recursive calls that examine the smaller subsections of the array. Let me show you what it sounds like, because that's really what you want to know. You can see the big partitioning happening, then it jumbles things up and comes back to revisit them, but you'll notice that quite quickly all the small elements get thrown to the left and all the large elements to the right, and then there's the process of coming back: the big partition that you can see working across them, and then the noodling down in the details. I'll turn the sound on for it, and I'll probably take the volume down a little bit.
So the big partition steps make a lot of noise, moving things quickly, and it almost sounds random; it's hard to identify what's going on during the big partition. But as you hear it make its way down the recursive tree, it focuses on smaller and smaller regions where the numbers have all been gathered because they're similar in value, and you hear a lot of noodling in the same pitch range as it works on the smaller and smaller subarrays. And they definitely go from low to high, because the recursion chooses the left side to operate on before the right side, so you hear the work being done in the lower tones before it gets to finishing up the high tones. Kind of cool. Let's come back here and talk about how to analyze this guy a little bit. The main part of the algorithm is very simple, deceptively simple, because all the hard work is actually done in partition. The partition step is linear: if you can just go along with me conceptually, you'll see that we're moving this left finger in, we're moving this right finger in, and we stop when they cross, so that means every element was looked at.
One finger or the other passes over each position, and eventually every element gets looked at as they work their way in, with some swaps along the way. So the partition process is linear in the number of elements: if there are a thousand of them, one finger might advance 400 and the other 600, but all the elements get looked at in the process of partitioning them into the lower or upper half. Then it makes two calls, quicksort on the left and the right portions of that. Well, we don't know exactly what that division was: was it even, was it a little lopsided? That's certainly going to come back as something we'll look at. If we assume the ideal 50/50 split, in an ideal world where the choice we had for the pivot happened to be the median, or close enough to the median that effectively we got an even division at every level of recursion, then the recurrence relation for the whole process is: the time required to sort an input of size N is N to partition it, plus twice the time to sort a half, since the two halves are each half as big as the original input. We saw that recurrence relationship for merge sort, and we solved it.
Think in terms of the tree: you have a branching of two, then a branching of two again, with the partitioning being done across each entire level, and it goes to a depth of log N, the number of times you can divide by two before you reach single-element arrays that are easily handled. So it is an N log N sort in this perfect best case. That should mean that, in terms of big-O growth curves, merge sort and quicksort look about the same. Quicksort does have better constant factors, though. Let me show you: I'll go back here and run them against each other. If I put quicksort versus merge sort here (stop making noise), quicksort pretty handily manages to beat merge sort; merge sort was on top, quicksort was underneath. And in terms of what was going on, quicksort was doing what looks like more comparisons (that partition step does a lot of comparing) but fewer moves, and that actually does make intuitive sense: quicksort very quickly moves stuff to where it goes and then does a little bit of noodling.
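Written out, the best-case recurrence just described, assuming the pivot gives an even split at every level, is:

```latex
T(N) = N + 2\,T\!\left(\tfrac{N}{2}\right), \qquad T(1) = 1
```

Expanding level by level, each of the $\log_2 N$ levels does a total of $N$ partitioning work, giving $T(N) = O(N \log N)$: the same recurrence, and the same result, as merge sort.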
Merge sort does a lot of moving things away and moving them back. As you watch it run, you'll see a lot of large elements get placed up near the front but then have to be moved again, because that was not their final resting place. So merge sort does a lot of movement away and back that tends to cost it overall: a higher constant factor on the moves than something like quicksort, which more quickly gets each element to the right place and then doesn't move it again. So that was all well and good, but that was assuming everything went perfectly. Given our simplistic choice of the pivot, you can imagine that's really assuming a lot went our way. Let's look at a particularly bad split. Imagine that the number we happen to pick is pretty close to one of the extremes. If I were sorting papers by alphabet and the one on top happened to be a C, well then I'm going to divide things pretty lopsidedly: just the As and the Bs on one side, and everything from C to the end on the other side.
Suppose I got a split that was about 90/10: ninety percent went on one side, ten percent on the other. You would expect that to change the running time of what's going on here. So I get one tenth of the elements over here in the low half and nine tenths in the right half, and let's just say every time, it always takes nine tenths and moves them to the upper half, as if I kept picking a pivot that's artificially close to the front. These small sides will peter out fairly quickly, dividing N by 10 and 10 and 10 again; those are going to bottom out very soon. But there's this one arm where I keep taking 90 percent of what remains: I had 90 percent, then 81 percent, and I keep multiplying by nine tenths, still retaining a lot of elements in that one big portion. So the depth of this tree isn't going to bottom out quite as quickly as it used to, when the number of times you could divide N by 2, the log base 2 of N, was where we landed.
In this case, I take N and multiply it by nine tenths at each level, so instead of one half to the K it's nine tenths to the K, and the question is how many iterations, how deep this gets, before it reaches the simple case. Solving N equals ten ninths to the K means taking the log base ten ninths of both sides, so K, the number of levels, is the log base ten ninths of N. We're relying on a fact of mathematics here, which is that all logarithms of the same value are related: log base A of N and log base B of N differ only by a constant that you can compute if you rearrange the terms. So in effect there's a constant in front of this, the ratio between the logs of ten ninths and of 2, that distinguishes the two, but it's still log base 2 of N in terms of big O, where we can throw those constants away. So even though, experimentally, you would expect this kind of split to make it work more slowly than in a perfect-split situation, the big-O behavior is still going to look linearithmic, N log N.
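To see the depth claim concretely: the long arm of the tree keeps nine tenths of the elements at each level, so after $k$ levels it holds $N(\tfrac{9}{10})^k$ elements, and it bottoms out when that reaches 1:

```latex
N\left(\tfrac{9}{10}\right)^{k} = 1
\;\Rightarrow\; \left(\tfrac{10}{9}\right)^{k} = N
\;\Rightarrow\; k = \log_{10/9} N = \frac{\log_2 N}{\log_2(10/9)} \approx 6.6\,\log_2 N
```

The depth is a constant multiple of $\log_2 N$, so the overall cost is still $O(N \log N)$; only the hidden constant grows.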
So that's pretty good to know. It makes you feel a little bit confident: sometimes it might be getting an even split, sometimes one third and two thirds, sometimes nine tenths, but if it's still dividing off some fraction each time, it's still doing okay. Now let's look at the really, really, really worst-case split. The really worst-case split didn't even take off a fraction; it just managed to separate off one element. Somehow the pivot did a terribly crummy job of dividing at all: in effect, all the elements were greater than the pivot, or all less than it, so they all landed on one side and the pivot sat by itself. So starting with an input of size N, we sectioned off 1 and had N minus 1 remaining. Well, the next time we sectioned off another one, and this happened again and again and again; we just got really bad luck all the way through. Each of these levels is only making progress toward the base case at a snail's pace: one element is being subtracted at each subsequent level, and that's going to go to a depth of N. And so now we're doing N work across the levels; it isn't quite N at every level, because some of them bottom out.
It's still the Gaussian sum in the end: we've got N levels doing on the order of N work each, and what was a linearithmic algorithm is suddenly performing quadratically, in the class of selection sort and insertion sort, in its worst-case behavior. So quite a range. Whereas merge sort is a very reliable performer (merge sort doesn't care; no matter what the order is, no matter what's going on, it's the same amount of work), there are some inputs that can cause quicksort to vary between being linearithmic and being quadratic, and there's a huge difference, as we saw in our earlier charts, in how those will run for large values. So what makes the worst case? Given how we're choosing a pivot right now, taking the first thing in the subsection we're looking at as the pivot, what's the example input that gives you this? Sorted. If it's already sorted. Isn't that a charmer? Here's a sorting algorithm where, if you accidentally sorted twice, having already sorted the data, passing it back to the sort would suddenly make it degenerate into its very worst case.
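The worst-case recurrence just described: each partition costs the size of the current subarray but peels off only the pivot, so

```latex
T(N) = N + T(N-1)
\;\Rightarrow\; T(N) = N + (N-1) + (N-2) + \cdots + 1 = \frac{N(N+1)}{2} = O(N^2)
```

the Gaussian sum, the same quadratic total of work that selection sort and insertion sort do.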
It's already 00:20:03.000 --> 00:20:05.000 sorted, so it would say, 00:20:05.000 --> 00:20:06.000 ''I've got this stack of exam papers.'' 00:20:06.000 --> 00:20:09.000 I look at the first one and I say, ''Oh, look it's Adam's.'' And I say, ''Okay, well, let me 00:20:09.000 --> 00:20:12.000 go find all the ones that belong in the left side.'' None of them do. I said, ''Okay, 00:20:12.000 --> 00:20:15.000 well put Adam's over here.'' I'll [inaudible] sort that. That was easy. 00:20:15.000 --> 00:20:18.000 Oh, let me look at what I've got left, oh with baker on the top. 00:20:18.000 --> 00:20:21.000 Okay, well let me put Baker by itself, but could we find the other ones? Oh, no, 00:20:21.000 --> 00:20:24.000 no more. It would just you 00:20:24.000 --> 00:20:27.000 know, continue because I'm doing this thing, looking at all N, looking at all N minus 1, looking 00:20:27.000 --> 00:20:31.000 at N minus 2, all the way down to the bottom making no, 00:20:31.000 --> 00:20:35.000 recognizing nothing about how beautifully the data already was arranged 00:20:35.000 --> 00:20:36.000 and 00:20:36.000 --> 00:20:38.000 you know, getting its worst case. 00:20:38.000 --> 00:20:42.000 There's actually a couple of others that come in, too. That's probably the most obvious one to think 00:20:42.000 --> 00:20:46.000 of is it's coming in in increasing order. It's also true if its in decreasing order. 00:20:46.000 --> 00:20:49.000 If I happen to have the largest value on the top, then 00:20:49.000 --> 00:20:52.000 I'll end up splitting it all into, everything to the left side and nothing 00:20:52.000 --> 00:20:53.000 to the right. 00:20:53.000 --> 00:20:58.000 There are actually other variations that will also produce this. 
If at any given 00:20:58.000 --> 00:21:01.000 stage that we're looking at a sub array, if the first value happens to be the extreme 00:21:01.000 --> 00:21:04.000 of what remains, whether it's the smallest or the largest, and it could alternate, which would 00:21:04.000 --> 00:21:06.000 be kind of a funny pattern, but if you 00:21:06.000 --> 00:21:10.000 have the tallest and the shortest and the tallest and then another tallest and then a shortest, 00:21:10.000 --> 00:21:13.000 so those would be a little bit harder to describe, but 00:21:13.000 --> 00:21:18.000 there are some other alternatives that produce this bad result. 00:21:18.000 --> 00:21:21.000 All of these we could call degenerate cases. 00:21:21.000 --> 00:21:25.000 There's a small number of these relative to the N factorial [inaudible] your 00:21:25.000 --> 00:21:28.000 data could come in. In fact, that's a huge number, and so there's lots and 00:21:28.000 --> 00:21:29.000 lots of ways it could be arranged. 00:21:29.000 --> 00:21:32.000 There's a very small number of them that would be arranged in such a way to 00:21:32.000 --> 00:21:35.000 trigger this very, very worst-case behavior. 00:21:35.000 --> 00:21:38.000 Some are close to worst case, like if they were almost sorted, they might every now and 00:21:38.000 --> 00:21:41.000 then get a good split, but mostly a bad split, 00:21:41.000 --> 00:21:44.000 but the number that actually degenerate to the absolute worst case is actually 00:21:44.000 --> 00:21:47.000 quite small. But 00:21:47.000 --> 00:21:49.000 the truth is, 00:21:49.000 --> 00:21:51.000 do we expect that those inputs are somehow 00:21:51.000 --> 00:21:56.000 hard to imagine us getting? Are they so unusual 00:21:56.000 --> 00:21:59.000 and weird that you might be willing to say it's okay that my algorithm has 00:21:59.000 --> 00:22:01.000 this one or a couple of bad cases in it 00:22:01.000 --> 00:22:03.000 because it never comes up?
00:22:03.000 --> 00:22:06.000 In this case, the notion that your data came in sorted or almost sorted or reverse sorted 00:22:06.000 --> 00:22:09.000 is probably not unlikely. 00:22:09.000 --> 00:22:11.000 It might be that you know, you happen to be reading from 00:22:11.000 --> 00:22:15.000 a file on disc and somebody has taken the [inaudible] and sorted it before they gave the 00:22:15.000 --> 00:22:19.000 data to you. If all of a sudden that caused your program to behave really badly, 00:22:19.000 --> 00:22:20.000 that would really be 00:22:20.000 --> 00:22:21.000 a 00:22:21.000 --> 00:22:23.000 pretty questionable outcome. 00:22:23.000 --> 00:22:27.000 So let's go back and look, just to see, though. It's kind of interesting to see 00:22:27.000 --> 00:22:30.000 it happening in 00:22:30.000 --> 00:22:33.000 - this is again quick sort versus merge sort. 00:22:33.000 --> 00:22:38.000 If I change the data to be partially sorted 00:22:38.000 --> 00:22:41.000 and let it run again, 00:22:41.000 --> 00:22:43.000 it still manages to beat it, 00:22:43.000 --> 00:22:44.000 doing a little bit more. 00:22:44.000 --> 00:22:47.000 If I change it to totally sorted 00:22:47.000 --> 00:22:53.000 or reverse sorted, for that matter, and 00:22:53.000 --> 00:22:54.000 so they look like they're doing a lot of work, 00:22:54.000 --> 00:22:56.000 a lot of work about nothing. 00:22:56.000 --> 00:22:57.000 And you can see that quick sort 00:22:57.000 --> 00:22:59.000 really is still way back there. 00:22:59.000 --> 00:23:02.000 It has looked at the first 10 percent or so and is doing a lot 00:23:02.000 --> 00:23:04.000 of frantic partitioning 00:23:04.000 --> 00:23:07.000 that never moves anything. 00:23:07.000 --> 00:23:11.000 And visibly kind of traipsing its way up there.
Merge sort meanwhile is actually 00:23:11.000 --> 00:23:13.000 on the beach in Jamaica with a 00:23:13.000 --> 00:23:15.000 daiquiri 00:23:15.000 --> 00:23:16.000 mocking quick sort 00:23:16.000 --> 00:23:21.000 for all those times that it had said it was so much better than it was. 00:23:21.000 --> 00:23:24.000 ''Duh, it was already sorted, you Doofus.'' 00:23:24.000 --> 00:23:25.000 Apparently it wasn't listening. Merge 00:23:25.000 --> 00:23:28.000 sort almost looked like it did nothing and that's because it took the left half, 00:23:28.000 --> 00:23:32.000 which holds the smallest elements, moved it off, right, kind of copied it back, and it ends up copying 00:23:32.000 --> 00:23:38.000 each element back to the same location it was already originally present in. There you go, quick sort 00:23:38.000 --> 00:23:39.000 taking its time. And if I 00:23:39.000 --> 00:23:42.000 ran it against, let's say, 00:23:42.000 --> 00:23:46.000 insertion sort and selection sort, why not, 00:23:46.000 --> 00:23:50.000 [inaudible] nothing better to do than to sort for you all day long. 00:23:50.000 --> 00:23:54.000 So there insertion sort actually finishing very early because it does nothing. It sort of 00:23:54.000 --> 00:23:57.000 looked at them all and said they were already in the right order, 00:23:57.000 --> 00:24:00.000 merge sort doing a little bit more work to move everything back and forth in the same place.
00:24:00.000 --> 00:24:02.000 But 00:24:02.000 --> 00:24:02.000 00:24:02.000 --> 00:24:04.000 in this case, selection sort 00:24:04.000 --> 00:24:06.000 and 00:24:06.000 --> 00:24:08.000 quick sort here at the bottom and selection sort here at the top 00:24:08.000 --> 00:24:11.000 doing really roughly about the same work if you look at the sum total of the comparisons 00:24:11.000 --> 00:24:13.000 and moves 00:24:13.000 --> 00:24:16.000 and then time spent on those suddenly shows that yeah, in this input 00:24:16.000 --> 00:24:19.000 situation quick sort is performing like an N-squared sort, 00:24:19.000 --> 00:24:23.000 and obviously very similar constant factors to those present in selection sort, 00:24:23.000 --> 00:24:24.000 so really 00:24:24.000 --> 00:24:28.000 behaving totally quadratically in that worst-case input. If I make 00:24:28.000 --> 00:24:38.000 it reverse sorted, it's a little more interesting because you can kind of see something going on. Everybody kind of doing their thing. 00:24:38.000 --> 00:24:40.000 Insertion sort now kind of hitting its worst case, 00:24:40.000 --> 00:24:43.000 so it actually coming in dead last because it 00:24:43.000 --> 00:24:45.000 did so much extra moving, but 00:24:45.000 --> 00:24:49.000 still showing the kind of quadratic terms, right, that selection 00:24:49.000 --> 00:24:53.000 sort and quick sort are both having, but higher constant factors there, so taking almost 00:24:53.000 --> 00:24:54.000 twice as long because 00:24:54.000 --> 00:24:55.000 actually doing a little 00:24:55.000 --> 00:25:02.000 bit extra work because of that inverted 00:25:02.000 --> 00:25:04.000 situation. [Inaudible] something big, just cause. We'll put it back 00:25:04.000 --> 00:25:10.000 into random, back into full speed and kind of watch some things happen. 
00:25:10.000 --> 00:25:11.000 So quick sort right, just 00:25:11.000 --> 00:25:14.000 sort of done in a flash, merge sort finishing behind it 00:25:14.000 --> 00:25:17.000 and ordinarily 00:25:17.000 --> 00:25:20.000 the quadratic sorts taking a much 00:25:20.000 --> 00:25:22.000 longer time than our two recursive sorts. But 00:25:22.000 --> 00:25:30.000 it's the random input really buying quick sort its speed. Is 00:25:30.000 --> 00:25:33.000 it gonna win - oh, come on. Oh, come on. Don't you want selection sort to come from behind? I always wanted it to 00:25:33.000 --> 00:25:37.000 come from behind. It's kind of like rooting for the underdog. 00:25:37.000 --> 00:25:38.000 Yes. 00:25:38.000 --> 00:25:42.000 And yet, just remember, don't mock insertion sort. When it's sorted, it's 00:25:42.000 --> 00:25:46.000 the only one that recognizes it and does something clever about it. [Inaudible] 00:25:46.000 --> 00:25:49.000 sort is anti-clever about it, so 00:25:49.000 --> 00:25:55.000 it still can hold its head up and be proud. This 00:25:55.000 --> 00:25:58.000 kind of [inaudible]. I was 00:25:58.000 --> 00:26:01.000 thinking about, like, what algorithms, right - there's a reason why 00:26:01.000 --> 00:26:04.000 there are four that we talked about and there are actually a dozen 00:26:04.000 --> 00:26:07.000 more. It's like different situations actually produce different 00:26:07.000 --> 00:26:09.000 outcomes for different 00:26:09.000 --> 00:26:12.000 sorts and there are reasons that even though they might not be the best sort for all 00:26:12.000 --> 00:26:15.000 purposes, there are some situations where it might be the right one. So if you had 00:26:15.000 --> 00:26:18.000 a data set that you expect to be mostly sorted, 00:26:18.000 --> 00:26:19.000 but had a few [inaudible] in it.
00:26:19.000 --> 00:26:22.000 Like if you had a pile of papers that were sorted and you dropped them on the floor and they 00:26:22.000 --> 00:26:25.000 got scuffed up a little bit, but they're still largely sorted, insertion sort is the 00:26:25.000 --> 00:26:26.000 way to 00:26:26.000 --> 00:26:29.000 go. It will take advantage of the work that was already done 00:26:29.000 --> 00:26:33.000 and not redo it or create kind of havoc the way quick sort 00:26:33.000 --> 00:26:36.000 would. But quick sort, the last thing we need to tell you, though, about it is we can't 00:26:36.000 --> 00:26:40.000 tolerate this. Like the way quick sort is in its classic form 00:26:40.000 --> 00:26:41.000 with this 00:26:41.000 --> 00:26:42.000 first element as pivot 00:26:42.000 --> 00:26:45.000 would be an unacceptable way to do this algorithm. 00:26:45.000 --> 00:26:49.000 So quick sort actually typically is one of the most standard things that's offered in 00:26:49.000 --> 00:26:52.000 the library of a programming language, the sort routine. 00:26:52.000 --> 00:26:53.000 Well, it 00:26:53.000 --> 00:26:57.000 has to have some strategy for how to deal with this in a way that does not 00:26:57.000 --> 00:26:59.000 degenerate, 00:26:59.000 --> 00:27:02.000 and so the idea is, what you need to do is just pick a different 00:27:02.000 --> 00:27:03.000 choice for the pivot, 00:27:03.000 --> 00:27:06.000 a little bit more clever, spend a little bit more time, 00:27:06.000 --> 00:27:09.000 do something that's a little less predictable than just picking that front- 00:27:09.000 --> 00:27:11.000 most element. 00:27:11.000 --> 00:27:12.000 So to take 00:27:12.000 --> 00:27:16.000 it to the far extreme, one thing you could do is just compute the median - analyze the data and compute the median.
00:27:16.000 --> 00:27:19.000 It turns out there is a linear time algorithm for this that 00:27:19.000 --> 00:27:22.000 would look at every element of the data set once and then be able to tell 00:27:22.000 --> 00:27:23.000 you what the median was, 00:27:23.000 --> 00:27:25.000 and that would guarantee you that 50/50 split. 00:27:25.000 --> 00:27:30.000 So if you go find it and use it at every stage, you will guarantee it. 00:27:30.000 --> 00:27:34.000 Most algorithms don't tend to do that. That's actually kind of overkill for the problem. We 00:27:34.000 --> 00:27:34.000 want to get it to where 00:27:34.000 --> 00:27:38.000 it's pretty much guaranteed to never get the worst case. But we're not 00:27:38.000 --> 00:27:41.000 concerned with it getting 50/50. If it got 60/40 or something close enough, that 00:27:41.000 --> 00:27:45.000 actually, you know, if it was 60/40, 70/30 and 00:27:45.000 --> 00:27:47.000 bopping around in that range, it'd be fine, 00:27:47.000 --> 00:27:50.000 so the other two that are much more commonly used 00:27:50.000 --> 00:27:53.000 are some kind of approximation of the median 00:27:53.000 --> 00:27:56.000 with a little bit of guessing or something in it. For example, median of three 00:27:56.000 --> 00:27:57.000 takes 00:27:57.000 --> 00:28:01.000 three elements, and typically it takes three from some specified positions: it takes 00:28:01.000 --> 00:28:02.000 the middle-most element, 00:28:02.000 --> 00:28:04.000 the last element and the front element and it says, 00:28:04.000 --> 00:28:06.000 ''Okay, given those three, 00:28:06.000 --> 00:28:10.000 we arrange them to find out what's the middle of those three, and use that.'' 00:28:10.000 --> 00:28:12.000 If the data was already sorted, it turns out you've got the median, right, 00:28:12.000 --> 00:28:14.000 because it was the middle-most.
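The median-of-three step just described can be sketched as a small helper. This is an illustrative version (the helper name and return-an-index design are assumptions, not the lecture's code): look at the first, middle, and last elements of the sub-array and return the index whose value is the middle of the three.

```cpp
#include <vector>

// Median-of-three pivot selection sketch: examine v[start], v[mid],
// and v[stop] and return the index holding the middle value of the three.
static int medianOfThreeIndex(const std::vector<int>& v, int start, int stop) {
    int mid = start + (stop - start) / 2;
    int a = v[start], b = v[mid], c = v[stop];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;   // b is the median
    if ((b <= a && a <= c) || (c <= a && a <= b)) return start; // a is the median
    return stop;                                                // otherwise c is
}
```

On already-sorted data this picks the true middle element, which is exactly why the degenerate case disappears.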
00:28:14.000 --> 00:28:17.000 If data was just in random position, then you've got one that you know, at least 00:28:17.000 --> 00:28:22.000 there's one element on one side, right, and so the odds that, every single time, 00:28:22.000 --> 00:28:25.000 it would produce a very bad split are pretty low. 00:28:25.000 --> 00:28:28.000 There are some inputs that could kind of get it to degenerate a little bit, but it's 00:28:28.000 --> 00:28:32.000 pretty foolproof in most ordinary situations. 00:28:32.000 --> 00:28:36.000 Even more unpredictable would be to just choose a random element. 00:28:36.000 --> 00:28:39.000 So look at the start to stop index you have, pick a random one there, 00:28:39.000 --> 00:28:44.000 swap it into the front and now use it as the pivot element. 00:28:44.000 --> 00:28:45.000 If you 00:28:45.000 --> 00:28:47.000 don't know ahead of time how your random number [inaudible], it's impossible 00:28:47.000 --> 00:28:51.000 to generate an input and force it into the worst-case behavior. And so the idea is that 00:28:51.000 --> 00:28:53.000 you're kind of counting on randomness 00:28:53.000 --> 00:28:56.000 and just the probabilistic outcome; for 00:28:56.000 --> 00:29:00.000 it to manage to be such that the way it chose the random element was to pick the 00:29:00.000 --> 00:29:02.000 extreme every time is just impossible, 00:29:02.000 --> 00:29:04.000 the odds are astronomically against it. 00:29:04.000 --> 00:29:06.000 And so it's 00:29:06.000 --> 00:29:08.000 a very simple fix, right, 00:29:08.000 --> 00:29:09.000 00:29:09.000 --> 00:29:12.000 that still leaves the possibility of the worst case in there, 00:29:12.000 --> 00:29:17.000 but, you know, much, much, much farther removed in a probability sense.
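The random-pivot idea is even shorter to sketch. A hypothetical helper (names are mine, and plain `rand()` stands in for whatever random source you would actually use): pick a random index in the start-to-stop range, swap that element into the front slot, and then partition exactly as before.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Random-pivot sketch: choose a random index in [start, stop] and move
// that element to the front so the usual first-element partition can run.
static void moveRandomPivotToFront(std::vector<int>& v, int start, int stop) {
    int pivotIndex = start + rand() % (stop - start + 1);
    std::swap(v[start], v[pivotIndex]);
}
```

Nothing about the data changes except which element sits in the front slot, so the rest of the algorithm carries on untouched.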
So, 00:29:17.000 --> 00:29:20.000 just simple things, right, but from there the algorithm operates the 00:29:20.000 --> 00:29:22.000 same way it always has, which is to say, 00:29:22.000 --> 00:29:24.000 ''Pick the pivot, however you want to do it, 00:29:24.000 --> 00:29:27.000 move it into that front slot and then carry on as before in terms of the left 00:29:27.000 --> 00:29:30.000 and the right hand and moving down and swapping and recursing and all that [inaudible].'' 00:29:30.000 --> 00:29:34.000 Any 00:29:34.000 --> 00:29:40.000 questions; anything about sorting, sorting algorithms? Why don't we 00:29:40.000 --> 00:29:42.000 do some coding actually, 00:29:42.000 --> 00:29:44.000 which is the next thing I want to show you, 00:29:44.000 --> 00:29:46.000 because once you know how to write sorts, 00:29:46.000 --> 00:29:47.000 00:29:47.000 --> 00:29:50.000 sort of the other piece I think you need to have to go with it is how would I write 00:29:50.000 --> 00:29:54.000 a sort to be generally useful in a large variety of situations. 00:29:54.000 --> 00:29:57.000 Knowing about these sorts, I might decide to build 00:29:57.000 --> 00:30:01.000 myself a really general purpose, good, high performance quick sort that I 00:30:01.000 --> 00:30:03.000 could use again, and again and again. 00:30:03.000 --> 00:30:07.000 I want to make it work generically so I could sort strings, I could sort 00:30:07.000 --> 00:30:10.000 numbers, or I could sort students, or I could sort 00:30:10.000 --> 00:30:11.000 vectors of students or whatever, 00:30:11.000 --> 00:30:14.000 and I'd want to make sure that no matter what I needed to do, it was gonna 00:30:14.000 --> 00:30:17.000 solve all my problems, this one kind of sorting tool, 00:30:17.000 --> 00:30:20.000 and we need to know a little bit of C++ to make that work 00:30:20.000 --> 00:30:22.000 by use of the function template. 00:30:22.000 --> 00:30:27.000 So if you want to ask about sorting algorithms, now is the time. [Inaudible].
We don't 00:30:27.000 --> 00:30:28.000 do bubble sort, 00:30:28.000 --> 00:30:31.000 which is kind of interesting. It will come up actually in the assignment 00:30:31.000 --> 00:30:34.000 I give you next week because it is a little bit of an historical artifact 00:30:34.000 --> 00:30:38.000 to know about it, but it turns out bubble sort is one of those sorts that 00:30:38.000 --> 00:30:43.000 is dominated by pretty much every other sort you'll know about in every way, and there's 00:30:43.000 --> 00:30:45.000 really nothing to recommend it. As I said, each of these four have something 00:30:45.000 --> 00:30:49.000 about them that actually has a strength or whatever. Bubble sort really doesn't. 00:30:49.000 --> 00:30:52.000 It's a little bit harder to write than insertion sort. It's a little bit slower than insertion 00:30:52.000 --> 00:30:56.000 sort; it's a little bit easier to get it wrong than insertion sort. It has higher constant 00:30:56.000 --> 00:30:57.000 factors. You know, 00:30:57.000 --> 00:31:01.000 it does recognize the data in sorted order, but so does insertion sort, so it's hard to come up with 00:31:01.000 --> 00:31:04.000 a reason to actually spend a lot of time on it, but I will expose you to it in the 00:31:04.000 --> 00:31:07.000 assignment because I think it's a little bit of a - 00:31:07.000 --> 00:31:08.000 it's part of 00:31:08.000 --> 00:31:10.000 our history, right, as a computer science 00:31:10.000 --> 00:31:14.000 student to be exposed to. 00:31:14.000 --> 00:31:16.000 Anything else? I'm gonna show 00:31:16.000 --> 00:31:19.000 you some things about function templates. Oh, these 00:31:19.000 --> 00:31:21.000 are some numbers; I forgot to give that.
00:31:21.000 --> 00:31:23.000 Just to say, in the end, right, 00:31:23.000 --> 00:31:26.000 which we saw a little bit in the analysis, that the constant factors on quick 00:31:26.000 --> 00:31:29.000 sort are 00:31:29.000 --> 00:31:33.000 noticeably better when compared to merge sort, by a factor of 4, 00:31:33.000 --> 00:31:35.000 moving things more quickly and not having to 00:31:35.000 --> 00:31:39.000 mess with them again. Quick 00:31:39.000 --> 00:31:41.000 sort [inaudible] 00:31:41.000 --> 00:31:44.000 in totally random order? 00:31:44.000 --> 00:31:48.000 Yes, they are. Yes - so it doesn't have them taking - so this is actually using a classic quick 00:31:48.000 --> 00:31:52.000 sort with no degenerate protection on random data. 00:31:52.000 --> 00:31:55.000 If I had put in sorted data in that case, you would definitely see 00:31:55.000 --> 00:31:58.000 numbers comparable to, like, selection sort's eight hours in that 00:31:58.000 --> 00:32:00.000 last slot, right. Or 00:32:00.000 --> 00:32:01.000 you 00:32:01.000 --> 00:32:03.000 [inaudible] degenerate protection, and it would probably 00:32:03.000 --> 00:32:07.000 have, in the noise, a small slowdown for that little extra work that's being 00:32:07.000 --> 00:32:11.000 done, but then you'd be getting N-log-N performance reliably across 00:32:11.000 --> 00:32:13.000 all states of inputs. So 00:32:13.000 --> 00:32:15.000 [inaudible] protection as it 00:32:15.000 --> 00:32:19.000 starts to even out time-wise with merge sort, does it still - No, it doesn't.
00:32:19.000 --> 00:32:23.000 It actually is an almost imperceptible change in the time because it depends on 00:32:23.000 --> 00:32:26.000 which form of it you use, because if you use, for example, the random or median of three, 00:32:26.000 --> 00:32:31.000 the amount of work you're doing is about two more things per level in the tree, and 00:32:31.000 --> 00:32:34.000 so it turns out that, yeah, 00:32:34.000 --> 00:32:37.000 over the space of log N it's just not enough to even be noticeable. 00:32:37.000 --> 00:32:41.000 Now if you did the full median algorithm, you could actually start to see it because then 00:32:41.000 --> 00:32:45.000 you'd see both a linear partition step and a linear median step and that 00:32:45.000 --> 00:32:47.000 would actually raise the constant factors to where it would probably be closer in line to 00:32:47.000 --> 00:32:49.000 merge sort. Pretty 00:32:49.000 --> 00:32:58.000 much nobody writes it that way is the truth, but you could kind of in theory; that's why we [inaudible]. What I'm 00:32:58.000 --> 00:32:59.000 going to show you, kind 00:32:59.000 --> 00:33:02.000 of motivate this by starting with something simple, and then we're gonna move towards kind of building 00:33:02.000 --> 00:33:04.000 the 00:33:04.000 --> 00:33:05.000 fully generic, 00:33:05.000 --> 00:33:06.000 00:33:06.000 --> 00:33:08.000 you know, one- 00:33:08.000 --> 00:33:10.000 algorithm-does-it-all 00:33:10.000 --> 00:33:12.000 sorting function. 00:33:12.000 --> 00:33:16.000 So what I'm gonna first look at is swap, because it's a little bit easier to see it in this case - 00:33:16.000 --> 00:33:19.000 you know, sometimes you need to take two variables and exchange their values.
I mean, 00:33:19.000 --> 00:33:23.000 you need to go through a temporary to copy one and then copy the other back and what 00:33:23.000 --> 00:33:26.000 not - swapping characters, swapping ints, swapping strings, swapping students, you 00:33:26.000 --> 00:33:28.000 know, any kind of variables you wanted to swap, 00:33:28.000 --> 00:33:29.000 you could write 00:33:29.000 --> 00:33:31.000 a specific swap function 00:33:31.000 --> 00:33:35.000 that takes two variables by reference of the integer, string or character type that 00:33:35.000 --> 00:33:36.000 you're trying to exchange 00:33:36.000 --> 00:33:40.000 and then copies one to a temporary and exchanges the two values. 00:33:40.000 --> 00:33:43.000 Because of the way C++ functions work, right, 00:33:43.000 --> 00:33:46.000 you really do need to say, ''It's a string, this is a string, you know, what's being 00:33:46.000 --> 00:33:49.000 declared here is a string,'' and so you might say, ''Well, if I need more than one swap, sometimes one that swaps strings and one for characters, you know, my choices 00:33:49.000 --> 00:33:52.000 are 00:33:52.000 --> 00:33:53.000 basically 00:33:53.000 --> 00:33:58.000 to duplicate and copy and paste and so on.'' 00:33:58.000 --> 00:34:02.000 That's not good; we'd rather not do that. 00:34:02.000 --> 00:34:04.000 But what I want to do is I want to 00:34:04.000 --> 00:34:08.000 discuss what it takes to write something in template form.
00:34:08.000 --> 00:34:12.000 We have been using templates, right - the vector is a template, the set is a 00:34:12.000 --> 00:34:15.000 template, the stack and queue and all these things are templates 00:34:15.000 --> 00:34:17.000 that are written 00:34:17.000 --> 00:34:20.000 in a very generic way using a 00:34:20.000 --> 00:34:24.000 placeholder, so the code that holds onto your collection of things is written 00:34:24.000 --> 00:34:27.000 saying, ''I don't know what the thing is; is it a string; is it an int; is it a car?'' Well, 00:34:27.000 --> 00:34:31.000 vectors just hold onto things and you as a client can decide what you're gonna 00:34:31.000 --> 00:34:33.000 be storing in this particular kind of vector. 00:34:33.000 --> 00:34:37.000 And so what we're gonna see is if I take those swap functions and I try to 00:34:37.000 --> 00:34:41.000 distill them down and say, ''Well, what is it that any swap routine looks like?'' 00:34:41.000 --> 00:34:43.000 It needs to take two variables by reference, it needs 00:34:43.000 --> 00:34:46.000 to declare a temporary of that type and it needs to do the assignment all around 00:34:46.000 --> 00:34:47.000 00:34:47.000 --> 00:34:49.000 to exchange those values. 00:34:49.000 --> 00:34:51.000 Can I write a version 00:34:51.000 --> 00:34:55.000 that is type-unspecified, leaving a placeholder in there for 00:34:55.000 --> 00:34:59.000 what - 00:34:59.000 --> 00:35:00.000 that's really 00:35:00.000 --> 00:35:01.000 kind of amazing - 00:35:01.000 --> 00:35:03.000 what we're gonna be swapping, 00:35:03.000 --> 00:35:05.000 and then 00:35:05.000 --> 00:35:10.000 let there be multiple swaps generated from that one pattern? 00:35:10.000 --> 00:35:11.000 So we're gonna do this actually in 00:35:11.000 --> 00:35:14.000 the compiler because it's always 00:35:14.000 --> 00:35:16.000 nice to see a little bit of code happening.
So I 00:35:16.000 --> 00:35:19.000 go over here 00:35:19.000 --> 00:35:20.000 and 00:35:20.000 --> 00:35:24.000 I got some code up there that 00:35:24.000 --> 00:35:26.000 I'm just currently gonna [inaudible] because I don't want to deal 00:35:26.000 --> 00:35:30.000 with it right now. We're gonna see it in a second. It doesn't [inaudible]. Okay, 00:35:30.000 --> 00:35:41.000 so your usual swap looks like this. 00:35:41.000 --> 00:35:44.000 And that would only swap exactly integers. If I wanted characters I'd 00:35:44.000 --> 00:35:49.000 change it all around. So what I'm gonna change it to is a template. I'm gonna add a template 00:35:49.000 --> 00:35:50.000 header at the top, 00:35:50.000 --> 00:35:53.000 and so the template header starts with the keyword template and then angle brackets that 00:35:53.000 --> 00:35:54.000 say, 00:35:54.000 --> 00:35:59.000 ''What kind of placeholders does this template depend on?'' It depends on one type name, 00:35:59.000 --> 00:36:00.000 and I've chosen the name 00:36:00.000 --> 00:36:02.000 capital-T Type for it. 00:36:02.000 --> 00:36:04.000 That's a choice that I'm going to make, and it says 00:36:04.000 --> 00:36:07.000 in the body of the function that's coming up, 00:36:07.000 --> 00:36:08.000 where I would have 00:36:08.000 --> 00:36:11.000 ordinarily fully committed on the type, I'm gonna use this placeholder 00:36:11.000 --> 00:36:13.000 capital-T Type 00:36:13.000 --> 00:36:22.000 instead. 00:36:22.000 --> 00:36:25.000 And now I have written swap in such a way that it doesn't say for sure 00:36:25.000 --> 00:36:29.000 what's being exchanged - two strings, two ints, two doubles.
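The swap template being described looks something like the sketch below (capitalized `Swap` here so it cannot collide with `std::swap`; the exact name in the lecture's code may differ): one template header, one placeholder type name, and the usual three-step exchange through a temporary.

```cpp
#include <string>

// The type-unspecified swap: the template header introduces the
// placeholder name Type, and the body uses it wherever a concrete
// type would ordinarily appear.
template <typename Type>
void Swap(Type& one, Type& two) {
    Type tmp = one;   // copy one value aside into a temporary
    one = two;        // overwrite the first with the second
    two = tmp;        // overwrite the second with the saved copy
}
```

Calling `Swap(a, b)` on two ints instantiates the `Type = int` version; calling it on two strings stamps out a second, separate version from the same pattern.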
They all 00:36:29.000 --> 00:36:33.000 have to be matching in this form, and I said, ''Well, there's a placeholder, and the placeholder 00:36:33.000 --> 00:36:36.000 is gonna be filled in by the client who uses this swap, 00:36:36.000 --> 00:36:41.000 and on demand, we can instantiate as many versions of the swap function as we need.'' 00:36:41.000 --> 00:36:45.000 If you need to swap integers, you can instantiate a swap that then fills in the 00:36:45.000 --> 00:36:48.000 placeholder type with ints all the way through, 00:36:48.000 --> 00:36:51.000 and then I can use that same template or pattern to generate one that will swap 00:36:51.000 --> 00:36:53.000 characters or swap strings or swap whatever. 00:36:53.000 --> 00:37:02.000 So if I go down to my main, maybe I'll just pull my main up [inaudible]. And 00:37:02.000 --> 00:37:05.000 I will show you how I can do this. I can say 00:37:05.000 --> 00:37:08.000 int one equals 00:37:08.000 --> 00:37:13.000 54, two equals 32, and I say swap of one and two. 00:37:13.000 --> 00:37:17.000 So the usage of a function template is a little bit different 00:37:17.000 --> 00:37:19.000 than it was for 00:37:19.000 --> 00:37:21.000 class templates; 00:37:21.000 --> 00:37:25.000 in class templates we always did this thing where I said 00:37:25.000 --> 00:37:29.000 the name of the template and then the angle brackets, as I said.
And in particular, I want 00:37:29.000 --> 00:37:33.000 the swap function that operates where the template parameter has been filled in 00:37:33.000 --> 00:37:35.000 with int. 00:37:35.000 --> 00:37:38.000 But in the case of functions, it turns out it's a little bit easier for the compiler 00:37:38.000 --> 00:37:39.000 to know what you intended, 00:37:39.000 --> 00:37:42.000 that on the basis of what my arguments are to this - 00:37:42.000 --> 00:37:45.000 if I called swap passing two integers, 00:37:45.000 --> 00:37:49.000 it knows that there's only one possibility for what version of swap you were interested 00:37:49.000 --> 00:37:50.000 in, 00:37:50.000 --> 00:37:54.000 that it must be the one where type has been filled in with int, and that's 00:37:54.000 --> 00:37:56.000 the one to generate for you. 00:37:56.000 --> 00:37:59.000 So you can use that long form of saying swap, angle bracket int, 00:37:59.000 --> 00:38:03.000 but typically you will not; you will let the compiler infer it for you on the usage, 00:38:03.000 --> 00:38:05.000 so based on the arguments you passed, 00:38:05.000 --> 00:38:10.000 it will figure out how to kind of match them to what swap said it was taking, and 00:38:10.000 --> 00:38:13.000 if it can't match them, for example, if you pass one double and one int, you'll get compiler 00:38:13.000 --> 00:38:14.000 errors.
00:38:14.000 --> 00:38:19.000 But if I've done it correctly, then it will generate for me a swap 00:38:19.000 --> 00:38:23.000 where type's been filled in with int and if I call it again, 00:38:23.000 --> 00:38:26.000 passing different kinds of things, so having 00:38:26.000 --> 00:38:29.000 string S equals hello and T 00:38:29.000 --> 00:38:31.000 equals 00:38:31.000 --> 00:38:33.000 world, 00:38:33.000 --> 00:38:37.000 that I can write swap S and T and now 00:38:37.000 --> 00:38:40.000 the compiler generated for me two different versions of swap, right, 00:38:40.000 --> 00:38:43.000 one for integers, one for 00:38:43.000 --> 00:38:46.000 strings on the [inaudible] and I didn't do anything with them. I put them [inaudible], so nothing to see there, 00:38:46.000 --> 00:38:47.000 but showing 00:38:47.000 --> 00:38:50.000 it does compile and build up. That's 00:38:50.000 --> 00:38:53.000 a pretty neat idea. 00:38:53.000 --> 00:38:55.000 Kind of important because it turns out there are 00:38:55.000 --> 00:38:59.000 a lot of pieces of code that you're gonna find yourself wanting to write that don't 00:38:59.000 --> 00:39:02.000 depend, in a case like swap, swap doesn't care what it's exchanging, that they're Ns, that 00:39:02.000 --> 00:39:06.000 they're strings, that they're doubles. The way you swap two things 00:39:06.000 --> 00:39:09.000 is the same no matter what they are. 
You copy one aside; you overwrite one; 00:39:09.000 --> 00:39:11.000 you overwrite the other, 00:39:11.000 --> 00:39:14.000 and the same thing applies to much more algorithmically interesting problems 00:39:14.000 --> 00:39:15.000 like 00:39:15.000 --> 00:39:19.000 searching, like sorting, like removing duplicates, like printing, 00:39:19.000 --> 00:39:23.000 where you want to take a vector and print all of its contents, right - that 00:39:23.000 --> 00:39:24.000 the 00:39:24.000 --> 00:39:27.000 steps you need to do to print a vector of ints look just like the steps you need to do to print a vector 00:39:27.000 --> 00:39:31.000 of strings. And so it would be very annoying every time I need to do that to 00:39:31.000 --> 00:39:33.000 duplicate that code, so there's a real appeal toward 00:39:33.000 --> 00:39:35.000 writing it once, as a template, 00:39:35.000 --> 00:39:37.000 and using it 00:39:37.000 --> 00:39:41.000 in multiple situations. So 00:39:41.000 --> 00:39:44.000 this shows that if I swap, right, 00:39:44.000 --> 00:39:47.000 it infers what I meant, 00:39:47.000 --> 00:39:50.000 and then this thing basically just shows what happened is when I said 00:39:50.000 --> 00:39:53.000 swap of one and two, 00:39:53.000 --> 00:39:56.000 it used that pattern to generate the swap for int 00:39:56.000 --> 00:39:59.000 where the type parameter, that placeholder, has been 00:39:59.000 --> 00:40:01.000 filled in 00:40:01.000 --> 00:40:05.000 and established that it's int for this particular version, which is distinct from 00:40:05.000 --> 00:40:10.000 the swap for characters and swap for strings and whatnot. So 00:40:10.000 --> 00:40:13.000 let me show you that version I talked about, print, and for this one I'm 00:40:13.000 --> 00:40:15.000 gonna go back to the 00:40:15.000 --> 00:40:16.000 compiler for just a second.
00:40:16.000 --> 00:40:19.000 Let's say I want to print a vector. Printing a vector is like iterating over all 00:40:19.000 --> 00:40:22.000 the members and using the 00:40:22.000 --> 00:40:23.000 stream insertion to print them out 00:40:23.000 --> 00:40:26.000 and so here it is taking some vector 00:40:26.000 --> 00:40:29.000 where the elements in it are of this type name. In this case, I've just used T as 00:40:29.000 --> 00:40:31.000 my short name here. 00:40:31.000 --> 00:40:31.000 The iterator 00:40:31.000 --> 00:40:35.000 looks the same; the bracketing looks the same. I put the endl at the end and I'd 00:40:35.000 --> 00:40:38.000 like to be able to print vectors of strings, vectors of ints, vectors of 00:40:38.000 --> 00:40:39.000 doubles 00:40:39.000 --> 00:40:43.000 with this one piece of code. So if I call print vector and I pass it a vector of ints, 00:40:43.000 --> 00:40:43.000 all's good; 00:40:43.000 --> 00:40:47.000 vector of doubles, all's good; vector of strings, all's good. 00:40:47.000 --> 00:40:50.000 Here's where I could get a little bit into trouble here. 00:40:50.000 --> 00:40:52.000 I've made this [inaudible] coord that has an X 00:40:52.000 --> 00:40:57.000 and a Y field. I make a vector that contains these coords. 00:40:57.000 --> 00:41:00.000 And then I try to call print vector of C. 00:41:00.000 --> 00:41:03.000 So when I do this, right, 00:41:03.000 --> 00:41:06.000 it will instantiate a version of print vector 00:41:06.000 --> 00:41:11.000 where the T has been matched up to, oh, it's coords that are in there; you've got a vector 00:41:11.000 --> 00:41:12.000 of coords. 00:41:12.000 --> 00:41:15.000 And so okay, we'll go through and iterate over the size of the vector 00:41:15.000 --> 00:41:18.000 and this line is trying to say cout 00:41:18.000 --> 00:41:21.000 of a coord type.
And 00:41:21.000 --> 00:41:22.000 structs 00:41:22.000 --> 00:41:26.000 don't automatically know how to output themselves onto a stream, 00:41:26.000 --> 00:41:30.000 and so at this point when I try to instantiate, so the code in print vector is 00:41:30.000 --> 00:41:33.000 actually fine and works for a large number of types, all the primitive types 00:41:33.000 --> 00:41:35.000 will be fine, 00:41:35.000 --> 00:41:36.000 string and things like that are fine. 00:41:36.000 --> 00:41:40.000 But if I try to use it for a type for which it doesn't have this output 00:41:40.000 --> 00:41:43.000 behavior, right, I will get a compiler error, 00:41:43.000 --> 00:41:46.000 and it may come as a surprise because print vector appeared to be working fine in all the other 00:41:46.000 --> 00:41:49.000 situations and all of a sudden it seems like a new error has cropped up, 00:41:49.000 --> 00:41:51.000 but that error came from 00:41:51.000 --> 00:41:54.000 the instantiation of a particular version, in this case, that suddenly ran 00:41:54.000 --> 00:41:58.000 afoul of what the template expected of the type. 00:41:58.000 --> 00:42:02.000 The message you get, let's say in Xcode, looks like this: no match for 00:42:02.000 --> 00:42:08.000 operator less-than less-than in standard cout, vector element, operator [inaudible]. 00:42:08.000 --> 00:42:12.000 So it's trying to give you a little clue, but in its very cryptic C++ way, of what 00:42:12.000 --> 00:42:15.000 you've done wrong. 00:42:15.000 --> 00:42:18.000 So it comes back to, what the template has to be a little bit clear about 00:42:18.000 --> 00:42:20.000 from a client point of view is to say, well, 00:42:20.000 --> 00:42:24.000 what is it that needs to be true.
Will it really work for all types or is there 00:42:24.000 --> 00:42:27.000 something special about the kind of things that actually need to be true 00:42:27.000 --> 00:42:29.000 about what's filled in there with a placeholder to make the rest of the code 00:42:29.000 --> 00:42:31.000 work correctly. 00:42:31.000 --> 00:42:33.000 So if it uses stream insertion 00:42:33.000 --> 00:42:36.000 or it compares two elements using equals equals, right, 00:42:36.000 --> 00:42:40.000 something like a struct doesn't by default have those behaviors, and so you 00:42:40.000 --> 00:42:44.000 could have a template that would work for primitive types or types that have these 00:42:44.000 --> 00:42:47.000 operations declared, but wouldn't work when 00:42:47.000 --> 00:42:51.000 you gave it a more complex type that didn't have that 00:42:51.000 --> 00:42:53.000 support in there. 00:42:53.000 --> 00:42:55.000 So I have that piece of code over here. I can just 00:42:55.000 --> 00:42:59.000 show it to you. I just took it down here. 00:42:59.000 --> 00:43:05.000 This is print vector and 00:43:05.000 --> 00:43:07.000 it's a little bit of 00:43:07.000 --> 00:43:10.000 template. 00:43:10.000 --> 00:43:11.000 And 00:43:11.000 --> 00:43:14.000 if I were to have a vector 00:43:14.000 --> 00:43:17.000 declared of ints, 00:43:17.000 --> 00:43:20.000 I say print vector, V, 00:43:20.000 --> 00:43:24.000 this compiles; there's nothing to print. It turns out it'll just print an empty line because 00:43:24.000 --> 00:43:32.000 there's nothing in there, but if I change this to be coordT, 00:43:32.000 --> 00:43:37.000 my build is failing, right, and the error message I'm getting here is no match for 00:43:37.000 --> 00:43:38.000 trying to 00:43:38.000 --> 00:43:41.000 output that thing and it tells me all the things I can output on a stream.
Well, it could be that you could output a 00:43:41.000 --> 00:43:44.000 [inaudible] but it's not one of those things, right; coordT 00:43:44.000 --> 00:43:48.000 doesn't have 00:43:48.000 --> 00:43:51.000 a built-in behavior for outputting to a stream. 00:43:51.000 --> 00:43:53.000 So this is going to come back to be kind of important because I'm gonna 00:43:53.000 --> 00:43:54.000 keep going here, 00:43:54.000 --> 00:43:58.000 is that 00:43:58.000 --> 00:44:00.000 when we get to the sort, right, when 00:44:00.000 --> 00:44:02.000 we try to build this thing, we're gonna definitely be depending on 00:44:02.000 --> 00:44:05.000 something being true about the elements, right, 00:44:05.000 --> 00:44:07.000 that's gonna constrain what the type could be. 00:44:07.000 --> 00:44:10.000 So this is selection sort in 00:44:10.000 --> 00:44:14.000 its ordinary form, nothing special about it other than the fact that I 00:44:14.000 --> 00:44:17.000 changed it so instead of sorting ints, it's sorting a vector with a 00:44:17.000 --> 00:44:18.000 placeholder type. So it's [inaudible] with 00:44:18.000 --> 00:44:23.000 the template typename header, vector of type, and then in here, 00:44:23.000 --> 00:44:26.000 operations like this really are accessing a type 00:44:26.000 --> 00:44:28.000 and another type 00:44:28.000 --> 00:44:30.000 variable out of 00:44:30.000 --> 00:44:33.000 the vector that was passed and then comparing them, which is smaller, to decide 00:44:33.000 --> 00:44:34.000 which index. 00:44:34.000 --> 00:44:36.000 It's using a swap 00:44:36.000 --> 00:44:39.000 and so the generic swap would also be necessary for this, so that we 00:44:39.000 --> 00:44:42.000 have a swap for any kind of things we want to exchange.
We have a template 00:44:42.000 --> 00:44:44.000 up there for swap, 00:44:44.000 --> 00:44:47.000 then the sort can build on top of that and use that as part of its workings, which is then able 00:44:47.000 --> 00:44:48.000 to count on 00:44:48.000 --> 00:44:52.000 swap, that any two things can be swapped using the same strategy. 00:44:52.000 --> 00:44:55.000 And this is actually where templates really start to shine. Like swap in itself isn't that 00:44:55.000 --> 00:44:56.000 00:44:56.000 --> 00:44:59.000 stunning of an example, because you think, whatever, who cares, it's three lines of code, but every 00:44:59.000 --> 00:45:00.000 00:45:00.000 --> 00:45:01.000 little bit does count, 00:45:01.000 --> 00:45:04.000 but once we get to things like sorting where we could build a really 00:45:04.000 --> 00:45:05.000 killer, 00:45:05.000 --> 00:45:09.000 totally tuned, really high performance, you know, protection against a [inaudible] 00:45:09.000 --> 00:45:09.000 quick sort, 00:45:09.000 --> 00:45:12.000 then we want to be sure that we can use it in all the places we need to sort. I don't want to 00:45:12.000 --> 00:45:15.000 make it sort just strings and later when I need to sort integers, have to 00:45:15.000 --> 00:45:18.000 copy and paste it and change string everywhere, 00:45:18.000 --> 00:45:21.000 and then if I find a bug in one and forget to change it in the other, I have 00:45:21.000 --> 00:45:23.000 the bug still lurking there.
00:45:23.000 --> 00:45:25.000 And so template functions are generally used for 00:45:25.000 --> 00:45:29.000 all kinds of algorithmically interesting things, sorting and searching and removing 00:45:29.000 --> 00:45:32.000 duplicates and permuting and finding the mode 00:45:32.000 --> 00:45:33.000 and 00:45:33.000 --> 00:45:36.000 shuffling and all these things where, okay, no matter what the type of 00:45:36.000 --> 00:45:38.000 thing you're working on, 00:45:38.000 --> 00:45:41.000 the algorithm for how you do it looks the same whether it's ints or 00:45:41.000 --> 00:45:43.000 strings or whatever. So 00:45:43.000 --> 00:45:46.000 we got this guy 00:45:46.000 --> 00:45:46.000 00:45:46.000 --> 00:45:49.000 and as is, right, we could 00:45:49.000 --> 00:45:53.000 push vectors of ints in there, vectors of strings in there, and the right thing 00:45:53.000 --> 00:45:57.000 would work for those without any changes to it, 00:45:57.000 --> 00:45:59.000 but it does have a lurking 00:45:59.000 --> 00:46:02.000 constraint on what must be true about the kind of elements that we're 00:46:02.000 --> 00:46:04.000 trying to sort here. 00:46:04.000 --> 00:46:05.000 Not every type will work. If 00:46:05.000 --> 00:46:07.000 I go back to my coord example, 00:46:07.000 --> 00:46:09.000 this X-Y struct, 00:46:09.000 --> 00:46:13.000 if I have a vector of coord and I pass it to sort, and if I go back and look at the 00:46:13.000 --> 00:46:14.000 code, 00:46:14.000 --> 00:46:17.000 you think about instantiating this with that vector, it's all coord in here, all the way 00:46:17.000 --> 00:46:18.000 through here, 00:46:18.000 --> 00:46:22.000 this part's fine; this part's fine; this part's fine, but suddenly you get to this line and 00:46:22.000 --> 00:46:25.000 that's gonna be taking two coordinate structures and saying is this coordinate 00:46:25.000 --> 00:46:27.000 structure less than another. 00:46:27.000 --> 00:46:30.000 And that code's not gonna compile. Yes.
Back when we were doing sets, we were able to explicitly 00:46:30.000 --> 00:46:36.000 say how this works. Yeah, so you're right where I'm gonna be, but I'm probably 00:46:36.000 --> 00:46:40.000 not gonna finish today, I'll have to finish up on Wednesday. We're gonna do exactly what 00:46:40.000 --> 00:46:43.000 Seth did, and what did Seth do? Comparison function. 00:46:43.000 --> 00:46:46.000 That's right, he used a comparison function, so you have to have some coordination here where you say, 00:46:46.000 --> 00:46:49.000 well, instead of trying to use the operator less than on these things, 00:46:49.000 --> 00:46:53.000 how about the client tell me, as part of calling sort, why don't you give me the 00:46:53.000 --> 00:46:57.000 information in the form of a function that says call me back and tell me are 00:46:57.000 --> 00:46:59.000 these things less than 00:46:59.000 --> 00:47:02.000 and so on and how to order them. So that's exactly what we need to do, but I can't 00:47:02.000 --> 00:47:05.000 really show you the code right now, but I'll just kind of like 00:47:05.000 --> 00:47:09.000 flash it up there and we will talk about it on Wednesday, what changes make 00:47:09.000 --> 00:47:12.000 that happen. We'll pick up 00:47:12.000 --> 00:47:22.000 with Chapter 8, actually after the midterm, in terms of 00:47:22.000 --> 00:47:23.000 reading.