-
タイトル:
Quick Sort - Intro to Parallel Programming
-
概説:
-
The final sort algorithm we'll discuss is one of the most efficient ones for serial processors;
-
this is quick sort. It is a more complex algorithm on GPUs
-
because of the control complexity of the algorithm, so let's recap what the quick sort algorithm does.
-
First, it chooses a pivot element, one particular element from within its input,
-
then it compares all of the elements in its input to the pivot,
-
and it uses that comparison to divide the input into 3 sub-arrays.
-
Those that are less than the pivot, those that are equal to the pivot,
-
and those that are greater than the pivot, and then it calls quick sort on each of these sub-arrays
-
and continues until the entire input is sorted.
-
As an example, let's look at a particular array and choose that the pivot is equal to 3.
-
So what we're going to do here is compare each one of these elements to the pivot,
-
and we're going to decide if they're equal to, greater than, or less than the pivot.
-
Then we'll divide it into 3 arrays, those that are less than the pivot,
-
those that are equal to the pivot, and those that are greater than the pivot,
-
and we'll call quick sort on each of these arrays and do the same thing.
-
So when we have 2 and 1, let's say we choose 2 as the pivot,
-
then we'll divide this into smaller than the pivot and equal to the pivot.
-
This doesn't need to be recursed because it only has a single element.
-
And let's say we choose 4 as the pivot here, this is greater than the pivot, this is equal to the pivot,
-
and now we have a completely sorted array.
-
So this is a very challenging algorithm to implement on the GPU.
-
The other algorithms that we've looked at are pretty simple to describe, and they don't recurse.
-
This one is more complex, and until very recently GPUs didn't support recursion at all.
-
Indeed, the GPUs we use in this class don't support recursion currently.
-
So how can we take this seemingly recursive-only algorithm and map it to the GPU using the primitives that we've learned?
-
So I'm bringing up this example for two reasons.
-
The first is that you have already learned all the pieces that you need to implement quick sort on the GPU,
-
and the second is to motivate the benefits of new GPU capabilities that do natively support recursion.
-
So we can implement quick sort without recursion by using the idea of segments.
-
Recall that segmented operations, like scans, only operate within a single segment;
-
operations on one segment don't affect other segments.
-
That sounds a little bit like recursion, and in fact it maps in a similar way.
-
For quick sort, when we begin to process the initial array,
-
we're going to use distributes, maps, and compacts to eventually divide it into 3 segments.
-
We can use segmented scans to do all the necessary operations that we need to make this work,
-
including distributing a pivot across a segment for comparisons and splitting a segment,
-
which is similar to the way that we split on a particular bit in radix sort.
-
Quick sort is interesting because it shows you how useful segments can be,
-
that they can let you mirror the same approach you use in recursion, without actually using recursion.
-
But, and I gotta be perfectly honest here,
-
it's really a challenge to implement and equally challenging to make it fast.
-
So it's very exciting that the newest in video GPUs support a more flexible programming model where kernels can actually call,
-
can launch other kernels, which makes quick sort's recursive calls much more straightforward.
-
We're not using this new capability in the class.
-
The Amazon instances where our code is running don't have these new GPUs that support this capability at this time,
-
but it's really exciting, and so when we get to unit 7,
-
we'll see exactly what it looks like and how it applies to Quick sort.