﻿[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.00,0:00:02.61,Default,,0000,0000,0000,,So we've seen some really interesting algorithms so far, Dialogue: 0,0:00:02.61,0:00:06.68,Default,,0000,0000,0000,,but the GPU performance leader is none of the ones that we've discussed to date. Dialogue: 0,0:00:06.68,0:00:11.02,Default,,0000,0000,0000,,Instead, typically on the GPU for the highest performing sorts Dialogue: 0,0:00:11.02,0:00:13.89,Default,,0000,0000,0000,,we use a different sorting algorithm called radix sort. Dialogue: 0,0:00:13.89,0:00:18.13,Default,,0000,0000,0000,,Now, all of the previous sorting algorithms were comparison sorts, Dialogue: 0,0:00:18.13,0:00:21.42,Default,,0000,0000,0000,,meaning the only operation we did on an item was compare it to another one. Dialogue: 0,0:00:21.42,0:00:25.45,Default,,0000,0000,0000,,Radix sort relies on a number representation that uses positional notation. Dialogue: 0,0:00:25.45,0:00:29.78,Default,,0000,0000,0000,,In this case, bits are more significant as we move further left in the word, Dialogue: 0,0:00:29.78,0:00:32.69,Default,,0000,0000,0000,,and it's most easily explained using integers. Dialogue: 0,0:00:32.69,0:00:35.46,Default,,0000,0000,0000,,So the algorithm for radix sort is as follows. Dialogue: 0,0:00:35.46,0:00:40.68,Default,,0000,0000,0000,,Start with the least significant bit of the integer, split the input into 2 sets, Dialogue: 0,0:00:40.68,0:00:45.47,Default,,0000,0000,0000,,those that have a 0 with this particular bit location and those that have a 1. Dialogue: 0,0:00:45.47,0:00:47.46,Default,,0000,0000,0000,,Otherwise, maintain the order. Dialogue: 0,0:00:47.46,0:00:51.99,Default,,0000,0000,0000,,Then proceed to the next least significant bit and repeat until we run out of bits. Dialogue: 0,0:00:51.99,0:00:55.51,Default,,0000,0000,0000,,So as usual, we're going to do an example that will make more sense, Dialogue: 0,0:00:55.51,0:00:57.90,Default,,0000,0000,0000,,and we're going to use unsigned integers. Dialogue: 0,0:00:57.90,0:01:01.06,Default,,0000,0000,0000,,So what we're going to sort is this column of numbers to the left, Dialogue: 0,0:01:01.06,0:01:04.44,Default,,0000,0000,0000,,and here is the binary representation of those numbers. Dialogue: 0,0:01:04.44,0:01:08.75,Default,,0000,0000,0000,,And so we're going to start here with the least significant bit. Dialogue: 0,0:01:08.75,0:01:14.74,Default,,0000,0000,0000,,So the way that we're going to do this is take all the elements that have a 0 as the least significant bit, Dialogue: 0,0:01:14.74,0:01:18.82,Default,,0000,0000,0000,,and we're going to otherwise maintain their order, but we're going to put them up top. Dialogue: 0,0:01:18.82,0:01:21.22,Default,,0000,0000,0000,,Then we're going to take all the rest of the elements, Dialogue: 0,0:01:21.22,0:01:23.93,Default,,0000,0000,0000,,those that have a 1 as the least significant bit, Dialogue: 0,0:01:23.93,0:01:27.73,Default,,0000,0000,0000,,and we're going to again keep them in order and append them to the list that we've just created. Dialogue: 0,0:01:27.73,0:01:31.17,Default,,0000,0000,0000,,So what this creates is a list where all the least significant bits are 0 Dialogue: 0,0:01:31.17,0:01:34.57,Default,,0000,0000,0000,,and then a list where all the least significant bits is 1. Dialogue: 0,0:01:34.57,0:01:37.49,Default,,0000,0000,0000,,Now we move to the next least significant bit, Dialogue: 0,0:01:37.49,0:01:40.15,Default,,0000,0000,0000,,so the bit in the middle, and we're going to do the same thing. Dialogue: 0,0:01:40.15,0:01:42.45,Default,,0000,0000,0000,,We're going to take all the 0s and put them up top, Dialogue: 0,0:01:42.45,0:01:44.90,Default,,0000,0000,0000,,and then we're going to take all the 1s and put them below. Dialogue: 0,0:01:44.90,0:01:48.04,Default,,0000,0000,0000,,And here the dotted lines are just showing the data movement that we're looking at. Dialogue: 0,0:01:48.04,0:01:51.02,Default,,0000,0000,0000,,The green lines are the ones where the middle bits are 0, Dialogue: 0,0:01:51.02,0:01:53.79,Default,,0000,0000,0000,,and the blue line is the one where the middle bits are 1. Dialogue: 0,0:01:53.79,0:01:56.39,Default,,0000,0000,0000,,Now we move on to the next most significant bit-- Dialogue: 0,0:01:56.39,0:02:00.27,Default,,0000,0000,0000,,in this case, the very most significant bit--and we do the same operation again. Dialogue: 0,0:02:00.27,0:02:03.97,Default,,0000,0000,0000,,Zeroes in the most significant bit move up top, 1s move to the bottom, Dialogue: 0,0:02:03.97,0:02:06.06,Default,,0000,0000,0000,,otherwise, we maintain the order. Dialogue: 0,0:02:06.06,0:02:08.91,Default,,0000,0000,0000,,And now we have a sorted sequence. Pretty cool, huh? Dialogue: 0,0:02:08.91,0:02:12.12,Default,,0000,0000,0000,,Now, there's 2 big reasons this code runs great on GPUs. Dialogue: 0,0:02:12.12,0:02:17.48,Default,,0000,0000,0000,,The first is its work complexity. The best comparison base sorts are O(n log n). Dialogue: 0,0:02:17.48,0:02:20.58,Default,,0000,0000,0000,,This algorithm, on the other hand, is O(kn), Dialogue: 0,0:02:20.58,0:02:23.57,Default,,0000,0000,0000,,meaning the runtime is linear in 2 different things. Dialogue: 0,0:02:23.57,0:02:27.03,Default,,0000,0000,0000,,First, it's linear in the number of bits in the representation. Dialogue: 0,0:02:27.03,0:02:31.94,Default,,0000,0000,0000,,So this particular integer has 3 bits in its representation, Dialogue: 0,0:02:31.94,0:02:35.64,Default,,0000,0000,0000,,and it took 3 stages for us to be able to sort the input. Dialogue: 0,0:02:35.64,0:02:38.60,Default,,0000,0000,0000,,Second, it's linear in the number of items to sort. Dialogue: 0,0:02:38.60,0:02:41.80,Default,,0000,0000,0000,,So we have 8 items in the representation here, Dialogue: 0,0:02:41.80,0:02:45.67,Default,,0000,0000,0000,,and so the amount of work is proportional to 8. Dialogue: 0,0:02:45.67,0:02:51.72,Default,,0000,0000,0000,,Generally k is constant, say a 32-bit word or a 64-bit word for any reasonable applications. Dialogue: 0,0:02:51.72,0:02:54.43,Default,,0000,0000,0000,,And so in general, the work complexity of this Dialogue: 0,0:02:54.43,0:02:58.16,Default,,0000,0000,0000,,is mostly proportional to the number of items that we need to sort. Dialogue: 0,0:02:58.16,0:03:02.99,Default,,0000,0000,0000,,And so that's a superior work complexity to any of the sorts that we've talked about to date, Dialogue: 0,0:03:02.99,0:03:05.76,Default,,0000,0000,0000,,and so that's 1 reason why this looks so good. Dialogue: 0,0:03:05.76,0:03:11.54,Default,,0000,0000,0000,,The second is that the underlying operations that we need to do this split of the input at each step Dialogue: 0,0:03:11.54,0:03:14.31,Default,,0000,0000,0000,,are ones that are actually very efficient. Dialogue: 0,0:03:14.31,0:03:17.68,Default,,0000,0000,0000,,And in fact, they're efficient operations that you already know. Dialogue: 0,0:03:17.68,0:03:19.64,Default,,0000,0000,0000,,Let's take a closer look at what we're doing. Dialogue: 0,0:03:19.64,0:03:23.28,Default,,0000,0000,0000,,We're only going to look at the first stage of the radix sort algorithm, Dialogue: 0,0:03:23.28,0:03:26.82,Default,,0000,0000,0000,,where we're only considering the value of the least significant bit, Dialogue: 0,0:03:26.82,0:03:31.52,Default,,0000,0000,0000,,and we're only going to look at the output for which the least significant bit is 0. Dialogue: 0,0:03:31.52,0:03:34.33,Default,,0000,0000,0000,,Now what are we actually doing here? Dialogue: 0,0:03:34.33,0:03:37.43,Default,,0000,0000,0000,,We've already learned an algorithm that does this operation today. Dialogue: 0,0:03:37.43,0:03:42.48,Default,,0000,0000,0000,,So what is the name of the algorithm that takes this input and creates that as the output?