
Title:
Merrills LinearComplexity BFS on GPUs Part2  Intro to Parallel Programming

Description:

Way back in Unit 1, when we talked about step in work complexity, we said that

we would love to find algorithms that have the same work complexity as the serial

algorithm but have a smaller step complexity.

Reduce would be a good example here, or scan.

But if we said we can't do that, sometimes we would be willing

to accept more work if it gets us fewer steps.

And that's what we are going to do here. But how?

This is not intuitive, or more at least it isn't for most people,

and it certainly wasn't for me when I learned this material.

Hills and Steele first expressed skepticism they could improve on quadratic work,

but then concludedand I am quoting from their paperessentially we overlooked

the power of having many processors working on the problem at once.

So at a high level, here's what we are going to do.

On every node, we start by knowing the node that's 1 hop away. That's the next pointer, in blue.

So on the next iteration, we can visit our next pointer's next pointer, and get 2 hops away.

And then, on the next iteration, we can get to 3 hops away, and so on.

So that's what I am showing here when I say straight forward approach.

As we increase the number of iterations, we are also increasing the number of hops away.

But that's the wrong way to think about it.

If we just did it that way, we'd be repeating a lot of work that the nodes down the chain are doing,

and we would have quadratic complexity work.

Instead, after the first iteration, we have each node knowing the node that is 2 hops away.

So let's say we're interested in this node and we know that we are pointing to node x here.

Well normally what we'd do is we'd take the red pointer here,

and then change our red pointer to be red pointer plus blue pointer.

So, we'd be moving from knowing the node 2 hops away to the node knowing 3 hops away.

But look, x also knows the node that is 2 hops away.

So, I know the node that's 2 hops away, x knows the node that's 2 hops away.

So, on the second iteration we can leverage the work that x already did

on it's first iteration to get a pointer that is now 4 hops away.

So we can go red, then red to set our new red pointer here, to be the chum of the chum.

Now we are going to have a pointer to a node that is 4 hops away,

which also has a pointer to a node that is 4 hops away.

So the next iteration will have a pointer that is 8 hops away and so on.

So now, instead of going 1, 2, 3, 4, we're now going 1, 2, 4, 8.

So for N iterations, as a function of N, how many steps is this going to take?