
So we'll start with the thread-per-row approach.

Let's start with the data structure.

We're going to use the CSR, the compressed sparse row format here, just as we did in Unit 4.

Recall that value contains the nonzero elements in the matrix,

index gives the column of each entry,

and row pointer contains, for each row, the index in value of the element that begins that row.

So each blue dot here marks the element that begins a row,

and those are elements 0, 2, 3, and 5.
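As a rough sketch of the layout just described, here are the three CSR arrays in plain C++ on the host. Only the row starts 0, 2, 3, and 5 and the fact that D sits in column 0 come from the lecture. The numeric values, the other column indices, the trailing end marker in `row_ptr`, and the names `CsrMatrix`, `make_example`, and `row_length` are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR storage for a sparse matrix.
struct CsrMatrix {
    std::vector<float> value;    // nonzero elements, stored row by row
    std::vector<int>   index;    // column of each nonzero
    std::vector<int>   row_ptr;  // start of each row in value, plus one end marker
};

// Hypothetical 4x4 matrix with six nonzeros a..f = 1..6:
//   row 0: a b    row 1: c    row 2: d e    row 3: f
CsrMatrix make_example() {
    return CsrMatrix{
        {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f},  // a b c d e f
        {0, 2, 1, 0, 3, 2},                    // d (element 3) is in column 0
        {0, 2, 3, 5, 6}                        // rows start at 0, 2, 3, 5; 6 = total nonzeros
    };
}

// Number of nonzeros stored in the given row.
int row_length(const CsrMatrix& m, int row) {
    assert(row >= 0 && static_cast<std::size_t>(row) + 1 < m.row_ptr.size());
    return m.row_ptr[row + 1] - m.row_ptr[row];
}
```

With this data, `row_length(make_example(), 2)` is 2, matching the row that holds D and E. The extra last entry in `row_ptr` is a common convention that gives the final row an end as well.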

So let's just walk through some code.

Note that this code, like many SpMV routines, calculates y += Mx.

So it multiplies the matrix M by the vector x, adds that product to y,

and stores the result back in y.

In other words, it adds the matrix-vector product to the destination vector y.

We're going to start with this line here, which computes the global index for each thread.

The thread with this index i will calculate the result for row i.
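In CUDA, the global index is usually formed from the built-in variables as `blockDim.x * blockIdx.x + threadIdx.x`. Whether the slide writes it in exactly this form is an assumption. The sketch here reproduces the arithmetic in plain C++; the function name `global_index` and its parameters are hypothetical stand-ins for those built-ins.

```cpp
#include <cassert>

// Host-side stand-in for the CUDA expression
//   blockDim.x * blockIdx.x + threadIdx.x
// block_dim  : threads per block
// block_idx  : which block this thread belongs to
// thread_idx : position of the thread inside its block
int global_index(int block_dim, int block_idx, int thread_idx) {
    assert(block_dim > 0 && block_idx >= 0);
    assert(thread_idx >= 0 && thread_idx < block_dim);
    return block_dim * block_idx + thread_idx;
}
```

For example, with 128 threads per block, thread 5 of block 2 gets index 128 * 2 + 5 = 261, so it would handle row 261.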

Next we're going to have an if statement: if row is less than the number of rows.

Why do we have this if statement?

We're going to launch many blocks of many threads,

and it might be that the number of rows is not an exact multiple of the number of threads per block, so some threads in the last block have no row to compute.

This if statement is a common one in CUDA programs.
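To make the reason for the guard concrete, here is a small host-side sketch, assuming the usual round-up rule for choosing the number of blocks. The helper names `blocks_needed` and `idle_threads` are hypothetical and not from the lecture.

```cpp
#include <cassert>

// Round up so that blocks * threads_per_block covers every row.
int blocks_needed(int num_rows, int threads_per_block) {
    assert(num_rows >= 0 && threads_per_block > 0);
    return (num_rows + threads_per_block - 1) / threads_per_block;
}

// Count the launched threads that fail the "row < num_rows" test
// and therefore must do nothing.
int idle_threads(int num_rows, int threads_per_block) {
    int launched = blocks_needed(num_rows, threads_per_block) * threads_per_block;
    int idle = 0;
    for (int row = 0; row < launched; ++row) {
        if (!(row < num_rows)) {  // same condition as the kernel's guard, negated
            ++idle;
        }
    }
    return idle;
}
```

With 1000 rows and 128 threads per block, 8 blocks are needed, which launches 1024 threads, so 24 of them fall outside the matrix. Without the guard, those threads would read and write past the end of the arrays.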

Inside the if is the meat of the routine.

Recall that row pointer contains the indices of the starts of each row.

So, for instance, the value 3 here says that element 3 of value, D here,

is the beginning of a particular row that then contains D and E. So we're going to start with D.

We're going to start at the beginning of a row and we're going to go up to,

but not including, the first element of the next row, so that's this loop right here.

And at every iteration of that loop we will multiply two things.

One is the value of that element, so in this case D,

and to find the second, we check which column D is in.

In this case D is in column 0 so we're going to look up the vector element

at position 0 and multiply D by that vector element.

So that's this value times that vector element, and then add that to dot.
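The loop just walked through can be sketched for a single row in plain C++. The function name `row_dot`, the loop variable `jj`, and the names `row_start` and `row_end` are assumptions. The sketch also assumes `row_ptr` carries one extra trailing entry so that the last row has an end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of one CSR row with the dense vector x.
float row_dot(const std::vector<float>& value,
              const std::vector<int>& index,
              const std::vector<int>& row_ptr,
              const std::vector<float>& x,
              int row) {
    assert(row >= 0 && static_cast<std::size_t>(row) + 1 < row_ptr.size());
    int row_start = row_ptr[row];      // first element of this row
    int row_end   = row_ptr[row + 1];  // first element of the next row (excluded)
    float dot = 0.0f;
    for (int jj = row_start; jj < row_end; ++jj) {
        // value[jj] is the matrix element; index[jj] is its column,
        // which selects the matching entry of x.
        dot += value[jj] * x[index[jj]];
    }
    return dot;
}
```

For the row holding D and E, the loop runs over elements 3 and 4. If D = 4 sits in column 0, E = 5 sits in column 3, and x = (10, 20, 30, 40), all made-up numbers, the result is 4 * 10 + 5 * 40 = 240.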

And when we're finally done, we take our destination value y,

add dot to it, and put the result back into y.
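Putting the pieces together, here is a host-side emulation of the whole thread-per-row routine. This is plain C++, not the kernel from the slide: the two nested loops stand in for the grid of blocks and threads, and in real CUDA the body would be a `__global__` function with `row` computed from `blockDim.x`, `blockIdx.x`, and `threadIdx.x`. The names `spmv_csr_thread` and `spmv_csr_emulated` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Work done by the one "thread" assigned to this row: y[row] += (row of M) . x
void spmv_csr_thread(int row, int num_rows,
                     const int* row_ptr, const int* index,
                     const float* value, const float* x, float* y) {
    if (row < num_rows) {  // threads past the last row do nothing
        float dot = 0.0f;
        int row_start = row_ptr[row];
        int row_end   = row_ptr[row + 1];
        for (int jj = row_start; jj < row_end; ++jj) {
            dot += value[jj] * x[index[jj]];
        }
        y[row] += dot;     // accumulate into the destination vector
    }
}

// Sequential stand-in for launching a grid of blocks of threads.
void spmv_csr_emulated(int threads_per_block,
                       const std::vector<int>& row_ptr,
                       const std::vector<int>& index,
                       const std::vector<float>& value,
                       const std::vector<float>& x,
                       std::vector<float>& y) {
    assert(threads_per_block > 0 && !row_ptr.empty());
    int num_rows = static_cast<int>(row_ptr.size()) - 1;
    int blocks = (num_rows + threads_per_block - 1) / threads_per_block;
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int row = threads_per_block * block + thread;  // global index
            spmv_csr_thread(row, num_rows, row_ptr.data(), index.data(),
                            value.data(), x.data(), y.data());
        }
    }
}
```

With four rows and three threads per block, two blocks are launched, giving six threads, and the last two are turned away by the if statement. Because the routine accumulates, y has to hold meaningful values before the call; starting from all zeros gives the plain product Mx.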