## Thread Per Row - Intro to Parallel Programming

• 0:00 - 0:02
So we'll start with the thread per row approach.
• 0:02 - 0:04
Let's start with the data structure.
• 0:04 - 0:09
We're going to use the CSR, the compressed sparse row format here, just as we did in Unit 4.
• 0:09 - 0:13
Recall that value contains the non-zero elements in the matrix,
• 0:13 - 0:17
index gives the column of each entry,
• 0:17 - 0:20
and row pointer contains the index of the beginning of each row.
• 0:20 - 0:24
So each blue dot here corresponds to the element that begins each row,
• 0:24 - 0:28
which are elements 0, 2, 3, and 5.
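The three CSR arrays described above can be sketched in plain C++ as follows. This is a minimal illustration, not the code shown in the video: the `CsrMatrix` struct, the `dense_to_csr` helper, and the final end-of-data entry appended to `row_ptr` are all assumptions made for the example. The trailing entry is a common CSR convention that lets the last row find where it ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three CSR arrays named in the lecture.
struct CsrMatrix {
    std::vector<float> value;   // nonzero entries, stored row by row
    std::vector<int>   index;   // column of each entry in `value`
    std::vector<int>   row_ptr; // position in `value` where each row begins,
                                // plus one final entry equal to value.size()
};

// Build the CSR arrays from a dense row-major matrix with `cols` columns.
CsrMatrix dense_to_csr(const std::vector<float>& dense, int cols) {
    assert(cols > 0 && dense.size() % static_cast<std::size_t>(cols) == 0);
    const int rows = static_cast<int>(dense.size()) / cols;
    CsrMatrix m;
    for (int r = 0; r < rows; ++r) {
        // Record where row r starts before appending its nonzeros.
        m.row_ptr.push_back(static_cast<int>(m.value.size()));
        for (int c = 0; c < cols; ++c) {
            const float v = dense[r * cols + c];
            if (v != 0.0f) {
                m.value.push_back(v);
                m.index.push_back(c);
            }
        }
    }
    // End sentinel (assumed convention): one past the last stored element.
    m.row_ptr.push_back(static_cast<int>(m.value.size()));
    return m;
}
```

For a made-up 4x4 matrix with rows `{1,0,2,0}`, `{0,3,0,0}`, `{4,0,0,5}`, and `{0,0,6,0}`, the row starts come out as 0, 2, 3, and 5, matching the positions quoted in the transcript, followed by the sentinel 6.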
• 0:28 - 0:30
So let's just walk through some code.
• 0:30 - 0:35
Note this code, like many SpMV routines, calculates y += Mx.
• 0:35 - 0:41
So it multiplies the matrix M by the vector x, adds the product to the vector y,
• 0:41 - 0:43
and stores the result back in y.
• 0:43 - 0:47
It adds the matrix-vector product to the destination vector y.
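To make the accumulate-into-y behavior concrete, here is a small dense reference version in C++. It is only a sketch for checking results on tiny inputs: the function name `dense_mv_accumulate` and the row-major layout are assumptions for the example and do not come from the video.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference y += M * x for a dense row-major matrix (serial, CPU only).
void dense_mv_accumulate(const std::vector<float>& m,
                         const std::vector<float>& x,
                         std::vector<float>& y) {
    const std::size_t rows = y.size();
    const std::size_t cols = x.size();
    assert(m.size() == rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        float dot = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            dot += m[r * cols + c] * x[c];
        }
        y[r] += dot; // accumulate into y rather than overwrite it
    }
}
```

Because the routine accumulates, a caller who wants a plain product y = Mx would need to zero y first.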
• 0:47 - 0:51
We're going to start with this line here, which computes the global index for each thread.
• 0:51 - 0:55
The thread with this index i will calculate the result for row i.
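The line itself is not reproduced in the transcript. In a one-dimensional CUDA launch the global index is conventionally `blockIdx.x * blockDim.x + threadIdx.x`, so the sketch below assumes that form and mirrors it with ordinary C++ parameters. The function name `global_thread_index` is hypothetical.

```cpp
#include <cassert>

// CPU stand-in for the CUDA expression
//   blockIdx.x * blockDim.x + threadIdx.x
// block_idx:  which block the thread belongs to
// block_dim:  number of threads per block
// thread_idx: position of the thread inside its block
int global_thread_index(int block_idx, int block_dim, int thread_idx) {
    assert(block_idx >= 0 && block_dim > 0);
    assert(thread_idx >= 0 && thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}
```

With 128 threads per block, thread 5 of block 2 would get index 2 * 128 + 5 = 261 and would therefore handle row 261.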
• 0:55 - 0:59
Next we're going to have an if statement, if row less than the number of rows.
• 0:59 - 1:01
Why do we have this if statement?
• 1:01 - 1:03
We're going to launch many blocks of many threads,
• 1:03 - 1:07
and it might be that the number of rows is not an exact multiple of the number of threads per block.
• 1:07 - 1:10
This if statement is a common one in CUDA programs.
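A short calculation shows why the guard is needed. The sketch below assumes the usual approach of rounding the block count up so that every row gets a thread. The helper names `blocks_needed` and `idle_threads` are made up for this example.

```cpp
#include <cassert>

// Smallest number of blocks whose threads cover every row (ceiling division).
int blocks_needed(int num_rows, int threads_per_block) {
    assert(num_rows >= 0 && threads_per_block > 0);
    return (num_rows + threads_per_block - 1) / threads_per_block;
}

// Threads that fail the `row < num_rows` test and therefore do no work.
int idle_threads(int num_rows, int threads_per_block) {
    return blocks_needed(num_rows, threads_per_block) * threads_per_block
           - num_rows;
}
```

For 1000 rows and 256 threads per block, this gives 4 blocks and 1024 threads, so 24 threads have an index past the last row. Without the guard those threads would read and write out of bounds.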
• 1:10 - 1:13
Inside the if is the meat of the routine.
• 1:13 - 1:17
Recall that row pointer contains the indices of the starts of each row.
• 1:17 - 1:23
So, for instance, the value 3 here says that the element at index 3, D here,
• 1:23 - 1:29
is the beginning of a particular row that then contains D and E. So we're going to start with D.
• 1:29 - 1:32
We're going to start at the beginning of a row and we're going to go up to,
• 1:32 - 1:36
but not including, the first element of the next row, so that's this loop right here.
• 1:36 - 1:40
And at every iteration of that loop we will multiply 2 things.
• 1:40 - 1:44
One is the value of that element, so in this case D,
• 1:44 - 1:49
and the second is we check which column D is in.
• 1:49 - 1:55
In this case D is in column 0 so we're going to look up the vector element
• 1:55 - 1:59
at position 0 and multiply D by that vector element.
• 1:59 - 2:05
So that's this value times that vector element, and then add that to dot.
• 2:05 - 2:09
And when we're finally done, we take our destination value y,
• 2:09 - 2:11
add it to dot, and put it back into y.
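Putting the steps together, the routine described above might look like the following. This is a CPU sketch rather than the kernel from the video: `spmv_csr_row` plays the role of one thread, and the nested loops in `spmv_csr_thread_per_row` stand in for the kernel launch. The function names, the parameter order, and the trailing end entry in `row_ptr` are assumptions.

```cpp
#include <cassert>
#include <vector>

// Work done by the one "thread" responsible for `row`.
void spmv_csr_row(int row, int num_rows,
                  const int* row_ptr, const int* index, const float* value,
                  const float* x, float* y) {
    if (row < num_rows) {                       // guard against surplus threads
        float dot = 0.0f;
        const int row_start = row_ptr[row];
        const int row_end   = row_ptr[row + 1]; // first element of the next row
        for (int jj = row_start; jj < row_end; ++jj) {
            // value of the element times the x entry for its column
            dot += value[jj] * x[index[jj]];
        }
        y[row] += dot;                          // y += Mx, not y = Mx
    }
}

// Serial stand-in for a kernel launch: every (block, thread) pair runs once.
// row_ptr must hold one entry per row plus a final end-of-data entry.
void spmv_csr_thread_per_row(const std::vector<int>& row_ptr,
                             const std::vector<int>& index,
                             const std::vector<float>& value,
                             const std::vector<float>& x,
                             std::vector<float>& y,
                             int threads_per_block) {
    assert(threads_per_block > 0 && row_ptr.size() == y.size() + 1);
    const int num_rows = static_cast<int>(y.size());
    const int num_blocks =
        (num_rows + threads_per_block - 1) / threads_per_block;
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            const int row = block * threads_per_block + thread;
            spmv_csr_row(row, num_rows, row_ptr.data(), index.data(),
                         value.data(), x.data(), y.data());
        }
    }
}
```

In a real CUDA kernel the two outer loops disappear, because each thread computes its own `row` from its block and thread indices and runs the body of `spmv_csr_row` once. Each row is written by exactly one thread, so no synchronization on `y` is needed.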
Title:
Thread Per Row - Intro to Parallel Programming
Video Language:
English
Team:
Udacity
Project:
CS344 - Intro to Parallel Programming
Duration:
02:12