Thread Per Row - Intro to Parallel Programming

  • 0:00 - 0:02
    So we'll start with the thread per row approach.
  • 0:02 - 0:04
    Let's start with the data structure.
  • 0:04 - 0:09
    We're going to use CSR, the compressed sparse row format, here, just as we did in Unit 4.
  • 0:09 - 0:13
    Recall that value contains the non-zero elements in the matrix,
  • 0:13 - 0:17
    index gives the column of each entry,
  • 0:17 - 0:20
    and row pointer contains the index of the beginning of each row.
  • 0:20 - 0:24
    So each blue dot here corresponds to the element that begins each row,
  • 0:24 - 0:28
    which are elements 0, 2, 3, and 5.
  • 0:28 - 0:30
    So let's just walk through some code.
  • 0:30 - 0:35
    Note that this code, like many SpMV routines, calculates y += Mx.
  • 0:35 - 0:41
    So it multiplies the matrix M by the vector x, adds the product to the vector y,
  • 0:41 - 0:43
    and stores the result back in y.
  • 0:43 - 0:47
    In other words, it adds the matrix-vector product to the destination vector y.
  • 0:47 - 0:51
    We're going to start with this line here, which computes the global index for each thread.
  • 0:51 - 0:55
    The thread with this index i will calculate the result for row i.
  • 0:55 - 0:59
    Next we're going to have an if statement: if row is less than the number of rows.
  • 0:59 - 1:01
    Why do we have this if statement?
  • 1:01 - 1:03
    We're going to launch many blocks of many threads,
  • 1:03 - 1:07
    and it might be that the number of rows is not an exact multiple of the number of threads per block, so some threads would have no row to work on.
  • 1:07 - 1:10
    This if statement is a common one in CUDA programs.
  • 1:10 - 1:13
    Inside the if is the meat of the routine.
  • 1:13 - 1:17
    Recall that row pointer contains the indices of the starts of each row.
  • 1:17 - 1:23
    So, for instance, the value 3 here says that the element at index 3, D here,
  • 1:23 - 1:29
    is the beginning of a particular row, which contains D and E. So we're going to start with D.
  • 1:29 - 1:32
    We're going to start at the beginning of a row and we're going to go up to,
  • 1:32 - 1:36
    but not including, the first element of the next row, so that's this loop right here.
  • 1:36 - 1:40
    And at every iteration of that loop we will multiply two things.
  • 1:40 - 1:44
    One is the value of that element, so in this case D,
  • 1:44 - 1:49
    and the second is we check which column D is in.
  • 1:49 - 1:55
    In this case D is in column 0 so we're going to look up the vector element
  • 1:55 - 1:59
    at position 0 and multiply D by that vector element.
  • 1:59 - 2:05
    So that's this value times that vector element, and then we add that product to dot.
  • 2:05 - 2:09
    And when we're finally done, we take our destination value y,
  • 2:09 - 2:11
    add it to dot, and put it back into y.
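
The code shown on screen is not part of this transcript, so here is a minimal sketch of the routine as narrated, written as plain C++ that emulates the kernel on the CPU. The nested loop in `spmv_csr_launch` stands in for a CUDA launch, and `block * threads_per_block + thread` plays the role of the global thread index. Everything named here is an assumption for illustration: the `CsrMatrix` struct, the function names, the loop variable `jj`, the numeric values, and the column positions other than D sitting in column 0. The sketch also assumes the common convention that the row pointer array carries one extra final entry equal to the number of nonzeros, so the last row has an end to stop at.

```cpp
#include <vector>

// CSR storage, following the names used in the narration.
struct CsrMatrix {
    int num_rows;
    std::vector<int> row_ptr;  // start of each row, plus one final entry equal
                               // to the number of nonzeros (assumed convention)
    std::vector<int> index;    // column of each stored entry
    std::vector<float> value;  // nonzero entries, stored row by row
};

// Work done by one emulated thread: it handles the single row it is given.
void spmv_csr_thread(int row, const CsrMatrix& m,
                     const std::vector<float>& x, std::vector<float>& y) {
    // Guard: threads whose index is past the last row do nothing.
    if (row < m.num_rows) {
        float dot = 0.0f;
        // From the start of this row up to, but not including,
        // the first element of the next row.
        for (int jj = m.row_ptr[row]; jj < m.row_ptr[row + 1]; ++jj) {
            // Stored value times the vector element in that value's column.
            dot += m.value[jj] * x[m.index[jj]];
        }
        // y += Mx: add the row's dot product to the destination entry.
        y[row] += dot;
    }
}

// CPU stand-in for launching num_blocks blocks of threads_per_block threads.
void spmv_csr_launch(int num_blocks, int threads_per_block, const CsrMatrix& m,
                     const std::vector<float>& x, std::vector<float>& y) {
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int row = block * threads_per_block + thread;  // global index
            spmv_csr_thread(row, m, x, y);
        }
    }
}

// Hypothetical 4x4 matrix whose rows begin at elements 0, 2, 3, and 5:
//   row 0: A=1 (col 0), B=2 (col 2)
//   row 1: C=3 (col 1)
//   row 2: D=4 (col 0), E=5 (col 3)
//   row 3: F=6 (col 2)
std::vector<float> run_example() {
    CsrMatrix m{4,
                {0, 2, 3, 5, 6},
                {0, 2, 1, 0, 3, 2},
                {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f}};
    std::vector<float> x{1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> y{10.0f, 20.0f, 30.0f, 40.0f};

    // 4 rows with 3 threads per block needs 2 blocks, so 6 threads are
    // launched and the last 2 are stopped by the guard.
    int threads_per_block = 3;
    int num_blocks = (m.num_rows + threads_per_block - 1) / threads_per_block;
    spmv_csr_launch(num_blocks, threads_per_block, m, x, y);
    return y;  // {17, 26, 54, 58}
}
```

Calling `run_example()` from a `main` function walks through the same steps as the narration. For row 2, the loop visits D and E, multiplies D by `x[0]` and E by `x[3]`, and adds the sum to `y[2]`. In an actual CUDA kernel the two launch loops would disappear and each thread would compute its own `row` from its block and thread indices.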
Video Language:
English
Team:
Udacity
Project:
CS344 - Intro to Parallel Programming
Duration:
02:12
