English subtitles

Switch Statements and Thread Divergence Part 1 - Intro to Parallel Programming

Showing Revision 4 created 05/25/2016 by Udacity Robot.

  1. It's actually easy to construct such a kernel.
  2. For example, we could have a switch statement with 32 or more cases. If each thread in a warp switches to a different case,
  3. and each case took the same amount of time to execute,
  4. this segment of the kernel would run 32 times slower than if all the threads in a warp switched to the same case.
  5. Okay, so let's explore this with a quiz.
  6. So I'm going to give you a bunch of switch statements, and we're going to pretend that
  7. they're in a kernel and that all of the cases of those switch statements take an equal amount of time.
  8. Okay, so I'll do the first example to explain the format that I'm looking for.
  9. Here, we've got a switch statement, this is in kernel code.
  10. It's switching on threadIdx.x mod 32,
  11. and that means that you're going to get a number from 0 to 31.
  12. And I've just sort of used this short-cut notation to indicate that I've got cases 0 through 31.
  13. I want you to assume that all these cases take the same amount of time,
  14. and then there's some more code, and later on I'm actually going to launch this kernel,
  15. and I'm going to launch it in a configuration like this.
  16. It's going to have a single thread block with 1024 threads.
  17. And so the question that I'm asking you is, what is the slowdown?
  18. Is it 1x, meaning no slowdown at all? Is it 32x, meaning that,
  19. assuming all these cases take the same amount of time, it's going to run 32 times slower through this section of the code due to branch divergence?
  20. What's going on? And so I'll give you a series of these, using this sort of short-hand notation,
  21. and I want you to think about what slowdown you're going to get due to thread divergence
  22. among the different threads in a warp.
  23. So the answer to this first one, this is sort of the example we've been discussing.
  24. Okay, every thread is going to go a different path.
  25. Every thread in the warp will have a different value for this thread index mod 32.
  26. Those values will be 0 through 31, so every single thread will take a different path,
  27. and that means that the warp is going to have to activate each thread in turn, run through that particular case,
  28. deactivate that thread, activate another thread, run through that case and so forth.
  29. So this will lead to the maximum slowdown of 32 times.
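The reasoning in this first example can be checked with a small host-side model. This is a sketch in plain C++, not kernel code: it assumes a warp is 32 consecutive threads of a one-dimensional block and, as the lecture does, that every case costs the same, so the slowdown equals the largest number of distinct cases taken inside any one warp. The function name `divergence_factor_1d` and its parameters are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Host-side model (not a kernel). Returns the largest number of distinct
// switch cases taken within any single warp of a 1D thread block.
// Assumption: warp size is 32 and all cases take equal time, so this
// count is the slowdown factor for the switch section.
int divergence_factor_1d(int block_size, int modulus) {
    assert(block_size > 0 && modulus > 0);
    const int kWarpSize = 32;
    int worst = 0;
    for (int warp_start = 0; warp_start < block_size; warp_start += kWarpSize) {
        std::set<int> cases;  // distinct cases hit by this warp
        const int warp_end = std::min(warp_start + kWarpSize, block_size);
        for (int tid = warp_start; tid < warp_end; ++tid) {
            cases.insert(tid % modulus);  // models switch (threadIdx.x % modulus)
        }
        worst = std::max(worst, static_cast<int>(cases.size()));
    }
    return worst;
}
```

Calling `divergence_factor_1d(1024, 32)` models the launch described above: one block of 1024 threads switching on the thread index mod 32.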
  30. Here is our next example, and what I am going to try to do for each of these examples is highlight in black
  31. the parts of the code that are different from the first example that I gave.
  32. So in this case we are similarly launching a kernel with 1024 threads, and now we have got cases 0 through 63,
  33. and we're taking the thread index mod 64 instead of mod 32.
  34. Your job is to figure out the slowdown.
  35. What about a two-dimensional kernel? Here, I'm using some shorthand to indicate
  36. that I'm launching a thread block which has 64 threads in the X direction and 16 threads in the Y direction,
  37. again a single thread block, and I'm going to switch on the Y index of the thread.
  38. And next, another example where I'm going to switch on the Y index of the thread as before,
  39. except this time I'm launching a thread block with 16 by 16 threads.
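To check an answer to any of these quiz examples, the same counting idea can be extended to two-dimensional blocks. The sketch below is a host-side model in plain C++, not kernel code. It relies on the CUDA convention that threads in a block are linearized with x varying fastest (linear id = x + y * blockDim.x) and that each warp is 32 consecutive linear ids; it also assumes all cases cost the same. The name `max_paths_per_warp` and the `selector` callback are hypothetical helpers introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>

// Host-side model (not a kernel). Returns the largest number of distinct
// switch values seen within any single warp of a dim_x by dim_y block.
// selector(x, y) stands in for the expression inside switch (...).
// Assumptions: warp size 32, x-fastest thread linearization, equal-cost cases.
int max_paths_per_warp(int dim_x, int dim_y,
                       const std::function<int(int, int)>& selector) {
    assert(dim_x > 0 && dim_y > 0);
    const int kWarpSize = 32;
    const int total = dim_x * dim_y;
    int worst = 0;
    for (int warp_start = 0; warp_start < total; warp_start += kWarpSize) {
        std::set<int> cases;  // distinct switch values in this warp
        const int warp_end = std::min(warp_start + kWarpSize, total);
        for (int linear = warp_start; linear < warp_end; ++linear) {
            const int x = linear % dim_x;  // models threadIdx.x
            const int y = linear / dim_x;  // models threadIdx.y
            cases.insert(selector(x, y));
        }
        worst = std::max(worst, static_cast<int>(cases.size()));
    }
    return worst;
}
```

For instance, `max_paths_per_warp(1024, 1, [](int x, int) { return x % 64; })` models the mod 64 example, and `max_paths_per_warp(64, 16, [](int, int y) { return y; })` models the 64 by 16 block that switches on the Y index. Changing the dimensions to 16 and 16 models the last example.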