English subtitles

Loops and Thread Divergence - Intro to Parallel Programming

Showing Revision 4 created 05/24/2016 by Udacity Robot.

  1. So to answer this, we want to look at these for loops
  2. and decide how many times each warp is going to have to execute the for loop.
  3. And that means how many times at least 1 thread in the warp is going to have to execute it.
  4. So looking at this expression here, for a single warp these values will vary from 0 to 31.
  5. So there will be at least 1 thread in that warp for whom this modulo expression evaluates to 31.
  6. And that means that the entire warp is going to go through the motions
  7. of executing this bar function 31 times.
  8. Now, some of those times some of the threads will be deactivated.
  9. So the very first time, thread 0 will not execute the bar function.
  10. It will be deactivated because i will not be less than 0.
  11. And the next time threads 0 and 1 will be deactivated and so forth.
  12. Ultimately, the total amount of time that the warp has to spend in this loop
  13. depends on the largest number of times that any 1 thread in it has to go through the loop.
  14. Each warp will execute this loop 31 times.
  15. This next loop, though, is different.
  16. In this case the integer divide means that threads 0 through 31 are going to evaluate to 0.
  17. This expression will evaluate to 0, and therefore they're not going to execute the bar function at all.
  18. And threads 32 through 63 will evaluate this expression to 1,
  19. and they'll execute the loop once, and so forth.
  20. So there will be 1 warp which executes the loop 0 times, 1 warp which executes it 1 time,
  21. 1 which executes it 2 times and so forth, all the way up to a single warp
  22. which executes it 31 times.
  23. So, averaged over all of the warps, the number of times a warp executes this loop is 15.5.
  24. So now we know what we need to know to answer the question.
  25. Clearly, the second loop will execute faster, and it will be twice as fast
  26. because, on average, the number of times that the expensive bar function gets evaluated
  27. is half the number of times that that bar function gets evaluated during the first loop.
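
The counting argument above can be checked with a small host-side simulation. This is only a sketch under stated assumptions: the quiz code is not shown in the transcript, so the loop bounds are assumed to be `threadIdx.x % 32` for the first loop and `threadIdx.x / 32` for the second, and the block is assumed to hold 1024 threads (32 warps of 32 threads), which is what the 0-to-31 ranges in the explanation imply. The helper name `warp_iterations` is made up for this illustration.

```python
WARP_SIZE = 32
BLOCK_SIZE = 1024  # assumption: 32 warps per block, matching the 0..31 range in the transcript


def warp_iterations(trip_count, block_size=BLOCK_SIZE, warp_size=WARP_SIZE):
    """Return, for each warp, how many loop iterations the warp steps through.

    A warp keeps going until its slowest thread is finished, so the cost of
    the warp is the maximum trip count among its threads.
    """
    return [
        max(trip_count(t) for t in range(start, start + warp_size))
        for start in range(0, block_size, warp_size)
    ]


# Assumed loop bounds (not shown in the transcript):
loop1 = warp_iterations(lambda t: t % WARP_SIZE)   # i < threadIdx.x % 32
loop2 = warp_iterations(lambda t: t // WARP_SIZE)  # i < threadIdx.x / 32

avg1 = sum(loop1) / len(loop1)
avg2 = sum(loop2) / len(loop2)
print(avg1, avg2, avg1 / avg2)  # prints: 31.0 15.5 2.0
```

Under those assumptions every warp in the first loop pays for 31 iterations, while the warps in the second loop pay for 0, 1, 2, ... 31 iterations, which gives the 15.5 average and the factor of 2 quoted above. The simulation only counts iterations; it does not model the actual cost of the bar function or any other hardware effect.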