English subtitles

← Let Us time Some Code - Solution - Intro to Parallel Programming

Get Embed Code
2 Languages

Showing Revision 2 created 05/04/2013 by danicamaggie.

  1. So taking these 1 at a time--a million threads incrementing a million elements
  2. does give the correct answer because, in this case, there's a unique element for
  3. every thread, so there's no conflict.
  4. So even though we didn't make these atomic increments we're still safe.
  5. A million threads atomically incrementing a million elements, is, of course, also safe.
  6. So you'll also get the correct answer.
  7. A million threads incrementing a hundred elements is the same example we saw before.
  8. And as we saw, that will give the wrong answer, unless we use atomics.
  9. So the next one is not correct. The fourth one is correct.
  10. And finally, ten million threads atomically incrementing a hundred elements will still be the correct answer,
  11. So all but one of these give you the correct answer.
  12. Okay, so the more interesting question is how long do each of these options take?
  13. The fastest, perhaps counter-intuitively, is going to be option 3--
  14. a million threads writing into a hundred elements.
  15. And the next fastest would be option 1--
  16. a million threads writing into a million elements.
  17. On my laptop, these 2 operations take around 3.2 milliseconds, and 3.4 milliseconds, respectively.
  18. Of course, it's not a very useful option since it doesn't give the correct answer.
  19. But it's still interesting to look at what's going on.
  20. So the reason why this is slightly faster is that you have your million threads all
  21. trying to write to the same 100 elements.
  22. Well, those 100 elements occupy a very small fraction of memory,
  23. and they're all going to fit into a cash line or 2 on this system,
  24. whereas a million elements is large enough that it's not going to fit in cache.
  25. You're actually going to have to touch more of the DRAM, where global memory is stored.
  26. For the same reason, a million threads writing into 100 elements atomically
  27. is going to be slightly faster than a million threads writing into a million elements.
  28. So the next fastest is option 4. The next fastest after that is option 2.
  29. And in my system, again, these took 3.6 and 3.8 milliseconds,
  30. which means the slowest of all options is the one where
  31. 10 million threads write into a hundred elements.
  32. This is actually 36 milliseconds, so it takes approximately 10 times as long to complete.
  33. Not surprisingly, there's about 10 times as much going on.
  34. So you might play around with this code for a little bit.
  35. For example, see what happens to the time as you go to even more threads writing into even fewer elements.
  36. The big lesson here is that atomic memory operations come with a cost.