English subtitles

← Let Us time Some Code - Solution - Intro to Parallel Programming

Get Embed Code
2 Languages

Showing Revision 1 created 02/12/2013 by Cogi-Admin.

  1. So, taking these 1 at a time. A million threads incrementing a million elements
  2. does give the correct answer because, in this case, there's a unique element for
  3. every thread, so there's no conflict. So even though we didn't make these atomic
  4. increments we're still safe. A million threads atomically incrementing a million
  5. elements, is, of course, also safe. So you'll also get the correct answer. A
  6. million threads incrementing a hundred elements is the same example we saw
  7. before. And as we saw, that will give the wrong answer, unless we use atomics.
  8. So the next one is not correct, the fourth one is correct. And finally, ten
  9. million threads atomically incrementing a hundred elements will still be the
  10. correct answer. So all but one of these give you the correct answer. Okay, so
  11. the more interesting question is how long do each of these options take? The
  12. fastest, perhaps counter-intuitively, is going to be option three. A million
  13. threads writing into a hundred elements. And the next fastest would be option
  14. one. A million threads writing into a million elements. On my laptop, these two
  15. operations take around 3 point 2 milliseconds, and 3 point 4 milliseconds
  16. respectively. Of course, it's not a very useful option since it doesn't give the
  17. correct answer. But it's still interesting to look at what's going on. So, the
  18. reason why this is slightly faster is that you have your million threads all
  19. trying to write to the same 100 elements. Well, those 100 elements occupy a very
  20. small fraction of memory, and they're all going to fit into a cash line or two
  21. on this system. Whereas a million elements is large enough that it's not going
  22. to fit in cache. You're actually going to have to touch more of the DRAM, where
  23. global memory is stored. For the same reason, a million threads writing into 100
  24. elements atomically is going to be slightly faster than a million threads
  25. writing into a million elements. So, the next fastest is option 4. The next
  26. fastest after that is option 2 and in my system again these took 3.6 and 3.8
  27. milliseconds. Which means the slowest of all options is the one where 10 million
  28. threads write into a hundred elements. This is actually 36 milliseconds. So it
  29. takes approximately 10 times as long to complete. Not surprisingly, there's
  30. about ten times as much going on. So you might play around with this code for a
  31. little bit. For example, see what happens to the time as you go to even more
  32. threads writing into even fewer elements. The big lesson here is that, is that
  33. atomic memory operations come with a cost.