
Little's Law - Intro to Parallel Programming


Showing Revision 4 created 05/24/2016 by Udacity Robot.

  1. Let's step back for a bit to remind us why we're focusing on memory the way we are.
  2. Our overarching goal, of course, is just to make code fast.
  3. So we say great, GPUs are fast, let's use those. But why are they fast?
  4. GPUs are fast, first, because they are massively parallel, with hundreds or thousands of
  5. processors on a single chip working for you to solve your problem, but also because they have
  6. an extremely high-bandwidth memory system to feed those massively parallel processors.
  7. So if the memory system can't deliver data to all of these processors
  8. and store results from all those processors, then we're not going to get the full speed out of our GPU.
  9. And that's why, on a memory-limited kernel like transpose,
  10. our subgoal is really to utilize all the available memory bandwidth.
  11. Hence our focus on global memory coalescing, DRAM utilization, and so on.
  12. Now I want to ask this question a little more rigorously.
  13. What do we mean by utilizing all the available memory bandwidth?
  14. And this is going to bring us to a very important, very simple principle called Little's Law.
  15. Let's have the talented Kim Dilla illustrate this for us.
  16. Now, John Little is an MIT professor who studies marketing.
  17. He formulated his eponymous law when writing about queueing theory in business processes.
  18. And Little's Law is usually used to reason about things like optimizing the number
  19. of customers in a line at Starbucks, or maybe the size of queues in a factory.
  20. But Little's Law is really very general and can be applied
  21. to many things including memory systems in computers.
  22. In that context, Little's Law states that the number of bytes in flight
  23. equals the average latency of each memory transaction times the bandwidth.
  24. Let's be a little more precise and emphasize
  25. that we care about the useful bytes delivered, and the problem with uncoalesced
  26. global memory accesses is that not all of the bytes in every memory transaction are actually being used.
  27. That's why coalescing global memory accesses helps
  28. ensure that every byte delivered in a memory transaction will be used.
  29. So given this definition, what can we do to improve our bandwidth? Let's check all that apply.
  30. We can increase the number of bytes delivered; we can increase the latency,
  31. meaning the time between memory transactions; we can decrease the
  32. number of bytes delivered; or we can decrease the latency, or time between transactions.
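The relationship described above can be sketched numerically. This is a minimal Python sketch using made-up placeholder numbers, not the specs of any real GPU: Little's Law says the bytes that must be in flight to sustain a given bandwidth equal the average transaction latency times that bandwidth, and coalescing determines what fraction of each transaction's bytes is actually useful.

```python
# Sketch of Little's Law applied to a memory system.
# All numbers below are illustrative assumptions, not real hardware specs.

def bytes_in_flight(latency_s, bandwidth_bytes_per_s):
    """Little's Law: bytes in flight = average latency x bandwidth."""
    return latency_s * bandwidth_bytes_per_s

def useful_bandwidth(bandwidth_bytes_per_s, useful_fraction):
    """Achieved bandwidth scales with the fraction of delivered bytes used."""
    return bandwidth_bytes_per_s * useful_fraction

# Hypothetical memory system: 100 GB/s peak bandwidth, 500 ns latency.
bandwidth = 100e9   # bytes per second (assumed)
latency = 500e-9    # seconds per transaction (assumed)

# Bytes that must be in flight to keep the memory system fully busy.
print(bytes_in_flight(latency, bandwidth))

# Coalesced accesses: every byte of each transaction is used.
print(useful_bandwidth(bandwidth, 1.0))

# Uncoalesced example: if only 4 of every 32 bytes per transaction
# are used (e.g., a large-stride access pattern), useful bandwidth drops.
print(useful_bandwidth(bandwidth, 4 / 32))
```

This mirrors the quiz: with latency fixed, the only ways to raise useful bandwidth are to put more (useful) bytes in flight or to lower the latency.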