Little's Law for GPUs - Intro to Parallel Programming


So, let's look at Little's Law for GPUs. To recap, Little's Law states that the number of useful bytes delivered is equal to the average latency of a memory transaction times the bandwidth.
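To make that recap concrete, here is a small back-of-the-envelope calculation. The bandwidth and latency figures below are illustrative assumptions, not the specs of any particular GPU.

```python
# Little's Law: useful bytes = average memory latency x bandwidth.
# Illustrative numbers only (not any specific GPU):
bandwidth_bytes_per_s = 200e9   # assume 200 GB/s of DRAM bandwidth
latency_s = 400e-9              # assume a 400 ns average memory latency

# Bytes that must be in flight to keep the memory system fully busy:
bytes_in_flight = bandwidth_bytes_per_s * latency_s
print(bytes_in_flight)  # 80000.0, i.e. 80 KB of outstanding requests
```

In other words, under these assumed numbers the GPU needs tens of kilobytes of memory traffic in flight at once just to keep the DRAM busy, which no single thread could generate on its own.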
Now, what are some implications of this? First of all, there's a minimum latency to take a signal or piece of data all the way from an SM to somewhere on the DRAM, or to take information from the DRAM and pull it into an SM. Okay, you can find the details for your particular GPU online, but in general any DRAM transaction is going to take hundreds of clock cycles. And by the way, this isn't a GPU thing; this is true of all modern processors. A clock cycle on a modern chip takes half a nanosecond, for example, on a 2 gigahertz chip. And even the speed of light--you know, light doesn't go very far in half a nanosecond. And electricity is even slower, especially on the tiny wires that you find in computer chips. So to go from somewhere inside the GPU, off the chip, over a wire somewhere on the board into the DRAM, get a result, and go all the way back takes hundreds and hundreds of clock cycles, many, many nanoseconds.
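The clock-cycle arithmetic above can be checked directly. The 400-cycle round trip below is just an assumed, representative figure; check the numbers for your own GPU.

```python
clock_hz = 2e9                  # a 2 gigahertz chip
cycle_s = 1.0 / clock_hz        # one clock cycle
print(cycle_s * 1e9)            # 0.5 nanoseconds

# Light travels roughly 30 cm per nanosecond in a vacuum,
# so in half a nanosecond it covers only about 15 cm.
light_m_per_s = 3e8
print(light_m_per_s * cycle_s)  # 0.15 m

# An assumed 400-cycle DRAM round trip is then hundreds of nanoseconds:
round_trip_cycles = 400         # illustrative figure, not a measured spec
print(round_trip_cycles * cycle_s * 1e9)  # 200.0 ns
```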
  16. So this means that a thread that's trying to read or write global memory
  17. is going to have to wait 100s of clocks, and time that it could otherwise
  18. be spending by doing actual computation.
  19. And this, in turn, is why we have so many threads in flight.
  20. We deal with this high latency hundreds of clocks between
  21. memory accesses by having many, many threads that are able to run at any one time,
  22. so after one thread requests a piece of data from global memory
  23. or initiates a store to global memory, another thread can step in and do some computation.