CUDA DMA - Intro to Parallel Programming

So CUDA DMA is a template library designed to make it easier to use shared memory while achieving high performance. To use CUDA DMA, programmers declare CUDA DMA objects for each shared memory buffer that needs to be loaded or stored, and the cool thing is that CUDA DMA lets you explicitly describe the transfer pattern for that data. So, for example, you might be transferring one long sequential chunk of memory, you might be transferring strided chunks of memory, or you might be doing indirect access to memory, such as you would find in a sparse matrix representation. The three patterns are sketched in the code below.
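
A rough illustration only: the lecture doesn't show the CudaDMA API itself, so the sketch below writes the three transfer patterns in plain CUDA. The kernel name, the TILE size, and the buffer names are all hypothetical, chosen just to make the patterns concrete.

    // Sketch of the three transfer patterns, in plain CUDA rather than
    // the CudaDMA API. All names and sizes here are hypothetical.
    #define TILE 256

    __global__ void load_patterns(const float *g_data,    // dense input array
                                  const int   *g_indices, // gather indices (sparse case)
                                  int stride,
                                  float *g_out)
    {
        __shared__ float s_seq[TILE];     // one long sequential chunk
        __shared__ float s_strided[TILE]; // strided chunks
        __shared__ float s_gather[TILE];  // indirect / gathered elements

        int t = threadIdx.x;

        // 1. Sequential: consecutive threads load consecutive words.
        s_seq[t] = g_data[t];

        // 2. Strided: each thread loads an element 'stride' words apart.
        s_strided[t] = g_data[t * stride];

        // 3. Indirect, as in a sparse matrix representation: the indices
        //    themselves come from memory.
        s_gather[t] = g_data[g_indices[t]];

        __syncthreads(); // transfers done; compute on shared memory can begin
        g_out[t] = s_seq[t] + s_strided[t] + s_gather[t];
    }
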
As with CUB, decoupling the transfer pattern from the actual processing we do on each thread to achieve that pattern has several benefits. It improves programmability, because the code is now simpler: you've packaged away all of the logic for doing the transfer separately from the actual compute in your kernel. It improves portability, because the CUDA DMA ninjas or the CUB ninjas can develop the very best implementations of these various transfer patterns for your situation and package that all up in a library for you. And because those CUDA ninjas are good at what they do, you get high performance: you're going to achieve high DRAM bandwidth, you're going to hide global memory latency for kernels that don't have a lot of occupancy, and hopefully this will lead to better compiler-generated code. A sketch of this decoupling follows.
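
To make the decoupling concrete, here is a minimal sketch, again in plain CUDA rather than the CudaDMA API. The routine load_tile_strided stands in for what a CUDA DMA object would provide; its name and signature are made up for illustration. The point is that the compute phase is written against shared memory only and never sees the transfer logic.

    // Minimal sketch of the decoupling idea. 'load_tile_strided' is a
    // hypothetical stand-in for a CudaDMA object, not the library's API.
    #define TILE 128

    __device__ void load_tile_strided(float *s_buf, const float *g_src, int stride)
    {
        // All transfer logic is packaged here; switching to a different
        // pattern (sequential, indirect, ...) means swapping this routine only.
        s_buf[threadIdx.x] = g_src[threadIdx.x * stride];
    }

    __global__ void decoupled_kernel(const float *g_in, float *g_out, int stride)
    {
        __shared__ float s_tile[TILE];

        load_tile_strided(s_tile, g_in + (size_t)blockIdx.x * TILE * stride, stride);
        __syncthreads(); // hand off from the transfer phase to the compute phase

        // The compute phase below is independent of how s_tile was filled.
        g_out[blockIdx.x * TILE + threadIdx.x] = 2.0f * s_tile[threadIdx.x];
    }
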
And as I said, these benefits really accrue to both CUB, which tackles the whole cycle, from bringing data in from global memory through doing the computation on it, and CUDA DMA, which tackles just the top part of that cycle, where you bring data into shared memory and then do your own operations on it. So these benefits really accrue to both approaches.