Ease of Programming - Intro to Parallel Programming


So part of the motivation is going to be ease of programming. So how is it specifically easier for someone to write a program that has these irregular, complex control structures and data structures, compared to not having dynamic parallelism?
>> In the past, before the dynamic stuff on Kepler, which is a new GPU, whenever you needed to make a new batch of work, you had to return to the CPU, which was the master controller, to go and launch that work for you. So if ever in my program I reached a point where I needed that matrix inversion or that FFT to be done, I had to halt my program, return to the CPU, and have the CPU launch this work for me, which would complete, return back to the CPU, and the CPU would then have to restart me. In effect, I would have to split my program in two around the moment where I needed this extra parallel work done, and suddenly, instead of having one smooth program, I have two fragments of a program, I have state that I have to save across the two, and then I have my CPU having to get involved to manage and marshal this work.

Suddenly, with dynamic parallelism, I can just do all of this compactly, on the fly. If you like, the system does all of that for you: it will save off your old program, it will run the new FFT for you, it will return the result to you, and it will continue where you left off. So, from the programmer's perspective, I'm no longer programming in two places at once. I'm no longer having the GPU and the CPU both tightly bound over my execution, and I no longer have to manage the portions of my program around where I need to launch this new work. I can just inline it, effectively, and it makes for a much simpler and more straightforward program.
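(A minimal sketch of the kind of device-side launch being described, not code from the lecture. The kernel names, sizes, and placeholder computation are hypothetical; dynamic parallelism requires a Kepler-class GPU of compute capability 3.5 or later and relocatable device code.)

```cuda
// Compile with something like: nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt
#include <cuda_runtime.h>

// Hypothetical child kernel standing in for the "FFT" or "matrix inversion"
// mentioned above.
__global__ void child_work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;   // placeholder computation
}

// Parent kernel: reaches a point where it needs a new batch of parallel work
// and launches it directly from the GPU, instead of returning to the CPU.
__global__ void parent(float *data, int n)
{
    // ... first part of the program ...

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        // Launch the child grid in-line, from device code.
        child_work<<<(n + 255) / 256, 256>>>(data, n);
    }

    // ... continue once the child work is visible; how the parent waits on
    // the child depends on the CUDA toolkit version, so it is omitted here ...
}

int main()
{
    const int n = 1024;
    float *data;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));

    // The host launches the parent once; the extra work is spawned on the GPU.
    parent<<<1, 256>>>(data, n);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```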
>> That's fantastic. And what about the performance implications?
>> There's always a performance overhead bouncing backwards and forwards between the CPU and the GPU. You've got the latencies of the PCI bus, which was the communication link. You've got the overheads of shutting down the first portion of your program, starting up the next portion, and resuming right where you left off. So those overheads get amortized, and you save, potentially, data transfer across the buses.

And in a way, something I feel is actually more important than this is that with the GPU, you're always trying to get as much work onto that GPU as possible. You can much more easily overlap the new work that you're doing with other stuff that's still going on in the GPU. I don't have to shut down completely and fire up an FFT; I can, inline, do all of these things while something else useful is going on at the same time. And so you have this ability to asynchronously do this work from different threads, all at the same time. Remember, you've got thousands of threads on the GPU; they can all be doing this (modulo resources, of course) at the same time. So you can get a much easier overlap between the different pieces of work you're doing, and it's definitely much easier to keep the GPU busy, and that gives you a lot of potential for more performance.
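(A sketch, again with hypothetical names, of the overlap point: each block launches its own child grid from device code, and because those launches are asynchronous with respect to one another, the hardware can keep many of them in flight alongside other resident work, subject to available resources.)

```cuda
// Child kernel operating on one block's chunk of the data (hypothetical work).
__global__ void per_block_child(float *chunk, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len)
        chunk[i] += 1.0f;   // placeholder work on this chunk
}

// Parent kernel: every block independently spawns a child grid for its own
// chunk. The launches do not serialize against each other, so the GPU can
// overlap them, modulo resources.
__global__ void parent_many(float *data, int n, int chunk)
{
    if (threadIdx.x == 0) {
        int start = blockIdx.x * chunk;
        int len   = min(chunk, n - start);
        if (len > 0)
            per_block_child<<<(len + 127) / 128, 128>>>(data + start, len);
    }
}
```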