English subtitles

← Squaring Number Using CUDA Part 2 - Intro to Parallel Programming

Showing Revision 10 created 05/25/2016 by Udacity Robot.

  1. So, let's take a closer look at the code, line by line,
  2. so we can all be sure we know what each call does.
  3. We're going to walk through the CPU code first.
  4. The first thing we're going to do is declare the size of the array and determine how many bytes it uses.
  5. We then fill it up in this loop with floating point numbers,
  6. where array element i is simply set to i.
  7. All of this is standard C, nothing GPU-specific so far.
  8. One thing to note, though, is a common CUDA convention.
  9. Data on the CPU, the host, starts with h underscore. Data on the GPU, the device, starts with d underscore.
  10. This is just a convention. You can name your variables anything you want.
  11. But naming variables in this way helps you avoid the single most common beginner
  12. error in CUDA, where you try to access a piece of data on the CPU from the GPU, or vice versa.
  13. If you're accessing data through a pointer on the CPU,
  14. your pointer better point to something in CPU memory, or you're going to have a bad time.
  15. Same thing for the GPU. You'll find that lots of the CUDA code you see uses this convention.
  16. So, let's scroll up just a little bit.
  17. And the first interesting thing that you see is how to declare a pointer on the GPU.
  18. It looks just like a pointer declared on the CPU. It's just a float star.
  19. Now, to tell CUDA that your data is actually on the GPU, not the CPU, look at the next 2 lines.
  20. We're using cudaMalloc with 2 arguments, the pointer and the number of bytes to allocate.
  21. cudaMalloc means allocate the data on the GPU,
  22. whereas a plain malloc would mean allocate the data on the CPU.
  23. The next thing we do is actually copy the data from the CPU,
  24. the array h underscore in, onto the GPU, the array d underscore in.
  25. This call is cudaMemcpy--it's just like a regular memcpy, but it takes
  26. 4 arguments instead of 3. The first 3 arguments are the same as
  27. regular C memcpy: the destination, the source, and the number of bytes.
  28. The fourth argument says the direction of the transfer.
  29. The 3 choices are cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost,
  30. and cudaMemcpyDeviceToDevice.
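
The steps walked through above can be sketched roughly as follows. This is a minimal sketch rather than the exact code from the video: the array size of 64, the function name copy_round_trip, the h_out array, the error checks, and the copy back to the host (added only to confirm that the transfer worked) are all assumptions made for illustration. The h_ and d_ prefixes follow the naming convention described in the transcript.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define ARRAY_SIZE 64  /* assumed size, chosen only for illustration */

/* Hypothetical helper: copies an array host -> device -> host.
   Returns the number of mismatched elements, or -1 if a CUDA call fails. */
int copy_round_trip(void)
{
    /* Size of the array in bytes. */
    const size_t ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    /* Host (CPU) data: names start with h_. Element i is simply set to i. */
    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) {
        h_in[i] = (float)i;
        h_out[i] = -1.0f;  /* sentinel, so a failed copy is detectable */
    }

    /* Device (GPU) pointer: name starts with d_. It is declared like any
       other float pointer; cudaMalloc is what makes it refer to GPU memory. */
    float *d_in = NULL;
    if (cudaMalloc((void **)&d_in, ARRAY_BYTES) != cudaSuccess) {
        return -1;
    }

    /* Arguments: destination, source, number of bytes, direction. */
    int mismatches = -1;
    if (cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(h_out, d_in, ARRAY_BYTES, cudaMemcpyDeviceToHost) == cudaSuccess) {
        /* Compare on the host only; d_in must never be dereferenced here. */
        mismatches = 0;
        for (int i = 0; i < ARRAY_SIZE; i++) {
            if (h_out[i] != h_in[i]) {
                mismatches++;
            }
        }
    }

    cudaFree(d_in);
    return mismatches;
}

int main(void)
{
    int result = copy_round_trip();
    if (result < 0) {
        printf("a CUDA call failed\n");
        return 1;
    }
    printf("mismatched elements after round trip: %d\n", result);
    return 0;
}
```

Note that cudaMalloc receives the address of d_in, since it has to write the device address into that pointer. Assuming the file is saved with a .cu extension, it should build with nvcc and needs a CUDA-capable GPU to run.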