
Assorted Math Optimizations - Intro to Parallel Programming


Showing Revision 4 created 05/25/2016 by Udacity Robot.

  1. It's also worth noting that different math instructions take different amounts of time.
  2. And this topic gets maybe half a ninja.
  3. You can go really deep understanding the latencies involved in different math optimizations,
  4. but there are a few general principles that probably everybody should keep in mind.
  5. So the first thing to keep in mind is: use double precision only when you really mean it.
  6. 64-bit math is slower than 32-bit math,
  7. but it's easy to forget that floating point literals like 2.5 here are interpreted as fp64
  8. unless you add the f suffix.
  9. Therefore, this statement on the left will take longer to execute than this one on the right.
  10. It's a subtle distinction, and clearly sometimes you need to use double precision,
  11. but if you're concerned about performance
  12. and you're trying to squeeze the last few percent out of your kernels,
  13. only use it when you're absolutely intending to use it.
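The literal-suffix point above can be sketched in a small kernel. This is a minimal illustration, not code from the lecture; the kernel name and layout are my own:

```cuda
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 2.5 is an fp64 literal: in[i] is promoted to double, the
        // multiply runs as 64-bit math, and the result is narrowed
        // back to float -- slower than it looks.
        out[i] = in[i] * 2.5;

        // 2.5f is an fp32 literal, so the whole expression stays in
        // 32-bit math:
        // out[i] = in[i] * 2.5f;
    }
}
```

The two statements compute nearly the same value, but only the second one keeps the arithmetic entirely in single precision.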
  14. A second math-oriented optimization is to use intrinsics whenever possible for common operations.
  15. CUDA supports special versions of many common math operations,
  16. like sine and cosine and exponent, that are called intrinsics.
  17. These built-in functions give 2 to 3 bits less precision than their counterparts in math.h,
  18. but they are much faster.
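As a sketch of the intrinsic-versus-library trade-off, here is a hypothetical kernel (the name and expression are my own, not from the lecture) comparing the standard single-precision calls with their intrinsic counterparts:

```cuda
__global__ void waves(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // math.h-style calls: full single-precision accuracy, slower.
        float accurate = sinf(in[i]) * expf(-in[i]);

        // Intrinsic versions: a few bits less precision, much faster,
        // handled by the GPU's special function units.
        float fast = __sinf(in[i]) * __expf(-in[i]);

        // Pick whichever your accuracy budget allows.
        out[i] = fast + 0.0f * accurate;
    }
}
```

The intrinsics are only defined in device code, so there is no host fallback; use them when your tolerance for the reduced precision is clear.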
  19. There are also compiler flags for fast square root, fast division, flushing denormals to zero, and so forth.
  20. And you can see the programming guide for more detail.
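For reference, the flags mentioned here correspond to nvcc options; this is a build-command fragment under my assumption of a file named `kernel.cu`, and the exact flag set is worth confirming in the programming guide:

```shell
# Enable all fast-math approximations at once:
nvcc -use_fast_math kernel.cu

# Or opt in individually: approximate division and square root,
# and flush denormals to zero.
nvcc --prec-div=false --prec-sqrt=false --ftz=true kernel.cu
```

Note that `-use_fast_math` also rewrites calls like `sinf` to their intrinsic forms, so it subsumes the per-call intrinsics discussed above.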