  1. This is a really important optimization pattern, so let me emphasize it.
  2. In practice, most well-tuned GPU codes are memory-limited. I'll repeat that.
  3. Most, not all, but most GPU codes are memory-limited.
  4. So, always start by measuring your achieved bandwidth and comparing it to your device's theoretical peak bandwidth to see if you're using memory efficiently.
  5. And if not, ask yourself, why not?
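The comparison in the steps above is simple arithmetic. Below is a minimal sketch, assuming you have already timed a kernel on the GPU (for example with CUDA events) and know how many bytes it reads and writes. The function names are hypothetical. The peak formula assumes a double-data-rate memory interface, so it uses two transfers per clock. The inputs stand in for the memory clock (kHz) and bus width (bits) that the CUDA runtime reports as device properties; the exact way to query them depends on your CUDA version. All numbers in the usage example are made up for illustration.

```python
def achieved_bandwidth_gbs(bytes_read, bytes_written, seconds):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) from a measured kernel time."""
    return (bytes_read + bytes_written) / seconds / 1e9


def peak_bandwidth_gbs(mem_clock_khz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes double data rate (2 transfers per clock) unless told otherwise.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_khz * 1e3 * transfers_per_clock * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Hypothetical copy kernel: reads and writes 2**24 float32 values (4 bytes each).
    n = 2**24
    moved = 4 * n
    achieved = achieved_bandwidth_gbs(moved, moved, seconds=0.0015)  # hypothetical timing
    # Hypothetical device: 1.75 GHz memory clock, 256-bit bus.
    peak = peak_bandwidth_gbs(mem_clock_khz=1_750_000, bus_width_bits=256)
    print(f"achieved {achieved:.1f} GB/s of {peak:.1f} GB/s peak ({achieved / peak:.0%})")
```

With these made-up numbers, the script reports about 89.5 GB/s out of a 112.0 GB/s peak, or roughly 80%. A fraction well below the peak is the signal to ask "why not?" and look at how the kernel accesses memory.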