This is a really important optimization pattern, so let me emphasize it.
In practice, most well-tuned GPU codes are memory-limited. I'll repeat that.
Most, not all, but most GPU codes are memory-limited.
So always start by measuring your achieved memory bandwidth, and compare it to the peak bandwidth of the hardware, to see if you're using memory efficiently.
And if not, ask yourself, why not?
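To make that measurement concrete, here is a minimal sketch of the arithmetic, assuming you already have a kernel's elapsed time from some timer (for example GPU event timing). The function names, the double-data-rate factor of 2, and every number below are hypothetical, chosen only for illustration; they are not measurements from any real device.

```python
def achieved_bandwidth_gbs(bytes_read, bytes_written, elapsed_ms):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) for one kernel run."""
    seconds = elapsed_ms / 1e3
    return (bytes_read + bytes_written) / seconds / 1e9


def peak_bandwidth_gbs(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak in GB/s.

    transfers_per_clock=2 is an assumption (double-data-rate memory);
    check the value for your actual hardware.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * bytes_per_transfer * transfers_per_clock / 1e9


# Hypothetical example: a kernel that copies 2**24 floats
# (reads each element once, writes each element once).
n = 1 << 24
bytes_moved_each_way = n * 4   # 4 bytes per float
elapsed_ms = 1.0               # made-up timing, not a real measurement

achieved = achieved_bandwidth_gbs(bytes_moved_each_way, bytes_moved_each_way, elapsed_ms)
peak = peak_bandwidth_gbs(memory_clock_mhz=2500, bus_width_bits=384)  # made-up specs

print(f"achieved: {achieved:.1f} GB/s, peak: {peak:.1f} GB/s, "
      f"fraction of peak: {achieved / peak:.0%}")
```

If the fraction of peak comes out low, that is the cue to ask why not: the access pattern is usually the first thing to look at.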