The answer for "too big": if P has more threads than a thread block is allowed to have,
then we can't use shared memory to share data among all the threads in P,
because we would have to split that tile across multiple thread blocks, and shared memory is only visible within a single block.
Another consideration is making sure that we have at least as many thread blocks as SMs,
or else some SMs will sit idle.
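The two constraints above can be checked with a little arithmetic. The following is only a rough sketch, not part of the lecture: the function name `check_tile`, the limit of 1024 threads per block, and the count of 16 SMs are all assumptions chosen for illustration, and the real values should be queried from the device being targeted. It also assumes one thread per tile element.

```python
def check_tile(tile_w, tile_h, width, height,
               max_threads_per_block=1024,  # assumed limit; query the real device
               num_sms=16):                 # assumed SM count; query the real device
    """Check a tile size against the two constraints discussed above.

    Returns (fits_in_block, num_blocks, keeps_sms_busy).
    Assumes one thread per tile element and one thread block per tile.
    """
    threads_per_tile = tile_w * tile_h

    # Constraint 1: the whole tile must fit in one thread block,
    # otherwise its threads cannot all share the same shared memory.
    fits_in_block = threads_per_tile <= max_threads_per_block

    # Number of tiles (and therefore blocks) needed to cover the input,
    # rounding up in each dimension.
    blocks_x = (width + tile_w - 1) // tile_w
    blocks_y = (height + tile_h - 1) // tile_h
    num_blocks = blocks_x * blocks_y

    # Constraint 2: at least one block per SM, or some SMs sit idle.
    keeps_sms_busy = num_blocks >= num_sms

    return fits_in_block, num_blocks, keeps_sms_busy


if __name__ == "__main__":
    # A 64x64 tile needs 4096 threads: too big under the assumed limit.
    print(check_tile(64, 64, 1024, 1024))
    # A 32x32 tile needs 1024 threads and still yields plenty of blocks.
    print(check_tile(32, 32, 1024, 1024))
```

Under these assumed limits, a 64x64 tile fails the first check, while a 32x32 tile on a 1024x1024 input passes both. The same tile on a very small input (say 64x64) would pass the first check but produce too few blocks to occupy every SM.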