So we have two approaches here: thread per row and thread per element. Which is better?
The two might perform differently on matrices that have a similar number of elements per row,
and differently again on matrices where the number of elements per row varies,
or even varies wildly.
So which of the two is comparatively better on each of these kinds of matrices?
I'd like you to check a couple of boxes to answer.
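To make the comparison concrete, here is a minimal sketch that only counts how much work each thread would get under the two approaches. It assumes a matrix described by a list of per-row element counts; the function names and the two example matrices are hypothetical and are not part of the lecture. It also ignores the extra step that thread per element needs to combine the per-element results into one value per row, so treat it as an aid for thinking about load balance rather than a performance model.

```python
def work_per_thread_row(row_lengths):
    # Thread per row: thread i processes every element of row i,
    # so its work equals the length of that row.
    return list(row_lengths)


def work_per_thread_element(row_lengths):
    # Thread per element: one thread per stored element,
    # so every thread does exactly one unit of work.
    return [1] * sum(row_lengths)


def summarize(work):
    # Returns (number of threads launched, work done by the busiest thread).
    return len(work), max(work, default=0)


# Hypothetical examples: both matrices hold 16 elements in 4 rows.
uniform = [4, 4, 4, 4]   # similar number of elements per row
skewed = [1, 1, 1, 13]   # wildly varying number of elements per row

for name, rows in (("uniform", uniform), ("skewed", skewed)):
    print(name,
          "row:", summarize(work_per_thread_row(rows)),
          "element:", summarize(work_per_thread_element(rows)))
```

Comparing the busiest thread against the number of threads launched for each matrix is one way to reason about the question before answering it.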