
Title:
Configuring the Kernel Launch Parameters 2 - Intro to Parallel Programming

Description:

The most general kernel launch we can do looks like this: square<<<dim3(bx, by, bz), dim3(tx, ty, tz), shmem>>>(...), the square kernel launched with 3 parameters.

The first is the dimensionality of the grid of blocks

that has bx × by × bz blocks.

Each one of those blocks is specified by the second parameter: the block of threads, which has tx × ty × tz threads in it,

and recall that a block has a maximum size (1,024 threads per block on current GPUs).

Finally, there's a third argument that defaults to zero if you don't use it,

and we're not going to cover it specifically today.

It's the amount of shared memory in bytes allocated per thread block.

With this one kernel call, you can launch an enormous number of threads.

And let's all remember, with great power comes great responsibility, so launch your kernels wisely.
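As a rough illustration of the launch described above, here is a minimal sketch that passes all three parameters explicitly. The kernel body is only a stand-in for the course's square kernel, and the helper name square_on_gpu and the variable name shmem are assumptions made for this example. It uses a single one-dimensional block, so n must not exceed the maximum block size.

```cuda
#include <cuda_runtime.h>

// Stand-in for the course's square kernel (an assumption, not the original
// source): each thread squares one element, indexed by its thread ID
// within the single block that is launched.
__global__ void square(float *d_out, const float *d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

// Hypothetical host helper: copies n floats to the GPU, launches square
// with all three launch parameters spelled out, and copies the result back.
// n must fit in one block (n <= maximum threads per block).
void square_on_gpu(const float *h_in, float *h_out, int n) {
    float *d_in = nullptr;
    float *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    dim3 grid(1, 1, 1);   // bx x by x bz blocks in the grid
    dim3 block(n, 1, 1);  // tx x ty x tz threads in each block
    size_t shmem = 0;     // shared memory in bytes per block (the default)

    square<<<grid, block, shmem>>>(d_out, d_in);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Calling square_on_gpu(in, out, 4) on a four-element host array should leave the squares of the inputs in out. Omitting the third parameter, or passing plain integers instead of dim3 values for one-dimensional launches, gives the same launch.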

One more important thing about blocks and threads.

Recall from our square kernel, that each thread knows its thread ID within a block.

It actually knows many things.

First is threadIdx, as we've seen: which thread it is within the block.

Here we have a block.

Each thread, say this thread here, knows its index in each of the x, y, and z dimensions,

and we can access those as threadIdx.x, threadIdx.y, and threadIdx.z.

We also know blockDim, the size of a block.

How many threads are there in this block

along the x dimension, the y dimension, and potentially the z dimension?

So we know those two things for a block.

We know the analogous things for a grid.

blockIdx, for instance, is which block am I in within the grid, again with .x, .y, and .z.

And gridDim will tell us the size of the grid, how many blocks there are

in the x dimension, the y dimension, and the z dimension.
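To make the four built-in variables concrete, the following sketch combines threadIdx, blockDim, blockIdx, and gridDim into one flat index per thread. The flattening order (x varies fastest) is a common convention rather than something CUDA requires, and the names write_global_index and fill_global_indices are hypothetical.

```cuda
#include <cuda_runtime.h>

// Each thread derives a unique flat index from the built-in variables
// and writes that index into its own slot of d_out.
__global__ void write_global_index(int *d_out) {
    // Which block am I in, flattened across the grid (x varies fastest).
    int block_id = blockIdx.x
                 + blockIdx.y * gridDim.x
                 + blockIdx.z * gridDim.x * gridDim.y;
    // Which thread am I within my block, flattened the same way.
    int thread_id = threadIdx.x
                  + threadIdx.y * blockDim.x
                  + threadIdx.z * blockDim.x * blockDim.y;
    int threads_per_block = blockDim.x * blockDim.y * blockDim.z;
    int global_id = block_id * threads_per_block + thread_id;
    d_out[global_id] = global_id;
}

// Hypothetical host helper: launches the kernel and copies the indices back.
// h_out must have room for one int per launched thread.
void fill_global_indices(dim3 grid, dim3 block, int *h_out) {
    size_t total = (size_t)grid.x * grid.y * grid.z
                 * block.x * block.y * block.z;
    int *d_out = nullptr;
    cudaMalloc(&d_out, total * sizeof(int));
    write_global_index<<<grid, block>>>(d_out);
    cudaMemcpy(h_out, d_out, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

If every slot i of h_out ends up holding i, then each thread computed a distinct index and no two threads collided.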

What I want you to take home from this little discussion is only the following.

It's convenient to have multidimensional grids and blocks when your problem has multiple dimensions.

CUDA implements this natively and efficiently.

When you access threadIdx.x or blockDim.y, that's a very efficient thing within CUDA.

Since we're doing image processing in this course,

you should be counting on finding a lot of two-dimensional grids and blocks.
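Here is a sketch of what a two-dimensional launch for an image might look like. It assumes an 8-bit grayscale image stored row by row, and the inversion operation, the 16 × 16 block shape, and the names invert_image and invert_on_gpu are all illustrative choices rather than anything prescribed by the course.

```cuda
#include <cuda_runtime.h>

// One thread per pixel: the x index selects the column, the y index the row.
__global__ void invert_image(unsigned char *d_img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    // The grid may overhang the image edges, so skip out-of-range threads.
    if (x >= width || y >= height) return;
    int i = y * width + x;  // row-major pixel offset
    d_img[i] = (unsigned char)(255 - d_img[i]);
}

// Hypothetical host helper: inverts a grayscale image in place.
void invert_on_gpu(unsigned char *h_img, int width, int height) {
    size_t bytes = (size_t)width * height;
    unsigned char *d_img = nullptr;
    cudaMalloc(&d_img, bytes);
    cudaMemcpy(d_img, h_img, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // 256 threads per block; z defaults to 1
    // Round up so the grid covers the whole image.
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    invert_image<<<grid, block>>>(d_img, width, height);

    cudaMemcpy(h_img, d_img, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_img);
}
```

The bounds check matters whenever the image width or height is not a multiple of the block dimensions, since the rounded-up grid then launches some threads that have no pixel to work on.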

So, let's wrap up with a little quiz.

Let's say I launch the following kernel.

kernel<<<dim3(8, 4, 2), dim3(16, 16)>>>(...), a kernel launched with 2 parameters: dim3(8, 4, 2) and dim3(16, 16).

How many blocks will this call launch,

how many threads per block, and how many total threads?
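One way to check an answer to a question like this is to do the arithmetic on the host. The helper names below are hypothetical. The sketch relies only on the fact that dim3 fills any unspecified dimension with 1.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Number of blocks in a grid: the product of its three dimensions.
size_t num_blocks(dim3 grid) {
    return (size_t)grid.x * grid.y * grid.z;
}

// Number of threads in one block: the product of its three dimensions.
size_t threads_per_block(dim3 block) {
    return (size_t)block.x * block.y * block.z;
}

// Total threads launched: blocks times threads per block.
size_t total_threads(dim3 grid, dim3 block) {
    return num_blocks(grid) * threads_per_block(block);
}
```

For a different configuration than the quiz, say a grid of dim3(2, 3) and a block of dim3(4, 4), these give 6 blocks, 16 threads per block, and 96 threads in total.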