[Script Info]
Title:
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.20,0:00:04.03,Default,,0000,0000,0000,,So let's start by looking at how we might implement this as simply as possible,
Dialogue: 0,0:00:04.03,0:00:09.40,Default,,0000,0000,0000,,and we're going to begin by thinking about this computation as working on an N by N matrix.
Dialogue: 0,0:00:09.40,0:00:12.66,Default,,0000,0000,0000,,The N bodies on the X-axis here are the source of each force,
Dialogue: 0,0:00:12.66,0:00:17.08,Default,,0000,0000,0000,,and the N bodies on the Y-axis here are the destination of each force.
Dialogue: 0,0:00:17.08,0:00:19.56,Default,,0000,0000,0000,,The source and destination objects are the same objects, of course.
Dialogue: 0,0:00:19.56,0:00:22.06,Default,,0000,0000,0000,,This is just how we're organizing the computation.
Dialogue: 0,0:00:22.06,0:00:24.74,Default,,0000,0000,0000,,So the matrix element at the position X,Y
Dialogue: 0,0:00:24.74,0:00:30.21,Default,,0000,0000,0000,,represents the force that source element X exerts on destination element Y,
Dialogue: 0,0:00:30.21,0:00:33.27,Default,,0000,0000,0000,,and what we want to do is calculate the total amount of force
Dialogue: 0,0:00:33.27,0:00:36.48,Default,,0000,0000,0000,,from all source objects on each destination object.
Dialogue: 0,0:00:36.48,0:00:40.15,Default,,0000,0000,0000,,This formulation of the problem exhibits a lot of parallelism.
Dialogue: 0,0:00:40.15,0:00:43.68,Default,,0000,0000,0000,,Each of the N-squared forces can be computed in parallel.
Dialogue: 0,0:00:43.68,0:00:48.98,Default,,0000,0000,0000,,We just load the parameters of the source and destination thread to do one particular computation,
Dialogue: 0,0:00:48.98,0:00:52.36,Default,,0000,0000,0000,,and compute the force, say, using gravitational attraction.
Dialogue: 0,0:00:52.36,0:00:54.86,Default,,0000,0000,0000,,This requires specifying parameters for each object.
Dialogue: 0,0:00:54.86,0:00:58.83,Default,,0000,0000,0000,,For gravity, this would be, say, the mass and the position, X, Y, Z.
Dialogue: 0,0:00:58.83,0:01:03.100,Default,,0000,0000,0000,,Then each object, say object Y, must add up all N forces that act on it.
Dialogue: 0,0:01:03.100,0:01:07.83,Default,,0000,0000,0000,,Then, we have to calculate N reductions of N forces also in parallel.
Dialogue: 0,0:01:07.83,0:01:12.84,Default,,0000,0000,0000,,Since we want to run this computation on tens to hundreds of thousands, maybe even millions of elements,
Dialogue: 0,0:01:12.84,0:01:15.72,Default,,0000,0000,0000,,we can see that we have a lot of parallelism to exploit.
Dialogue: 0,0:01:15.72,0:01:18.23,Default,,0000,0000,0000,,In this case, the computation would be very straightforward.
Dialogue: 0,0:01:18.23,0:01:23.29,Default,,0000,0000,0000,,We will now see that if we run it in this way, we would be making poor use of our memory bandwidth,
Dialogue: 0,0:01:23.29,0:01:29.13,Default,,0000,0000,0000,,so if we assume that all of our N-squared force computations running in parallel must go to global memory,
Dialogue: 0,0:01:29.13,0:01:32.68,Default,,0000,0000,0000,,how many times must we fetch each element as a function of N?