﻿[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.20,0:00:04.03,Default,,0000,0000,0000,,So let's start by looking at how we might implement this as simply as possible, Dialogue: 0,0:00:04.03,0:00:09.40,Default,,0000,0000,0000,,and we're going to begin by thinking about this computation as working on an N by N matrix. Dialogue: 0,0:00:09.40,0:00:12.66,Default,,0000,0000,0000,,The N bodies on the X-axis here are the source of each force, Dialogue: 0,0:00:12.66,0:00:17.08,Default,,0000,0000,0000,,and the N bodies on the Y-axis here are the destination of each force. Dialogue: 0,0:00:17.08,0:00:19.56,Default,,0000,0000,0000,,The source and destination objects are the same objects, of course. Dialogue: 0,0:00:19.56,0:00:22.06,Default,,0000,0000,0000,,This is just how we're organizing the computation. Dialogue: 0,0:00:22.06,0:00:24.74,Default,,0000,0000,0000,,So the matrix element at the position X,Y Dialogue: 0,0:00:24.74,0:00:30.21,Default,,0000,0000,0000,,represents the force that source element X exerts on destination element Y, Dialogue: 0,0:00:30.21,0:00:33.27,Default,,0000,0000,0000,,and what we want to do is calculate the total amount of force Dialogue: 0,0:00:33.27,0:00:36.48,Default,,0000,0000,0000,,from all source objects on each destination object. Dialogue: 0,0:00:36.48,0:00:40.15,Default,,0000,0000,0000,,This formulation of the problem exhibits a lot of parallelism. Dialogue: 0,0:00:40.15,0:00:43.68,Default,,0000,0000,0000,,Each of the N-squared forces can be computed in parallel. Dialogue: 0,0:00:43.68,0:00:48.98,Default,,0000,0000,0000,,We just load the parameters of the source and destination thread to do one particular computation, Dialogue: 0,0:00:48.98,0:00:52.36,Default,,0000,0000,0000,,and compute the force, say, using gravitational attraction. Dialogue: 0,0:00:52.36,0:00:54.86,Default,,0000,0000,0000,,This requires specifying parameters for each object. Dialogue: 0,0:00:54.86,0:00:58.83,Default,,0000,0000,0000,,For gravity, this would be, say, the mass and the position, X, Y, Z. Dialogue: 0,0:00:58.83,0:01:03.100,Default,,0000,0000,0000,,Then each object, say object Y, must add up all N forces that act on it. Dialogue: 0,0:01:03.100,0:01:07.83,Default,,0000,0000,0000,,Then, we have to calculate N reductions of N forces also in parallel. Dialogue: 0,0:01:07.83,0:01:12.84,Default,,0000,0000,0000,,Since we want to run this computation on tens to hundreds of thousands, maybe even millions of elements, Dialogue: 0,0:01:12.84,0:01:15.72,Default,,0000,0000,0000,,we can see that we have a lot of parallelism to exploit. Dialogue: 0,0:01:15.72,0:01:18.23,Default,,0000,0000,0000,,In this case, the computation would be very straightforward. Dialogue: 0,0:01:18.23,0:01:23.29,Default,,0000,0000,0000,,We will now see that if we run it in this way, we would be making poor use of our memory bandwidth, Dialogue: 0,0:01:23.29,0:01:29.13,Default,,0000,0000,0000,,so if we assume that all of our N-squared force computations running in parallel must go to global memory, Dialogue: 0,0:01:29.13,0:01:32.68,Default,,0000,0000,0000,,how many times must we fetch each element as a function of N?