[Script Info]
Title:
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.82,0:00:03.67,Default,,0000,0000,0000,,When performing linear regression, we have a number of data points.
Dialogue: 0,0:00:03.67,0:00:09.04,Default,,0000,0000,0000,,Let's say that we have 1, 2, 3 and so on up through M data points.
Dialogue: 0,0:00:09.04,0:00:12.01,Default,,0000,0000,0000,,Each data point has an output variable, Y,
Dialogue: 0,0:00:12.01,0:00:15.40,Default,,0000,0000,0000,,and a number of input variables, X1 through X N.
Dialogue: 0,0:00:16.41,0:00:20.40,Default,,0000,0000,0000,,So in our baseball example, Y is the lifetime number of home runs.
Dialogue: 0,0:00:20.40,0:00:24.37,Default,,0000,0000,0000,,And our X1 through XN are things like height and weight.
Dialogue: 0,0:00:24.37,0:00:27.76,Default,,0000,0000,0000,,Our one through M samples might be different baseball players.
Dialogue: 0,0:00:27.76,0:00:32.57,Default,,0000,0000,0000,,So maybe data point one is Derek Jeter, data point two is Barry Bonds, and
Dialogue: 0,0:00:32.57,0:00:34.87,Default,,0000,0000,0000,,data point M is Babe Ruth.
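To make the setup concrete, here is a minimal sketch (in Python, not part of the lecture) of how the M data points and N input variables might be laid out; the heights, weights, and home-run totals are illustrative placeholders:

```python
# Each row of X is one data point (a player); columns are input variables.
players = ["Derek Jeter", "Barry Bonds", "Babe Ruth"]  # samples 1 through M

X = [
    [75.0, 195.0],  # input variables X1 (height, in) and X2 (weight, lb)
    [74.0, 185.0],
    [74.0, 215.0],
]
y = [260.0, 762.0, 714.0]  # output variable Y: lifetime home runs

M = len(X)     # number of data points
N = len(X[0])  # number of input variables per data point
```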
Dialogue: 0,0:00:34.87,0:00:37.82,Default,,0000,0000,0000,,Generally speaking, we are trying to predict the values of
Dialogue: 0,0:00:37.82,0:00:41.78,Default,,0000,0000,0000,,the output variable for each data point, by multiplying the input variables by
Dialogue: 0,0:00:41.78,0:00:45.58,Default,,0000,0000,0000,,some set of coefficients that we're going to call theta 1 through theta N.
Dialogue: 0,0:00:45.58,0:00:49.10,Default,,0000,0000,0000,,Each theta, which we'll from here on out call the parameters or
Dialogue: 0,0:00:49.10,0:00:52.83,Default,,0000,0000,0000,,the weights of the model, tells us how important an input variable is
Dialogue: 0,0:00:52.83,0:00:55.59,Default,,0000,0000,0000,,when predicting a value for the output variable.
Dialogue: 0,0:00:55.59,0:00:57.38,Default,,0000,0000,0000,,So if theta 1 is very small,
Dialogue: 0,0:00:57.38,0:01:01.88,Default,,0000,0000,0000,,X1 must not be very important in general when predicting Y.
Dialogue: 0,0:01:01.88,0:01:03.94,Default,,0000,0000,0000,,Whereas if theta N is very large,
Dialogue: 0,0:01:03.94,0:01:07.30,Default,,0000,0000,0000,,then XN is generally a big contributor to the value of Y.
Dialogue: 0,0:01:07.30,0:01:10.42,Default,,0000,0000,0000,,This model is built in such a way that we can multiply each X by
Dialogue: 0,0:01:10.42,0:01:13.50,Default,,0000,0000,0000,,the corresponding theta, and sum them up to get Y.
Dialogue: 0,0:01:13.50,0:01:17.08,Default,,0000,0000,0000,,So that our final equation will look something like the equation down here.
Dialogue: 0,0:01:17.08,0:01:20.25,Default,,0000,0000,0000,,Theta 1 times X1, plus theta 2 times X2,
Dialogue: 0,0:01:20.25,0:01:23.95,Default,,0000,0000,0000,,all the way up to theta N times XN, equals Y.
Dialogue: 0,0:01:23.95,0:01:26.67,Default,,0000,0000,0000,,And we'd want to be able to predict Y for each of our M data points.
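The weighted sum described above can be sketched in a few lines of Python (the weight and input values here are hypothetical, chosen only to show the arithmetic):

```python
def predict(thetas, xs):
    """Multiply each input variable by its weight and sum the products:
    y_hat = theta_1*x_1 + theta_2*x_2 + ... + theta_n*x_n"""
    return sum(theta * x for theta, x in zip(thetas, xs))

# Two input variables with illustrative weights:
thetas = [0.5, 2.0]
xs = [10.0, 3.0]
y_hat = predict(thetas, xs)  # 0.5*10 + 2.0*3 = 11.0
```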
Dialogue: 0,0:01:27.78,0:01:31.84,Default,,0000,0000,0000,,In this illustration, the dark blue points represent our observed data points,
Dialogue: 0,0:01:31.84,0:01:34.71,Default,,0000,0000,0000,,whereas the green line shows the predicted value of Y for
Dialogue: 0,0:01:34.71,0:01:37.60,Default,,0000,0000,0000,,every value of X given the model that we may have created.
Dialogue: 0,0:01:37.60,0:01:41.38,Default,,0000,0000,0000,,The best equation is the one that's going to minimize the difference across all
Dialogue: 0,0:01:41.38,0:01:44.16,Default,,0000,0000,0000,,data points between our predicted Y, and our observed Y.
Dialogue: 0,0:01:45.28,0:01:48.88,Default,,0000,0000,0000,,What we need to do is find the thetas that produce the best predictions.
Dialogue: 0,0:01:48.88,0:01:52.96,Default,,0000,0000,0000,,That is, to make these differences as small as possible.
Dialogue: 0,0:01:52.96,0:01:56.65,Default,,0000,0000,0000,,If we wanted to create a value that describes the total error of our model,
Dialogue: 0,0:01:56.65,0:01:58.38,Default,,0000,0000,0000,,we'd probably sum up the errors.
Dialogue: 0,0:01:58.38,0:02:02.44,Default,,0000,0000,0000,,That is, sum over all of our data points from I equals 1 to M,
Dialogue: 0,0:02:02.44,0:02:05.29,Default,,0000,0000,0000,,The predicted Y minus the actual Y.
Dialogue: 0,0:02:05.29,0:02:08.32,Default,,0000,0000,0000,,However, since these errors can be both negative and
Dialogue: 0,0:02:08.32,0:02:11.57,Default,,0000,0000,0000,,positive, if we simply sum them up, we could have
Dialogue: 0,0:02:11.57,0:02:17.04,Default,,0000,0000,0000,,a total error term that's very close to 0, even if our model is very wrong.
Dialogue: 0,0:02:17.04,0:02:20.71,Default,,0000,0000,0000,,In order to correct this, rather than simply adding up the error terms,
Dialogue: 0,0:02:20.71,0:02:23.10,Default,,0000,0000,0000,,we're going to add the square of the error terms.
Dialogue: 0,0:02:23.10,0:02:26.94,Default,,0000,0000,0000,,This guarantees that the magnitude of each individual error term,
Dialogue: 0,0:02:26.94,0:02:29.68,Default,,0000,0000,0000,,Y predicted minus Y actual is positive.
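The cancellation problem and the squared-error fix can be sketched as follows (a Python illustration, not from the lecture; the prediction values are hypothetical):

```python
def sum_of_errors(y_pred, y_actual):
    """Naive total error: raw differences can cancel each other out."""
    return sum(p - a for p, a in zip(y_pred, y_actual))

def sum_of_squared_errors(y_pred, y_actual):
    """Squaring makes every individual error term positive."""
    return sum((p - a) ** 2 for p, a in zip(y_pred, y_actual))

y_pred = [5.0, 1.0]
y_actual = [3.0, 3.0]
# Errors are +2 and -2: the raw sum is 0 even though the model is wrong,
# while the squared sum is 4 + 4 = 8 and correctly flags the error.
```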
Dialogue: 0,0:02:30.92,0:02:33.62,Default,,0000,0000,0000,,Why don't we make sure the distinction between input variables and
Dialogue: 0,0:02:33.62,0:02:34.83,Default,,0000,0000,0000,,output variables is clear.