[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.52,0:00:02.17,Default,,0000,0000,0000,,In the last video we were able to
Dialogue: 0,0:00:02.17,0:00:05.97,Default,,0000,0000,0000,,calculate the total sum of squares for these 9 data points right here,
Dialogue: 0,0:00:05.98,0:00:10.03,Default,,0000,0000,0000,,these 9 data points are grouped into three different groups,
Dialogue: 0,0:00:10.03,0:00:12.80,Default,,0000,0000,0000,,or if you wanted to speak generally into "m" different groups.
Dialogue: 0,0:00:12.90,0:00:17.94,Default,,0000,0000,0000,,What I want to do in this video is to figure out how much of this total sum of squares,
Dialogue: 0,0:00:17.94,0:00:22.36,Default,,0000,0000,0000,,how much of this is due to variation within each group
Dialogue: 0,0:00:22.38,0:00:26.23,Default,,0000,0000,0000,,versus variation between the actual groups.
Dialogue: 0,0:00:26.25,0:00:29.97,Default,,0000,0000,0000,,So first let's figure out the total variation within the groups,
Dialogue: 0,0:00:29.97,0:00:36.20,Default,,0000,0000,0000,,so let's call that the sum of squares within, I'll do that in yellow,
Dialogue: 0,0:00:36.49,0:00:39.94,Default,,0000,0000,0000,,actually I've already used yellow so let's do this, I'm going to do blue.
Dialogue: 0,0:00:40.18,0:00:45.91,Default,,0000,0000,0000,,So the sum of squares within.
Dialogue: 0,0:00:46.29,0:00:50.85,Default,,0000,0000,0000,,Let me make that clear, that stands for within.
Dialogue: 0,0:00:50.89,0:00:53.71,Default,,0000,0000,0000,,So we want to see how much of the variation is
Dialogue: 0,0:00:53.71,0:00:57.96,Default,,0000,0000,0000,,due to how far each of these data points is from its central tendency,
Dialogue: 0,0:00:57.96,0:00:59.55,Default,,0000,0000,0000,,from its respective mean.
Dialogue: 0,0:00:59.55,0:01:02.30,Default,,0000,0000,0000,,So this is going to be equal to-- let's start with these guys.
Dialogue: 0,0:01:02.50,0:01:07.22,Default,,0000,0000,0000,,So instead of taking the distance between each data point and the mean of means
Dialogue: 0,0:01:07.22,0:01:11.53,Default,,0000,0000,0000,,I'm going to find the distance between each data point and that group's mean
Dialogue: 0,0:01:11.55,0:01:16.55,Default,,0000,0000,0000,,because we want the total sum of squared distances
Dialogue: 0,0:01:16.55,0:01:20.68,Default,,0000,0000,0000,,between each data point and its respective mean.
Dialogue: 0,0:01:20.72,0:01:25.74,Default,,0000,0000,0000,,3 minus the mean here, which is 2, squared,
Dialogue: 0,0:01:25.76,0:01:30.70,Default,,0000,0000,0000,,+ 2 minus 2 squared,
Dialogue: 0,0:01:30.94,0:01:34.48,Default,,0000,0000,0000,,+ 1 minus 2 squared.
Dialogue: 0,0:01:34.70,0:01:36.60,Default,,0000,0000,0000,,I'm going to do this for all of the groups,
Dialogue: 0,0:01:36.60,0:01:39.52,Default,,0000,0000,0000,,but for each group, the distance between each data point and its group's mean,
Dialogue: 0,0:01:39.56,0:01:57.42,Default,,0000,0000,0000,,so + 5 minus 4 squared, + 3 minus 4 squared, + 4 minus 4 squared,
Dialogue: 0,0:01:57.42,0:02:00.36,Default,,0000,0000,0000,,and finally we have the third group,
Dialogue: 0,0:02:00.38,0:02:04.91,Default,,0000,0000,0000,,and we're finding the squared distance from each point to its central tendency
Dialogue: 0,0:02:04.91,0:02:06.64,Default,,0000,0000,0000,,within that group, and we're going to add them all up.
Dialogue: 0,0:02:07.14,0:02:09.28,Default,,0000,0000,0000,,And then for the third group we have
Dialogue: 0,0:02:09.28,0:02:20.55,Default,,0000,0000,0000,,5 minus 6 squared, + 6 minus 6 squared, + 7 minus 6 squared.
Dialogue: 0,0:02:20.55,0:02:22.39,Default,,0000,0000,0000,,And what is this going to equal?
Dialogue: 0,0:02:22.42,0:02:29.05,Default,,0000,0000,0000,,So this is going to be equal to, so up here it is going to be 1 + 0 + 1,
Dialogue: 0,0:02:29.55,0:02:31.51,Default,,0000,0000,0000,,that's going to be equal to 2,
Dialogue: 0,0:02:31.85,0:02:39.66,Default,,0000,0000,0000,,+ this is going to be equal to 1 + 1 + 0, so another 2,
Dialogue: 0,0:02:40.02,0:02:51.13,Default,,0000,0000,0000,,+ this is going to be equal to 1 + 0 + 1, so that's 2 over here.
Dialogue: 0,0:02:51.54,0:02:56.47,Default,,0000,0000,0000,,Our total sum of squares within is 6.
Dialogue: 0,0:02:56.60,0:03:00.87,Default,,0000,0000,0000,,So one way to think about it, our total variation was 30.
Dialogue: 0,0:03:00.87,0:03:08.66,Default,,0000,0000,0000,,Based on that calculation, 6 of that 30 comes from variation within these samples.
Dialogue: 0,0:03:09.02,0:03:10.94,Default,,0000,0000,0000,,Now the next thing I want to think about is
Dialogue: 0,0:03:11.18,0:03:15.56,Default,,0000,0000,0000,,how many degrees of freedom do we have in this calculation,
Dialogue: 0,0:03:15.56,0:03:19.30,Default,,0000,0000,0000,,how many, kind of, independent data points do we actually have?
Dialogue: 0,0:03:19.63,0:03:27.61,Default,,0000,0000,0000,,Well, for each of these over here, we have 'n' data points,
Dialogue: 0,0:03:27.61,0:03:30.44,Default,,0000,0000,0000,,in particular n is 3 here, but if you know
Dialogue: 0,0:03:30.71,0:03:37.90,Default,,0000,0000,0000,,n minus one of them, you can always find the 'n'th one, if you know the actual sample mean.
Dialogue: 0,0:03:38.09,0:03:42.13,Default,,0000,0000,0000,,So in this case, for any of these groups, if you know 2 of these data points,
Dialogue: 0,0:03:42.13,0:03:43.41,Default,,0000,0000,0000,,you can always figure out the third.
Dialogue: 0,0:03:43.41,0:03:44.55,Default,,0000,0000,0000,,If you know these two, you can always
Dialogue: 0,0:03:44.55,0:03:46.77,Default,,0000,0000,0000,,figure out the third if you know the sample mean.
Dialogue: 0,0:03:47.13,0:03:50.42,Default,,0000,0000,0000,,So in general let's figure out the degrees of freedom here.
Dialogue: 0,0:03:50.42,0:03:57.33,Default,,0000,0000,0000,,You have, for each group, when you did this you had 'n' minus one degrees of freedom.
Dialogue: 0,0:03:57.37,0:04:03.97,Default,,0000,0000,0000,,Remember 'n' is the number of data points you had in each group,
Dialogue: 0,0:04:03.97,0:04:09.31,Default,,0000,0000,0000,,so you have n minus one degrees of freedom for each of these groups,
Dialogue: 0,0:04:09.35,0:04:12.43,Default,,0000,0000,0000,,so it's n-1, n-1, n-1,
Dialogue: 0,0:04:12.48,0:04:19.21,Default,,0000,0000,0000,,or you have, let me put it this way, you have 'n-1' for each of these groups,
Dialogue: 0,0:04:19.38,0:04:21.66,Default,,0000,0000,0000,,and there are m groups.
Dialogue: 0,0:04:21.66,0:04:28.89,Default,,0000,0000,0000,,So there are m times (n-1) degrees of freedom.
Dialogue: 0,0:04:28.91,0:04:32.79,Default,,0000,0000,0000,,In this particular case, for each group, n-1 is 2,
Dialogue: 0,0:04:32.79,0:04:34.97,Default,,0000,0000,0000,,so for each group you have 2 degrees of freedom,
Dialogue: 0,0:04:34.97,0:04:45.68,Default,,0000,0000,0000,,and there are three groups, so there are 6 degrees of freedom.
Dialogue: 0,0:04:46.10,0:04:51.34,Default,,0000,0000,0000,,In the future we may do a more detailed discussion of what degrees of freedom mean and
Dialogue: 0,0:04:51.34,0:04:54.38,Default,,0000,0000,0000,,how to mathematically think about it.
Dialogue: 0,0:04:54.38,0:04:58.47,Default,,0000,0000,0000,,But the simplest way to think about it is as the number of truly independent data points.
Dialogue: 0,0:04:58.49,0:05:01.18,Default,,0000,0000,0000,,Assuming you knew in this case the central statistic
Dialogue: 0,0:05:01.18,0:05:04.67,Default,,0000,0000,0000,,that we used to calculate the squared distances for each of these, if you know it already,
Dialogue: 0,0:05:04.80,0:05:08.23,Default,,0000,0000,0000,,the third data point actually could be calculated from the other 2.
Dialogue: 0,0:05:08.23,0:05:10.49,Default,,0000,0000,0000,,So we have 6 degrees of freedom over here.
Dialogue: 0,0:05:10.72,0:05:18.09,Default,,0000,0000,0000,,Now that was how much of the total variation is due to variation within each sample.
Dialogue: 0,0:05:18.31,0:05:23.80,Default,,0000,0000,0000,,Now let's think about how much of the variation is due to variation between the samples.
Dialogue: 0,0:05:25.44,0:05:29.38,Default,,0000,0000,0000,,And to do that, we're going to calculate-- get a nice color here--
Dialogue: 0,0:05:29.39,0:05:30.75,Default,,0000,0000,0000,,I think I've run out of all the colors--
Dialogue: 0,0:05:30.75,0:05:40.57,Default,,0000,0000,0000,,we'll call it sum of squares between, the B stands for between.
Dialogue: 0,0:05:41.09,0:05:44.56,Default,,0000,0000,0000,,So another way to think about it, how much of this total variation
Dialogue: 0,0:05:44.56,0:05:49.30,Default,,0000,0000,0000,,is due to the variation between the means, between the central tendencies,
Dialogue: 0,0:05:49.38,0:05:50.99,Default,,0000,0000,0000,,that's what we're going to calculate right now, and
Dialogue: 0,0:05:50.99,0:05:56.43,Default,,0000,0000,0000,,how much is due to variation from each data point to its mean.
Dialogue: 0,0:05:56.74,0:06:01.48,Default,,0000,0000,0000,,Let's figure out how much is due to variation between these guys over here.
Dialogue: 0,0:06:01.50,0:06:06.84,Default,,0000,0000,0000,,One way to think about it for each of these data points--
Dialogue: 0,0:06:06.85,0:06:09.36,Default,,0000,0000,0000,,let's just think about this first group.
Dialogue: 0,0:06:09.53,0:06:12.85,Default,,0000,0000,0000,,For this first group, how much variation for each of these guys is
Dialogue: 0,0:06:12.85,0:06:18.23,Default,,0000,0000,0000,,due to the variation between this mean and the mean of means?
Dialogue: 0,0:06:18.73,0:06:23.20,Default,,0000,0000,0000,,For the first guy up here-- I'll just write it all out explicitly--
Dialogue: 0,0:06:23.60,0:06:31.00,Default,,0000,0000,0000,,the variation is going to be its sample mean, 2, minus the mean of means, 4, squared.
Dialogue: 0,0:06:31.03,0:06:33.01,Default,,0000,0000,0000,,And then for this guy, it's going to be the same thing.
Dialogue: 0,0:06:33.01,0:06:36.88,Default,,0000,0000,0000,,His sample mean, 2, minus the mean of means, squared.
Dialogue: 0,0:06:37.65,0:06:39.22,Default,,0000,0000,0000,,Plus the same thing for this guy.
Dialogue: 0,0:06:39.25,0:06:41.92,Default,,0000,0000,0000,,His sample mean, 2, minus the mean of means, squared.
Dialogue: 0,0:06:41.92,0:06:52.20,Default,,0000,0000,0000,,Or another way to think about it, this is equal to 3 times (2-4) squared,
Dialogue: 0,0:06:52.44,0:07:02.65,Default,,0000,0000,0000,,which is the same thing as 3 times 4. It's equal to 12.
Dialogue: 0,0:07:02.82,0:07:05.81,Default,,0000,0000,0000,,I can do it for each of them. I actually want to find the total sum.
Dialogue: 0,0:07:05.81,0:07:08.64,Default,,0000,0000,0000,,Let me just write it all out. I think that might be an easier thing to do.
Dialogue: 0,0:07:09.12,0:07:13.23,Default,,0000,0000,0000,,For all of these guys combined,
Dialogue: 0,0:07:13.23,0:07:18.04,Default,,0000,0000,0000,,the sum of squares due to the differences between the samples.
Dialogue: 0,0:07:18.04,0:07:21.46,Default,,0000,0000,0000,,So that's from the first sample, the contribution from the first sample.
Dialogue: 0,0:07:21.47,0:07:23.13,Default,,0000,0000,0000,,And then from the second sample,
Dialogue: 0,0:07:23.44,0:07:28.76,Default,,0000,0000,0000,,you have this guy here, five-- sorry, you don't want to calculate him.
Dialogue: 0,0:07:28.77,0:07:33.04,Default,,0000,0000,0000,,For this data point, the amount of variation due to the difference between the means
Dialogue: 0,0:07:33.04,0:07:37.53,Default,,0000,0000,0000,,is going to be (4-4) squared.
Dialogue: 0,0:07:37.77,0:07:41.09,Default,,0000,0000,0000,,Same thing for this guy, it would be (4-4) squared.
Dialogue: 0,0:07:41.10,0:07:45.61,Default,,0000,0000,0000,,We're not taking it into consideration. We're only taking its sample mean into consideration.
Dialogue: 0,0:07:45.92,0:07:49.11,Default,,0000,0000,0000,,And then finally + (4-4) squared.
Dialogue: 0,0:07:49.12,0:07:50.37,Default,,0000,0000,0000,,We're taking this
Dialogue: 0,0:07:50.37,0:07:53.50,Default,,0000,0000,0000,,minus this, squared, for each of these data points.
Dialogue: 0,0:07:53.50,0:07:57.24,Default,,0000,0000,0000,,And then finally we'll do that with the last group.
Dialogue: 0,0:07:57.55,0:08:09.94,Default,,0000,0000,0000,,Sample mean is 6, so it's going to be (6-4) squared plus (6-4) squared plus (6-4) squared.
Dialogue: 0,0:08:10.37,0:08:12.07,Default,,0000,0000,0000,,Now, let's think about
Dialogue: 0,0:08:12.07,0:08:19.49,Default,,0000,0000,0000,,how many degrees of freedom we had in this calculation right over here.
Dialogue: 0,0:08:19.94,0:08:24.65,Default,,0000,0000,0000,,Well, in general, I guess the easiest way to think about it is,
Dialogue: 0,0:08:24.65,0:08:28.41,Default,,0000,0000,0000,,how much information do we have, assuming that we knew the mean of means?
Dialogue: 0,0:08:28.41,0:08:31.31,Default,,0000,0000,0000,,If we know the mean of means, how much here is new information?
Dialogue: 0,0:08:31.92,0:08:37.16,Default,,0000,0000,0000,,If you know the mean of the means and you know 2 of the sample means,
Dialogue: 0,0:08:37.16,0:08:38.47,Default,,0000,0000,0000,,you can always figure out the third.
Dialogue: 0,0:08:38.47,0:08:40.59,Default,,0000,0000,0000,,If you know this one and this one, you can figure out that one.
Dialogue: 0,0:08:40.70,0:08:42.71,Default,,0000,0000,0000,,If you know that one and that one, you can figure out that one.
Dialogue: 0,0:08:42.71,0:08:46.19,Default,,0000,0000,0000,,That's because this is the mean of these means over here.
Dialogue: 0,0:08:46.36,0:08:51.53,Default,,0000,0000,0000,,So in general, if you have m groups, or if you have m means,
Dialogue: 0,0:08:51.66,0:09:05.88,Default,,0000,0000,0000,,there are m-1 degrees of freedom here.
Dialogue: 0,0:09:05.91,0:09:08.90,Default,,0000,0000,0000,,With that said, in this case m is 3.
Dialogue: 0,0:09:08.90,0:09:14.76,Default,,0000,0000,0000,,So we could say, there are 2 degrees of freedom for this exact example.
Dialogue: 0,0:09:14.76,0:09:18.67,Default,,0000,0000,0000,,Let's actually calculate the sum of squares between. So what is this going to be?
Dialogue: 0,0:09:19.12,0:09:29.34,Default,,0000,0000,0000,,This is going to be equal to, this right here is, 2-4 is -2, squared is 4.
Dialogue: 0,0:09:29.35,0:09:33.23,Default,,0000,0000,0000,,And then we have three fours over here, so three times four.
Dialogue: 0,0:09:33.59,0:09:51.07,Default,,0000,0000,0000,,Plus 3 times 0, plus 3 times (6-4) squared, which is 3 times 4. So plus 3 times 4.
Dialogue: 0,0:09:51.28,0:09:59.73,Default,,0000,0000,0000,,And we get 3 times 4 is 12, + 0 + 12, is equal to 24.
Dialogue: 0,0:09:59.75,0:10:03.96,Default,,0000,0000,0000,,So the sum of squares, or the variation due to
Dialogue: 0,0:10:03.96,0:10:08.69,Default,,0000,0000,0000,,the difference between the groups, between the means, is 24.
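The two hand calculations so far can be checked with a short sketch. The `groups` list and the variable names are assumptions made for illustration. The first value of the second group is never spoken in full in the narration; it is taken to be 5 here because the narration points at "five" in that sample and gets squared distances of 1 + 1 + 0 around a mean of 4.

```python
# Assumed data layout: one inner list per group, matching the worked example.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Mean of all nine data points (the "mean of means" when groups are equal-sized).
grand_mean = mean([x for g in groups for x in g])

# Sum of squares within: each point's squared distance to its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Sum of squares between: each point contributes (group mean - grand mean)^2,
# so every group adds len(g) copies of the same squared difference.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

m = len(groups)          # number of groups
n = len(groups[0])       # data points per group (equal group sizes assumed)
df_within = m * (n - 1)  # degrees of freedom within
df_between = m - 1       # degrees of freedom between

print(ss_within, df_within)    # 6.0 6
print(ss_between, df_between)  # 24.0 2
```

The sketch assumes every group has the same number of data points, as in the example; with unequal group sizes the degrees of freedom within would have to be summed group by group.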
Dialogue: 0,0:10:08.98,0:10:11.57,Default,,0000,0000,0000,,Now let's put this all together. We said that
Dialogue: 0,0:10:11.57,0:10:17.82,Default,,0000,0000,0000,,the total variation, when you look at all 9 data points, is 30.
Dialogue: 0,0:10:17.82,0:10:19.35,Default,,0000,0000,0000,,Let me write that over here.
Dialogue: 0,0:10:19.80,0:10:25.50,Default,,0000,0000,0000,,So the total sum of squares is equal to 30.
Dialogue: 0,0:10:25.88,0:10:32.59,Default,,0000,0000,0000,,We figured out the sum of squares between each data point and its central tendency, its sample
Dialogue: 0,0:10:32.59,0:10:39.64,Default,,0000,0000,0000,,mean, and when we totaled it all up, we got 6 for the sum of squares within.
Dialogue: 0,0:10:40.14,0:10:48.80,Default,,0000,0000,0000,,The sum of squares within was equal to 6. In this case, it had 6 degrees of freedom.
Dialogue: 0,0:10:48.81,0:10:54.43,Default,,0000,0000,0000,,If we wanted to write it generally, there were m times (n-1) degrees of freedom.
Dialogue: 0,0:10:54.65,0:11:03.30,Default,,0000,0000,0000,,Actually for the total, we figured out we had m times n, minus 1, degrees of freedom.
Dialogue: 0,0:11:03.32,0:11:06.14,Default,,0000,0000,0000,,Let me write the degrees of freedom in this column over here.
Dialogue: 0,0:11:06.24,0:11:09.24,Default,,0000,0000,0000,,In this case, the number turned out to be 8.
Dialogue: 0,0:11:09.24,0:11:13.93,Default,,0000,0000,0000,,And then just now, we calculated the sum of squares between the samples.
Dialogue: 0,0:11:14.18,0:11:18.18,Default,,0000,0000,0000,,The sum of squares between the samples is equal to 24
Dialogue: 0,0:11:18.18,0:11:24.20,Default,,0000,0000,0000,,and we figured out that it had m-1 degrees of freedom, which ended up being 2.
Dialogue: 0,0:11:24.56,0:11:31.21,Default,,0000,0000,0000,,Now the interesting thing here-- this is why this analysis of variance all fits nicely together.
Dialogue: 0,0:11:31.23,0:11:35.23,Default,,0000,0000,0000,,In future videos we will think about how we can actually test hypotheses
Dialogue: 0,0:11:35.23,0:11:38.04,Default,,0000,0000,0000,,using some of the tools that we're thinking about right now--
Dialogue: 0,0:11:38.30,0:11:42.70,Default,,0000,0000,0000,,is that the sum of squares within plus the sum of squares between
Dialogue: 0,0:11:42.70,0:11:44.94,Default,,0000,0000,0000,,is equal to the total sum of squares.
Dialogue: 0,0:11:45.04,0:11:50.68,Default,,0000,0000,0000,,So the way to think about it is that the total variation in this data right here
Dialogue: 0,0:11:50.68,0:11:55.80,Default,,0000,0000,0000,,can be described as the sum of the variation within each of these groups,
Dialogue: 0,0:11:55.80,0:11:57.80,Default,,0000,0000,0000,,when you take that total,
Dialogue: 0,0:11:58.13,0:12:03.75,Default,,0000,0000,0000,,plus the sum of the variation between the groups.
Dialogue: 0,0:12:03.77,0:12:05.97,Default,,0000,0000,0000,,And even the degrees of freedom work out.
Dialogue: 0,0:12:05.97,0:12:08.90,Default,,0000,0000,0000,,The sum of squares between has 2 degrees of freedom.
Dialogue: 0,0:12:08.96,0:12:12.73,Default,,0000,0000,0000,,The sum of squares within each of the groups had 6 degrees of freedom.
Dialogue: 0,0:12:12.74,0:12:14.19,Default,,0000,0000,0000,,2+6 is 8.
Dialogue: 0,0:12:14.23,0:12:19.12,Default,,0000,0000,0000,,That's the total degrees of freedom we have for all of the data combined.
Dialogue: 0,0:12:19.12,0:12:22.91,Default,,0000,0000,0000,,It even works if you look at the more general case.
Dialogue: 0,0:12:22.93,0:12:26.73,Default,,0000,0000,0000,,Our sum of squares between had m-1 degrees of freedom.
Dialogue: 0,0:12:27.07,0:12:33.14,Default,,0000,0000,0000,,Our sum of squares within had m(n-1) degrees of freedom.
Dialogue: 0,0:12:33.31,0:12:37.90,Default,,0000,0000,0000,,The sum of these is equal to m-1+mn-m.
Dialogue: 0,0:12:38.28,0:12:43.90,Default,,0000,0000,0000,,These guys cancel out. This is equal to mn-1 degrees of freedom,
Dialogue: 0,0:12:43.92,0:12:48.61,Default,,0000,0000,0000,,which is exactly the total degrees of freedom we have for the total sum of squares.
Dialogue: 0,0:12:48.94,0:12:53.66,Default,,0000,0000,0000,,So the whole point of the calculations that we did in the last video and this video
Dialogue: 0,0:12:53.67,0:12:58.88,Default,,0000,0000,0000,,is just to appreciate that this total variation over here
Dialogue: 0,0:12:58.88,0:13:04.16,Default,,0000,0000,0000,,can be viewed as the sum of these two component variations,
Dialogue: 0,0:13:04.40,0:13:12.15,Default,,0000,0000,0000,,how much variation there is within each of the samples
Dialogue: 0,0:13:12.25,0:13:16.91,Default,,0000,0000,0000,,plus how much variation there is between the means of the samples.
Dialogue: 0,0:13:16.91,0:13:18.58,Default,,0000,0000,0000,,Hopefully that's not too confusing.
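The decomposition described in the narration can be written compactly. The symbols are assumptions chosen for this summary and are not spoken in the video: x_{ij} is the i-th data point in group j, \bar{x}_j is the mean of group j, and \bar{\bar{x}} is the mean of means, with m groups of n points each.

```latex
\begin{align*}
SS_{\text{within}}  &= \sum_{j=1}^{m}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^2 = 6,
  & df_{\text{within}}  &= m(n-1) = 6,\\
SS_{\text{between}} &= n\sum_{j=1}^{m}\left(\bar{x}_j-\bar{\bar{x}}\right)^2 = 24,
  & df_{\text{between}} &= m-1 = 2,\\
SS_{\text{total}}   &= \sum_{j=1}^{m}\sum_{i=1}^{n}\left(x_{ij}-\bar{\bar{x}}\right)^2
   = SS_{\text{within}} + SS_{\text{between}} = 30,
  & df_{\text{total}}   &= (m-1) + m(n-1) = mn-1 = 8.
\end{align*}
```

The numeric values are the ones worked out for the example with m = 3 and n = 3.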