ANOVA 2 - Calculating SSW and SSB (Total Sum of Squares Within and Between).avi

  • 0:01 - 0:02
    In the last video, we were able to
  • 0:02 - 0:06
    calculate the total sum of squares for these 9 data points right here.
  • 0:06 - 0:10
    These 9 data points are grouped into three different groups,
  • 0:10 - 0:13
    or, speaking generally, into m different groups.
  • 0:13 - 0:18
    What I want to do in this video is figure out how much of this total sum of squares
  • 0:18 - 0:22
    is due to variation within each group
  • 0:22 - 0:26
    versus variation between the actual groups.
  • 0:26 - 0:30
    So first, let's figure out the total variation within the groups.
  • 0:30 - 0:36
    Let's call that the sum of squares within. I'll do that in yellow;
  • 0:36 - 0:40
    actually, I've already used yellow, so I'm going to do blue.
  • 0:40 - 0:46
    So, the sum of squares within, SSW.
  • 0:46 - 0:51
    Let me make that clear: the W stands for within.
  • 0:51 - 0:54
    So we want to see how much of the variation is
  • 0:54 - 0:58
    due to how far each of these data points is from its central tendency,
  • 0:58 - 1:00
    from its group's mean.
  • 1:00 - 1:02
    So this is going to be equal to... let's start with these guys.
  • 1:02 - 1:07
    So instead of taking the distance between each data point and the mean of means,
  • 1:07 - 1:12
    I'm going to find the distance between each data point and that group's mean,
  • 1:12 - 1:17
    because we want the total sum of the squared distances
  • 1:17 - 1:21
    between each data point and its respective group mean.
  • 1:21 - 1:26
    So: 3 minus the mean here, which is 2, squared,
  • 1:26 - 1:31
    plus 2 minus 2, squared,
  • 1:31 - 1:34
    plus 1 minus 2, squared.
  • 1:35 - 1:37
    I'm going to do this for all of the groups,
  • 1:37 - 1:40
    but for each group I use the distance between each data point and that group's mean.
  • 1:40 - 1:57
    So: plus 5 minus 4, squared, plus 3 minus 4, squared, plus 4 minus 4, squared.
  • 1:57 - 2:00
    And finally, we have the third group.
  • 2:00 - 2:05
    We're finding the squared distance from each point to its central tendency
  • 2:05 - 2:07
    within its group, and we're going to add them all up.
  • 2:07 - 2:09
    So for the third group, we have
  • 2:09 - 2:21
    5 minus 6, squared, plus 6 minus 6, squared, plus 7 minus 6, squared.
  • 2:21 - 2:22
    And what is this going to equal?
  • 2:22 - 2:29
    So this is going to be equal to... up here, it's 1 + 0 + 1,
  • 2:30 - 2:32
    which is equal to 2,
  • 2:32 - 2:40
    plus this, which is 1 + 1 + 0, so another 2,
  • 2:40 - 2:51
    plus this, which is 1 + 0 + 1, so that's 2 over here.
  • 2:52 - 2:56
    Our total sum of squares within is 6.
  • 2:57 - 3:01
    So, one way to think about it: our total variation was 30.
  • 3:01 - 3:09
    Based on this calculation, 6 of that 30 comes from variation within these samples.
  • 3:09 - 3:11
    Now the next thing I want to think about is
  • 3:11 - 3:16
    how many degrees of freedom we have in this calculation:
  • 3:16 - 3:19
    how many, kind of, independent data points we actually have.
  • 3:20 - 3:28
    Well, for each of these groups over here, we have n data points,
  • 3:28 - 3:30
    and in particular n is 3 here. But if you know
  • 3:31 - 3:38
    n minus 1 of them, you can always find the nth one, provided you know the actual sample mean.
  • 3:38 - 3:42
    So in this case, for any of these groups, if you know 2 of these data points,
  • 3:42 - 3:43
    you can always figure out the third.
  • 3:43 - 3:45
    If you know these two, you can always
  • 3:45 - 3:47
    figure out the third, as long as you know the sample mean.
  • 3:47 - 3:50
    So, in general, let's figure out the degrees of freedom here.
  • 3:50 - 3:57
    For each group, when you did this, you had n minus 1 degrees of freedom.
  • 3:57 - 4:04
    Remember, n is the number of data points in each group,
  • 4:04 - 4:09
    so you have n minus 1 degrees of freedom for each of these groups:
  • 4:09 - 4:12
    n-1, n-1, n-1.
  • 4:12 - 4:19
    Let me put it this way: you have n-1 for each of these groups,
  • 4:19 - 4:22
    and there are m groups.
  • 4:22 - 4:29
    So there are m times (n-1) degrees of freedom.
  • 4:29 - 4:33
    In this particular case, n-1 is 2 for each group,
  • 4:33 - 4:35
    so in each case you have 2 degrees of freedom,
  • 4:35 - 4:46
    and there are three groups, so there are 6 degrees of freedom.
  • 4:46 - 4:51
    In the future, we may have a more detailed discussion of what degrees of freedom mean
  • 4:51 - 4:54
    and how to think about them mathematically.
  • 4:54 - 4:58
    But the simplest way to think about them is as the number of truly independent data points.
  • 4:58 - 5:01
    Assuming you already knew, in this case, the central statistic
  • 5:01 - 5:05
    that we used to calculate each of these squared distances,
  • 5:05 - 5:08
    the third data point in each group could actually be calculated from the other 2.
  • 5:08 - 5:10
    So we have 6 degrees of freedom over here.
  • 5:11 - 5:18
    So that was how much of the total variation is due to variation within each sample.
  • 5:18 - 5:24
    Now let's think about how much of the variation is due to variation between the samples.
  • 5:25 - 5:29
    To do that, we're going to calculate... let me get a nice color here...
  • 5:29 - 5:31
    I think I've run out of colors...
  • 5:31 - 5:41
    we'll call it the sum of squares between, SSB; the B stands for between.
  • 5:41 - 5:45
    So, another way to think about it: how much of this total variation
  • 5:45 - 5:49
    is due to the variation between the means, between the central tendencies,
  • 5:49 - 5:51
    which is what we're going to calculate right now,
  • 5:51 - 5:56
    and how much is due to variation from each data point to its mean?
  • 5:57 - 6:01
    Let's figure out how much is due to variation between these guys over here.
  • 6:02 - 6:07
    One way to think about it, for each of these data points...
  • 6:07 - 6:09
    let's just think about this first group.
  • 6:10 - 6:13
    For this first group, how much of the variation for each of these guys is
  • 6:13 - 6:18
    due to the variation between this mean and the mean of means.
  • 6:19 - 6:23
    For the first guy up here (I'll just write it all out explicitly),
  • 6:24 - 6:31
    the variation is going to be its sample mean, 2, minus the mean of means, squared.
  • 6:31 - 6:33
    And then for this guy, it's going to be the same thing.
  • 6:33 - 6:37
    His sample mean, 2, minus the mean of means, squared.
  • 6:38 - 6:39
    Plus the same thing for this guy:
  • 6:39 - 6:42
    his sample mean, 2, minus the mean of means, squared.
  • 6:42 - 6:52
    Another way to think about it: this is equal to 3 times (2-4) squared,
  • 6:52 - 7:03
    which is the same thing as 3 times 4, so it's equal to 12.
  • 7:03 - 7:06
    I can do it for each of them. I actually want to find the total sum.
  • 7:06 - 7:09
    Let me just write it all out. I think that might be an easier thing to do.
  • 7:09 - 7:13
    For all of these guys combined,
  • 7:13 - 7:18
    we want the sum of squares due to the differences between the samples.
  • 7:18 - 7:21
    So that's the contribution from the first sample.
  • 7:21 - 7:23
    And then from the second sample,
  • 7:23 - 7:29
    you have this guy here, 5... sorry, we don't want to use his actual value.
  • 7:29 - 7:33
    For this data point, the amount of variation due to the difference between the means
  • 7:33 - 7:38
    is going to be (4-4) squared.
  • 7:38 - 7:41
    Same thing for this guy: it would be (4-4) squared.
  • 7:41 - 7:46
    We're not taking the data point itself into consideration; we're only taking its sample mean into consideration.
  • 7:46 - 7:49
    And then finally, plus (4-4) squared.
  • 7:49 - 7:50
    We're taking the sample mean
  • 7:50 - 7:54
    minus the mean of means, squared, for each of these data points.
  • 7:54 - 7:57
    And then finally, we'll do that with the last group.
  • 7:58 - 8:10
    The sample mean is 6, so it's going to be (6-4) squared plus (6-4) squared plus (6-4) squared.
  • 8:10 - 8:12
    Now, let's think about
  • 8:12 - 8:19
    how many degrees of freedom we had in this calculation right over here.
  • 8:20 - 8:25
    Well, in general, I guess the easiest way to think about it is,
  • 8:25 - 8:28
    how much information do we have, assuming that we knew the mean of means?
  • 8:28 - 8:31
    If we know the mean of means, how much here is new information?
  • 8:32 - 8:37
    If you know the mean of the means and you know 2 of the sample means,
  • 8:37 - 8:38
    you can always figure out the third.
  • 8:38 - 8:41
    If you know this one and this one, you can figure out that one.
  • 8:41 - 8:43
    If you know that one and that one, you can figure out that one.
  • 8:43 - 8:46
    That's because this is the mean of these means over here.
  • 8:46 - 8:52
    So in general, if you have m groups, or m means,
  • 8:52 - 9:06
    there are m-1 degrees of freedom here.
  • 9:06 - 9:09
    In this case, m is 3.
  • 9:09 - 9:15
    So there are 2 degrees of freedom for this exact example.
  • 9:15 - 9:19
    Let's actually calculate the sum of squares between. So what is this going to be?
  • 9:19 - 9:29
    This right here: 2-4 is -2, and -2 squared is 4.
  • 9:29 - 9:33
    And then we have three fours over here, so three times four.
  • 9:34 - 9:51
    Plus 3 times 0, plus 3 times (6-4) squared, which is 3 times 4. So plus 3 times 4.
  • 9:51 - 10:00
    So we get 3 times 4, which is 12, plus 0, plus 12, which is equal to 24.
  • 10:00 - 10:04
    So the sum of squares between, or the variation due to
  • 10:04 - 10:09
    the differences between the groups, between the means, is 24.
  • 10:09 - 10:12
    Now let's put this all together. We said that
  • 10:12 - 10:18
    the total variation when you look at all 9 data points, is 30.
  • 10:18 - 10:19
    Let me write that over here.
  • 10:20 - 10:26
    So the total sum of squares is equal to 30.
  • 10:26 - 10:33
    We figured out the sum of squares between each data point and its central tendency, its sample
  • 10:33 - 10:40
    mean; when we totaled it all up, we got 6 for the sum of squares within.
  • 10:40 - 10:49
    The sum of squares within was equal to 6. In this case, it had 6 degrees of freedom.
  • 10:49 - 10:54
    Written generally, there were m(n-1) degrees of freedom.
  • 10:55 - 11:03
    And for the total, we figured out that we had mn-1 degrees of freedom.
  • 11:03 - 11:06
    Let me write the degrees of freedom in this column over here.
  • 11:06 - 11:09
    In this case, the number turned out to be 8.
  • 11:09 - 11:14
    And then just now, we calculated the sum of squares between the samples.
  • 11:14 - 11:18
    The sum of squares between the samples is equal to 24
  • 11:18 - 11:24
    and we figured out that it had m-1 degrees of freedom, which ended up being 2.
  • 11:25 - 11:31
    Now, the interesting thing here, and this is why analysis of variance all fits together so nicely
  • 11:31 - 11:35
    (in future videos, we will think about how we can actually test hypotheses
  • 11:35 - 11:38
    using some of the tools that we're thinking about right now),
  • 11:38 - 11:43
    is that the sum of squares within plus the sum of squares between
  • 11:43 - 11:45
    is equal to the total sum of squares.
  • 11:45 - 11:51
    So the way to think about it is that the total variation in this data right here
  • 11:51 - 11:56
    can be described as the sum of the variation within each of these groups,
  • 11:56 - 11:58
    totaled across all the groups,
  • 11:58 - 12:04
    plus the variation between the groups.
  • 12:04 - 12:06
    And even the degrees of freedom work out.
  • 12:06 - 12:09
    The sum of squares between has 2 degrees of freedom.
  • 12:09 - 12:13
    The sum of squares within each of the groups had 6 degrees of freedom.
  • 12:13 - 12:14
    2+6 is 8.
  • 12:14 - 12:19
    That's the total degrees of freedom we have for all of the data combined.
  • 12:19 - 12:23
    It even works in the general case.
  • 12:23 - 12:27
    Our sum of squares between had m-1 degrees of freedom.
  • 12:27 - 12:33
    Our sum of squares within had m(n-1) degrees of freedom.
  • 12:33 - 12:38
    Adding these gives (m-1) + m(n-1) = m-1+mn-m.
  • 12:38 - 12:44
    The m and the -m cancel out, so this is equal to mn-1 degrees of freedom,
  • 12:44 - 12:49
    which is exactly the total degrees of freedom we have for the total sum of squares.
  • 12:49 - 12:54
    So the whole point of the calculations we did in the last video and this one
  • 12:54 - 12:59
    is just to appreciate that this total variation over here
  • 12:59 - 13:04
    can be viewed as the sum of these two component variations,
  • 13:04 - 13:12
    how much variation there is within each of the samples
  • 13:12 - 13:17
    plus how much variation there is between the means of the samples.
  • 13:17 - 13:19
    Hopefully that's not too confusing.
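
The arithmetic in this video can be checked with a short script. What follows is a minimal sketch in Python, not something shown in the video: it uses the 9 data points from the example, grouped as {3, 2, 1}, {5, 3, 4}, and {5, 6, 7}, and the function name `anova_sums` is hypothetical. It assumes, as the video does, that all m groups have the same size n, so the grand mean of all the data equals the mean of the group means.

```python
from statistics import mean


def anova_sums(groups):
    """Return (SST, SSW, SSB, df_total, df_within, df_between).

    Hypothetical helper: assumes m groups that all have the same size n.
    """
    m = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has n data points")

    all_points = [x for g in groups for x in g]
    # With equal group sizes, the grand mean equals the "mean of means".
    grand_mean = mean(all_points)

    # Total: squared distance from each point to the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_points)
    # Within: squared distance from each point to its own group's mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between: each of a group's n points contributes
    # (group mean - grand mean) squared, so the group adds n times that.
    ssb = sum(n * (mean(g) - grand_mean) ** 2 for g in groups)

    return sst, ssw, ssb, m * n - 1, m * (n - 1), m - 1


# The three groups from the video.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
sst, ssw, ssb, df_t, df_w, df_b = anova_sums(groups)
print(f"SST={sst}  SSW={ssw}  SSB={ssb}")
print(f"df: total={df_t}  within={df_w}  between={df_b}")
print("SSW + SSB == SST:", ssw + ssb == sst)
```

With the video's data, the sums come out to SST = 30, SSW = 6, and SSB = 24, with 8, 6, and 2 degrees of freedom, so both SSW + SSB = SST and m(n-1) + (m-1) = mn-1 hold. The explicit size check is there because the "mean of means" shortcut only matches the grand mean when every group has the same number of points.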
Video Language:
English
Duration:
13:20
