ANOVA 2 - Calculating SSW and SSB (Total Sum of Squares Within and Between).avi

0:01 - 0:02

In the last video we were able to
0:02 - 0:06

calculate the total sum of squares for these 9 data points right here,
0:06 - 0:10

these 9 data points are grouped into three different groups,
0:10 - 0:13

or if you wanted to speak generally into "m" different groups.
0:13 - 0:18

What I want to do in this video is to figure out how much of this total sum of squares
0:18 - 0:22

how much of this is due to variation within each group
0:22 - 0:26

versus variation between the actual groups.
0:26 - 0:30

So first let's figure out the total variation within the groups,
0:30 - 0:36

so let's call that the sum of squares within, I'll do that in yellow,
0:36 - 0:40

actually I've already used yellow so let's do this, I'm going to do blue.
0:40 - 0:46

So the sum of squares within.
0:46 - 0:51

Let me make that clear, that stands for within.
0:51 - 0:54

So we want to see how much of the variation is
0:54 - 0:58

due to how far each of these data points are from their central tendencies,
0:58 - 1:00

from their respective means.
1:00 - 1:02

So this is going to be equal to-- let's start with these guys.
1:02 - 1:07

So instead of taking the distance between each data point and the mean of means
1:07 - 1:12

I'm going to find the distance between each data point and that group's mean
1:12 - 1:17

because we want to find the total sum of squared distances
1:17 - 1:21

between each data point and their respective means
1:21 - 1:26

3 minus the mean here, it's 2. Squared.
1:26 - 1:31

+ 2 minus 2 squared,
1:31 - 1:34

+ 1 minus 2 squared.
1:35 - 1:37

I'm going to do this for all of the groups,
1:37 - 1:40

but for each group, the distance between its data point and its mean
1:40 - 1:57

so + 5 minus 4 squared, + 3 minus 4 squared, + 4 minus 4 squared
1:57 - 2:00

and finally we have the third group,
2:00 - 2:05

and we're finding all of the sum of squares from each point to its central tendency
2:05 - 2:07

within that group, we're going to add them all up.
2:07 - 2:09

And then we find the third group so we have
2:09 - 2:21

5 minus 6 squared + 6 minus 6 squared, + 7 minus 6 squared.
2:21 - 2:22

And what is this going to equal?
2:22 - 2:29

So this is going to be equal to, so up here it is going to be 1 + 0 + 1,
2:30 - 2:32

that's going to be equal to 2,
2:32 - 2:40

+ this is going to be equal to 1 + 1 + 0, so another 2,
2:40 - 2:51

+ this is going to be equal to 1 + 0 + 1, so that's 2 over here.
2:52 - 2:56

Our total sum of squares within is 6.
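
The within-group calculation above can be sketched in Python. This is a minimal illustration, not part of the video: the names `groups`, `mean`, and `sum_of_squares_within` are made up here, and the data are read off the subtraction terms in the transcript (the 5 in the second group is inferred from the squared terms 1 + 1 + 0 and the later mention of "five").

```python
# Data as read from the worked terms in the transcript.
# Assumption: the first value of the second group is 5.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_squares_within(groups):
    """Sum of squared distances from each point to its own group's mean."""
    total = 0.0
    for group in groups:
        group_mean = mean(group)
        total += sum((x - group_mean) ** 2 for x in group)
    return total


ssw = sum_of_squares_within(groups)
print(ssw)  # 6.0 (2 from each of the three groups)
```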
2:57 - 3:01

So one way to think about it, our total variation was 30.
3:01 - 3:09

Based on that calculation 6 of that 30 comes from variation within these samples.
3:09 - 3:11

Now the next thing I want to think about is
3:11 - 3:16

how many degrees of freedom do we have in this calculation
3:16 - 3:19

how many, kind of, independent data points do we actually have,
3:20 - 3:28

well for each of these, over here, if you know we have 'n' data points for each one,
3:28 - 3:30

in particular n is 3 here, but if you know
3:31 - 3:38

n minus one of them, you can always find the 'n'th one, if you know the actual sample mean.
3:38 - 3:42

So in this case for any of these groups if you know 2 of these data points,
3:42 - 3:43

you can always figure out the third.
3:43 - 3:45

If you know these two, you can always
3:45 - 3:47

figure out the third if you can figure out the sample mean.
3:47 - 3:50

So in general let's figure out the degrees of freedom here.
3:50 - 3:57

You have, for each group, when you did this you had 'n' minus one degrees of freedom.
3:57 - 4:04

Remember 'n' is the number of data points you had in each group,
4:04 - 4:09

so you have n minus one degrees of freedom for each of these groups,
4:09 - 4:12

so it's n-1, n-1, n-1,
4:12 - 4:19

or you have, let me put it this way, you have 'n-1' for each of these groups, and
4:19 - 4:22

there are m groups.
4:22 - 4:29

So there's m times (n-1) degrees of freedom.
4:29 - 4:33

In this particular case, for each group, n-1 is 2,
4:33 - 4:35

or each case, you have 2 degrees of freedom
4:35 - 4:46

and there are three groups, so there are 6 degrees of freedom.
4:46 - 4:51

In the future we may do a more detailed discussion of what degrees of freedom mean
4:51 - 4:54

how to mathematically think about it.
4:54 - 4:58

But the simplest way to think about it is really truly independent data points.
4:58 - 5:01

Assuming you knew in this case the central statistic
5:01 - 5:05

that we used to calculate the squared distances of each of these, if you know them already
5:05 - 5:08

the third data point actually could be calculated from the other 2.
5:08 - 5:10

So we have 6 degrees of freedom over here.
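
As a rough sketch of the "you can always figure out the third" argument: if the group mean and n-1 of the points are known, the last point is forced. The helper name `missing_point` is hypothetical, and the numbers come from the first group in the example (points 3 and 2, mean 2).

```python
def missing_point(known_points, group_mean, n):
    """Recover the one unknown point from the group mean and the other n-1 points."""
    # The n points must sum to n * group_mean, so the last one is what is left over.
    return n * group_mean - sum(known_points)


# First group: mean 2, two known points 3 and 2.
third = missing_point([3, 2], group_mean=2, n=3)
print(third)  # 1

# m groups with n points each: (n - 1) free values per group.
m, n = 3, 3
df_within = m * (n - 1)
print(df_within)  # 6
```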
5:11 - 5:18

Now that was how much of the total variation is due to variation within each sample.
5:18 - 5:24

Now think about how much of the variation is due to variation between the samples.
5:25 - 5:29

And to do that, we're going to calculate-- get a nice color here--
5:29 - 5:31

I think I've run out of all the colors--
5:31 - 5:41

we'll call it sum of squares between, the B stands for between.
5:41 - 5:45

So another way to think about it, how much of this total variation
5:45 - 5:49

is due to the variation between the means, between the central tendency
5:49 - 5:51

that's what we're going to calculate right now and
5:51 - 5:56

how much is due to variation from each data point to its mean.
5:57 - 6:01

Let's figure out how much is due to variation between these guys over here.
6:02 - 6:07

One way to think about it for each of these data points--
6:07 - 6:09

let's just think about this first group.
6:10 - 6:13

For this first group, how much variation for each of these guys is
6:13 - 6:18

due to the variation between this mean and the mean of means.
6:19 - 6:23

For the first guy up here-- I'll just write it all out explicitly--
6:24 - 6:31

the variation is going to be its sample mean, 2, minus the mean of means, squared.
6:31 - 6:33

And then for this guy, it's going to be the same thing.
6:33 - 6:37

His sample mean, 2, minus the mean of means, squared.
6:38 - 6:39

Plus same thing for this guy.
6:39 - 6:42

His sample mean, 2, minus the mean of means, squared.
6:42 - 6:52

Or another way to think about it, this is equal to 3 times 2-4 squared,
6:52 - 7:03

which is the same thing as 3 times 4. It's equal to 12.
7:03 - 7:06

I can do it for each of them. I actually want to find the total sum.
7:06 - 7:09

Let me just write it all out. I think that might be an easier thing to do.
7:09 - 7:13

For all of these guys combined
7:13 - 7:18

the sum of squares due to the differences between the samples.
7:18 - 7:21

So that's from the first sample, the contribution from the first sample.
7:21 - 7:23

And then from the second sample,
7:23 - 7:29

you have this guy here, five-- sorry, you don't want to calculate him.
7:29 - 7:33

For this data point, the amount of variation due to the difference between the means
7:33 - 7:38

is going to be 4-4 squared
7:38 - 7:41

Same thing for this guy, it would be 4-4 squared.
7:41 - 7:46

We're not taking it into consideration. We're only taking its sample mean into consideration.
7:46 - 7:49

And then finally + 4-4 squared.
7:49 - 7:50

We're taking this
7:50 - 7:54

minus this squared for each of these data points.
7:54 - 7:57

And then finally we'll do that with the last group.
7:58 - 8:10

Sample mean is 6, so it's going to be 6-4 squared plus 6-4 squared plus 6-4 squared.
8:10 - 8:12

Now, let's think about
8:12 - 8:19

how many degrees of freedom we had in this calculation right over here.
8:20 - 8:25

Well, in general, I guess the easiest way to think about it is,
8:25 - 8:28

how much information do we have, assuming that we knew the mean of means?
8:28 - 8:31

If we know the mean of means, how much here is new information?
8:32 - 8:37

If you know the mean of the means and you know 2 of the sample means,
8:37 - 8:38

you can always figure out the third.
8:38 - 8:41

If you know this one and this one, you can figure out that one.
8:41 - 8:43

If you know that one and that one, you can figure out that one.
8:43 - 8:46

That's because this is the mean of these means over here.
8:46 - 8:52

So in general, if you have m groups, or if you have m means,
8:52 - 9:06

there are m-1 degrees of freedom here.
9:06 - 9:09

With that said, in this case m is 3.
9:09 - 9:15

So we could say, there's 2 degrees of freedom for this exact example.
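
The same idea can be checked for the sample means. This is only a sketch: `missing_sample_mean` is a made-up helper, and it assumes equal group sizes, so that the mean of means is the plain average of the sample means, as in this example.

```python
def missing_sample_mean(known_means, mean_of_means, m):
    """Recover the one unknown sample mean from the mean of means and the other m-1."""
    # The m sample means must sum to m * mean_of_means (equal group sizes assumed).
    return m * mean_of_means - sum(known_means)


# Sample means 2 and 4 are known, and the mean of means is 4.
third_mean = missing_sample_mean([2, 4], mean_of_means=4, m=3)
print(third_mean)  # 6

m = 3
df_between = m - 1
print(df_between)  # 2
```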
9:15 - 9:19

Let's actually calculate the sum of squares between. So what is this going to be?
9:19 - 9:29

This is going to be equal to, this right here is, 2-4 is -2, squared is 4.
9:29 - 9:33

And then we have three fours over here, so three times four.
9:34 - 9:51

Plus 3 times 0, plus 3 times (6-4) squared, which is 3 times 4. So plus 3 times 4.
9:51 - 10:00

And we get 3 times 4 is 12 + 0 + 12, is equal to 24.
10:00 - 10:04

So the sum of squares, or the variation due to
10:04 - 10:09

what's the difference between the groups, between the means is 24.
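
A short Python sketch of the between-group sum, under the same assumptions about the data (the 5 in the second group is inferred from the transcript). The names `groups`, `mean`, `grand_mean`, and `ssb` are illustrative. Each group contributes its size times the squared distance from its sample mean to the mean of means.

```python
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Mean over all nine points; with equal group sizes this equals the mean of means.
grand_mean = mean([x for group in groups for x in group])

# One (sample mean - grand mean)^2 term per data point, grouped by sample.
ssb = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)

print(grand_mean)  # 4.0
print(ssb)         # 24.0 (12 + 0 + 12)
```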
10:09 - 10:12

Now let's put this all together. We said that
10:12 - 10:18

the total variation when you look at all 9 data points, is 30.
10:18 - 10:19

Let me write that over here.
10:20 - 10:26

So the total sum of squares is equal to 30.
10:26 - 10:33

We figured out the sum of squares between each data point and its central tendency, its sample
10:33 - 10:40

mean, we figured that out and totaled it all up, and we got 6 for the sum of squares within.
10:40 - 10:49

The sum of squares within was equal to 6. In this case, it was 6 degrees of freedom.
10:49 - 10:54

If we wanted to write it generally, there were m times (n-1) degrees of freedom.
10:55 - 11:03

Actually for the total, we figured out we had mn-1 degrees of freedom.
11:03 - 11:06

Let me write the degrees of freedom in this column over here.
11:06 - 11:09

In this case, the number turned out to be 8.
11:09 - 11:14

And then just now, we calculated the sum of squares between the samples.
11:14 - 11:18

The sum of squares between the samples is equal to 24
11:18 - 11:24

and we figured out that it had m-1 degrees of freedom which ended up being 2.
11:25 - 11:31

Now the interesting thing here-- this is why this analysis of variance all fits nicely together.
11:31 - 11:35

In future videos we will think about how we can actually test hypotheses
11:35 - 11:38

using some of the tools that we're thinking about right now--
11:38 - 11:43

is that the sum of squares within plus the sum of squares between
11:43 - 11:45

is equal to the total sum of squares.
11:45 - 11:51

So the way to think about it is that the total variation in this data right here
11:51 - 11:56

can be described as the sum of the variation within each of these groups
11:56 - 11:58

when you take that total
11:58 - 12:04

plus the sum of the variation between the groups.
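
To check that the pieces add up, here is a small self-contained sketch that computes all three sums of squares for the nine data points. The function name `anova_sums` is hypothetical, and the data carry the same assumption as before (the 5 in the second group is inferred from the transcript).

```python
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def anova_sums(groups):
    """Return (SST, SSW, SSB) for a list of groups."""
    all_points = [x for group in groups for x in group]
    grand_mean = mean(all_points)
    # Total: each point against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_points)
    # Within: each point against its own group's mean.
    ssw = sum((x - mean(group)) ** 2 for group in groups for x in group)
    # Between: each group's mean against the grand mean, once per point.
    ssb = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    return sst, ssw, ssb


sst, ssw, ssb = anova_sums(groups)
print(sst, ssw, ssb)  # 30.0 6.0 24.0
```

For this data set the values are exact, so `ssw + ssb` equals `sst` with no rounding error; with arbitrary real-valued data a tolerance-based comparison would be the safer check.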
12:04 - 12:06

And even the degrees of freedom work out.
12:06 - 12:09

The sum of squares between has 2 degrees of freedom.
12:09 - 12:13

The sum of squares within each of the groups had 6 degrees of freedom.
12:13 - 12:14

2+6 is 8.
12:14 - 12:19

That's the total degrees of freedom we have for all of the data combined.
12:19 - 12:23

It even works if you look at the more general case.
12:23 - 12:27

Our sum of squares between had m-1 degrees of freedom.
12:27 - 12:33

Our sum of squares within had m(n-1) degrees of freedom.
12:33 - 12:38

This is equal to m-1+mn-m.
12:38 - 12:44

These guys cancel out. This is equal to mn-1 degrees of freedom,
12:44 - 12:49

which is exactly the total degrees of freedom we have for the total sum of squares.
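
Written out as one line of algebra, using the same symbols as above (m groups, n data points per group), the degrees of freedom add up like this:

```latex
\underbrace{(m-1)}_{\text{between}} + \underbrace{m(n-1)}_{\text{within}}
  = m - 1 + mn - m
  = mn - 1
% With m = 3 and n = 3: 2 + 6 = 8 = 3 \cdot 3 - 1.
```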
12:49 - 12:54

So the whole point of the calculations that we did in the last and this video
12:54 - 12:59

is just to appreciate that this total variation over here
12:59 - 13:04

can be viewed as the sum of these two component variations,
13:04 - 13:12

how much variation within each of the samples
13:12 - 13:17

plus how much variation is there between the means of the samples.
13:17 - 13:19

Hopefully that's not too confusing.

Title:: ANOVA 2 - Calculating SSW and SSB (Total Sum of Squares Within and Between).avi
Description:: Analysis of Variance 2 - Calculating SSW and SSB (Total Sum of Squares Within and Between).avi

Video Language:: English
Duration:: 13:20
