-
In this video and
the next few videos,
-
we're just really going to be
doing a bunch of calculations
-
about this data set
right over here.
-
And hopefully, just going
through those calculations
-
will give you an
intuitive sense of what
-
the analysis of
variance is all about.
-
Now, the first thing I
want to do in this video
-
is calculate the
total sum of squares.
-
So I'll call that SST.
-
SS-- sum of squares total.
-
And you could view it
as really the numerator
-
when you calculate variance.
-
So you're just going to take
the distance between each
-
of these data points and the
mean of all of these data
-
points, square them,
and just take that sum.
-
We're not going to divide by
the degrees of freedom, which
-
you would normally do
if you were calculating
-
sample variance.
-
Now, what is this going to be?
-
Well, the first
thing we need to do,
-
we have to figure out the mean
of all of this stuff over here.
-
And I'm actually going to
call that the grand mean.
-
And I'm going to
show you in a second
-
that it's the same thing as
the mean of the means of each
-
of these data sets.
-
So let's calculate
the grand mean.
-
So it's going to be 3 plus 2
plus 1 plus 5 plus 3 plus 4
-
plus 5 plus 6 plus 7.
-
And then we have
nine data points here
-
so we'll divide by 9.
-
And what is this
going to be equal to?
-
3 plus 2 plus 1 is 6.
-
6 plus-- let me just add.
-
So these are 6.
-
5 plus 3 plus 4 is 12.
-
And then 5 plus 6 plus 7 is 18.
-
And then 6 plus 12 is 18 plus
another 18 is 36, divided by 9
-
is equal to 4.
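-
As a quick check of that arithmetic, here is a minimal Python sketch. The names `groups`, `all_points`, and `grand_mean` are labels made up for this illustration; only the nine numbers come from the data set.

```python
# The three groups of three data points from the example.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

# Flatten the groups into one list of nine data points.
all_points = [x for group in groups for x in group]

# Grand mean: sum of every data point divided by how many there are.
grand_mean = sum(all_points) / len(all_points)

print(sum(all_points))  # 36
print(grand_mean)       # 4.0
```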
-
And let me show you that
that's the exact same thing
-
as the mean of the means.
-
So the mean of this
group 1 over here--
-
let me do it in
that same green--
-
the mean of group 1 over
here is 3 plus 2 plus 1.
-
That's that 6 right over
here, divided by 3 data
-
points so that
will be equal to 2.
-
The mean of group 2,
the sum here is 12.
-
We saw that right over here.
-
5 plus 3 plus 4 is
12, divided by 3
-
is 4 because we have
three data points.
-
And then the mean
of group 3, 5 plus 6
-
plus 7 is 18 divided by 3 is 6.
-
So if you were to take the
mean of the means, which
-
is another way of viewing this
grand mean, you have 2 plus 4
-
plus 6, which is 12,
divided by 3 means here.
-
And once again, you would get 4.
-
So you could view
this as the mean
-
of all of the data
in all of the groups
-
or the mean of the means
of each of these groups.
-
They match here because every
group has three data points.
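-
Here is a small sketch that computes the grand mean both ways. The helper `mean` and the list `uneven` are assumptions added for illustration; `uneven` is made-up data, not from the video, included only to show that the two routes can disagree when group sizes differ.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

group_means = [mean(g) for g in groups]             # one mean per group
mean_of_means = mean(group_means)                   # mean of those three means
grand_mean = mean([x for g in groups for x in g])   # mean of all nine points

print(group_means)    # [2.0, 4.0, 6.0]
print(mean_of_means)  # 4.0
print(grand_mean)     # 4.0

# Hypothetical unequal-size groups (NOT from the video).
uneven = [[1], [3, 5]]
uneven_mean_of_means = mean([mean(g) for g in uneven])    # (1 + 4) / 2
uneven_grand_mean = mean([x for g in uneven for x in g])  # 9 / 3

print(uneven_mean_of_means)  # 2.5
print(uneven_grand_mean)     # 3.0
```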
-
But either way, now that
we've calculated it,
-
we can actually figure out
the total sum of squares.
-
So let's do that.
-
So it's going to be
equal to 3 minus 4--
-
the 4 is this 4 right over
here-- squared plus 2 minus 4
-
squared plus 1 minus 4 squared.
-
Now, I'll do these guys
over here in purple.
-
Plus 5 minus 4 squared plus 3
minus 4 squared plus 4 minus 4
-
squared.
-
Let me scroll over a little bit.
-
Now, we only have three
left, plus 5 minus 4 squared
-
plus 6 minus 4 squared
plus 7 minus 4 squared.
-
And what does this give us?
-
So up here, this is going
to be equal to 3 minus 4.
-
The difference is negative 1.
-
You square it
-
and you get 1,
-
plus you get negative 2 squared
is 4, plus negative 3 squared.
-
Negative 3 squared is 9.
-
And then we have here
in the magenta 5 minus 4
-
is 1 squared is still 1.
-
3 minus 4 is negative 1.
-
You square it,
you still get 1.
-
And then 4 minus 4 is just 0.
-
So we could-- well, I'll
just write the 0 there just
-
to show you that we
actually calculated that.
-
And then we have these
last three data points.
-
5 minus 4 squared.
-
That's 1.
-
6 minus 4 squared.
-
That is 4, right?
-
That's 2 squared.
-
And then plus 7 minus
4 is 3 squared is 9.
-
So what's this going
to be equal to?
-
So I have 1 plus 4
plus 9 right over here.
-
That's 5 plus 9.
-
This right over
here is 14, right?
-
5 plus-- yup, 14.
-
And then we also have
another 14 right over here
-
because we have a
1 plus 4 plus 9.
-
So that right over
there is also 14.
-
And then we have 2 over here.
-
So it's going to be
28-- 14 times 2, 14
-
plus 14 is 28-- plus 2 is 30.
-
Is equal to 30.
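-
The same sum can be sketched in a few lines of Python. The variable names (`squared_devs`, `sst`) are assumptions chosen for this illustration; the calculation follows the steps above: subtract the grand mean from each point, square, and add.

```python
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)  # 4.0

# Squared distance of each data point from the grand mean.
squared_devs = [(x - grand_mean) ** 2 for x in all_points]

# Total sum of squares: just add them up, with no division.
sst = sum(squared_devs)

print(squared_devs)  # [1.0, 4.0, 9.0, 1.0, 1.0, 0.0, 1.0, 4.0, 9.0]
print(sst)           # 30.0
```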
-
So our total sum of
squares-- and actually,
-
if we wanted the
variance here, we
-
would divide this by
the degrees of freedom.
-
And we've seen the degrees
of freedom multiple times.
-
So let's say
-
that we have
m groups over here.
-
So let me just write
it as m and I'm not
-
going to prove things
rigorously here,
-
but I want to show
you where some
-
of these strange formulas that
show up in statistics books
-
actually come from without
proving it rigorously.
-
More to give you the intuition.
-
So we have m groups here.
-
And each group
here has n members.
-
So how many total
members do we have here?
-
Well, we have m
times n, or 9, right?
-
3 times 3 total members.
-
So our degrees of
freedom-- and remember,
-
you have however
many data points
-
you had minus 1
degrees of freedom
-
because if you know
the mean of means,
-
if you assume you knew
that, then only 9 minus 1,
-
only eight of these are going
to give you new information
-
because if you know that, you
could calculate the last one.
-
Or it really doesn't
have to be the last one.
-
If you have the other eight,
you could calculate this one.
-
If you have eight of
them, you could always
-
calculate the ninth one
using the mean of means.
-
So one way to think
about it is that there's
-
only eight independent
measurements here.
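-
To make that idea concrete, here is a minimal sketch that recovers one data point from the other eight plus the grand mean. Treating the last value as the "unknown" one is an arbitrary choice for illustration, and the names `known` and `missing` are made up here.

```python
all_points = [3, 2, 1, 5, 3, 4, 5, 6, 7]
grand_mean = 4  # computed earlier from all nine points

# Pretend we only know the first eight values.
known = all_points[:-1]

# The nine values must sum to grand_mean * 9, so the ninth is forced.
missing = grand_mean * len(all_points) - sum(known)

print(sum(known))  # 29
print(missing)     # 7
```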
-
Or if we want to
talk generally, there
-
are m times n-- so that tells
us the total number of data points--
-
minus 1 degrees of freedom.
-
And if we were actually
calculating the variance here,
-
we would just divide
30 by m times n minus 1
-
or this is another way of
saying eight degrees of freedom
-
for this exact example.
-
We would take 30 divided
by 8 and we would actually
-
have the variance for
this entire group,
-
for the group of nine
when you combine them.
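-
As a sketch of that last step, the code below divides the total sum of squares by m times n minus 1 and compares the result with `statistics.variance` from Python's standard library, which computes the sample variance. The names `m`, `n`, `df`, and `variance` mirror the symbols used above but are otherwise assumptions of this illustration.

```python
import math
import statistics

groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)

sst = sum((x - grand_mean) ** 2 for x in all_points)  # 30.0

m = len(groups)     # number of groups: 3
n = len(groups[0])  # members per group: 3 (assumes equal-size groups)
df = m * n - 1      # degrees of freedom: 8

variance = sst / df

print(df)        # 8
print(variance)  # 3.75

# Cross-check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(all_points)))  # True
```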
-
I'll leave you
here in this video.
-
In the next video, we're
going to try to figure out
-
how much of this total sum
of squares, this total
-
variation, comes
-
from the variation within
each of these groups
-
versus the variation
between the groups.
-
And I think you get
a sense of where
-
this whole analysis of
variance is coming from.
-
It's the sense
that, look, there's
-
a variance of this
entire sample of nine,
-
but some of that variance--
if these groups are
-
different in some way--
might come from the variation
-
from being in different groups
versus the variation from being
-
within a group.
-
And we're going to
calculate those two things
-
and we're going to
see that they're
-
going to add up to the
total sum of squares.