ANOVA 1 - Calculating SST (Total Sum of Squares)

  • 0:01 - 0:02
    In this video and
    the next few videos,
  • 0:02 - 0:05
    we're just really going to be
    doing a bunch of calculations
  • 0:05 - 0:08
    about this data set
    right over here.
  • 0:08 - 0:10
    And hopefully, just going
    through those calculations
  • 0:10 - 0:12
    will give you an
    intuitive sense of what
  • 0:12 - 0:15
    the analysis of
    variance is all about.
  • 0:15 - 0:17
    Now, the first thing I
    want to do in this video
  • 0:17 - 0:20
    is calculate the
    total sum of squares.
  • 0:20 - 0:23
    So I'll call that SST.
  • 0:23 - 0:25
    SS-- sum of squares total.
  • 0:25 - 0:27
    And you could view it
    as really the numerator
  • 0:27 - 0:28
    when you calculate variance.
  • 0:28 - 0:31
    So you're just going to take
    the distance between each
  • 0:31 - 0:33
    of these data points and the
    mean of all of these data
  • 0:33 - 0:35
    points, square them,
    and just take that sum.
  • 0:35 - 0:38
    We're not going to divide by
    the degrees of freedom, which
  • 0:38 - 0:40
    you would normally do
    if you were calculating
  • 0:40 - 0:41
    sample variance.
  • 0:41 - 0:42
    Now, what is this going to be?
  • 0:42 - 0:44
    Well, the first
    thing we need to do,
  • 0:44 - 0:47
    we have to figure out the mean
    of all of this stuff over here.
  • 0:47 - 0:50
    And I'm actually going to
    call that the grand mean.
  • 0:50 - 0:52
    And I'm going to
    show you in a second
  • 0:52 - 0:55
    that it's the same thing as
    the mean of the means of each
  • 0:55 - 0:57
    of these data sets.
  • 0:57 - 0:59
    So let's calculate
    the grand mean.
  • 0:59 - 1:10
    So it's going to be 3 plus 2
    plus 1 plus 5 plus 3 plus 4
  • 1:10 - 1:12
    plus 5 plus 6 plus 7.
  • 1:16 - 1:20
    And then we have
    nine data points here
  • 1:20 - 1:22
    so we'll divide by 9.
  • 1:22 - 1:23
    And what is this
    going to be equal to?
  • 1:23 - 1:26
    3 plus 2 plus 1 is 6.
  • 1:26 - 1:28
    6 plus-- let me just add.
  • 1:28 - 1:30
    So these are 6.
  • 1:30 - 1:36
    5 plus 3 plus 4 is 12.
  • 1:36 - 1:41
    And then 5 plus 6 plus 7 is 18.
  • 1:41 - 1:45
    And then 6 plus 12 is 18 plus
    another 18 is 36, divided by 9
  • 1:45 - 1:46
    is equal to 4.
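
The grand mean calculation above can be checked with a short sketch. The variable names (`groups`, `all_points`, `grand_mean`) are illustrative assumptions, not something from the video; only the nine data values come from the transcript.

```python
# The three groups of three data points read out in the video.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

# Pool every data point into one list, ignoring group membership.
all_points = [x for group in groups for x in group]

# Grand mean: the sum of all nine points divided by 9.
grand_mean = sum(all_points) / len(all_points)
print(grand_mean)  # 4.0
```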
  • 1:46 - 1:48
    And let me show you that
    that's the exact same thing
  • 1:48 - 1:50
    as the mean of the means.
  • 1:50 - 1:53
    So the mean of this
    group 1 over here--
  • 1:53 - 1:55
    let me do it in
    that same green--
  • 1:55 - 1:58
    the mean of group 1 over
    here is 3 plus 2 plus 1.
  • 1:58 - 2:01
    That's that 6 right over
    here, divided by 3 data
  • 2:01 - 2:04
    points so that
    will be equal to 2.
  • 2:04 - 2:09
    The mean of group 2,
    the sum here is 12.
  • 2:09 - 2:10
    We saw that right over here.
  • 2:10 - 2:13
    5 plus 3 plus 4 is
    12, divided by 3
  • 2:13 - 2:16
    is 4 because we have
    three data points.
  • 2:16 - 2:21
    And then the mean
    of group 3, 5 plus 6
  • 2:21 - 2:25
    plus 7 is 18 divided by 3 is 6.
  • 2:25 - 2:27
    So if you were to take the
    mean of the means, which
  • 2:27 - 2:30
    is another way of viewing this
    grand mean, you have 2 plus 4
  • 2:30 - 2:34
    plus 6, which is 12,
    divided by 3 means here.
  • 2:34 - 2:36
    And once again, you would get 4.
  • 2:36 - 2:37
    So you could view
    this as the mean
  • 2:37 - 2:39
    of all of the data
    in all of the groups
  • 2:39 - 2:42
    or the mean of the means
    of each of these groups.
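
A minimal sketch of the "mean of the means" check, with illustrative variable names. The two values agree here because every group has the same number of data points; the small `unbalanced` example is a made-up data set, added only to show that with unequal group sizes the two can differ.

```python
# The three groups from the video, each with three data points.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

# Mean of each group, then the mean of those group means.
group_means = [sum(g) / len(g) for g in groups]
mean_of_means = sum(group_means) / len(group_means)

# Grand mean over all nine pooled data points.
all_points = [x for g in groups for x in g]
grand_mean = sum(all_points) / len(all_points)

print(group_means)                # [2.0, 4.0, 6.0]
print(mean_of_means, grand_mean)  # 4.0 4.0

# Hypothetical unbalanced data (not from the video): group sizes 2 and 1.
unbalanced = [[1, 2], [6]]
u_means = [sum(g) / len(g) for g in unbalanced]
u_mean_of_means = sum(u_means) / len(u_means)           # (1.5 + 6.0) / 2
u_grand_mean = sum(x for g in unbalanced for x in g) / 3
print(u_mean_of_means, u_grand_mean)  # 3.75 3.0
```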
  • 2:42 - 2:43
    But either way, now that
    we've calculated it,
  • 2:43 - 2:47
    we can actually figure out
    the total sum of squares.
  • 2:47 - 2:49
    So let's do that.
  • 2:49 - 2:54
    So it's going to be
    equal to 3 minus 4--
  • 2:54 - 3:00
    the 4 is this 4 right over
    here-- squared plus 2 minus 4
  • 3:00 - 3:03
    squared plus 1 minus 4 squared.
  • 3:03 - 3:05
    Now, I'll do these guys
    over here in purple.
  • 3:05 - 3:15
    Plus 5 minus 4 squared plus 3
    minus 4 squared plus 4 minus 4
  • 3:15 - 3:16
    squared.
  • 3:16 - 3:19
    Let me scroll over a little bit.
  • 3:19 - 3:25
    Now, we only have three
    left, plus 5 minus 4 squared
  • 3:25 - 3:31
    plus 6 minus 4 squared
    plus 7 minus 4 squared.
  • 3:31 - 3:33
    And what does this give us?
  • 3:33 - 3:37
    So up here, this is going
    to be equal to 3 minus 4.
  • 3:37 - 3:37
    Difference is 1.
  • 3:37 - 3:39
    You square it.
  • 3:39 - 3:42
    It's actually negative 1,
    but you square it, you get 1,
  • 3:42 - 3:48
    plus you get negative 2 squared
    is 4, plus negative 3 squared.
  • 3:48 - 3:51
    Negative 3 squared is 9.
  • 3:51 - 3:54
    And then we have here
    in the magenta 5 minus 4
  • 3:54 - 3:56
    is 1 squared is still 1.
  • 3:56 - 3:57
    3 minus 4 is negative 1.
  • 3:57 - 3:59
    You square it again,
    you still get 1.
  • 3:59 - 4:01
    And then 4 minus 4 is just 0.
  • 4:01 - 4:03
    So we could-- well, I'll
    just write the 0 there just
  • 4:03 - 4:05
    to show you that we
    actually calculated that.
  • 4:05 - 4:07
    And then we have these
    last three data points.
  • 4:07 - 4:09
    5 minus 4 squared.
  • 4:09 - 4:10
    That's 1.
  • 4:10 - 4:12
    6 minus 4 squared.
  • 4:12 - 4:13
    That is 4, right?
  • 4:13 - 4:15
    That's 2 squared.
  • 4:15 - 4:19
    And then plus 7 minus
    4 is 3 squared is 9.
  • 4:19 - 4:22
    So what's this going
    to be equal to?
  • 4:22 - 4:28
    So I have 1 plus 4
    plus 9 right over here.
  • 4:28 - 4:29
    That's 5 plus 9.
  • 4:29 - 4:33
    This right over
    here is 14, right?
  • 4:33 - 4:35
    5 plus-- yup, 14.
  • 4:35 - 4:37
    And then we also have
    another 14 right over here
  • 4:37 - 4:39
    because we have a
    1 plus 4 plus 9.
  • 4:39 - 4:42
    So that right over
    there is also 14.
  • 4:42 - 4:43
    And then we have 2 over here.
  • 4:43 - 4:47
    So it's going to be
    28-- 14 times 2, 14
  • 4:47 - 4:51
    plus 14 is 28-- plus 2 is 30.
  • 4:51 - 4:53
    Is equal to 30.
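
The total sum of squares worked out above can be reproduced with a few lines. This is a sketch with illustrative names (`squared_deviations`, `sst`); it simply squares each point's distance from the grand mean of 4 and adds the results.

```python
# The nine data points from the video, grouped as on screen.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_points = [x for g in groups for x in g]

grand_mean = sum(all_points) / len(all_points)  # 4.0

# Squared distance of every point from the grand mean.
squared_deviations = [(x - grand_mean) ** 2 for x in all_points]
print(squared_deviations)
# [1.0, 4.0, 9.0, 1.0, 1.0, 0.0, 1.0, 4.0, 9.0]

# SST: the sum of those squared distances (14 + 2 + 14).
sst = sum(squared_deviations)
print(sst)  # 30.0
```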
  • 4:53 - 4:56
    So our total sum of
    squares-- and actually,
  • 4:56 - 4:57
    if we wanted the
    variance here, we
  • 4:57 - 5:00
    would divide this by
    the degrees of freedom.
  • 5:00 - 5:02
    And we've learned multiple
    times the degrees of freedom
  • 5:02 - 5:07
    here so let's say
    that we have-- so we
  • 5:07 - 5:09
    know that we have
    m groups over here.
  • 5:09 - 5:11
    So let me just write
    it as m and I'm not
  • 5:11 - 5:13
    going to prove things
    rigorously here,
  • 5:13 - 5:15
    but I want to show
    you where some
  • 5:15 - 5:18
    of these strange formulas that
    show up in statistics books
  • 5:18 - 5:21
    actually come from without
    proving it rigorously.
  • 5:21 - 5:23
    More to give you the intuition.
  • 5:23 - 5:25
    So we have m groups here.
  • 5:25 - 5:32
    And each group
    here has n members.
  • 5:32 - 5:34
    So how many total
    members do we have here?
  • 5:34 - 5:37
    Well, we had m
    times n or 9, right?
  • 5:37 - 5:38
    3 times 3 total members.
  • 5:38 - 5:42
    So our degrees of
    freedom-- and remember,
  • 5:42 - 5:44
    you have however
    many data points
  • 5:44 - 5:46
    you had minus 1
    degrees of freedom
  • 5:46 - 5:51
    because if you know
    the mean of means,
  • 5:51 - 5:58
    if you assume you knew
    that, then only 9 minus 1,
  • 5:58 - 6:00
    only eight of these are going
    to give you new information
  • 6:00 - 6:03
    because if you know that, you
    could calculate the last one.
  • 6:03 - 6:05
    Or it really doesn't
    have to be the last one.
  • 6:05 - 6:08
    If you have the other eight,
    you could calculate this one.
  • 6:08 - 6:09
    If you have eight of
    them, you could always
  • 6:09 - 6:14
    calculate the ninth one
    using the mean of means.
  • 6:14 - 6:16
    So one way to think
    about it is that there's
  • 6:16 - 6:18
    only eight independent
    measurements here.
  • 6:18 - 6:22
    Or if we want to
    talk generally, there
  • 6:22 - 6:28
    are m times n-- so that tells
    us the total number of samples--
  • 6:28 - 6:30
    minus 1 degrees of freedom.
  • 6:34 - 6:38
    And if we were actually
    calculating the variance here,
  • 6:38 - 6:42
    we would just divide
    30 by m times n minus 1
  • 6:42 - 6:45
    or this is another way of
    saying eight degrees of freedom
  • 6:45 - 6:46
    for this exact example.
  • 6:46 - 6:48
    We would take 30 divided
    by 8 and we would actually
  • 6:48 - 6:50
    have the variance for
    this entire group,
  • 6:50 - 6:53
    for the group of nine
    when you combine them.
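
As a rough check of the degrees-of-freedom step, the sketch below divides SST by m times n minus 1 and compares the result with the sample variance from Python's standard `statistics` module. The names `m`, `n`, and `sst` follow the video; the other names are illustrative.

```python
import statistics

groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_points = [x for g in groups for x in g]

m = len(groups)     # number of groups: 3
n = len(groups[0])  # members per group: 3 (all groups are the same size)

grand_mean = sum(all_points) / len(all_points)
sst = sum((x - grand_mean) ** 2 for x in all_points)  # 30.0

degrees_of_freedom = m * n - 1  # 9 - 1 = 8
variance = sst / degrees_of_freedom
print(degrees_of_freedom, variance)  # 8 3.75

# statistics.variance computes the sample variance (divides by N - 1),
# so it should agree with SST divided by the degrees of freedom.
print(statistics.variance(all_points))
```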
  • 6:53 - 6:54
    I'll leave you
    here in this video.
  • 6:54 - 6:57
    In the next video, we're
    going to try to figure out
  • 6:57 - 7:03
    how much of this total
    variance, how much of this total
  • 7:03 - 7:06
    squared sum, total
    variation comes
  • 7:06 - 7:10
    from the variation within
    each of these groups
  • 7:10 - 7:14
    versus the variation
    between the groups.
  • 7:14 - 7:15
    And I think you get
    a sense of where
  • 7:15 - 7:17
    this whole analysis of
    variance is coming from.
  • 7:17 - 7:19
    It's the sense
    that, look, there's
  • 7:19 - 7:21
    a variance of this
    entire sample of nine,
  • 7:21 - 7:23
    but some of that variance--
    if these groups are
  • 7:23 - 7:27
    different in some way--
    might come from the variation
  • 7:27 - 7:31
    from being in different groups
    versus the variation from being
  • 7:31 - 7:31
    within a group.
  • 7:31 - 7:33
    And we're going to
    calculate those two things
  • 7:33 - 7:35
    and we're going to
    see that they're
  • 7:35 - 7:38
    going to add up to the
    total squared sum variation.
Video Language:
English
Team:
Khan Academy
Duration:
07:39
