
Variance of a population

  • 0:01 - 0:05
    Let's say I'm trying to judge
    how many years of experience
  • 0:05 - 0:06
    we have at the Khan Academy.
  • 0:06 - 0:09
    Or on average, how many
    years of experience we have.
  • 0:09 - 0:11
    And in particular, the
    particular type of average
  • 0:11 - 0:13
    we'll focus on, is
    the arithmetic mean.
  • 0:13 - 0:16
    So I go and I survey
    the folks there.
  • 0:16 - 0:18
    And let's say this was when
    Khan Academy was a smaller
  • 0:18 - 0:19
    organization, when
    there were only
  • 0:19 - 0:21
    five people in the organization.
  • 0:21 - 0:24
    And I find-- and I'm surveying
    the entire population--
  • 0:24 - 0:27
    so years of experience, the
    entire population of Khan
  • 0:27 - 0:29
    Academy, because that's
    what I care about,
  • 0:29 - 0:33
    years of experience at our
    organization, at Khan Academy.
  • 0:33 - 0:35
    And this was when
    we had five people.
  • 0:35 - 0:37
    And if I were to go--
    we're now 36 people,
  • 0:37 - 0:40
    I don't want to date this video
    too much-- but let's say I go,
  • 0:40 - 0:42
    and I say, OK, there's one
    person straight out of college,
  • 0:42 - 0:43
    they have one year
    of experience,
  • 0:43 - 0:45
    or recently out of
    college, somebody
  • 0:45 - 0:47
    with three years of
    experience, someone
  • 0:47 - 0:49
    with five years of
    experience, someone
  • 0:49 - 0:53
    with seven years of experience,
    and someone very experienced,
  • 0:53 - 0:56
    or reasonably experienced,
    with 14 years of experience.
  • 0:56 - 0:59
    So based on these data points,
    and this is our population
  • 0:59 - 1:00
    for years of experience.
  • 1:00 - 1:02
    I'm assuming that we
    only have five people
  • 1:02 - 1:04
    in the organization,
    at this point.
  • 1:04 - 1:08
    What would be the
    population mean
  • 1:08 - 1:10
    for the years of experience?
  • 1:10 - 1:14
    What is the mean years of
    experience for my population?
  • 1:14 - 1:16
    Well, we can just
    calculate that.
  • 1:16 - 1:18
    Our mean experience,
    and I'm going
  • 1:18 - 1:20
    to denote it with
    mu, because we're
  • 1:20 - 1:22
    talking about the
    population now.
  • 1:22 - 1:25
    This is a parameter
    for the population.
  • 1:25 - 1:29
    It's going to be equal to the
    sum, from our first data point,
  • 1:29 - 1:34
    so data point one all the way
    to data point, in this case,
  • 1:34 - 1:40
    data point five-- we have
    five data points-- of each
  • 1:40 - 1:42
    of-- so we're going to take
    all, from the first data
  • 1:42 - 1:44
    point, the second data
    point, the third data
  • 1:44 - 1:45
    point, all the way to the fifth.
  • 1:45 - 1:47
    So this is going to be
    equal to x1, plus x--
  • 1:47 - 1:50
    and I'm going to divide it all
    by the number of data points
  • 1:50 - 1:58
    I have-- plus x2, plus x3, plus
    x4, plus x sub 5, subscript 5.
  • 1:58 - 2:00
    All of that over 5.
  • 2:00 - 2:02
    And as we said, this is a
    very fancy way of saying,
  • 2:02 - 2:05
    I'm going to sum up
    all of these things
  • 2:05 - 2:08
    and then divide by the
    number of things we have.
  • 2:08 - 2:09
    So let's do that.
  • 2:09 - 2:12
    Get the calculator out.
  • 2:12 - 2:19
    So I'm going to add them
    all up, 1 plus 3 plus 5--
  • 2:19 - 2:24
    I really don't need a calculator
    for this-- plus 7 plus 14.
  • 2:24 - 2:25
    So that's five data points.
  • 2:25 - 2:27
    And I'm going to divide by 5.
  • 2:27 - 2:29
    And I get 6.
  • 2:29 - 2:31
    So the population
    mean, for years
  • 2:31 - 2:35
    of experience at my
    organization, is 6.
  • 2:35 - 2:36
    6 years of experience.
  • 2:36 - 2:39
    Well, that's, I
    guess, interesting.
  • 2:39 - 2:41
    But now I want to
    ask another question.
  • 2:41 - 2:44
    I want to get some
    measure of how much spread
  • 2:44 - 2:46
    there is around that mean.
  • 2:46 - 2:49
    Or how much do the data
    points vary around that mean.
  • 2:49 - 2:51
    And obviously, I can give
    someone all the data points.
  • 2:51 - 2:52
    But instead, I actually
    want to come up
  • 2:52 - 2:54
    with a parameter that
    somehow represents
  • 2:54 - 2:57
    how much all of these things,
    on average, are varying
  • 2:57 - 3:01
    from this number right here.
  • 3:01 - 3:05
    And I will call
    that thing the variance.
  • 3:05 - 3:10
    And so, what I do-- so the
    variance-- and I will do--
  • 3:10 - 3:12
    and this is a
    population variance
  • 3:12 - 3:16
    that I'm talking about, just
    to be clear, it's a parameter.
  • 3:16 - 3:18
    The population
    variance I'm going
  • 3:18 - 3:22
    to denote with the Greek letter
    sigma, lowercase sigma-- this
  • 3:22 - 3:27
    is capital sigma--
    lowercase sigma squared.
  • 3:27 - 3:29
    And I'm going to
    say, well, I'm going
  • 3:29 - 3:32
    to take the distance from each
    of these points to the mean.
  • 3:32 - 3:35
    And just so I get a positive
    value, I'm going to square it.
  • 3:35 - 3:37
    And then, I'm going to divide
    by the number of data points
  • 3:37 - 3:38
    that I have.
  • 3:38 - 3:39
    So essentially,
    I'm going to find
  • 3:39 - 3:41
    the average squared distance.
  • 3:41 - 3:43
    Now that might sound
    very complicated,
  • 3:43 - 3:45
    but let's actually work it out.
  • 3:45 - 3:49
    So I'll take my first
    data point and I
  • 3:49 - 3:52
    will subtract our mean from it.
  • 3:52 - 3:54
    So this is going to give
    me a negative number.
  • 3:54 - 3:56
    But if I square it, it's
    going to be positive.
  • 3:56 - 3:57
    So it's, essentially,
    going to be
  • 3:57 - 4:00
    the squared distance
    between 1 and my mean.
  • 4:00 - 4:01
    And then, to that,
    I'm going to add
  • 4:01 - 4:04
    the squared distance
    between 3 and my mean.
  • 4:08 - 4:11
    And to that, I'm going to add
    the squared distance between 5
  • 4:11 - 4:14
    and my mean.
  • 4:14 - 4:15
    And since I'm
    squaring, it doesn't
  • 4:15 - 4:18
    matter if I do 5
    minus 6, or 6 minus 5.
  • 4:18 - 4:20
    When I square it, I'm going
    to get a positive result
  • 4:20 - 4:21
    regardless.
  • 4:21 - 4:23
    And then, to that
    I'm going to add
  • 4:23 - 4:26
    the squared distance
    between 7 and my mean.
  • 4:26 - 4:28
    So 7 minus 6 squared.
  • 4:28 - 4:30
    All of this, this
    is my population
  • 4:30 - 4:32
    mean that I'm finding
    the difference between.
  • 4:32 - 4:37
    And then, finally, the squared
    difference between 14 and my
  • 4:37 - 4:37
    mean.
  • 4:42 - 4:44
    And then, I'm going
    to find, essentially,
  • 4:44 - 4:46
    the mean of these
    squared distances.
  • 4:46 - 4:49
    So I have five squared
    distances right over here.
  • 4:49 - 4:56
    So let me divide by 5.
  • 4:56 - 5:00
    So what will I get when
    I make this calculation,
  • 5:00 - 5:02
    right over here?
  • 5:02 - 5:04
    Well, let's figure this out.
  • 5:04 - 5:10
    This is going to be equal
    to 1 minus 6 is negative 5,
  • 5:10 - 5:13
    negative 5 squared is 25.
  • 5:13 - 5:18
    3 minus 6 is negative 3, now
    if I square that, I get 9.
  • 5:18 - 5:22
    5 minus 6 is negative 1, if I
    square it, I get positive 1.
  • 5:22 - 5:26
    7 minus 6 is 1, if I square
    it, I get positive 1.
  • 5:26 - 5:31
    And 14 minus 6 is 8, if
    I square it, I get 64.
  • 5:31 - 5:36
    And then, I'm going to
    divide all of that by 5.
  • 5:36 - 5:38
    And I don't need to
    use a calculator,
  • 5:38 - 5:40
    but I tend to make a
    lot of careless mistakes
  • 5:40 - 5:43
    when I do things
    while making a video.
  • 5:43 - 5:54
    So I get 25 plus 9 plus 1
    plus 1 plus 64 divided by 5.
  • 5:54 - 5:56
    So I get 20.
  • 5:56 - 6:00
    So the average squared distance,
    or the mean squared distance,
  • 6:00 - 6:05
    from our population
    mean is equal to 20.
  • 6:05 - 6:07
    You may say, wait, these
    things aren't 20 away.
  • 6:07 - 6:10
    Remember, it's the
    squared distance
  • 6:10 - 6:12
    away from my population mean.
  • 6:12 - 6:14
    So I squared each
    of these things.
  • 6:14 - 6:16
    I liked it, because
    it made it positive.
  • 6:16 - 6:19
    And we'll see later it has
    other nice properties about it.
  • 6:19 - 6:21
    Now the last thing
    is, how can we
  • 6:21 - 6:23
    represent this mathematically?
  • 6:23 - 6:27
    We already saw that we know how
    to represent a population mean,
  • 6:27 - 6:29
    and a sample mean,
    mathematically like this,
  • 6:29 - 6:31
    and hopefully, we don't find
    it that daunting anymore.
  • 6:31 - 6:34
    But how would we do
    the exact same thing?
  • 6:34 - 6:38
    How would we denote what
    we did, right over here?
  • 6:38 - 6:40
    Well, let's just
    think it through.
  • 6:40 - 6:44
    We're just saying that
    the population variance,
  • 6:44 - 6:51
    we're taking the sum
    of each-- so we're
  • 6:51 - 6:53
    going to take each item, we'll
    start with the first item.
  • 6:53 - 6:57
    And we're going to go to the
    n-th item in our population.
  • 6:57 - 7:00
    We're talking about
    a population here.
  • 7:00 - 7:01
    And we're going to
    take-- we're not
  • 7:01 - 7:04
    going to just take the item,
    this would just be the item--
  • 7:04 - 7:05
    but we're going to take the item.
  • 7:05 - 7:09
    And from that, we're going to
    subtract the population mean.
  • 7:09 - 7:11
    We're going to
    subtract this thing.
  • 7:11 - 7:12
    We're going to
    subtract this thing.
  • 7:12 - 7:14
    We're going to square it.
  • 7:14 - 7:15
    We're going to square it.
  • 7:15 - 7:16
    So the way I've
    written it right now,
  • 7:16 - 7:18
    this would just
    be the numerator.
  • 7:18 - 7:21
    I've just taken the sum
    of each of these things,
  • 7:21 - 7:23
    the sum of the difference
    between each data
  • 7:23 - 7:25
    point and the population
    mean and squared it.
  • 7:25 - 7:27
    If I really want to get
    this variance, the way I
  • 7:27 - 7:29
    figured it out right
    over here, I have
  • 7:29 - 7:34
    to divide the whole thing by the
    number of data points we have.
  • 7:34 - 7:35
    So this might seem
    very daunting,
  • 7:35 - 7:36
    and very intimidating.
  • 7:36 - 7:39
    But all it says is, take each
    of your data points-- well, one,
  • 7:39 - 7:42
    it says, figure out
    your population mean.
  • 7:45 - 7:46
    Figure that out first.
  • 7:46 - 7:51
    And then, from each data
    point, in your population,
  • 7:51 - 7:55
    subtract out that
    population mean, square it,
  • 7:55 - 7:57
    take the sum of all
    of those things,
  • 7:57 - 8:00
    and then just divide by the
    number of data points you have.
  • 8:00 - 8:04
    And you will get your
    population variance.
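The two formulas described in this video can be written out in the video's notation. Here N stands for the number of data points (5 in this example) and x_i for the i-th data point. The numbers are the five years-of-experience values surveyed above:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1 + 3 + 5 + 7 + 14}{5} = \frac{30}{5} = 6

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
         = \frac{(1-6)^2 + (3-6)^2 + (5-6)^2 + (7-6)^2 + (14-6)^2}{5}
         = \frac{25 + 9 + 1 + 1 + 64}{5} = \frac{100}{5} = 20
```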
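The same procedure can be sketched in Python. This is a minimal, illustrative version, and the function names `population_mean` and `population_variance` are hypothetical, not from the video. It follows the steps in the order the video gives them:

1. Compute the population mean first.
2. Subtract the mean from each data point and square the result.
3. Sum the squares and divide by the number of data points.

The results are cross-checked against the Python standard library's `statistics.mean` and `statistics.pvariance` (population variance).

```python
from statistics import mean, pvariance

# Years of experience for the five people surveyed (the entire population).
years = [1, 3, 5, 7, 14]

def population_mean(data):
    # Sum every data point, then divide by the number of data points.
    return sum(data) / len(data)

def population_variance(data):
    # Step 1: figure out the population mean first.
    mu = population_mean(data)
    # Step 2: subtract the mean from each data point and square the difference,
    # which makes every term non-negative.
    squared_distances = [(x - mu) ** 2 for x in data]
    # Step 3: sum the squared distances and divide by the number of data points.
    return sum(squared_distances) / len(data)

mu = population_mean(years)
sigma_sq = population_variance(years)
print(mu)        # 6.0
print(sigma_sq)  # 20.0

# Cross-check against the standard library's population mean and variance.
assert mu == mean(years)
assert sigma_sq == pvariance(years)
```

Because each distance is squared, it does not matter whether you compute 5 minus 6 or 6 minus 5. Both give 1 once squared.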
Video Language:
English
Team:
Khan Academy
Duration:
08:05

English subtitles
