
Mean and standard deviation versus median and IQR

  • 0:01 - 0:03
    - [Narrator] So we have
    nine students who recently
  • 0:03 - 0:08
    graduated from a small school
    that has a class size of nine,
  • 0:08 - 0:11
    and they wanna figure out
    the central tendency of their
  • 0:11 - 0:14
    salaries one year after graduation.
  • 0:14 - 0:17
    And they also wanna have a
    sense of the spread around
  • 0:17 - 0:20
    that central tendency one
    year after graduation.
  • 0:20 - 0:24
    So they all agree to enter
    their salaries into a computer,
  • 0:24 - 0:26
    and so these are their salaries.
  • 0:26 - 0:27
    They're measured in thousands.
  • 0:27 - 0:31
    So one makes 35,000, three make 50,000,
    one makes 56,000,
  • 0:31 - 0:35
    two make 60,000, one makes
    75,000, and one makes 250,000.
  • 0:35 - 0:37
    So she's doing very well for herself,
  • 0:37 - 0:41
    and the computer spits
    out a bunch of parameters
  • 0:41 - 0:43
    based on this data here.
  • 0:43 - 0:47
    So it spits out two typical
    measures of central tendency.
  • 0:47 - 0:50
    The mean is roughly 76.2.
  • 0:50 - 0:53
    The computer would calculate
    it by adding up all of these
  • 0:53 - 0:56
    numbers, these nine numbers,
    and then dividing by nine,
  • 0:56 - 1:00
    and the median is 56, and the median
    is quite easy to calculate.
  • 1:00 - 1:02
    You just order the numbers and you take
  • 1:02 - 1:05
    the middle number, which here is 56.
  • 1:05 - 1:08
    Now what I want you to
    do is pause this video
  • 1:08 - 1:10
    and think about, for this data set,
  • 1:10 - 1:14
    for this population of
    salaries, which measure
  • 1:14 - 1:19
    of central tendency
    is the better one.
  • 1:19 - 1:21
    All right, so let's think
    about this a little bit.
  • 1:21 - 1:24
    I'm gonna plot it on a line here.
  • 1:24 - 1:26
    I'm gonna plot my data
    so we get a better sense
  • 1:26 - 1:28
    and don't just see things
  • 1:28 - 1:31
    as numbers, but we see
    where those numbers sit
  • 1:31 - 1:33
    relative to each other.
  • 1:33 - 1:35
    So let's say this is zero.
  • 1:35 - 1:39
    Let's say this is, let's see,
    one, two, three, four, five.
  • 1:42 - 1:46
    So this would be 250, this
    is 50, 100, 150, 200,
  • 1:52 - 1:53
    and let's see.
  • 1:53 - 1:56
    Let's say if this is 50,
    then this would be roughly
  • 1:56 - 1:59
    40 right here, and I just want a rough scale.
  • 1:59 - 2:04
    So this would be about 60,
    70, 80, 90, close enough.
  • 2:04 - 2:06
    I could draw this
    a little bit neater,
  • 2:06 - 2:07
    but 60, 70, 80, 90.
  • 2:09 - 2:12
    Actually, let me just clean
    this up a little bit more too.
  • 2:12 - 2:14
    This one right over here would be
  • 2:14 - 2:17
    a little bit closer to this one.
  • 2:18 - 2:22
    Let me just put it right around here.
  • 2:22 - 2:26
    So that's 40, and then
    this would be 30, 20, 10.
  • 2:27 - 2:29
    Okay, that's pretty good.
  • 2:29 - 2:30
    So let's plot this data.
  • 2:30 - 2:34
    So, one student makes 35,000,
    so that is right over there.
  • 2:36 - 2:38
    Three make 50,000,
  • 2:38 - 2:40
    so one, two, and three.
  • 2:42 - 2:44
    I'll put it like that.
  • 2:44 - 2:48
    One makes 56,000 which would
    put them right over here.
  • 2:50 - 2:53
    Two make 60,000,
  • 2:53 - 2:55
    so it's like that.
  • 2:55 - 2:58
    One makes 75,000, so
    that's 60, 70, 75,000.
  • 3:00 - 3:02
    So it's gonna be right around there,
  • 3:02 - 3:04
    and then one makes 250,000.
  • 3:04 - 3:08
    So one's salary is all
    the way around there,
  • 3:08 - 3:11
    and then if we use
    the mean, 76.2,
  • 3:11 - 3:13
    as our measure of central tendency,
  • 3:13 - 3:15
    76.2 is right over there.
  • 3:17 - 3:21
    So is this a good measure
    of central tendency?
  • 3:21 - 3:23
    Well, to me it doesn't feel that good,
  • 3:23 - 3:26
    because our measure of central
    tendency is higher than all
  • 3:26 - 3:30
    of the data points except for
    one, and the reason is that
  • 3:30 - 3:34
    our data is skewed
  • 3:34 - 3:37
    significantly by this one
    data point at $250,000.
  • 3:39 - 3:41
    It is so far from the
    rest of the distribution,
  • 3:41 - 3:45
    from the rest of the data,
    that it has skewed the mean,
  • 3:45 - 3:47
    and this is something
    that you see in general.
  • 3:47 - 3:50
    If you have data that is skewed,
    especially things like
  • 3:50 - 3:53
    salary data, where most
    people are making
  • 3:53 - 3:56
    50, 60, $70,000, but someone
    might make two million dollars,
  • 3:56 - 4:00
    that will skew the average,
    or skew the mean, I should
  • 4:00 - 4:02
    say, when you add them all
    up and divide by the number
  • 4:02 - 4:03
    of data points you have.
  • 4:03 - 4:06
    In this case, especially when
    you have data points that
  • 4:06 - 4:10
    would skew the mean,
    the median is much more robust.
  • 4:10 - 4:14
    The median at 56 sits right
    over here, which seems to be
  • 4:14 - 4:17
    much more indicative of the central tendency.
  • 4:17 - 4:19
    And think about it.
  • 4:19 - 4:22
    Even if, instead of 250,000,
  • 4:22 - 4:26
    you made this 250,000
    thousand, which would be 250
  • 4:26 - 4:29
    million dollars, which is
    a ginormous amount of money
  • 4:29 - 4:33
    to make, it would
    skew the mean enormously,
  • 4:33 - 4:36
    but it actually would not
    even change the median,
  • 4:36 - 4:37
    because for the median, it doesn't matter
  • 4:37 - 4:39
    how high this number gets.
  • 4:39 - 4:40
    This could be a trillion dollars.
  • 4:40 - 4:42
    This could be a quadrillion dollars.
  • 4:42 - 4:44
    The median is going to stay the same.
  • 4:44 - 4:46
    So the median is much more robust
  • 4:46 - 4:48
    if you have a skewed data set.
  • 4:48 - 4:52
    The mean makes a little bit more
    sense if you have a symmetric
  • 4:52 - 4:55
    data set, where the data
    is spread roughly evenly
  • 4:55 - 4:57
    above and below the mean,
  • 4:57 - 5:00
    or where things aren't skewed
    heavily in one direction,
  • 5:00 - 5:01
    especially by a handful of data
  • 5:01 - 5:04
    points like we have right over here.
  • 5:04 - 5:07
    So in this example, the median is a much
  • 5:07 - 5:10
    better measure of central tendency.
  • 5:10 - 5:11
    And so what about spread?
  • 5:11 - 5:14
    Well, you might say, well,
    Sal, you already told us
  • 5:14 - 5:16
    that the mean is not so good
  • 5:16 - 5:18
    and the standard deviation
    is based on the mean.
  • 5:18 - 5:22
    You take each of these data
    points, find their distance
  • 5:22 - 5:25
    from the mean, square that
    number, add up those squared
  • 5:25 - 5:28
    distances, divide by the
    number of data points if we're
  • 5:28 - 5:31
    taking the population standard
    deviation, and then
  • 5:31 - 5:35
    you take the
    square root of the whole thing.
  • 5:35 - 5:38
    And so since this is based on
    the mean, which isn't a good
  • 5:38 - 5:41
    measure of central tendency
    in this situation, this one
  • 5:41 - 5:45
    salary is also going to inflate
    that standard deviation.
  • 5:45 - 5:48
    It's going to be a lot larger
  • 5:48 - 5:50
    than what you'd want
  • 5:50 - 5:53
    as an indication of the spread.
  • 5:53 - 5:57
    Yes, you have this one data
    point that's way far away
  • 5:57 - 6:00
    from either the mean or
    the median depending on how
  • 6:00 - 6:02
    you wanna think about it, but
    most of the data points seem
  • 6:02 - 6:05
    much closer, and so for that situation,
  • 6:05 - 6:07
    not only are we using the median,
  • 6:07 - 6:11
    but the interquartile range
    is once again more robust.
  • 6:11 - 6:13
    How do we calculate the
    interquartile range?
  • 6:13 - 6:15
    Well, you take the median
    and then you take the bottom
  • 6:15 - 6:19
    group of numbers and
    calculate the median of those.
  • 6:19 - 6:22
    So that's 50 right over here
    and then you take the top
  • 6:22 - 6:25
    group of numbers, the
    upper group of numbers,
  • 6:25 - 6:29
    and the median there is
    60 and 75, it's 67.5.
  • 6:29 - 6:31
    If this looks unfamiliar,
    we have many videos
  • 6:31 - 6:33
    on interquartile range and calculating
  • 6:33 - 6:35
    standard deviation and median and mean.
  • 6:35 - 6:36
    This is just a little bit of a review,
  • 6:36 - 6:39
    and then the difference
    between these two, 17.5, is the interquartile range,
  • 6:39 - 6:43
    and notice that this distance
    between these two, this 17.5,
  • 6:43 - 6:45
    isn't going to change,
  • 6:45 - 6:48
    even if this top salary is 250 billion dollars.
  • 6:48 - 6:52
    So once again, both of
    these measures are more robust
  • 6:52 - 6:55
    when you have a skewed data set.
  • 6:56 - 6:59
    So the big takeaway here is that
    the mean and standard deviation,
  • 6:59 - 7:02
    they're not bad if you have
    a roughly symmetric data set,
  • 7:02 - 7:05
    if you don't have any
    significant outliers,
  • 7:05 - 7:07
    things that really skew the data set,
  • 7:07 - 7:10
    mean and standard deviation
    can be quite solid.
  • 7:10 - 7:13
    But if you're looking at
    something that could get really
  • 7:13 - 7:16
    skewed by a handful of data
    points, the median and
  • 7:16 - 7:19
    interquartile range might be better:
    the median for central tendency,
  • 7:19 - 7:23
    and the interquartile range for spread
    around that central tendency,
  • 7:23 - 7:26
    and that's why, when
    people talk about salaries,
  • 7:26 - 7:28
    they'll often talk about the
    median, because you can have
  • 7:28 - 7:30
    some skewed salaries,
    especially on the upside.
  • 7:30 - 7:32
    When we talk about things
    like home prices, you'll see
  • 7:32 - 7:35
    the median reported
    more often than the mean,
  • 7:35 - 7:39
    because in a
    neighborhood,
  • 7:39 - 7:42
    or in a city, a lot of the
    houses might be in the 200,000 to
  • 7:42 - 7:46
    $300,000 range, but maybe
    there's one ginormous mansion
  • 7:46 - 7:49
    that is 100 million dollars,
    and if you calculated the mean,
  • 7:49 - 7:52
    that one house would skew it and give a
    false impression of the average
    or the central tendency
    of prices in that city.
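The mean-versus-median comparison in the video can be checked with a short Python sketch. It uses the nine salaries from the video (in thousands of dollars) and Python's standard `statistics` module; the `summarize` helper and the `extreme` list are illustrative names, not anything from the video. The second half follows the narrator's thought experiment of replacing the 250 with 250,000 (that is, $250 million).

```python
from statistics import mean, median

# Salaries one year after graduation, in thousands of dollars (data from the video).
salaries = [35, 50, 50, 50, 56, 60, 60, 75, 250]


def summarize(data):
    """Return the (mean, median) of a list of numbers."""
    return mean(data), median(data)


m, med = summarize(salaries)
print(f"mean = {m:.1f}, median = {med}")  # mean = 76.2, median = 56

# Hypothetical variation: replace the top salary (250 thousand) with
# 250,000 thousand, i.e. $250 million, as the narrator suggests.
extreme = salaries[:-1] + [250_000]
m2, med2 = summarize(extreme)
print(f"mean = {m2:.1f}, median = {med2}")  # mean = 27826.2, median = 56
```

The mean jumps from about 76.2 to about 27,826 (thousand), while the median stays at 56, because the extreme value is still the largest one and the middle of the sorted list does not move.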
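Written as formulas, the population standard deviation the narrator describes, along with the interquartile range, look like this for the nine salaries x_1, ..., x_9 (in thousands). The symbols mu, sigma, Q_1 and Q_3 are standard notation rather than anything shown in the video, and the intermediate sums (686 for the values, 87186 for their squares) are our own arithmetic; the second form of sigma uses the identity "mean of the squares minus the square of the mean".

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{686}{9} \approx 76.2 \\
\sigma &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
        = \sqrt{\frac{87186}{9} - \left(\frac{686}{9}\right)^2}
        \approx \sqrt{3877.5} \approx 62.3 \\
\mathrm{IQR} &= Q_3 - Q_1 = 67.5 - 50 = 17.5
\end{aligned}
```

So the standard deviation, about 62.3 thousand, is more than three times the IQR of 17.5 thousand, and most of the sum of squared distances comes from the single $250,000 salary.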
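The spread comparison can be sketched the same way. This assumes Python's `statistics.pstdev` for the population standard deviation (dividing by N, as in the video) and a hypothetical helper, `iqr_median_of_halves`, that follows the video's method: take the median of the values below the overall median and the median of the values above it. Other quartile conventions, such as those used by some spreadsheets and statistics libraries, can give different quartiles for small data sets, so treat this as one reasonable choice rather than the only definition.

```python
from statistics import median, pstdev

# Salaries one year after graduation, in thousands of dollars (data from the video).
salaries = [35, 50, 50, 50, 56, 60, 60, 75, 250]


def iqr_median_of_halves(data):
    """Return (Q1, Q3, IQR) using the median-of-halves method.

    With an odd number of values, the overall median is left out of
    both halves, matching the calculation in the video.
    """
    ordered = sorted(data)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]                     # values below the median
    upper = ordered[half + len(ordered) % 2:]  # values above the median
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1


# pstdev divides by N, i.e. the population standard deviation.
print(f"population SD = {pstdev(salaries):.1f}")  # population SD = 62.3
q1, q3, iqr = iqr_median_of_halves(salaries)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")  # Q1 = 50.0, Q3 = 67.5, IQR = 17.5

# Hypothetical variation: make the top salary $250 million (250,000 thousand).
extreme = salaries[:-1] + [250_000]
print(f"population SD = {pstdev(extreme):.1f}")
print(f"IQR = {iqr_median_of_halves(extreme)[2]}")  # IQR = 17.5
```

After swapping in the $250 million salary, the standard deviation balloons, while the IQR stays at 17.5 because the extreme value only sits at the top of the upper half and never becomes one of the two middle values used for Q3.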
Video Language:
English
Team:
Khan Academy
Duration:
07:59
