
Outliers

  • 0:00 - 0:04
    ♪ [music] ♪
  • 0:18 - 0:19
    - [Thomas] What is an outlier?
  • 0:19 - 0:22
    You probably already have
    a good intuition of it.
  • 0:23 - 0:27
    It's that 6'10" kid in high school
    who towers over his classmates.
  • 0:27 - 0:31
    It's that little guy
    outeating people twice his size.
  • 0:31 - 0:34
    Outliers are those
    that don't fit the pattern
  • 0:34 - 0:35
    we're used to seeing.
  • 0:36 - 0:38
    Let's go back to our data set.
  • 0:39 - 0:41
    If you remember,
    we're examining the relationship
  • 0:41 - 0:45
    between professors' beauty scores
    and their student evaluations.
  • 0:45 - 0:48
    We generally see
    that better-looking teachers
  • 0:48 - 0:49
    get better evaluations.
  • 0:50 - 0:52
    Is there an outlier here?
  • 0:52 - 0:53
    This point.
  • 0:53 - 0:56
    It shows a professor
    with an extremely high beauty score
  • 0:56 - 0:58
    but a very low evaluation score.
  • 0:58 - 1:01
    So, I guess in this case,
    looks aren't everything.
  • 1:02 - 1:04
    A single outlier
    can have a major effect.
  • 1:04 - 1:08
    We can see this in our data
    by removing or adding points
  • 1:08 - 1:10
    and seeing the effect
    on the regression line.
  • 1:11 - 1:12
    Let me show you.
  • 1:12 - 1:13
    If you right-click
    or Control-click,
  • 1:13 - 1:17
    you can remove a point
    from our regression analysis.
  • 1:17 - 1:20
    Let's remove this outlier,
    and see what happens.
  • 1:21 - 1:24
    The dotted line
    is our old regression line,
  • 1:24 - 1:27
    and the solid one is our new one.
  • 1:27 - 1:28
    You can see it's gotten steeper.
  • 1:28 - 1:31
    That makes sense,
    as we're removing an outlier
  • 1:31 - 1:33
    that is pulling the line down.
  • 1:33 - 1:35
    You can also see down below
  • 1:35 - 1:39
    that our slope value has changed
    from 0.2 to 0.3.
  • 1:39 - 1:42
    So removing a single outlier
    resulted in us finding
  • 1:42 - 1:44
    a much stronger relationship
  • 1:44 - 1:47
    between beauty
    and evaluation scores.
  • 1:47 - 1:51
    If you remember, in our last video
    we did an exercise
  • 1:51 - 1:54
    where we predicted the expected
    change in evaluation score
  • 1:54 - 1:58
    when going from a 2 to a 7
    in beauty score.
  • 1:58 - 2:00
    Our old slope was 0.2,
  • 2:00 - 2:04
    which meant a 1-point
    predicted increase in evaluation.
  • 2:04 - 2:06
    With this new line,
  • 2:06 - 2:09
    we'd predict an increase
    of 1.5 points in evaluation.
  • 2:09 - 2:11
    That's a 50% stronger effect,
  • 2:11 - 2:13
    and that's the power of an outlier.
  • 2:13 - 2:17
    That one really good-looking
    but really bad teacher
  • 2:17 - 2:19
    is changing our model a lot.
  • 2:19 - 2:22
    The point is, regressions
    can be sensitive to outliers.
  • 2:23 - 2:24
    If you want
    to get your outlier back,
  • 2:24 - 2:27
    you can just click the undo button.
  • 2:27 - 2:30
    Additionally, we can add
    data points to our regression
  • 2:30 - 2:32
    by just clicking
    anywhere in the canvas.
  • 2:32 - 2:35
    You can see, again,
    the line moves each time,
  • 2:35 - 2:39
    showing my new regression line
    with a solid line
  • 2:39 - 2:41
    and the previous line
    with a dotted line.
  • 2:42 - 2:43
    Now it's your turn.
  • 2:43 - 2:45
    I'd like you to try two things.
  • 2:46 - 2:49
    First, remove a point
    so that the relationship weakens.
  • 2:50 - 2:53
    The line should get flatter,
    so the slope should go down.
  • 2:54 - 2:59
    And, second, add some data points
    to make the relationship stronger.
  • 2:59 - 3:02
    So pause the video now,
    and try it out.
  • 3:02 - 3:03
    - [Narrator] Find the answer
  • 3:03 - 3:05
    by clicking to see
    the data in this video.
  • 3:05 - 3:08
    The link is also
    in the video description below.
  • 3:08 - 3:10
    If you just want
    to see the answer already,
  • 3:10 - 3:11
    hang tight.
  • 3:12 - 3:16
    Hopefully you've had some fun
    adding and removing data points.
  • 3:16 - 3:18
    It's good to play with data
    to build your intuition.
  • 3:18 - 3:20
    Let's cover the first question --
  • 3:21 - 3:24
    how to remove a point
    to make the relationship weaker.
  • 3:24 - 3:27
    For example, remove this point.
  • 3:30 - 3:32
    You would have seen a flatter line
  • 3:32 - 3:36
    and our slope go down
    from 0.2 to 0.14.
  • 3:37 - 3:40
    So how about making
    the relationship stronger?
  • 3:42 - 3:44
    Let's go back
    to our original data set,
  • 3:44 - 3:46
    with a slope of 0.2.
  • 3:46 - 3:49
    If I add a point,
    way up in the upper-right,
  • 3:50 - 3:52
    I can see my slope go up from 0.2.
  • 3:53 - 3:55
    I can also add one
    to the lower-left,
  • 3:55 - 3:57
    and see another increase.
  • 3:57 - 3:59
    The more points I add
    to these two regions,
  • 3:59 - 4:01
    the stronger the relationship gets.
  • 4:02 - 4:04
    So now that we've seen
    the power of outliers,
  • 4:04 - 4:08
    your next question might be --
    can I just remove the outlier?
  • 4:08 - 4:11
    It's important to remember
    we're typically using a regression
  • 4:11 - 4:14
    to make a prediction
    about the real world.
  • 4:14 - 4:15
    In the real world,
  • 4:15 - 4:19
    we see guys who are 130 pounds
    eat 60 hot dogs.
  • 4:20 - 4:23
    So if you want to predict
    hot dog eating based on weight,
  • 4:23 - 4:27
    removing an outlier without thought
    can make your predictions worse.
  • 4:28 - 4:29
    However, in certain cases,
  • 4:29 - 4:32
    it might make sense
    to remove an outlier.
  • 4:32 - 4:35
    What if you are trying to predict
    the number of hot dogs
  • 4:35 - 4:38
    an average person would eat
    at your upcoming barbecue?
  • 4:39 - 4:43
    Then, including Kobayashi would make
    your prediction much worse.
  • 4:43 - 4:45
    In this case,
    he's clearly an outlier,
  • 4:45 - 4:47
    and should be omitted
    from the model.
  • 4:51 - 4:53
    You might have also noticed
    something else changing
  • 4:53 - 4:57
    when we removed the outlier --
    these funny numbers.
  • 4:57 - 4:58
    What do those mean?
  • 4:59 - 5:01
    That's what we turn to next.
  • 5:04 - 5:06
    - [Narrator] Congratulations!
  • 5:06 - 5:08
    You're one step closer
    to being a data ninja!
  • 5:09 - 5:11
    If you want to master
    linear regression,
  • 5:11 - 5:12
    click to train up
  • 5:12 - 5:14
    on everything
    from p-values to residuals,
  • 5:14 - 5:16
    and then prepare to unleash fury
  • 5:16 - 5:19
    on whatever poor data set
    crosses your path.
  • 5:19 - 5:20
    Or, you can explore other skills
  • 5:20 - 5:22
    in our Understanding Data playlist.
  • 5:23 - 5:25
    ♪ [music] ♪
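
The experiment in the video can be roughly reproduced in a few lines of Python. This is only a sketch: the (beauty, evaluation) pairs below are made up for illustration and are not the data set used in the video, and the helper names `ols_slope` and `predicted_change` are hypothetical. It fits an ordinary least-squares slope, drops one high-beauty, low-evaluation point, adds points in the upper-right and lower-left, and repeats the 2-to-7 prediction arithmetic with the slopes quoted in the video (0.2 and 0.3).

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def predicted_change(slope, start=2, end=7):
    """Predicted change in evaluation when beauty moves from start to end."""
    return slope * (end - start)


# Hypothetical (beauty, evaluation) pairs with an upward trend.
base = [(1, 2.8), (2, 3.1), (3, 3.2), (4, 3.6), (5, 3.7), (6, 4.0), (7, 4.1)]
# Hypothetical outlier: very high beauty score, low evaluation score.
outlier = (8, 2.5)

slope_with = ols_slope(base + [outlier])
slope_without = ols_slope(base)  # steeper: the outlier was pulling the line down

# Adding points in the upper-right and lower-left strengthens the relationship.
slope_upper_right = ols_slope(base + [outlier, (9, 5.0)])
slope_both_corners = ols_slope(base + [outlier, (9, 5.0), (0, 2.0)])

# The arithmetic from the video, using the slopes it quotes.
old_change = predicted_change(0.2)  # 0.2 * 5
new_change = predicted_change(0.3)  # 0.3 * 5
relative_increase = (new_change - old_change) / old_change

print(f"slope with outlier:    {slope_with:.3f}")
print(f"slope without outlier: {slope_without:.3f}")
print(f"after upper-right add: {slope_upper_right:.3f}")
print(f"after lower-left add:  {slope_both_corners:.3f}")
print(f"predicted change, 2 to 7: {old_change:.1f} vs {new_change:.1f} points")
print(f"relative increase: {relative_increase:.0%}")
```

With this made-up data, a single removed point changes the slope by far more than it does in the video, which makes the same point: a least-squares line is sensitive to individual extreme observations.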
Title:
Outliers
Video Language:
English
Team:
Marginal Revolution University
Project:
Understanding Data
Duration:
05:28
Marilia_PM approved English subtitles for Outliers
Kirstin Cosper accepted English subtitles for Outliers
Kirstin Cosper edited English subtitles for Outliers
Retired user edited English subtitles for Outliers
