
Easy Imputation - Intro to Data Science

  • 0:01 - 0:03
    Let's first discuss what would seem to be the
  • 0:03 - 0:05
    easiest way to impute a missing value in our data
  • 0:05 - 0:08
    set. Just take the mean of our other data
  • 0:08 - 0:13
    points and fill in the missing values. So, for example,
  • 0:13 - 0:16
    let's say that Ichiro Suzuki and Babe Ruth are
  • 0:16 - 0:20
    missing values for weight in our baseball data set. Well,
  • 0:20 - 0:23
    okay, no problem. We can just take the mean
  • 0:23 - 0:26
    of all other players' weights and assign that value to
  • 0:26 - 0:28
    Ichiro and Babe Ruth. In this case, we
  • 0:28 - 0:31
    would assign Ichiro and Babe Ruth both a weight
  • 0:31 - 0:35
    of 191.67. Wow, that seems really easy, right?
  • 0:35 - 0:39
    There's gotta be a catch. Well, let's first discuss
  • 0:39 - 0:41
    what's good about this method. We don't change
  • 0:41 - 0:44
    the mean of the weight across our sample. That's
  • 0:44 - 0:46
    good. But let's say we were hoping to
  • 0:46 - 0:51
    study the relationship between weight and birth year. Or
  • 0:51 - 0:53
    height and weight. Just plugging the mean weight into a bunch of our
  • 0:53 - 0:56
    data points lessens the correlation between
  • 0:56 - 0:59
    our imputed variable and any other variable.
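
The trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights and weights are made-up numbers, not the course's baseball data set, and `pearson` is a small helper written for this example rather than a library function.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical heights (inches) and weights (pounds).
# None marks a missing weight, as for the two players in the example.
heights = [66, 68, 70, 72, 74, 76, 78]
weights = [None, 160, 175, 185, 200, 210, None]

# Mean of the weights we actually observed.
observed = [w for w in weights if w is not None]
mean_before = mean(observed)

# Mean imputation: every missing weight gets the observed mean.
imputed = [mean_before if w is None else w for w in weights]
mean_after = mean(imputed)

# Correlation using only the players with a recorded weight...
complete = [(h, w) for h, w in zip(heights, weights) if w is not None]
r_observed = pearson([h for h, _ in complete], [w for _, w in complete])

# ...versus correlation after filling the gaps with the mean.
r_imputed = pearson(heights, imputed)

print("mean before:", mean_before, "mean after:", mean_after)
print("correlation (complete cases):", round(r_observed, 3))
print("correlation (after imputation):", round(r_imputed, 3))
```

With this toy data the mean weight is the same before and after imputation, while the height and weight correlation comes out lower once the imputed rows are included, which is the catch the video points to.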
Description:

02-41 Easy Imputation

Video Language:
English
Team:
Udacity
Project:
ud359: Intro to Data Science
Duration:
01:00
