English subtitles

Easy Imputation - Intro to Data Science

Showing Revision 5 created 05/24/2016 by Udacity Robot.

  1. Let's first discuss what would seem to be the
  2. easiest way to impute a missing value in our data
  3. set. Just take the mean of our other data
  4. points and fill in the missing values. So, for example,
  5. let's say that Ichiro Suzuki and Babe Ruth are
  6. missing values for weight in our baseball data set. Well,
  7. okay, no problem. We can just take the mean
  8. of all other players' weights and assign that value to
  9. Ichiro and Babe Ruth. In this case, we
  10. would assign Ichiro and Babe Ruth both a weight
  11. of 191.67. Wow, that seems really easy, right?
  12. There's gotta be a catch. Well, let's first discuss
  13. what's good about this method. We don't change
  14. the mean of the weight across our sample. That's
  15. good. But let's say we were hoping to
  16. study the relationship between weight and birth year. Or
  17. height and weight. Just plugging the mean weight into a bunch of our
  18. data points lessens the correlation between
  19. our imputed variable and any other variable.
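
To make the trade-off described above concrete, here is a minimal Python sketch of mean imputation. Everything in it is an assumption for illustration: the height and weight rows are made up (they are not the course's baseball data set, though the observed weights were picked so that their mean rounds to 191.67), and `pearson` is a small hypothetical helper rather than anything shown in the lecture.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up (height_in, weight_lb) rows; None marks a missing weight.
players = [
    (69, 170), (71, 180), (72, 185), (74, 195), (75, 205), (77, 215),
    (67, None), (78, None),
]

# Mean of the weights we actually observed.
observed = [(h, w) for h, w in players if w is not None]
mean_weight = mean(w for _, w in observed)

# Mean imputation: every missing weight gets the observed mean.
imputed = [(h, w if w is not None else mean_weight) for h, w in players]

# The mean of the weight column is the same before and after imputing.
mean_after = mean(w for _, w in imputed)

# The height/weight correlation, before and after imputing.
r_before = pearson([h for h, _ in observed], [w for _, w in observed])
r_after = pearson([h for h, _ in imputed], [w for _, w in imputed])

print("mean weight before/after:", round(mean_weight, 2), round(mean_after, 2))
print("correlation before/after:", round(r_before, 3), round(r_after, 3))
```

On this toy data the mean weight is unchanged by the imputation, while the height/weight correlation comes out lower afterward. The imputed rows sit exactly at the mean weight whatever their height, so they add spread in height without adding any covariance with weight.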