Impute Using Linear Regression - Intro to Data Science

Showing Revision 5 created 05/25/2016 by Udacity Robot.

Another method that we could use to impute missing values in a data set is to perform linear regression to estimate the missing values. We'll cover linear regression in more depth in the next lesson, but the general idea is that we would create an equation that predicts missing values in the data using the information we do have, and then use that equation to fill in our missing values.

Okay, so what are the drawbacks of using this linear regression type technique? Well, one negative side effect of imputing missing values in this way is that we would overemphasize existing trends in the data. For example, if there is a relationship between date of birth and height in MLB players, all of our imputed values will amplify this trend, because every imputed value lies exactly on the fitted line. Additionally, this model will produce exact values for the missing entries, which would suggest a greater certainty in the missing values than we actually have.

In any case, let's say we did want to fill in the missing values for weight in our baseball player data. We could train a linear model using our existing data, that is, the entries that have values for position, left- or right-handed batting, average, birthdate, deathdate, height, and weight. Then we would use the model we've created to fill in these missing values.
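The weight example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's own code: it uses a single predictor (height) instead of every column listed in the lecture, fits ordinary least squares with the standard library only, and runs on a handful of invented toy records rather than the real baseball data. The helper names `fit_simple_linear_regression`, `impute_weight`, and `pearson` are hypothetical. It also prints the height/weight correlation before and after imputation and the residuals of the imputed rows, to illustrate the two drawbacks described: filled-in values sit exactly on the fitted line, so they strengthen the existing trend and show no uncertainty.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy records: height in inches, weight in pounds.
# None marks a missing weight. Values are invented, not real MLB data.
players = [
    {"height": 70, "weight": 180},
    {"height": 72, "weight": 190},
    {"height": 74, "weight": 205},
    {"height": 76, "weight": 215},
    {"height": 71, "weight": None},
    {"height": 75, "weight": None},
]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def impute_weight(records):
    """Fit on rows with a known weight, then predict weight where it is missing."""
    complete = [r for r in records if r["weight"] is not None]
    intercept, slope = fit_simple_linear_regression(
        [r["height"] for r in complete], [r["weight"] for r in complete]
    )
    filled = []
    for r in records:
        missing = r["weight"] is None
        weight = intercept + slope * r["height"] if missing else r["weight"]
        filled.append({"height": r["height"], "weight": weight, "imputed": missing})
    return filled, intercept, slope


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


filled, intercept, slope = impute_weight(players)
for r in filled:
    label = "imputed" if r["imputed"] else "observed"
    print(f'{r["height"]} in -> {r["weight"]:.1f} lb ({label})')

# Drawback 1: imputed points sit on the fitted line, so the trend gets stronger.
observed = [r for r in filled if not r["imputed"]]
r_observed = pearson([r["height"] for r in observed], [r["weight"] for r in observed])
r_completed = pearson([r["height"] for r in filled], [r["weight"] for r in filled])
print(f"height/weight correlation, observed rows: {r_observed:.4f}")
print(f"height/weight correlation, after imputing: {r_completed:.4f}")

# Drawback 2: imputed values have zero residual, i.e. no visible uncertainty.
imputed_residuals = [
    r["weight"] - (intercept + slope * r["height"]) for r in filled if r["imputed"]
]
print("residuals of imputed rows:", imputed_residuals)
```

On these toy numbers the correlation computed on the completed data comes out slightly higher than on the observed rows alone, and both imputed rows have a residual of exactly zero. Using all of the predictors mentioned in the lecture would mean fitting a multiple regression (for example with a numerical library) rather than the one-variable closed form used here.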