## Impute Using Linear Regression - Intro to Data Science


Showing Revision 5 created 05/25/2016 by Udacity Robot.

1. Another method that we could use to impute missing
2. values in a data set is to perform linear regression
3. to estimate the missing values. We'll cover linear regression
4. in more depth in the next lesson. But the general
5. idea is that we would create an equation that
6. predicts missing values in the data using information we do
7. have, and then use that equation to fill in our
8. missing values. Okay so, what are the drawbacks of using
9. this linear regression type technique? Well, one negative
10. side effect of imputing missing values in this
11. way is that we would overemphasize existing
12. trends in the data. For example, if
13. there is a relationship between date of birth
14. and height in MLB players, all of our
15. imputed values will amplify this trend. Additionally, this
16. model will produce exact values for the missing entries,
17. which would suggest a greater certainty in the missing values than
18. we actually have. In any case, let's say we did want
19. to fill in the missing values for weight in our baseball
20. player data. We could train a linear model using the existing
21. data that we have, and then use that model to fill
22. in these missing values. By existing data, we
23. mean the complete entries in our baseball data.
24. That is, entries that have position, left- or right-
25. handed batter, average, birthdate, deathdate, height, and
26. weight. We would then use the model that
27. we've trained to predict weight wherever it
28. is missing.
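
The idea described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the player records and their numbers are made up, the helper names `fit_line` and `impute_weight` are hypothetical, and height is assumed to be the only predictor, whereas the transcript lists several fields.

```python
from statistics import mean

# Hypothetical player records (made-up numbers); None marks a missing weight.
players = [
    {"name": "A", "height": 70, "weight": 182},
    {"name": "B", "height": 72, "weight": 188},
    {"name": "C", "height": 74, "weight": 201},
    {"name": "D", "height": 76, "weight": 209},
    {"name": "E", "height": 73, "weight": None},
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def impute_weight(records):
    """Return copies of the records with missing weights predicted from height."""
    # Train only on the complete entries, i.e. those with a known weight.
    complete = [r for r in records if r["weight"] is not None]
    intercept, slope = fit_line(
        [r["height"] for r in complete],
        [r["weight"] for r in complete],
    )
    imputed = []
    for r in records:
        filled = dict(r)  # copy so the original data is left untouched
        if filled["weight"] is None:
            filled["weight"] = intercept + slope * filled["height"]
        imputed.append(filled)
    return imputed


imputed = impute_weight(players)
```

Each filled-in weight lies exactly on the fitted line, which is the drawback noted in the transcript: the imputed entries reinforce the existing height and weight trend and look more certain than they really are.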