
Let's first discuss what would seem to be the

easiest way to impute a missing value in our data

set. Just take the mean of our other data

points and fill in the missing values. So, for example,

let's say that Ichiro Suzuki and Babe Ruth are

missing values for weight in our baseball data set. Well,

okay, no problem. We can just take the mean

of all the other players' weights and assign that value to

Ichiro and Babe Ruth. In this case, we

would assign Ichiro and Babe Ruth both a weight

of 191.67. Wow, that seems really easy, right?
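
To make that concrete, here is a minimal sketch of mean imputation in plain Python. The three "Player" weights are made up for illustration (they are not the real baseball data) and are chosen only so that their mean lands near the 191.67 mentioned above; the helper name impute_mean is also just an assumption for this example.

```python
from statistics import mean

# Hypothetical weights in pounds; None marks a missing value.
# The three observed numbers are invented for illustration only.
weights = {
    "Player A": 180.0,
    "Player B": 210.0,
    "Player C": 185.0,
    "Ichiro Suzuki": None,
    "Babe Ruth": None,
}


def impute_mean(values):
    """Return a copy where each None is replaced by the mean of the observed values."""
    observed = [v for v in values.values() if v is not None]
    fill = mean(observed)
    return {name: (fill if v is None else v) for name, v in values.items()}


imputed = impute_mean(weights)
for name, weight in imputed.items():
    print(f"{name}: {weight:.2f}")
```

Both missing players end up with the same filled-in value, which is exactly what makes this approach so simple.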

There's gotta be a catch. Well, let's first discuss

what's good about this method. We don't change

the mean of the weight across our sample. That's

good. But let's say we were hoping to

study the relationship between weight and birth year. Or

height and weight. Just plugging the mean weight into a bunch of our

data points lessens the correlation between

our imputed variable and any other variable.
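
Here is a rough sketch of both effects, using made-up heights and weights rather than the real baseball data. It checks that the mean weight is unchanged after imputation, and it compares the height and weight correlation on the complete rows against the correlation after filling in the mean. The pearson helper is written out by hand just for this example.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: heights in inches, weights in pounds, None = missing.
heights = [68, 70, 72, 74, 76, 78]
weights = [None, 170, 185, 195, 210, None]

# Keep only the rows where weight was actually observed.
observed = [(h, w) for h, w in zip(heights, weights) if w is not None]
obs_heights = [h for h, _ in observed]
obs_weights = [w for _, w in observed]

# Mean imputation: every missing weight gets the observed mean.
fill = mean(obs_weights)
imputed_weights = [fill if w is None else w for w in weights]

r_complete = pearson(obs_heights, obs_weights)
r_imputed = pearson(heights, imputed_weights)

print("mean before:", mean(obs_weights), "mean after:", mean(imputed_weights))
print("correlation on complete rows:", round(r_complete, 3))
print("correlation after imputation:", round(r_imputed, 3))
```

With these invented numbers the mean weight stays the same, but the correlation with height drops noticeably, because the filled-in players sit at the average weight no matter how tall they are.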