Let's first discuss what would seem to be the
easiest way to impute a missing value in our data
set. Just take the mean of our other data
points and fill in the missing values. So, for example,
let's say that Ichiro Suzuki and Babe Ruth are
missing values for weight in our baseball data set. Well,
okay, no problem. We can just take the mean
of all other players' weights and assign that value to
Ichiro and Babe Ruth. In this case, we
would assign Ichiro and Babe Ruth both a weight
of 191.67. Wow, that seems really easy, right?
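Here's a quick sketch of what that looks like in code. The three known weights below are made up for illustration (they are not the actual values in our baseball data set), chosen only so that their mean comes out to the 191.67 mentioned above. The variable names are also just placeholders.

```python
from statistics import mean

# Hypothetical weights in pounds (made-up values, not the real data set).
# None marks a missing value.
weights = {
    "Player A": 180.0,
    "Player B": 190.0,
    "Player C": 205.0,
    "Ichiro Suzuki": None,
    "Babe Ruth": None,
}

# Take the mean of the weights we actually observed.
observed = [w for w in weights.values() if w is not None]
mean_weight = mean(observed)

# Fill in every missing weight with that single mean value.
imputed = {
    name: (mean_weight if w is None else w)
    for name, w in weights.items()
}

print(round(mean_weight, 2))  # 191.67
print(imputed)
```

Notice that Ichiro and Babe Ruth end up with exactly the same weight, which is the whole idea of this method.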
There's gotta be a catch. Well, let's first discuss
what's good about this method. We don't change
the mean of the weight across our sample. That's
good. But let's say we were hoping to
study the relationship between weight and birth year, or
height and weight. Just plugging the mean weight into a bunch of our
data points lessens the correlation between
our imputed variable and any other variable.
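To see both the upside and the catch in one place, here is a minimal sketch using made-up heights and weights (again, hypothetical numbers, not our real data set). It assumes the two players with missing weights happen to be the shortest and the tallest, and it uses a small hand-written Pearson correlation helper, `pearson`, which is just an illustrative function and not part of any library.

```python
from math import isclose, sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical heights (inches) and weights (pounds); None marks missing.
heights = [68, 70, 72, 74, 76, 66, 78]
weights = [165.0, 172.0, 195.0, 200.0, 222.0, None, None]

# Players whose weight we actually observed.
pairs = [(h, w) for h, w in zip(heights, weights) if w is not None]
obs_heights = [h for h, _ in pairs]
obs_weights = [w for _, w in pairs]
mean_weight = mean(obs_weights)

# Mean imputation: every missing weight becomes the observed mean.
imputed_weights = [mean_weight if w is None else w for w in weights]

# The good part: the mean weight is unchanged.
print(isclose(mean(imputed_weights), mean_weight))

# The catch: the height/weight correlation gets weaker.
r_observed = pearson(obs_heights, obs_weights)
r_imputed = pearson(heights, imputed_weights)
print(round(r_observed, 3), round(r_imputed, 3))
```

The imputed players sit exactly at the mean weight no matter how tall they are, so they contribute nothing to the covariance while still adding spread in height. That is why the correlation shrinks.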