1 00:00:00,550 --> 00:00:02,840 Let's first discuss what would seem to be the 2 00:00:02,840 --> 00:00:05,310 easiest way to impute a missing value in our data 3 00:00:05,310 --> 00:00:08,180 set. Just take the mean of our other data 4 00:00:08,180 --> 00:00:12,960 points and fill in the missing values. So, for example, 5 00:00:12,960 --> 00:00:15,720 let's say that Ichiro Suzuki and Babe Ruth are 6 00:00:15,720 --> 00:00:19,760 missing values for weight in our baseball data set. Well, 7 00:00:19,760 --> 00:00:22,670 okay, no problem. We can just take the mean 8 00:00:22,670 --> 00:00:25,686 of all other players' weights and assign that value to 9 00:00:25,686 --> 00:00:28,170 Ichiro and Babe Ruth. In this case, we 10 00:00:28,170 --> 00:00:30,610 would assign Ichiro and Babe Ruth both a weight 11 00:00:30,610 --> 00:00:35,420 of 191.67. Wow, that seems really easy, right? 12 00:00:35,420 --> 00:00:38,600 There's gotta be a catch. Well, let's first discuss 13 00:00:38,600 --> 00:00:41,380 what's good about this method. We don't change 14 00:00:41,380 --> 00:00:43,940 the mean of the weight across our sample. That's 15 00:00:43,940 --> 00:00:46,360 good. But let's say we were hoping to 16 00:00:46,360 --> 00:00:50,590 study the relationship between weight and birth year, or 17 00:00:50,590 --> 00:00:53,440 height and weight. Just plugging the mean weight into a bunch of our 18 00:00:53,440 --> 00:00:56,040 data points lessens the correlation between 19 00:00:56,040 --> 00:00:58,780 our imputed variable and any other variable.
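
The two effects described here, an unchanged mean and a weaker correlation, can be checked with a small sketch. The heights and weights below are made-up illustrative numbers, not the baseball data set from the lesson, and the helper names `impute_mean` and `pearson` are hypothetical. The code fills the missing weights with the mean of the observed weights, then compares the height/weight correlation before and after imputation.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical players: heights in inches, weights in pounds (None = missing).
heights = [68, 70, 72, 74, 76, 78]
weights = [160, None, 185, 200, None, 230]

weights_imputed = impute_mean(weights)

# Mean of the observed weights versus mean after imputation.
observed_weights = [w for w in weights if w is not None]
mean_before = mean(observed_weights)
mean_after = mean(weights_imputed)

# Correlation using only players with a recorded weight (complete cases).
complete = [(h, w) for h, w in zip(heights, weights) if w is not None]
r_complete = pearson([h for h, _ in complete], [w for _, w in complete])

# Correlation after every missing weight has been set to the same mean value.
r_imputed = pearson(heights, weights_imputed)

print("mean before:", mean_before, "mean after:", mean_after)
print("r (complete cases):", round(r_complete, 4))
print("r (mean-imputed):  ", round(r_imputed, 4))
```

In this toy sample the mean of the weight column is the same before and after imputation, while the correlation with height drops, because the imputed players all receive the same weight regardless of their height.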