0:00:00.380,0:00:02.970 all right so here's a little help file 0:00:02.970,0:00:05.509 on dealing with missing values in our 0:00:05.509,0:00:09.719 mini survey data basically the reason 0:00:09.719,0:00:12.240 that we have any concern that there are 0:00:12.240,0:00:15.000 missing values is that for each case for 0:00:15.000,0:00:16.680 which there's a missing value in any 0:00:16.680,0:00:19.230 analysis that includes that case it'll 0:00:19.230,0:00:22.320 be omitted so things per se like 0:00:22.320,0:00:24.090 multiple regression or something like 0:00:24.090,0:00:27.689 that you end up losing a lot of cases if 0:00:27.689,0:00:31.439 you have a few missing values scatter 0:00:31.439,0:00:32.659 around a number of different variables 0:00:32.659,0:00:34.530 so what I've done here is I've 0:00:34.530,0:00:39.090 highlighted the cases that are the cases 0:00:39.090,0:00:41.370 they had something missing with yellow 0:00:41.370,0:00:43.079 highlighting and I just did that by hand 0:00:43.079,0:00:44.579 there was no magic formula although I 0:00:44.579,0:00:47.840 could have made some sort of conditional 0:00:47.840,0:00:50.070 if statement sort of thing that would 0:00:50.070,0:00:51.390 have done that but I was feeling lazy 0:00:51.390,0:00:54.120 and there weren't that many cases what 0:00:54.120,0:00:55.289 I've gone ahead and done is I've 0:00:55.289,0:00:58.050 calculated the mean which for this 0:00:58.050,0:00:59.670 variable or something missing is one 0:00:59.670,0:01:01.559 point six five to one seven for the 0:01:01.559,0:01:04.500 median in the mode and just to kind of 0:01:04.500,0:01:06.479 think about these are all central 0:01:06.479,0:01:10.830 tendencies of this variable and a lot of 0:01:10.830,0:01:12.750 a cheap and dirty way to deal with 0:01:12.750,0:01:14.100 missing values is substitute in the 0:01:14.100,0:01:16.049 central tendency and then for linear 0:01:16.049,0:01:18.960 source of variables often times the mean 0:01:18.960,0:01:20.490 would be a good choice in this case we 0:01:20.490,0:01:24.830 have kind of these ordered levels in 0:01:24.830,0:01:28.979 terms of our Likert scales and in that 0:01:28.979,0:01:30.840 case sometimes maybe the median might be 0:01:30.840,0:01:33.270 superior one advantage of the mean is 0:01:33.270,0:01:34.829 that if you substitute the mean it'll be 0:01:34.829,0:01:36.390 one point six five two which is 0:01:36.390,0:01:37.829 obviously not one of the choices and you 0:01:37.829,0:01:39.659 can clearly see which ones were 0:01:39.659,0:01:42.270 substituted in this case just for our 0:01:42.270,0:01:44.520 purposes here I'm highlighting them by 0:01:44.520,0:01:46.470 hand the other thing I'm going to do is 0:01:46.470,0:01:49.310 I'm going to put an equation here that 0:01:49.310,0:01:52.920 for these cases where they are being 0:01:52.920,0:01:54.210 substituted instead of putting the value 0:01:54.210,0:01:57.930 there I'm going to put equals to the 0:01:57.930,0:01:59.810 median here and I'm gonna make the 0:01:59.810,0:02:03.329 twenty nine stay in place there by 0:02:03.329,0:02:04.890 putting a dollar sign in front of it 0:02:04.890,0:02:06.659 make it an absolute reference and that 0:02:06.659,0:02:09.479 means that I can copy this and paste it 0:02:09.479,0:02:11.330 in each of these subsequent spots 0:02:11.330,0:02:13.890 regardless of which row it's in 0:02:13.890,0:02:15.750 it's always going to be grabbing the 0:02:15.750,0:02:18.540 value from road 29 for that particular 0:02:18.540,0:02:21.960 column all right so I can come here and 0:02:21.960,0:02:24.030 I can substitute that in and here when I 0:02:24.030,0:02:26.160 substitute it we'll see it changes in 0:02:26.160,0:02:35.010 here and so here this one this one and 0:02:35.010,0:02:36.390 you can see these ones here 0:02:36.390,0:02:38.190 these questions were how much he uses 0:02:38.190,0:02:39.720 different statistics software probably 0:02:39.720,0:02:42.930 the best guess is actually the median 0:02:42.930,0:02:43.980 rather than the mean in that case 0:02:43.980,0:02:47.330 because only one person here used that 0:02:47.330,0:02:49.230 so anyway this is probably the dominant 0:02:49.230,0:02:50.700 category here if someone left it blank 0:02:50.700,0:02:52.520 they probably haven't used it 0:02:52.520,0:02:56.400 let me see let's drag this over a little 0:02:56.400,0:02:58.320 bit and I can also I can fill these 0:02:58.320,0:03:00.420 across it'll still work so I don't have 0:03:00.420,0:03:02.220 to just paste paste paste I can do a 0:03:02.220,0:03:04.830 whole row of them like that can go in 0:03:04.830,0:03:07.500 oops I need to recopy so I can copy any 0:03:07.500,0:03:09.690 of these paste it in here and it'll work 0:03:09.690,0:03:13.170 and it'll keep grabbing the observation 0:03:13.170,0:03:17.310 from the twenty-ninth cell so I need to 0:03:17.310,0:03:21.480 copy those in there okay and so at the 0:03:21.480,0:03:22.920 end of this I'm gonna have a data set 0:03:22.920,0:03:26.370 that is almost all values Oh something I 0:03:26.370,0:03:29.459 should say is that I calculated the mean 0:03:29.459,0:03:31.260 median and mode before I started making 0:03:31.260,0:03:33.239 these changes if you hadn't you'd get a 0:03:33.239,0:03:35.610 circular reference warning so instead 0:03:35.610,0:03:37.440 what you want to do is copy and paste 0:03:37.440,0:03:40.140 the values here instead of the formulas 0:03:40.140,0:03:42.780 for the mean median mode and that's what 0:03:42.780,0:03:45.209 I already did and so that's why we see 0:03:45.209,0:03:49.739 that there let's see here's a couple 0:03:49.739,0:03:52.650 more that I hadn't highlighted so I'll 0:03:52.650,0:03:55.769 copy this and I'll paste it there you 0:03:55.769,0:03:57.570 can see I'm pasting the formatting at 0:03:57.570,0:03:58.680 the same time which is kind of 0:03:58.680,0:04:03.750 convenient boom okay hold on okay and 0:04:03.750,0:04:06.810 then we have a case here computers at 0:04:06.810,0:04:09.120 home awesome one listed three or more 0:04:09.120,0:04:10.680 I'm just gonna go ahead and change that 0:04:10.680,0:04:13.709 to a three I know it could be more than 0:04:13.709,0:04:16.079 that but for our purposes it's uh we 0:04:16.079,0:04:17.149 have low medium and high 0:04:17.149,0:04:20.130 more or less okay and I'll look along 0:04:20.130,0:04:22.079 here see if there any more missing cases 0:04:22.079,0:04:24.720 there are a couple so I'm going to go 0:04:24.720,0:04:27.030 ahead and copy formatting 0:04:27.030,0:04:28.890 the rule to look for the value from the 0:04:28.890,0:04:31.320 29 throw from here I'm gonna paste it 0:04:31.320,0:04:33.240 into the remaining items now this was 0:04:33.240,0:04:37.170 not very efficient yeah I wouldn't do 0:04:37.170,0:04:39.150 this if I had a lot of a lot of cases 0:04:39.150,0:04:42.270 but for our purposes I think this will 0:04:42.270,0:04:44.970 work and we could talk about how to 0:04:44.970,0:04:47.820 automate these sort of a steps later so 0:04:47.820,0:04:50.640 now what I'm going to do is I'm going to 0:04:50.640,0:04:57.740 copy everything in this sheet like this 0:04:57.740,0:05:02.310 copy and then I'm gonna just shoot empty 0:05:02.310,0:05:03.840 no it's not I'm gonna go and add yet 0:05:03.840,0:05:08.280 another sheet I'm gonna paste special so 0:05:08.280,0:05:09.810 right click and then click paste special 0:05:09.810,0:05:14.090 words and then I'm gonna click on 0:05:14.090,0:05:18.680 formats ok and then I'm gonna click on 0:05:18.680,0:05:23.490 paste special values ok and so then I 0:05:23.490,0:05:26.070 got my highlighting and I got my numbers 0:05:26.070,0:05:32.400 but I didn't get my formulas which I 0:05:32.400,0:05:34.890 appreciate so now this data set here I'm 0:05:34.890,0:05:36.270 gonna go ahead and clear it clear this 0:05:36.270,0:05:39.360 stuff off the bottom clear contents cuz 0:05:39.360,0:05:43.500 I don't need it anymore and now I'm 0:05:43.500,0:05:45.240 ready to do some analysis on this data 0:05:45.240,0:05:46.830 yeah so this would be a good starting 0:05:46.830,0:05:48.900 point for that so pretty much we're done 0:05:48.900,0:05:50.880 processing the data in terms of missing 0:05:50.880,0:05:53.040 cases and stuff gonna get ready to start 0:05:53.040,0:05:54.120 doing the next step which would be 0:05:54.120,0:05:56.130 constructing an index which I'll make a 0:05:56.130,0:05:59.960 brief video about here in a second