[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.38,0:00:02.97,Default,,0000,0000,0000,,All right, so here's a little help file Dialogue: 0,0:00:02.97,0:00:05.51,Default,,0000,0000,0000,,on dealing with missing values in our Dialogue: 0,0:00:05.51,0:00:09.72,Default,,0000,0000,0000,,mini survey data. Basically, the reason Dialogue: 0,0:00:09.72,0:00:12.24,Default,,0000,0000,0000,,that we have any concern that there are Dialogue: 0,0:00:12.24,0:00:15.00,Default,,0000,0000,0000,,missing values is that for each case for Dialogue: 0,0:00:15.00,0:00:16.68,Default,,0000,0000,0000,,which there's a missing value, in any Dialogue: 0,0:00:16.68,0:00:19.23,Default,,0000,0000,0000,,analysis that includes that case, it'll Dialogue: 0,0:00:19.23,0:00:22.32,Default,,0000,0000,0000,,be omitted. So with things, say, like Dialogue: 0,0:00:22.32,0:00:24.09,Default,,0000,0000,0000,,multiple regression or something like Dialogue: 0,0:00:24.09,0:00:27.69,Default,,0000,0000,0000,,that, you end up losing a lot of cases if Dialogue: 0,0:00:27.69,0:00:31.44,Default,,0000,0000,0000,,you have a few missing values scattered Dialogue: 0,0:00:31.44,0:00:32.66,Default,,0000,0000,0000,,around a number of different variables. Dialogue: 0,0:00:32.66,0:00:34.53,Default,,0000,0000,0000,,So what I've done here is I've Dialogue: 0,0:00:34.53,0:00:39.09,Default,,0000,0000,0000,,highlighted the cases, that is, the cases Dialogue: 0,0:00:39.09,0:00:41.37,Default,,0000,0000,0000,,that had something missing, with yellow Dialogue: 0,0:00:41.37,0:00:43.08,Default,,0000,0000,0000,,highlighting, and I just did that by hand. Dialogue: 0,0:00:43.08,0:00:44.58,Default,,0000,0000,0000,,There was no magic formula, although I Dialogue: 0,0:00:44.58,0:00:47.84,Default,,0000,0000,0000,,could have made some sort of conditional Dialogue: 0,0:00:47.84,0:00:50.07,Default,,0000,0000,0000,,if statement sort of thing.
That would Dialogue: 0,0:00:50.07,0:00:51.39,Default,,0000,0000,0000,,have done that, but I was feeling lazy, Dialogue: 0,0:00:51.39,0:00:54.12,Default,,0000,0000,0000,,and there weren't that many cases. What Dialogue: 0,0:00:54.12,0:00:55.29,Default,,0000,0000,0000,,I've gone ahead and done is I've Dialogue: 0,0:00:55.29,0:00:58.05,Default,,0000,0000,0000,,calculated the mean, which for this Dialogue: 0,0:00:58.05,0:00:59.67,Default,,0000,0000,0000,,variable where something's missing is Dialogue: 0,0:00:59.67,0:01:01.56,Default,,0000,0000,0000,,1.65217, and the Dialogue: 0,0:01:01.56,0:01:04.50,Default,,0000,0000,0000,,median and the mode. And just to kind of Dialogue: 0,0:01:04.50,0:01:06.48,Default,,0000,0000,0000,,think about it, these are all central Dialogue: 0,0:01:06.48,0:01:10.83,Default,,0000,0000,0000,,tendencies of this variable, and a lot of times Dialogue: 0,0:01:10.83,0:01:12.75,Default,,0000,0000,0000,,a cheap and dirty way to deal with Dialogue: 0,0:01:12.75,0:01:14.10,Default,,0000,0000,0000,,missing values is to substitute in the Dialogue: 0,0:01:14.10,0:01:16.05,Default,,0000,0000,0000,,central tendency. For linear Dialogue: 0,0:01:16.05,0:01:18.96,Default,,0000,0000,0000,,sorts of variables, oftentimes the mean Dialogue: 0,0:01:18.96,0:01:20.49,Default,,0000,0000,0000,,would be a good choice. In this case, we Dialogue: 0,0:01:20.49,0:01:24.83,Default,,0000,0000,0000,,have kind of these ordered levels in Dialogue: 0,0:01:24.83,0:01:28.98,Default,,0000,0000,0000,,terms of our Likert scales, and in that Dialogue: 0,0:01:28.98,0:01:30.84,Default,,0000,0000,0000,,case, sometimes the median might be Dialogue: 0,0:01:30.84,0:01:33.27,Default,,0000,0000,0000,,superior.
One advantage of the mean is Dialogue: 0,0:01:33.27,0:01:34.83,Default,,0000,0000,0000,,that if you substitute the mean, it'll be Dialogue: 0,0:01:34.83,0:01:36.39,Default,,0000,0000,0000,,1.652, which is Dialogue: 0,0:01:36.39,0:01:37.83,Default,,0000,0000,0000,,obviously not one of the choices, and you Dialogue: 0,0:01:37.83,0:01:39.66,Default,,0000,0000,0000,,can clearly see which ones were Dialogue: 0,0:01:39.66,0:01:42.27,Default,,0000,0000,0000,,substituted. In this case, just for our Dialogue: 0,0:01:42.27,0:01:44.52,Default,,0000,0000,0000,,purposes here, I'm highlighting them by Dialogue: 0,0:01:44.52,0:01:46.47,Default,,0000,0000,0000,,hand. The other thing I'm going to do is Dialogue: 0,0:01:46.47,0:01:49.31,Default,,0000,0000,0000,,I'm going to put an equation here, so that Dialogue: 0,0:01:49.31,0:01:52.92,Default,,0000,0000,0000,,for these cases where values are being Dialogue: 0,0:01:52.92,0:01:54.21,Default,,0000,0000,0000,,substituted, instead of putting the value Dialogue: 0,0:01:54.21,0:01:57.93,Default,,0000,0000,0000,,there, I'm going to put equals and then the Dialogue: 0,0:01:57.93,0:01:59.81,Default,,0000,0000,0000,,median here, and I'm gonna make the Dialogue: 0,0:01:59.81,0:02:03.33,Default,,0000,0000,0000,,29 stay in place there by Dialogue: 0,0:02:03.33,0:02:04.89,Default,,0000,0000,0000,,putting a dollar sign in front of it to Dialogue: 0,0:02:04.89,0:02:06.66,Default,,0000,0000,0000,,make it an absolute reference. And that Dialogue: 0,0:02:06.66,0:02:09.48,Default,,0000,0000,0000,,means that I can copy this and paste it Dialogue: 0,0:02:09.48,0:02:11.33,Default,,0000,0000,0000,,in each of these subsequent spots, and Dialogue: 0,0:02:11.33,0:02:13.89,Default,,0000,0000,0000,,regardless of which row it's in, Dialogue: 0,0:02:13.89,0:02:15.75,Default,,0000,0000,0000,,it's always going to be grabbing the Dialogue: 0,0:02:15.75,0:02:18.54,Default,,0000,0000,0000,,value
from row 29 for that particular Dialogue: 0,0:02:18.54,0:02:21.96,Default,,0000,0000,0000,,column, all right? So I can come here and Dialogue: 0,0:02:21.96,0:02:24.03,Default,,0000,0000,0000,,I can substitute that in, and here when I Dialogue: 0,0:02:24.03,0:02:26.16,Default,,0000,0000,0000,,substitute it, we'll see it changes in Dialogue: 0,0:02:26.16,0:02:35.01,Default,,0000,0000,0000,,here. And so here, this one, this one, and Dialogue: 0,0:02:35.01,0:02:36.39,Default,,0000,0000,0000,,you can see these ones here. Dialogue: 0,0:02:36.39,0:02:38.19,Default,,0000,0000,0000,,These questions were how much each person uses Dialogue: 0,0:02:38.19,0:02:39.72,Default,,0000,0000,0000,,different statistics software. Probably Dialogue: 0,0:02:39.72,0:02:42.93,Default,,0000,0000,0000,,the best guess is actually the median Dialogue: 0,0:02:42.93,0:02:43.98,Default,,0000,0000,0000,,rather than the mean in that case, Dialogue: 0,0:02:43.98,0:02:47.33,Default,,0000,0000,0000,,because only one person here used that. Dialogue: 0,0:02:47.33,0:02:49.23,Default,,0000,0000,0000,,So anyway, this is probably the dominant Dialogue: 0,0:02:49.23,0:02:50.70,Default,,0000,0000,0000,,category here; if someone left it blank, Dialogue: 0,0:02:50.70,0:02:52.52,Default,,0000,0000,0000,,they probably haven't used it. Dialogue: 0,0:02:52.52,0:02:56.40,Default,,0000,0000,0000,,Let me see. Let's drag this over a little Dialogue: 0,0:02:56.40,0:02:58.32,Default,,0000,0000,0000,,bit, and I can also fill these Dialogue: 0,0:02:58.32,0:03:00.42,Default,,0000,0000,0000,,across. It'll still work, so I don't have Dialogue: 0,0:03:00.42,0:03:02.22,Default,,0000,0000,0000,,to just paste, paste, paste. I can do a Dialogue: 0,0:03:02.22,0:03:04.83,Default,,0000,0000,0000,,whole row of them like that. I can go in, Dialogue: 0,0:03:04.83,0:03:07.50,Default,,0000,0000,0000,,oops.
I need to recopy. So I can copy any Dialogue: 0,0:03:07.50,0:03:09.69,Default,,0000,0000,0000,,of these, paste it in here, and it'll work, Dialogue: 0,0:03:09.69,0:03:13.17,Default,,0000,0000,0000,,and it'll keep grabbing the observation Dialogue: 0,0:03:13.17,0:03:17.31,Default,,0000,0000,0000,,from the 29th cell. So I need to Dialogue: 0,0:03:17.31,0:03:21.48,Default,,0000,0000,0000,,copy those in there, okay. And so at the Dialogue: 0,0:03:21.48,0:03:22.92,Default,,0000,0000,0000,,end of this, I'm gonna have a data set Dialogue: 0,0:03:22.92,0:03:26.37,Default,,0000,0000,0000,,that is almost all values. Oh, something I Dialogue: 0,0:03:26.37,0:03:29.46,Default,,0000,0000,0000,,should say is that I calculated the mean, Dialogue: 0,0:03:29.46,0:03:31.26,Default,,0000,0000,0000,,median, and mode before I started making Dialogue: 0,0:03:31.26,0:03:33.24,Default,,0000,0000,0000,,these changes. If you hadn't, you'd get a Dialogue: 0,0:03:33.24,0:03:35.61,Default,,0000,0000,0000,,circular reference warning. So instead, Dialogue: 0,0:03:35.61,0:03:37.44,Default,,0000,0000,0000,,what you want to do is copy and paste Dialogue: 0,0:03:37.44,0:03:40.14,Default,,0000,0000,0000,,the values here instead of the formulas Dialogue: 0,0:03:40.14,0:03:42.78,Default,,0000,0000,0000,,for the mean, median, and mode, and that's what Dialogue: 0,0:03:42.78,0:03:45.21,Default,,0000,0000,0000,,I already did. And so that's why we see Dialogue: 0,0:03:45.21,0:03:49.74,Default,,0000,0000,0000,,that there. Let's see. Here's a couple Dialogue: 0,0:03:49.74,0:03:52.65,Default,,0000,0000,0000,,more that I hadn't highlighted, so I'll Dialogue: 0,0:03:52.65,0:03:55.77,Default,,0000,0000,0000,,copy this and I'll paste it there. You Dialogue: 0,0:03:55.77,0:03:57.57,Default,,0000,0000,0000,,can see I'm pasting the formatting at Dialogue: 0,0:03:57.57,0:03:58.68,Default,,0000,0000,0000,,the same time, which is kind of Dialogue: 0,0:03:58.68,0:04:03.75,Default,,0000,0000,0000,,convenient. Boom.
Okay, hold on, okay. And Dialogue: 0,0:04:03.75,0:04:06.81,Default,,0000,0000,0000,,then we have a case here, computers at Dialogue: 0,0:04:06.81,0:04:09.12,Default,,0000,0000,0000,,home, where someone listed three or more. Dialogue: 0,0:04:09.12,0:04:10.68,Default,,0000,0000,0000,,I'm just gonna go ahead and change that Dialogue: 0,0:04:10.68,0:04:13.71,Default,,0000,0000,0000,,to a three. I know it could be more than Dialogue: 0,0:04:13.71,0:04:16.08,Default,,0000,0000,0000,,that, but for our purposes, we Dialogue: 0,0:04:16.08,0:04:17.15,Default,,0000,0000,0000,,have low, medium, and high, Dialogue: 0,0:04:17.15,0:04:20.13,Default,,0000,0000,0000,,more or less, okay. And I'll look along Dialogue: 0,0:04:20.13,0:04:22.08,Default,,0000,0000,0000,,here, see if there are any more missing cases. Dialogue: 0,0:04:22.08,0:04:24.72,Default,,0000,0000,0000,,There are a couple, so I'm going to go Dialogue: 0,0:04:24.72,0:04:27.03,Default,,0000,0000,0000,,ahead and copy the formatting and Dialogue: 0,0:04:27.03,0:04:28.89,Default,,0000,0000,0000,,the rule to look for the value from the Dialogue: 0,0:04:28.89,0:04:31.32,Default,,0000,0000,0000,,29th row from here, and I'm gonna paste it Dialogue: 0,0:04:31.32,0:04:33.24,Default,,0000,0000,0000,,into the remaining items. Now this was Dialogue: 0,0:04:33.24,0:04:37.17,Default,,0000,0000,0000,,not very efficient, yeah? I wouldn't do Dialogue: 0,0:04:37.17,0:04:39.15,Default,,0000,0000,0000,,this if I had a lot of cases, Dialogue: 0,0:04:39.15,0:04:42.27,Default,,0000,0000,0000,,but for our purposes, I think this will Dialogue: 0,0:04:42.27,0:04:44.97,Default,,0000,0000,0000,,work, and we could talk about how to Dialogue: 0,0:04:44.97,0:04:47.82,Default,,0000,0000,0000,,automate these sorts of steps later.
So Dialogue: 0,0:04:47.82,0:04:50.64,Default,,0000,0000,0000,,now what I'm going to do is I'm going to Dialogue: 0,0:04:50.64,0:04:57.74,Default,,0000,0000,0000,,copy everything in this sheet like this, Dialogue: 0,0:04:57.74,0:05:02.31,Default,,0000,0000,0000,,copy. And then, I'm gonna, is this sheet empty? Dialogue: 0,0:05:02.31,0:05:03.84,Default,,0000,0000,0000,,No, it's not. I'm gonna go and add yet Dialogue: 0,0:05:03.84,0:05:08.28,Default,,0000,0000,0000,,another sheet. I'm gonna paste special, so Dialogue: 0,0:05:08.28,0:05:09.81,Default,,0000,0000,0000,,right-click and then click paste special. Dialogue: 0,0:05:09.81,0:05:14.09,Default,,0000,0000,0000,,And then, I'm gonna click on Dialogue: 0,0:05:14.09,0:05:18.68,Default,,0000,0000,0000,,formats, okay. And then, I'm gonna click on Dialogue: 0,0:05:18.68,0:05:23.49,Default,,0000,0000,0000,,paste special, values, okay. And so then I Dialogue: 0,0:05:23.49,0:05:26.07,Default,,0000,0000,0000,,got my highlighting, and I got my numbers, Dialogue: 0,0:05:26.07,0:05:32.40,Default,,0000,0000,0000,,but I didn't get my formulas, which I Dialogue: 0,0:05:32.40,0:05:34.89,Default,,0000,0000,0000,,appreciate. So now this data set here, I'm Dialogue: 0,0:05:34.89,0:05:36.27,Default,,0000,0000,0000,,gonna go ahead and clear it, clear this Dialogue: 0,0:05:36.27,0:05:39.36,Default,,0000,0000,0000,,stuff off the bottom, clear contents, because Dialogue: 0,0:05:39.36,0:05:43.50,Default,,0000,0000,0000,,I don't need it anymore. And now I'm Dialogue: 0,0:05:43.50,0:05:45.24,Default,,0000,0000,0000,,ready to do some analysis on this data, Dialogue: 0,0:05:45.24,0:05:46.83,Default,,0000,0000,0000,,yeah. So this would be a good starting Dialogue: 0,0:05:46.83,0:05:48.90,Default,,0000,0000,0000,,point for that.
So pretty much we're done Dialogue: 0,0:05:48.90,0:05:50.88,Default,,0000,0000,0000,,processing the data in terms of missing Dialogue: 0,0:05:50.88,0:05:53.04,Default,,0000,0000,0000,,cases and stuff, and we're gonna get ready to start Dialogue: 0,0:05:53.04,0:05:54.12,Default,,0000,0000,0000,,doing the next step, which would be Dialogue: 0,0:05:54.12,0:05:56.13,Default,,0000,0000,0000,,constructing an index, which I'll make a Dialogue: 0,0:05:56.13,0:05:59.96,Default,,0000,0000,0000,,brief video about here in a second.
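
The video does the substitution by hand in a spreadsheet and mentions that the steps could be automated later. As a rough sketch of what that automation might look like (not the method used in the video), the Python below fills each missing answer with its column's median. The function name `impute_with_median`, the use of `None` for a blank cell, and the small `survey` table are all assumptions made up for illustration. The medians are computed from the observed values before anything is filled in, which mirrors pasting the summary row as values to avoid a circular reference, and the list of imputed cells stands in for the yellow highlighting.

```python
from statistics import median


def impute_with_median(rows):
    """Return (filled_rows, imputed_cells) for a list of equal-length rows.

    Missing answers are represented as None (an assumption of this sketch).
    A column with no observed values at all would raise StatisticsError.
    """
    n_cols = len(rows[0])

    # Compute every column's median from the observed values only, before
    # any substitution, so a fill value never depends on itself.
    medians = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        medians.append(median(observed))

    filled, imputed = [], []
    for r, row in enumerate(rows):
        new_row = list(row)  # copy, so the original data stays untouched
        for c, value in enumerate(row):
            if value is None:
                new_row[c] = medians[c]
                imputed.append((r, c))  # plays the role of the highlight
        filled.append(new_row)
    return filled, imputed


# Hypothetical Likert-style responses; None marks a blank answer.
survey = [
    [1, 3, None],
    [2, None, 1],
    [1, 2, 1],
    [None, 2, 3],
]
filled, imputed = impute_with_median(survey)
print(filled)
print(imputed)
```

Swapping `median` for `mean` or `mode` from the same module would give the other two central tendencies discussed in the video; the median is used here because the answers are ordered categories.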