-
All right? So here's a little help file
-
on dealing with missing values in our
-
mini survey data. Basically, the reason
-
that we have any concern that there are
-
missing values is that for each case for
-
which there's a missing value in any
-
analysis. That includes that case. It'll
-
be omitted, so things per se, like
-
multiple regression or something like
-
that you end up losing a lot of cases if
-
you have a few missing values, scatter
-
around a number of different variables,
-
so what I've done here is I've
-
highlighted the cases that are the cases
-
they had something missing with yellow
-
highlighting and I just did that by hand.
-
There was no magic formula. Although, I
-
could have made some sort of conditional
-
if statement sort of thing. That would
-
have done that, but I was feeling lazy.
-
And there weren't that many cases. What
-
I've gone ahead and done is I've
-
calculated the mean which for this
-
variable or something missing is
-
1.65217 for the
-
median and the mode. And just to kind of
-
think about these are all central
-
tendencies of this variable and a lot of
-
a cheap and dirty way to deal with
-
missing values is substitute in the
-
central tendency, and then for linear
-
source of variables often times the mean
-
would be a good choice. In this case, we
-
have kind of these ordered levels in
-
terms of our Likert scales, and in that
-
case. Sometimes, maybe the median might be
-
superior. One advantage of the mean is
-
that if you substitute the mean, it'll be
-
1.652, which is
-
obviously not one of the choices, and you
-
can clearly see which ones were
-
substituted. In this case, just for our
-
purposes here, I'm highlighting them by
-
hand the other thing I'm going to do is
-
I'm going to put an equation here that
-
for these cases where they are being
-
substituted instead of putting the value
-
there. I'm going to put equals to the
-
median here, and I'm gonna make the
-
29 stay in place there by
-
putting a dollar sign in front of it;
-
make it an absolute reference. And that
-
means that I can copy this and paste it
-
in each of these subsequent spots,
-
regardless of which row it's in,
-
it's always going to be grabbing the
-
value. from row 29 for that particular
-
column all right? So I can come here and
-
I can substitute that in and here when I
-
substitute it, we'll see it changes in
-
here. And so here, this one, this one, and
-
you can see these ones here.
-
These questions were how much he uses
-
different statistics software, probably
-
the best guess is actually the median
-
rather than the mean in that case
-
because only one person here used that.
-
So anyway, this is probably the dominant
-
category here if someone left it blank,
-
they probably haven't used it.
-
Let me see. Let's drag this over a little
-
bit, and I can also I can fill these
-
across. It'll still work, so I don't have
-
to just paste paste paste. I can do a
-
whole row of them like that can go in,
-
oops. I need to recopy, so I can copy any
-
of these, paste it in here, and it'll work
-
and it'll keep grabbing the observation
-
from the 29th cell. So I need to
-
copy those in there, okay. And so at the
-
end of this, I'm gonna have a data set
-
that is almost all values. Oh, something I
-
should say is that I calculated the mean
-
median and mode before I started making
-
these changes if you hadn't. You'd get a
-
circular reference warning. So instead,
-
what you want to do is copy and paste
-
the values here instead of the formulas
-
for the mean, median mode and that's what
-
I already did. And so that's why we see
-
that there. Let's see. Here's a couple
-
more that I hadn't highlighted, so I'll
-
copy this and I'll paste it there. You
-
can see I'm pasting the formatting at
-
the same time which is kind of
-
convenient boom. Okay, hold on, okay. And
-
then we have a case here computers at
-
home, awesome, one listed three or more.
-
I'm just gonna go ahead and change that
-
to a three. I know it could be more than
-
that but for our purposes it's, we
-
have low, medium and high,
-
more or less, okay. And I'll look along
-
here. See if there any more missing cases.
-
There are a couple so I'm going to go
-
ahead and copy formatting.
-
The rule to look for the value from the
-
29th row, from here. I'm gonna paste it
-
into the remaining items. Now this was
-
not very efficient, yeah? I wouldn't do
-
this if I had a lot of cases,
-
but for our purposes, I think this will
-
work and we could talk about how to
-
automate these sort of a steps later. So
-
now what I'm going to do is I'm going to
-
copy everything in this sheet like this,
-
copy. And then, I'm gonna just shoot empty.
-
No, it's not. I'm gonna go and add yet
-
another sheet. I'm gonna paste special, so
-
right click and then click paste special
-
words. And then, I'm gonna click on
-
formats, okay. And then, I'm gonna click on
-
paste special values, okay. And so, then I
-
got my highlighting, and I got my numbers
-
but I didn't get my formulas which I
-
appreciate. So now this data set here. I'm
-
gonna go ahead and clear it. Clear this
-
stuff off the bottom clear contents because
-
I don't need it anymore. And now I'm
-
ready to do some analysis on this data,
-
yeah. So this would be a good starting
-
point for that. So pretty much we're done
-
processing the data in terms of missing
-
cases and stuff, gonna get ready to start
-
doing the next step, which would be
-
constructing an index, which I'll make a
-
brief video about here in a second.