-
all right so here's a little help file
-
on dealing with missing values in our
-
mini survey data basically the reason
-
that we have any concern that there are
-
missing values is that for each case for
-
which there's a missing value in any
-
analysis that includes that case it'll
-
be omitted so things per se like
-
multiple regression or something like
-
that you end up losing a lot of cases if
-
you have a few missing values scatter
-
around a number of different variables
-
so what I've done here is I've
-
highlighted the cases that are the cases
-
they had something missing with yellow
-
highlighting and I just did that by hand
-
there was no magic formula although I
-
could have made some sort of conditional
-
if statement sort of thing that would
-
have done that but I was feeling lazy
-
and there weren't that many cases what
-
I've gone ahead and done is I've
-
calculated the mean which for this
-
variable or something missing is one
-
point six five to one seven for the
-
median in the mode and just to kind of
-
think about these are all central
-
tendencies of this variable and a lot of
-
a cheap and dirty way to deal with
-
missing values is substitute in the
-
central tendency and then for linear
-
source of variables often times the mean
-
would be a good choice in this case we
-
have kind of these ordered levels in
-
terms of our Likert scales and in that
-
case sometimes maybe the median might be
-
superior one advantage of the mean is
-
that if you substitute the mean it'll be
-
one point six five two which is
-
obviously not one of the choices and you
-
can clearly see which ones were
-
substituted in this case just for our
-
purposes here I'm highlighting them by
-
hand the other thing I'm going to do is
-
I'm going to put an equation here that
-
for these cases where they are being
-
substituted instead of putting the value
-
there I'm going to put equals to the
-
median here and I'm gonna make the
-
twenty nine stay in place there by
-
putting a dollar sign in front of it
-
make it an absolute reference and that
-
means that I can copy this and paste it
-
in each of these subsequent spots
-
regardless of which row it's in
-
it's always going to be grabbing the
-
value from road 29 for that particular
-
column all right so I can come here and
-
I can substitute that in and here when I
-
substitute it we'll see it changes in
-
here and so here this one this one and
-
you can see these ones here
-
these questions were how much he uses
-
different statistics software probably
-
the best guess is actually the median
-
rather than the mean in that case
-
because only one person here used that
-
so anyway this is probably the dominant
-
category here if someone left it blank
-
they probably haven't used it
-
let me see let's drag this over a little
-
bit and I can also I can fill these
-
across it'll still work so I don't have
-
to just paste paste paste I can do a
-
whole row of them like that can go in
-
oops I need to recopy so I can copy any
-
of these paste it in here and it'll work
-
and it'll keep grabbing the observation
-
from the twenty-ninth cell so I need to
-
copy those in there okay and so at the
-
end of this I'm gonna have a data set
-
that is almost all values Oh something I
-
should say is that I calculated the mean
-
median and mode before I started making
-
these changes if you hadn't you'd get a
-
circular reference warning so instead
-
what you want to do is copy and paste
-
the values here instead of the formulas
-
for the mean median mode and that's what
-
I already did and so that's why we see
-
that there let's see here's a couple
-
more that I hadn't highlighted so I'll
-
copy this and I'll paste it there you
-
can see I'm pasting the formatting at
-
the same time which is kind of
-
convenient boom okay hold on okay and
-
then we have a case here computers at
-
home awesome one listed three or more
-
I'm just gonna go ahead and change that
-
to a three I know it could be more than
-
that but for our purposes it's uh we
-
have low medium and high
-
more or less okay and I'll look along
-
here see if there any more missing cases
-
there are a couple so I'm going to go
-
ahead and copy formatting
-
the rule to look for the value from the
-
29 throw from here I'm gonna paste it
-
into the remaining items now this was
-
not very efficient yeah I wouldn't do
-
this if I had a lot of a lot of cases
-
but for our purposes I think this will
-
work and we could talk about how to
-
automate these sort of a steps later so
-
now what I'm going to do is I'm going to
-
copy everything in this sheet like this
-
copy and then I'm gonna just shoot empty
-
no it's not I'm gonna go and add yet
-
another sheet I'm gonna paste special so
-
right click and then click paste special
-
words and then I'm gonna click on
-
formats ok and then I'm gonna click on
-
paste special values ok and so then I
-
got my highlighting and I got my numbers
-
but I didn't get my formulas which I
-
appreciate so now this data set here I'm
-
gonna go ahead and clear it clear this
-
stuff off the bottom clear contents cuz
-
I don't need it anymore and now I'm
-
ready to do some analysis on this data
-
yeah so this would be a good starting
-
point for that so pretty much we're done
-
processing the data in terms of missing
-
cases and stuff gonna get ready to start
-
doing the next step which would be
-
constructing an index which I'll make a
-
brief video about here in a second