all right so here's a little help file
on dealing with missing values in our
mini survey data basically the reason
that we have any concern that there are
missing values is that for each case for
which there's a missing value in any
analysis that includes that case it'll
be omitted so things per se like
multiple regression or something like
that you end up losing a lot of cases if
you have a few missing values scatter
around a number of different variables
so what I've done here is I've
highlighted the cases that are the cases
they had something missing with yellow
highlighting and I just did that by hand
there was no magic formula although I
could have made some sort of conditional
if statement sort of thing that would
have done that but I was feeling lazy
and there weren't that many cases what
I've gone ahead and done is I've
calculated the mean which for this
variable or something missing is one
point six five to one seven for the
median in the mode and just to kind of
think about these are all central
tendencies of this variable and a lot of
a cheap and dirty way to deal with
missing values is substitute in the
central tendency and then for linear
source of variables often times the mean
would be a good choice in this case we
have kind of these ordered levels in
terms of our Likert scales and in that
case sometimes maybe the median might be
superior one advantage of the mean is
that if you substitute the mean it'll be
one point six five two which is
obviously not one of the choices and you
can clearly see which ones were
substituted in this case just for our
purposes here I'm highlighting them by
hand the other thing I'm going to do is
I'm going to put an equation here that
for these cases where they are being
substituted instead of putting the value
there I'm going to put equals to the
median here and I'm gonna make the
twenty nine stay in place there by
putting a dollar sign in front of it
make it an absolute reference and that
means that I can copy this and paste it
in each of these subsequent spots
regardless of which row it's in
it's always going to be grabbing the
value from road 29 for that particular
column all right so I can come here and
I can substitute that in and here when I
substitute it we'll see it changes in
here and so here this one this one and
you can see these ones here
these questions were how much he uses
different statistics software probably
the best guess is actually the median
rather than the mean in that case
because only one person here used that
so anyway this is probably the dominant
category here if someone left it blank
they probably haven't used it
let me see let's drag this over a little
bit and I can also I can fill these
across it'll still work so I don't have
to just paste paste paste I can do a
whole row of them like that can go in
oops I need to recopy so I can copy any
of these paste it in here and it'll work
and it'll keep grabbing the observation
from the twenty-ninth cell so I need to
copy those in there okay and so at the
end of this I'm gonna have a data set
that is almost all values Oh something I
should say is that I calculated the mean
median and mode before I started making
these changes if you hadn't you'd get a
circular reference warning so instead
what you want to do is copy and paste
the values here instead of the formulas
for the mean median mode and that's what
I already did and so that's why we see
that there let's see here's a couple
more that I hadn't highlighted so I'll
copy this and I'll paste it there you
can see I'm pasting the formatting at
the same time which is kind of
convenient boom okay hold on okay and
then we have a case here computers at
home awesome one listed three or more
I'm just gonna go ahead and change that
to a three I know it could be more than
that but for our purposes it's uh we
have low medium and high
more or less okay and I'll look along
here see if there any more missing cases
there are a couple so I'm going to go
ahead and copy formatting
the rule to look for the value from the
29 throw from here I'm gonna paste it
into the remaining items now this was
not very efficient yeah I wouldn't do
this if I had a lot of a lot of cases
but for our purposes I think this will
work and we could talk about how to
automate these sort of a steps later so
now what I'm going to do is I'm going to
copy everything in this sheet like this
copy and then I'm gonna just shoot empty
no it's not I'm gonna go and add yet
another sheet I'm gonna paste special so
right click and then click paste special
words and then I'm gonna click on
formats ok and then I'm gonna click on
paste special values ok and so then I
got my highlighting and I got my numbers
but I didn't get my formulas which I
appreciate so now this data set here I'm
gonna go ahead and clear it clear this
stuff off the bottom clear contents cuz
I don't need it anymore and now I'm
ready to do some analysis on this data
yeah so this would be a good starting
point for that so pretty much we're done
processing the data in terms of missing
cases and stuff gonna get ready to start
doing the next step which would be
constructing an index which I'll make a
brief video about here in a second