Missing values

  • 0:00 - 0:03
    All right, so here's a little help file
  • 0:03 - 0:06
    on dealing with missing values in our
  • 0:06 - 0:10
    mini survey data. Basically, the reason
  • 0:10 - 0:12
    that we have any concern that there are
  • 0:12 - 0:15
    missing values is that for each case for
  • 0:15 - 0:17
    which there's a missing value, in any
  • 0:17 - 0:19
    analysis that includes that case, it'll
  • 0:19 - 0:22
    be omitted. So with things, say, like
  • 0:22 - 0:24
    multiple regression or something like
  • 0:24 - 0:28
    that, you end up losing a lot of cases if
  • 0:28 - 0:31
    you have a few missing values scattered
  • 0:31 - 0:33
    around a number of different variables,
  • 0:33 - 0:35
    so what I've done here is I've
  • 0:35 - 0:39
    highlighted the cases, that is, the cases
  • 0:39 - 0:41
    that had something missing, with yellow
  • 0:41 - 0:43
    highlighting and I just did that by hand.
  • 0:43 - 0:45
    There was no magic formula, although I
  • 0:45 - 0:48
    could have made some sort of conditional
  • 0:48 - 0:50
    if-statement sort of thing that would
  • 0:50 - 0:51
    have done that, but I was feeling lazy.
  • 0:51 - 0:54
    And there weren't that many cases. What
  • 0:54 - 0:55
    I've gone ahead and done is I've
  • 0:55 - 0:58
    calculated the mean, which for this
  • 0:58 - 1:00
    variable with something missing is
  • 1:00 - 1:02
    1.65217, and the
  • 1:02 - 1:04
    median and the mode. And just to kind of
  • 1:04 - 1:06
    think about it, these are all central
  • 1:06 - 1:11
    tendencies of this variable, and
  • 1:11 - 1:13
    a cheap and dirty way to deal with
  • 1:13 - 1:14
    missing values is to substitute in the
  • 1:14 - 1:16
    central tendency. And then for linear
  • 1:16 - 1:19
    sorts of variables, oftentimes the mean
  • 1:19 - 1:20
    would be a good choice. In this case, we
  • 1:20 - 1:25
    have kind of these ordered levels in
  • 1:25 - 1:29
    terms of our Likert scales, and in that
  • 1:29 - 1:31
    case sometimes maybe the median might be
  • 1:31 - 1:33
    superior. One advantage of the mean is
  • 1:33 - 1:35
    that if you substitute the mean, it'll be
  • 1:35 - 1:36
    1.652, which is
  • 1:36 - 1:38
    obviously not one of the choices, and you
  • 1:38 - 1:40
    can clearly see which ones were
  • 1:40 - 1:42
    substituted. In this case, just for our
  • 1:42 - 1:45
    purposes here, I'm highlighting them by
  • 1:45 - 1:46
    hand. The other thing I'm going to do is
  • 1:46 - 1:49
    I'm going to put an equation here, so that
  • 1:49 - 1:53
    for these cases where they are being
  • 1:53 - 1:54
    substituted, instead of putting the value
  • 1:54 - 1:58
    there, I'm going to put equals to the
  • 1:58 - 2:00
    median here, and I'm gonna make the
  • 2:00 - 2:03
    29 stay in place there by
  • 2:03 - 2:05
    putting a dollar sign in front of it to
  • 2:05 - 2:07
    make the row an absolute reference. And that
  • 2:07 - 2:09
    means that I can copy this and paste it
  • 2:09 - 2:11
    in each of these subsequent spots,
  • 2:11 - 2:14
    regardless of which row it's in,
  • 2:14 - 2:16
    it's always going to be grabbing the
  • 2:16 - 2:19
    value from row 29 for that particular
  • 2:19 - 2:22
    column, all right? So I can come here and
  • 2:22 - 2:24
    I can substitute that in and here when I
  • 2:24 - 2:26
    substitute it, we'll see it changes in
  • 2:26 - 2:35
    here. And so here, this one, this one, and
  • 2:35 - 2:36
    you can see these ones here.
  • 2:36 - 2:38
    These questions were how much one uses
  • 2:38 - 2:40
    different statistics software. Probably
  • 2:40 - 2:43
    the best guess is actually the median
  • 2:43 - 2:44
    rather than the mean in that case
  • 2:44 - 2:47
    because only one person here used that.
  • 2:47 - 2:49
    So anyway, this is probably the dominant
  • 2:49 - 2:51
    category here. If someone left it blank,
  • 2:51 - 2:53
    they probably haven't used it.
  • 2:53 - 2:56
    Let me see. Let's drag this over a little
  • 2:56 - 2:58
    bit, and I can also fill these
  • 2:58 - 3:00
    across. It'll still work, so I don't have
  • 3:00 - 3:02
    to just paste, paste, paste. I can do a
  • 3:02 - 3:05
    whole row of them like that, can go in.
  • 3:05 - 3:08
    Oops, I need to recopy, so I can copy any
  • 3:08 - 3:10
    of these, paste it in here, and it'll work
  • 3:10 - 3:13
    and it'll keep grabbing the value
  • 3:13 - 3:17
    from the 29th row. So I need to
  • 3:17 - 3:21
    copy those in there, okay. And so at the
  • 3:21 - 3:23
    end of this, I'm gonna have a data set
  • 3:23 - 3:26
    that is almost all values. Oh, something I
  • 3:26 - 3:29
    should say is that I calculated the mean,
  • 3:29 - 3:31
    median, and mode before I started making
  • 3:31 - 3:33
    these changes. If you hadn't, you'd get a
  • 3:33 - 3:36
    circular reference warning. So instead,
  • 3:36 - 3:37
    what you want to do is copy and paste
  • 3:37 - 3:40
    the values here instead of the formulas
  • 3:40 - 3:43
    for the mean, median, and mode, and that's what
  • 3:43 - 3:45
    I already did. And so that's why we see
  • 3:45 - 3:50
    that there. Let's see. Here's a couple
  • 3:50 - 3:53
    more that I hadn't highlighted, so I'll
  • 3:53 - 3:56
    copy this and I'll paste it there. You
  • 3:56 - 3:58
    can see I'm pasting the formatting at
  • 3:58 - 3:59
    the same time, which is kind of
  • 3:59 - 4:04
    convenient. Boom. Okay, hold on, okay. And
  • 4:04 - 4:07
    then we have a case here, computers at
  • 4:07 - 4:09
    home, where someone listed three or more.
  • 4:09 - 4:11
    I'm just gonna go ahead and change that
  • 4:11 - 4:14
    to a three. I know it could be more than
  • 4:14 - 4:16
    that, but for our purposes, we
  • 4:16 - 4:17
    have low, medium and high,
  • 4:17 - 4:20
    more or less, okay. And I'll look along
  • 4:20 - 4:22
    here, see if there are any more missing cases.
  • 4:22 - 4:25
    There are a couple, so I'm going to go
  • 4:25 - 4:27
    ahead and copy the formatting and
  • 4:27 - 4:29
    the rule to look for the value from the
  • 4:29 - 4:31
    29th row from here. I'm gonna paste it
  • 4:31 - 4:33
    into the remaining items. Now this was
  • 4:33 - 4:37
    not very efficient, yeah? I wouldn't do
  • 4:37 - 4:39
    this if I had a lot of cases,
  • 4:39 - 4:42
    but for our purposes, I think this will
  • 4:42 - 4:45
    work and we could talk about how to
  • 4:45 - 4:48
    automate these sorts of steps later. So
  • 4:48 - 4:51
    now what I'm going to do is I'm going to
  • 4:51 - 4:58
    copy everything in this sheet like this,
  • 4:58 - 5:02
    copy. And then, is this sheet empty?
  • 5:02 - 5:04
    No, it's not. I'm gonna go and add yet
  • 5:04 - 5:08
    another sheet. I'm gonna paste special, so
  • 5:08 - 5:10
    right click and then click paste special.
  • 5:10 - 5:14
    And then, I'm gonna click on
  • 5:14 - 5:19
    formats, okay. And then, I'm gonna click on
  • 5:19 - 5:23
    paste special values, okay. And so, then I
  • 5:23 - 5:26
    got my highlighting, and I got my numbers
  • 5:26 - 5:32
    but I didn't get my formulas, which I
  • 5:32 - 5:35
    appreciate. So now this data set here, I'm
  • 5:35 - 5:36
    gonna go ahead and clear it. Clear this
  • 5:36 - 5:39
    stuff off the bottom, clear contents, because
  • 5:39 - 5:44
    I don't need it anymore. And now I'm
  • 5:44 - 5:45
    ready to do some analysis on this data,
  • 5:45 - 5:47
    yeah. So this would be a good starting
  • 5:47 - 5:49
    point for that. So pretty much we're done
  • 5:49 - 5:51
    processing the data in terms of missing
  • 5:51 - 5:53
    cases and stuff, gonna get ready to start
  • 5:53 - 5:54
    doing the next step, which would be
  • 5:54 - 5:56
    constructing an index, which I'll make a
  • 5:56 - 6:00
    brief video about here in a second.
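
The substitution walked through in the video can also be sketched in code. What follows is only a minimal illustration in Python, not the spreadsheet workflow itself: the column names (uses_excel, uses_r), the sample values, and the helper functions summarize and impute are all hypothetical and do not come from the mini survey data. It computes the central tendencies from the observed values first, then fills each blank with the chosen one and records which cells were filled, which plays the role of the yellow highlighting.

```python
from statistics import mean, median, mode

# Hypothetical mini survey: one dict per respondent, None marks a blank cell.
survey = [
    {"uses_excel": 3, "uses_r": 1},
    {"uses_excel": None, "uses_r": 1},
    {"uses_excel": 2, "uses_r": None},
    {"uses_excel": 2, "uses_r": 2},
    {"uses_excel": 1, "uses_r": 1},
    {"uses_excel": 2, "uses_r": 1},
]


def summarize(rows, column):
    """Compute central tendencies from the observed (non-missing) values only."""
    observed = [r[column] for r in rows if r[column] is not None]
    return {
        "mean": mean(observed),
        "median": median(observed),
        "mode": mode(observed),
    }


def impute(rows, strategy="median"):
    """Return a filled copy of rows and the set of (row index, column) cells filled."""
    columns = list(rows[0].keys())
    # Freeze the summaries before substituting anything, which is the same idea
    # as pasting values instead of formulas to avoid a circular reference.
    fill = {c: summarize(rows, c)[strategy] for c in columns}
    filled, flagged = [], set()
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the original data is left untouched
        for c in columns:
            if new_row[c] is None:
                new_row[c] = fill[c]
                flagged.add((i, c))  # stands in for the yellow highlighting
        filled.append(new_row)
    return filled, flagged


filled, flagged = impute(survey, strategy="median")
```

Passing strategy="mean" instead would fill blanks with a value that is usually not one of the answer choices, which makes substituted cells easy to spot, as noted in the video; the flagged set makes that traceable either way.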
Title:
Missing values
Video Language:
English
Duration:
05:59
OEVIDEOS edited English subtitles for Missing values