
Missing values

  • 0:00 - 0:03
    all right so here's a little help file
  • 0:03 - 0:06
    on dealing with missing values in our
  • 0:06 - 0:10
    mini survey data basically the reason
  • 0:10 - 0:12
    that we have any concern that there are
  • 0:12 - 0:15
    missing values is that for each case for
  • 0:15 - 0:17
    which there's a missing value in any
  • 0:17 - 0:19
    analysis that includes that case it'll
  • 0:19 - 0:22
    be omitted so with things like
  • 0:22 - 0:24
    multiple regression or something like
  • 0:24 - 0:28
    that you end up losing a lot of cases if
  • 0:28 - 0:31
    you have a few missing values scattered
  • 0:31 - 0:33
    around a number of different variables
  • 0:33 - 0:35
    so what I've done here is I've
  • 0:35 - 0:39
    highlighted the cases that are the cases
  • 0:39 - 0:41
    that had something missing with yellow
  • 0:41 - 0:43
    highlighting and I just did that by hand
  • 0:43 - 0:45
    there was no magic formula although I
  • 0:45 - 0:48
    could have made some sort of conditional
  • 0:48 - 0:50
    if statement sort of thing that would
  • 0:50 - 0:51
    have done that but I was feeling lazy
  • 0:51 - 0:54
    and there weren't that many cases what
  • 0:54 - 0:55
    I've gone ahead and done is I've
  • 0:55 - 0:58
    calculated the mean which for this
  • 0:58 - 1:00
    variable with something missing is one
  • 1:00 - 1:02
    point six five two, one seven for the
  • 1:02 - 1:04
    median and the mode and just to kind of
  • 1:04 - 1:06
    think about these are all central
  • 1:06 - 1:11
    tendencies of this variable and
  • 1:11 - 1:13
    a cheap and dirty way to deal with
  • 1:13 - 1:14
    missing values is to substitute in the
  • 1:14 - 1:16
    central tendency and then for linear
  • 1:16 - 1:19
    sorts of variables oftentimes the mean
  • 1:19 - 1:20
    would be a good choice in this case we
  • 1:20 - 1:25
    have kind of these ordered levels in
  • 1:25 - 1:29
    terms of our Likert scales and in that
  • 1:29 - 1:31
    case sometimes maybe the median might be
  • 1:31 - 1:33
    superior one advantage of the mean is
  • 1:33 - 1:35
    that if you substitute the mean it'll be
  • 1:35 - 1:36
    one point six five two which is
  • 1:36 - 1:38
    obviously not one of the choices and you
  • 1:38 - 1:40
    can clearly see which ones were
  • 1:40 - 1:42
    substituted in this case just for our
  • 1:42 - 1:45
    purposes here I'm highlighting them by
  • 1:45 - 1:46
    hand the other thing I'm going to do is
  • 1:46 - 1:49
    I'm going to put an equation here that
  • 1:49 - 1:53
    for these cases where they are being
  • 1:53 - 1:54
    substituted instead of putting the value
  • 1:54 - 1:58
    there I'm going to put equals to the
  • 1:58 - 2:00
    median here and I'm gonna make the
  • 2:00 - 2:03
    twenty nine stay in place there by
  • 2:03 - 2:05
    putting a dollar sign in front of it
  • 2:05 - 2:07
    to make it an absolute reference and that
  • 2:07 - 2:09
    means that I can copy this and paste it
  • 2:09 - 2:11
    in each of these subsequent spots
  • 2:11 - 2:14
    regardless of which row it's in
  • 2:14 - 2:16
    it's always going to be grabbing the
  • 2:16 - 2:19
    value from row 29 for that particular
  • 2:19 - 2:22
    column all right so I can come here and
  • 2:22 - 2:24
    I can substitute that in and here when I
  • 2:24 - 2:26
    substitute it we'll see it changes in
  • 2:26 - 2:35
    here and so here this one this one and
  • 2:35 - 2:36
    you can see these ones here
  • 2:36 - 2:38
    these questions were how much one uses
  • 2:38 - 2:40
    different statistics software probably
  • 2:40 - 2:43
    the best guess is actually the median
  • 2:43 - 2:44
    rather than the mean in that case
  • 2:44 - 2:47
    because only one person here used that
  • 2:47 - 2:49
    so anyway this is probably the dominant
  • 2:49 - 2:51
    category here if someone left it blank
  • 2:51 - 2:53
    they probably haven't used it
  • 2:53 - 2:56
    let me see let's drag this over a little
  • 2:56 - 2:58
    bit and I can also I can fill these
  • 2:58 - 3:00
    across it'll still work so I don't have
  • 3:00 - 3:02
    to just paste paste paste I can do a
  • 3:02 - 3:05
    whole row of them like that can go in
  • 3:05 - 3:08
    oops I need to recopy so I can copy any
  • 3:08 - 3:10
    of these paste it in here and it'll work
  • 3:10 - 3:13
    and it'll keep grabbing the observation
  • 3:13 - 3:17
    from the twenty-ninth cell so I need to
  • 3:17 - 3:21
    copy those in there okay and so at the
  • 3:21 - 3:23
    end of this I'm gonna have a data set
  • 3:23 - 3:26
    that is almost all values Oh something I
  • 3:26 - 3:29
    should say is that I calculated the mean
  • 3:29 - 3:31
    median and mode before I started making
  • 3:31 - 3:33
    these changes if you hadn't you'd get a
  • 3:33 - 3:36
    circular reference warning so instead
  • 3:36 - 3:37
    what you want to do is copy and paste
  • 3:37 - 3:40
    the values here instead of the formulas
  • 3:40 - 3:43
    for the mean median mode and that's what
  • 3:43 - 3:45
    I already did and so that's why we see
  • 3:45 - 3:50
    that there let's see here's a couple
  • 3:50 - 3:53
    more that I hadn't highlighted so I'll
  • 3:53 - 3:56
    copy this and I'll paste it there you
  • 3:56 - 3:58
    can see I'm pasting the formatting at
  • 3:58 - 3:59
    the same time which is kind of
  • 3:59 - 4:04
    convenient boom okay hold on okay and
  • 4:04 - 4:07
    then we have a case here computers at
  • 4:07 - 4:09
    home and someone listed three or more
  • 4:09 - 4:11
    I'm just gonna go ahead and change that
  • 4:11 - 4:14
    to a three I know it could be more than
  • 4:14 - 4:16
    that but for our purposes it's uh we
  • 4:16 - 4:17
    have low medium and high
  • 4:17 - 4:20
    more or less okay and I'll look along
  • 4:20 - 4:22
    here see if there are any more missing cases
  • 4:22 - 4:25
    there are a couple so I'm going to go
  • 4:25 - 4:27
    ahead and copy the formatting and
  • 4:27 - 4:29
    the rule to look for the value from the
  • 4:29 - 4:31
    29th row from here I'm gonna paste it
  • 4:31 - 4:33
    into the remaining items now this was
  • 4:33 - 4:37
    not very efficient yeah I wouldn't do
  • 4:37 - 4:39
    this if I had a lot of cases
  • 4:39 - 4:42
    but for our purposes I think this will
  • 4:42 - 4:45
    work and we could talk about how to
  • 4:45 - 4:48
    automate these sorts of steps later so
  • 4:48 - 4:51
    now what I'm going to do is I'm going to
  • 4:51 - 4:58
    copy everything in this sheet like this
  • 4:58 - 5:02
    copy and then I'm gonna, is this sheet empty,
  • 5:02 - 5:04
    no it's not I'm gonna go and add yet
  • 5:04 - 5:08
    another sheet I'm gonna paste special so
  • 5:08 - 5:10
    right click and then click paste special
  • 5:10 - 5:14
    words and then I'm gonna click on
  • 5:14 - 5:19
    formats ok and then I'm gonna click on
  • 5:19 - 5:23
    paste special values ok and so then I
  • 5:23 - 5:26
    got my highlighting and I got my numbers
  • 5:26 - 5:32
    but I didn't get my formulas which I
  • 5:32 - 5:35
    appreciate so now this data set here I'm
  • 5:35 - 5:36
    gonna go ahead and clear it clear this
  • 5:36 - 5:39
    stuff off the bottom clear contents cuz
  • 5:39 - 5:44
    I don't need it anymore and now I'm
  • 5:44 - 5:45
    ready to do some analysis on this data
  • 5:45 - 5:47
    yeah so this would be a good starting
  • 5:47 - 5:49
    point for that so pretty much we're done
  • 5:49 - 5:51
    processing the data in terms of missing
  • 5:51 - 5:53
    cases and stuff gonna get ready to start
  • 5:53 - 5:54
    doing the next step which would be
  • 5:54 - 5:56
    constructing an index which I'll make a
  • 5:56 - 6:00
    brief video about here in a second
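
The video does all of this by hand in a spreadsheet and says the steps could be automated later. As a rough illustration only, the same idea might be sketched in Python as below. The survey rows, the column names (`spss`, `excel`, `computers`) and the helper functions are all made up for the example and are not the data or formulas from the video. The sketch uses `statistics.median_low` so that the filled-in value is always one of the actual answer choices, which is one possible choice for ordered Likert-type items rather than the only one.

```python
from statistics import median_low

# Hypothetical mini survey: one dict per respondent, None marks a blank answer.
rows = [
    {"spss": 1, "excel": 3, "computers": 2},
    {"spss": None, "excel": 4, "computers": 3},
    {"spss": 1, "excel": None, "computers": 1},
    {"spss": 2, "excel": 5, "computers": None},
    {"spss": 1, "excel": 4, "computers": 3},
]
columns = ["spss", "excel", "computers"]


def complete_cases(data):
    """Rows with no blanks; any other row would be omitted from an analysis."""
    return [r for r in data if all(r[c] is not None for c in columns)]


def column_medians(data):
    """Median of the observed answers per column, computed BEFORE any filling."""
    return {c: median_low([r[c] for r in data if r[c] is not None]) for c in columns}


def impute(data, fill):
    """Return filled copies of the rows plus the (row index, column) cells filled."""
    filled, flagged = [], []
    for i, r in enumerate(data):
        new = dict(r)  # copy so the original rows stay untouched
        for c in columns:
            if new[c] is None:
                new[c] = fill[c]
                flagged.append((i, c))  # plays the role of the yellow highlighting
        filled.append(new)
    return filled, flagged


medians = column_medians(rows)
filled_rows, flagged_cells = impute(rows, medians)

print(len(complete_cases(rows)), "complete cases before filling")
print(len(complete_cases(filled_rows)), "complete cases after filling")
print("substituted cells:", flagged_cells)
```

Computing the medians from the original rows first mirrors the point in the video about calculating the mean, median and mode before making any substitutions, and the list of flagged cells keeps track of which answers were filled in rather than observed.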
Title:
Missing values
Video Language:
English
Duration:
05:59
OEVIDEOS edited English subtitles for Missing values