
https:/.../2020-02-21_psy317l_standard_error.mp4

  • 0:01 - 0:05
    Hello, in this video, I want to talk about
    the standard error and this is really
  • 0:05 - 0:10
    extending our understanding of sampling
    distributions and the central limit theorem.
  • 0:10 - 0:13
    So, let's talk about what
    a standard error is.
  • 0:15 - 0:18
    First of all, we'll go back to
    this penguin example and
  • 0:18 - 0:22
    you've seen this distribution before
    as a uniform distribution of data.
  • 0:23 - 0:27
    Like any distribution, it has
    descriptive statistics.
  • 0:27 - 0:29
    So, it has a population mean.
  • 0:29 - 0:30
    The average is 5.04.
  • 0:30 - 0:34
    The average penguin is 5.04 meters
    from the edge of the ice sheet.
  • 0:34 - 0:37
    You can calculate a standard
    deviation for this.
  • 0:37 - 0:40
    So, the standard deviation is 2.88.
  • 0:40 - 0:42
    So, that's, you know,
    a measure of the spread.
  • 0:42 - 0:48
    And there were 5,000 penguins floating
    on this ice sheet; that's the N,
  • 0:48 - 0:49
    the population size.
  • 0:50 - 0:55
    We then discussed how, if you were
    to just randomly select
  • 0:55 - 1:00
    five penguins at a time or 50 penguins
    at a time, then for each of those samples,
  • 1:00 - 1:04
    and let's pick n equals five for now,
    for each sample of five penguins,
  • 1:04 - 1:08
    you could calculate what the average
    distance from the edge of the ice sheet
  • 1:08 - 1:13
    was for that sample of five penguins, and if you
    were to do that over and over and over again,
  • 1:13 - 1:17
    then, as in this histogram, where we did it
  • 1:17 - 1:23
    1,000 times, we would be able to generate
    what's called the sampling distribution.
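
A minimal sketch of that resampling process, assuming (since the exact slide data isn't reproduced here) that the 5,000 penguin distances are roughly uniform between 0 and 10 meters, which gives a population mean near 5 and a standard deviation near 2.9:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 5,000 penguin distances, roughly uniform on 0-10 m
population = rng.uniform(0, 10, size=5000)

# Draw 1,000 samples of n = 5 and keep each sample's mean
sample_means = np.array([
    rng.choice(population, size=5, replace=False).mean()
    for _ in range(1000)
])

print(population.mean(), population.std())  # close to the 5.04 and 2.88 in the video
print(sample_means.mean())                  # the mean of the sample means
```

A histogram of `sample_means` is the sampling distribution the video describes.
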
  • 1:25 - 1:30
    And it's the sampling distribution of
    the sample means, that's what it is,
  • 1:30 - 1:35
    and I told you that we could calculate
    from that the average of
  • 1:35 - 1:38
    those sample means across
    the 1,000 samples, and
  • 1:38 - 1:44
    that's this value. The notation that
    we use for that is mu with a
  • 1:44 - 1:53
    subscript x bar, and that's the mean
    of the sample means. I've forgotten
  • 1:53 - 1:57
    what the exact value was, but it's
    pretty much going to be
  • 1:57 - 1:58
    very, very close.
  • 1:58 - 2:04
    So, I just put approximately equal to
    5.05; if we go back, it was 5.04.
  • 2:04 - 2:11
    So, it's going to approximate
    the population average, and you can
  • 2:11 - 2:13
    do that for any sample size.
  • 2:13 - 2:17
    So, that was sample size five,
    let's look at the sample size 50.
  • 2:17 - 2:24
    Again, we have the mean of the sampling
    distribution, sorry, the mean of
  • 2:24 - 2:29
    the sample means, and that is also going
    to be very close to 5.04; it might be
  • 2:29 - 2:32
    a little bit closer because
    our sample size is larger.
  • 2:33 - 2:36
    Two other things to notice about
    these distributions: number one,
  • 2:36 - 2:39
    they're approximately normal
  • 2:39 - 2:43
    distributions, despite the fact that
    the original distribution of penguins,
  • 2:43 - 2:46
    the population distribution, was
    a uniform distribution.
  • 2:46 - 2:51
    Second thing to notice: the sample size
    doesn't really affect the value
  • 2:51 - 2:56
    of the mean of the sample means, but it does
    affect the standard deviation of
  • 2:56 - 2:57
    the sample means.
  • 2:57 - 3:00
    So, if this is a normal distribution,
    or we believe it to approximate one,
  • 3:00 - 3:08
    and this also approximates
    a normal distribution, then it's clear
  • 3:08 - 3:15
    that the distance here, let's just assume
    that's a standard deviation and that I put it
  • 3:15 - 3:17
    in the right place,
  • 3:17 - 3:21
    this standard deviation is greater than
    whatever the corresponding value is
  • 3:21 - 3:25
    over here, if that's also
    the standard deviation.
  • 3:25 - 3:29
    So, as the sample size gets larger,
    the spread of the sample means
  • 3:29 - 3:33
    gets smaller, so we can say
    the standard deviation gets smaller.
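
To see that point empirically, here is a quick variation of the earlier sketch (same assumed uniform population) that compares the spread of the sample means for n = 5 and n = 50:

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.uniform(0, 10, size=5000)  # same assumed penguin population as before

for n in (5, 50):
    means = np.array([
        rng.choice(population, size=n, replace=False).mean()
        for _ in range(1000)
    ])
    # The mean of the sample means stays near the population mean,
    # but the standard deviation of the sample means shrinks as n grows.
    print(n, round(means.mean(), 2), round(means.std(), 2))
```
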
  • 3:33 - 3:38
    Now, does this standard deviation have any
    relationship at all to the original
  • 3:38 - 3:41
    standard deviation of
    the original population?
  • 3:41 - 3:45
    The original standard deviation was 2.88,
    so I'll just say that the original
  • 3:45 - 3:49
    population standard deviation was 2.88.
  • 3:49 - 3:52
    Is there any relationship at all between
    these two standard deviations?
  • 3:52 - 3:56
    Because it's not like the mean of
    the sample means, which is pretty
  • 3:56 - 4:01
    much the same regardless of the sample
    size; I mean, it does get better with
  • 4:01 - 4:04
    larger samples, but it's close,
    especially if you have
  • 4:04 - 4:06
    enough of these samples.
  • 4:06 - 4:09
    What's the relationship between these standard
    deviations? Because it's clear that when
  • 4:09 - 4:15
    you change n, this value is going to
    change, so is there a relationship?
  • 4:15 - 4:17
    And it turns out that there is
    a relationship and we're going to
  • 4:17 - 4:18
    look into that.
  • 4:18 - 4:24
    This graph here just shows you that
    the approximation to a normal distribution becomes
  • 4:24 - 4:27
    better and better the larger
    the sample size. It's a little
  • 4:27 - 4:30
    tricky to see, but I just want to
    really point out one or two things here.
  • 4:30 - 4:34
    I'm going to pick a color
    that represents that.
  • 4:34 - 4:39
    So, this value here, actually in red: if
    I was just to pick one penguin at
  • 4:39 - 4:44
    a time, a sample size of one, then I'm going
  • 4:44 - 4:45
    for the red line here.
  • 4:45 - 4:50
    That's my estimate of,
    let me say that again,
  • 4:50 - 4:52
    that's the distribution of
    the sample means.
  • 4:52 - 4:54
    It looks like the original population.
  • 4:54 - 4:57
    So, for a sample size of one, you don't
    get a normal distribution of the sample
  • 4:57 - 5:01
    means, you get whatever
    the original population was.
  • 5:01 - 5:07
    Let's look at two and I've got to find it
    on here, so, it's the orange one and
  • 5:07 - 5:12
    I believe it's this one here.
  • 5:12 - 5:13
    It is this one here.
  • 5:13 - 5:15
    This is what it looks like.
  • 5:15 - 5:17
    This is n equals two.
  • 5:17 - 5:19
    So, again, not a really
    normal distribution.
  • 5:19 - 5:22
    Now, let's skip to 50.
  • 5:22 - 5:27
    This is 50 here and you can see it
    really, you don't need me to help
  • 5:27 - 5:28
    you too much.
  • 5:28 - 5:31
    This is the 50 value, it's very normal.
  • 5:31 - 5:36
    And then, we've got blue at ten--
    sorry, at 25 here.
  • 5:36 - 5:38
    This is the 25 one and so on.
  • 5:38 - 5:40
    This is the ten.
  • 5:40 - 5:42
    This is the five.
  • 5:42 - 5:44
    I wanted to just show you this graph
    because I wanted to show you that
  • 5:44 - 5:49
    even with very, very, very small
    sample sizes of like five, we already
  • 5:49 - 5:51
    get very close to a normal distribution.
  • 5:51 - 5:55
    It's only with ridiculous
    sample sizes, like one or two, that
  • 5:55 - 5:57
    we don't do a very good job.
  • 5:57 - 6:00
    So, even with small sample sizes,
    we get to the normal
  • 6:00 - 6:03
    distribution of
    the sample means.
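
One rough way to quantify that convergence, continuing the same assumed-population sketch, is excess kurtosis, which is about -1.2 for a uniform distribution and 0 for a normal one; by n = 5 the sampling distribution of the means is already much closer to 0:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
population = rng.uniform(0, 10, size=5000)  # same assumed penguin population as before

for n in (1, 2, 5, 50):
    means = np.array([
        rng.choice(population, size=n, replace=False).mean()
        for _ in range(1000)
    ])
    # Excess kurtosis drifts from about -1.2 (uniform) toward 0 (normal) as n grows
    print(n, round(kurtosis(means), 2))
```
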
  • 6:03 - 6:07
    So, back to the problem
    I just posed a moment ago.
  • 6:07 - 6:15
    This is our original standard deviation
    of the population, this is our population,
  • 6:15 - 6:17
    and whenever we get a sample,
    and again, this is just the sample
  • 6:17 - 6:18
    size of five.
  • 6:18 - 6:19
    This is the distribution of sample means.
  • 6:19 - 6:27
    The mean is going to approximate the mean
    here, but what is the relationship of
  • 6:27 - 6:31
    the standard deviation to
    this original population?
  • 6:31 - 6:33
    What is the relationship?
  • 6:33 - 6:39
    It must also be related to the sample size
    because it changes with the sample size.
  • 6:39 - 6:43
    And it's just a formula; we're not
    going to talk much really at all
  • 6:43 - 6:47
    about how it's derived, but this formula
  • 6:47 - 6:52
    here, very neatly, just tells us
    about their relationship, and
  • 6:52 - 6:56
    so, what we have here is
    our standard deviation of
  • 6:56 - 7:00
    the sampling distribution of
    the sample means.
  • 7:00 - 7:03
    So, we call that sigma subscript x bar,
  • 7:03 - 7:04
    sigma x bar.
  • 7:04 - 7:08
    So, just to really
    reiterate what we're looking at: this is
  • 7:08 - 7:13
    the distribution of sample means,
    and we're looking for this value:
  • 7:13 - 7:16
    what is this standard deviation?
  • 7:16 - 7:22
    And actually, technically, that's
    the notation: what is that standard
  • 7:22 - 7:23
    deviation?
  • 7:23 - 7:25
    So, what we do is, we just take
    the original population.
  • 7:25 - 7:30
    This is the population standard deviation
    from the original population, and we're
  • 7:30 - 7:35
    going to divide it by the square root of n,
    and that gives us this value,
  • 7:35 - 7:37
    this standard deviation.
  • 7:37 - 7:41
    Its technical name is the standard
    deviation of the sampling distribution
  • 7:41 - 7:44
    of the sample means, which is an awful
    mouthful, but we just call
  • 7:44 - 7:45
    it the standard error of
  • 7:45 - 7:49
    the mean; that's what we call it,
    the standard error of the mean.
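
Written as a formula, which just restates what was said above, the standard error of the mean (sigma subscript x bar) is the population standard deviation divided by the square root of the sample size:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```
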
  • 7:49 - 7:55
    So, this graph illustrates how
    the standard error of the mean
  • 7:55 - 7:57
    changes by sample size.
  • 7:57 - 8:07
    So, if I just go back to, maybe
    I'll just go back to this slide here,
  • 8:07 - 8:11
    and we were asking the question of,
    you know, what's this value for a
  • 8:11 - 8:15
    sample size of 50 compared to this
    value for a sample size of five?
  • 8:15 - 8:19
    So, that was the question, and I'm going to
    plot it, or maybe here I'll write it
  • 8:19 - 8:20
    out.
  • 8:20 - 8:25
    So, this is the formula, the standard
    error of the mean or the standard
  • 8:25 - 8:28
    deviation of the sampling distribution
    of the sample means is equal to
  • 8:28 - 8:32
    the original population standard deviation
    divided by the square root of n.
  • 8:32 - 8:37
    So, when we had that sample size of five,
    which is this one up here, what we're
  • 8:37 - 8:42
    really looking at is this, the original
    standard deviation was 2.88 and
  • 8:42 - 8:46
    we're going to divide by the square root
    of the sample size which is five, so that
  • 8:46 - 8:48
    equals 1.3.
  • 8:48 - 8:53
    So, the standard deviation here is 1.3, and
    that's what we call the standard error: 1.3.
  • 8:53 - 8:59
    So, what this is saying is that this value here
    is 1.3 higher than, what was it, I forget,
  • 8:59 - 9:04
    I think it was 5.04, the mean of
    the sample means, and so this value here
  • 9:04 - 9:10
    is going to be, not 6.5,
  • 9:10 - 9:15
    it's going to be at 6.34.
  • 9:15 - 9:22
    This is one standard deviation above
    the sample mean but if we have
  • 9:22 - 9:27
    a sample size of fifty, then
    the calculation becomes this.
  • 9:27 - 9:30
    It becomes the original standard deviation
    of the population divided by the square
  • 9:30 - 9:33
    root of 50, which is equal to, and I've
  • 9:33 - 9:36
    written this down so I can check, 0.4.
  • 9:36 - 9:40
    So, back to this graph,
    this value is 0.4,
  • 9:40 - 9:43
    and this value is 1.3.
  • 9:43 - 9:47
    And so, it gets smaller the bigger the
    sample size.
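
As a quick check of those two numbers, using the population standard deviation of 2.88 from earlier:

```python
import math

sigma = 2.88  # population standard deviation from the penguin example
for n in (5, 50):
    se = sigma / math.sqrt(n)
    print(n, round(se, 2))  # n = 5 gives about 1.29 ("1.3"), n = 50 gives about 0.41 ("0.4")
```
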
  • 9:47 - 9:52
    This graph here that I got to previously
    is actually showing us
  • 9:52 - 9:56
    how the standard error changes with
    the sample size.
  • 9:56 - 10:00
    So we just had a sample size of 50,
    which is approximately here.
  • 10:00 - 10:07
    If we go across to this value on this
    axis, it tells us that's about 0.4,
  • 10:07 - 10:12
    sample size of 50, and if we had
    a sample size of 5,
  • 10:12 - 10:16
    which is approximately here --
    I'm doing a line, not very well,
  • 10:16 - 10:20
    but it goes to about there.
    This was about 1.3.
  • 10:20 - 10:24
    And I just want you to, well, there's nothing
    really too much for you to take home
  • 10:24 - 10:27
    from this graph other than showing you
    that as the sample size increases,
  • 10:27 - 10:32
    then for any population
    standard deviation that we have,
  • 10:32 - 10:37
    the standard error is going to get
    much smaller very rapidly.
  • 10:37 - 10:41
    A sample size of 5 is still quite high up
    on this curve,
  • 10:41 - 10:44
    but once you come down to sample sizes
    of 20 or 30 or more,
  • 10:44 - 10:49
    then we get a very, very small
    standard error.
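
The curve being described can be sketched with the same formula over a range of sample sizes (again assuming sigma = 2.88); the standard error drops steeply at first and then flattens out:

```python
import math

for n in (1, 2, 5, 10, 20, 30, 50, 100):
    print(n, round(2.88 / math.sqrt(n), 2))
# 1 -> 2.88, 5 -> 1.29, 20 -> 0.64, 30 -> 0.53, 50 -> 0.41, 100 -> 0.29
```
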
  • 10:51 - 10:57
    This is just to reiterate that point so
    you can see what these are on this graph.
  • 10:57 - 11:00
    So let's put together what we've
    just learned about the standard error
  • 11:00 - 11:05
    with what we have learned previously about
    the Central Limit Theorem.
  • 11:05 - 11:09
    So what we have just been discussing is
    that we have
  • 11:09 - 11:11
    an original population,
    it could be any distribution,
  • 11:11 - 11:13
    here's our uniform distribution.
  • 11:13 - 11:17
    If we take many samples from it,
    we get our sampling distribution.
  • 11:17 - 11:25
    In this case, the sampling distribution
    of the sample means is normally distributed
  • 11:25 - 11:27
    or approximately normally distributed.
  • 11:27 - 11:35
    And we know that the sampling distribution
    has a mean that is approximately equal to
  • 11:35 - 11:40
    the population mean, and we've just learned
    that
  • 11:40 - 11:44
    the standard deviation of this
    approximately normal distribution
  • 11:44 - 11:47
    is the standard error.
  • 11:47 - 11:50
    I'll write here, "standard error."
  • 11:50 - 11:54
    So we can actually write this in
    notation form,
  • 11:54 - 11:57
    and we say that this sampling distribution
    is approximately normal,
  • 11:57 - 12:00
    and that's what this tilde squiggle means,
    approximately normal,
  • 12:00 - 12:08
    and it has a mean
    equal to the population mean,
  • 12:08 - 12:11
    so I'll just write here,
    the mean is the population mean.
  • 12:11 - 12:13
    And the standard deviation of that
    distribution,
  • 12:13 - 12:16
    and we're talking about this distribution
    down here,
  • 12:16 - 12:19
    the standard deviation of that
    distribution is the standard error,
  • 12:19 - 12:21
    that's what we call it.
  • 12:21 - 12:23
    And it's approximately equal to the
    standard deviation of the
  • 12:23 - 12:27
    original population divided by the
    square root of the sample size n.
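
In symbols, what is being described here is, approximately, by the central limit theorem (mu subscript x bar and sigma subscript x bar being the mean and standard deviation, i.e. the standard error, of the sampling distribution):

```latex
\bar{X} \sim N\left(\mu_{\bar{x}},\, \sigma_{\bar{x}}\right),
\qquad \mu_{\bar{x}} \approx \mu,
\qquad \sigma_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}}
```
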
  • 12:27 - 12:35
    So, this is a key thing that we know.
    If we have a population of any,
  • 12:35 - 12:38
    I'll just write "uniform" in here,
    of any type, it could be bimodal,
  • 12:38 - 12:40
    it could be uniform, it could be skewed,
    we know that if we were to take
  • 12:40 - 12:44
    thousands and thousands of samples,
    or just one thousand, or just a few
  • 12:44 - 12:47
    hundred samples, the sample means that
    we get from all those samples
  • 12:47 - 12:50
    are going to approximate
    a normal distribution;
  • 12:50 - 12:53
    if our sample size is larger,
    it's going to approximate
  • 12:53 - 12:57
    a normal distribution even more.
    And we can already determine what the
  • 12:57 - 13:01
    shape of that distribution is going to be
    because we know that the population mean
  • 13:01 - 13:04
    is approximately equal to the mean
    of the sample means,
  • 13:04 - 13:09
    and we know what the standard deviation is,
    this is the standard error;
  • 13:09 - 13:15
    we know that the standard error
    is the standard deviation of the
  • 13:15 - 13:17
    sampling distribution.
  • 13:17 - 13:20
    Okay, so we can work that out.
  • 13:20 - 13:23
    But the thing is, what you're probably
    already thinking is,
  • 13:23 - 13:26
    "why do you care?" And you may not care,
    and that's fine.
  • 13:26 - 13:29
    There's no reason to particularly.
  • 13:29 - 13:34
    But, it can be very, very helpful.
    I'm just going to float this idea
  • 13:34 - 13:38
    and we'll return to it in future videos.
  • 13:38 - 13:43
    Hopefully the question has gone through
    your head: why is this strange person
  • 13:43 - 13:46
    taking thousands of samples all the time?
  • 13:46 - 13:47
    You know, you're not going to go to this
    penguin ice sheet and just keep
  • 13:47 - 13:51
    picking 5 penguins at random
    1,000 times.
  • 13:51 - 13:54
    In science and other settings
    where we collect data,
  • 13:54 - 13:57
    it doesn't work like that.
    We usually just collect
  • 13:57 - 13:59
    one sample of data.
  • 13:59 - 14:03
    And so, when we collect one sample of data
    and this here -- I've got
  • 14:03 - 14:08
    sampling distribution of n = 5 penguins.
  • 14:08 - 14:10
    This is when we did do it 1,000 times.
  • 14:10 - 14:14
    But let's just say that we did it one time
    and we got a value around about here,
  • 14:14 - 14:18
    around about 7 meters,
    that was our sample.
  • 14:18 - 14:20
    We just got one sample.
  • 14:20 - 14:25
    If we just got one sample,
    we don't really know anything about that
  • 14:25 - 14:30
    in terms of how certain or how uncertain
    we are that this truly reflects the population mean.
  • 14:30 - 14:34
    We knew that if we did this many, many times,
    the average of all the sample means
  • 14:34 - 14:37
    would converge on the true
    population mean.
  • 14:37 - 14:38
    And that's our ultimate goal;
  • 14:38 - 14:42
    normally we don't know the population mean,
    and we're trying to estimate it.
  • 14:42 - 14:46
    So in our one sample, we just got this
    value of 7, say.
  • 14:46 - 14:51
    How confident are we that that is
    the population mean?
  • 14:51 - 14:56
    And so, because we know
  • 14:56 - 15:01
    that this value of 7 comes, in theory,
    from
  • 15:01 - 15:04
    a sampling distribution that exists,
  • 15:04 - 15:08
    and that in theory this sampling distribution
    exists with a standard deviation
  • 15:08 - 15:11
    that we call the standard error,
    we're able to understand how far
  • 15:11 - 15:16
    this value of 7, or any value that
    we collected, it could be some other value
  • 15:16 - 15:20
    but our one sample was 7 meters,
    we get a sense of how far away
  • 15:20 - 15:24
    from the mean that is in the units
    of standard deviations
  • 15:24 - 15:26
    or technically,
    with a sampling distribution,
  • 15:26 - 15:27
    standard errors.
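
To make that concrete with the numbers from this example, and treating the exact values as assumptions taken from the slides, a single sample mean of 7 meters with n = 5 sits roughly 1.5 standard errors above the population mean:

```python
import math

sample_mean = 7.0             # the single observed sample mean
mu = 5.04                     # population mean from the penguin example
se = 2.88 / math.sqrt(5)      # standard error for n = 5, about 1.29

z = (sample_mean - mu) / se   # distance from the mean in units of standard errors
print(round(z, 2))            # about 1.52
```
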
  • 15:27 - 15:30
    So we're going to come back to this topic,
    but really the value of the standard error
  • 15:30 - 15:34
    is that it enables us,
    when we collect one sample,
  • 15:34 - 15:40
    to work out how far away
    our value is
  • 15:40 - 15:42
    from the
    population mean,
  • 15:42 - 15:46
    and how confident we are that it is a true
    representation of the population mean.
  • 15:46 - 15:48
    We're going to come back to this
    in future videos.