
Small Sample Size Confidence Intervals

  • 0:01 - 0:03
7 patients' blood pressures
    have been measured after
  • 0:03 - 0:06
    having been given a new
    drug for 3 months.
  • 0:06 - 0:08
    They had blood pressure
    increases of, and they give us
  • 0:08 - 0:11
    seven data points right here--
    who knows, that's in some
  • 0:11 - 0:12
    blood pressure units.
  • 0:12 - 0:17
    Construct a 95% confidence
    interval for the true expected
  • 0:17 - 0:22
    blood pressure increase for all
    patients in a population.
  • 0:22 - 0:25
    So there's some population
    distribution here.
  • 0:25 - 0:27
    It's a reasonable assumption
    to think that it is normal.
  • 0:27 - 0:29
    It's a biological process.
  • 0:29 - 0:33
    So if you gave this drug to
    every person who has ever
  • 0:33 - 0:39
    lived, that will result in some
    mean increase in blood
  • 0:39 - 0:41
    pressure, or who knows, maybe
    it actually will decrease.
  • 0:41 - 0:43
    And there's also going to be
    some standard deviation here.
  • 0:46 - 0:47
    It is a normal distribution.
  • 0:47 - 0:50
    And the reason why it's
    reasonable to assume that it's
  • 0:50 - 0:52
    a normal distribution
    is because it's
  • 0:52 - 0:53
    a biological process.
  • 0:53 - 0:55
    It's going to be the sum of many
    thousands and millions of
  • 0:55 - 0:56
    random events.
  • 0:56 - 0:59
    And things that are sums of
    millions and thousands of
  • 0:59 - 1:02
    random events tend to be
normally distributed.
  • 1:02 - 1:03
    So this is a population
    distribution.
  • 1:08 - 1:11
    And we don't know anything
    really about it outside of the
  • 1:11 - 1:13
    sample that we have here.
  • 1:13 - 1:17
    Now, what we can do is, and this
    tends to be a good thing
  • 1:17 - 1:19
    to do, when you do have a
    sample just figure out
  • 1:19 - 1:21
    everything that you can
    figure out about that
  • 1:21 - 1:22
    sample from the get-go.
  • 1:22 - 1:24
    So we have our seven
    data points.
  • 1:24 - 1:27
    And you could add them up and
    divide by 7 and get your
  • 1:27 - 1:28
    sample mean.
  • 1:28 - 1:34
    So our sample mean
    here is 2.34.
  • 1:34 - 1:35
    And then you can also
    calculate your
  • 1:35 - 1:37
    sample standard deviation.
  • 1:37 - 1:39
Find the squared distance from
    each of these points to your
  • 1:39 - 1:43
    sample mean, add them up, divide
    by n minus 1, because
  • 1:43 - 1:46
    it's a sample, then take the
    square root, and you get your
  • 1:46 - 1:47
    sample standard deviation.
  • 1:47 - 1:50
    I did this ahead of time
    just to save time.
  • 1:50 - 1:53
    Sample standard deviation
    is 1.04.
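The two calculations above can be sketched in Python. The seven data values below are made up for illustration (the transcript only reports the resulting mean of 2.34 and standard deviation of 1.04, not the raw points):

```python
import statistics
from math import sqrt

# Hypothetical blood-pressure increases; the video's seven
# actual values are not listed in the transcript.
x = [1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9]
n = len(x)

mean = sum(x) / n  # sample mean: add them up, divide by n

# Sample standard deviation: squared distances from the mean,
# summed, divided by n - 1 (because it's a sample), square-rooted.
s = sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# statistics.stdev applies the same n - 1 correction.
assert abs(s - statistics.stdev(x)) < 1e-9
```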
  • 1:53 - 1:55
    And when you don't know anything
    about the population
  • 1:55 - 1:57
    distribution, the thing that
    we've been doing from the
  • 1:57 - 2:03
    get-go is estimating that
parameter with our sample
  • 2:03 - 2:05
    standard deviation.
  • 2:05 - 2:08
    So we've been estimating the
    true standard deviation of the
  • 2:08 - 2:12
    population with our sample
    standard deviation.
  • 2:16 - 2:19
    Now in this problem, this exact
    problem, we're going to
  • 2:19 - 2:20
    run into a problem.
  • 2:20 - 2:25
    We're estimating our standard
    deviation with an n of only 7.
  • 2:25 - 2:31
    So this is probably going to
    be a not so good estimate
  • 2:31 - 2:41
    because-- let me just write--
    because n is small.
  • 2:41 - 2:44
    In general, this is considered
    a bad estimate if n
  • 2:44 - 2:46
    is less than 30.
  • 2:46 - 2:48
    Above 30 you're dealing
    in the realm
  • 2:48 - 2:50
    of pretty good estimates.
  • 2:50 - 2:53
    So the whole focus of this video
    is when we think about
  • 2:53 - 2:55
    the sampling distribution, which
    is what we're going to
  • 2:55 - 2:59
    use to generate our interval,
    instead of assuming that the
  • 2:59 - 3:02
    sampling distribution is normal
    like we did in many
  • 3:02 - 3:05
    other videos using the central
    limit theorem and all of that,
  • 3:05 - 3:08
    we're going to tweak the
    sampling distribution.
  • 3:08 - 3:11
    We're not going to assume it's
    a normal distribution because
  • 3:11 - 3:12
    this is a bad estimate.
  • 3:12 - 3:14
    We're going to assume that
    it's something called a
  • 3:14 - 3:16
    t-distribution.
  • 3:16 - 3:18
    And a t-distribution is
    essentially, the best way to
  • 3:18 - 3:23
think about it is that it's almost
    engineered so it gives a
  • 3:23 - 3:25
    better estimate of your
    confidence intervals and all
  • 3:25 - 3:29
    of that when you do have
    a small sample size.
  • 3:29 - 3:31
    It looks very similar to
    a normal distribution.
  • 3:35 - 3:39
    It has some mean, so this is
    your mean of your sampling
  • 3:39 - 3:40
    distribution still.
  • 3:40 - 3:41
    But it also has fatter tails.
  • 3:46 - 3:50
    And the way I think about why
    it has fatter tails is when
  • 3:50 - 3:53
    you make an assumption that this
    is a standard deviation
  • 3:53 - 3:56
    for-- let me take
    one more step.
  • 3:56 - 3:59
    So normally what we do is we
    find the estimate of the true
  • 3:59 - 4:02
    standard deviation, and then
    we say that the standard
  • 4:02 - 4:08
    deviation of the sampling
    distribution is equal to the
  • 4:08 - 4:11
    true standard deviation of our
    population divided by the
  • 4:11 - 4:13
    square root of n.
  • 4:13 - 4:16
    In this case, n is equal to 7.
  • 4:16 - 4:18
    And then we say OK, we never
    know the true standard, or we
  • 4:18 - 4:22
    seldom know-- sometimes you do
    know-- we seldom know the true
  • 4:22 - 4:22
    standard deviation.
  • 4:22 - 4:25
    So if we don't know that the
    best thing we can put in there
  • 4:25 - 4:27
    is our sample standard
    deviation.
  • 4:32 - 4:36
    And this right here, this is the
    whole reason why we don't
  • 4:36 - 4:39
say that this is just a 95%
    probability interval.
  • 4:39 - 4:41
    This is the whole reason why
    we call it a confidence
  • 4:41 - 4:43
    interval because we're making
    some assumptions.
  • 4:43 - 4:47
    This thing is going to change
    from sample to sample.
  • 4:47 - 4:50
    And in particular, this is going
    to be a particularly bad
  • 4:50 - 4:53
    estimate when we have a
    small sample size, a
  • 4:53 - 4:55
    size less than 30.
  • 4:55 - 4:59
    So when you are estimating the
    standard deviation where you
  • 4:59 - 5:01
    don't know it, you're estimating
    it with your sample
  • 5:01 - 5:04
    standard deviation, and your
    sample size is small, and
  • 5:04 - 5:07
    you're going to use this to
    estimate the standard
  • 5:07 - 5:11
    deviation of your sampling
    distribution, you don't assume
  • 5:11 - 5:14
    your sampling distribution
    is a normal distribution.
  • 5:14 - 5:17
    You assume it has
    fatter tails.
  • 5:17 - 5:20
    And it has fatter tails because
    you're essentially
  • 5:20 - 5:22
    underestimating-- you're
    underestimating the standard
  • 5:22 - 5:24
    deviation over here.
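The fatter tails show up directly in the critical values: for a 95% two-sided interval, a normal distribution needs about 1.96 standard deviations, while a t-distribution needs more, with the gap shrinking as the degrees of freedom grow. A small sketch, using standard t-table entries:

```python
from statistics import NormalDist

# Normal 95% two-sided critical value, about 1.96.
z = NormalDist().inv_cdf(0.975)

# Standard t-table values for a 95% two-sided interval,
# keyed by degrees of freedom (n - 1).
t_table = {2: 4.303, 6: 2.447, 10: 2.228, 29: 2.045}

for df, t in sorted(t_table.items()):
    # The ratio t / z shrinks toward 1 as df grows: the fatter
    # tails matter most when the sample is small.
    print(df, t, round(t / z, 2))
```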
  • 5:24 - 5:26
    Anyway, with all of that said,
    let's just actually go through
  • 5:26 - 5:27
    this problem.
  • 5:27 - 5:31
    So we need to think about a 95%
    confidence interval around
  • 5:31 - 5:33
    this mean right over here.
  • 5:33 - 5:37
    So a 95% confidence interval,
    if this was a normal
  • 5:37 - 5:39
    distribution you would just
    look it up in a Z-table.
  • 5:39 - 5:40
But it's not; this is
    a t-distribution.
  • 5:45 - 5:48
    We're looking for a 95%
    confidence interval.
  • 5:48 - 5:51
    So some interval around
    the mean that
  • 5:51 - 5:54
    encapsulates 95% of the area.
  • 5:54 - 5:58
    For a t-distribution you use
a t-table, and I have a t-table
  • 5:58 - 5:59
    ahead of time right over here.
  • 5:59 - 6:03
    And what you want to do is use
    the two-sided row for what
  • 6:03 - 6:04
    we're doing right over here.
  • 6:04 - 6:06
    And the best way to think
    about it is that we're
  • 6:06 - 6:10
    symmetric around the mean.
  • 6:10 - 6:11
    And that's why they
    call it two-sided.
  • 6:11 - 6:13
    It would be one-sided if it
    was kind of a cumulative
  • 6:13 - 6:16
    percentage up to some
    critical threshold.
  • 6:16 - 6:19
    But in this case, it's
    two-sided, we're symmetric.
  • 6:19 - 6:20
    Or another way to think
    about it is we're
  • 6:20 - 6:22
    excluding the two sides.
  • 6:22 - 6:25
    So we want the 95%
    in the middle.
  • 6:25 - 6:33
    And this is a sampling
    distribution of the sample
  • 6:33 - 6:37
    mean for n is equal to 7.
  • 6:37 - 6:39
    And I won't go into the details
    here, but when n is
  • 6:39 - 6:45
    equal to 7 you have 6 degrees
    of freedom, or n minus 1.
  • 6:45 - 6:49
    And the way that t-tables are
    set up, you go and find the
  • 6:49 - 6:50
    degrees of freedom.
  • 6:50 - 6:53
    So you don't go to the n,
    you go to the n minus 1.
  • 6:53 - 6:55
    So you go to the 6 right here.
  • 6:55 - 6:59
    So if you want to encapsulate
    95% of this right over here,
  • 6:59 - 7:04
and you have 6 degrees of
freedom, you have to go 2.447 standard
    have to go 2.447 standard
  • 7:04 - 7:06
    deviations in each direction.
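That 2.447 table entry can be recovered numerically. A sketch that integrates the t density (6 degrees of freedom) with the standard-library `math.gamma` and bisects for the 97.5th percentile; the step counts and bracket are arbitrary choices, not from the video:

```python
from math import gamma, sqrt, pi

def t_cdf(t, df, steps=4000):
    """P(T <= t) for t >= 0: 0.5 plus a midpoint-rule integral of the t density."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    h = t / steps
    area = sum(c * (1 + ((i + 0.5) * h) ** 2 / df) ** (-(df + 1) / 2) * h
               for i in range(steps))
    return 0.5 + area

def t_critical(df, conf=0.95):
    """Two-sided critical value: bisect for t with P(|T| <= t) = conf."""
    target = 0.5 + conf / 2            # 0.975 for a 95% interval
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < target else (lo, mid)
    return (lo + hi) / 2

print(round(t_critical(6), 3))  # matches the table entry: 2.447
```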
  • 7:06 - 7:11
    And this t-table assumes that
    you are approximating that
  • 7:11 - 7:14
    standard deviation using your
    sample standard deviation.
  • 7:14 - 7:18
    So another way to think of it
    you have to go 2.447 of these
  • 7:18 - 7:21
    approximated standard
    deviations.
  • 7:21 - 7:22
Let me write it right here.
  • 7:22 - 7:28
    So you have to go 2.447-- this
    distance right here is 2.447
  • 7:28 - 7:34
    times this approximated
    standard deviation.
  • 7:38 - 7:40
    And sometimes you'll see this
    in some statistics book.
  • 7:40 - 7:42
    This thing right here,
    this exact number,
  • 7:42 - 7:44
    is shown like this.
  • 7:44 - 7:47
    They put a little hat on top of
    the standard deviation to
  • 7:47 - 7:50
    show that it has been
    approximated using the sample
  • 7:50 - 7:51
    standard deviation.
  • 7:51 - 7:53
    So we'll put a little hat over
    here, because frankly, this is
  • 7:53 - 7:56
    the only thing that
    we can calculate.
  • 7:56 - 7:59
    So this is how far you have
    to go in each direction.
  • 7:59 - 8:00
    And we know what
    this value is.
  • 8:00 - 8:02
    We know what the sample
standard deviation is.
  • 8:02 - 8:03
    So let's get our
    calculator out.
  • 8:11 - 8:17
    So we know our sample standard
    deviation is 1.04.
  • 8:17 - 8:19
    And we want to divide that
    by the square root of 7.
  • 8:24 - 8:29
    So we get 0.39.
  • 8:29 - 8:36
    So this right here is 0.39.
  • 8:36 - 8:40
    And so if we want to find
    the distance around this
  • 8:40 - 8:43
    population mean that
    encapsulates 95% of the
  • 8:43 - 8:46
    population or of the sampling
    distribution, we have to
  • 8:46 - 8:51
    multiply 0.39 times 2.447,
    so let's do that.
  • 8:51 - 9:01
    So times 2.447 is
    equal to 0.96.
  • 9:01 - 9:10
    So this is equal to-- so this
    distance right here is 0.96,
  • 9:10 - 9:14
    and then this distance
    right here is 0.96.
  • 9:14 - 9:16
    So if you take a random sample,
    and that's exactly
  • 9:16 - 9:20
    what we did when we found
    these 7 samples.
  • 9:20 - 9:23
    When we took these 7 samples and
    took their mean, that mean
  • 9:23 - 9:26
    can be viewed as a random
    sample from the sampling
  • 9:26 - 9:27
    distribution.
  • 9:27 - 9:31
And so we could say
  • 9:31 - 9:36
    that there's a 95% chance-- and
    we have to actually caveat
  • 9:36 - 9:39
    everything with a confident,
    because we're doing all of
  • 9:39 - 9:41
    these estimations here.
  • 9:41 - 9:44
    So it's not a true precise
    95% chance.
  • 9:44 - 9:48
    We're just confident that
    there's a 95% chance that our
  • 9:48 - 9:52
random sample mean right here, so
  • 9:52 - 9:56
    that 2.34, which we can kind of
    use-- we just picked that
  • 9:56 - 10:00
    2.34 from this distribution
    right here.
  • 10:00 - 10:12
    So there's a 95% chance that
    2.34 is within 0.96 of the
  • 10:12 - 10:16
    true sampling distribution mean,
    which we know is also
  • 10:16 - 10:18
    the same thing as the
    population mean.
  • 10:22 - 10:25
    Or we can just rearrange the
    sentence and say that there is
  • 10:25 - 10:33
    a 95% chance that the mean, the
    true mean, which is the
  • 10:33 - 10:40
    same thing as a sampling
    distribution mean, is within
  • 10:40 - 10:45
    0.96 of our sample
    mean, of 2.34.
  • 10:45 - 10:52
So at the low end, if you
go 2.34
  • 10:52 - 10:56
    minus 0.96-- that's the low
    end of our confidence
  • 10:56 - 10:58
    interval, 1.38.
  • 10:58 - 11:02
    And the high end of our
    confidence interval, 2.34 plus
  • 11:02 - 11:05
    0.96 is equal to 3.3.
  • 11:05 - 11:11
    So our 95% confidence interval
    is from 1.38 to 3.3.
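The whole interval calculation, using only the summary numbers the video works with (sample mean 2.34, sample standard deviation 1.04, n = 7, t-table value 2.447 for 6 degrees of freedom):

```python
from math import sqrt

x_bar = 2.34    # sample mean
s = 1.04        # sample standard deviation
n = 7
t_crit = 2.447  # t-table value: 6 degrees of freedom, 95% two-sided

se = s / sqrt(n)       # estimated standard deviation of the sampling distribution
margin = t_crit * se   # how far to go in each direction

print(round(se, 2), round(margin, 2))                      # 0.39 0.96
print(round(x_bar - margin, 2), round(x_bar + margin, 2))  # 1.38 3.3
```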
Title:
Small Sample Size Confidence Intervals
Video Language:
English
Team:
Khan Academy
Duration:
11:11