
Small Sample Size Confidence Intervals

  • 0:01 - 0:03
7 patients' blood pressures
    have been measured after
  • 0:03 - 0:06
    having been given a new
    drug for 3 months.
  • 0:06 - 0:08
    They had blood pressure
    increases of, and they give us
  • 0:08 - 0:11
    seven data points right here--
    who knows, that's in some
  • 0:11 - 0:12
    blood pressure units.
  • 0:12 - 0:17
    Construct a 95% confidence
    interval for the true expected
  • 0:17 - 0:22
    blood pressure increase for all
    patients in a population.
  • 0:22 - 0:25
    So there's some population
    distribution here.
  • 0:25 - 0:27
    It's a reasonable assumption
    to think that it is normal.
  • 0:27 - 0:29
    It's a biological process.
  • 0:29 - 0:33
    So if you gave this drug to
    every person who has ever
  • 0:33 - 0:39
    lived, that will result in some
    mean increase in blood
  • 0:39 - 0:41
    pressure, or who knows, maybe
    it actually will decrease.
  • 0:41 - 0:43
    And there's also going to be
    some standard deviation here.
  • 0:46 - 0:47
    It is a normal distribution.
  • 0:47 - 0:50
    And the reason why it's
    reasonable to assume that it's
  • 0:50 - 0:52
    a normal distribution
    is because it's
  • 0:52 - 0:53
    a biological process.
  • 0:53 - 0:55
    It's going to be the sum of many
    thousands and millions of
  • 0:55 - 0:56
    random events.
  • 0:56 - 0:59
    And things that are sums of
    millions and thousands of
  • 0:59 - 1:02
    random events tend to be
normally distributed.
  • 1:02 - 1:03
    So this is a population
    distribution.
  • 1:08 - 1:11
    And we don't know anything
    really about it outside of the
  • 1:11 - 1:13
    sample that we have here.
  • 1:13 - 1:17
    Now, what we can do is, and this
    tends to be a good thing
  • 1:17 - 1:19
    to do, when you do have a
    sample just figure out
  • 1:19 - 1:21
    everything that you can
    figure out about that
  • 1:21 - 1:22
    sample from the get-go.
  • 1:22 - 1:24
    So we have our seven
    data points.
  • 1:24 - 1:27
    And you could add them up and
    divide by 7 and get your
  • 1:27 - 1:28
    sample mean.
  • 1:28 - 1:34
    So our sample mean
    here is 2.34.
  • 1:34 - 1:35
    And then you can also
    calculate your
  • 1:35 - 1:37
    sample standard deviation.
  • 1:37 - 1:39
Find the squared distance from
    each of these points to your
  • 1:39 - 1:43
    sample mean, add them up, divide
    by n minus 1, because
  • 1:43 - 1:46
    it's a sample, then take the
    square root, and you get your
  • 1:46 - 1:47
    sample standard deviation.
  • 1:47 - 1:50
    I did this ahead of time
    just to save time.
  • 1:50 - 1:53
    Sample standard deviation
    is 1.04.
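The two calculations above can be sketched in Python. The seven data values below are made up for illustration (the transcript only reports the resulting mean of 2.34 and standard deviation of 1.04, not the raw points):

```python
import statistics
from math import sqrt

# Hypothetical blood-pressure increases; the video's seven
# actual values are not listed in the transcript.
x = [1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9]
n = len(x)

mean = sum(x) / n  # sample mean: add them up, divide by n

# Sample standard deviation: squared distances from the mean,
# summed, divided by n - 1 (because it's a sample), square-rooted.
s = sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# statistics.stdev applies the same n - 1 correction.
assert abs(s - statistics.stdev(x)) < 1e-9
```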
  • 1:53 - 1:55
    And when you don't know anything
    about the population
  • 1:55 - 1:57
    distribution, the thing that
    we've been doing from the
  • 1:57 - 2:03
    get-go is estimating that
parameter with our sample
  • 2:03 - 2:05
    standard deviation.
  • 2:05 - 2:08
    So we've been estimating the
    true standard deviation of the
  • 2:08 - 2:12
    population with our sample
    standard deviation.
  • 2:16 - 2:19
    Now in this problem, this exact
    problem, we're going to
  • 2:19 - 2:20
    run into a problem.
  • 2:20 - 2:25
    We're estimating our standard
    deviation with an n of only 7.
  • 2:25 - 2:31
    So this is probably going to
    be a not so good estimate
  • 2:31 - 2:41
    because-- let me just write--
    because n is small.
  • 2:41 - 2:44
    In general, this is considered
    a bad estimate if n
  • 2:44 - 2:46
    is less than 30.
  • 2:46 - 2:48
    Above 30 you're dealing
    in the realm
  • 2:48 - 2:50
    of pretty good estimates.
  • 2:50 - 2:53
    So the whole focus of this video
    is when we think about
  • 2:53 - 2:55
    the sampling distribution, which
    is what we're going to
  • 2:55 - 2:59
    use to generate our interval,
    instead of assuming that the
  • 2:59 - 3:02
    sampling distribution is normal
    like we did in many
  • 3:02 - 3:05
    other videos using the central
    limit theorem and all of that,
  • 3:05 - 3:08
    we're going to tweak the
    sampling distribution.
  • 3:08 - 3:11
    We're not going to assume it's
    a normal distribution because
  • 3:11 - 3:12
    this is a bad estimate.
  • 3:12 - 3:14
    We're going to assume that
    it's something called a
  • 3:14 - 3:16
    t-distribution.
  • 3:16 - 3:18
    And a t-distribution is
    essentially, the best way to
  • 3:18 - 3:23
think about it is that it's almost
    engineered so it gives a
  • 3:23 - 3:25
    better estimate of your
    confidence intervals and all
  • 3:25 - 3:29
    of that when you do have
    a small sample size.
  • 3:29 - 3:31
    It looks very similar to
    a normal distribution.
  • 3:35 - 3:39
    It has some mean, so this is
    your mean of your sampling
  • 3:39 - 3:40
    distribution still.
  • 3:40 - 3:41
    But it also has fatter tails.
  • 3:46 - 3:50
    And the way I think about why
    it has fatter tails is when
  • 3:50 - 3:53
    you make an assumption that this
    is a standard deviation
  • 3:53 - 3:56
    for-- let me take
    one more step.
  • 3:56 - 3:59
    So normally what we do is we
    find the estimate of the true
  • 3:59 - 4:02
    standard deviation, and then
    we say that the standard
  • 4:02 - 4:08
    deviation of the sampling
    distribution is equal to the
  • 4:08 - 4:11
    true standard deviation of our
    population divided by the
  • 4:11 - 4:13
    square root of n.
  • 4:13 - 4:16
    In this case, n is equal to 7.
  • 4:16 - 4:18
    And then we say OK, we never
    know the true standard, or we
  • 4:18 - 4:22
    seldom know-- sometimes you do
    know-- we seldom know the true
  • 4:22 - 4:22
    standard deviation.
  • 4:22 - 4:25
    So if we don't know that the
    best thing we can put in there
  • 4:25 - 4:27
    is our sample standard
    deviation.
  • 4:32 - 4:36
    And this right here, this is the
    whole reason why we don't
  • 4:36 - 4:39
say that this is just a 95%
    probability interval.
  • 4:39 - 4:41
    This is the whole reason why
    we call it a confidence
  • 4:41 - 4:43
    interval because we're making
    some assumptions.
  • 4:43 - 4:47
    This thing is going to change
    from sample to sample.
  • 4:47 - 4:50
    And in particular, this is going
    to be a particularly bad
  • 4:50 - 4:53
    estimate when we have a
    small sample size, a
  • 4:53 - 4:55
    size less than 30.
  • 4:55 - 4:59
    So when you are estimating the
    standard deviation where you
  • 4:59 - 5:01
    don't know it, you're estimating
    it with your sample
  • 5:01 - 5:04
    standard deviation, and your
    sample size is small, and
  • 5:04 - 5:07
    you're going to use this to
    estimate the standard
  • 5:07 - 5:11
    deviation of your sampling
    distribution, you don't assume
  • 5:11 - 5:14
    your sampling distribution
    is a normal distribution.
  • 5:14 - 5:17
    You assume it has
    fatter tails.
  • 5:17 - 5:20
    And it has fatter tails because
    you're essentially
  • 5:20 - 5:22
    underestimating-- you're
    underestimating the standard
  • 5:22 - 5:24
    deviation over here.
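The fatter tails show up directly in the critical values: for a 95% two-sided interval, a normal distribution needs about 1.96 standard deviations, while a t-distribution needs more, with the gap shrinking as the degrees of freedom grow. A small sketch, using standard t-table entries:

```python
from statistics import NormalDist

# Normal 95% two-sided critical value, about 1.96.
z = NormalDist().inv_cdf(0.975)

# Standard t-table values for a 95% two-sided interval,
# keyed by degrees of freedom (n - 1).
t_table = {2: 4.303, 6: 2.447, 10: 2.228, 29: 2.045}

for df, t in sorted(t_table.items()):
    # The ratio t / z shrinks toward 1 as df grows: the fatter
    # tails matter most when the sample is small.
    print(df, t, round(t / z, 2))
```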
  • 5:24 - 5:26
    Anyway, with all of that said,
    let's just actually go through
  • 5:26 - 5:27
    this problem.
  • 5:27 - 5:31
    So we need to think about a 95%
    confidence interval around
  • 5:31 - 5:33
    this mean right over here.
  • 5:33 - 5:37
    So a 95% confidence interval,
    if this was a normal
  • 5:37 - 5:39
    distribution you would just
    look it up in a Z-table.
  • 5:39 - 5:40
But it's not; this is
    a t-distribution.
  • 5:45 - 5:48
    We're looking for a 95%
    confidence interval.
  • 5:48 - 5:51
    So some interval around
    the mean that
  • 5:51 - 5:54
    encapsulates 95% of the area.
  • 5:54 - 5:58
    For a t-distribution you use
a t-table, and I have a t-table
  • 5:58 - 5:59
    ahead of time right over here.
  • 5:59 - 6:03
    And what you want to do is use
    the two-sided row for what
  • 6:03 - 6:04
    we're doing right over here.
  • 6:04 - 6:06
    And the best way to think
    about it is that we're
  • 6:06 - 6:10
    symmetric around the mean.
  • 6:10 - 6:11
    And that's why they
    call it two-sided.
  • 6:11 - 6:13
    It would be one-sided if it
    was kind of a cumulative
  • 6:13 - 6:16
    percentage up to some
    critical threshold.
  • 6:16 - 6:19
    But in this case, it's
    two-sided, we're symmetric.
  • 6:19 - 6:20
    Or another way to think
    about it is we're
  • 6:20 - 6:22
    excluding the two sides.
  • 6:22 - 6:25
    So we want the 95%
    in the middle.
  • 6:25 - 6:33
    And this is a sampling
    distribution of the sample
  • 6:33 - 6:37
    mean for n is equal to 7.
  • 6:37 - 6:39
    And I won't go into the details
    here, but when n is
  • 6:39 - 6:45
    equal to 7 you have 6 degrees
    of freedom, or n minus 1.
  • 6:45 - 6:49
    And the way that t-tables are
    set up, you go and find the
  • 6:49 - 6:50
    degrees of freedom.
  • 6:50 - 6:53
    So you don't go to the n,
    you go to the n minus 1.
  • 6:53 - 6:55
    So you go to the 6 right here.
  • 6:55 - 6:59
    So if you want to encapsulate
    95% of this right over here,
  • 6:59 - 7:04
and you have 6 degrees of
freedom, you have to go 2.447 standard
    have to go 2.447 standard
  • 7:04 - 7:06
    deviations in each direction.
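That 2.447 table entry can be recovered numerically. A sketch that integrates the t density (6 degrees of freedom) with the standard-library `math.gamma` and bisects for the 97.5th percentile; the step counts and bracket are arbitrary choices, not from the video:

```python
from math import gamma, sqrt, pi

def t_cdf(t, df, steps=4000):
    """P(T <= t) for t >= 0: 0.5 plus a midpoint-rule integral of the t density."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    h = t / steps
    area = sum(c * (1 + ((i + 0.5) * h) ** 2 / df) ** (-(df + 1) / 2) * h
               for i in range(steps))
    return 0.5 + area

def t_critical(df, conf=0.95):
    """Two-sided critical value: bisect for t with P(|T| <= t) = conf."""
    target = 0.5 + conf / 2            # 0.975 for a 95% interval
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < target else (lo, mid)
    return (lo + hi) / 2

print(round(t_critical(6), 3))  # matches the table entry: 2.447
```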
  • 7:06 - 7:11
    And this t-table assumes that
    you are approximating that
  • 7:11 - 7:14
    standard deviation using your
    sample standard deviation.
  • 7:14 - 7:18
    So another way to think of it
    you have to go 2.447 of these
  • 7:18 - 7:21
    approximated standard
    deviations.
  • 7:21 - 7:22
Let me write it right here.
  • 7:22 - 7:28
    So you have to go 2.447-- this
    distance right here is 2.447
  • 7:28 - 7:34
    times this approximated
    standard deviation.
  • 7:38 - 7:40
    And sometimes you'll see this
    in some statistics book.
  • 7:40 - 7:42
    This thing right here,
    this exact number,
  • 7:42 - 7:44
    is shown like this.
  • 7:44 - 7:47
    They put a little hat on top of
    the standard deviation to
  • 7:47 - 7:50
    show that it has been
    approximated using the sample
  • 7:50 - 7:51
    standard deviation.
  • 7:51 - 7:53
    So we'll put a little hat over
    here, because frankly, this is
  • 7:53 - 7:56
    the only thing that
    we can calculate.
  • 7:56 - 7:59
    So this is how far you have
    to go in each direction.
  • 7:59 - 8:00
    And we know what
    this value is.
  • 8:00 - 8:02
    We know what the sample
standard deviation is.
  • 8:02 - 8:03
    So let's get our
    calculator out.
  • 8:11 - 8:17
    So we know our sample standard
    deviation is 1.04.
  • 8:17 - 8:19
    And we want to divide that
    by the square root of 7.
  • 8:24 - 8:29
    So we get 0.39.
  • 8:29 - 8:36
    So this right here is 0.39.
  • 8:36 - 8:40
    And so if we want to find
    the distance around this
  • 8:40 - 8:43
    population mean that
    encapsulates 95% of the
  • 8:43 - 8:46
    population or of the sampling
    distribution, we have to
  • 8:46 - 8:51
    multiply 0.39 times 2.447,
    so let's do that.
  • 8:51 - 9:01
    So times 2.447 is
    equal to 0.96.
  • 9:01 - 9:10
    So this is equal to-- so this
    distance right here is 0.96,
  • 9:10 - 9:14
    and then this distance
    right here is 0.96.
  • 9:14 - 9:16
    So if you take a random sample,
    and that's exactly
  • 9:16 - 9:20
    what we did when we found
    these 7 samples.
  • 9:20 - 9:23
    When we took these 7 samples and
    took their mean, that mean
  • 9:23 - 9:26
    can be viewed as a random
    sample from the sampling
  • 9:26 - 9:27
    distribution.
  • 9:27 - 9:31
And so we could say
  • 9:31 - 9:36
    that there's a 95% chance-- and
    we have to actually caveat
  • 9:36 - 9:39
    everything with a confident,
    because we're doing all of
  • 9:39 - 9:41
    these estimations here.
  • 9:41 - 9:44
    So it's not a true precise
    95% chance.
  • 9:44 - 9:48
    We're just confident that
    there's a 95% chance that our
  • 9:48 - 9:52
random sample mean right here, so
  • 9:52 - 9:56
    that 2.34, which we can kind of
    use-- we just picked that
  • 9:56 - 10:00
    2.34 from this distribution
    right here.
  • 10:00 - 10:12
    So there's a 95% chance that
    2.34 is within 0.96 of the
  • 10:12 - 10:16
    true sampling distribution mean,
    which we know is also
  • 10:16 - 10:18
    the same thing as the
    population mean.
  • 10:22 - 10:25
    Or we can just rearrange the
    sentence and say that there is
  • 10:25 - 10:33
    a 95% chance that the mean, the
    true mean, which is the
  • 10:33 - 10:40
    same thing as a sampling
    distribution mean, is within
  • 10:40 - 10:45
    0.96 of our sample
    mean, of 2.34.
  • 10:45 - 10:52
So at the low end, if you
go 2.34
  • 10:52 - 10:56
    minus 0.96-- that's the low
    end of our confidence
  • 10:56 - 10:58
    interval, 1.38.
  • 10:58 - 11:02
    And the high end of our
    confidence interval, 2.34 plus
  • 11:02 - 11:05
    0.96 is equal to 3.3.
  • 11:05 - 11:11
    So our 95% confidence interval
    is from 1.38 to 3.3.
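The whole interval calculation, using only the summary numbers the video works with (sample mean 2.34, sample standard deviation 1.04, n = 7, t-table value 2.447 for 6 degrees of freedom):

```python
from math import sqrt

x_bar = 2.34    # sample mean
s = 1.04        # sample standard deviation
n = 7
t_crit = 2.447  # t-table value: 6 degrees of freedom, 95% two-sided

se = s / sqrt(n)       # estimated standard deviation of the sampling distribution
margin = t_crit * se   # how far to go in each direction

print(round(se, 2), round(margin, 2))                      # 0.39 0.96
print(round(x_bar - margin, 2), round(x_bar + margin, 2))  # 1.38 3.3
```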
Title:
Small Sample Size Confidence Intervals
Video Language:
English
Team:
Khan Academy
Duration:
11:11