-
Hello, in this video, I want to talk about
the standard error and this is really
-
extending our understanding of sampling
distributions and the Central Limit Theorem.
-
So, let's talk about what
a standard error is.
-
First of all, we'll go back to
this penguin example and
-
you've seen this distribution before
as a uniform distribution of data.
-
It has, like any distribution, it has--
there's descriptive statistics.
-
So, it has a population mean.
-
The average is 5.04.
-
The average penguin is 5.04 meters
from the edge of the ice sheet.
-
You can calculate a standard
deviation for this.
-
So, the standard deviation is 2.88.
-
So, that's the, you know,
a measure of the spread.
-
And there were 5,000 penguins floating
on this ice sheet, that's the n,
-
the population size.
-
We then discussed how, if you were
just to randomly select either
-
five penguins at a time or 50 penguins
at a time, then for each of those samples
-
(let's pick n equals five for now),
each of those five penguins,
-
you could calculate what the average
distance from the edge of the ice sheet
-
was for each of those individual
samples of five penguins, and if you were
-
to do that over and over and over again
and in this histogram, we did it
-
1,000 times, we would be able to generate
what's called the sampling distribution.
-
And it's the sampling distribution of
the sample means, that's what it is
-
and I told you that we could calculate
from that what the average of
-
those sample means across
the 1,000 samples was and
-
that's this value and the notation that
we use for that is this mu and then
-
subscript x bar and that's the mean
of the sample means and I've forgotten
-
what it was, the exact value, but it's
pretty much going to approximate
-
very, very close.
-
So, I just put approximately equal to
5.05. Just going back, it's 5.04.
-
So, it was-- it's going to approximate
the population average and you can
-
do that for any sample size.
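The repeated-sampling idea described here can be sketched in a few lines of Python. This is only an illustration: the real penguin data is not available, so the population below is a hypothetical stand-in of 5,000 distances drawn uniformly between 0 and 10 meters, and the function name `mean_of_sample_means` is made up for this sketch.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the penguin population: 5,000 distances (meters)
# drawn uniformly between 0 and 10. This range is an assumption.
population = [random.uniform(0, 10) for _ in range(5000)]
population_mean = statistics.mean(population)


def mean_of_sample_means(sample_size, n_samples=1000):
    """Draw n_samples random samples and return the mean of their sample means."""
    sample_means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.mean(sample_means)


mu_xbar_5 = mean_of_sample_means(5)
mu_xbar_50 = mean_of_sample_means(50)

print(f"population mean:               {population_mean:.2f}")
print(f"mean of sample means (n = 5):  {mu_xbar_5:.2f}")
print(f"mean of sample means (n = 50): {mu_xbar_50:.2f}")
```

With either sample size, the mean of the sample means should land close to the population mean, which is the point being made above.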
-
So, that was sample size five,
let's look at the sample size 50.
-
Again, we have the mean of the sampling
distribution-- sorry, the mean of
-
the sample means and that is also going
to be very close to 5.04, it might be
-
a little bit closer because
our sample size is larger.
-
Two other things to notice about
these distributions, number one
-
they're normally distributed, or, sorry,
they approximate normal
-
distributions, despite the fact that
the original distribution of penguins,
-
the population distribution, was
a uniform distribution.
-
Second thing to notice: the sample size
doesn't really affect the value
-
of the mean of the sample means, but it does
affect the standard deviation of
-
the sample means.
-
So, if this is a normal distribution,
or we believe it to approximate,
-
and then also this approximates
a normal distribution, then, it's clear
-
that the distance here, let's just assume
that's a standard deviation and I put it
-
in the right place.
-
This standard deviation, it's greater than
whatever the corresponding value is
-
over here, if that's also
the standard deviation.
-
So, as the sample size gets larger,
the spread of the sample means
-
gets smaller, so, we can say
the standard deviation gets smaller.
-
Now, does this standard deviation have any
relationship at all to the original
-
standard deviation of
the original population?
-
The original standard deviation was 2.88,
so, I'll just say population of
-
the original-- standard deviation was 2.88
-
Is there any relationship at all between
these two standard deviations?
-
Because it's not like the mean of
the sample means, which is pretty
-
much the same, regardless of the sample
size, I mean it does get better with
-
larger samples but it approximates,
it's close, especially if you have
-
enough of these samples.
-
What's the relationship of these standard
deviations because it's clear that when
-
you change n, this value is going to
change, so is there a relationship?
-
And it turns out that there is
a relationship and we're going to
-
look into that.
-
This graph here just shows you that
the normal approximation becomes
-
better and better the larger
the sample size, so, it's a little
-
tricky to see but let me, I just want to
really point out one or two things here.
-
I'm going to pick a color
that represents that.
-
So, this value here, actually in red, so,
if I was just to pick one penguin at
-
a time, a sample size of one, this is
my estimate of the sample-- I'm going
-
for the red line here.
-
That's my estimate of the sample--
sorry, let's say that again.
-
That's the distribution of
the sample means.
-
It looks like the original population.
-
So, for a sample size of one, you don't
get a normal distribution of the sample
-
means, you get whatever
the original population was.
-
Let's look at two and I've got to find it
on here, so, it's the orange one and
-
I believe it's this one here.
-
It is this one here.
-
This is what it looks like.
-
This is the n is two.
-
So, again, not a really
normal distribution.
-
Now, let's skip to 50.
-
This is 50 here and you can see it
really, you don't need me to help
-
you too much.
-
This is the 50 value, it's very normal.
-
And then, we got blue at ten--
sorry, 25 here.
-
This is the 25 one and so on.
-
This is the ten.
-
This is the five.
-
I wanted to just show you this graph
because I wanted to show you that
-
even with very, very, very small
sample sizes of like five, we already
-
get very close to a normal distribution.
-
It's only with ridiculously small
sample sizes, like one or two, that
-
we don't do a very good job.
-
So, even with small sample sizes,
we get close to a normal distribution
-
for the distribution of
the sample means.
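One rough way to check this claim numerically, offered as a sketch rather than a formal normality test: a normal distribution has about 68% of its values within one standard deviation of its mean, while a uniform distribution has only about 58%. The code below assumes a uniform population on 0 to 10 meters (an assumption standing in for the penguin data) and uses a made-up helper, `share_within_one_sd`.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable


def share_within_one_sd(sample_size, n_samples=10000):
    """Share of sample means lying within one standard deviation of their mean."""
    means = []
    for _ in range(n_samples):
        # Assumed population: uniform distances between 0 and 10 meters.
        sample = [random.uniform(0, 10) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    center = statistics.mean(means)
    spread = statistics.pstdev(means)
    inside = sum(1 for m in means if abs(m - center) <= spread)
    return inside / n_samples


shares = {n: share_within_one_sd(n) for n in (1, 2, 5, 50)}
for n, share in shares.items():
    print(f"n = {n:>2}: {share:.3f} of sample means within one standard deviation")
```

For n = 1 the share should sit near the uniform figure, and it should move toward the normal figure as n grows, already getting close by n = 5.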
-
So, back to the problem
I just posed a moment ago.
-
This is our original standard deviation
of a population, this is our population
-
and whenever we get a sample,
and again, this is just the sample
-
size of five.
-
This is the distribution of sample means.
-
The mean is going to approximate the mean
here but what is the relationship of
-
the standard deviation to
this original population?
-
What is the relationship?
-
It must be also related to the sample size
because it changes with the sample size.
-
And it's just a formula and we're not
going to talk too much about--
-
we're not going to talk much really at all
about how it's derived but this formula
-
here, very neatly, just tells us
about their relationship and
-
so, what we have here is this is
our standard deviation of
-
the sampling distribution of
the sample means.
-
So, we call that sigma subscript x bar,
-
sigma x bar.
-
The standard deviation, so just to really
reiterate what we're looking at, this is
-
the distribution of sample means,
this is-- we're looking for this value
-
what's this standard deviation?
-
And actually, technically, that's
the notation, what is that standard
-
deviation?
-
So, what we do is, we just take
the original population.
-
This is the population standard deviation
from the original population and we're
-
going to divide it by the square root of n
and that gives us that this value,
-
this standard deviation.
-
Its technical name is the standard
deviation of the sampling distribution
-
of the sample means, which is an awful
mouthful but we just call
-
it the standard error of
-
the mean. That's what we call it:
the standard error of the mean.
-
So, this graph illustrates how
the standard error of the mean
-
changes by sample size.
-
So, if I just go back to-- maybe,
I'll just go back to this slide here
-
and we were asking the question of,
you know, what's this value for a
-
sample size of 50 compared to this
value for a sample size of five?
-
So, that was the question and I'm going to
plot-- maybe here I'll plot it or write it
-
sorry.
-
So, this is the formula, the standard
error of the mean or the standard
-
deviation of the sampling distribution
of the sample means is equal to
-
the original population standard deviation
divided by the square root of n.
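Written out in the notation used so far, the spoken formula would look like this, where \sigma_{\bar{x}} is the standard error of the mean, \sigma is the population standard deviation, and n is the sample size:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```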
-
So, when we had that sample size of five,
which is this one up here, what we're
-
really looking at is this, the original
standard deviation was 2.88 and
-
we're going to divide by the square root
of the sample size which is five, so that
-
equals 1.3.
-
So, the standard deviation here is 1.3 and
that's what we call the standard error: 1.3.
-
So, what this is saying is this value here
is 1.3 higher than, what was it? I forget.
-
I think it was 5.04 was the mean of
the sample means and so this value here
-
is going to be a 6.5-- nope, nope, not five.
-
It's going to be at 6.34.
-
This is one standard deviation above
the mean of the sample means, but if we have
-
a sample size of fifty, then
the calculation becomes this.
-
Becomes the original standard deviation
of the population divided by the square
-
root of 50, which is equal to and I've
-
written this down so I can check, 0.4.
-
So, back to this graph,
this value is 0.4,
-
and this value is 1.3.
-
And so, it gets smaller the bigger the
sample size.
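The two calculations above can be reproduced with a short sketch. The figures 2.88 and 5.04 are the ones quoted in the video; the function name `standard_error` is just a label chosen for this example.

```python
import math

sigma = 2.88      # population standard deviation quoted in the video
mean_xbar = 5.04  # mean of the sample means quoted in the video


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


se_5 = standard_error(sigma, 5)
se_50 = standard_error(sigma, 50)

print(round(se_5, 2))              # 1.29, rounded to 1.3 in the video
print(round(se_50, 2))             # 0.41, rounded to 0.4 in the video
print(round(mean_xbar + se_5, 2))  # 6.33, one standard error above 5.04
```

The video's 6.34 comes from adding the already rounded 1.3; carrying more decimals gives 6.33, which is the same point on the graph for practical purposes.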
-
This graph here that I got to previously
is actually showing us
-
how the standard error changes by
the sample size.
-
So we just had a sample size of 50,
which is approximately here.
-
If we go across to this value on this
axis, it tells us that's about 0.4,
-
sample size of 50, and if we had
a sample size of 5,
-
which is approximately here --
I'm doing a line, not very well,
-
but it goes to about there.
This was about 1.3.
-
And I just want you to -- there's nothing
really too much for you to take home
-
from this graph other than showing you
that as the sample size increases,
-
for any population
standard deviation that we have,
-
the standard error is going to get
much smaller very rapidly.
-
A sample size of 5 is still quite high up
on this curve,
-
but once you come down to sample sizes
of 20 or 30 or more,
-
then we get a very, very small
standard error.
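The shape of that curve can be tabulated directly. This sketch assumes the same population standard deviation of 2.88 and an arbitrary set of sample sizes chosen for illustration.

```python
import math

sigma = 2.88  # population standard deviation quoted in the video
sample_sizes = [1, 2, 5, 10, 20, 30, 50, 100]  # illustrative choices

# Standard error of the mean for each sample size.
standard_errors = {n: sigma / math.sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:>3}  standard error = {se:.2f}")
```

Because n sits under a square root, the drop is steep at first and then flattens: quadrupling the sample size only halves the standard error.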
-
This is just to reiterate that point so
you can see what these are on this graph.
-
So let's put together what we've
just learned about the standard error
-
with what we have learned previously about
the Central Limit Theorem.
-
So what we have just been discussing is
that we just know that we have
-
an original population,
it could be any distribution,
-
here's our uniform distribution.
-
If we take many samples from it,
we get our sampling distribution.
-
In this case, of the sample means,
is normally distributed
-
or approximately normally distributed.
-
And we know that the sampling distribution
has a mean that is approximately equal to
-
the population mean and we've just learned
that we just know now that
-
the standard deviation of this
approximately normal distribution,
-
this is the standard error.
-
I'll write here, "standard error."
-
So we can actually write this in
notation form,
-
and we say that this sampling distribution
is approximately normal,
-
this is what this tilde squiggle means,
is approximately normal,
-
approximately normal and it has a mean
of the population mean,
-
so I'll just write here,
the mean is the population mean.
-
And the standard deviation of that
distribution,
-
and we're talking about this distribution
down here,
-
the standard deviation of that
distribution is the standard error,
-
that's what we call it.
-
And it's approximately equal to the
standard deviation of the
-
original population divided by the
square root of the sample size n.
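In symbols, the statement just described might be written as follows. Note the assumption about convention: the second entry is the standard deviation (the standard error), matching how it is spoken here, whereas many textbooks put the variance in that position instead.

```latex
\bar{X} \;\dot{\sim}\; N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)
```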
-
So, this is a key thing that we know.
If we have a population of any --
-
I'll just write "uniform" in here,
of any type, it could be bimodal,
-
it could be uniform, it could be skewed,
we know that if we were to take
-
thousands and thousands of samples
or just one thousand -- or just a few
-
hundred samples, the sample means that
we get from all those samples
-
are going to approximate
a normal distribution,
-
and if our sample size is larger,
they're going to approximate
-
a normal distribution even more.
And we can already determine what the
-
shape of that distribution is going to be
because we know that the population mean
-
is approximately equal to the mean
of the sample means,
-
and we know that the standard deviation,
this is the standard error,
-
we know that that, the standard error,
is the standard deviation of the
-
sampling distribution.
-
Okay, so we can work that out.
-
But the thing is, what you're probably
already thinking is,
-
"why do you care?" And you may not care,
and that's fine.
-
There's no reason to particularly.
-
But, it can be very, very helpful.
I'm just going to just float this idea
-
and we'll return to it in future videos.
-
Hopefully it's gone through your head
that why is this strange person
-
taking thousands of samples all the time?
-
You know, you're not going to go to this
penguin ice sheet and just keep
-
picking 5 penguins at random
1,000 times.
-
In science, and at other times
when we collect data,
-
it doesn't work like that.
We pretty much usually only just collect
-
one sample of data.
-
And so, when we collect one sample of data
and this here -- I've got
-
sampling distribution of n = 5 penguins.
-
This is when we did do it 1,000 times.
-
But let's just say that we did it one time
and we got a value around about here,
-
around about 7 meters,
that was our sample.
-
We just got one sample.
-
If we just got one sample,
we don't know anything really about that
-
in terms of how certain or how uncertain
are we that this truly is the population mean.
-
We knew if we did this many, many times
the average of all the sample means
-
would converge on the true
population mean.
-
And that's our ultimate goal,
we're trying to est--
-
normally we don't know the population mean
we're trying to estimate it.
-
So in our one sample, we just got this
value of 7, say.
-
How confident are we that that is
the population mean?
-
And so, what we're able to do is rely on
the fact that we know
-
that this value of 7 does come from,
in theory,
-
a sampling distribution that exists.
-
And in theory, this sampling distribution
exists with a standard deviation
-
that we call the standard error.
We're able to understand how far
-
this value of 7, or any value that
we collected, it could be some other value
-
but our one sample was 7 meters,
we get a sense of how far away
-
from the mean that is in the units
of standard deviations
-
or technically,
with a sampling distribution,
-
standard errors.
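As a small worked sketch of that idea, the distance of the single sample value of 7 meters from the mean can be expressed in standard errors. It uses the population figures quoted in the video (5.04 and 2.88), which are only known here because this is a teaching example; in practice the population mean is what we are trying to estimate.

```python
import math

population_mean = 5.04  # known only because this is a teaching example
sigma = 2.88            # population standard deviation quoted in the video
n = 5                   # sample size
sample_mean = 7.0       # the single sample value used in the video

standard_error = sigma / math.sqrt(n)
distance_in_se = (sample_mean - population_mean) / standard_error

print(round(distance_in_se, 2))  # 1.52 standard errors above the mean
```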
-
So we're going to come back to this topic,
but really the value of the standard error
-
is that it enables us to determine,
when we collect one sample,
-
how far away our value is,
that is, how confident we are in our value:
-
how far away it is from the
population mean,
-
how confident we are that this is a true
representation of the population mean.
-
We're going to come back to this
in future videos.