-
-
We've seen in the last several
videos you start off with
-
any crazy distribution.
-
It doesn't have to be
crazy, it could be a nice
-
normal distribution.
-
But to really make the point
that you don't have to have
-
a normal distribution I
like to use crazy ones.
-
So let's say you have some kind
of crazy distribution that
-
looks something like that.
-
It could look like anything.
-
So we've seen multiple times
you take samples from
-
this crazy distribution.
-
So let's say you were to take
samples of n is equal to 10.
-
So we take 10 instances of this
random variable, average them
-
out, and then plot our average.
-
We plot our average.
-
We get 1 instance there.
-
We keep doing that.
-
We do that again.
-
We take 10 samples from this
random variable, average
-
them, plot them again.
-
You plot again and eventually
you do this a gazillion times--
-
in theory an infinite number of
times-- and you're going to
-
approach the sampling
distribution of the sample
-
mean. With n equal to 10 it's not going
to be a perfect normal
-
distribution but it's
going to be close.
-
It'd be perfect only
if n was infinity.
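The procedure just described can be sketched in a few lines of Python. This is only an illustration: the exponential distribution with mean 5 is an assumed stand-in for the "crazy" distribution on screen, and the names `draw_once` and `sample_means` are made up for this sketch.

```python
import random
import statistics

def draw_once(rng):
    # Hypothetical stand-in for the "crazy" distribution: an exponential
    # distribution with mean 5 (lopsided, clearly not normal).
    return rng.expovariate(1 / 5)

def sample_means(n, trials, rng):
    # Each trial: take n draws, average them, and record that one average.
    means = []
    for _ in range(trials):
        draws = [draw_once(rng) for _ in range(n)]
        means.append(statistics.fmean(draws))
    return means

rng = random.Random(0)  # fixed seed so the run is repeatable
means_n10 = sample_means(n=10, trials=10_000, rng=rng)

print("number of plotted averages:", len(means_n10))
print("mean of the averages:", round(statistics.fmean(means_n10), 2))
```

A histogram of `means_n10` would be the frequency plot of averages; the list only approximates the sampling distribution of the sample mean because the number of trials is finite.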
-
But let's say that eventually,
across all of our samples, we get
-
a lot of averages that stack up
here and stack up
-
there, and eventually we'll
approach something that
-
looks something like that.
-
And we've seen from the last
video that one-- if let's say
-
we were to do it again and this
time let's say that n is equal
-
to 20-- one, the distribution
that we get is going
-
to be more normal.
-
And maybe in future videos
we'll delve even deeper into
-
things like kurtosis and skew.
-
But it's going to
be more normal.
-
But even more important here or
I guess even more obviously
-
to us, we saw that in the
experiment it's going to have
-
a lower standard deviation.
-
So they're all going to
have the same mean.
-
Let's say the mean here is,
I don't know, let's say
-
the mean here is 5.
-
Then the mean here is
also going to be 5.
-
The mean of our sampling
distribution of the sample
-
mean is going to be 5.
-
It doesn't matter
what our n is.
-
If our n is 20 it's
still going to be 5.
-
But our standard deviation
is going to be less than
-
in either of these scenarios.
-
And we saw that just
by experimenting.
-
It might look like this.
-
It's going to be more normal
but it's going to have a
-
tighter standard deviation.
-
So maybe it'll look like that.
-
And if we did it with an even
larger sample size-- let me do
-
that in a different color-- if
we did that with an even larger
-
sample size, n is equal to 100,
what we're going to get is
-
something that fits the normal
distribution even better.
-
We take a hundred instances
of this random variable,
-
average them, plot it.
-
A hundred instances of
this random variable,
-
average them, plot it.
-
And we just keep doing that.
-
If we keep doing that, what
we're going to have is
-
something that's even more
normal than either of these.
-
So it's going to be a much
closer fit to a true
-
normal distribution.
-
But even more obvious to
the human, it's going
-
to be even tighter.
-
So it's going to be a very
low standard deviation.
-
It's going to look
something like that.
-
And I'll show you on the
simulation app in the next or
-
probably later in this video.
-
So two things happen.
-
As you increase your sample
size for every time you
-
do the average, two
things are happening.
-
You're becoming more normal
and your standard deviation
-
is getting smaller.
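Both effects can be checked with a rough sketch like the one below. It assumes an exponential distribution with mean 5 as the original distribution (any lopsided distribution would do) and uses a simple hand-rolled `skewness` helper as a crude measure of how far from symmetric the pile of averages is.

```python
import random
import statistics

def skewness(values):
    # Average cubed z-score: 0 for a symmetric shape, positive for a right tail.
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(1)  # fixed seed so the run is repeatable
summary = {}
for n in (10, 20, 100):
    # 5,000 trials; each trial averages n draws from the assumed distribution.
    means = [
        statistics.fmean([rng.expovariate(1 / 5) for _ in range(n)])
        for _ in range(5_000)
    ]
    summary[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.pstdev(means),
        "skew": skewness(means),
    }
    print(n, {k: round(v, 3) for k, v in summary[n].items()})
```

The mean stays near 5 for every n, while the standard deviation and the skewness both shrink as n grows.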
-
So the question might
arise: is there a formula?
-
So if I know the standard
deviation-- so this is my
-
standard deviation of just my
original probability density
-
function, this is the mean of
my original probability
-
density function.
-
So if I know the standard
deviation and I know n-- n is
-
going to change depending on
how many samples I'm taking
-
every time I do a sample mean--
if I know that my standard
-
deviation, or maybe if I
know my variance, right?
-
The variance is just the
standard deviation squared.
-
If you don't remember
that you might want to
-
review those videos.
-
But if I know the variance of
my original distribution and if
-
I know what my n is-- how many
samples I'm going to take every
-
time before I average them in
order to plot one thing in my
-
sampling distribution of my
sample mean-- is there a way to
-
predict what the mean of
these distributions are?
-
And so-- I'm sorry, the
standard deviation of
-
these distributions.
-
And so you don't get confused
between that and that,
-
let me say the variance.
-
If you know the variance
you can figure out the
-
standard deviation.
-
One is just the square
root of the other.
-
So this is the variance of
our original distribution.
-
Now to show that this is the
variance of our sampling
-
distribution of our sample mean
we'll write it right here.
-
This is the variance of
our sample mean.
-
Remember the sample--
our true mean is this.
-
The Greek letter Mu
is our true mean.
-
This is equal to the mean,
while an x with a line over
-
it means sample mean.
-
-
So here what we're saying is
this is the variance of our
-
sample mean, and this is going
to be a true distribution.
-
This isn't an estimate.
-
There's some-- you know, if we
magically knew the distribution--
-
there's some true
variance here.
-
And of course the mean-- so
this has a mean-- this right
-
here, we can just get our
notation right, this is the
-
mean of the sampling
distribution of the
-
sample mean.
-
So this is the mean
of our means.
-
It just happens to
be the same thing.
-
This is the mean of
our sample means.
-
It's going to be the same thing
as that, especially if we do
-
the trial over and over again.
-
But anyway, the point of this
video: is there any way to
-
figure out this variance given
the variance of the original
-
distribution and your n?
-
And it turns out there is.
-
And I'm not going to
do a proof here.
-
I really want to give you
the intuition of it.
-
I think you already do have the
sense that every trial you
-
take-- if you take a hundred,
you're much more likely when
-
you average those out, to get
close to the true mean than if
-
you took an n of
2 or an n of 5.
-
You're just very unlikely to be
far away, right, if you took
-
100 trials as opposed
to taking 5.
-
So I think you know that
in some way it should be
-
inversely related to n.
-
The larger your n, the smaller
the standard deviation.
-
And actually it turns out it's
about as simple as possible.
-
It's one of those magical
things about mathematics.
-
And I'll prove it
to you one day.
-
I want to give you
working knowledge first.
-
In statistics, I'm always
struggling whether I should be
-
formal in giving you rigorous
proofs but I've kind of come to
-
the conclusion that it's more
important to get the working
-
knowledge first in statistics
and then later, once you've
-
gotten all of that down, we can
get into the real deep math
-
of it and prove it to you.
-
But I think experimental proofs
are kind of all you need for
-
right now, using those
simulations to show that
-
they're really true.
-
So it turns out that the
variance of your sampling
-
distribution of your sample
mean is equal to the
-
variance of your original
distribution-- that guy
-
right there-- divided by n.
-
That's all it is.
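In symbols, the rule just stated can be written as below. The notation is an assumption for this sketch: sigma squared for the variance of the original distribution, x with a line over it for the sample mean, and n for the number of samples averaged each time.

```latex
\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n}
```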
-
So if this up here has a
variance of-- let's say this up
-
here has a variance of 20-- I'm
just making that number up--
-
then let's say your n is 20.
-
Then the variance of your
sampling distribution of your
-
sample mean for an n of 20,
well you're just going to take
-
that, the variance up here--
your variance is 20--
-
divided by your n, 20.
-
So here your variance is
going to be 20 divided by
-
20 which is equal to 1.
-
This is the variance of
your original probability
-
distribution and
this is your n.
-
What's your standard
deviation going to be?
-
Well, it's going to be the
square root of that, right?
-
Standard deviation is going
to be square root of 1.
-
Well that's also going to be 1.
-
So we could also write this.
-
We could take the square root
of both sides of this. And
-
the standard
deviation of the sampling
-
distribution of the sample mean
is often called the standard
-
deviation of the mean.
-
And it's also called-- I'm
going to write this down-- the
-
standard error of the mean.
-
-
All of these things that I just
mentioned, they all just mean
-
the standard deviation of the
sampling distribution
-
of the sample mean.
-
That's why this is confusing
because you use the word mean
-
and sample over and over again.
-
And if it confuses
you let me know.
-
I'll do another video or pause
and repeat or whatever.
-
But if we just take the square
root of both sides, the
-
standard error of the mean or
the standard deviation of the
-
sampling distribution of the
sample mean is equal to the
-
standard deviation of your
original function-- of your
-
original probability density
function-- which could be very
-
non-normal, divided by
the square root of n.
-
I just took the square root of
both sides of this equation.
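Written out, the square-root step looks like this. The symbols are assumed notation: sigma for the standard deviation of the original distribution, x with a line over it for the sample mean, and n for the number of samples in each average.

```latex
\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n}
\quad\Longrightarrow\quad
\sigma_{\bar{x}} = \sqrt{\frac{\sigma^{2}}{n}} = \frac{\sigma}{\sqrt{n}}
```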
-
I personally like to remember
this: that the variance is just
-
inversely proportional to n.
-
And then I like to
go back to this.
-
Because this is very
simple in my head.
-
You just take the
variance, divide it by n.
-
Oh and if I want the standard
deviation, I just take the
-
square roots of both sides
and I get this formula.
-
So here the standard
deviation-- when n is 20-- the
-
standard deviation of the
sampling distribution of the
-
sample mean is going to be 1.
-
Now here, look at our
variance when
-
n is equal to 100.
-
So the variance of the sampling
distribution of the sample mean,
-
or our variance of the mean--
of the sample mean, we
-
could say-- is going to be
equal to 20-- this guy's
-
variance-- divided by n.
-
So it equals-- n is
100-- so it equals 1/5.
-
Now this guy's standard
deviation or the standard
-
deviation of the sampling
distribution of the sample mean
-
or the standard error of the
mean is going to be the
-
square root of that.
-
So 1 over the square root of 5.
-
And so this guy's standard
deviation will be a little bit
-
under 1/2, while
this guy had a standard
-
deviation of 1.
-
So you see, it's
definitely thinner.
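The two worked cases can be reproduced with a small sketch. The function names are made up for illustration; the variance of 20 is the made-up number from the example.

```python
import math

def variance_of_sample_mean(population_variance, n):
    # Variance of the sampling distribution of the sample mean.
    return population_variance / n

def standard_error(population_variance, n):
    # Standard error of the mean: square root of the variance above.
    return math.sqrt(variance_of_sample_mean(population_variance, n))

for n in (20, 100):
    print(
        f"n={n}: variance={variance_of_sample_mean(20, n):.2f}, "
        f"standard error={standard_error(20, n):.3f}"
    )
```

For n of 20 both numbers come out to 1, and for n of 100 the variance is 1/5 and the standard error is about 0.447, a little under 1/2.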
-
Now I know what you're saying.
-
Well, Sal, you just gave
a formula, I don't
-
necessarily believe you.
-
Well let's see if we can
prove it to ourselves
-
using the simulation.
-
So just for fun let me make
a-- I'll just mess with this
-
distribution a little bit.
-
So that's my new distribution.
-
And let me take an n of-- let
me take two things that's easy
-
to take the square root of
because we're looking at
-
standard deviations.
-
So we take an n of
16 and an n of 25.
-
Let's do 10,000 trials.
-
So in this case every one of
the trials we're going to take
-
16 samples from here, average
them, plot it here, and
-
then do a frequency plot.
-
Here we're going to do 25 at a
time and then average them.
-
I'll do it once animated
just to remember.
-
So I'm taking 16
samples, plot it there.
-
I take 16 samples as described
by this probability density
-
function-- or 25 now,
plot it down here.
-
Now if I do that 10,000
times, what do I get?
-
All right, so here, just
visually you can tell just when
-
n was larger, the standard
deviation here is smaller.
-
This is more squeezed together.
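A rough stand-in for this experiment is sketched below. The values and weights are invented for illustration and are not the distribution drawn in the app, so the numbers will differ from the ones on screen; the point is only to compare the simulated spread with the original standard deviation divided by the square root of n.

```python
import math
import random
import statistics

# Hypothetical lopsided discrete distribution (an assumption, not the app's).
values = [1, 2, 3, 15, 28, 30]
weights = [5, 3, 1, 1, 3, 6]

# Exact mean and standard deviation of that distribution.
total = sum(weights)
mu = sum(v * w for v, w in zip(values, weights)) / total
sigma = math.sqrt(sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total)

rng = random.Random(42)  # fixed seed so the run is repeatable

def simulated_standard_error(n, trials=10_000):
    # Each trial draws n values, averages them, and keeps the average.
    means = [
        statistics.fmean(rng.choices(values, weights=weights, k=n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

comparison = {}
for n in (16, 25):
    comparison[n] = (simulated_standard_error(n), sigma / math.sqrt(n))
    print(n, "simulated:", round(comparison[n][0], 3),
          "predicted:", round(comparison[n][1], 3))
```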
-
But actually let's
write this stuff down.
-
Let's see if I can
remember it here.
-
So in this random distribution
I made my standard
-
deviation was 9.3.
-
I'm going to remember these.
-
Our standard deviation for
the original thing was 9.3.
-
And so standard deviation here
was 2.33 and the standard
-
deviation here is 1.87.
-
Let's see if it conforms
to our formula.
-
So I'm going to take this off
screen for a second and I'm
-
going to go back and
do some mathematics.
-
So I have this on my
other screen so I can
-
remember those numbers.
-
So in the trial we just did,
my wacky distribution had a
-
standard deviation of 9.3.
-
When n is equal to-- let me do
this in another color-- when n
-
was equal to 16, just doing the
experiment, doing a bunch of
-
trials and averaging and doing
all the things, we got the
-
standard deviation of the
sampling distribution of the
-
sample mean or the standard
error of the mean, we
-
experimentally determined
it to be 2.33.
-
And then when n is equal to 25
we got the standard error of
-
the mean being equal to 1.87.
-
Let's see if it conforms
to our formulas.
-
So we know that the variance or
we could almost say the
-
variance of the mean or the
standard error-- the variance
-
of the sampling distribution of
the sample mean is equal to the
-
variance of our original
distribution divided by n, take
-
the square roots of both sides,
and then you get the standard
-
error of the mean is equal to
the standard deviation of your
-
original distribution divided
by the square root of n.
-
So let's see if this works
out for these two things.
-
So if I were to take 9.3--
so let me do this case.
-
So 9.3 divided by the
square root of 16, right?
-
N is 16.
-
So divided by the square
root of 16, which is
-
4, what do I get?
-
So 9.3 divided by 4.
-
Let me get a little
calculator out here.
-
Let's see.
-
We have-- let me clear it
out-- we want to divide
-
9.3 divided by 4.
-
9.3 divided by our
square root of n. n was 16.
-
So divided by 4 is
equal to 2.32.
-
So this is equal to 2.32 which
is pretty darn close to 2.33.
-
This was after 10,000 trials.
-
Maybe right after this I'll see
what happens if we did 20,000
-
or 30,000 trials where we take
samples of 16 and average them.
-
Now let's look at this.
-
Here we would take 9.3-- so let
me draw a little line here.
-
Let me scroll over,
that might be better.
-
So we take our standard
deviation of our
-
original distribution.
-
So just that formula that we've
written down right here would tell
-
us that our standard error
should be equal to the standard
-
deviation of our original
distribution, 9.3, divided by
-
the square root of n, divided
by the square root
-
of 25, right?
-
4 was just the
square root of 16.
-
So this is equal to
9.3 divided by 5.
-
And let's see if it's 1.87.
-
So let me get my
calculator back.
-
So if I take 9.3 divided
by 5, what do I get?
-
1.86 which is very
close to 1.87.
-
So we got in this case 1.86.
-
So as you can see what we got
experimentally was almost
-
exactly-- and this was after
10,000 trials-- what
-
you would expect.
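Side by side, the two checks look like this. The simulated values are the ones read off the app in this video; sigma is assumed notation for the standard deviation of the original distribution.

```latex
\begin{aligned}
n = 16:&\quad \frac{\sigma}{\sqrt{n}} = \frac{9.3}{\sqrt{16}} = \frac{9.3}{4} = 2.325 && (\text{simulated: } 2.33) \\
n = 25:&\quad \frac{\sigma}{\sqrt{n}} = \frac{9.3}{\sqrt{25}} = \frac{9.3}{5} = 1.86 && (\text{simulated: } 1.87)
\end{aligned}
```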
-
Let's do another 10,000.
-
So you've got another
10,000 trials.
-
Well we're still
in the ballpark.
-
We're not going to-- maybe I
can't hope to get the exact
-
number rounded or whatever.
-
But as you can see, hopefully
that'll be pretty satisfying to
-
you, that the variance of the
sampling distribution of the
-
sample mean is just going to be
equal to the variance of your
-
original distribution, no
matter how wacky that
-
distribution might be, divided
by your sample size-- by the
-
number of samples you take for
every basket that you average I
-
guess is the best way
to think about it.
-
You know, sometimes this can
get confusing because you are
-
taking samples of averages
based on samples.
-
So when someone says sample
size, you're like, is sample
-
size the number of times I
took averages or the number
-
of things I'm taking
averages of each time?
-
And you know, it doesn't
hurt to clarify that.
-
Normally when they talk
about sample size
-
they're talking about n.
-
And, at least in my head, I
think of the trials like this: you
-
take a sample of size 16, you
average it, that's one
-
trial, and then you plot it.
-
Then you do it again and
you do another trial.
-
And you do it over
and over again.
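The difference between the two counts can be seen in a small sketch. It assumes a uniform distribution on 0 to 10 as the original distribution, and `spread_of_means` is a made-up helper name: changing the number of trials barely moves the spread of the averages, while changing n does.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def spread_of_means(n, trials):
    # n = how many values go into each average (the sample size).
    # trials = how many averages get plotted.
    means = [
        statistics.fmean([rng.uniform(0, 10) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

# Same n, ten times as many trials: the spread stays about the same.
more_trials = (spread_of_means(16, 2_000), spread_of_means(16, 20_000))

# Same number of trials, four times the n: the spread is roughly halved.
bigger_n = (spread_of_means(16, 2_000), spread_of_means(64, 2_000))

print("n=16, 2,000 vs 20,000 trials:", [round(s, 3) for s in more_trials])
print("2,000 trials, n=16 vs n=64:", [round(s, 3) for s in bigger_n])
```

More trials only give a sharper picture of the same sampling distribution; it is n, the number of things averaged each time, that goes into the formula.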
-
But anyway, hopefully this
makes everything clear and then
-
you now also understand how to
get to the standard
-
error of the mean.
-