-
Hello. In this video, I would
like to introduce you
-
to the concept
of sampling distributions.
-
Now, sampling distributions are going
to be fundamental to understanding
-
how we are able to arrive at
certain knowledge about distributions.
-
So what -- Let's, first of all,
let's just recap
-
what a sample is
and what a population is.
-
So we've seen this diagram before.
-
A population is essentially all
the individuals we could possibly
-
ever get information on,
get data. And populations --
-
let's just pick a very,
very boring example of height.
-
Maybe the population is
all UT students, all UT students.
-
Now, every single student
enrolled at UT,
-
if we were able to get all of their
heights, there would be some true
-
measure of the average height.
The average height would be
-
a population mean, and there would
be some value that that is.
-
We'd also be able to get some
other truth, we call these truths.
-
A standard deviation, if we were
able to measure every
-
however many thousand
UT students (inaudible).
-
If we were able to measure those things,
we'd be able to get those values,
-
and they're the population values,
and you may remember
-
we call them parameters.
-
Now, that's pretty much impossible.
We would never actually, in practice,
-
ever be able to measure
the height of all UT students.
-
So what we do, if we were interested,
if this was something we were
-
interested in for some strange reason,
we'd be able to just collect a sample.
-
I don't know. Maybe we'd get
a sample of five people,
-
or maybe we would get
a sample of 50 people,
-
or maybe we would
get a sample of 500.
-
In future videos,
we'll talk about
-
what's an appropriate size
of sample to collect
-
when we're interested
in finding out something.
-
But for now, who knows?
Maybe we picked ten people,
-
but we pick a sample
from those individuals.
-
Maybe we measure, we measure
all their heights and we calculate
-
their average height and we call
that the sample mean, "x-bar."
-
We're also able to calculate their,
maybe their sample standard deviation.
-
We'd call that "s,"
something like that.
-
These things we call statistics.
This is just recap stuff.
-
And the purpose of
calculating the statistics
-
is because we want
to estimate these parameters.
-
We want to make an estimation
of what the true value is.
-
Now the issue is, let's say we're
just collecting samples of five,
-
and then maybe another day,
we got a different sample of five,
-
and then on another day,
we're addicted to measuring
-
the heights of people,
we get another sample of five.
-
Every time we get
another sample of five,
-
let's, for now, just consider
the sample mean.
-
We're not that interested in
the standard deviation of these.
-
We're just gonna get a different
average every time we collect a
-
sample of five. It'd be
very, very unusual
-
if we randomly sampled five individuals
from all UT students,
-
and it was the same five people.
It's very unlikely
-
ever to be the same five people,
and so the average height
-
of the sample is going
to be different every single time.
-
And I don't know, let's say
another day we did another five,
-
we got another sample mean.
-
So we're trying
to estimate these things,
-
but our values are going
to generally be different,
-
and then they're never going
to be exactly -- well they could
-
actually be exactly the population
mean, but it's unlikely.
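
The point above, that each new sample of five gives a different sample mean, can be sketched in Python. This is only an illustration under stated assumptions: the heights are simulated rather than real UT data, and the population size (50,000), the centre (170 cm), the spread (10 cm) and the seed are all made up for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 made-up heights in cm (not real UT data).
population = [rng.gauss(170, 10) for _ in range(50_000)]

# Parameters: computed from every individual in the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

def draw_sample_stats(n):
    """Draw n distinct individuals at random; return (x-bar, s)."""
    sample = rng.sample(population, n)
    return statistics.mean(sample), statistics.stdev(sample)

# Four separate samples of five, as if collected on four different days.
sample_stats = [draw_sample_stats(5) for _ in range(4)]

print(f"population mean = {mu:.2f}, population SD = {sigma:.2f}")
for day, (x_bar, s) in enumerate(sample_stats, start=1):
    print(f"day {day}: x-bar = {x_bar:.2f}, s = {s:.2f}")
```

Each printed x-bar is a statistic estimating the one fixed parameter, the population mean, and the four values will generally differ from each other and from that parameter.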
-
So what we've just done, by the way,
is build a sampling distribution.
-
I kind of, by accident, have
introduced you to it.
-
A sampling distribution is these things:
-
It's when you collect lots,
and lots, and lots of samples.
-
You collect lots and lots
of samples, and then you
-
just look at all
the values that you got.
-
And then you tend to plot
them as a histogram.
-
So in the next couple of slides,
I'll just more formally go through this,
-
because it gets very slightly
more exciting than this.
-
Well this distribution, actually,
-
looks quite nice
when you collect the data.
-
Okay, so in this case, I don't have --
my population is not all UT students.
-
My population is four people.
There's just four people.
-
Maybe my population is four people
that are roommates together,
-
and I ask them, "How many people
do you phone in one week?"
-
And one person said they phoned
four people, one person said seven,
-
one person said five,
one person said eight.
-
So, if you look at
our population mean, it's six.
-
That's just 4+5+7+8 divided by
the number of individuals,
-
which is four. Our mean is six.
-
So, I was really interested
in knowing how many,
-
this population of this one house,
-
I wanted to know what's
the average number of people
-
that they phone in a week.
-
But I thought collecting data on
four people was too much work,
-
so I decided, I'm just going
to collect samples of three people.
-
And so, I got a five,
a seven, and an eight.
-
They were the first three people
that I sampled.
-
And so my sample mean
here is 6.67,
-
which is 5+7+8 divided by
my sample size of three.
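
The two calculations so far, the population mean of 6 and the first sample mean of 6.67, can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen just for illustration.

```python
from statistics import mean

# Calls per week reported by the four roommates (the whole population).
population = [4, 7, 5, 8]

# The first sample of three that was collected.
sample = [5, 7, 8]

population_mean = mean(population)   # (4 + 7 + 5 + 8) / 4
sample_mean = mean(sample)           # (5 + 7 + 8) / 3

print("population mean:", population_mean)
print("sample mean:", round(sample_mean, 2))
```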
-
So I got one sample mean,
but this is not a sampling distribution
-
because it's just one sample.
-
For a sampling distribution,
I need to do this over and over
-
and over again.
-
So I can do something like this,
where I do, let's say,
-
one sample, two samples, three samples,
four samples, five samples, six samples,
-
and here are my values:
5,7,8; 4,5,8; 4,7,8; 4,8,8;
-
and for each of these,
I calculate the sample mean.
-
This is a sampling distribution.
I've now got many samples,
-
and I've collected
the sample mean for each of them.
-
One thing -- there's two things
you may have noticed about this:
-
one thing I want you to notice is that
I've actually sampled with replacement.
-
So in the UT example, I said
it would be really unlikely
-
to ever get
the same five people.
-
More than that, if you randomly
selected five people,
-
it's unlikely to get the same
individual twice in one sample.
-
But for sampling distributions,
we tend to actually say
-
that we will sample
with replacement.
-
That's just a -- it's not something
we actually need to worry
-
too much about right now.
I just want you to be aware of --
-
in this particular example,
I sampled with replacement,
-
which means you could sample
the same individual twice,
-
or even three times, because
you'll notice the 64th sample
-
had the same individual
three times and I got 8,8,8,
-
and the mean of
that sample is 8.
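
The difference between sampling with and without replacement can be sketched with Python's standard `random` module. The roommate labels A to D, the pairing of labels to values, and the seed are assumptions made up for this example.

```python
import random

rng = random.Random(1)  # fixed seed, chosen arbitrarily

# Hypothetical labels for the four roommates and their reported values.
roommates = {"A": 4, "B": 7, "C": 5, "D": 8}
people = list(roommates)

# Without replacement: each person can appear at most once in the sample.
without_replacement = rng.sample(people, 3)

# With replacement: the same person may be drawn two or even three times.
with_replacement = rng.choices(people, k=3)

print("without replacement:", [roommates[p] for p in without_replacement])
print("with replacement:   ", [roommates[p] for p in with_replacement])
```

Only the second kind of draw can ever produce a sample like 8,8,8.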
-
The second thing you may
have noticed is that
-
the sample number
only goes up to 64 here.
-
I've done 64 samples,
and that's because
-
when you collect three numbers
from four potential numbers,
-
there's only 4 × 4 × 4 = 64 possible ordered samples,
so I just stopped at 64.
-
Okay, but what I want you to realize
is that I have collected 64 samples.
-
These dots just refer to --
I didn't put all the data
-
between 6 and 64.
I have 64 sample means.
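
Where the 64 comes from can be sketched by listing every ordered sample of three drawn with replacement from the four values. This is a minimal sketch using `itertools.product`; the order it generates the samples in is its own and is not the order shown on the slide.

```python
from itertools import product
from statistics import mean

population = [4, 5, 7, 8]
n = 3  # sample size

# product(..., repeat=n) yields every ordered draw of n values,
# with the same value allowed to repeat (sampling with replacement).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

print("number of samples:", len(samples))
print("first sample:", samples[0], "mean:", sample_means[0])
print("last sample: ", samples[-1], "mean:", sample_means[-1])
```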
-
So I have 64 of these things,
so what should I do with them?
-
Well, I could plot them on a histogram.
-
So, here's my histogram
of the 64 sample means,
-
and I wonder what you think about it.
-
Well, one thing is I've used a nice color,
I think, for the bars, but another thing
-
is maybe you see that there's
potentially a shape in this data.
-
So, if you allow me to,
kind of, draw a curve over it,
-
I haven't done a great job,
but there's a curve,
-
and it looks normal-ish.
-
I'm using the word, "ish," because
it's obviously not normal.
-
It's kind of getting there to be
normally distributed.
-
So this is just a histogram of all
the possible sample means.
-
And here is the one where we had 8,8,8,
and actually there's one down here
-
where we got 4,4,4, but there's all
the ones in between as well.
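
A rough text version of this histogram can be sketched by counting how often each sample mean occurs among the 64 samples. This is an illustration only: it draws one bar per distinct mean, so the slide's histogram may group the values into bins differently.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 5, 7, 8]

# Every ordered sample of three, drawn with replacement.
samples = product(population, repeat=3)

# Exact fractions keep equal means (such as 20/3) in the same bar.
counts = Counter(Fraction(sum(s), 3) for s in samples)

# One row per distinct sample mean, one '#' per sample with that mean.
for m in sorted(counts):
    print(f"{float(m):5.2f} {'#' * counts[m]}")
```

The 4,4,4 and 8,8,8 samples each show up as a single mark at the two ends, with the rest of the samples in between.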
-
Now what we --
And I'll just put the curve back on it.
-
Maybe I'll actually this time
do a better job.
-
I'm not sur --
No, I didn't.
-
I'm gonna undo that because that
was a terrible job.
-
Let's see if I can --
-
I think going slow...
no, it will do.
-
Mmm... no it won't.
-
Let's do it again.
-
This time, third time's lucky.
-
Okay I'm happy with that.
-
This is the sampling distribution.
-
I just told you that,
but I want you to formally know
-
what it means.
-
It's the sampling distribution,
dot dot dot, for the sample mean,
-
that's what we collected.
-
We collected the sample mean,
so it's a sampling distribution of the
-
sample mean, and then we say
for n=3.
-
Because our sample size
could have been 2, n=2,
-
in which case we would have
had a different distribution.
-
It could've been 1. We could've
just been really lazy and said,
-
"I only want to just check
one person and ask them,"
-
and calculated
the sample means for n=1.
-
But this sample --
we did three people,
-
and so this is technically
the sampling distribution
-
for the sample mean for n=3.
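
The way the distribution depends on n can be sketched by repeating the same enumeration for n=1, n=2 and n=3. This is a minimal sketch; the helper name `sampling_distribution_of_mean` is made up for this example.

```python
from itertools import product
from statistics import mean

population = [4, 5, 7, 8]

def sampling_distribution_of_mean(values, n):
    """Mean of every ordered sample of size n drawn with replacement."""
    return [mean(sample) for sample in product(values, repeat=n)]

# One list of sample means per sample size.
distributions = {n: sampling_distribution_of_mean(population, n)
                 for n in (1, 2, 3)}

for n, means in distributions.items():
    print(f"n={n}: {len(means)} sample means, "
          f"{len(set(means))} distinct values")
```

For n=1 the sample means are just the four original values, which is why each sample size gets its own sampling distribution.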
-
Okay, so that's probably enough for now
on introducing sampling distributions.
-
I hope you understand
what it is.
-
It is the --
you collect many samples,
-
and you collect some information
about each of those samples --
-
in this case, it was the sample mean --
and then you plot them as a histogram,
-
and you are able to look at
the shape of the distribution.
-
And that's what we call
a sampling distribution.
-
And we're going to extend
this idea in future videos.