Hello. In this video, I would
like to introduce you

to the concept
of sampling distributions.

Now, sampling distributions are going
to be fundamental to understanding

how we are able to arrive at 
certain knowledge about distributions.

So what -- Let's, first of all, 
let's just recap

what a sample is 
and what a population is.

So we've seen this diagram before.

A population is essentially all 
the individuals we could possibly

ever get information on, 
get data. And populations --

let's just pick a very, 
very boring example of height.

Maybe the population is 
all UT students, all UT students.

Now, every single student 
enrolled at UT,

if we were able to get all of their 
heights, there would be some true

measure of the average height.
The average height would be

a population mean, and there would 
be some value that that is.

We'd also be able to get some
other truth, we call these truths.

A standard deviation, if we were
able to measure every

however many thousand
UT students (inaudible).

If we were able to measure those things,
we'd be able to get those values,

and they're the population values,
and you may remember

we call them parameters.

Now, that's pretty much impossible.
We would never actually, in practice,

ever be able to measure 
the height of all UT students.

So what we do, if we were interested,
if this was something we were

interested in for some strange reason,
we'd be able to just collect a sample.

I don't know. Maybe we'd get 
a sample of five people,

or maybe we would get 
a sample of 50 people,

or maybe we would
get a sample of 500.

In future videos, 
we'll talk about

what's an appropriate size 
of sample to collect

when we're interested 
in finding out something.

But for now, who knows?
Maybe we picked ten people,

but we pick a sample 
from those individuals.

Maybe we measure, we measure
all their heights and we calculate

their average height and we call
that the sample mean, "x-bar."

We're also able to calculate their,
maybe their sample standard deviation.

We'd call that "s," 
something like that.

These things we call statistics.
This is just recap stuff.

And the purpose of 
calculating the statistics

is because we want
to estimate these parameters.

We want to make an estimation
of what the true value is.

Now the issue is, let's say we're
just collecting samples of five,

and then maybe another day,
we got a different sample of five,

and then on another day,
we're addicted to measuring

the heights of people, 
we get another sample of five.

Every time we get 
another sample of five,

let's, for now, just consider 
the sample mean.

We're not that interested in
the standard deviation of these.

We're just gonna get a different 
average every time we collect a

sample of five. It'd be 
very, very unusual

if we randomly sampled five individuals
from all UT students,

and it was the same five people.
It's very unlikely

ever to be the same five people,
and so the sample mean

of the heights is going 
to be different every single time.

And I don't know, let's say 
another day we did another five,

we got another sample mean.

So we're trying 
to estimate these things,

but our values are going 
to generally be different,

and then they're never going 
to be exactly -- well they could

actually be exactly the population 
mean, but it's unlikely.
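
To see that idea in action, here is a minimal Python sketch of drawing several samples of five. The heights are simulated, not real UT data: the population size of 50,000, the mean of 170 cm, and the spread of 10 cm are all assumptions made up purely for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: simulated heights in cm (an assumption, not real data).
population = [random.gauss(170, 10) for _ in range(50_000)]
population_mean = sum(population) / len(population)  # the parameter

# Draw several independent random samples of five and record each sample mean.
sample_means = []
for _ in range(4):
    sample = random.sample(population, 5)  # five distinct individuals
    sample_means.append(sum(sample) / len(sample))  # the statistic, x-bar

print("population mean:", round(population_mean, 1))
print("sample means:   ", [round(m, 1) for m in sample_means])
```

Each run of the loop gives a different sample mean, and none of them is likely to land exactly on the population mean.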

So what we've just done, by the way,
is a sampling distribution.

I kind of, by accident, have 
introduced you to it.

A sampling distribution is these things:

It's when you collect lots, 
and lots, and lots of samples.

You collect lots and lots
of samples, and then you

just look at all 
the values that you got.

And then you tend to plot 
them as a histogram.

So in the next couple of slides,
I'll just more formally go through this,

because it gets very slightly 
more exciting than this.

Well this distribution, actually,

looks quite nice 
when you collect the data.

Okay, so in this case, I don't have --
my population is not all UT students.

My population is four people.
There's just four people.

Maybe my population is four people
that are roommates together,

and I ask them, "How many people
do you phone in one week?"

And one person said they phoned
four people, one person said seven,

one person said five, 
one person said eight.

So, if you look at 
our population mean, it's six.

That's just 4+5+7+8 divided by 
the number of individuals,

which is four. Our mean is six.

So, I was really interested 
in knowing how many,

this population of this one house,

I wanted to know what's
the average number of people

that they phone in a week.

But I thought collecting data on 
four people was too much work,

so I decided, I'm just going 
to collect samples of three people.

And so, I got a five, 
a seven, and an eight.

They were the first three people 
that I sampled.

And so my sample mean 
here is 6.67,

which is 5+7+8 divided by 
my sample size of three.
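
If you want to check that arithmetic, here is a small Python sketch. The variable names are illustrative assumptions, not something from the slides.

```python
# The whole population: the four roommates' answers.
population = [4, 7, 5, 8]
population_mean = sum(population) / len(population)  # (4 + 7 + 5 + 8) / 4

# The first sample of three people.
sample = [5, 7, 8]
sample_mean = sum(sample) / len(sample)  # (5 + 7 + 8) / 3

print(population_mean)        # 6.0
print(round(sample_mean, 2))  # 6.67
```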

So I got one sample mean,
but this is not a sampling distribution

because it's just one sample.

For a sampling distribution,
I need to do this over and over

and over again.

So I can do something like this,
where I do, let's say,

one sample, two samples, three samples,
four samples, five samples, six samples,

and here are my values:
5,7,8; 4,5,8; 4,7,8; 4,8,8;

and for each of these, 
I calculate the sample mean.

This is a sampling distribution.
I've now got many samples,

and I've collected 
the sample mean for each of them.

One thing -- there are two things
you may have noticed about this:

one thing I want you to notice is that 
I've actually sampled with replacement.

So in the UT example, I said
it would be really unlikely

to ever get 
the same five people.

More than that, if you randomly
selected five people,

it's unlikely to get the same 
individual twice in one sample.

But for sampling distributions,
we tend to actually say

that we will sample 
with replacement.

That's just a -- it's not something
we actually need to worry

too much about right now.
I just want you to be aware that,

in this particular example,
I sampled with replacement,

which means you could sample 
the same individual twice,

or even three times, because 
you'll notice the 64th sample

had the same individual
three times and I got 8,8,8,

and the mean of 
that sample is 8.

The second thing you may 
have noticed is that

the sample number 
only goes up to 64 here.

I've done 64 samples, 
and that's because

when you draw three numbers
from four potential numbers, with replacement,

there are only 4 x 4 x 4 = 64 possible ordered samples,
so I just stopped at 64.
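
The count of 64 can be checked by listing every possible sample. This is a sketch using Python's `itertools.product`, which treats order as mattering and allows repeats, matching sampling with replacement. The order it produces the samples in is its own and will not match the numbering on the slide.

```python
from itertools import product

population = [4, 7, 5, 8]
n = 3  # sample size

# Every ordered way to pick n values with replacement: 4 * 4 * 4 of them.
samples = list(product(population, repeat=n))

# One sample mean per sample.
sample_means = [sum(s) / n for s in samples]

print(len(samples))                   # 64
print(samples[-1], sample_means[-1])  # (8, 8, 8) 8.0
```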

Okay, but what I want you to realize
is that I have collected 64 samples.

These dots just refer to --
I didn't put all the data

between 6 and 64.
I have 64 sample means.

So I have 64 of these things,
so what should I do with them?

Well, I could plot them on a histogram.

So, here's my histogram 
of the 64 sample means,

and I wonder what you think about it.

Well, one thing is I've used a nice color,
I think, for the bars, but another thing

is maybe you see that there's
potentially a shape in this data.

So if you allow me to
kind of draw a curve over it,

I haven't done a great job,
but there's a curve,

and it looks normal-ish.

I'm using the word, "ish," because
it's obviously not normal.

It's kind of getting close to being 
normally distributed.
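
A rough text version of that histogram can be sketched in Python. This is not the plot from the slide, just a tally of how often each sample mean occurs among the 64 samples. Using `Fraction` is a choice made here so that equal means are counted together exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 7, 5, 8]
n = 3  # sample size

# Exact sample means for all 64 ordered samples drawn with replacement.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Tally how many samples share each mean.
counts = Counter(means)

# One row per distinct mean, one '#' per sample with that mean.
for value in sorted(counts):
    print(f"{float(value):5.2f} {'#' * counts[value]}")
```

The rows for 4.00 and 8.00 each have a single mark (the 4,4,4 and 8,8,8 samples), and the rows in the middle are the tallest.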

So this is just a histogram of all
the possible sample means.

And here is the one where we had 8,8,8,
and actually there's one down here

where we got 4,4,4, but there's all
the ones in between as well.

Now what we --
And I'll just put the curve back on it.

Maybe I'll actually this time 
do a better job.

I'm not sur --
No, I didn't.

I'm gonna undo that because that
was a terrible job.

Let's see if I can --

I think going slow...
no, it will do.

Mmm... no it won't.

Let's do it again.

This time, third time's lucky.

Okay I'm happy with that.

This is the sampling distribution.

I just told you that,
but I want you to formally know

what it means.

It's the sampling distribution,
dot dot dot, for the sample mean,

that's what we collected.

We collected the sample mean,
so it's a sampling distribution of the

sample mean, and then we say
for n=3.

Because our sample size 
could have been 2, n=2,

in which case we would have 
had a different distribution.

It could've been 1. We could've 
just been really lazy and said,

"I only want to just check 
one person and ask them,"

and calculated 
the sample mean for n=1.

But in our case, 
we sampled three people,

and so this is technically 
the sampling distribution

for the sample mean for n=3.
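
To see why the sample size is part of the name, here is a sketch that builds the sampling distribution of the sample mean for n=1, n=2, and n=3 from the same four roommates. The helper name `sampling_distribution` is hypothetical, introduced only for this illustration.

```python
from itertools import product

population = [4, 7, 5, 8]

def sampling_distribution(n):
    """Sample means for every ordered sample of size n, drawn with replacement."""
    return [sum(s) / n for s in product(population, repeat=n)]

for n in (1, 2, 3):
    means = sampling_distribution(n)
    # Number of possible samples, and how many distinct means they produce.
    print(f"n={n}: {len(means)} samples, {len(set(means))} distinct means")
```

Each sample size gives a different set of sample means, so each one has its own sampling distribution.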

Okay, so that's probably enough for now 
on introducing sampling distributions.

I hope you understand
what one is.

It is the -- 
you collect many samples,

and you collect some information 
about each of those samples --

in this case, it was the sample mean --
and then you plot them as a histogram,

and you are able to look at 
the shape of the distribution.

And that's what we call 
a sampling distribution.

And we're going to extend 
this idea in future videos.