https:/.../2020-02-21_psy317l_samp_dist_intro.mp4

0:01 - 0:05

Hello. In this video, I would
like to introduce you
0:05 - 0:08

to the concept
of sampling distributions.
0:08 - 0:14

Now, sampling distributions are going
to be fundamental to understanding
0:14 - 0:19

how we are able to arrive at
certain knowledge about distributions.
0:19 - 0:21

So what -- Let's, first of all,
let's just recap
0:21 - 0:24

what a sample is
and what a population is.
0:24 - 0:26

So we've seen this diagram before.
0:26 - 0:30

A population is essentially all
the individuals we could possibly
0:30 - 0:35

ever get information on,
get data. And populations --
0:35 - 0:38

let's just pick a very,
very boring example of height.
0:38 - 0:43

Maybe the population is
all UT students, all UT students.
0:43 - 0:46

Now, every single student
enrolled at UT,
0:46 - 0:50

if we were able to get all of their
heights, there would be some true
0:50 - 0:54

measure of the average height.
The average height would be
0:54 - 0:57

a population mean, and there would
be some value that that is.
0:57 - 1:01

We'd also be able to get some
other truth, we call these truths.
1:01 - 1:03

A standard deviation, if we were
able to measure every
1:03 - 1:05

however many thousand
UT students (inaudible).
1:05 - 1:09

If we were able to measure those things,
we'd be able to get those values,
1:09 - 1:12

and they're the population values,
and you may remember
1:12 - 1:15

we call them parameters.
1:15 - 1:18

Now, that's pretty much impossible.
We would never actually, in practice,
1:18 - 1:21

ever be able to measure
the height of all UT students.
1:21 - 1:24

So what we do, if we were interested,
if this was something we were
1:24 - 1:27

interested in for some strange reason,
we'd be able to just collect a sample.
1:27 - 1:30

I don't know. Maybe we'd get
a sample of five people,
1:30 - 1:33

or maybe we would get
a sample of 50 people,
1:33 - 1:35

or maybe we would
get a sample of 500.
1:35 - 1:38

In future videos,
we'll talk about
1:38 - 1:40

what's an appropriate size
of sample to collect
1:40 - 1:42

when we're interested
in finding out something.
1:42 - 1:46

But for now, who knows?
Maybe we picked ten people,
1:46 - 1:48

but we pick a sample
from those individuals.
1:48 - 1:51

Maybe we measure, we measure
all their heights and we calculate
1:51 - 1:56

their average height and we call
that the sample mean, "x-bar."
1:56 - 2:00

We're also able to calculate their,
maybe their sample standard deviation.
2:00 - 2:03

We'd call that "s,"
something like that.
2:03 - 2:09

These things we call statistics.
This is just recap stuff.
2:09 - 2:13

And the purpose of
calculating the statistics
2:13 - 2:17

is because we want
to estimate these parameters.
2:17 - 2:20

We want to make an estimation
of what the true value is.
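
To make the parameter versus statistic distinction concrete, here is a minimal Python sketch. It rests on assumptions that are not in the video: the population is simulated rather than measured, the numbers 170 cm and 10 cm are made up, and the names `population`, `mu`, `sigma`, `x_bar` and `s` are chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 simulated heights in cm (made-up numbers,
# not real UT data).
population = [random.gauss(170, 10) for _ in range(50_000)]

# Parameters: computed from every member of the population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population SD (divides by N)

# Statistics: computed from a sample of ten people.
sample = random.sample(population, 10)  # ten distinct individuals
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample SD (divides by n - 1)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x-bar = {x_bar:.2f}, s = {s:.2f}")
```

In real work the first half of this sketch is the part we cannot run, which is why the statistics are used as estimates of the parameters.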
2:21 - 2:26

Now the issue is, let's say we're
just collecting samples of five,
2:26 - 2:30

and then maybe another day,
we got a different sample of five,
2:30 - 2:33

and then on another day,
we're addicted to measuring
2:33 - 2:36

the heights of people,
we get another sample of five.
2:36 - 2:38

Every time we get
another sample of five,
2:38 - 2:41

let's, for now, just consider
the sample mean.
2:41 - 2:44

We're not that interested in
the standard deviation of these.
2:44 - 2:48

We're just gonna get a different
average every time we collect a
2:48 - 2:51

sample of five. It'd be
very, very unusual
2:51 - 2:55

if we randomly sampled five individuals
from all UT students,
2:55 - 2:59

and it was the same five people.
It's very unlikely
2:59 - 3:02

ever to be the same five people,
and so the average height
3:02 - 3:05

of the sample mean is going
to be different every single time.
3:05 - 3:08

And I don't know, let's say
another day we did another five,
3:08 - 3:09

we got another sample mean.
3:09 - 3:12

So we're trying
to estimate these things,
3:12 - 3:15

but our values are going
to generally be different,
3:15 - 3:17

and then they're never going
to be exactly -- well they could
3:17 - 3:20

actually be exactly the population
mean, but it's unlikely.
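
The point that every new sample of five gives a different mean can be sketched in Python. As an assumption for illustration only, the population is again simulated with made-up numbers, and "day" is just a label for each fresh sample.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of simulated heights in cm (made-up numbers).
population = [random.gauss(170, 10) for _ in range(50_000)]
mu = statistics.fmean(population)

# Four different "days", each with a fresh random sample of five people.
sample_means = []
for day in range(1, 5):
    sample = random.sample(population, 5)
    x_bar = statistics.fmean(sample)
    sample_means.append(x_bar)
    print(f"day {day}: x-bar = {x_bar:.2f}")

print(f"population mean: {mu:.2f}")
```

Each printed x-bar is one estimate of the population mean, and the estimates scatter around it rather than matching it exactly.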
3:20 - 3:24

So what we've just done, by the way,
is a sampling distribution.
3:24 - 3:27

I kind of, by accident, have
introduced you to it.
3:27 - 3:30

A sampling distribution is these things:
3:30 - 3:34

It's when you collect lots,
and lots, and lots of samples.
3:34 - 3:36

You collect lots and lots
of samples, and then you
3:36 - 3:38

just look at all
the values that you got.
3:38 - 3:41

And then you tend to plot
them as a histogram.
3:41 - 3:44

So in the next couple of slides,
I'll just more formally go through this,
3:44 - 3:48

because it gets very slightly
more exciting than this.
3:48 - 3:50

Well this distribution, actually,
3:50 - 3:53

looks quite nice
when you collect the data.
3:53 - 3:57

Okay, so in this case, I don't have --
my population is not all UT students.
3:57 - 4:01

My population is four people.
There's just four people.
4:01 - 4:05

Maybe my population is four people
that are roommates together,
4:05 - 4:10

and I ask them, "How many people
do you phone in one week?"
4:10 - 4:14

And one person said they phoned
four people, one person said seven,
4:14 - 4:16

one person said five,
one person said eight.
4:16 - 4:19

So, if you look at
our population mean, it's six.
4:19 - 4:24

That's just 4+5+7+8 divided by
the number of individuals,
4:24 - 4:27

which is four. Our mean is six.
4:27 - 4:30

So, I was really interested
in knowing how many,
4:30 - 4:33

this population of this one house,
4:33 - 4:37

I wanted to know what's
the average number of people
4:37 - 4:38

that they phone in a week.
4:38 - 4:41

But I thought collecting data on
four people was too much work,
4:41 - 4:44

so I decided, I'm just going
to collect samples of three people.
4:44 - 4:47

And so, I got a five,
a seven, and an eight.
4:47 - 4:49

They were the first three people
that I sampled.
4:49 - 4:52

And so my sample mean
here is 6.67,
4:52 - 4:56

which is 5+7+8 divided by
my sample size of three.
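
The two calculations so far can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are chosen for illustration.

```python
from statistics import fmean

population = [4, 7, 5, 8]   # people phoned in one week, per roommate
mu = fmean(population)      # (4 + 5 + 7 + 8) / 4

sample = [5, 7, 8]          # the first sample of three
x_bar = fmean(sample)       # (5 + 7 + 8) / 3

print(mu)                   # 6.0
print(round(x_bar, 2))      # 6.67
```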
4:56 - 5:00

So I got one sample mean,
but this is not a sampling distribution
5:00 - 5:02

because it's just one sample.
5:02 - 5:05

For a sampling distribution,
I need to do this over and over
5:05 - 5:06

and over again.
5:06 - 5:09

So I can do something like this,
where I do, let's say,
5:10 - 5:14

one sample, two samples, three samples,
four samples, five samples, six samples,
5:14 - 5:18

and here are my values:
5,7,8; 4,5,8; 4,7,8; 4,8,8;
5:18 - 5:21

and for each of these,
I calculate the sample mean.
5:21 - 5:24

This is a sampling distribution.
I've now got many samples,
5:24 - 5:27

and I've collected
the sample mean for each of them.
5:28 - 5:31

One thing -- there's two things
you may have noticed about this:
5:31 - 5:34

one thing I want you to notice is that
I've actually sampled with replacement.
5:34 - 5:37

So in the UT example, I said
it would be really unlikely
5:37 - 5:39

to ever get
the same five people.
5:39 - 5:42

More than that, if you randomly
selected five people,
5:42 - 5:45

it's unlikely to get the same
individual twice in one sample.
5:45 - 5:48

But for sampling distributions,
we tend to actually say
5:48 - 5:50

that we will sample
with replacement.
5:50 - 5:52

That's just a -- it's not something
we actually need to worry
5:52 - 5:54

too much about right now.
I just want you to be aware of --
5:54 - 5:57

in this particular example,
I sampled with replacement,
5:57 - 6:00

which means you could sample
the same individual twice,
6:00 - 6:03

or even three times, because
you'll notice the 64th sample
6:03 - 6:06

had the same individual
three times and I got 8,8,8,
6:06 - 6:09

and the mean of
that sample is 8.
6:09 - 6:11

The second thing you may
have noticed is that
6:11 - 6:15

the sample number
only goes up to 64 here.
6:15 - 6:17

I've done 64 samples,
and that's because
6:17 - 6:21

when you collect three numbers
from four potential numbers,
6:21 - 6:25

there's only 64 combinations of the data,
so I just stopped at 64.
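
The count of 64 can be reproduced by listing every possible sample. This sketch assumes that order matters and that values can repeat (sampling with replacement), which gives 4 × 4 × 4 = 64. The order in which `itertools.product` lists the samples is its own and will not necessarily match the order on the slide.

```python
from itertools import product
from statistics import fmean

population = [4, 5, 7, 8]
n = 3

# Every ordered sample of size 3, drawing with replacement.
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

print(len(samples))    # 64
print(samples[0])      # (4, 4, 4)
print(samples[-1])     # (8, 8, 8)
```

The list `sample_means` holds the 64 values that make up the sampling distribution.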
6:25 - 6:29

Okay, but what I want you to realize
is that I have collected 64 samples.
6:29 - 6:32

These dots just refer to --
I didn't put all the data
6:32 - 6:36

between 6 and 64.
I have 64 sample means.
6:36 - 6:40

So I have 64 of these things,
so what should I do with them?
6:40 - 6:43

Well, I could plot them on a histogram.
6:43 - 6:47

So, here's my histogram
of the 64 sample means,
6:48 - 6:51

and I wonder what you think about it.
6:51 - 6:55

Well, one thing is I've used a nice color,
I think, for the bars, but another thing
6:55 - 6:59

is maybe you see that there's
potentially a shape in this data.
6:59 - 7:05

So if I, if you allow me to be
kind of, draw a curve over it,
7:05 - 7:07

I haven't done a great job,
but there's a curve,
7:07 - 7:10

and it looks normal-ish.
7:10 - 7:12

I'm using the word, "ish," because
it's obviously not normal.
7:12 - 7:15

It's kind of getting there to be
normally distributed.
7:15 - 7:18

So this is just a histogram of all
the possible sample means.
7:18 - 7:23

And here is the one where we had 8,8,8,
and actually there's one down here
7:23 - 7:28

where we got 4,4,4, but there's all
the ones in between as well.
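
A rough text version of this histogram can be built in Python. As a sketch under stated assumptions, it tallies each exact sample mean separately instead of grouping them into bins, so it may look lumpier than the plotted histogram on the slide. It counts sample totals (whole numbers) and divides by 3 only when printing.

```python
from collections import Counter
from itertools import product

population = [4, 5, 7, 8]

# Tally the total of each of the 64 ordered samples of size 3.
totals = Counter(sum(s) for s in product(population, repeat=3))

# One row per distinct sample mean, one '#' per sample with that mean.
for total in sorted(totals):
    print(f"{total / 3:5.2f} | {'#' * totals[total]}")
```

The first and last rows each have a single mark: those are the 4,4,4 and 8,8,8 samples.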
7:30 - 7:33

Now what we --
And I'll just put the curve back on it.
7:33 - 7:36

Maybe I'll actually this time
do a better job.
7:36 - 7:38

I'm not sur --
No, I didn't.
7:38 - 7:39

I'm gonna undo that because that
was a terrible job.
7:39 - 7:41

Let's see if I can --
7:41 - 7:45

I think going slow...
no, it will do.
7:45 - 7:46

Mmm... no it won't.
7:46 - 7:47

Let's do it again.
7:47 - 7:49

This time, third time's lucky.
7:50 - 7:52

Okay I'm happy with that.
7:52 - 7:55

This is the sampling distribution.
7:55 - 7:57

I just told you that,
but I want you to formally know
7:57 - 7:58

what it means.
7:58 - 8:04

It's the sampling distribution,
dot dot dot, for the sample mean,
8:04 - 8:05

that's what we collected.
8:05 - 8:08

We collected the sample mean,
so it's a sampling distribution of the
8:08 - 8:12

sample mean, and then we say
for n=3.
8:12 - 8:16

Because our sample
could have been 2, n=2,
8:16 - 8:17

in which case we would have
had a different distribution.
8:17 - 8:20

It could've been 1. We could've
just been really lazy and said,
8:20 - 8:22

"I only want to just check
one person and ask them,"
8:22 - 8:25

and calculated
the sample mean for n=1.
8:25 - 8:28

But is the sample --
we did three people,
8:28 - 8:30

and so this is technically
the sampling distribution
8:30 - 8:33

for the sample mean for n=3.
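
To see why the sample size is part of the name, here is a small Python sketch that builds the sampling distribution of the sample mean for n = 1, 2 and 3 from the same four values. The helper `sampling_distribution` is a hypothetical name introduced here for illustration, and it assumes ordered samples drawn with replacement, as in the example above.

```python
from itertools import product
from statistics import fmean

def sampling_distribution(population, n):
    """All sample means for ordered samples of size n, with replacement."""
    return [fmean(s) for s in product(population, repeat=n)]

population = [4, 5, 7, 8]

for n in (1, 2, 3):
    means = sampling_distribution(population, n)
    distinct = len(set(round(m, 6) for m in means))
    print(f"n={n}: {len(means)} sample means, {distinct} distinct values")
```

Each value of n gives a different collection of sample means, so each has its own sampling distribution.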
8:35 - 8:39

Okay, so that's probably enough for now
on introducing sampling distributions.
8:39 - 8:43

I hope you understand
what it is.
8:43 - 8:44

It is the --
you collect many samples,
8:44 - 8:46

and you collect some information
about each of those samples --
8:46 - 8:51

in this case, it was the sample mean --
and then you plot them as a histogram,
8:51 - 8:53

and you are able to look at
the shape of the distribution.
8:53 - 8:55

And that's what we call
a sampling distribution.
8:55 - 8:58

And we're going to extend
this idea in future videos.

Title:: https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
Video Language:: English
Duration:: 09:15

	Richard M Gaunt edited English subtitles for https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
	libbysears edited English subtitles for https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
	libbysears edited English subtitles for https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
	Javonna S Hamilton edited English subtitles for https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
