Math 1080 Lecture 15 Normal Distribution 3 Sampling Distributions x bar and s squared

0:04 - 0:08

Okay, let's go back to basics.
0:10 - 0:13

Let's see sampling distributions.
0:13 - 0:17

We had something
called "population."
0:23 - 0:27

Like, let's take some
huge population,
0:28 - 0:34

say, all high school students
in the United States.
0:35 - 0:37

That's a huge population.
0:38 - 0:40

I don't know
how many there are,
0:41 - 0:46

and we want to study
something like weight,
0:46 - 0:51

obesity of students, and
we need the weight of them.
0:53 - 0:56

That's huge data.
0:57 - 1:03

What we usually do
when we have huge data--
1:03 - 1:10

or it might be the case that we
do not have access to data--
1:10 - 1:15

[unclear] for example, you
want to study something about...
1:15 - 1:20

...some rabbits, for example,
in the state of Minnesota.
1:22 - 1:29

Well, you cannot just gather all the rabbits
and maybe weigh them, for example.
1:33 - 1:38

So, what you do is you sample.
This one makes more sense
1:38 - 1:46

to just sample some rabbits at random from maybe
different places in the state and weigh them.
1:47 - 1:53

So, that's when you really don't
have access to the whole population.
1:54 - 1:57

In the other case, you might have
access to the whole population
1:57 - 2:02

of all high school students in the
country, but studying all of them is not feasible.
2:03 - 2:11

So, in that case, too, we choose
some samples from every state--
2:11 - 2:15

there are different
ways to sample them--
2:15 - 2:23

for example, based on the population of each
state, we can sample in proportion to that population.
2:23 - 2:29

So, for a more populous state like
California, you sample more students
2:29 - 2:33

from California than, say,
I don't know...Minnesota.
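
Sampling each state according to its population can be sketched as a proportional allocation. This is only a sketch: the state student counts below are made up for illustration, and the function name `proportional_allocation` and the rounding rule (largest remainders) are assumptions, not something specified in the lecture.

```python
def proportional_allocation(populations, total_sample):
    """Split total_sample across groups in proportion to group size."""
    total = sum(populations.values())
    # Exact (possibly fractional) share for each group.
    shares = {g: total_sample * p / total for g, p in populations.items()}
    # Start from the rounded-down shares ...
    alloc = {g: int(s) for g, s in shares.items()}
    # ... then give the leftover units to the largest fractional remainders.
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda g: shares[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


# Hypothetical student counts, invented for this example only.
students = {"California": 2_000_000, "Minnesota": 300_000, "Wyoming": 30_000}
allocation = proportional_allocation(students, 1000)
print(allocation)
```

The more populous state receives the larger share of the 1000 sampled students, and the shares always add up to the requested total.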
2:36 - 2:41

So, we have a population and we're
studying something about this population.
2:42 - 2:47

Usually we are talking about
the normal distribution,
2:47 - 2:51

although this applies
to any distribution.
2:51 - 2:54

So, this population
has some distribution.
3:00 - 3:02

It has some parameters.
3:07 - 3:11

Now, parameters for the
distribution of the population are--
3:11 - 3:21

usually what we study: mu, which is the mean,
and sigma squared, which is the variance,
3:23 - 3:27

and sigma which is
standard deviation.
3:36 - 3:41

Well, there might be other things that
we study, depending on the situation,
3:41 - 3:48

if you're working for an insurance
company you might even consider other--
3:48 - 3:52

these are some kind of "moments,"
we call them "moments"--
3:52 - 4:00

so you can choose other moments.
You might need more than just these two.
4:01 - 4:02

So...
4:04 - 4:08

For the moment we
just need these two,
4:08 - 4:11

in fact this one [sigma/standard deviation]
and this one [mu/mean].
4:11 - 4:14

Now, that's for the population.
4:14 - 4:21

But, I said it's not feasible or it's not
possible to look at the whole population.
4:22 - 4:25

So, what we do is we sample.
4:26 - 4:30

And that's where statistics really
comes into the picture.
4:31 - 4:34

Statistics basically starts here.
4:34 - 4:40

Otherwise, if we know the population,
we know everybody and the weight
4:40 - 4:48

of every student in this country,
then that's it, we have data. So what?
4:49 - 4:57

Statistics comes in when it is not possible (for
whatever reason) to study the whole population.
4:58 - 5:02

In fact, when we do data science,
that's another story.
5:02 - 5:06

In data science, say, we
usually have the population.
5:06 - 5:11

Well, sometimes we would sample,
but we usually have the population.
5:11 - 5:17

And most often what
we want to do is predict.
5:17 - 5:19

So, we have to--
5:19 - 5:24

you want to predict something about
the population in the future.
5:24 - 5:30

But, anyways, this is not data science,
this is statistics, although they are related.
5:30 - 5:37

We have a sample here and this sample also has
some kind of mean and some kind of variance.
5:38 - 5:44

This mean and variance we denote
them differently. Remember? Sample--
5:44 - 5:51

This is population, and for sample
we have x̅ [x-bar] for mean,
5:53 - 5:57

and remember we have s
for standard deviation.
6:05 - 6:10

Okay, s for standard deviation
and this is for the sample.
6:10 - 6:15

Oh, it's getting so dark,
let me turn a light on.
6:23 - 6:25

Now...
6:26 - 6:32

First of all, you might say--
you might intuitively [unclear]--
6:32 - 6:39

You might feel like taking a larger
sample, so you might say,
6:39 - 6:45

"A larger sample size will give me
a better idea of the population."
6:45 - 6:49

Where does that feeling
or intuition come from?
6:53 - 7:00

In fact, it's true. It's true that a larger
sample will give a much better idea.
7:01 - 7:06

But, there's something here, this
mean and this standard deviation,
7:06 - 7:12

so the larger sample will give me a mean and
a standard deviation. But, what do we want?
7:12 - 7:15

We want this [sample] mean and
this [sample] standard deviation
7:15 - 7:19

be close to this [population] mean and
this [population] standard deviation.
7:19 - 7:27

In other words, you want these two [from the
sample] to estimate [the population] for me.
7:27 - 7:30

Right? So, you want these
two [from the sample]
7:30 - 7:33

to estimate this [population] mean
and this standard deviation.
7:33 - 7:37

And that's where the
word "estimators" comes in.
7:37 - 7:41

So, this x-bar, which is
the mean of the sample,
7:41 - 7:44

and s, which is the standard
deviation of the sample,
7:44 - 7:50

these two are estimators for
mu and sigma. In fact, in--
7:51 - 8:00

Well, the best-- I mean, x-bar is
an unbiased estimator for mu,
8:01 - 8:07

and s squared is an unbiased
estimator for sigma squared.
8:09 - 8:12

So, this estimates this,
and this estimates this.
8:12 - 8:18

Not sigma and s: s is not an unbiased estimator
for sigma, which is why we always write the variance.
8:19 - 8:26

This guy x̅ estimates this μ, and
this guy s^2 estimates this σ^2.
8:27 - 8:36

And they're unbiased. Now, what "unbiased" is,
I'm not going to discuss in more detail, but it's...
8:39 - 8:45

Well...at some point I will
touch that, but anyways.
8:46 - 8:55

But, the word "unbiased" itself, should
give you some idea of what they really are.
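
Roughly, "unbiased" means that if you repeat the sampling many times, the average of the estimator lands on the parameter it estimates. A small simulation can make this concrete. It is a sketch under assumptions: the population is taken to be normal with a made-up mean of 150 and standard deviation of 20, and the sample size and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 150.0, 20.0  # hypothetical population mean and standard deviation
N = 10                   # sample size
TRIALS = 5_000           # number of repeated samples

xbars, s2_values, biased_values, s_values = [], [], [], []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    s2 = statistics.variance(sample)                    # divides by n - 1
    xbars.append(statistics.mean(sample))
    s2_values.append(s2)
    biased_values.append(statistics.pvariance(sample))  # divides by n
    s_values.append(math.sqrt(s2))

avg_xbar = statistics.mean(xbars)
avg_s2 = statistics.mean(s2_values)
avg_biased = statistics.mean(biased_values)
avg_s = statistics.mean(s_values)

print(f"average x-bar          : {avg_xbar:.2f}  (mu = {MU})")
print(f"average s^2 (n - 1)    : {avg_s2:.1f}  (sigma^2 = {SIGMA ** 2})")
print(f"average variance with n: {avg_biased:.1f}")
print(f"average s              : {avg_s:.2f}  (sigma = {SIGMA})")
```

In a run like this, the averages of x-bar and s squared sit close to mu and sigma squared, the version that divides by n comes out too small, and the average of s falls a little below sigma, which is the reason for pairing s squared with sigma squared rather than s with sigma.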
8:55 - 8:56

So...
8:58 - 9:00

Okay, now...
9:01 - 9:02

This...
9:05 - 9:09

So, really there's something
that's coming into the picture.
9:09 - 9:15

We have the sample, and this
sample has some distribution.
9:15 - 9:18

So, the sample has
some distribution.
9:18 - 9:22

Now, what do we assume
for that distribution?
9:22 - 9:28

Well, most often what we do
is we graph the histogram
9:28 - 9:33

and by looking at the histogram
we guess, and we say,
9:33 - 9:40

"Okay, the histogram is telling me that
this distribution looks like normal."
9:40 - 9:46

Or, "This distribution looks like an
exponential distribution," so on and so forth.
9:47 - 9:54

The first thing we usually do is we graph
the histogram and look at the picture
9:54 - 10:01

and try to just guess
what the distribution is.
10:01 - 10:05

Well, anyway, the sample
comes with some numbers,
10:05 - 10:12

so if it is weight of students or age of students,
something [like that], you have some numbers.
10:12 - 10:18

Those numbers give you x-bar and s,
so you have at least 2 numbers here.
10:18 - 10:23

And these two numbers
definitely help you to--
10:23 - 10:28

not the distribution, but most often
you guess the distribution, but anyway--
10:28 - 10:36

to write down the distribution more clearly
by including these two in the distribution.
10:37 - 10:38

Okay, so...
10:39 - 10:41

First thing we do is histogram.
10:41 - 10:46

So, we have a sample
and we draw a histogram.
10:49 - 10:51

So, given a sample
10:57 - 11:01

graph the histogram first.
11:06 - 11:13

So, the histogram should give you an idea
of what the distribution should look like.
11:14 - 11:21

When you graph the histogram this means that
you are using some computer program,
11:21 - 11:27

whatever program you
are using, you'll find:
11:29 - 11:35

x-bar and s.
What is x-bar and what is s?
11:37 - 11:44

X-bar is [uppercase] sigma
[Σ meaning sum] of all x's
11:44 - 11:52

divided by n. And s squared
is [uppercase] sigma Σ,
11:52 - 11:58

x minus x-bar squared
over n minus 1.
12:02 - 12:05

That is s squared.
Now, what is n?
12:06 - 12:12

N is the sample size. Let me
write here: "n = sample size."
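
Written out, the two formulas just described are as follows. The index i and the notation x_i for the individual observations are added here for clarity and are not spoken in the lecture.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```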
12:15 - 12:21

And these two formulas, in fact--
n is sample size-- these two formulas--
12:21 - 12:24

First of all, it means that your
sample size must be more than 1.
12:24 - 12:28

[He points to the denominator, where n minus 1
would equal 0, leaving s squared undefined.]
12:28 - 12:33

Definitely, can you say anything just
by having 1 [item in a] sample?
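
As a minimal sketch of these two formulas in code, assuming a small list of made-up weights, the functions below compute x-bar and s squared directly and refuse a sample of size 1. The names `sample_mean` and `sample_variance` are illustrative, not from the lecture.

```python
import statistics


def sample_mean(xs):
    """x-bar: the sum of all x's divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s squared: the sum of (x - x-bar) squared, divided by n - 1."""
    n = len(xs)
    if n < 2:
        # With n = 1 the denominator n - 1 is 0, so s squared is undefined.
        raise ValueError("need at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Made-up weights, for illustration only.
weights = [120.0, 135.0, 150.0, 165.0, 180.0]
xbar = sample_mean(weights)
s2 = sample_variance(weights)
print(xbar, s2)

# The standard library's sample variance uses the same n - 1 denominator.
print(statistics.variance(weights))

try:
    sample_variance([150.0])
except ValueError as err:
    print("one observation is not enough:", err)
```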
12:35 - 12:45

So, this one and this one give me two numbers,
and these two numbers theoretically--
12:45 - 12:51

by theory we know-- that these are unbiased
estimators for mu and [lowercase] sigma squared.
12:51 - 12:57

Now, as I said, and I'll say it again, this chapter,
in fact, is about normal distribution.
12:57 - 13:07

So, what we usually do is we just say, "Okay, well
let's assume that the population is normal."
13:08 - 13:13

Although there are some theorems which have
nothing to do with normal distributions,
13:13 - 13:20

but they work for every distribution.
But, in order to work with a sample
13:20 - 13:26

and make some predictions
and other things,
13:26 - 13:33

we usually assume that the distribution of
the population is normal, and also the sample.
13:34 - 13:40

So, although there are some problems
with this assumption, but...
13:41 - 13:43

Well, we have to assume sometimes.
13:46 - 13:50

Unless we have a large sample,
if we have a large enough sample,
13:50 - 13:54

then by looking at the histogram we might say,
"I think this is not a normal distribution,
13:54 - 13:59

this is like an exponential distribution,
or like 'blah-blah' distribution."
14:00 - 14:07

That idea needs more experience and
knowledge of statistics and probability.
14:08 - 14:13

So, these two guys
are good estimators.
14:21 - 14:22

Now...
14:23 - 14:26

Let me erase...
what should I erase?
14:57 - 15:09

[unclear]
15:14 - 15:20

So, it's saying something like this.
When we study population--
15:20 - 15:23

So, this is the population...
15:26 - 15:30

You have to listen carefully
to what I'm saying.
15:30 - 15:35

So, we want to
study a population.
15:36 - 15:45

Say I'm a stats professor and I choose
some students from my class,
15:45 - 15:49

n students, say 10 students,
15:49 - 15:58

and send them to different places in the country
to measure the weight of some students.
15:59 - 16:04

So, to weigh some students,
high school students.
16:04 - 16:12

Today I send them and they go around
the country in the morning and weigh,
16:12 - 16:18

and bring me 10 numbers.
So, today, day 1...
16:21 - 16:26

they bring me ten numbers.
[counting]
16:27 - 16:29

So, they give me ten numbers.
16:29 - 16:34

These ten numbers, you
can find using that formula,
16:34 - 16:44

you can find for these ten numbers
both x-bar and s squared for day 1.
16:45 - 16:50

So for day 1 we have
x-bar and s squared.
16:51 - 16:54

And, again, day 2,
16:55 - 17:02

I send students again to give me
another x-bar and another s squared.
17:03 - 17:07

Day 3, another one.
17:11 - 17:16

Day-- well, how many days do
you think? Let's say 100 days.
17:17 - 17:24

Day 100, so I have x-bar [subscript] 100,
and s squared [subscript] 100.
17:26 - 17:32

Say I send them 100 days, and I have
no money to do that. [chuckles]
17:32 - 17:35

That's why I'm saying
"if it's feasible."
17:35 - 17:41

Anyway, so I have some x-bars
here and some s squares.
17:42 - 17:49

And samples themselves are kind
of random, the whole set of ten--
17:49 - 17:52

How many students did I send? Ten.
17:52 - 17:56

So, this sample has ten students from
high schools around the country.
17:56 - 18:02

This sample [day 2] has ten students,
but they do not have to be the same,
18:02 - 18:10

they can be at random, so it's unlikely
to have two samples exactly the same.
18:11 - 18:18

Right? So, just imagine. You choose some
students from the high schools in the country.
18:18 - 18:23

The next day you choose other students,
but since you are doing it at random,
18:23 - 18:28

some of these students might be the same.
But, just imagine out of-- I don't know,
18:28 - 18:34

like out of a hundred million students--
I don't know, let's say ten million--
18:34 - 18:40

out of ten million students you are choosing
ten in 2 days and they are the same?
18:40 - 18:43

It's really unlikely to have the same.
18:43 - 18:47

In fact, we can find the probability of being
the same, but that's not the point here.
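
For the curious, that probability can be sketched under two assumptions that the lecture does not spell out: each day's ten students are a simple random sample drawn without replacement, and the two days are independent. Every set of ten students is then equally likely, so the chance that day 2 repeats day 1 exactly is one over the number of possible samples.

```python
from math import comb

POPULATION = 10_000_000  # the lecture's round figure of ten million students
SAMPLE_SIZE = 10

# Number of distinct sets of 10 students that could be chosen.
n_possible_samples = comb(POPULATION, SAMPLE_SIZE)

# Chance that the second day's sample is exactly the first day's sample.
p_same = 1 / n_possible_samples
print(f"{p_same:.3e}")
```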
18:47 - 18:52

And then day 3, another ten students,
up to day 100, ten students.
18:52 - 18:57

The point is, this sample
itself is like random.
18:58 - 19:04

Although each one of them is random,
the sample as a whole is also like random.
19:05 - 19:11

Which makes these x-bars
and s squared's random.
19:12 - 19:16

Right? This makes them random.
19:16 - 19:21

So, being random, it means that we can
talk about some kind of random variable.
19:21 - 19:25

X-bar, for example, or s squared.
19:28 - 19:34

And this x-bar has values: x-bar [subscript] 1,
x-bar [subscript] 2, up to x-bar [subscript] 100.
19:34 - 19:36

This one [s squared] takes values from
19:36 - 19:41

s^2 [subscript] 1, s^2 [subscript] 2,
up to s^2 [subscript] 100.
19:43 - 19:45

Since they are random variables,
19:45 - 19:53

with these as values taken by these two
random variables, they have a distribution.
19:53 - 20:00

So, the distribution of these two, that's what
this thing is about: sampling distribution.
20:01 - 20:07

And that's what we usually graph.
That's what we graph. And...
20:08 - 20:11

So, we have this x-bar
and s squared for ten--
20:11 - 20:17

Well, ten's not enough, but say
I send 200 students, okay?
20:17 - 20:23

And then we graph it-- a histogram for
x-bar, and a histogram for s squared--
20:23 - 20:28

those two histograms will give us
an idea of what the distributions are.
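
The 100-day experiment can be imitated with a short simulation. This is a sketch, not the lecturer's procedure: it assumes the weights come from a normal population with a made-up mean of 150 and standard deviation of 20, and it prints rough text histograms instead of plotting. The helper `text_histogram` is a hypothetical name.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 150.0, 20.0  # hypothetical population mean and standard deviation
N_STUDENTS = 10          # students weighed per day
N_DAYS = 100             # number of days, one sample per day

xbars, s2s = [], []
for day in range(N_DAYS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N_STUDENTS)]
    xbars.append(statistics.mean(sample))    # x-bar for this day
    s2s.append(statistics.variance(sample))  # s squared for this day


def text_histogram(values, bins=8):
    """Print a rough histogram of values and return the bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the largest value falls in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.1f} to {lo + (i + 1) * width:8.1f} | {'#' * c}")
    return counts


print("x-bar over 100 days")
xbar_counts = text_histogram(xbars)
print("s squared over 100 days")
s2_counts = text_histogram(s2s)
```

Each row of the output is one bin, so the two printed shapes give a first look at the sampling distributions of x-bar and s squared.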
20:28 - 20:34

Now, next time I will discuss
the distributions of these...
20:36 - 20:44

...random variables and also some amazing
theorems about those distributions.
20:45 - 20:47

So, see you next time.

Title:: Math 1080 Lecture 15 Normal Distribution 3 Sampling Distributions x bar and s squared
Video Language:: English
Duration:: 20:49

	geriwilson edited English subtitles for Math 1080 Lecture 15 Normal Distribution 3 Sampling Distributions x bar and s squared