-
Before, at the end of the last
video, I actually said that
-
we'd talk about measures of
dispersion or how things are
-
distributed, but before I go
into that, I realize that I
-
have more to talk about,
especially the mean.
-
And before I do that, I want
to differentiate between a
-
sample and a population.
-
I touched on this a little
bit in the last video.
-
Let's say I wanted to
know-- I don't know.
-
Let's say I wanted to know
the average height of all
-
men in America, right?
-
So let me make the set
of all men in America.
-
So that's all men in America.
-
I know there's 300 million
people in the U.S., and half of
-
them maybe roughly are men,
so this would be 150
-
million men, right?
-
And it would be nearly
impossible, even if I was
-
intent on doing it, to actually
measure the height
-
of every man in America.
-
Frankly, you know, every few
seconds, one of these men is
-
being born and one of
these men is dying.
-
So you know by the time I'm
done measuring everything,
-
someone would have died, and
some new men would have
-
been born, so it would
almost be impossible.
-
And if not impossible, it would
be very tiresome to measure the
-
average, or the mean, or the
median, or the mode of this
-
entire population, right?
-
So the best way I can get a
sense of this, because I'm
-
interested in what the average
of the population is, maybe I
-
can take the average
of a sample.
-
So what I could do is I can go
up to, you know-- and I'd try
-
to be pretty random about it.
-
I don't want to like-- you
know, hopefully, my sample
-
wouldn't be my college's
basketball team because that
-
would be a skewed sample, but
I'd try to find random people
-
and random situations where it
wouldn't be skewed
-
based on height.
-
And I'd maybe collect 10
heights, and I'd get, well,
-
maybe-- you know, the more
people I get the more
-
indicative it is, but if I got
10 heights, and those 10
-
heights were-- I don't know.
-
I'll do it in feet: you know, 5
feet, 6 feet, 5 and a half
-
feet, 5.75 feet, and, well,
let's say I only do five, so
-
the last one is 6 and
a half feet, right?
-
Those are the five people that
I'd sample, and we could talk
-
more about what's a good way to
generate a random sample from a
-
population so it's not skewed
one way or the other.
-
But anyway, if I wanted to get
a sense of it and if I was kind
-
of lazy, so I only took
five measurements, this
-
is the way I would do it.
-
This would be a sample.
-
This would be a sample
of the population.
-
So instead of taking the mean--
let's say I wanted to
-
calculate the average by
taking the arithmetic mean.
-
Instead of taking the
arithmetic mean of this entire
-
group of 150 million people, I
might just be happy taking the
-
mean of this sample, and
that'll be called
-
the sample mean.
-
And I want to introduce you to
some notation, even though it's
-
kind of-- so in statistics
speak, the mean, this mu, it's
-
a Greek letter, essentially the
Greek letter that later turns
-
into m, but mu is the
population mean, and this is
-
just a convention for the
population mean.
-
And x with a line over it, that
is equal to a sample mean.
-
And these are just notations
that people might see, and you
-
might have been confused
because sometimes you see
-
something-- people talk about
means, and you see this mu, and
-
sometimes you see this x with a
line over it, and it's nice
-
to know the distinction.
-
Here they're talking about
the mean of a sample of the
-
population, and here they're
talking about the mean of
-
the population as a whole.
-
Now, the way you calculate
them is essentially the same.
-
If you want to figure out the
population mean, you'd go to
-
all 150 million people at one
moment and add up all their
-
heights, and divide by 150
million to get the
-
population mean.
-
The sample mean, you just add
up the numbers in your sample
-
and divide by the number
of data points you have.
-
And the formulas I
want to show you.
-
I think you know how to
calculate averages.
-
It's a fairly straightforward
operation, and I want to show
-
you how it's often written in
statistics books, so that
-
you're not intimidated
when you see it.
-
The population mean, they'll
write it as-- so just to do,
-
you know, the convention.
-
Each member of a-- well, let
me do the sample first.
-
Each member of a sample, say
this is the first data point.
-
They'll call that x sub 1.
-
They'll call this x sub 2.
-
They'll call this one x
sub 3, x sub 4, and this
-
one x sub 5, right?
-
And this is just a way
of referring to each
-
of the data points in the sample.
-
So for a sample mean, they'll
say, you know what you do?
-
You take the sum
of these numbers.
-
And you know how to do that,
but the fancy way of writing
-
it is to say, let's
write a capital Sigma.
-
That means the sum.
-
Sum of every x sub n, right?
-
Take the sum of each of
these numbers, right?
-
This is x sub 1, x sub 2, where
n goes from 1 to-- I mean, you
-
could say to the size
of the sample.
-
You know, sometimes-- you know,
in this case it would be 5, or
-
sometimes they'd write a big--
they'd write an n like that.
-
And you'd divide it by the
number of members there are
-
in that sample,
so divided by n.
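-
As a written formula, the sample mean described here is usually set as shown in this sketch. One assumption: the index is written as i so that it isn't confused with the sample size n (out loud, n was used for both).

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```

With the five heights in this sample, n = 5.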
-
You know, when you see this in
a book, you're like, wow, this
-
is advanced mathematics.
-
But essentially, they're saying
take the sum of all the data
-
points, just sum up these
numbers, and divide by the
-
number of numbers there are.
-
So this would just be 5
plus 6 plus 5.5 plus 5.75
-
plus 6.5 divided by 5.
-
That's all this is telling you.
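-
To check the arithmetic, here's a minimal Python sketch using the five heights listed earlier (5, 6, 5.5, 5.75, and 6.5 feet). The names `heights` and `sample_mean` are just illustrative, not anything from the video.

```python
# The five sampled heights, in feet, as listed earlier.
heights = [5, 6, 5.5, 5.75, 6.5]

def sample_mean(values):
    """Add up the data points and divide by how many there are."""
    return sum(values) / len(values)

x_bar = sample_mean(heights)
print(x_bar)  # prints 5.75
```

So for this sample, x with a line over it comes out to 5.75 feet.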
-
For the population mean,
it's the same thing.
-
They just use a slightly
different notation.
-
They'll say that's equal to the
sum from n is equal to 1 to a
-
big N-- and I'll explain why
they write a big N-- of each
-
data point in the population,
not just the sample, all
-
that divided by big N.
-
And this is just a way of saying,
when they write big N,
-
they mean 150 million.
-
They mean, you know, we want
you to get every data point
-
in the entire population.
-
So that's what they mean by--
and then divide by the number
-
of the entire population.
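-
Written out, the population mean typically looks like the line below. This is a sketch of the usual textbook form; the index letter i is an assumption, and N stands for the size of the whole population (roughly 150 million in this example).

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```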
-
While the small n, they're kind
of-- it's just the convention,
-
the notation, that they say,
hey, we just want you to get
-
some smaller number, not
the entire population.
-
But the way you calculate
them is, you know, they're
-
essentially equivalent.
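-
Here's a rough Python sketch of that point: the same add-up-and-divide calculation gives mu when it's fed the whole population and x bar when it's fed a sample. Everything specific in it is an assumption for illustration: the population is 100,000 made-up heights (not real data, and far fewer than 150 million), and the center of 5.75 feet and spread of 0.25 feet are arbitrary.

```python
import random

# Hypothetical population: 100,000 made-up heights in feet (NOT real data).
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(5.75, 0.25) for _ in range(100_000)]

def mean(values):
    """Arithmetic mean: add up the values and divide by how many there are."""
    return sum(values) / len(values)

mu = mean(population)               # population mean: uses all N values
sample = rng.sample(population, 5)  # random sample of n = 5, no repeats
x_bar = mean(sample)                # sample mean: same formula, fewer values

print(f"N = {len(population)}, mu = {mu:.3f}")
print(f"n = {len(sample)}, x_bar = {x_bar:.3f}")
```

The two results generally won't match exactly, since five data points can only give a rough sense of the whole population.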
-
Anyway, I wanted to leave you
with that just because this is
-
something that if you don't get
it clarified early on-- it's a
-
fairly simple concept-- later
on, it becomes very confusing
-
when people want to
differentiate between the
-
population and the sample mean.
-
And you see these formulas
written slightly differently.
-
Sometimes you'll see a mu, and
sometimes you'll see an x with
-
a line over it for
the sample mean.
-
Anyway, I'll see you in
the next video.