At the end of the last video, I said that we'd talk about measures of dispersion, or how spread out the data are. But before I get into that, I realize I have more to say about the mean. And before I do that, I want to distinguish between a sample and a population, which I touched on a little in the last video.
Let's say I wanted to know the average height of all men in America. So let me draw the set of all men in America. There are roughly 300 million people in the U.S., and maybe about half of them are men, so this would be about 150 million men. It would be nearly impossible, even if I were intent on doing it, to measure the height of every man in America. Every few seconds, one of these men is being born and another is dying, so by the time I finished measuring everyone, some would have died and new ones would have been born. And even if it weren't impossible, it would be very tiresome to compute the average of this entire population, whether you mean the mean, the median, or the mode.
Because I'm interested in the average of the population, the best way to get a sense of it might be to take the average of a sample. I'd try to be pretty random about how I pick people. Hopefully my sample wouldn't be my college's basketball team, because that would be a skewed sample. Instead, I'd try to find random people in random situations, so the sample isn't skewed by height.
The more people I measure, the more indicative the sample is, but let's say I'm kind of lazy and only collect five heights: 5 feet, 6 feet, 5.5 feet, 5.75 feet, and 6.5 feet. Those are the five people I'd sample. (We could talk more about good ways to generate a random sample from a population so it isn't skewed one way or the other.) If I just wanted a rough sense of the average with only five measurements, this is how I'd do it. This would be a sample of the population.
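To make the idea of a sample concrete, here is a minimal Python sketch. The `population` list is a tiny, made-up set of heights in feet standing in for the 150 million men (these values are hypothetical, not real measurements). `random.Random(...).sample` draws five people without replacement, and `statistics.mean` averages them.

```python
import random
import statistics

# Hypothetical population of heights in feet (made-up values for illustration;
# a real population would have about 150 million entries).
population = [5.0, 5.25, 5.5, 5.5, 5.75, 5.75, 5.8, 5.9, 6.0, 6.0,
              6.1, 6.25, 6.5, 5.6, 5.4, 5.95, 6.2, 5.7, 5.85, 6.05]

# A fixed seed makes the "random" draw reproducible for this example.
rng = random.Random(42)

# Draw a simple random sample of 5 people, without replacement.
sample = rng.sample(population, 5)

population_mean = statistics.mean(population)  # mu: needs every member
sample_mean = statistics.mean(sample)          # x-bar: only the 5 sampled

print("sample:", sample)
print(f"population mean (mu): {population_mean:.3f}")
print(f"sample mean (x-bar):  {sample_mean:.3f}")
```

Changing the seed gives a different sample and usually a slightly different sample mean, while the population mean stays fixed.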
Now let's say I want to calculate the average by taking the arithmetic mean. Instead of taking the arithmetic mean of this entire group of 150 million men, I might be happy taking the mean of this sample, and that is called the sample mean.
I want to introduce you to some notation. In statistics, the Greek letter mu (μ), the Greek counterpart of our letter m, is the conventional symbol for the population mean. And x with a line over it, x̄ (read "x bar"), denotes a sample mean. You'll see both of these, and it can be confusing when people talk about means and sometimes write μ and sometimes write x̄, so it's nice to know the distinction: x̄ is the mean of a sample of the population, and μ is the mean of the population as a whole.
The way you calculate them is essentially the same. To find the population mean, you'd go to all 150 million men at one moment, add up all their heights, and divide by 150 million. For the sample mean, you just add up the numbers in your sample and divide by the number of data points you have.
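Since the procedure is the same for both means and only the input differs, one small function covers both. A minimal Python sketch (the name `arithmetic_mean` is just an illustrative choice), applied to the five sample heights:

```python
def arithmetic_mean(values):
    """Add up the data points and divide by how many there are."""
    if not values:
        raise ValueError("mean of an empty collection is undefined")
    return sum(values) / len(values)

# The five sample heights, in feet.
heights = [5, 6, 5.5, 5.75, 6.5]

print(arithmetic_mean(heights))  # 5.75
```

Passing the heights of every member of the population instead would give the population mean, μ.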
Now I want to show you the formulas. I think you know how to calculate averages; it's a fairly straightforward operation. But I want to show you how it's often written in statistics books, so that you're not intimidated when you see it. Let me do the sample first. Each member of the sample gets a label: the first one is called x sub 1, the next is x sub 2, then x sub 3, x sub 4, and this last one is x sub 5. This is just a way of referring to each data point in the sample.
So for a sample mean, what do you do? You take the sum of these numbers. You know how to do that, but the fancy way of writing it is with a capital Greek letter sigma (Σ), which means "the sum." You write the sum of every x sub i, where the index i runs from 1 up to the size of the sample. In this case that's 5, but in general the sample size is written as a lowercase n. Then you divide by the number of members in the sample, so you divide by n. When you see this in a book, you might think, wow, this is advanced mathematics. But all it's saying is: add up all the data points and divide by how many there are.
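Written out in the usual textbook notation, with i as the index and n as the sample size, the sample mean is:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```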
So here it's just 5 plus 6 plus 5.5 plus 5.75 plus 6.5, all divided by 5, which is 28.75 divided by 5, or 5.75 feet. That's all the formula is telling you.
For the population mean, it's the same thing with slightly different notation. They'll write μ equals the sum, from i equals 1 to a capital N, of each data point in the population (not just the sample), all divided by capital N. When they write capital N, they mean the size of the entire population, here 150 million: they want you to take every data point in the entire population and then divide by the number of members in the entire population. The lowercase n is just the convention for the size of a sample, some smaller number than the entire population.
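In symbols, with N as the number of members in the whole population, the population mean is:

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

Compared with the sample mean, only the symbol for the result (μ instead of x̄) and the upper limit of the sum (capital N instead of lowercase n) change.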
But the calculation itself is the same in both cases.
Anyway, I wanted to leave you with that because it's a fairly simple concept, but if you don't get it clarified early on, it becomes very confusing later when people distinguish between the population mean and the sample mean and write these formulas slightly differently. Sometimes you'll see μ for the population mean, and sometimes you'll see x̄ for the sample mean. Anyway, I'll see you in the next video.