-
I'm thinking about
buying a restaurant,
-
so I go and ask
the current owner,
-
what is the distribution
of the number of customers
-
you get each day?
-
And he says, oh, I've
already figured that out.
-
And he gives me
this distribution
-
over here, which essentially
says 10% of his customers come
-
in on Monday, 10% on
Tuesday, 15% on Wednesday,
-
so forth, and so on.
-
They're closed on Sunday.
-
So this is 100% of the
customers for a week.
-
If you add that
up, you get 100%.
-
I obviously am a
little bit suspicious,
-
so I decide to see how well
this distribution that he's
-
describing actually
fits observed data.
-
So I actually observe the number
of customers, when they come in
-
during the week,
and this is what
-
I get from my observed data.
-
So to figure out whether
I want to accept or reject
-
his hypothesis right
here, I'm going
-
to do a little bit
of a hypothesis test.
-
So I'll make the null hypothesis
that the owner's distribution--
-
so that's this thing
right here-- is correct.
-
And then the
alternative hypothesis
-
is going to be that
it is not correct,
-
that it is not a
correct distribution,
-
that I should not feel
reasonably OK relying on this.
-
It's not the correct--
I should reject
-
the owner's distribution.
-
And I want to do this with
a significance level of 5%.
-
Or another way of
thinking about it,
-
I'm going to calculate a
statistic based on this data
-
right here.
-
And it's going to be
a chi-square statistic.
-
Or another way to view
it is that the statistic
-
that I'm going to
calculate has approximately
-
a chi-square distribution.
-
And given that it does have
a chi-square distribution
-
with a certain number
of degrees of freedom
-
and we're going to calculate
that, what I want to see
-
is whether the probability of
getting this result,
-
or a result
more extreme,
-
is less than 5%.
-
If the probability of getting
a result like this or something
-
less likely than
this is less than 5%,
-
then I'm going to reject
the null hypothesis, which
-
is essentially just rejecting
the owner's distribution.
-
If I don't get
that, if I say, hey,
-
the probability of getting
a chi-square statistic that
-
is this extreme or more
is greater than my alpha,
-
than my significance level,
then I'm not going to reject it.
-
I'm going to say,
well, I have no reason
-
to really assume
that he's lying.
-
So let's do that.
-
So to calculate the chi-square
statistic, what I'm going to do
-
is-- so here we're assuming
the owner's distribution is
-
correct.
-
So assuming the
owner's distribution
-
was correct, what would have
been the expected number of customers?
-
So we have the expected
percentage here,
-
but what would have been
the expected count?
-
So let me write this right here.
-
Expected.
-
I'll add another row, Expected.
-
So we would have expected
10% of the total customers
-
in that week to
come in on Monday,
-
10% of the total
customers of that week
-
to come in on Tuesday, 15%
to come in on Wednesday.
-
Now to figure out what
the actual number is,
-
we need to figure out the
total number of customers.
-
So let's add up these
numbers right here.
-
So we have-- I'll get
the calculator out.
-
So we have 30 plus 14 plus
34 plus 45 plus 57 plus 20.
-
So there's a total
of 200 customers who
-
came into the
restaurant that week.
-
So let me write this down.
-
So this is equal to-- so I
wrote the total over here.
-
Ignore this right here.
-
I had 200 customers
come in for the week.
-
So what was the expected
number on Monday?
-
Well, on Monday, we would
have expected 10% of the 200
-
to come in.
-
So this would have been 20
customers, 10% times 200.
-
On Tuesday, another 10%.
-
So we would have
expected 20 customers.
-
Wednesday, 15% of 200,
that's 30 customers.
-
On Thursday, we would have
expected 20% of 200 customers,
-
so that would have
been 40 customers.
-
Then on Friday, 30%, that
would have been 60 customers.
-
And then on Saturday 15% again.
-
15% of 200 would have
been 30 customers.
-
So if this distribution
is correct,
-
this is the actual number
that I would have expected.
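The expected counts above can be reproduced with a short sketch. The variable names here are illustrative and not from the lecture; the percentages and observed counts are the ones read off the table, with the last day taken to be Saturday since the restaurant is closed on Sunday.

```python
# Owner's claimed share of weekly customers, in percent (closed Sunday).
claimed_percent = {"Mon": 10, "Tue": 10, "Wed": 15, "Thu": 20, "Fri": 30, "Sat": 15}

# Customers actually observed on each day of one week.
observed = {"Mon": 30, "Tue": 14, "Wed": 34, "Thu": 45, "Fri": 57, "Sat": 20}

# Total customers for the week.
total = sum(observed.values())

# Expected count for each day if the owner's distribution is correct.
expected = {day: pct * total / 100 for day, pct in claimed_percent.items()}

for day in observed:
    print(day, observed[day], expected[day])
```

The claimed percentages sum to 100, so the expected counts sum to the same total as the observed counts.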
-
Now to calculate the
chi-square statistic,
-
we essentially just take--
let me just show it to you,
-
and instead of
writing chi, I'm going
-
to write capital X squared.
-
Sometimes someone will write the
actual Greek letter chi here.
-
But I'll write the
capital X squared here.
-
And let me write it this way.
-
This is our
chi-square statistic,
-
but I'm going to write it with
a capital X instead of a chi
-
because this is going
to have approximately
-
a chi-squared distribution.
-
I can't assume
that it's exactly,
-
so this is where we're dealing
with approximations right here.
-
But it's fairly
straightforward to calculate.
-
For each of the days,
we take the difference
-
between the observed
and expected.
-
So it's going to
be 30 minus 20--
-
I'll do the first one
color coded-- squared
-
divided by the expected.
-
So we're essentially
taking the square
-
of the error, or the difference,
between what
-
we observed and what we expected, and
we're normalizing it
-
by the expected right over here.
-
But we want to take the
sum of all of these.
-
So I'll just do all
of those in yellow.
-
So plus 14 minus 20 squared
over 20 plus 34 minus 30 squared
-
over 30 plus-- I'll continue
over here-- 45 minus 40 squared
-
over 40 plus 57 minus
60 squared over 60,
-
and then finally, plus 20
minus 30 squared over 30.
-
I just took the observed
minus the expected
-
squared over the expected.
-
I took the sum of
it, and this is
-
what gives us our
chi-square statistic.
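Written out as a formula, the sum just described looks like this. The symbols O_i and E_i for the observed and expected counts on day i are labels chosen here, not notation from the lecture.

```latex
X^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}
    = \frac{(30-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(34-30)^2}{30}
    + \frac{(45-40)^2}{40} + \frac{(57-60)^2}{60} + \frac{(20-30)^2}{30}
```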
-
Now let's just calculate what
this number is going to be.
-
So this is going to be equal
to-- I'll do it over here
-
so I don't run out of space.
-
So we'll do this a new color.
-
We'll do it in orange.
-
This is going to be
equal to 30 minus 20
-
is 10 squared, which is 100
divided by 20, which is 5.
-
I might not be able to do all
of them in my head like this.
-
Plus, actually, let me
just write it this way
-
just so you can
see what I'm doing.
-
This right here is 100
over 20 plus 14 minus 20
-
is negative 6 squared
is positive 36.
-
So plus 36 over 20.
-
Plus 34 minus 30 is
4, squared is 16.
-
So plus 16 over 30.
-
Plus 45 minus 40
is 5 squared is 25.
-
So plus 25 over 40.
-
Plus the difference
here is 3 squared is 9,
-
so it's 9 over 60.
-
Plus we have a difference of
10 squared is plus 100 over 30.
-
And this is equal to-- and I'll
just get the calculator out
-
for this-- this is
equal to, we have
-
100 divided by 20
plus 36 divided by 20
-
plus 16 divided by 30
plus 25 divided by 40
-
plus 9 divided by 60 plus 100
divided by 30 gives us 11.44.
-
So let me write that down.
-
So this right here
is going to be 11.44.
-
This is my chi-square
statistic, or we
-
could call it a big
capital X squared.
-
Sometimes you'll have it
written as a chi-square,
-
but this statistic is
going to have approximately
-
a chi-square distribution.
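As a quick check on the arithmetic, here is a minimal sketch that computes the same sum. The list names are illustrative; the numbers are the observed and expected counts from the table.

```python
# Observed and expected customer counts, Monday through Saturday.
observed = [30, 14, 34, 45, 57, 20]
expected = [20, 20, 30, 40, 60, 30]

# One term per day: squared difference divided by the expected count.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-square statistic is the sum of those terms.
chi_square_stat = sum(terms)

print(round(chi_square_stat, 2))  # 11.44
```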
-
Anyway, with that
said, let's figure out,
-
if we assume that it has roughly
a chi-square distribution, what
-
is the probability of getting a
result this extreme or at least
-
this extreme, I guess is another
way of thinking about it.
-
Or another way of saying it: is
this a more extreme result
-
than the critical
chi-square value,
-
the value where there's only a 5% chance of
getting a result at least that extreme?
-
So let's do it that way.
-
Let's figure out the
critical chi-square value.
-
And if this is more
extreme than that,
-
then we will reject
our null hypothesis.
-
So let's figure out our
critical chi-square values.
-
So we have an alpha of 5%.
-
And actually the other
thing we have to figure out
-
is the degrees of freedom.
-
For the degrees of freedom, we're
adding up one, two, three, four,
-
five, six terms, so
you might be tempted
-
to say the degrees
of freedom are six.
-
But one thing to
realize is that if you
-
had all of this
information over here,
-
you could actually figure out
this last piece of information,
-
so you actually have
five degrees of freedom.
-
When you have just kind of
n data points like this,
-
and you're measuring kind of
the observed versus expected,
-
your degrees of freedom
are going to be n minus 1,
-
because you could figure
out that nth data point just
-
based on everything
else that you have,
-
all of the other information.
-
So our degrees of freedom
here are going to be 5.
-
It's n minus 1.
-
So our significance level is 5%.
-
And our degrees of freedom is
also going to be equal to 5.
-
So let's look at our
chi-square distribution.
-
We have 5 degrees
of freedom.
-
We have a significance
level of 5%.
-
And so the critical
chi-square value is 11.07.
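The lecture reads the critical value off a table. As a rough cross-check, the sketch below finds it numerically instead. It assumes the closed-form upper-tail probability of a chi-square distribution that holds specifically for 5 degrees of freedom, and the helper names `chi2_sf_df5` and `critical_value` are made up for this example. In practice a statistics library would normally be used for this step.

```python
import math

def chi2_sf_df5(x):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with exactly 5 degrees of freedom (closed form for this case only)."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))

def critical_value(alpha, lo=0.0, hi=100.0):
    """Bisection search for the x whose upper-tail probability equals alpha.
    Works because the tail probability decreases as x grows."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df5(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n_categories = 6          # Monday through Saturday
df = n_categories - 1     # degrees of freedom, n minus 1
crit = critical_value(0.05)

print(df, round(crit, 2))  # 5 11.07
```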
-
So let's go with this chart.
-
So we have a
chi-squared distribution
-
with 5 degrees of freedom.
-
So that's this distribution
over here in magenta.
-
And we care about a
critical value of 11.07.
-
So this is right here.
-
Oh, you can't even
see it on this.
-
So if I were to keep drawing
this magenta thing all
-
the way over here, if the
magenta line just kept going,
-
over here, you'd have 8.
-
Over here you'd have 10.
-
Over here, you'd have 12.
-
11.07 is maybe some
place right over there.
-
So what it's saying
is the probability
-
of getting a result at least
as extreme as 11.07 is 5%.
-
So we could write it even here.
-
Our critical chi-square value is
equal to-- we just saw-- 11.07.
-
Let me look at the chart again.
-
11.07.
-
The result we got
for our statistic, 11.44,
-
is even less likely than that.
-
The probability of a result at least that extreme is less
than our significance level.
-
So then we are going to reject.
-
So the probability
of getting that is--
-
let me put it this
way-- 11.44 is
-
more extreme than our
critical chi-square value.
-
So if this distribution were true, a result
this extreme would be unlikely.
-
So we will reject
what he's telling us.
-
We will reject
this distribution.
-
It's not a good fit based
on this significance level.
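The same decision can be reached by computing the probability directly rather than comparing against the critical value. This is a sketch under the same assumption as the lecture, that the statistic is approximately chi-square distributed with 5 degrees of freedom. The helper `chi2_sf_df5` is a hypothetical name, and its formula is valid only for 5 degrees of freedom.

```python
import math

def chi2_sf_df5(x):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with exactly 5 degrees of freedom (closed form for this case only)."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))

chi_square_stat = 11.44   # statistic computed from the observed data
alpha = 0.05              # significance level

# Probability of a statistic at least this extreme if the owner is right.
p_value = chi2_sf_df5(chi_square_stat)

# Reject the owner's distribution when that probability is below alpha.
reject_null = p_value < alpha

print(round(p_value, 3), reject_null)  # roughly 0.043, True
```

Both routes agree: the statistic exceeds the critical value exactly when this probability falls below the significance level.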