-
Today I'll discuss confidence intervals
for sigma and sigma squared--
-
that's standard deviation and variance--
using the chi-squared distribution.
-
Now, sigma and sigma squared are the
quantities we want to estimate for the population.
-
Remember, population
is what you're studying.
-
And sample is what
you have in hand.
-
You don't have access to
the whole population data,
-
but you can take a sample.
-
What you have in hand
is a sample, and the sample has its own
-
standard deviation s and variance s-squared.
-
We also know how many data points
we have: n, the number of sample points.
-
Since we chose the
chi-squared distribution for this,
-
we have two critical values here:
chi-squared L and chi-squared R.
-
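Let me show you what they are.
The chi-squared distribution is related to
-
the normal distribution. In fact, chi-squared with
one degree of freedom is the square of a standard normal,
-
and in general it is a sum of squares of independent standard normals.
-
So it takes only nonnegative values: it lives on the right side of zero.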
-
Now, chi-squared L
is this point on the left,
-
and chi-squared R is this point on the right.
Now, what are they?
-
First, you choose the significance level, alpha.
-
Remember, alpha is a small number,
and the confidence level is 1 minus alpha.
-
Usually you choose 0.05 or 0.01,
-
and if precision is not so important,
-
you can choose 0.1, for example.
-
Most often we choose 0.05; it is the
conventional choice, and I think it is the best one.
-
Now, what is chi-squared L? It is the point
on the x-axis such that the area
-
under the chi-squared curve to its left is alpha over 2.
-
And chi-squared R is the point such that
the area to its right is alpha over 2.
-
So, the area in the
middle is 1 minus alpha.
-
That's because the total area
under the curve is 1.
-
If the left tail is α/2 and the right
tail is α/2, together they make alpha,
-
and since the whole area is 1, the area between
these two points is 1 minus alpha [1 - α].
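-
In symbols, the two critical values are defined by the tail areas just described:

```latex
P\left(\chi^2 < \chi^2_L\right) = \frac{\alpha}{2}, \qquad
P\left(\chi^2 > \chi^2_R\right) = \frac{\alpha}{2}, \qquad
P\left(\chi^2_L < \chi^2 < \chi^2_R\right) = 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha .
```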
-
Okay. Now let me write down the confidence
interval itself, that is, its endpoints.
-
So, what have you calculated?
-
You calculated s and s-squared.
-
And you have chi-squared L and
chi-squared R, and you also have n.
-
We use chi-squared with
n minus 1 [n - 1] degrees of freedom.
-
For sigma squared, the interval is this:
on the left we have (n minus 1) times s-squared,
-
divided by chi-squared R.
-
And on the right we have (n minus 1) times
s-squared, divided by chi-squared L.
-
Notice they are swapped: the left endpoint uses R,
and the right endpoint uses L. Since chi-squared R is
the larger critical value, dividing by it gives the smaller endpoint.
-
And for sigma it is almost the same thing,
because sigma is just the square root of sigma squared.
-
So, on the left we have the square root
of (n minus 1) times s-squared
-
over chi-squared R.
-
And on the right we have the square root of (n minus 1)
times s-squared over chi-squared L.
-
Okay, so these are the two things
we will calculate, and that's it:
-
the confidence intervals
for sigma and sigma squared.
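-
For reference, here are the same two intervals in symbols, with the step that explains why R lands on the left. Assuming the population is normal (the standard assumption behind this method), the statistic (n - 1)s²/σ² has a chi-squared distribution with n - 1 degrees of freedom, so it falls between the two critical values with probability 1 - α. Taking reciprocals and multiplying by (n - 1)s² flips the inequalities:

```latex
\begin{aligned}
&P\!\left(\chi^2_L < \frac{(n-1)s^2}{\sigma^2} < \chi^2_R\right) = 1 - \alpha \\
&\Longleftrightarrow\;
P\!\left(\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}\right) = 1 - \alpha, \\
&\text{and for } \sigma:\quad
\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}} .
\end{aligned}
```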
-
So, let's look at this again.
-
We know n minus 1 and s-squared (we calculate
them from the sample), and we calculate chi-squared R
-
and chi-squared L, and this
gives me an interval for sigma squared.
-
And taking square roots gives me
the interval for sigma.
-
After I give you an example,
I will also discuss
-
what this really means.
-
The example I chose, in fact,
comes from the book.
-
And that is...
-
Confidence interval estimate
of sigma for pulse rates.
-
Why would anyone want this?
-
It could matter, for example,
to health insurance companies:
-
they might hire a team to give them
a confidence interval for sigma
-
and sigma squared.
-
What they would have to do, ideally,
is find the pulse rates of
-
everybody in a certain society;
-
if that were the United States,
everybody who lives in this country.
-
But, that's not feasible, right?
-
And that must be done at the same
time, say, at this very moment.
-
Just imagine how you could do that.
-
What they do instead is
-
take a random sample of people in this country
and find their pulse rates.
-
So they have a sample, and
this sample is given here.
-
It consists of some numbers.
-
I will just write some of them,
but I have all of them in R.
-
Some of them look like this:
the first person has 76,
-
then the second 76,
then 86, and so on.
-
The last one is something like 66.
-
So this is the sample: the sample pulse
rates for a certain number of people.
-
And we will work out everything in R.
-
So, let me share my R session with you.
-
You can see that I named this thing
"data"; we could call it "my data," whatever.
-
And these are the numbers:
76, 76, 86, and so on.
-
And the last one is 74.
-
So this is my data, and I have
already entered it.
-
Something went wrong at first (I guess
some values were deleted), so let's check the data again.
-
This is my data.
-
The first thing I need is n, the number of
sample points, so I enter length(data).
-
Let's check how many: 22.
-
So the degrees of freedom (df)
are n minus 1, so df is 21.
-
What else do I need?
I need s and s-squared.
-
So s is the standard
deviation of the data,
-
and "s_sq" (s-squared)
is the variance of the data.
-
Okay, let's check s and s_sq.
-
I called them s and s-squared,
so s_sq should be s squared.
-
Let's check: s to the power 2.
See, they are the same.
-
In fact, the variance is s-squared,
or s is the square root of the variance.
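-
In R these steps use length(), sd() and var(), where sd() and var() divide by n - 1. A Python analogue with the standard library's statistics module might look like this (stdev and variance also divide by n - 1). The three values are only the first ones read out in the lecture, used for illustration; they are not the full 22-point sample, so the numbers will not match the lecture's.

```python
import statistics

# Illustration only: the first three pulse rates read out in the lecture.
# The real sample has 22 values, which are not all listed here.
data = [76, 76, 86]

n = len(data)                     # like R's length(data)
df = n - 1                        # degrees of freedom
s = statistics.stdev(data)        # sample SD, divisor n - 1 (like R's sd)
s_sq = statistics.variance(data)  # sample variance, divisor n - 1 (like R's var)

print(n, df, round(s, 3), round(s_sq, 3))
# The variance is the square of the standard deviation (up to float rounding)
print(abs(s ** 2 - s_sq) < 1e-9)
```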
-
So, what else do we need?
-
We need the chi-squared L
and chi-squared R.
-
You can find them like this: "qchisq."
-
But first, let's enter alpha.
-
How much is alpha? 0.05 (I mistyped it at first).
-
That's the usual alpha. Then chi-squared L
is qchisq of alpha
-
over 2, with df degrees of freedom.
-
Let's give it a name: "chi_L",
-
so it's easier when I have to do it again.
-
Chi-squared right, "chi_r", is also qchisq,
-
but of 1 minus alpha over 2,
with the same degrees of freedom.
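-
R's qchisq(p, df) returns the point with area p to its left, which is why chi_L uses alpha/2 and chi_r uses 1 - alpha/2. For readers working outside R, here is a minimal Python sketch of the same lookup using only the standard library. The helper names gamma_p, pchisq and qchisq are our own (qchisq only mimics R's name, not its implementation), and the method, a series/continued-fraction incomplete gamma function plus bisection, is a standard textbook approach swapped in here.

```python
import math


def gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)  # log(x^a e^-x / Gamma(a))
    if x < a + 1.0:
        # Series expansion, converges quickly for x < a + 1
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefix)
    # Continued fraction for the upper tail Q(a, x), modified Lentz method
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefix) * h


def pchisq(x: float, df: int) -> float:
    """Chi-squared CDF: area to the left of x with df degrees of freedom."""
    return gamma_p(df / 2.0, x / 2.0)


def qchisq(p: float, df: int) -> float:
    """Chi-squared quantile: the x whose left-tail area is p (bisection)."""
    lo, hi = 0.0, max(1.0, float(df))
    while pchisq(hi, df) < p:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if pchisq(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 0.05
df = 21                            # n - 1 for the sample of n = 22
chi_L = qchisq(alpha / 2, df)      # area alpha/2 to its left
chi_r = qchisq(1 - alpha / 2, df)  # area alpha/2 to its right
print(round(chi_L, 3), round(chi_r, 3))
```

For df = 21 the two values should land on the usual chi-squared table entries, about 10.283 and 35.479.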
-
Okay, so...
-
I guess I have almost
everything here.
-
What is the formula for the interval for sigma?
-
The left endpoint is the square root (sqrt) of
n minus 1, which is the degrees of freedom,
-
times s-squared, "s_sq", all divided by
chi-squared R, "chi_r", not "chi_L":
-
R goes to the left.
-
Oops, I have to enter "times" [*];
otherwise, R doesn't understand.
-
So that's 7.63; let me write
this here somewhere.
-
That is the lower endpoint for sigma: 7.63.
-
And the other endpoint is the same thing
except with chi-squared L.
-
That gives 14.17.
-
So 7.63 and 14.17 are
the two ends for sigma.
-
I also need the interval for sigma squared,
that is, the variance.
-
Here I don't take the square root,
so let's just remove sqrt.
-
That gives 58.24. Let me
write it here.
-
The other endpoint uses chi_L; again,
let's remove the square root.
-
That gives 200.95.
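-
The R steps above (sqrt for sigma, no sqrt for sigma squared, chi_r for the left endpoint and chi_L for the right) can be collected into one small function. This Python sketch takes the critical values as inputs instead of computing them. The function name chi2_intervals and the s_sq = 100 in the usage line are hypothetical; 10.283 and 35.479 are the standard table values for df = 21 and alpha = 0.05.

```python
import math


def chi2_intervals(df, s_sq, chi_l, chi_r):
    """Return ((var_lo, var_hi), (sd_lo, sd_hi)) for sigma squared and sigma.

    df    -- degrees of freedom, n - 1
    s_sq  -- sample variance
    chi_l -- left critical value (area alpha/2 to its left)
    chi_r -- right critical value (area alpha/2 to its right)
    """
    var_lo = df * s_sq / chi_r  # R goes with the LEFT endpoint
    var_hi = df * s_sq / chi_l  # L goes with the RIGHT endpoint
    # sigma's interval is the square root of sigma-squared's interval
    return (var_lo, var_hi), (math.sqrt(var_lo), math.sqrt(var_hi))


# Hypothetical sample variance of 100 with df = 21 and alpha = 0.05
(var_lo, var_hi), (sd_lo, sd_hi) = chi2_intervals(21, 100.0, 10.283, 35.479)
print(f"variance: {var_lo:.2f} to {var_hi:.2f}; sd: {sd_lo:.2f} to {sd_hi:.2f}")
```

Because the sigma endpoints are just square roots of the variance endpoints, one pair determines the other; the lecture's 7.63 and 14.17 are, up to rounding, the square roots of 58.24 and 200.95.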
-
I guess I have everything.
Let's go back to the board.
-
So I calculated this one, this one,
and the other one.
-
And that is all I needed.
-
So...
-
Let's write the result for sigma first.
-
So I found that sigma is
between 7.63 and 14.17,
-
with 95% confidence.
-
And the other one is this one:
-
58 and 200,
I'll just write it here.
-
I'm sorry. I didn't keep
the numbers up there.
-
So... And sigma squared is
between these two numbers:
-
58.24 and 200.95,
with 95% confidence.
-
So I found that the standard deviation
of the population with 95% confidence
-
is between these two numbers.
-
What about units? Pulse rate is measured
in beats per minute:
-
they just count the number
of pulses per minute.
-
So sigma is in beats per minute, too:
-
7.63 beats per minute
and 14.17 beats per minute.
-
Sigma has the same unit as
the quantity that you are studying,
-
but sigma squared does not: its unit is
(beats per minute) squared.
-
Sigma squared has
more theoretical significance,
-
but sigma makes
more sense in practice.
-
So those are the intervals for sigma and
sigma squared for the population,
-
with 95% confidence.
-
We can also find them with, say,
99 percent confidence.
-
So let me rewrite these results here,
and then we'll find the 99 percent intervals.
-
So this is for 95%
(this is sigma squared for 95%).
-
Let me write this here
and then find the other one
-
with more confidence.
-
What changes in the calculation?
-
Let me share the R session again.
-
Alright, so for 99% confidence, the only
thing that changes is alpha.
-
The rest is the same.
-
Alpha this time is 0.01, okay?
-
So the things that change
are chi_L and chi_r,
-
so let's calculate chi_L and chi_r again.
-
chi_r is qchisq of 1 minus alpha over 2,
-
and chi_L is qchisq of alpha over 2.
-
The rest of the calculation is the same.
Let's start with the square root,
-
this time with chi_L,
-
so that's the right endpoint for sigma:
-
16.03. Let me write it on the board.
-
For the other one, let's just change
chi_L to chi_r:
-
7.06.
-
Now the interval for the variance:
with chi_L, 257.21,
-
and with chi_r, 49.91.
-
And this is 99% confidence.
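-
How much wider the 99% interval is does not depend on the data. Both variance endpoints are (n - 1)s² divided by a critical value, so the width of the sigma-squared interval is (n - 1)s² times (1/chi_L - 1/chi_R), and (n - 1)s² cancels when you compare two confidence levels. A small Python sketch of that comparison, using standard chi-squared table values for df = 21 (assumed from a table, not read from the R session):

```python
# Standard chi-squared table values (chi_L, chi_R) for df = 21
CRIT = {
    0.95: (10.283, 35.479),  # alpha = 0.05
    0.99: (8.034, 41.401),   # alpha = 0.01
}


def width_factor(chi_l: float, chi_r: float) -> float:
    """Width of the sigma-squared interval divided by (n - 1) * s^2.

    Upper endpoint (n-1)s^2/chi_L minus lower endpoint (n-1)s^2/chi_R
    equals (n-1)s^2 * (1/chi_L - 1/chi_R).
    """
    return 1.0 / chi_l - 1.0 / chi_r


ratio = width_factor(*CRIT[0.99]) / width_factor(*CRIT[0.95])
print(f"the 99% variance interval is {ratio:.2f} times as wide as the 95% one")
```

The ratio comes out near 1.45, which matches the lecture's variance intervals: (257.21 - 49.91)/(200.95 - 58.24) is also about 1.45.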
-
Okay, let's go back to the board.
-
Let's look at these two results
and see what we have here.
-
Take sigma.
-
The 95% interval goes
from 7.63 to 14.17.
-
The 99% interval goes from 7.06,
a little to the left, up to 16.03.
-
As you see, the interval
for 95% is shorter.
-
So with 99% confidence you say,
"I think sigma lies in this interval,"
-
while with 95% confidence
the interval is smaller.
-
If you want only 50% confidence,
-
then the interval will become
even smaller than this one.
-
Why is this happening?
What does it mean?
-
Why does more confidence
give you a larger interval?
-
When you need more confidence,
you need a larger interval,
-
so that you can be more confident
that sigma lies in it.
-
This has to do with probability.
-
I can discuss it using probability.
-
Before we take the sample, the probability
that the interval we compute will
-
contain sigma is 0.99.
-
Remember, probability is like
area under a certain curve,
-
theoretically or mathematically.
-
So changing the confidence changes that area
under the curve, and that changes the interval.
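-
One way to see what the 0.95 or 0.99 means is to repeat the whole procedure many times on a population whose sigma we know, and count how often the computed interval contains it. This is a minimal Python sketch, not the book's method. The normal population with mean 70 and sigma 10 is hypothetical (the chi-squared interval assumes a normal population), the sample size 22 matches the pulse-rate example, and the critical values are standard table values for df = 21.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

TRUE_SIGMA = 10.0  # hypothetical population SD, known only because we simulate
N = 22             # same sample size as the pulse-rate example
DF = N - 1
# Standard chi-squared table values (chi_L, chi_R) for df = 21
CRIT = {0.95: (10.283, 35.479), 0.99: (8.034, 41.401)}


def sample_variance(xs):
    """Sample variance with the n - 1 divisor (like R's var)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def coverage(level, trials=10_000):
    """Fraction of simulated samples whose sigma interval contains TRUE_SIGMA."""
    chi_l, chi_r = CRIT[level]
    hits = 0
    for _ in range(trials):
        xs = [random.gauss(70.0, TRUE_SIGMA) for _ in range(N)]
        s_sq = sample_variance(xs)
        lo = math.sqrt(DF * s_sq / chi_r)  # R on the left
        hi = math.sqrt(DF * s_sq / chi_l)  # L on the right
        if lo < TRUE_SIGMA < hi:
            hits += 1
    return hits / trials


cov95 = coverage(0.95)
cov99 = coverage(0.99)
print(f"95% intervals contained sigma {cov95:.1%} of the time")
print(f"99% intervals contained sigma {cov99:.1%} of the time")
```

Each individual interval either contains sigma or it doesn't; the 95% and 99% describe how often the procedure succeeds over many samples, and the wider 99% intervals succeed more often.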
-
But intuitively, that also makes sense:
-
when you extend the range,
you are more certain
-
that sigma really is in it.
For example, suppose
-
you have hidden something in a house,
and the house has 10 rooms.
-
Choose 9 of them. If the object is
equally likely to be in any room,
-
you can say with probability 0.9
that it is in one of those 9 rooms.
-
But if you reduce that to 5 rooms,
-
then you can only say that with 50%
probability it is in one of those rooms.
-
Now, which claim are you more sure of?
-
The 9 rooms, right?
-
Because even intuitively,
-
you can say, yeah, I think
if we choose 9 rooms out of 10,
-
then there's a larger probability
that the object is in one of them.
-
But if you choose just
5 rooms out of 10,
-
then the probability goes down
-
(the probability that the hidden
object is in one of those 5 rooms).
-
So the intervals are like the rooms.
-
The top interval (95%) is like fewer
rooms, say 5 rooms,
-
and the bottom range (99%) is like 9 rooms.
-
I hope this example makes sense;
I just came up with it.
-
So if you want more confidence,
you get a larger interval.
-
Or, saying it the other way around:
-
a larger interval gives
you more confidence.
-
Okay. So those were the confidence intervals
for the standard deviation and variance,
-
and I hope to see you next week.