
Title:
Lecture 97: Bayes' Theorem (Honors)

Description:
From the Think Again: How to Reason and Argue course on Coursera

Coins and dice provide a nice simple model
of how to calculate probabilities, but

everyday life is a lot more complicated
and it's not taken up with gambling.

At least, I hope your life is not taken up
with gambling.

So in order to make probabilities more
applicable to everyday life,

we need to look at slightly more
complicated methods.

Now, because these methods
are more complicated,

this lecture is going to be
an honors lecture: it's optional.

It will not be on the quiz,

so don't get worried about that.

But it is still useful, and it's fascinating,

and it'll help you avoid some mistakes

that a lot of people make
and that create a lot of problems.

And so I hope you'll stick with it and listen to this lecture.

And there will be exercises
to help you figure out

whether you understand
the material or not.

But don't get too worried, because
it's not going to be on the quiz.

The real problem
that we'll be facing in this lecture

is the problem of tests.

We use tests all the time:
we use tests to figure out

whether you have
a certain medical condition.

We use tests to predict the weather
or to predict people's future behavior.

We have certain indicators
of how they're going to act,

either commit a crime
or not commit a crime,

but also whether they're going to pass,

do well in school or fail.

We always use these tests
when we don't know for certain,

but we want some kind of evidence,
or some kind of indicator.

The problem is none of these tests
are perfect.

They always contain errors
of various sorts.

And what we're going to have to do is to
see how to take

those errors of different sorts
and build them together into a method

and then a formula for calculating
how reliable the method is

for detecting the thing that we want to detect.

This problem is a lot like the problem
we faced earlier

when we were talking about applying
generalizations to particular cases

because here we're going to be applying
probabilities to particular cases.

So it'll seem familiar to you in certain parts,

but you'll see that this case
is a little trickier.

The best examples occur in medicine.

So just imagine that you go to your doctor
for a regular checkup.

You don't have any special symptoms,

but he decides to do
a few screening tests.

And unfortunately, and very worryingly,
it turns out that you test positive

on one test for a particular form of cancer,
a certain kind of medical condition.

Well, what that means is that you might
have cancer.

Might, great.

You want to know whether you do have
cancer.

But of course, finding out for sure
whether or not you have cancer

is going to take further tests.

And those tests might be expensive,
they might be dangerous,

they're going to be invasive
in various ways.

So you really want to know what's the
probability,

given that you've tested positive
on this one test,

that you really have cancer.

Now clearly that probability is going
to depend on a number of facts

about this type of cancer,
about the type of test and so on.

And I am not a doctor.

I am not giving you medical advice.

If you test positive on a test,
go talk to your doctor,

don't trust me, because I'm just
making up numbers here.

But let's do make up a few numbers
and figure out

what the likelihood is of having cancer,
given that you tested positive.

So let's imagine that the base rate
of this particular type of cancer

in the population is 0.3%, that is,
3 out of 1,000, or 0.003.

And they say that's the base rate,

or it's sometimes called the prevalence
of the condition in the population.

That's simply to say that out of 1,000
people chosen randomly

in the population, you'd get about 3
that have this condition.

It's just a percentage
of the general population.

So that's the condition, what about the
test?

Well the first thing we want to know
is the sensitivity of the test.

The sensitivity of the test we're going to
assume is 0.99.

And what that means is that out of
100 people who have this condition,

99 of them will test positive.

So this test is pretty good at figuring
out,

from among the people
who have the condition, which ones do.

99 of those 100 people who have the
condition will test positive.

The other feature is specificity, and what
that means is

the percentage of the people who don't
have the condition who will test negative.

The point here is you're not going
to get a positive result

for people who don't have the condition,
right?

Because you want it to be specific
to this particular condition

and not get a bunch of positives for
people who have other types of conditions

or no medical condition at all.

So the specificity we're going to assume,

in this particular case we're talking about, is also 99%.

Now, what we want to know is the probability
that you have a cancer, a condition,

given that you tested positive on the test;

but notice that the sensitivity
tells you the probability

that you will test positive
given that you have the condition.

We want to know the opposite of that,

the probability
that you have the condition

given that you tested positive.

And that's what we have to do
a little calculation to figure out.

But before we do that calculation,
I want you to think about these figures

that I've given you:
the prevalence in the population,

the sensitivity of the test,
the specificity of the test,

and just make a guess.

Just start out by writing down
on a piece of paper

what you think the probability is
that you would have the cancer

given that you tested positive
on the test.

Take a minute and think about it
and write it down.

But we don't want to just guess
about medical conditions,

about probabilities that really matter
as much as this will do.

Instead, we want to calculate what the
probability really is.

So, let's go through it carefully and
show you how to use

what I'll call the box method in order
to calculate the real likelihood

that you have the condition, given that
you got a positive test result.

What we need to do is to divide the
population into four different groups:

the group that has the condition
and tested positive,

the group that has the condition
and tested negative,

the group that doesn't have the condition
and tested positive,

and the group that doesn't have
the condition and tested negative.

And this chart will show you a nice,
simple way of organizing

all of that information.

Because this row, the top row, tells
you all the people who tested positive.

The bottom row tells you the people
who tested negative.

Then, the left column gives you the
people who do have the medical condition,

in this case, some kind of cancer.

And the right column tells you the people
who do not have that condition.

Now what we need to do is to start
filling it out with numbers.

Now the first thing we need to specify is
the population.

In this case we want to start with a big
enough population

that we're not going to have a lot
of fractions in the other boxes.

So, let's just imagine that the population
is 100,000.

Make it a million or 10 million,
it doesn't matter

because we're going to be interested
in the ratios with the different groups.

We can use that 100,000 to fill out the
other boxes,

if we know the prevalence, or the
base rate,

because the base rate tells you what
percentage of that 100,000

actually do have the condition and
don't have the condition.

We imagined, remember we're just
making up numbers here,

but we imagined that the prevalence
of this condition is 0.3%.

And that means out of 100,000 people,
there will be 300

who do have the medical condition.

Well, if there are 300 who have it and
there are 100,000 total,

we can figure out how many don't have the
medical condition by just subtracting.

Which means 99,700
do not have the medical condition.

Okay?

Now, we've divided the population into our
two columns:

the ones that do and the ones that don't
have the medical condition.

The next step is to figure out how many
are going to test positive

and how many are going to test negative
out of each of these groups.

For that, we first need the sensitivity.

The sensitivity tells us the percentage
of the cases that have the condition

who will test positive.

So the people who have the condition are
the 300.

The ones who test positive are going
to go up in this area

and we know from the sensitivity being 0.99 or 99%

that the number in that area should be 99%
of 300, or 297.

And of course, if that's the number
that test positive,

then the remainder
are going to test negative

and that means that we'll have three.

Which shouldn't surprise you because if
99% of the cases that have it

test positive, then 1% will test negative,
and 1% of 300 is 3.

Good: so we got the first column done.

Now, the next question is going to be the
specificity.

We can use the specificity to figure out
what goes in that next column.

If the specificity is 99% and we know

that 99,700 people do not have the
condition out of our sample of 100,000,

well, that means that 99% of 99,700 are
going to test negative

because the specificity is the
percentage of cases without the condition

that test negative.

And that means that we'll have
98,703 among the people

who do not have the condition
who test negative.

How many are going to test positive?
The rest of them.

So 99,700 minus 98,703
is going to be 997.

And of course, that shouldn't be surprising
again, because 1% of 99,700 is 997.

We've only got two boxes left to fill out.

How do you fill out those?

Well, this box in the upper right
is the total number of people

in this population of 100,000
who test positive.

And so, we can get that by adding the ones
that do have the condition and test positive

and the ones that don't have
the condition and test positive.

Just add them together, and you get 1,294.

And you do the same on the next row,
because that blank is the area

that has all the people
who test negative,

and 3 people who have the condition
test negative,

98,703 people who do not have the
condition test negative,

so the total is going to be 98,706.

And we can check to make sure that
we got it right,

by just adding them together:
1,294 plus 98,706 is equal to 100,000.

Phew, we got it right.
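
The box-filling steps above can be sketched in a few lines of Python. This is only an illustration using the made-up numbers from the lecture; the function name `box_counts` and the dictionary keys are hypothetical, and rounding to whole people is an assumption that works here because the population was chosen to avoid fractions.

```python
def box_counts(population, prevalence, sensitivity, specificity):
    """Fill in the boxes: counts by test result and by condition."""
    # Column totals come from the base rate (prevalence).
    have = round(population * prevalence)
    have_not = population - have
    # First column: sensitivity splits the people who have the condition.
    true_pos = round(have * sensitivity)
    false_neg = have - true_pos
    # Second column: specificity splits the people who do not have it.
    true_neg = round(have_not * specificity)
    false_pos = have_not - true_neg
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "all_pos": true_pos + false_pos,   # top row total
        "all_neg": false_neg + true_neg,   # bottom row total
    }


boxes = box_counts(100_000, 0.003, 0.99, 0.99)
for name, count in boxes.items():
    print(f"{name:>9}: {count:,}")

# Same check as in the lecture: the two row totals add up to the population.
assert boxes["all_pos"] + boxes["all_neg"] == 100_000
```

Changing the population to a million or 10 million scales every box by the same factor, so the ratios between the groups stay the same.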

Okay, so now we've divided the population
into those people who have the condition,

those people who don't have the
condition,

and we know how many of each
of those groups test positive,

and how many of each of those groups
test negative.

The real question is
what's the probability

that I have cancer or the medical
condition, given that I tested positive?

How do we figure that out?

Well, the total number
of positive tests was 1,294

and the people who tested positive
who really had the condition was 297.

So it looks like the probability of
actually having the condition,

given that you tested positive,
is 297 out of 1,294, or about 0.23.

That's 23%, less than one in four.
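
Written out as a worked equation, using only the box counts already computed (the labels are spelled out in words here, since symbols for them come later in the lecture):

```latex
P(\text{condition} \mid \text{positive})
  = \frac{\text{true positives}}{\text{all positives}}
  = \frac{297}{297 + 997}
  = \frac{297}{1{,}294}
  \approx 0.2295
```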

Is that what you guessed?

Most people, including most doctors, when
they hear that the test is

99% sensitive and 99% specific, will
guess a lot higher than one in four.

>> Oh my gosh!

I'm a doctor, and I never would have
thought that!

>> Now, don't worry:

she's not a physician;
she's a metaphysician.

>> But in this case, the probability
really is just one in four

that you had that medical condition.

Now how did that happen?

The reason was that the prevalence or the
base rate was so low

that even a small rate
of false positives,

given the massive numbers of people who
don't have the condition,

will mean that there are more false positives,
more than 3 times as many,

as there are true positives.

And that's why the probability
is just one in four,

actually a little less than one in four,

that you have the medical condition even
when you tested positive.
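
To see how strongly the answer depends on the base rate, here is a small sketch that keeps the test at 99% sensitivity and 99% specificity and varies only the prevalence. Only the 0.3% figure comes from the lecture; the other prevalences are hypothetical values chosen for comparison, and the function name `posterior` is an assumption of this sketch.

```python
def posterior(prevalence, sensitivity=0.99, specificity=0.99):
    """Probability of having the condition given a positive test."""
    # Share of the whole population that is a true positive.
    true_pos = prevalence * sensitivity
    # Share of the whole population that is a false positive.
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# 0.003 is the lecture's made-up base rate; the others are hypothetical.
for prevalence in (0.0003, 0.003, 0.03, 0.3):
    print(f"prevalence {prevalence:>6.2%} -> "
          f"P(condition | positive) = {posterior(prevalence):.3f}")
```

With the same test, a rarer condition gives a lower probability after a positive result, and a more common condition gives a higher one.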

I want to add a quick caveat here, in
order to avoid misinterpretation.

The point here is that, if you
have a screening test for a condition

with a very low base rate or prevalence,
and you don't have any symptoms

that put you in a special category,
then, you need to get another test

before you jump to any conclusions
about having the medical condition.

Because, if you have that other test,
then the fact that you tested positive

on the first test puts you in a smaller class,

with a much higher base rate, or prevalence.

And now, the probability's going to go up.
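
A rough sketch of why the second test helps. The lecture gives no numbers for the second test, so this assumes, purely for illustration, that it is also 99% sensitive and 99% specific and that its errors are independent of the first test's errors. The function name `update` is hypothetical.

```python
def update(prior, sensitivity=0.99, specificity=0.99):
    """Probability of the condition after one more positive result."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Start from the lecture's made-up base rate of 0.3%.
after_first = update(0.003)
# The result of the first test becomes the base rate for the second
# (assumed: same accuracy, independent errors).
after_second = update(after_first)

print(f"after one positive test:  {after_first:.3f}")
print(f"after two positive tests: {after_second:.3f}")
```

Real follow-up tests usually differ from the screening test, so the numbers here show only the direction of the effect.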

Most doctors know that, and that's why,
after the first test,

they don't jump to conclusions, and they
order another test,

but many patients don't realize that and
they get extremely worried

after a single test even when they don't
have any symptoms.

So that's the mistake
that we're trying to avoid here

and that's surprising, but it actually
applies to many different areas of life.

It applies, for example, to medical tests
with all kinds of other diseases.

Not just cancer or colon cancer, but
pretty much every disease

where the prevalence is extremely low.

It applies also to drug tests.

If somebody gets a positive drug test,

does that mean they really
were using drugs?

Well, if it's a population where the
base rate or prevalence of drug use

is quite low, then it might not.

Of course, if you assume that the
prevalence or base rate is quite high,

then you're going to believe
that drug test.

But you need to know the facts about what
the prevalence or base rate really is

in order to calculate
accurately the probability

that this person really was using drugs.

Same applies to evidence in legal trials:
take eyewitnesses for example,

it's very tricky, someone's trying to use
their eyes as a test for what they see.

They might identify a friend,
or they might just say

that car that did the hit-and-run accident
was a Porsche.

Well, how good are they at identifying
Porsches?

If they get it right most of the time,
but not always,

and sometimes they don't get it right
when it is a Porsche,

then we've got the sensitivity and
specificity of what they identify.

And we can use that to calculate
how likely it is

that their evidence in the trial
really is reliable or not.

Another example is the prediction of
future behavior.

We might have some kind of marker

that a certain group of people
with that marker

have a certain likelihood of
committing crimes.

But if crimes are very rare
in that community and every other,

then a test which has a pretty good
sensitivity and specificity

still might not be good enough when
we're talking about something like crime

that's actually very rare and has
a very low prevalence or base rate

in most communities.

And the same applies
to failing out of school.

Are SAT scores or GRE scores
going to be

good predictors of
who's going to fail out of school?

Well, if very few people fail out of
school,

so that the prevalence and base rate
is very low,

then, even if they're
pretty sensitive and specific,

they might not be good predictors.

So this same type of problem arises
in a lot of different areas.

And I'm not going to go through
more examples right now,

but we'll have plenty of examples in the
exercises at the end of this chapter.

I want to end, though,
by saying a few things

that are a bit more technical
about this method.

First, there's a lot of terminology to
learn,

because when you read about using
this method in other areas,

for other types of topics,
then you'll run into these terms,

and it's a good idea to know them.

So first, the cases where the person does
have the condition and also tests positive

are called hits, or true positives.

Different people use different terms.

The cases where the person tests positive,
but they don't have the condition,

are called false positives
or false alarms.

The cases where a person really does have
the condition, but tests negative

are called misses or false negatives.

And the cases where the person
does not have the condition

and the test comes out negative
are called true negatives,

because they're negative and it's true
that they don't have the condition.

If we put together the false negatives,
and the true negatives,

we get the total set of negatives.

And if we put together the true positives
and the false positives

we get the total set of positives.

And of course, we have the general
population.

Within that population,
a percentage that have the condition

and a percentage
that don't have the condition.

Now, what's the base rate?

The base rate in this population is simply
the set that have the condition,

divided by the total population,
which is Box 7 divided by Box 9.

If we use e for the evidence

and h for the hypothesis being true that
the condition really does exist,

then the base rate is the probability of h,

and the sensitivity is going to be
the total number of true positives

divided by the total number of people
with the condition,

because it's the percentage of people who
have the condition and test positive.

OK? So that's the probability of e given h,

and it's Box 1 divided by Box 7.

The specificity, in contrast, is the ratio
of true negatives

to the total number of people
who do not have the condition, that is,

the probability of not e, that is,

not having the evidence
of a positive test result,

given not h,
given that you're in the second column,

where the hypothesis is false,
because you don't have the condition.

So that's Box 5 divided by Box 8.

That's the specificity.

So we can define all of these
in terms of each other.

The hits divided by the total with that
condition is going to be the sensitivity.

And you can use this terminology to guide
your way through this box.

And the big question is again going to be
what's the solution?

What's the probability of the hypothesis,
having the condition, given the evidence,

that is, a positive test result:
that's going to be Box 1 divided by Box 3.

And as we saw in the case that we just
went through,

that gives you the probability of having
the medical condition, or colon cancer,

given a positive test result.

That's called the posterior probability,
or in symbols,

the probability of the hypothesis,
given the evidence.
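
As a summary, here are the four ratios just defined, with the box numbers named in the lecture and the counts from the worked example. The notation is an assumption of this sketch: the negation sign is written as `\lnot`, and the `align*` environment needs the amsmath package.

```latex
\begin{align*}
\text{base rate}   &= P(h)
  = \frac{\text{Box 7}}{\text{Box 9}} = \frac{300}{100{,}000} = 0.003 \\
\text{sensitivity} &= P(e \mid h)
  = \frac{\text{Box 1}}{\text{Box 7}} = \frac{297}{300} = 0.99 \\
\text{specificity} &= P(\lnot e \mid \lnot h)
  = \frac{\text{Box 5}}{\text{Box 8}} = \frac{98{,}703}{99{,}700} = 0.99 \\
\text{posterior}   &= P(h \mid e)
  = \frac{\text{Box 1}}{\text{Box 3}} = \frac{297}{1{,}294} \approx 0.23
\end{align*}
```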

So I hope this terminology helps you
understand some of the discussions of this,

if you go on and read about it
in the literature.

This procedure that we've been discussing
is actually just an application

of a famous theorem called Bayes' Theorem
after Thomas Bayes,

an 18th-century English clergyman,
who was also a mathematician

and proved this extremely important
theorem in probability theory.

Now some of you out there will use the
boxes, and it'll make sense to you.

But some Courserians, I assume,
are mathematicians,

and they want to see
the mathematics behind it.

So now, I want to show you how to derive
Bayes' theorem

from the rules of probability
that we learned in earlier lectures.

So for all you math nerds out there,
here goes.

You start with rule 2G,

apply it to the probability that the
evidence and the hypothesis are both true.

And by the rule, that probability is
equal to the probability of the evidence,

times the probability of the hypothesis,
given the evidence.

You have to have
that conditional probability

because they're not independent.

Then you simply divide both sides of that
by the probability of the evidence:

a little simple algebra.

And you end up with the probability
of the hypothesis, given the evidence,

is equal to the probability
of the evidence and the hypothesis,

divided by the probability
of the evidence.

Now we can do a little trick.
This was ingenious.

Substitute for e, something
that's logically equivalent to e,

namely, the evidence AND the hypothesis
or the evidence AND NOT the hypothesis.

Now if you think about it, you'll see
that those are equivalent,

because either the hypothesis
has to be true

or NOT the hypothesis is true.

One or the other has to be true.

And that means that the evidence
AND the hypothesis

or the evidence AND NOT the hypothesis
is going to be equivalent to e.

So this is equivalent to this.

And because they're equivalent,
we can substitute them

within the formula for probability
without affecting the truth values.

So we just substitute this formula in
here for the e up there.

And we end up with the probability of the
hypothesis, given the evidence,

is equal to the probability of the
evidence AND the hypothesis, divided by

the probability of the evidence
AND the hypothesis

or the evidence AND NOT the hypothesis.

Now, that's not supposed to make much
sense, but it helps with the derivation.

The next step is to apply rule 3, because
we have a disjunction.

And notice the disjuncts are mutually
exclusive.

It cannot be true, both, that the evidence
AND the hypothesis is true,

and also that the evidence
AND NOT the hypothesis is true,

because it can't be both h and not h.

So we can apply the simple version
of rule 3.

And that means that the probability of
(e&h) or (e&~h)

is equal to the probability of (e&h)
+ the probability of (e&~h).

We're just applying
that rule 3 for disjunction

that we learned a few lectures ago.

Now we apply rule 2G again,

because we have the probability
of a conjunction up in the top.

And, since these are not independent of
each other

(we hope not, if it's a hypothesis
and the evidence for it)

then we have to use
the conditional probability.

And using rule 2G, we find that
the probability of the hypothesis,

given the evidence, is equal to

the probability of the hypothesis, times
the probability of the evidence,

given the hypothesis, divided by
the probability of the hypothesis,

times the probability of the evidence,
given the hypothesis,

plus the probability
of the hypothesis being false,

that is the probability of NOT h,
times the probability of the evidence,

given NOT h, or the hypothesis being false.

And that's a mouthful

and it's a long formula,
but that's the mathematical formula

that Bayes proved in the 18th century
and it provides the mathematical basis

for that whole system of boxes
that we talked about before.
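
For those who prefer symbols, the spoken derivation can be written out step by step as follows. This is a transcription of the steps described above, assuming the probability of the evidence is not zero so that the division is allowed; the `align*` environment needs the amsmath package, and "not" is written as `\lnot`.

```latex
\begin{align*}
P(e \land h) &= P(e)\,P(h \mid e)
  && \text{rule 2G} \\
P(h \mid e) &= \frac{P(e \land h)}{P(e)}
  && \text{divide both sides by } P(e) \\
 &= \frac{P(e \land h)}{P\big((e \land h) \lor (e \land \lnot h)\big)}
  && \text{substitute an equivalent for } e \\
 &= \frac{P(e \land h)}{P(e \land h) + P(e \land \lnot h)}
  && \text{rule 3, mutually exclusive disjuncts} \\
 &= \frac{P(h)\,P(e \mid h)}
         {P(h)\,P(e \mid h) + P(\lnot h)\,P(e \mid \lnot h)}
  && \text{rule 2G again}
\end{align*}
```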

But if you don't like the mathematical
proof and that's too confusing for you,

then use the boxes.

And if you don't like the boxes,
use the mathematical proof.

They're both going to work:
just pick the one that works for you.
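
As a quick check that the two routes agree, this sketch plugs the lecture's made-up numbers into the formula and compares the result with the answer from the boxes. The function and argument names are hypothetical; the probability of a positive test without the condition is taken as 1 minus the specificity.

```python
def bayes(p_h, p_e_given_h, p_e_given_not_h):
    """P(h | e) from the base rate and the two conditional probabilities."""
    numerator = p_h * p_e_given_h
    denominator = numerator + (1 - p_h) * p_e_given_not_h
    return numerator / denominator


# Base rate 0.3%, sensitivity 99%, false positive rate 1 - 99% specificity.
from_formula = bayes(0.003, 0.99, 1 - 0.99)
# True positives over all positives, read off the boxes.
from_boxes = 297 / 1294

print(f"formula: {from_formula:.4f}")
print(f"boxes:   {from_boxes:.4f}")
assert abs(from_formula - from_boxes) < 1e-9
```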

In fact, you don't have to pick
either of them,

because remember, this is an honors
lecture, it's optional,

and it won't be on the quiz.

But if you do want to try this method,
and make sure that you understand it,

we'll have a bunch of exercises for you,
where you can test your skills.