
Lecture 9-7 - Bayes' Theorem Honors

  • 0:03 - 0:09
    Coins and dice provide a nice simple model
    of how to calculate probabilities, but
  • 0:10 - 0:15
    everyday life is a lot more complicated
    and it's not taken up with gambling.
  • 0:15 - 0:17
    At least, I hope your life is not taken up
    with gambling.
  • 0:18 - 0:22
    So in order to make probabilities more
    applicable to everyday life,
  • 0:22 - 0:26
    we need to look at slightly more
    complicated methods.
  • 0:27 - 0:30
    Now, because these methods
    are more complicated,
  • 0:30 - 0:34
    this lecture is going to be
    an honors lecture: it's optional.
  • 0:34 - 0:36
    It will not be on the quiz,
  • 0:36 - 0:38
    so don't get worried about that.
  • 0:38 - 0:42
    But it is still useful, and it's fascinating,
  • 0:42 - 0:44
    and it'll help you avoid some mistakes
  • 0:44 - 0:48
    that a lot of people make
    and that create a lot of problems.
  • 0:48 - 0:53
    And so I hope you'll stick with it and listen to this lecture.
  • 0:53 - 0:57
    And there will be exercises
    to help you figure out
  • 0:57 - 0:59
    whether you understand
    the material or not.
  • 0:59 - 1:03
    But don't get too worried, because
    it's not going to be on the quiz.
  • 1:05 - 1:08
    The real problem
    that we'll be facing in this lecture
  • 1:09 - 1:11
    is the problem of tests.
  • 1:11 - 1:14
    We use tests all the time:
    we use tests to figure out
  • 1:14 - 1:17
    whether you have
    a certain medical condition.
  • 1:17 - 1:22
    We use tests to predict the weather
    or to predict people's future behavior.
  • 1:22 - 1:25
    We have certain indicators
    of how they're going to act,
  • 1:26 - 1:28
    either commit a crime
    or not commit a crime,
  • 1:28 - 1:30
    but also whether they're going to pass,
  • 1:30 - 1:32
    do well in school or fail.
  • 1:33 - 1:38
    We always use these tests
    when we don't know for certain,
  • 1:38 - 1:41
    but we want some kind of evidence,
    or some kind of indicator.
  • 1:42 - 1:45
    The problem is none of these tests
    are perfect.
  • 1:45 - 1:48
    They always contain errors
    of various sorts.
  • 1:49 - 1:52
    And what we're going to have to do is to
    see how to take
  • 1:52 - 1:58
    those errors of different sorts
    and build them together into a method
  • 1:58 - 2:03
    and then a formula for calculating
    how reliable the method is
  • 2:03 - 2:06
    for detecting the thing that we want to detect.
  • 2:07 - 2:10
    This problem is a lot like the problem
    we faced earlier
  • 2:10 - 2:15
    when we were talking about applying
    generalizations to particular cases
  • 2:15 - 2:18
    because here we're going to be applying
    probabilities to particular cases.
  • 2:19 - 2:22
    So it'll seem familiar to you in certain parts,
  • 2:22 - 2:25
    but you'll see that this case
    is a little trickier.
  • 2:26 - 2:28
    The best examples occur in medicine.
  • 2:29 - 2:32
    So just imagine that you go to your doctor
    for a regular checkup.
  • 2:33 - 2:34
    You don't have any special symptoms,
  • 2:35 - 2:38
    but he decides to do
    a few screening tests.
  • 2:39 - 2:44
    And unfortunately, and very worryingly,
    it turns out that you test positive
  • 2:45 - 2:51
    on one test for a particular form of cancer,
    a certain kind of medical condition.
  • 2:52 - 2:56
    Well, what that means is that you might
    have cancer.
  • 2:57 - 2:58
    Might, great.
  • 2:58 - 3:00
    You want to know whether you do have
    cancer.
  • 3:01 - 3:04
    But of course, finding out for sure
    whether or not you have cancer
  • 3:04 - 3:06
    is going to take further tests.
  • 3:06 - 3:11
    And those tests might be expensive,
    they might be dangerous,
  • 3:11 - 3:13
    they're going to be invasive
    in various ways.
  • 3:14 - 3:17
    So you really want to know what's the
    probability,
  • 3:17 - 3:21
    given that you've tested positive
    on this one test,
  • 3:21 - 3:23
    that you really have cancer.
  • 3:24 - 3:28
    Now clearly that probability is going
    to depend on a number of facts
  • 3:28 - 3:31
    about this type of cancer,
    about the type of test and so on.
  • 3:32 - 3:34
    And I am not a doctor.
  • 3:34 - 3:36
    I am not giving you medical advice.
  • 3:37 - 3:41
    If you test positive on a test,
    go talk to your doctor,
  • 3:41 - 3:44
    don't trust me, because I'm just
    making up numbers here.
  • 3:44 - 3:48
    But let's do make up a few numbers
    and figure out
  • 3:48 - 3:53
    what the likelihood is of having cancer,
    given that you tested positive.
  • 3:53 - 3:59
    So let's imagine that the base rate
    of this particular type of cancer
  • 3:59 - 4:06
    in the population is 0.3%, that is,
    3 out of 1,000, or 0.003.
  • 4:06 - 4:08
    And we say that's the base rate,
  • 4:08 - 4:13
    or it's sometimes called the prevalence
    of the condition in the population.
  • 4:13 - 4:17
    That's simply to say that out of 1,000
    people chosen randomly
  • 4:17 - 4:20
    in the population, you'd get about 3
    that have this condition.
  • 4:22 - 4:25
    It's just a percentage
    of the general population.
  • 4:26 - 4:28
    So that's the condition; what about the
    test?
  • 4:29 - 4:32
    Well the first thing we want to know
    is the sensitivity of the test.
  • 4:33 - 4:37
    The sensitivity of the test we're going to
    assume is 0.99.
  • 4:39 - 4:46
    And what that means is that out of
    100 people who have this condition,
  • 4:46 - 4:49
    99 of them will test positive.
  • 4:49 - 4:54
    So this test is pretty good at figuring
    out,
  • 4:54 - 4:57
    from among the people
    who have the condition, which ones do.
  • 4:57 - 5:03
    99 of those 100 people who have the
    condition will test positive.
  • 5:03 - 5:08
    The other feature is specificity, and what
    that means is
  • 5:08 - 5:13
    the percentage of the people who don't
    have the condition who will test negative.
  • 5:14 - 5:18
    The point here is you're not going
    to get a positive result
  • 5:18 - 5:20
    for people who don't have the condition,
    right?
  • 5:20 - 5:24
    Because you want it to be specific
    to this particular condition
  • 5:24 - 5:28
    and not get a bunch of positives for
    people who have other types of conditions
  • 5:28 - 5:29
    or no medical condition at all.
  • 5:30 - 5:32
    So the specificity we're going to assume,
  • 5:32 - 5:37
    in this particular case we're talking about, is also 99%.
  • 5:39 - 5:46
    Now, what we want to know is the probability
    that you have a cancer, a condition,
  • 5:47 - 5:51
    given that you tested positive on the test;
  • 5:51 - 5:55
    but notice that the sensitivity
    tells you the probability
  • 5:55 - 5:59
    that you will test positive
    given that you have the condition.
  • 5:59 - 6:02
    We want to know the opposite of that,
  • 6:02 - 6:05
    the probability
    that you have the condition
  • 6:05 - 6:07
    given that you tested positive.
  • 6:08 - 6:11
    And that's what we have to do
    a little calculation to figure out.
  • 6:11 - 6:15
    But before we do that calculation,
    I want you to think about these figures
  • 6:15 - 6:18
    that I've given you:
    the prevalence in the population,
  • 6:18 - 6:22
    the sensitivity of the test,
    the specificity of the test,
  • 6:22 - 6:23
    and just make a guess.
  • 6:24 - 6:27
    Just start out by writing down
    on a piece of paper
  • 6:27 - 6:32
    what you think the probability is
    that you would have the cancer
  • 6:32 - 6:36
    given that you tested positive
    on the test.
  • 6:37 - 6:40
    Take a minute and think about it
    and write it down.
  • 6:41 - 6:45
    But we don't want to just guess
    about medical conditions,
  • 6:45 - 6:48
    about probabilities that really matter
    as much as this one does.
  • 6:49 - 6:53
    Instead, we want to calculate what the
    probability really is.
  • 6:54 - 6:59
    So, let's go through it carefully and
    show you how to use
  • 6:59 - 7:04
    what I'll call the box method in order
    to calculate the real likelihood
  • 7:04 - 7:09
    that you have the condition, given that
    you got a positive test result.
  • 7:09 - 7:16
    What we need to do is to divide the
    population into four different groups:
  • 7:16 - 7:20
    the group that has the condition
    and tested positive,
  • 7:20 - 7:23
    the group that has the condition
    and tested negative,
  • 7:23 - 7:26
    the group that doesn't have the condition
    and tested positive,
  • 7:26 - 7:29
    and the group that doesn't have
    the condition and tested negative.
  • 7:29 - 7:34
    And this chart will show you a nice,
    simple way of organizing
  • 7:34 - 7:36
    all of that information.
  • 7:36 - 7:44
    Because this row, the top row, tells
    you all the people who tested positive.
  • 7:44 - 7:49
    The bottom row tells you the people
    who tested negative.
  • 7:50 - 7:56
    Then, the left column gives you the
    people who do have the medical condition,
  • 7:56 - 7:58
    in this case, some kind of cancer.
  • 7:59 - 8:03
    And the right column tells you the people
    who do not have that condition.
  • 8:04 - 8:08
    Now what we need to do is to start
    filling it out with numbers.
  • 8:09 - 8:13
    Now the first thing we need to specify is
    the population.
  • 8:13 - 8:16
    In this case we want to start with a big
    enough population
  • 8:16 - 8:19
    that we're not going to have a lot
    of fractions in the other boxes.
  • 8:19 - 8:23
    So, let's just imagine that the population
    is 100,000.
  • 8:23 - 8:25
    Make it a million or 10 million,
    it doesn't matter
  • 8:25 - 8:29
    because we're going to be interested
    in the ratios with the different groups.
  • 8:31 - 8:34
    We can use that 100,000 to fill out the
    other boxes,
  • 8:34 - 8:36
    if we know the prevalence, or the
    base rate,
  • 8:37 - 8:40
    because the base rate tells you what
    percentage of that 100,000
  • 8:40 - 8:44
    actually do have the condition and
    don't have the condition.
  • 8:45 - 8:48
    We imagined -- remember we're just
    making up numbers here --
  • 8:48 - 8:52
    but we imagined that the prevalence
    of this condition is 0.3%.
  • 8:52 - 8:56
    And that means out of 100,000 people,
    there will be 300
  • 8:56 - 9:00
    who do have the medical condition.
  • 9:01 - 9:04
    Well, if there are 300 who have it and
    there are 100,000 total,
  • 9:04 - 9:08
    we can figure out how many don't have the
    medical condition by just subtracting.
  • 9:08 - 9:12
    Which means 99,700
    do not have the medical condition.
  • 9:13 - 9:14
    Okay?
  • 9:14 - 9:18
    Now, we've divided the population into our
    two columns:
  • 9:18 - 9:21
    the ones that do and the ones that don't
    have the medical condition.
  • 9:21 - 9:26
    The next step is to figure out how many
    are going to test positive
  • 9:26 - 9:30
    and how many are going to test negative
    out of each of these groups.
  • 9:31 - 9:34
    For that, we first need the sensitivity.
  • 9:34 - 9:38
    The sensitivity tells us the percentage
    of the cases that have the condition
  • 9:38 - 9:40
    who will test positive.
  • 9:41 - 9:45
    So the people who have the condition are
    the 300.
  • 9:46 - 9:50
    The ones who test positive are going
    to go up in this area
  • 9:51 - 9:57
    and we know from the sensitivity being 0.99 or 99%
  • 9:57 - 10:03
    that the number in that area should be 99%
    of 300, or 297.
  • 10:05 - 10:08
    And of course, if that's the number
    that test positive,
  • 10:09 - 10:12
    then the remainder
    are going to test negative
  • 10:12 - 10:14
    and that means that we'll have three.
  • 10:14 - 10:19
    Which shouldn't surprise you because if
    99% of the cases that have it
  • 10:19 - 10:24
    test positive, then 1% will test negative,
    and 1% of 300 is 3.
  • 10:24 - 10:26
    Good: so we got the first column done.
  • 10:27 - 10:31
    Now, the next question is going to be the
    specificity.
  • 10:31 - 10:37
    We can use the specificity to figure out
    what goes in that next column.
  • 10:38 - 10:44
    If the specificity is 99% and we know
  • 10:44 - 10:51
    that 99,700 people do not have the
    condition out of our sample of 100,000,
  • 10:52 - 10:59
    well, that means that 99% of 99,700 are
    going to test negative
  • 10:59 - 11:03
    because the specificity is the
    percentage of cases without the condition
  • 11:03 - 11:05
    that test negative.
  • 11:05 - 11:11
    And that means that we'll have
    98,703 among the people
  • 11:11 - 11:14
    who do not have the condition
    who test negative.
  • 11:15 - 11:18
    How many are going to test positive?
    The rest of them.
  • 11:18 - 11:27
    So 99,700 minus 98,703
    is going to be 997.
  • 11:28 - 11:35
    And of course, that shouldn't be surprising
    again, because 1% of 99,700 is 997.
  • 11:36 - 11:39
    We only got two boxes left to fill out.
  • 11:39 - 11:41
    How do you fill out those?
  • 11:41 - 11:47
    Well, this box in the upper right,
    is the total number of people
  • 11:47 - 11:51
    in this population of 100,000
    who test positive.
  • 11:51 - 11:56
    And so, we can get that by adding the ones
    that do have the condition and test positive
  • 11:56 - 12:00
    and the ones that don't have
    the condition and test positive.
  • 12:00 - 12:06
    Just add them together, and you get 1,294.
  • 12:06 - 12:13
    And you do the same on the next row,
    because that blank is the area
  • 12:13 - 12:16
    that has all the people
    who test negative,
  • 12:16 - 12:20
    and 3 people who have the condition
    test negative,
  • 12:21 - 12:25
    98,703 people who do not have the
    condition test negative,
  • 12:26 - 12:30
    so the total is going to be 98,706.
  • 12:30 - 12:35
    And we can check to make sure that
    we got it right,
  • 12:35 - 12:44
    by just adding them together:
    1,294 plus 98,706 is equal to 100,000.
  • 12:45 - 12:47
    Phew, we got it right.
  • 12:47 - 12:52
    Okay, so now we've divided the population
    into those people who have the condition,
  • 12:53 - 12:55
    those people who don't have the
    condition,
  • 12:55 - 12:59
    and we know how many of each
    of those groups test positive,
  • 12:59 - 13:03
    and how many of each of those groups
    test negative.
  • 13:04 - 13:08
    The real question is
    what's the probability
  • 13:08 - 13:12
    that I have cancer or the medical
    condition, given that I tested positive?
  • 13:12 - 13:14
    How do we figure that out?
  • 13:14 - 13:20
    Well, the total number
    of positive tests was 1,294
  • 13:21 - 13:27
    and the people who tested positive
    who really had the condition was 297.
  • 13:28 - 13:34
    So it looks like the probability of
    actually having the condition,
  • 13:35 - 13:44
    given that you tested positive,
    is 297 out of 1,294, or about 0.23.
  • 13:44 - 13:47
    That's 23%, less than one in four.
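As a quick check, the whole box-method calculation above can be sketched in a few lines of Python (using the lecture's made-up numbers; the variable names are mine, not the lecture's):

```python
# Box method, using the lecture's made-up screening-test numbers.
population = 100_000
base_rate = 0.003     # prevalence: 3 in 1,000 have the condition
sensitivity = 0.99    # P(positive test | condition)
specificity = 0.99    # P(negative test | no condition)

have = population * base_rate         # 300 people with the condition
lack = population - have              # 99,700 people without it

true_pos = sensitivity * have         # 297: have it, test positive
false_neg = have - true_pos           # 3: have it, test negative
true_neg = specificity * lack         # 98,703: lack it, test negative
false_pos = lack - true_neg           # 997: lack it, test positive

all_pos = true_pos + false_pos        # 1,294 positive tests in total
posterior = true_pos / all_pos        # P(condition | positive test)
print(round(posterior, 2))            # 0.23 -- less than one in four
```

Changing the starting population to a million or ten million scales every box by the same factor, so the final ratio, and therefore the posterior probability, stays the same.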
  • 13:48 - 13:49
    Is that what you guessed?
  • 13:50 - 13:55
    Most people, including most doctors, when
    they hear that the test is
  • 13:55 - 14:01
    99% sensitive and 99% specific, will
    guess a lot higher than one in four.
  • 14:02 - 14:03
    >> Oh my gosh!
  • 14:03 - 14:06
    I'm a doctor, and I never would have
    thought that!
  • 14:07 - 14:08
    >> Now, don't worry:
  • 14:08 - 14:11
    she's not a physician;
    she's a metaphysician.
  • 14:12 - 14:16
    >> But in this case, the probability
    really is just one in four
  • 14:16 - 14:18
    that you had that medical condition.
  • 14:18 - 14:19
    Now how did that happen?
  • 14:20 - 14:23
    The reason was that the prevalence or the
    base rate was so low
  • 14:24 - 14:28
    that even a small rate
    of false positives,
  • 14:28 - 14:33
    given the massive numbers of people who
    don't have the condition,
  • 14:34 - 14:37
    will mean that there are more false positives,
    3 times as many,
  • 14:38 - 14:39
    as there are true positives.
  • 14:40 - 14:43
    And that's why the probability
    is just one in four,
  • 14:43 - 14:45
    actually a little less than one in four,
  • 14:45 - 14:49
    that you have the medical condition even
    when you tested positive.
  • 14:49 - 14:53
    I want to add a quick caveat here, in
    order to avoid misinterpretation,
  • 14:54 - 14:59
    because the point here is that, if you
    have a screening test for a condition
  • 14:59 - 15:04
    with a very low base rate or prevalence,
    and you don't have any symptoms
  • 15:04 - 15:10
    that put you in a special category,
    then, you need to get another test
  • 15:10 - 15:15
    before you jump to any conclusions
    about having the medical condition.
  • 15:16 - 15:20
    Because, if you have that other test,
    then the fact that you tested positive
  • 15:20 - 15:23
    on the first test puts you in a smaller class,
  • 15:23 - 15:25
    with a much higher base rate, or prevalence.
  • 15:25 - 15:28
    And now, the probability's going to go up.
  • 15:29 - 15:32
    Most doctors know that, and that's why,
    after the first test,
  • 15:32 - 15:35
    they don't jump to conclusions, and they
    order another test,
  • 15:35 - 15:39
    but many patients don't realize that and
    they get extremely worried
  • 15:39 - 15:42
    after a single test even when they don't
    have any symptoms.
  • 15:44 - 15:46
    So that's the mistake
    that we're trying to avoid here
  • 15:46 - 15:53
    and that's surprising, but it actually
    applies to many different areas of life.
  • 15:55 - 15:59
    It applies, for example, to medical tests
    with all kinds of other diseases.
  • 16:00 - 16:04
    Not just cancer or colon cancer, but
    pretty much every disease
  • 16:04 - 16:06
    where the prevalence is extremely low.
  • 16:07 - 16:10
    It applies also to drug tests.
  • 16:11 - 16:13
    If somebody gets a positive drug test,
  • 16:13 - 16:15
    does that mean they really
    were using drugs?
  • 16:15 - 16:20
    Well, if it's a population where the
    base rate or prevalence of drug use
  • 16:20 - 16:23
    is quite low, then it might not.
  • 16:24 - 16:28
    Of course, if you assume that the
    prevalence or base rate is quite high,
  • 16:28 - 16:30
    then you're going to believe
    that drug test.
  • 16:31 - 16:35
    But you need to know the facts about what
    the prevalence or base rate really is
  • 16:35 - 16:39
    in order to calculate
    accurately the probability
  • 16:39 - 16:42
    that this person really was using drugs.
  • 16:43 - 16:48
    Same applies to evidence in legal trials:
    take eyewitnesses for example,
  • 16:48 - 16:55
    it's very tricky: someone's trying to use
    their eyes as a test for what they see.
  • 16:55 - 16:58
    They might identify a friend,
    or they might just say
  • 16:58 - 17:02
    that car that did the hit-and-run accident
    was a Porsche.
  • 17:03 - 17:08
    Well, how good are they at identifying
    Porsches?
  • 17:10 - 17:13
    If they get it right most of the time,
    but not always,
  • 17:13 - 17:18
    and sometimes they don't get it right
    when it is a Porsche,
  • 17:18 - 17:22
    then we've got the sensitivity and
    specificity of what they identify.
  • 17:23 - 17:26
    And we can use that to calculate
    how likely it is
  • 17:26 - 17:30
    that their evidence in the trial
    really is reliable or not.
  • 17:31 - 17:34
    Another example is the prediction of
    future behavior.
  • 17:35 - 17:37
    We might have some kind of marker
  • 17:38 - 17:41
    that a certain group of people
    with that marker
  • 17:41 - 17:44
    have a certain likelihood of
    committing crimes.
  • 17:44 - 17:49
    But if crimes are very rare
    in that community and every other,
  • 17:49 - 17:55
    then a test which has a pretty good
    sensitivity and specificity
  • 17:55 - 18:00
    still might not be good enough when
    we're talking about something like crime
  • 18:00 - 18:05
    that's actually very rare and has
    a very low prevalence or base rate
  • 18:05 - 18:06
    in most communities.
  • 18:07 - 18:09
    And the same applies
    to failing out of school.
  • 18:11 - 18:14
    Are SAT scores or GRE scores
    going to be
  • 18:14 - 18:17
    good predictors of
    who's going to fail out of school?
  • 18:18 - 18:21
    Well, if very few people fail out of
    school,
  • 18:21 - 18:25
    so that the prevalence and base rate
    is very low,
  • 18:25 - 18:28
    then, even if they're
    pretty sensitive and specific,
  • 18:28 - 18:29
    they might not be good predictors.
  • 18:30 - 18:35
    So this same type of problem arises
    in a lot of different areas.
  • 18:36 - 18:38
    And I'm not going to go through
    more examples right now,
  • 18:38 - 18:42
    but we'll have plenty of examples in the
    exercises at the end of this chapter.
  • 18:44 - 18:46
    I want to end, though,
    by saying a few things
  • 18:46 - 18:49
    that are a bit more technical
    about this method.
  • 18:50 - 18:52
    First, there's a lot of terminology to
    learn,
  • 18:53 - 18:58
    because when you read about using
    this method in other areas,
  • 18:58 - 19:01
    for other types of topics,
    then you'll run into these terms,
  • 19:01 - 19:03
    and it's a good idea to know them.
  • 19:04 - 19:14
    So first, the cases where the person does
    have the condition and also tests positive
  • 19:14 - 19:17
    are called hits, or true positives.
  • 19:17 - 19:19
    Different people use different terms.
  • 19:22 - 19:28
    The cases where the person tests positive,
    but they don't have the condition,
  • 19:28 - 19:31
    are called false positives
    or false alarms.
  • 19:34 - 19:41
    The cases where a person really does have
    the condition, but tests negative
  • 19:41 - 19:44
    are called misses or false negatives.
  • 19:47 - 19:51
    And the cases where the person
    does not have the condition
  • 19:52 - 19:55
    and the test comes out negative
    are called true negatives,
  • 19:55 - 19:58
    because they're negative and it's true
    that they don't have the condition.
  • 20:00 - 20:03
    If we put together the false negatives,
    and the true negatives,
  • 20:04 - 20:06
    we get the total set of negatives.
  • 20:07 - 20:12
    And if we put together the true positives
    and the false positives
  • 20:12 - 20:15
    we get the total set of positives.
  • 20:16 - 20:19
    And of course, we have the general
    population.
  • 20:19 - 20:23
    Within that population,
    there's a percentage that has the condition
  • 20:23 - 20:26
    and a percentage
    that doesn't have the condition.
  • 20:27 - 20:29
    Now, what's the base rate?
  • 20:30 - 20:35
    The base rate in this population is simply
    the set that have the condition,
  • 20:36 - 20:42
    divided by the total population,
    which is Box 7 divided by Box 9.
  • 20:42 - 20:45
    If we use e for the evidence
  • 20:45 - 20:50
    and h for the hypothesis being true that
    the condition really does exist,
  • 20:50 - 20:53
    then that's the probability of h,
  • 20:54 - 21:02
    and the sensitivity is going to be
    the total number of true positives
  • 21:02 - 21:06
    divided by the total number of people
    with the condition,
  • 21:06 - 21:12
    because it's the percentage of people who
    have the condition and test positive.
  • 21:13 - 21:16
    OK? So that's the probability of e given h,
  • 21:16 - 21:20
    and it's Box 1 divided by Box 7.
  • 21:21 - 21:27
    The specificity in contrast is the ratio
    of it being a true negative
  • 21:27 - 21:32
    to the total number of people
    who do not have the condition, that is,
  • 21:32 - 21:35
    the probability of not e, that is,
  • 21:35 - 21:39
    not having the evidence
    of a positive test result,
  • 21:39 - 21:43
    given not h,
    given that you're in the second column,
  • 21:43 - 21:47
    where the hypothesis is false,
    because you don't have the condition.
  • 21:47 - 21:52
    So that's Box 5 divided by Box 8.
  • 21:54 - 21:56
    That's the specificity.
  • 21:56 - 22:00
    So we can define all of these
    in terms of each other.
  • 22:01 - 22:07
    The hits divided by the total with that
    condition is going to be the sensitivity.
  • 22:07 - 22:11
    And you can use this terminology to guide
    your way through this box.
  • 22:11 - 22:15
    And the big question is again going to be
    what's the solution?
  • 22:15 - 22:22
    What's the probability of the hypothesis
    having the condition, given the evidence,
  • 22:22 - 22:28
    that is, a positive test result:
    that's going to be Box 1 divided by Box 3.
  • 22:29 - 22:32
    And as we saw in the case that we just
    went through,
  • 22:32 - 22:37
    that gives you the probability of having
    the medical condition, or colon cancer,
  • 22:37 - 22:39
    given a positive test result.
  • 22:39 - 22:44
    That's called the posterior probability,
    or in symbols,
  • 22:44 - 22:48
    the probability of the hypothesis,
    given the evidence.
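Collecting the definitions just given, in the lecture's box numbering:

```latex
\begin{align*}
\text{base rate}   &= P(h)                 = \text{Box 7} / \text{Box 9} \\
\text{sensitivity} &= P(e \mid h)          = \text{Box 1} / \text{Box 7} \\
\text{specificity} &= P(\neg e \mid \neg h) = \text{Box 5} / \text{Box 8} \\
\text{posterior}   &= P(h \mid e)          = \text{Box 1} / \text{Box 3}
\end{align*}
```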
  • 22:48 - 22:53
    So I hope this terminology helps you
    understand some of the discussions of this,
  • 22:53 - 22:56
    if you go on and read about it
    in the literature.
  • 22:56 - 23:01
    This procedure that we've been discussing
    is actually just an application
  • 23:02 - 23:07
    of a famous theorem called Bayes' Theorem
    after Thomas Bayes,
  • 23:07 - 23:11
    an 18th-century English clergyman,
    who was also a mathematician
  • 23:11 - 23:16
    and proved this extremely important
    theorem in probability theory.
  • 23:17 - 23:23
    Now some of you out there will use the
    boxes, and it'll make sense to you.
  • 23:23 - 23:26
    But some Courserians, I assume,
    are mathematicians,
  • 23:26 - 23:28
    and they want to see
    the mathematics behind it.
  • 23:29 - 23:33
    So now, I want to show you how to derive
    Bayes' theorem
  • 23:33 - 23:37
    from the rules of probability
    that we learned in earlier lectures.
  • 23:37 - 23:40
    So for all you math nerds out there,
    here goes.
  • 23:41 - 23:44
    You start with rule 2G,
  • 23:45 - 23:51
    apply it to the probability that the
    evidence and the hypothesis are both true.
  • 23:51 - 23:56
    And by the rule, that probability is
    equal to the probability of the evidence,
  • 23:56 - 24:01
    times the probability of the hypothesis,
    given the evidence.
  • 24:02 - 24:05
    You have to have
    that conditional probability
  • 24:05 - 24:07
    because they're not independent.
  • 24:09 - 24:14
    Then you simply divide both sides of that
    by the probability of the evidence:
  • 24:14 - 24:16
    a little simple algebra.
  • 24:16 - 24:20
    And you end up with the probability
    of the hypothesis, given the evidence,
  • 24:20 - 24:25
    is equal to the probability
    of the evidence and the hypothesis,
  • 24:25 - 24:28
    divided by the probability
    of the evidence.
  • 24:31 - 24:35
    Now we can do a little trick.
    This was ingenious.
  • 24:35 - 24:39
    Substitute for e, something
    that's logically equivalent to e,
  • 24:39 - 24:45
    namely, the evidence AND the hypothesis
    or the evidence AND NOT the hypothesis.
  • 24:46 - 24:48
    Now if you think about it, you'll see
    that those are equivalent,
  • 24:48 - 24:51
    because either the hypothesis
    has to be true
  • 24:51 - 24:54
    or NOT the hypothesis is true.
  • 24:54 - 24:56
    One or the other has to be true.
  • 24:57 - 25:00
    And that means that the evidence
    AND the hypothesis
  • 25:00 - 25:05
    or the evidence AND NOT the hypothesis
    is going to be equivalent to e.
  • 25:05 - 25:08
    So this is equivalent to this.
  • 25:08 - 25:11
    And because they're equivalent,
    we can substitute them
  • 25:11 - 25:15
    within the formula for probability
    without affecting the truth values.
  • 25:16 - 25:23
    So we just substitute this formula in
    here for the e up there.
  • 25:24 - 25:28
    And we end up with the probability of the
    hypothesis, given the evidence,
  • 25:28 - 25:32
    is equal to the probability of the
    evidence AND the hypothesis, divided by
  • 25:32 - 25:35
    the probability of the evidence
    AND the hypothesis
  • 25:35 - 25:38
    or the evidence AND NOT the hypothesis.
  • 25:38 - 25:41
    Now, that might not seem to make much
    sense yet, but it helps with the derivation.
  • 25:43 - 25:48
    The next step is to apply rule 3, because
    we have a disjunction.
  • 25:48 - 25:51
    And notice the disjuncts are mutually
    exclusive.
  • 25:52 - 25:56
    It cannot both be true that the evidence
    AND the hypothesis is true,
  • 25:56 - 26:00
    and also that the evidence
    AND NOT the hypothesis is true,
  • 26:00 - 26:04
    because it can't be both h and not h.
  • 26:05 - 26:08
    So we can apply the simple version
    of rule 3.
  • 26:09 - 26:14
    And that means that the probability of
    (e&h) or (e&~h)
  • 26:14 - 26:21
    is equal to the probability of (e&h)
    plus the probability of (e&~h).
  • 26:22 - 26:24
    We're just applying
    that rule 3 for disjunction
  • 26:24 - 26:26
    that we learned a few lectures ago.
  • 26:27 - 26:30
    Now we apply rule 2G again,
  • 26:30 - 26:35
    because we have the probability
    of a conjunction up in the top.
  • 26:37 - 26:42
    And, since these are not independent of
    each other
  • 26:42 - 26:45
    -- we hope not, if it's a hypothesis
    and the evidence for it --
  • 26:45 - 26:48
    then we have to use
    the conditional probability.
  • 26:49 - 26:53
    And using rule 2G, we find that
    the probability of the hypothesis,
  • 26:53 - 26:55
    given the evidence, is equal to
  • 26:55 - 27:00
    the probability of the hypothesis, times
    the probability of the evidence,
  • 27:00 - 27:05
    given the hypothesis, divided by
    the probability of the hypothesis,
  • 27:05 - 27:09
    times the probability of the evidence,
    given the hypothesis,
  • 27:09 - 27:13
    plus the probability
    of the hypothesis being false,
  • 27:13 - 27:17
    that is the probability of NOT h,
    times the probability of the evidence,
  • 27:17 - 27:21
    given NOT h, or the hypothesis being false.
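Written out symbolically, the derivation just described runs:

```latex
\begin{align*}
P(h \mid e) &= \frac{P(e \wedge h)}{P(e)} \\
            &= \frac{P(e \wedge h)}{P(e \wedge h) + P(e \wedge \neg h)} \\
            &= \frac{P(h)\,P(e \mid h)}{P(h)\,P(e \mid h) + P(\neg h)\,P(e \mid \neg h)}
\end{align*}
```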
  • 27:22 - 27:23
    And that's a mouthful
  • 27:23 - 27:27
    and it's a long formula,
    but that's the mathematical formula
  • 27:27 - 27:33
    that Bayes proved in the 18th century
    and it provides the mathematical basis
  • 27:34 - 27:36
    for that whole system of boxes
    that we talked about before.
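For those who prefer the formula to the boxes, here is a minimal Python sketch of that same equation (the function name is mine); with the lecture's made-up numbers it reproduces the box-method answer:

```python
def bayes_posterior(prior, sensitivity, specificity):
    """P(h | e) for a positive test result, via Bayes' theorem.

    prior:       P(h), the base rate of the condition
    sensitivity: P(e | h), chance of a positive test given the condition
    specificity: P(~e | ~h), chance of a negative test without the condition
    """
    true_pos_rate = prior * sensitivity               # P(h) * P(e | h)
    false_pos_rate = (1 - prior) * (1 - specificity)  # P(~h) * P(e | ~h)
    return true_pos_rate / (true_pos_rate + false_pos_rate)

# The lecture's made-up screening-test numbers:
print(round(bayes_posterior(0.003, 0.99, 0.99), 2))  # 0.23
```

Notice that the denominator is just the total probability of a positive test: the true positives plus the false alarms, exactly the top row of the box chart.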
  • 27:37 - 27:43
    But if you don't like the mathematical
    proof and that's too confusing for you,
  • 27:43 - 27:44
    then use the boxes.
  • 27:44 - 27:47
    And if you don't like the boxes,
    use the mathematical proof.
  • 27:48 - 27:51
    They're both going to work:
    just pick the one that works for you.
  • 27:51 - 27:53
    In fact, you don't have to pick
    either of them,
  • 27:53 - 27:57
    because remember, this is an honors
    lecture, it's optional,
  • 27:58 - 28:00
    and it won't be on the quiz.
  • 28:01 - 28:04
    But if you do want to try this method,
    and make sure that you understand it,
  • 28:05 - 28:08
    we'll have a bunch of exercises for you,
    where you can test your skills.
From the Think Again: How to Reason and Argue course on Coursera