YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Information.3.InformationAndProbability

Get Embed Code
2 Languages

Showing Revision 4 created 09/23/2016 by Steven Gunawan.

  1. We've talked about information as bits,
    measuring information
  2. we've talked about counting, so we can use
    bits to count 0 0 0 1, 1 0 1 1,
  3. counting from zero up to three, modulo two.
  4. We've talked as bits as labeling,
  5. that we can use barcodes which are just
    bits to label things.
  6. And finally we've talked about how bits are
    physical,
  7. that all bits that we have in computers, all
    the bits of information that I'm conveying
  8. via the vibrations of my vocal chords and
    the vibrations of the air
  9. are actually physical systems, physical
    manifestations of information.
  10. And then we also talked about a discovery
    which is 150 years old,
  11. that all physical systems carry information,
  12. and that amount of information can be quantified.
  13. So number of bits is the logarithm to the base two
    of the number of possibilities,
  14. a result which ironically is inscribed on Boltzmann's
    grave.
  15. So now I'd like to give you another aspect
    of bits, and this a very 20th century aspect
  16. of bits of information.
  17. And that is the relationship between information
    and probability.
  18. So probability is something that we're all
    familiar with and all confused by,
  19. and I'm always confused by probability.
  20. Human beings are known to have a very bad
    intuitive sense of probability,
  21. we overestimate the probability of truly
    awful events,
  22. we underestimate the probability of fine,
    nice, normal events.
  23. Of course, from an evolutionary standpoint,
    overestimating the probability of some event
  24. like a sabretooth tiger dropping out of this
    tree and sinking his teeth into your neck,
  25. this is probably a good thing, which might
    be why.
  26. But there's a simple idea of probability,
    and let me try to demonstrate them right here.
  27. So let's take the example of heads and tails.
  28. I have here a nice shiny, new nickel that's
    been given to me by a member of the Santa Fe
  29. Institute, she didn't ask me to give it back
    either so I'm five cents ahead.
  30. So it can either be heads or tails.
  31. What do you think? What's the probability that
    it's head or that it's tails?
  32. Well I claim it's fifty-fifty. But why?
  33. Why is it one half? The probability that it's
    heads or tails.
  34. It was tails, I swear.
  35. So there are two notions of probability for
    heads and tails.
  36. So one notion is - and I claim that this is the kind
    of nicest, most intuitive notion - when I just
  37. flip it like this, I wasn't watching it on the
    air, I didn't know how hard I flipped it,
  38. I didn't see it before I put it down there.
  39. I have no reason for preferring heads over
    tails.
  40. Heads over tails are just a priori they have
    equal weight.
  41. Heads. It was heads by the way, now the
    probability is one that it was heads,
  42. that's the funny thing about probabilities.
  43. First you don't know and you have ones that
    are probabilities.
  44. These are called prior or a priori probabilities.
  45. So probability of heads is equal to the
    probability of tails is one-half,
  46. because there is no reason to prefer heads
    over tails. This is a good argument.
  47. So this is the prior probabilities of heads
    or tails, it's 50 percent.
  48. But there's another argument about why
    the probability of heads and tails would be 50 percent.
  49. So let me just try it like this, let me just
    this coin a bunch of times.
  50. Tails.
  51. Heads.
  52. Heads.
  53. Heads.
  54. Tails.
  55. Heads.
  56. Tails.
  57. Heads.
  58. Heads.
  59. So I actually got seven heads and three tails
    out of ten tosses.
  60. That was kind of dull, this is the problem.
  61. With probability it's dull and confusing
    and to figure out what's going on,
  62. you have to do it many times.
  63. Because I don't think that you're going to
    agree that this shiny new United States nickel
  64. really has a probability of having seven out
    of ten of having heads and three out of ten of
  65. having tails.
  66. It was just the luck of the draw,
  67. or the luck of the toss.
  68. It just so happens that there were seven
    heads and three tails, which, if you're flipping
  69. a coin ten times, is pretty reasonable.
  70. So if I were to flip this coin a whole bunch
    more times,
  71. which I'm not going to do because I know it
    will be dull, you would be very bored by this.
  72. So if I flip a coin, and I should say a fair coin,
  73. I should note that in my classes at MIT,
    the students all start out seeming to
  74. believe what I say, but after a few lectures,
    they become very distrustful.
  75. I don't know why this is, I seem like a
    trustworthy person.
  76. Anyway, I flip a fair coin m times and we
    look at the number of heads and the number of tails
  77. and the sum of the number of heads plus the
    number of tails is equal to m.
  78. I just flipped it ten times.
  79. And we're going to call the frequency,
  80. or the frequency of heads
  81. is just equal to the number of heads divided
    by m.
  82. So I flipped it ten times, I got seven heads,
    frequency is 0.7.
  83. Frequency of tails, as you may very well guess,
    is the number of tails over m,
  84. and that's equal to one minus the number of heads
    divided by m.
  85. Now what we expect, just from personal
    experience, is that if we just keep flipping
  86. the coin many many many times.
  87. Well, if I flip it 100 times, I certainly don't
    expect to get exactly 50 heads,
  88. which would be a frequency of exactly 0.5,
    matching the probability.
  89. But I would expect to get something a little
    better than 0.7, seven-tenths.
  90. That seems, you know, very unlikely, that
    if I flip it a hundred times I'm going to get 70 heads.
  91. It's perfectly possible, why not.
  92. So I will just give you the formula for this.
  93. So the expected number of heads, which is
    also the expected number of tails, because
  94. there's nothing to choose between them,
  95. is equal to 50 percent.
  96. I flip it 100 times, for example, m is equal to 100.
    Then m over two is equal to 50.
  97. So I'd expect to get it roughly 50, and then
    I'm going to use this notation, plus or minus,
  98. I'll explain what this is in a moment, plus
    one-half times the square root of m.
  99. So actually what you would expect means
    well it's roughly in this interval.
  100. I flip it 100 times, the square root of 100
    is 10.
  101. I expect it to be roughly within five, might
    be a few more, might be seven or eight more
  102. but I'd be really kind of surprised if there
    were seventy heads and thirty tails.
  103. I would think it'd be more likely, you know
    60 heads, 40 tails, but probably more like
  104. 55 and 45.
  105. And that's actually what you can do.
  106. So let's actually ask why is this so.
  107. So if I look at all different possible sequences
  108. H H T T H H H T H H H T
  109. you may notice that the first ten of these
    are pretty much what I got for when
  110. I was flipping the coin.
  111. Dot dot dot, which is a way of meaning
    et cetera.
  112. Just keeps on going, and then we're going
    to have n of these,
  113. and we're going to count the number of
    possible sequences
  114. with exactly m_h heads and m_t tails.
  115. Of course, because it's got to be heads or
    tails,
  116. at least unless it lands on its side, which
    I don't think it's going to do,
  117. this has got to add up to m.
  118. So I'm going to count the number of possible
    sequences with exactly m_h heads, m_t tails,
  119. the two have to add up to m.
  120. And what we're going to find out, well there's not so
    many sequences which are heads heads heads...
  121. tails.
  122. So there's going to be a very small number
    of sequences that have almost all heads and
  123. a few tails.
  124. There's similarly going to be very small
    number of sequences that have almost all
  125. tails and a few heads, and there's going to
    be humongous number of sequences that have
  126. roughly the same number of heads and of tails.
  127. So you can see, to relate this to information
    theory,
  128. each sequence is like a sequence of zeros
    and ones.
  129. You can call heads zero and tails one,
  130. this is just a long long long bit string.
  131. And so we can relate ideas of information,
  132. numbers of possible sequences with a
    particular pattern,
  133. in this case a particular number of heads and
    of tails
  134. to probability.