English subtitles

Introduction to mutual information


Showing Revision 22 created 10/20/2016 by Antonio Rueda Toicen.

  1. so we've been talking about information
  2. how you measure information
  3. and of course, you measure information in bits
  4. how you can use information to label things
  5. for instance, with bar codes
  6. so you can use information
  7. to label things
  8. and then we talked about
  9. probability and information
  10. and if I have probabilities for events P_i
  11. then it says that the amount of information
  12. that's associated with this event occurring is
  13. minus the sum over i
  14. p_i log to the base 2
  15. of p_i
  16. this beautiful formula that was developed
  17. by Maxwell, Boltzmann, and Gibbs
  18. back in the middle of the nineteenth century
  19. to talk about the amount of entropy
  20. in atoms or molecules
  21. this is often called S
  22. for entropy, as well
  23. and then was rediscovered
  24. by Claude Shannon
  25. in the 1940's
  26. to talk about information theory in the abstract
  27. and the mathematical theory of communication
  28. in fact, there is a funny story
  29. about this that
  30. Shannon, when he came up with this formula
  31. minus sum over i p_i log to the base 2 of p_i
  32. he went to John Von Neumann
  33. the famous mathematician
  34. and he said, "what should I call this quantity?"
  35. and Von Neumann says
  36. "you should call it H, because that's what Boltzmann called it "
  37. but Von Neumann, who had a famous memory
  38. apparently forgot
  39. that Boltzmann's H
  40. of his famous "H theorem"
  41. was the same thing but without the minus sign
  42. so it's a negative quantity
  43. and it gets more and more negative
  44. as opposed to entropy
  45. which is a positive quantity
  46. and gets more and more positive
  47. so, actually these fundamental formulas
  48. about information theory
  49. go back to the mid 19th century
  50. a hundred and fifty years
  51. so now we'd like to apply them
  52. to ideas about communication
  53. and to do that, I'd like to tell you
  54. a little bit more about
  55. probability
  56. so, we talked about
  57. probabilities for events
  58. probability x
  59. you know x equals
  60. "it's sunny"
  61. probability of y
  62. y is "it's raining"
  63. and we could look at the probability
  64. of x
  65. and I'm gonna use the notation
  66. I introduced for Boolean logic before
  67. is the probability of this thing right here
  68. means "AND"
  69. "probability of X AND Y"
  70. or we can also just call
  71. this the probability of X Y simultaneously
  72. keep on streamlining our notation
  73. this is the probability
  74. that it's raining... it's sunny and it's raining
  75. now, mostly in the world
  76. this is a pretty small probability
  77. but here in Santa Fe
  78. it happens all the time
  79. and as a result, you get rather beautiful
  80. rainbows...single, double, triple
  81. on a daily basis
  82. so, we have...
  83. this is what is called
  84. the joint probability
  85. the joint probability that it's sunny and it's raining
  86. the joint probability of X and Y
  87. and what do we expect of this
    joint probability?
  88. so, we have the probability of X and Y
  89. and this tells you the probability
    that it's sunny and it's raining
  90. we can also look at the probability
  91. so X AND (NOT Y)
  92. again using our notation
  93. introduced to us by the famous husband
  94. of the daughter of the Surveyor General
    of the British ___(?)
  95. George Boole, married to Mary Everest
  96. and we have a relationship which says
  97. that the probability of X on its own
  98. should be equal to the probability
    of X AND Y plus the
  99. probability of X AND (NOT Y)
  100. and the probability of X on its own
  101. is called the "marginal probability"
  102. so, it's just the probability that
  103. it's sunny on its own
  104. so the probability that it's sunny on its own
  105. is the probability that it's sunny and it's raining
  106. plus the probability that it's sunny and it's not raining
  107. I think this makes some kind of sense
  108. why is it called the "marginal probability"?
  109. I have no idea
  110. so let's not even worry about it
  111. there's a very nice picture of probabilities
  112. in terms of set theory
  113. I don't know about you
  114. but I grew up in the age of "new math"
  115. where they tried to teach us
  116. about set theory
  117. and unions of sets
  118. and intersections of sets and things like that
  119. from starting at a very early age
  120. which means people of my generation
  121. are completely unable to do their tax returns
  122. but for me, dealing a lot with math
  123. it actually has been quite helpful
  124. for my career to learn about set theory at the age of 3 or 4
  125. or whatever it was
  126. so, we have a picture like this
  127. this is the space or the set of all events
  128. here is the set X
  129. which is the set of events X, where
    it's sunny
  130. here is the set of events Y, which is
    the set of events where it's raining
  131. this thing right here is called
  132. "X intersection Y"
  133. which is the set of events
  134. where it's both sunny and it's raining
  135. but in contrast, if I look at
  136. this right here
  137. this is "X union Y"
  138. which is the set of events
  139. where it's either sunny or raining
  140. and now you can kind of see
  141. where George Boole got his funny
    "cap" and "cup" notation
  142. we can pair this with X AND Y
  143. X AND Y, from a logical standpoint
  144. is essentially the same as this union
    of these sets
  145. and similarly, X intersection Y
    is X OR Y --translator's note: professor Lloyd meant "union" when referring to OR and "intersection" when referring to AND--
  146. so when I take the logical statement
    corresponding to the set of events
  147. that I write it as X AND Y
  148. the set of events is the intersection
    of it's sunny and it's raining
  149. X OR Y is the intersection of events
    where it's sunny or it's raining
    --translator's note: professor Lloyd meant "union"
    when referring to OR, "intersection" refers to AND--
  150. and you can have all kinds of you know
    nice pictures
  151. here's Z where let's say it's snowy at the
    same time it's sunny
  152. which is something that I've seen happen
    here in Santa Fe
  153. this is not so strange in here
  154. where we have X intersection Y intersection Z
  155. which is not the empty set, in terms of Santa Fe
  156. ok, so now let's actually look
  157. at the kinds of information that are
    associated with this
  158. suppose that I have a set of possible
    events, I'll call one set labeled by i
  159. the other set, labeled by j
  160. and now I can look at p of i and j
  161. so this is a case where the
    first type of event
  162. is i and the second type of event is j
  163. and I can define
  164. you know, I'm gonna do this
    slightly different
  165. let's call this... we'll be slightly fancier
  166. we'll call these event x_i and event y_j
  167. so, i labels the different events of x
  168. and j labels the different events of y
  169. so, for instance x_i could be two events
    either it's sunny or it's not sunny
  170. so i could be zero, and it would be
    'it's not sunny'
  171. and 1 could be it's sunny
  172. and j could be it's either raining
  173. or it's not raining
  174. so there are two possible values of y
  175. I'm just trying to make my life easier
  176. so we have a joint probability
    distribution x_i and y_j
  177. this is our joint probability, as before
  178. and now we have a joint information
  179. which we shall call I of X and Y
  180. this is the information
  181. that's inherent in the joint set of events
  182. X and Y
  183. in our case, it being sunny and not sunny,
    raining and not raining
  184. and this just takes the same form as before
  185. we sum over all different possibilities
  186. sunny-raining, not sunny-raining,
    sunny-not raining, not sunny-not raining
  187. this is why one shouldn't try to enumerate these things
  188. p of x_i y_j logarithm of p of x_i y_j
  189. so this is the amount of information that's
  190. inherent with these two sets of events
  191. and of course, we still have this, if you like the
  192. marginal information, the information
    of X on its own
  193. which is now just the sum over events x
    on its own
  194. of the marginal distribution
  195. why it's called "marginal" I don't know
  196. it's just the probability for X on its own
  197. p of X_i log base two of p of X_i
  198. and similarly we can talk about
  199. I of Y is minus the sum over j
    p of Y_j log to the base 2 of
  200. p of Y_j
  201. this is the amount of information
  202. inherent in whether it's sunny or not sunny
  203. it could be up to a bit of information
  204. if it's probability one half of being
    sunny or not sunny
  205. then there's a bit of information; let me
    tell you, in Santa Fe
  206. there's far less than a bit of information
  207. on whether it's sunny or not
  208. because it's sunny most of the time
  209. similarly, raining or not raining
  210. could be up to a bit of information
  211. if each of these probabilities is 1/2
  212. again we're in the high desert here
  213. it's normally not raining
  214. so, you've far less than a bit of information
  215. on the question whether it's raining or not raining
  216. so, we have joint information
  217. constructed out of joint probabilities
  218. marginal information, or information on the original variables on their own,
  219. constructed
    out of marginal probabilities
  220. and let me end this little section by defining
  221. a very useful quantity which is called the
    mutual information
  222. the mutual information, which is defined to be
  223. I( X ...I normally define it with this little colon
  224. right in the middle, because it looks nice
    and symmetrical
  225. and we'll see that this is symmetrical
  226. it's the information in X plus the information in Y
  227. minus the information in X and Y taken together
  228. it's possible to show that this is always greater or equal to zero
  229. and this mutual information can be thought of as the amount of information
  230. the variable X has about Y
  231. if X and Y are completely uncorrelated, so it's completely
    uncorrelated whether it's sunny
  232. or not sunny or raining or not raining
  233. then this will be zero
  234. however, in the case of sunny and not sunny
  235. raining and not raining, they are very correlated
  236. in the sense that once you know that it's sunny
  237. it's probably not raining, even though
  238. sometimes that does happen here in Santa Fe
  239. and so in that case, you'd expect
  240. to find a large amount of mutual information
  241. in most places in fact, you'll find that knowing
  242. whether it's sunny or not sunny
  243. gives you a very good prediction
  244. about whether it's raining or it's not raining
  245. mutual information measures the amount of information
    that X can tell us about Y
  246. it's symmetric, so it tells us the amount of information that
    Y can tell us about X
  247. and another way of thinking about it
  248. is that it's the amount of information
  249. that X and Y hold in common
  250. which is why it's called "mutual information"
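
For reference, the quantities spoken aloud in the lecture can be written out as follows. This is a sketch of the notation rather than the lecturer's own slides: the symbols I(X), I(Y), I(X,Y) and I(X:Y) follow the spoken names, the leading minus sign on the joint information is assumed from the remark that it "takes the same form as before", and the sum over j for the marginal probability is the general version of "probability of X AND Y plus probability of X AND (NOT Y)".

```latex
\begin{align}
  p(x_i)  &= \sum_j p(x_i, y_j)
           && \text{marginal probability} \\
  I(X)    &= -\sum_i p(x_i)\,\log_2 p(x_i)
           && \text{information in } X \text{ on its own} \\
  I(Y)    &= -\sum_j p(y_j)\,\log_2 p(y_j)
           && \text{information in } Y \text{ on its own} \\
  I(X,Y)  &= -\sum_{i,j} p(x_i, y_j)\,\log_2 p(x_i, y_j)
           && \text{joint information} \\
  I(X:Y)  &= I(X) + I(Y) - I(X,Y) \;\ge\; 0
           && \text{mutual information}
\end{align}
```

When X and Y are independent, p(x_i, y_j) = p(x_i) p(y_j), so I(X,Y) = I(X) + I(Y) and the mutual information is zero, which matches the "completely uncorrelated" case in the lecture.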
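
The sunny/raining example from the lecture can be worked through numerically. The following is a minimal sketch in Python, not something shown in the lecture: the joint probabilities are made-up numbers chosen only so that sun and rain are anti-correlated, and the helper names `entropy_bits`, `marginal` and `mutual_information` are hypothetical.

```python
import math


def entropy_bits(probs):
    """Return -sum(p * log2(p)) in bits; terms with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(joint, axis):
    """Sum a joint distribution {(x, y): p} over the other variable."""
    out = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint):
    """I(X:Y) = I(X) + I(Y) - I(X,Y), all measured in bits."""
    info_x = entropy_bits(marginal(joint, 0).values())
    info_y = entropy_bits(marginal(joint, 1).values())
    info_xy = entropy_bits(joint.values())
    return info_x + info_y - info_xy


# Assumed joint distribution p(x_i, y_j); the numbers are illustrative only.
# x: 1 = sunny, 0 = not sunny;  y: 1 = raining, 0 = not raining
joint = {
    (1, 0): 0.70,  # sunny, not raining
    (1, 1): 0.05,  # sunny and raining (rainbow weather)
    (0, 0): 0.10,  # not sunny, not raining
    (0, 1): 0.15,  # not sunny, raining
}

p_x = marginal(joint, 0)  # marginal probability of sunny / not sunny
p_y = marginal(joint, 1)  # marginal probability of raining / not raining

print("I(X)   =", entropy_bits(p_x.values()), "bits")
print("I(Y)   =", entropy_bits(p_y.values()), "bits")
print("I(X,Y) =", entropy_bits(joint.values()), "bits")
print("I(X:Y) =", mutual_information(joint), "bits")

# Independent comparison case: the joint is the product of the same marginals,
# so the mutual information should vanish up to floating-point rounding.
independent = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
print("I(X:Y), independent case =", mutual_information(independent), "bits")
```

Because it is sunny most of the time in this assumed distribution, I(X) comes out below one bit, in line with the remark about Santa Fe. Swapping the roles of X and Y in `joint` leaves the mutual information unchanged, which is the symmetry mentioned at the end of the lecture.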