## Information.6.MutualInformation


Introduction to mutual information


Showing Revision 22 created 10/20/2016 by Antonio Rueda Toicen.

1. so we've been talking about information
2. how you measure information
3. and of course, you measure information in bits
4. how you can use information to label things
5. for instance, with bar codes
6. so you can use information
7. to label things
8. and then we talked about
9. probability and information
10. and if I have probabilities for events P_i
11. then it says that the amount of information
12. that's associated with this event occurring is
13. minus the sum over i
14. p_i log to the base 2
15. of p_i
16. this beautiful formula that was developed
17. by Maxwell, Boltzmann, and Gibbs
18. back in the middle of the nineteenth century
19. to talk about the amount of entropy
20. in atoms or molecules
21. this is often called S
22. for entropy, as well
23. and then was rediscovered
24. by Claude Shannon
25. in the 1940's
26. to talk about information theory in the abstract
27. and the mathematical theory of communication
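The entropy formula described here is easy to check numerically. A minimal Python sketch (the distributions are made-up examples, not anything from the lecture):

```python
import math

def entropy(probs):
    """Shannon/Gibbs entropy: -sum_i p_i * log2(p_i), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # a fair coin: 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # four equally likely events: 2.0 bits
print(entropy([1.0]))                     # a certain event: 0.0 bits
```

The `if p > 0` guard follows the usual convention that 0 · log 0 = 0.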
28. in fact, there is a funny story
29. about this that
30. Shannon, when he came up with this formula
31. minus sum over i p_i log to the base 2 of p_i
32. he went to John Von Neumann
33. the famous mathematician
34. and he said, "what should I call this quantity?"
35. and Von Neumann says
36. "you should call it H, because that's what Boltzmann called it "
37. but Von Neumann, who had a famously good memory
38. apparently forgot
39. that Boltzmann's H
40. from his famous "H theorem"
41. was the same thing but without the minus sign
42. so it's a negative quantity
43. and it gets more and more negative
44. as opposed to entropy
45. which is a positive quantity
46. and gets more and more positive
47. so, actually these fundamental formulas
48. about information theory
49. go back to the mid 19th century
50. a hundred and fifty years
51. so now we'd like to apply them
52. to ideas about communication
53. and to do that, I'd like to tell you
54. a little bit more about
55. probability
56. so, we talked about
57. probabilities for events
58. probability x
59. you know x equals
60. "it's sunny"
61. probability of y
62. y is "it's raining"
63. and we could look at the probability
64. of x
65. and I'm gonna use the notation
66. I introduced for Boolean logic before
67. is the probability of this thing right here
68. means "AND"
69. "probability of X AND Y"
70. or we can also just call
71. this the probability of X Y simultaneously
72. keep on streamlining our notation
73. this is the probability
74. that it's raining... it's sunny and it's raining
75. now, mostly in the world
76. this is a pretty small probability
77. but here in Santa Fe
78. it happens all the time
79. and as a result, you get rather beautiful
80. rainbows...single, double, triple
81. on a daily basis
82. so, we have...
83. this is what is called
84. the joint probability
85. the joint probability that it's sunny and it's raining
86. the joint probability of X and Y
87. and what do we expect of this
joint probability?
88. so, we have the probability of X and Y
89. and this tells you the probability
that it's sunny and it's raining
90. we can also look at the probability
X AND NOT Y
91. so X AND (NOT Y)
92. again using our notation
93. introduced to us by the famous husband
94. of the niece of the Surveyor General
of India
95. George Boole, married to Mary Everest
96. and we have a relationship which says
97. that the probability of X on its own
98. should be equal to the probability
of X AND Y plus the
99. probability of X AND (NOT Y)
100. and the probability of X on its own
101. is called the "marginal probability"
102. so, it's just the probability that
103. it's sunny on its own
104. so the probability that it's sunny on its own
105. is the probability that it's sunny and it's raining
106. plus the probability that it's sunny and it's not raining
107. I think this makes some kind of sense
108. why is it called the "marginal probability"?
109. I have no idea
110. so let's not even worry about it
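The marginal-probability relation P(X) = P(X AND Y) + P(X AND (NOT Y)) can be illustrated with made-up numbers (these are not real Santa Fe weather statistics):

```python
# Hypothetical joint probabilities, for illustration only:
p_sunny_and_rain = 0.05     # P(X AND Y):       sunny and raining
p_sunny_and_no_rain = 0.75  # P(X AND (NOT Y)): sunny and not raining

# Marginal probability: P(X) = P(X AND Y) + P(X AND (NOT Y))
p_sunny = p_sunny_and_rain + p_sunny_and_no_rain
print(p_sunny)  # 0.8, up to floating-point rounding
```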
111. there's a very nice picture of probabilities
112. in terms of set theory
113. I don't know about you
114. but I grew up in the age of "new math"
115. where they tried to teach us
116. about set theory
117. and unions of sets
118. and intersections of sets and things like that
119. from starting at a very early age
120. which means people of my generation
121. are completely unable to do their tax returns
122. but for me, dealing a lot with math
123. it actually has been quite helpful
124. for my career to learn about set theory at the age of 3 or 4
125. or whatever it was
126. so, we have a picture like this
127. this is the space or the set of all events
128. here is the set X
129. which is the set of events X, where
it's sunny
130. here is the set of events Y, where is
the set of events where it's raining
131. this thing right here is called
132. "X intersection Y"
133. which is the set of events
134. where it's both sunny and it's raining
135. but in contrast, if I look at
136. this right here
137. this is "X union Y"
138. which is the set of events
139. where it's either sunny or raining
140. and now you can kind of see
141. where George Boole got his funny
"cap" and "cup" notation
142. we can pair this with X AND Y
143. X AND Y, from a logical standpoint
144. is essentially the same as the intersection
of these sets
145. and similarly, X union Y is X OR Y
146. so when I take the logical statement
corresponding to the set of events
147. that I write as X AND Y
148. the set of events is the intersection
of "it's sunny" and "it's raining"
149. X OR Y is the union of events
where it's sunny or it's raining
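Python's set operators make the logic/set-theory pairing concrete: AND corresponds to intersection (`&`), OR to union (`|`). The day indices below are invented for illustration:

```python
# Hypothetical sets of day indices -- invented data for illustration.
X = {1, 2, 3, 4, 5}  # days on which it was sunny
Y = {4, 5, 6}        # days on which it rained

print(X & Y)  # X intersection Y -- sunny AND raining: {4, 5}
print(X | Y)  # X union Y -- sunny OR raining: {1, 2, 3, 4, 5, 6}
```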
150. and you can have all kinds of you know
nice pictures
151. here's Z where let's say it's snowy at the
same time it's sunny
152. which is something that I've seen happen
here in Santa Fe
153. this is not so strange here
154. where we have X intersection Y intersection Z
155. which is not the empty set in Santa Fe
156. ok, so now let's actually look
157. at the kinds of information that are
associated with this
158. suppose that I have a set of possible
events, I'll call one set labeled by i
159. the other set, labeled by j
160. and now I can look at p of i and j
161. so this is a case where the
first type of event
162. is i and the second type of event is j
163. and I can define
164. you know, I'm gonna do this
slightly different
165. let's call this... we'll be slightly fancier
166. we'll call these event x_i and event y_j
167. so, i labels the different events of x
168. and j labels the different events of y
169. so, for instance x_i could be two events
either it's sunny or it's not sunny
170. so i could be zero, and it would be
'it's not sunny'
171. and 1 could be it's sunny
172. and j could be it's either raining
173. or it's not raining
174. so there are two possible values of y
175. I'm just trying to make my life easier
176. so we have a joint probability
distribution x_i and y_j
177. this is our joint probability, as before
178. and now we have a joint information
179. which we shall call I of X and Y
180. this is the information
181. that's inherent in the joint set of events
182. X and Y
183. in our case, it being sunny and not sunny,
raining and not raining
184. and this just takes the same form as before
185. we sum over all different possibilities
186. sunny-raining, not sunny-raining,
sunny-not raining, not sunny-not raining
187. this is why one shouldn't try to enumerate these things
188. p of x_i y_j logarithm of p of x_i y_j
189. so this is the amount of information that's
190. inherent with these two sets of events
together
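The joint information I(X, Y) = −Σ_{i,j} p(x_i, y_j) log₂ p(x_i, y_j) can be computed directly from a joint distribution. The probabilities below are invented for illustration:

```python
import math

# Hypothetical joint distribution p(x_i, y_j) over (sky, rain) outcomes.
# These probabilities are invented for illustration and sum to 1.
p_joint = {
    ("sunny", "rain"):     0.05,
    ("sunny", "dry"):      0.75,
    ("not sunny", "rain"): 0.15,
    ("not sunny", "dry"):  0.05,
}

# Joint information: I(X, Y) = -sum_{i,j} p(x_i, y_j) log2 p(x_i, y_j)
I_XY = -sum(p * math.log2(p) for p in p_joint.values())
print(round(I_XY, 4))  # about 1.154 bits
```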
191. and of course, we still have this, if you like the
192. marginal information, the information
of X on its own
193. which is now just the sum over events x
on its own
194. of the marginal distribution
195. why it's called "marginal" I don't know
196. it's just the probability for X on its own
197. p of x_i log to the base 2 of p of x_i
198. and similarly we can talk about
199. I of Y is minus the sum over j
p of Y_j log to the base 2 of
200. p of Y_j
201. this is the amount of information
202. inherent whether it's sunny or not sunny
203. it could be up to a bit of information
204. if there's probability one half of being
sunny or not sunny
205. then there's a bit of information. Let me
tell you, in Santa Fe
206. there's far less than a bit of information
207. on whether it's sunny or not
208. because it's sunny most of the time
209. similarly, raining or not raining
210. could be up to a bit of information
211. if each of these probabilities is 1/2
212. again we're in the high desert here
213. it's normally not raining
214. so you have far less than a bit of information
215. on the question whether it's raining or not raining
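The "up to a bit" claim is the binary entropy function, which peaks at probability one half. A quick sketch (the 0.9 figure for a sunny sky is a made-up illustration):

```python
import math

def h2(p):
    """Entropy, in bits, of a yes/no event that occurs with probability p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

print(h2(0.5))  # 1.0 -- a 50/50 sky carries a full bit
print(h2(0.9))  # about 0.47 -- a mostly-sunny sky carries much less
```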
216. so, we have joint information
217. constructed out of joint probabilities
218. marginal information, or information on the original variables on their own,
219. constructed
out of marginal probabilities
220. and let me end this little section by defining
221. a very useful quantity which is called the
mutual information
222. the mutual information, which is defined to be
223. I of X colon Y... I normally define it with this little colon
224. right in the middle, because it looks nice
and symmetrical
225. and we'll see that this is indeed symmetrical
226. it's the information in X plus the information in Y
227. minus the information in X and Y taken together
228. it's possible to show that this is always greater than or equal to zero
229. and this mutual information can be thought of as the amount of information
230. the variable X has about Y
231. if X and Y are completely uncorrelated, so it's completely
uncorrelated whether it's sunny
232. or not sunny or raining or not raining
233. then this will be zero
234. however, in the case of sunny and not sunny
235. raining and not raining, they are very correlated
236. in the sense that once you know that it's sunny
237. it's probably not raining, even though
238. sometimes that does happen here in Santa Fe
239. and so in that case, you'd expect
240. to find a large amount of mutual information
241. in most places in fact, you'll find that knowing
242. whether it's sunny or not sunny
243. gives you a very good prediction
244. about whether it's raining or it's not raining
245. mutual information measures the amount of information
that X can tell us about Y
246. it's symmetric, so it tells us the amount of information that
Y can tell us about X
247. and another way of thinking about it
248. is that it's the amount of information
249. that X and Y hold in common
250. which is why it's called "mutual information"
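Putting the pieces together, mutual information I(X : Y) = I(X) + I(Y) − I(X, Y) can be computed from a joint distribution. The distribution below is invented for illustration; note that the result comes out non-negative, as claimed above:

```python
import math

def entropy(probs):
    """-sum p log2 p, in bits, with the 0 log 0 = 0 convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (sky, rain) -- invented numbers.
p_joint = {
    ("sunny", "rain"):     0.05,
    ("sunny", "dry"):      0.75,
    ("not sunny", "rain"): 0.15,
    ("not sunny", "dry"):  0.05,
}

# Marginal distributions p(x) and p(y), summed out of the joint.
p_x, p_y = {}, {}
for (x, y), p in p_joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Mutual information: I(X : Y) = I(X) + I(Y) - I(X, Y)
mutual = (entropy(p_x.values()) + entropy(p_y.values())
          - entropy(p_joint.values()))
print(round(mutual, 4))  # about 0.29 bits -- knowing the sky predicts the rain
```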