## MaxEnt 3: The Maximum Entropy Method



Showing Revision 16 created 08/05/2016 by Andrew Medeiros.

We've boiled down our max-ent prescription into two steps. The first is that we want the probability distribution to satisfy this constraint on the average value of the waiting time. And the second is that we want that particular probability distribution to have the maximum entropy - to maximize the function p log p - and I always forget the negative sign, so in fact we're maximizing the function negative sum, over all states of the system, of p log p. And remember, the states of the system here are how long you've waited for a particular cab - or rather, how long you've waited from a particular time you decided to start waiting.
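The two-step prescription can be sketched numerically. This is an illustrative check, not code from the lecture: the range of waiting times, and the uniform distribution used as an example, are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum p log p (with 0 log 0 taken as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def mean_waiting_time(p, times):
    """The constrained quantity: average waiting time <t> = sum t * p(t)."""
    return np.dot(times, p)

# Hypothetical waiting times 0..9 minutes, uniform distribution (assumption)
times = np.arange(10)
p = np.full(10, 0.1)
print(entropy(p))                   # log(10), the largest entropy on 10 states
print(mean_waiting_time(p, times))  # 4.5 minutes
```

The prescription asks for the p that maximizes the first quantity while holding the second fixed at 4.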
So this turns out to be a hard problem, or at least a non-trivial problem. If you have a little calculus, you're really good at maximizing functions, so let's start there. Let's imagine, in particular - and I'll use this very simple example - maximizing a function over a two-dimensional space. Let's call these the x1 and x2 axes. So I have some function; I'm just going to draw the contours for you. So here we go. And what I'm going to do is make it such that there's a single maximum over this space. What we'll talk about in one of the appendices, if we have time, is why we can prove that the entropy function has a unique maximum even when you subject it to these constraints. For now, you can take it on faith that, in fact, this problem has a unique solution. So here's a function; I've given it a unique maximum, and we'll call this function f.
So, using your amazing calculus skills, you know that the maximum of this function is defined as the point where the derivative of f with respect to x is 0. And remember, x is a vector here, so what this means is that df/dx1 is 0 and df/dx2 is 0. Now, yes, you might have accidentally hit a minimum, so just check to make sure it's actually a maximum. That's what you would do. So now the problem is that
we're not allowed to range over this entire space. We're restricted to some subspace - restricted, in particular, by some constraint. So how can we find the maximum of the function - not the global maximum, but the maximum that also satisfies a set of constraints? What I'll do is draw those constraints as a line in this space. A point here is a valid argument for the function f, but it doesn't satisfy this constraint, and what I'll do is define the constraint in the following way: I'll say the constraint is that g(x) = c, where c is some number. And just to be clear, for us it's better to write g(p); we set that equal to 4 minutes. That's just to remind you that our particular constraint is that the function g is equal to 4.
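To make the setup concrete, here is a minimal sketch. The specific concave f (a paraboloid with a single maximum) and the linear constraint g(x) = x1 + x2 = 4 are illustrative assumptions standing in for the lecture's drawing, not functions from the lecture itself.

```python
import numpy as np

# A toy concave function with a unique maximum at (1, 2),
# standing in for the f sketched on the board (an assumption).
def f(x):
    return -(x[0] - 1.0)**2 - (x[1] - 2.0)**2

def grad_f(x):
    """Gradient of f; it vanishes exactly at the unconstrained maximum."""
    return np.array([2.0 * (1.0 - x[0]), 2.0 * (2.0 - x[1])])

# Constraint g(x) = c with g(x) = x1 + x2 and c = 4 (also an assumption).
def g(x):
    return x[0] + x[1]

x_star = np.array([1.0, 2.0])  # where grad f = 0: the top of the mountain
print(grad_f(x_star))          # the zero vector - unconstrained maximum
print(g(x_star))               # 3.0, so the top of the mountain violates g(x) = 4
```

The point of the example: the global maximum exists and is easy to find, but it doesn't lie on the constraint line, which is exactly the problem the rest of the lecture solves.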
This is the general case. So what we want to do now is not find the maximum point - the top of the mountain - but the point that is the maximum along this line, the line defined by g(x) = c. Let me give you a little intuition about how you might do this. Imagine you're on a train going through this mountainous landscape. As you go through, down here, you'll see that you're crossing contours of the function f. In this case you're going uphill - the function is increasing - so you know a point there is not a maximum of the function along this line, because if you wait a little bit longer you'll get to this point here, having already crossed the contour line. So here you're going up. Notice that over here you're going down the other side of the mountain - you're crossing contour lines the other way - so you know, in fact, that the maximum can't be over here, because you were already higher back here. So somewhere between here and here is the maximum - somewhere in the middle you reach the peak. You reach the peak when the contours of the function f are parallel to the track you are taking - where there is a tangent point between the contour and the direction of your motion on this fictitious train. So, we know how to get the directions of the
contours of the function f - those are, indeed, just the gradient of the function (this is a vector, remember). And we're going to say these are equal to the perpendiculars to the train tracks. If the perpendicular to the contour is parallel to the perpendicular to the train track, that means the direction of the contour is parallel to the direction of the train track: if the two perpendiculars are parallel, so are the two original vectors. So the next question is how I can get the perpendicular to the train track. What I want you to do is imagine this is the train track for g(x) = c, and here is the train track for g(x) = c', and so forth. So here is another set of contours, defined by the function g, and we want the perpendiculars to these contours to be parallel to the perpendiculars to the contours of f. What that means is that this gradient here - these arrows here, and in particular these arrows right here - is equal to some real number lambda times the gradient of the constraint. When this equation is satisfied, these contours here are precisely parallel to these contours here.
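As a worked instance of the condition "gradient of f equals lambda times the gradient of g", here is a sketch with a toy concave f(x) = -(x1-1)^2 - (x2-2)^2 and linear constraint g(x) = x1 + x2 = 4 - both illustrative assumptions, not from the lecture. For these choices the Lagrange conditions, together with the constraint itself, form a linear system we can solve directly.

```python
import numpy as np

# Lagrange conditions for f(x) = -(x1-1)^2 - (x2-2)^2, g(x) = x1 + x2, c = 4:
#   df/dx1 = lam * dg/dx1  ->  -2(x1 - 1) = lam  ->  -2*x1 - lam = -2
#   df/dx2 = lam * dg/dx2  ->  -2(x2 - 2) = lam  ->  -2*x2 - lam = -4
#   g(x)   = c             ->   x1 + x2 = 4
A = np.array([[-2.0,  0.0, -1.0],
              [ 0.0, -2.0, -1.0],
              [ 1.0,  1.0,  0.0]])
b = np.array([-2.0, -4.0, 4.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # x1 = 1.5, x2 = 2.5, lambda = -1
```

The solution (1.5, 2.5) sits on the constraint line, and at that point the gradient of f is exactly -1 times the gradient of g: the two sets of contours are tangent there.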
So in order to maximize a function f subject to a set of constraints, don't solve this problem here - solve this problem. And now you notice you have this mysterious value lambda. This is called the Lagrange multiplier. What we're going to do is try to find a solution where the gradients are parallel to each other - in other words, where one can be transformed into the other by scaling by some constant factor along all axes. So this is the intuitive motivation for how to solve the problem of maximization subject to constraints when you have only a single constraint: find the point at which the two gradients align. There's a wrinkle.
The wrinkle is the following. It looks like we have only one constraint here - that this function is equal to 4 - but we actually have two constraints. Our second constraint is overall normalization: we want this function p to normalize to 1. If you sum the probabilities of all the arrival times, they have to equal 1. Now, of course, p is a probability, so we know that has to be true. We didn't talk about this explicitly, but when we start wandering over function space - when these xs become ps and we start manipulating the probabilities - what we want to do is relax the constraint that they sum to 1 while we consider maximizing this function f. We want to be able to range over the entire space, including, for example, points where all the probabilities - all the ps - are 0. And then later we'll impose the normalization constraint. So we actually have two constraints, not just one.
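With both constraints in hand - the mean of 4 minutes and normalization to 1 - the maximum-entropy distribution over discrete, non-negative waiting times turns out to be geometric. The lecture hasn't derived that form yet, so treat this as a sketch checking the answer rather than the derivation: for a mean of 4, the geometric parameter q satisfies q/(1-q) = 4, i.e. q = 4/5.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p (0 log 0 taken as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# Waiting times t = 0, 1, 2, ... (truncated for numerics; the tail is tiny).
t = np.arange(200)

# Candidate max-ent solution: geometric p(t) = (1-q) q^t with mean q/(1-q).
q = 0.8  # chosen so the mean is 4, matching the lecture's constraint
p_geom = (1 - q) * q**t
print(p_geom.sum())       # ~1: the normalization constraint holds
print(np.dot(t, p_geom))  # ~4: the mean-waiting-time constraint holds

# Any other distribution with the same mean should have lower entropy,
# e.g. a two-point distribution on {0, 8} minutes with mean 4:
p_alt = np.zeros_like(p_geom)
p_alt[0] = 0.5
p_alt[8] = 0.5
print(entropy(p_geom) > entropy(p_alt))  # True
```

Both constraints are satisfied, and the geometric distribution beats the alternative on entropy - the comparison is only a spot check, of course, not a proof of maximality.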