
MaxEnt 3: The Maximum Entropy Method


Showing Revision 16 created 08/05/2016 by Andrew Medeiros.

We've boiled down our max ent prescription into two steps. The first thing is you want the probability distribution to satisfy this constraint on the average value of the waiting time. And the second thing is we want that particular probability distribution to have the maximum entropy, to maximize the function p log p, and I always forget the negative sign. So in fact, we're maximizing the function negative sum over all states of the system of p log p. And remember, states of the system here are how long you've waited for a particular cab, or rather, how long you've waited from a particular time you decided to start waiting.
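The two ingredients above can be sketched numerically. This is a minimal illustration, not from the lecture: the distribution over ten waiting times and the function names are toy choices of my own.

```python
import math

def entropy(p):
    """Shannon entropy -sum p log p (terms with p = 0 contribute 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mean_wait(p, times):
    """Average waiting time under distribution p."""
    return sum(pi * t for pi, t in zip(p, times))

# A toy candidate distribution over waiting times 0..9 minutes.
times = list(range(10))
p = [0.1] * 10                # uniform over the ten waiting times

print(entropy(p))             # log 10 ~ 2.3026, the maximum with no constraint
print(mean_wait(p, times))    # 4.5 minutes, so this p violates a 4-minute constraint
```

The uniform distribution maximizes the entropy when there is no constraint at all; the whole point of the prescription is that the mean-waiting-time constraint rules it out.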
So, this turns out to be a hard problem, or at least a non-trivial problem. If you have a little calculus, you're really good at maximizing functions, so let's start there. Let's imagine in particular... and I'll use this very simple example of maximizing a function over a two-dimensional space. Let's call these the x1 and x2 axes. So I have some function; I'm just going to draw the contours for you. So here we go. And what I'm going to do is make it such that there's a single maximum over this space. And what we'll talk about in one of the appendices, if we have time, is why we can prove that the entropy function has a unique maximum even when you subject it to these constraints. For now, you can take it on faith that, in fact, this problem has a unique solution.

So, here's a function. I've given it a unique maximum, and so we'll call this function f. So, using your amazing calculus skills, you know that the maximum of this function is defined as the point where the derivative of f with respect to x is 0. And remember, this is a vector here, so what this means is df/dx1 is 0 and df/dx2 is 0. Now, yes, you might have accidentally hit a minimum, so just check to make sure it's actually a maximum and not a minimum. That's what you would do. So now the problem is that we're not allowed to range over this entire space.
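The unconstrained condition can be checked numerically. Here is a hypothetical single-peak function of my own choosing (the lecture's function is only drawn on the board), with a central-difference check that its gradient vanishes at the top:

```python
def f(x1, x2):
    # A toy function with a single maximum, placed at (1, 2).
    return -((x1 - 1)**2 + (x2 - 2)**2)

def grad(f, x1, x2, h=1e-6):
    """Central-difference approximation of (df/dx1, df/dx2)."""
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d1, d2

print(grad(f, 1.0, 2.0))   # ~ (0, 0): both partials vanish at the maximum
print(grad(f, 0.0, 0.0))   # ~ (2, 4): nonzero away from the peak
```

Since f is concave here, the zero-gradient point really is the maximum; for a general function you would still need the check against minima and saddle points.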
We're restricted to some subspace. We're restricted in particular by some constraint here. So how can we find the maximum of the function, not the global maximum, but the maximum that also satisfies a set of constraints? And what I'll do is draw those constraints as a line in this space. A point here is a valid argument for the function f, but it doesn't satisfy this constraint here, and what I'll do is define this constraint in the following way. I'll say the constraint is that g(x) = c, where c is some number. And just to be clear, for us, it's better to write g(p); we set that equal to 4 minutes. That's just to remind you that our particular constraint is that the function g is equal to 4. This is the general case here. So what we want to do now is not find the maximum point, the top of the mountain. We want to find the point that is the maximum along this line, the line defined by g(x) = c. So let me give you a little intuition about how you might do this.
Imagine you're on a train going through this mountainous landscape, and as you're going through, down here, you'll see you're crossing contours of the function f. In this case, you're going uphill, the function is increasing, so you know a point there is not a maximum of the function along this line, because if you wait a little bit longer, you'll get to this point here, and you've already crossed the contour line. So here, you're going up. Notice here, you're going down the other side of the mountain; you're crossing contour lines the other way, so you know, in fact, that the maximum can't be over here, because you were already higher over here. So, somewhere between here and here is the maximum; somewhere in the middle you reach the peak. You reach the peak when the contours of the function f are parallel to the track that you are taking, where there is a tangent point between the contour and the direction of your motion on this fictitious train.

So, we know how to get the directions perpendicular to the contours of the function f: those are, indeed, just the gradient of the function... this is a vector, remember. And we are going to say these are equal to the perpendiculars to the train tracks. So if the perpendicular to the contour is parallel to the perpendicular to the train track, that means that the direction of the contour is parallel to the direction of the train track. If the two perpendiculars are parallel, so are the two original vectors. So then the next question is: how can I get the perpendicular to the train track?
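The train-ride picture can be played out numerically. The sketch below, a toy setup of my own rather than the lecture's board example, rides along a straight constraint line through the same kind of single-peak landscape and finds where f stops rising and starts falling:

```python
def f(x1, x2):
    # The same kind of single-peak "mountain", top at (1, 2).
    return -((x1 - 1)**2 + (x2 - 2)**2)

# Ride along the constraint line g(x1, x2) = x1 + x2 = 1,
# parameterised as (t, 1 - t), and watch f rise and then fall.
best_t = max((t / 100 for t in range(-200, 201)),
             key=lambda t: f(t, 1 - t))
print(best_t, f(best_t, 1 - best_t))   # peak along the track at t = 0, i.e. (0, 1)
```

Note that the peak along the track, (0, 1), is not the top of the mountain, (1, 2); it's the highest point the train ever reaches.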
What I want you to do is imagine this is the train track for g(x) = c, and here is the train track for g(x) = c', and so forth. So here is another set of contours, defined by the function g, and we want the perpendiculars to these contours to be parallel to the perpendiculars to the contours of f. So what that means is that this gradient here, these arrows here, and in particular these arrows right here, are equal to some real number lambda times the gradient of the constraint. When this equation is satisfied, that means that these contours here are precisely parallel to these contours here. So in order to maximize a function f subject to a set of constraints, don't solve this here, the condition that the gradient of f is zero. Solve this problem instead: the gradient of f equals lambda times the gradient of g. And now you notice you have this mysterious value lambda. This is called the Lagrange multiplier. So what we're going to do is try to find a solution where the gradients are parallel to each other. In other words, one can be transformed into the other by scaling by some constant factor along all axes. So this is the intuitive motivation for how to solve the problem of maximization subject to constraints when you have only a single constraint. What we do is find the point at which these two gradients align. There's a wrinkle.
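To see the Lagrange condition concretely, here is a small check, using toy functions of my own choosing, that at a constrained maximum the gradient of f is a scalar multiple of the gradient of g:

```python
def grad_f(x1, x2):
    # Gradient of f(x1, x2) = -((x1 - 1)^2 + (x2 - 2)^2).
    return (-2 * (x1 - 1), -2 * (x2 - 2))

def grad_g(x1, x2):
    # Gradient of the constraint g(x1, x2) = x1 + x2.
    return (1.0, 1.0)

# Constrained maximum of f on the line x1 + x2 = 1 (found analytically: (0, 1)).
x = (0.0, 1.0)
gf, gg = grad_f(*x), grad_g(*x)
lam = gf[0] / gg[0]        # the Lagrange multiplier lambda
print(gf, gg, lam)         # gf = (2.0, 2.0) = 2 * gg: the gradients are parallel
```

At any other point on the line the two gradients point in genuinely different directions, and that is exactly the train-ride signal that you are still crossing contours.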
  124. The wrinkle is the following. It looks like we have only
  125. one constraint here and our contraint is just
  126. that this function is equal to 4. But we actually have
  127. 2 constraints. Our second constraint is overall
  128. normalization and it says that we want this
  129. function p here to normalize to 1. If you
  130. sum the probabilities of all the arrival times, they
  131. have to equal 1. Now, of course, p is a probability
  132. so we know that has to be true. We didn't talk about
  133. this explicitly, but when we start wandering over
  134. function space - when these xs here become ps,
  135. we start manipulating the probabilities - what we
  136. want to do is be able to relax the constraint
  137. that they sum to 1 when we consider maximizing
  138. this function f. We want to be able to range over
  139. the entire space including, for example, points
  140. where all the probabilities - all the ps - are 0.
  141. And then later what we'll do is we'll impose the
  142. normalization constraint. So we actually have
  143. 2 constraints, and not just 1.