MaxEnt 4: MaxEnt Applied to the Taxicab Example, Part 1


  1. Just to remind you where we are,
    we have two constraints,
  2. one is that the expectation value of x,
    the average waiting time, is four minutes,
  3. the other is that the probabilities
    sum to 1.
  4. Those are our two constraints,
    and then what we're going to do is
  5. maximize the entropy of a distribution,
    subject to those constraints.
  6. Okay? So S is going to be the entropy
    function,
  7. and we're not going to have one g,
    but in fact two g's, okay?
  8. One g is this function,
    one g is this function. Okay?
  9. So, how do you do maximization,
    under constraints,
  10. for more than one constraint?
    I gave you an intuitive picture
  11. about how you could do one
    constraint at a time,
  12. but how could you do two constraints?
  13. I'm going to tell you the answer,
  14. because it's much harder to work through
    the problem of multiple constraints,
  15. but it's an intuitive answer, one that is
    worth remembering,
  16. and if you ever want to work through it,
  17. there's plenty of places to find the
    answer.
  18. So, here's how to do Lagrange multipliers,
  19. the method of Lagrange multipliers.
    You remember the Lagrange multiplier
  20. is that lambda term, okay? And so,
    that's where the method gets its name from,
  21. you want to maximize the function f,
  22. subject to a set of constraints, now,
  23. and we'll number these constraints
    g sub i, so g1, g2, all the way through
  24. g sub n, however many constraints
    you have,
  25. and the way you do that is you set the
    gradient of the function equal to
  26. a linear combination of the gradients of
    the constraints.
  27. So, here's the case where you have
    n constraints.
  28. So, this is the general method of
    Lagrange multipliers,
  29. in order to maximize this function subject
    to these n constraints,
  30. set the gradient of the function f equal
    to some linear combination
  31. of the gradients of the functions g, and
    then the problem
  32. is just how do you find the lambdas. What
    you know is, you know your maximum point
  33. is such that you can add together all of
    these gradients in such a way,
  34. with some weight so that you can reproduce
    the original gradient of the contours.
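(In symbols, the condition just described is the standard stationarity condition of the method of Lagrange multipliers: at the constrained maximum, the gradient of f is a weighted sum of the gradients of the constraint functions, with one multiplier per constraint.)

```latex
\nabla f(\mathbf{p}) \;=\; \sum_{k=1}^{n} \lambda_k\, \nabla g_k(\mathbf{p}),
\qquad \text{subject to } g_k(\mathbf{p}) = c_k,\; k = 1, \dots, n .
```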
  35. Okay? And so, now the problem is, what
    are these Ls, or what are these lambdas,
  36. so what I'm going to do is, I'm going to
    walk you through, now the problem of
  37. maximum entropy, using this formula here,
    and if these seem mysterious right now,
  38. by the end, they hopefully should not be.
  39. What you're going to do is turn knobs,
    and twiddle knobs around,
  40. until you get the lambda such that
  41. those lambdas satisfy the particular
    constraint values you have in mind.
  42. So, we're going to maximize not an arbitrary
    function f, but in fact the entropy,
  43. and our constraints are going to be a
    constraint on the average, and
  44. a constraint on the normalization. So,
    we want the derivative of S
  45. with respect to p_i, we do this term by
    term in the vector,
  46. we want that to be equal to lambda1,
    times the derivative of g1 with respect to p_i,
  47. plus lambda2 times the derivative of g2,
    with respect to p_i. Okay?
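(Written out, the component-wise condition being set up here is:)

```latex
\frac{\partial S}{\partial p_i}
= \lambda_1 \frac{\partial g_1}{\partial p_i}
+ \lambda_2 \frac{\partial g_2}{\partial p_i}
\qquad \text{for each } i .
```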
  48. So this reminds you, S is the entropy
    of the distribution,
  49. S is equal to minus the sum over all possible
    waiting times of p_i log p_i.
  50. So again, just for convenience's sake,
    I'm talking about the discrete case,
  51. you can take limits, if you have your
    measures set up correctly,
  52. and you can turn this sum into an integral,
    and this one into an integral,
  53. and this one as well into an integral.
  54. But it's easier just conceptually, to talk
    about the discrete case first.
  55. So, g1, remember, this is a function of p,
    alright, p is a vector here, okay?
  56. g1 is just the sum, i from 0 to infinity, okay,
    of p_i times i.
  57. and I'm using just, I'm using i now
    instead of x,
  58. it's easier to write for me. Okay?
  59. So, this is the constraint function, okay,
    that constrains the average value.
  60. And of course what we want in the end is
    we want g1(p) to be equal to 4 minutes.
  61. g2(p) is just the normalization constraint,
    so the function just looks like summing
  62. over all the p_i, and of course in the
    end what we'll do is we'll set g2(p) = 1.
  63. And we previously defined the entropy
    here.
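(Collecting the three functions for the discrete case, with natural logs as noted below:)

```latex
S(\mathbf{p}) = -\sum_{i=0}^{\infty} p_i \ln p_i,
\qquad
g_1(\mathbf{p}) = \sum_{i=0}^{\infty} i\, p_i = 4,
\qquad
g_2(\mathbf{p}) = \sum_{i=0}^{\infty} p_i = 1 .
```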
  64. So, what is the derivative of the entropy,
    with respect to a particular probability,
  65. right, a particular probability of a
    particular configuration, okay? So,
  66. Alright, move this out here: S is equal to
    minus the sum, i from 0 to infinity, of p_i log p_i,
  67. okay? So, dS/d(p_i), equals, well the only
    term that's going to survive is
  68. where you have the p_i in it, and then
    we have the derivative of p_i log p_i,
  69. that has two terms: log p_i, and then the
    other one is p_i times the derivative of
    log p_i. The derivative of log p_i is 1/p_i,
    so in fact, you have plus 1.
  71. So, this is the left hand side of your
    Lagrange multiplier equation.
  72. And just to remind you, we set the base
    of the log to e.
  73. So, now we have to take the derivative of
    g1 with respect to p_i,
  74. okay? So again, take the derivative of
    this sum here with respect to p_i,
  75. and what of course you'll find, is that
    dg1/d(p_i) = i,
  76. and finally, dg2/d(p_i) = 1.
  77. There's only 1 term in the sum that
    doesn't get destroyed by the derivative.
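(So the three derivatives that go into the Lagrange condition are:)

```latex
\frac{\partial S}{\partial p_i} = -\ln p_i - 1,
\qquad
\frac{\partial g_1}{\partial p_i} = i,
\qquad
\frac{\partial g_2}{\partial p_i} = 1 .
```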
  78. So, let's now put all this together,
  79. we have negative log p_i -1 is equal to
    lambda1, times the derivative of g_1,
  80. with respect to p_i, which is i, plus the
    derivative of g2 with respect to p_i,
  81. times lambda 2, so this is now our
    equation, okay, that is satisfied
  82. when you try to maximize the entropy,
    try to maximize this function,
  83. subject to these constraints, for some
    value of the constraint.
  84. So let's solve for p_i. So, let's move
    things around here,
  85. and we have negative 1 minus lambda1 i,
    minus lambda 2, equals log p_i,
  86. we'll exponentiate both sides, flip them
    around, and we have
  87. p_i is equal to e to the negative 1 minus
    lambda1 i, minus lambda 2. Okay?
  88. And, we can write that somewhat more
    succinctly in the following way,
  89. e to the negative lambda1 i divided by Z,
  90. where Z is equal to e to the 1
    plus lambda 2.
  91. The probability of waiting for a certain
    time i, is equal to e to the negative
  92. lambda1 times i, over Z: there's an exponential
    distribution of waiting times.
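(Putting the pieces together and solving for p_i:)

```latex
-\ln p_i - 1 = \lambda_1 i + \lambda_2
\;\;\Longrightarrow\;\;
p_i = e^{-1 - \lambda_2 - \lambda_1 i}
    = \frac{e^{-\lambda_1 i}}{Z},
\qquad Z \equiv e^{\,1 + \lambda_2} .
```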
  93. Now, all that remains is to figure out
    what on earth lambda1 is,
  94. and what on earth Z is.
  95. And what we're going to do is we're going
    to turn, we're going to figure out
  96. the value you have to set lambda1 to,
  97. in order to satisfy this particular value
    of the constraint,
  98. and this particular value of the
    constraint. So,
  99. we know the functional form of the
    distribution,
  100. and now we just have to figure out the
    parameters of that function.
  101. And there will be two parameters.
  102. So the first thing we know is of course,
  103. that the probability is normalized, and
    that means in fact that the sum over all
  104. i of p_i is equal to 1, okay?
  105. Plugging in this functional form, and so
    now, already, we can solve for Z
  106. in terms of lambda1. So, eliminating the
    first variable here, Z, is easy.
  107. We can just set Z equal to the sum from i
    equals 0 to infinity of e to the negative lambda1 i,
  108. okay? So we already eliminated one
    variable,
  109. and now, all we have to do is to solve
    for the other constraint. Okay?
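(Imposing the normalization constraint fixes Z in terms of lambda1; the closed form on the right is just the geometric series, which isn't evaluated in this clip but is handy to have:)

```latex
\sum_{i=0}^{\infty} \frac{e^{-\lambda_1 i}}{Z} = 1
\;\;\Longrightarrow\;\;
Z = \sum_{i=0}^{\infty} e^{-\lambda_1 i}
  = \frac{1}{1 - e^{-\lambda_1}}
\qquad (\lambda_1 > 0) .
```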
  110. In particular, just let me write this here.
  111. We have the sum from i equals 0 to
    infinity of i times e to the negative lambda1 i,
  112. all over Z, and that has to be equal to 4.
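(That remaining condition can be checked numerically. The sketch below is a minimal illustration in plain Python, not from the lecture: it truncates the infinite sums at a large cutoff and bisects on lambda1 until the mean waiting time hits 4. The result agrees with the analytic value lambda1 = ln(5/4) that the geometric-series algebra gives.)

```python
import math

def mean_wait(lam, terms=10_000):
    """Mean of i under p_i proportional to exp(-lam * i), for i = 0..terms-1 (truncated sum)."""
    weights = [math.exp(-lam * i) for i in range(terms)]
    Z = sum(weights)                                    # normalization constant
    return sum(i * w for i, w in enumerate(weights)) / Z

# The mean waiting time decreases as lam grows, so bracket the target value and bisect.
lo, hi = 1e-6, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_wait(mid) > 4.0:
        lo = mid          # mean still too large: need a bigger lam
    else:
        hi = mid          # mean too small: need a smaller lam

lam1 = 0.5 * (lo + hi)
print(lam1)               # ~0.2231
print(math.log(5 / 4))    # analytic value ln(5/4), for comparison
```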