## MaxEnt 4: MaxEnt Applied to the Taxicab Example, Part 1

Showing Revision 1 created 12/02/2016 by Zhengqun Koo.

1. Just to remind you where we are,
we have two constraints,
2. one is, the expectation value of x,
the average waiting time is four minutes,
3. the other is that the probabilities
sum to 1.
4. Those are our two constraints,
and then what we're going to do is
5. maximize the entropy of a distribution,
subject to those constraints.
6. Okay? So S is going to be the entropy
function,
7. and we're not going to have one g,
but in fact two g's, okay?
8. One g is this function,
one g is this function. Okay?
9. So, how do you do maximization,
under constraints,
10. for more than one constraint?
I gave you an intuitive picture
11. about how you could do one
constraint at a time,
12. but how could you do two constraints?
13. I'm going to tell you the answer,
14. because it's much harder to work through
the problem of multiple constraints,
15. but it's an intuitive answer, one that is
worth remembering,
16. and if you ever want to work through it,
17. there are plenty of places to find the details.
18. So, here's how to do Lagrange multipliers,
19. the method of Lagrange multipliers,
you remember the Lagrange multiplier
20. is that lambda term, okay? And so,
that's where the method gets its name from,
21. you want to maximize the function f,
22. subject to a set of constraints, now,
23. and we'll number these constraints
g sub i, so g1, g2, all the way through
however many you have,
25. and the way you do that is you set the
gradient of the function equal to
26. a linear combination of the gradients of
the constraints.
27. So, here's the case where you have
n constraints.
28. So, this is the general method of
Lagrange multipliers,
29. in order to maximize this function subject
to these n constraints,
30. set the gradient of the function f equal
to some linear combination
31. of the gradients of the functions g, and
then the problem
32. is just how do you find the lambdas. What you know
is, you know your maximum point
33. is such that you can add together all of
these gradients in such a way,
34. with some weight so that you can reproduce
the original gradient of the contours.
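The condition just described can be written as a single equation; this is the standard statement of the method, using the lecture's f, g's, and lambdas:

```latex
\nabla f \;=\; \sum_{k=1}^{n} \lambda_k \,\nabla g_k
```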
35. Okay? And so, now the problem is, what
are these Ls, or what are these lambdas,
36. so what I'm going to do is, I'm going to
walk you through, now the problem of
37. maximum entropy, using this formula here,
and if these seem mysterious right now,
38. by the end, they hopefully should not be.
39. What you're going to do is turn knobs,
and twiddle knobs around,
40. until you get the lambda such that
41. those lambdas satisfy the particular
constraint values you have in mind.
42. So, we're going to maximize not arbitrary
function f, but in fact the entropy,
43. and our constraints are going to be a
constraint on the average, and
44. a constraint on the normalization. So,
we want the derivative of S
45. with respect to p_i, we do this term by
term in the vector,
46. we want that to be equal to lambda1,
times the derivative of g1 with respect to p_i,
47. plus lambda2 times the derivative of g2,
with respect to p_i. Okay?
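Written out symbolically (my rendering of what is on the board, matching the lecture's symbols), the component-by-component condition is:

```latex
\frac{\partial S}{\partial p_i}
  \;=\; \lambda_1 \frac{\partial g_1}{\partial p_i}
  \;+\; \lambda_2 \frac{\partial g_2}{\partial p_i}
```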
48. So this reminds you, S is the entropy
of the distribution,
49. S is equal to minus the sum, over all possible
waiting times, of p_i log p_i.
50. So again, just for convenience's sake,
I'm talking about the discrete case,
51. you can take limits, if you have your
measures set up correctly,
52. and you can turn this sum into an integral,
and this one into an integral,
53. and this one as well.
54. But it's easier, just conceptually, to talk
about the discrete case.
55. So, g1, remember, this is a function of p,
alright, p is a vector here, okay?
56. g1 is just the sum, from i = 0 to infinity,
of p_i times i,
57. and I'm just using i now because
58. it's easier for me to write. Okay?
59. So, this is the constraint function, okay,
that constrains the average value.
60. And of course what we want in the end is
we want g1(p) to be equal to 4 minutes.
61. g2(p) is just the normalization constraint,
so the function just looks like summing
62. over all the p_i, and of course in the
end what we'll do is we'll take g2(p) = 1.
63. And we previously defined the entropy
here.
64. So, what if the derivative of the entropy,
with respect to a particular probability,
65. right, a particular probability of a
particular configuration, okay? So,
66. Alright, move this out here: S is equal to
minus the sum, from i = 0 to infinity, of p_i log p_i,
67. okay? So, dS/d(p_i), equals, well the only
term that's going to survive is
68. where you have the p_i in it, and then
we have the derivative of p_i log p_i,
69. that has two terms: log p_i, and then the
other one is p_i times the derivative of
log p_i. The derivative of log p_i is 1/p_i,
so in fact, you have plus 1.
71. So, this is the left hand side of your
lagrange multiplier equation.
72. And just to remind you, we set the base
of the log to e.
73. So, now we have to take the derivative of
g1 with respect to p_i,
74. okay? So again, take the derivative of
this sum here with respect to p_i,
75. and what of course you'll find, is that
dg1/d(p_i) = i,
76. and finally, dg2/d(p_i) = 1.
77. There's only one term in the sum that
doesn't get destroyed by the derivative.
78. So, let's now put all this together,
79. we have negative log p_i minus 1 is equal to
lambda1, times the derivative of g1,
80. with respect to p_i, which is i, plus the
derivative of g2 with respect to p_i,
81. times lambda2, so this is now our
equation, okay, that is satisfied
equation, okay, that is satisfied
82. when you try to maximize the entropy,
try to maximize this function,
83. subject to these constraints, for some
value of the constraint.
84. So let's solve for p_i. So, let's move
things around here,
85. and we have negative 1, minus lambda1 i,
minus lambda2, equals log p_i,
86. we'll exponentiate both sides, flip them
around, and we have
87. p_i is equal to e to the negative 1, minus
lambda1 i, minus lambda2. Okay?
88. And, we can write that somewhat more
succinctly in the following way:
89. p_i equals e to the negative lambda1 i,
divided by Z,
90. where Z is equal to e to the 1
plus lambda2.
91. The probability of waiting for a certain
time i is proportional to e to the negative
92. lambda1 times i: it's an exponential
distribution of waiting times.
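Putting the last few steps together (my rendering of the board work):

```latex
\log p_i = -1 - \lambda_1 i - \lambda_2
\quad\Longrightarrow\quad
p_i = \frac{e^{-\lambda_1 i}}{Z},
\qquad Z = e^{1+\lambda_2}.
```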
93. Now, all that remains is to figure out
what on earth lambda1 is,
94. and what on earth Z is.
95. And what we're going to do is figure out
96. the value you have to set lambda1 to,
97. in order to satisfy this particular value
of the constraint,
98. and this particular value of the
constraint. So,
99. we know the functional form of the
distribution,
100. and now we just have to figure out the
parameters of that function.
101. And there will be two parameters.
102. So the first thing we know is, of course,
103. that the probability is normalized, and
that means in fact that
104. the sum of all the p_i is equal to 1, okay?
105. plugging in this functional form, and so
now, already, we can solve for Z
106. in terms of lambda1. So, eliminating the
first variable here, Z, is easy.
We can just set Z equal to the sum, from
i = 0 to infinity, of e to the negative lambda1 i,
108. okay? So we already eliminated one
variable,
109. and now, all we have to do is to solve
for the other constraint. Okay?
110. In particular, just let me write this here
111. In particular, we have the sum from i 0 to
infinity times e to the negative lambda1 i
112. i, all over Z, that has to be equal to 4.