
Title:

Description:

Just to remind youwhere we are,
we have two constraints,

one is, the expectation value of x,
the average waiting time is four minutes,

the other is that the probabilities
sum to 1.

Those are our two constraints,
and then what we're going to do is

maximize the entropy of a distribution,
subject to those constraints.

Okay? So S is going to be the entropy
function,

and we're not going to have 1 g,
but in fact 2 gs, okay?

One g is this function,
one g is this function. Okay?

So, how do you do maximization,
under constraints,

for more than one constraint?
I gave you an intuitive picture

about how you could do one
constraint at a time,

but how could you do two constraints?

I'm going to tell you the answer,

because it's much harder to work through
the problem of multiple constraints,

but it's an intuitive answer, one that is
worth remembering,

and if you ever want to work through it,

there's plenty of places to find the
answer.

So, here's how to do lagrange multipliers,

the method of lagrange multipliers,
you remember the lagrange multiplier

is that lambda term, okay? And so,
that's where the method get its name from,

you want to maximize the function f,

subject to a set of constraints, now,

and we'll number these constraints
g sub i, so g1, g2, all the way through

your n constraints however
many you have,

and the way you do that is you set the
gradient of the function equal to

a linear combination of the gradients of
the constraints.

So, here's the case where you have
n constraints.

So, this is the general method of
lagrange multipliers,

in order to maximize this function subject
to these n constraints,

set the gradient of the function f equal
to some linear combination

of gradient of the function g, and
then the problem

is just how do you find g. What you know
is, you know your maximum point

is such that you can add together all of
these gradients in such a way,

with some weight so that you can reproduce
the original gradient of the contours.

Okay? And so, now the problem is, what
are these Ls, or what are these lambdas,

so what I'm going to do is, I'm going to
walk you through, now the problem of

maximum entropy, using this formula here,
and if these seem mysterious right now,

by the end, they hopefully should not be.

What you're going to do is turn knobs,
and twiddle knobs around,

until you get the lambda such that

those lambdas satisfy the particular
constraint values you have in mind.

So, we're going to maximize not arbitrary
function f, but in fact the entropy,

and our constraints are going to be a
constraint on the average, and

a constraint on the normalization. So,
we want the derivative of S

with respect to p_i, we do this term by
term in the vector,

we want that to be equal to lambda1,
times the derivative of g1 wrt p_i,

plus lambda2 times the derivative of g2,
with respect to p_i. Okay?

So this reminds you, S is the entropy
of the distribution,

S is equal to minus the sum over all possible
waiting times.

So again, just for convenience's sake,
I'm talking about the discrete case,

you can take limits, if you have your
measures set up correctly,

and you can turn this into integrals,
and turn this into integrals,

and this as well into integral.

But it's easier just conceptually, to talk
about the discrete case first.

So, g1, remember, this is a function of p,
alright, p is a vector here, okay?

g1 is just the sum i 0 to infinity, okay,
of p_i times i.

and I'm using just, I'm using i now
instead of x,

it's easier to write for me. Okay?

So, this is the constraint function, okay,
that constraints the average value.

And of course what we want in the end is
we want g1(p) to be equal to 4 minutes.

g2(p) is just the normalization constraint
so the function just looks like summing

over all values of p, and of course in the
end what we'll do is we'll take g2 = 1.

And we previously defined the entropy
here.

So, what if the derivative of the entropy,
with respect to a particular probability,

right, a particular probability of a
particular configuration, okay? So,

Alright move this out here, S is equal to
negative p_i log p_i i from 0 to infinity,

okay? So, dS/d(p_i), equals, well the only
term that's going to survive is

where you have the p_i in it, and then
we have the derivative of p_i log p_i,

that has two terms: log p_i, and then the
other one is p_i times the derivative of

log p_i. derivative of log p_i is 1/p_i,
so in fact, you have plus 1.

So, this is the left hand side of your
lagrange multiplier equation.

And just to remind you, we set the base
of the log to e.

So, now we have to take the derivative of
g1 with respect to p_i,

okay? So again, take the derivative of
this sum here with respect to p_i,

and what of course you'll find, is that
dg1/d(p_i) = i,

and finally, dg2/d(p_i) = 1.

There's only 1 term in the sum that
doesn't get destroyed by the derivative.

So, let's now put all this together,

we have negative log p_i 1 is equal to
lambda1, times the derivative of g_1,

with respect to p_i, which is i, plus the
derivative of g2 with respect to p_i,

times lambda 2, so this is now our
equation, okay, that is satisfied

when you try to maximize the entropy,
try to maximize this function,

subject to these constraints, for some
value of the constraint.

So let's solve for p_i. So, let's move
things around here,

and we have negative 1 minus lambda1 i,
minus lambda 2, equals log p_i,

we'll exponentiate both sides, flip them
around, and we have

p_i is equal to e to the negative 1 minus
lambda1 i, minus lambda 2. Okay?

And, we can write that somewhat more
succinctly in the following way,

e to the lambda1 i divided by Z,

where Z is equal to e to the 1
plus lambda 2.

The probability of waiting for a certain
time i, is equal to e to the negative

lambda1 times i, there's an exponential
distribution of waiting times.

Now, all that remains is to figure out
what on earth lambda1 is,

and what on earth Z is.

And what we're going to do is we're going
to turn, we're going to figure out

the value you have to set lambda1 to,

in order to satisfy this particular value
of the constraint,

and this particular value of the
constraint. So,

we know the functional form of the
distribution,

and now we just have to figure out the
parameters of that function.

And there will be two parameters.

So the first thing we know is of course,

that the probability is normalized, and
that means in fact that,

okay?

plugging in this functional form, and so
now, already, we can solve for Z

in terms of lambda1. So, eliminating the
first variable here, Z, is easy.

We can just set Z equal to the sum from i
0 to infinity, e to the negative lambda1 i

okay? So we already eliminated one
variable,

and now, all we have to do is to solve
for the other constraint. Okay?

In particular, just let me write this here

In particular, we have the sum from i 0 to
infinity times e to the negative lambda1 i

i, all over Z, that has to be equal to 4.