
Title:
0536 Using Tools

Description:

So now let's go back and analyze the maximize differential strategy

versus the maximize probability of winning strategy.

The question is, how do these 2 compare?

When are they different, and when are they the same?

If you're trying to impress the scouts, you're not going to be making some crazy moves,

so probably most of the time, you'd expect the 2 strategies to agree,

but some of the time, maybe 1 of them is going to be more aggressive

or taking more chances than the other.

Let's see if we can analyze that.

So I start off by defining a bunch of states, and I'm just going to look

from 1 player's point of view.

It doesn't really matter to have both since it's symmetric,

so for all these values of me, you, and pending, collect all those states.

It turns out that there are about 35,000 of them.
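
Here is a minimal sketch of how those states could be collected. The exact ranges are an assumption, since the lecture only says "all these values of me, you, and pending"; the goal of 40 is taken from the "game to 40" mentioned later, and the (p, me, you, pending) tuple layout with p fixed at 0 is an assumed convention. Under these assumptions the count comes out to the 35,301 quoted later in the lecture.

```python
# Assumed goal: the lecture later refers to "a game to 40".
GOAL = 40

# Assumed state layout: (p, me, you, pending), with p fixed at 0
# because we only look from one player's point of view.
states = [(0, me, you, pending)
          for me in range(GOAL + 1)
          for you in range(GOAL + 1)
          for pending in range(GOAL + 1)
          if me + pending <= GOAL]  # skip states where the game is already decided beyond the goal

print(len(states))  # 35301
```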

Then I define a variable r to be a default dictionary, which counts up integers,

so it starts at 0, and then I go through all the states,

and I increment the count for a result for the tuple of the action that's taken by

max_wins and the action that's taken by max_diffs.

I want to count up.

The keys are going to be (hold, hold), (roll, roll), etc.

I want to see how many of each do we have,

and let's convert r back to a standard dict, and there we have it.
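
The tallying pattern described above can be sketched as follows. The real max_wins and max_diffs strategies are not reproduced here; strategy_a and strategy_b are hypothetical stand-ins (simple "hold at a threshold" rules) used only so the example runs on its own, so the counts it prints are not the lecture's numbers.

```python
from collections import defaultdict

GOAL = 40  # assumed, from the "game to 40" mentioned in the lecture

def hold_at(threshold):
    """Build a stand-in strategy that holds once pending reaches threshold
    (or once holding would reach the goal), and rolls otherwise."""
    def strategy(state):
        (p, me, you, pending) = state
        if pending >= threshold or me + pending >= GOAL:
            return 'hold'
        return 'roll'
    return strategy

# Hypothetical stand-ins for max_wins and max_diffs.
strategy_a = hold_at(20)
strategy_b = hold_at(25)

# Assumed state layout and ranges: (p, me, you, pending) with p fixed at 0.
states = [(0, me, you, pending)
          for me in range(GOAL + 1)
          for you in range(GOAL + 1)
          for pending in range(GOAL + 1)
          if me + pending <= GOAL]

# r maps (action of strategy_a, action of strategy_b) to a count, starting at 0.
r = defaultdict(int)
for s in states:
    r[strategy_a(s), strategy_b(s)] += 1

tally = dict(r)  # convert back to a standard dict for display
print(tally)
```

Swapping the two stand-ins for the real strategies would give the four-way table discussed next.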

So most of the time, 29,700 out of the 35,000, both strategies agree that roll

is the right thing to do.

Then another 1200 times, both strategies agree that hold is the right thing to do.

But in 2 cases, they differ.

So sometimes, max_wins says hold and max_diffs says roll.

That happened 381 times, but 10 times more often, it's max_wins that says roll

and max_diffs that says hold.

That actually surprised me.

So it's the max_wins strategy that's really more aggressive.

It's rolling more often.

I thought it was going to be the max_diffs strategy.

I thought that was going to be more aggressive, right?

So that's the one that's trying to impress the scouts.

I thought it was going to be rolling trying to rack up a really big score.

But no! So the data tells a different story.

It's not trying to rack up a really big score.

So what's going on?

Well, first it might be nice just to quantify how different they are

since I kind of asked that question.

So there are 35,301 states altogether, and they differ on 3,975 + 381 of them,

and that's 12% of the states that they differ on.
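
As a quick check of that arithmetic, using only the counts quoted in the lecture (the variable names are just for illustration):

```python
total_states = 35301
wins_roll_diffs_hold = 3975   # max_wins says roll, max_diffs says hold
wins_hold_diffs_roll = 381    # max_wins says hold, max_diffs says roll

differing = wins_roll_diffs_hold + wins_hold_diffs_roll
fraction = differing / total_states

print(differing, round(100 * fraction, 1))  # 4356 12.3
```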

So what's the story?

Where do those 12% of the states come from?

We still don't know, and we don't even quite know what questions to ask,

but it's here that some of our design choices start to pay off.

So remember we always start our design with an inventory of concepts,

and we have things like the dice and the score,

and then we got into things like the utility function and the quality function,

so we built all these up, and yes, we're building from the ground up,

and yes, at the top, we have a play_pig function, and we can still call that function,

but at the bottom, we have all these useful tools.

So now we're not just playing pig; we're trying to analyze the situation

to understand the story of why these 2 strategies are different.

Well, play_pig by itself, the top-level function we defined, is not going to help us,

but all these little tools that we built down here, they will be helpful.

We can start to put them together and explore.

So we built this tower, and the tower built up to define the play_pig function,

and in some languages, it's all about building the tower.

When you're done, that's all you have.

But in Python, and in many other languages, it's common and it's good design strategy

to say let's build up components along the way so that, yes, we have the tower,

but we can also go out in other directions.

If we're interested not just in playing pig but in figuring out this story,

then we can quickly assemble pieces from down here

and build something that can address that.

So I've got all the pieces available. It makes it easy to explore.

But I still need an idea, and here's my idea.

I expected maximize differential to be aggressive, to try to rack up the big points,

and I found out that it was actually maximizing the probability of winning

that was more aggressive, that rolled more often.

Why could that be? I think I might know the answer.

I think it might be that the maximize differential strategy is more willing to accept a loss,

rather than being more excited about winning by a lot.

What do I mean by that?

Well, if you're maximizing the probability of winning,

you don't care if you lose by 1 or if you lose by 40,

it's all a loss.

Now consider the maximize differential player when he's losing by a fair amount.

Say he's behind 39 to 0 in a game to 40,

and say he's accumulated 30 pending points.

If he's trying to maximize the probability of winning,

he would keep on rolling.

He says, well, I don't have that good of a chance of winning,

but all that counts is winning.

If I stop now, the opponent's going to win on the next move,

so I've got to keep rolling.

Probably I'll pig out and only get 1 point, but it's worth it for that small chance of winning.

That's what the maximize win probability strategy would do.

The maximize differential strategy would say, hey, if I can get 30 points rather than 1,

that cuts the differential way down, so that's worth doing.

I'll sacrifice winning in order to maximize the differential.
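
To make that intuition concrete, here is a small sketch that scores two losing outcomes under each utility. It is not the full expected-value analysis and does not confirm the story. It assumes the opponent finishes on exactly 40, and the helper names win_utility and diff_utility are made up for this illustration.

```python
GOAL = 40

def win_utility(me, you):
    """Win/loss utility: 1 for a win, 0 for a loss; the margin is ignored."""
    return 1 if me > you else 0

def diff_utility(me, you):
    """Differential utility: my final score minus the opponent's."""
    return me - you

# Scenario from the lecture: I have 0, the opponent has 39, I have 30 pending.
# Assumption: in both cases the opponent then wins with exactly GOAL points.
after_hold = (0 + 30, GOAL)      # I bank the 30 pending points, then lose
after_pig_out = (0 + 1, GOAL)    # I roll a 1, score only 1 point, then lose

for label, (me, you) in [('hold', after_hold), ('pig out', after_pig_out)]:
    print(label, win_utility(me, you), diff_utility(me, you))
# hold 0 -10
# pig out 0 -39
```

Under the win/loss utility the two outcomes are identical, so only the small chance of winning by rolling matters; under the differential utility, holding is 29 points better than pigging out.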

Now that's a suggestion of a story, but I don't know yet.

Is that the right story? Let's find out.