
Title:
0434 Left Turn Policy Solution

Description:

Here is my solution. I have the value function initialized in 3D; it is filled with 999s.

The policy is a similar function in 3D.

Then I have a function called policy2d, which is the one I'm later going to print.

That's the same in 2D.
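The initialization described above might look like the following minimal sketch. The grid contents are an assumption for illustration; the table layout `value[orientation][x][y]` follows the convention described in this lesson.

```python
# Hypothetical 5x6 grid: 0 = navigable, 1 = obstacle (contents assumed).
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]

# One layer per orientation (up, left, down, right), so the tables are 3D:
# value[orientation][x][y] and policy[orientation][x][y].
value = [[[999 for _ in row] for row in grid] for _ in range(4)]
policy = [[[' ' for _ in row] for row in grid] for _ in range(4)]

# The 2D table that will eventually be printed, one symbol per (x, y).
policy2d = [[' ' for _ in row] for row in grid]
```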

Scrolling down, my update function is exactly the same as before for dynamic programming:

while change exists, go through all [x, y] cells and all orientations,

of which there are 4, so it's now a deeper loop.

If we found the goal location, then we update the value,

and if there's an actual update, we set "change" to True

and also mark it as the goal location.
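The goal update just described might be sketched as follows; the table sizes and the goal cell are assumptions for illustration.

```python
value = [[[999] * 6 for _ in range(5)] for _ in range(4)]
policy = [[[' '] * 6 for _ in range(5)] for _ in range(4)]
goal = [2, 0]                     # assumed goal cell [row, col]

change = False
for orientation in range(4):      # the goal is the goal in every orientation
    x, y = goal
    if value[orientation][x][y] > 0:
        value[orientation][x][y] = 0       # an actual update happened
        policy[orientation][x][y] = '*'    # mark the goal location
        change = True
```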

Otherwise, if our grid cell is navigable at all,

let's go through the 3 different actions, and here's a tricky part:

making each action work. But it works beautifully.

We go through the 3 different actions.

When we take the i-th action,

we add the corresponding orientation change to our orientation modulo 4.

It's cyclic, so depending on the action this might subtract 1 from the orientation,

keep it the same, or add 1.

Then we apply the corresponding new motion model to x and y to obtain x2 and y2.

Then over here is our model of a car that steers first and then moves.
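This steer-first-then-move model can be sketched like this; the `forward`, `action`, and `action_name` lists follow the conventions described in this lesson, and the helper function `steer_then_move` is a name I'm introducing for illustration.

```python
forward = [[-1, 0], [0, -1], [1, 0], [0, 1]]   # up, left, down, right
action = [-1, 0, 1]                            # right turn, no turn, left turn
action_name = ['R', '#', 'L']

def steer_then_move(x, y, orientation, i):
    # Steer first: the orientation change is cyclic over the 4 headings,
    # so taking action i may subtract 1, keep it the same, or add 1 (mod 4).
    o2 = (orientation + action[i]) % 4
    # Then move one cell in the new heading.
    x2 = x + forward[o2][0]
    y2 = y + forward[o2][1]
    return x2, y2, o2
```

For example, from (4, 3) facing up, going straight moves to (3, 3); a left turn ends up at (4, 2) facing left.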

Scrolling down further, if we arrived at a valid grid cell, in that it's still inside the grid

and it's not an obstacle, then as before we form a candidate value:

the value of this new grid cell plus the cost of the corresponding action.

This cost is nonuniform, depending on which action we pick.

If this improves over the existing value,

we set it as the new value, and we mark change as True.

We also memorize the action name as before.
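That improvement test might look like the following sketch; the costs [2, 1, 20] and the toy successor value are assumptions for illustration.

```python
cost = [2, 1, 20]                 # right turn, straight, left turn (assumed)
action_name = ['R', '#', 'L']

value = [[[999] * 3 for _ in range(3)] for _ in range(4)]
policy = [[[' '] * 3 for _ in range(3)] for _ in range(4)]

# Facing right (orientation 3) at (1, 1); going straight (i = 1) reaches
# the successor (x2, y2, o2) = (1, 2, 3), whose value we suppose is already 5.
value[3][1][2] = 5
x, y, orientation = 1, 1, 3
x2, y2, o2, i = 1, 2, 3, 1

change = False
v2 = value[o2][x2][y2] + cost[i]           # successor value plus action cost
if v2 < value[orientation][x][y]:          # an actual improvement?
    value[orientation][x][y] = v2
    policy[orientation][x][y] = action_name[i]   # memorize the action name
    change = True
```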

This is all effectively the same code as we had before

when we did dynamic programming in a 2-dimensional world.

It gets us the value function, and it gets us the policy action.

However, I printed out a 2-dimensional table, not a 3-dimensional table.

To get to the 2-dimensional table, I now need to be sensitive to my initial state.

Otherwise, the 2-dimensional policy actually turns out to be undefined.

Let me set the initial state to be x, y, and orientation.

All I do now is run the policy.

With the very first state, I copy over the policy from the 3-dimensional table

into the 2-dimensional one, which will be this hash mark over here.

While I haven't reached the goal state quite yet, as indicated

by checking for the star in my policy table, I keep stepping along.

My policy table now has hash mark, R, and L entries,

but otherwise is the same as before.

If it's a hash mark, we just keep our orientation the way it is.

If it's R, I turn to the right; if it's L, I turn to the left.

I apply my forward motion,

and I then update my x and y coordinates

to be the ones after the motion,

and I update my orientation to be o2.

Finally, I copy the 3-dimensional symbol for my policy straight into the 2-dimensional array.

This is the array that I finally print.
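Running the policy from the initial state to fill the 2-dimensional table might be sketched as follows; `run_policy` is a name I'm introducing, and the tiny one-row policy in the usage below is purely illustrative.

```python
forward = [[-1, 0], [0, -1], [1, 0], [0, 1]]   # up, left, down, right

def run_policy(policy, policy2d, init):
    # Walk the 3D policy from the initial state, copying each symbol
    # into the 2D table until the goal star '*' is reached.
    x, y, orientation = init
    policy2d[x][y] = policy[orientation][x][y]
    while policy[orientation][x][y] != '*':
        if policy[orientation][x][y] == '#':
            o2 = orientation                   # keep orientation the same
        elif policy[orientation][x][y] == 'R':
            o2 = (orientation - 1) % 4         # turn right
        elif policy[orientation][x][y] == 'L':
            o2 = (orientation + 1) % 4         # turn left
        x = x + forward[o2][0]                 # then apply forward motion
        y = y + forward[o2][1]
        orientation = o2
        policy2d[x][y] = policy[orientation][x][y]
    return policy2d

# Toy 1x3 world: start at (0, 0) facing right, goal at (0, 2).
policy = [[[' '] * 3] for _ in range(4)]
policy[3][0][0] = '#'
policy[3][0][1] = '#'
policy[3][0][2] = '*'
out = run_policy(policy, [[' '] * 3], (0, 0, 3))
```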

The key insight here is that to go from the 3-dimensional full policy

to a 2-dimensional array, I had to run the policy.

That's something you would have had to do to get back this table over here.

That's somewhat nontrivial. I didn't tell you this, but I hope you figured it out.

But everything else is the same dynamic programming loop that you've seen before.