Here is my solution.
As before, I go through all the different actions a,
but I now add a new inner loop that goes through the different action outcomes.
This loop runs over the list (-1, 0, 1),
and I set the actual action to the intended one or to an adjacent action in the action list.
You might remember that the action list is a list of the different motions.
By incrementing the index by 1 or decrementing it by 1, I can pick a slightly different action from that list.
Of course, I have to take the result modulo 4 so that the index wraps around at either end of the list.
Then the implementation is similar to before: I project the outcome into new coordinates, x2 and y2.
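To make the outcome loop concrete, here is a minimal Python sketch. It assumes the action list is ordered up, left, down, right, so that neighbors in the list are 90-degree turns of each other; `delta` and the helper `action_outcomes` are hypothetical names chosen for illustration, not the code from the video. Python's `%` returns a non-negative result for a positive divisor, so an index of -1 wraps to 3.

```python
# Assumed action order: up, left, down, right. Adjacent entries are
# 90-degree turns of each other, so (a + i) % 4 picks a veering motion.
delta = [[-1, 0], [0, -1], [1, 0], [0, 1]]


def action_outcomes(x, y, a):
    """Return (i, a2, x2, y2) for each outcome of commanding action a at (x, y)."""
    outcomes = []
    for i in (-1, 0, 1):            # -1 and +1 veer off, 0 is the intended motion
        a2 = (a + i) % len(delta)   # modulo wraps -1 to 3 and 4 to 0
        x2 = x + delta[a2][0]       # project the actual outcome
        y2 = y + delta[a2][1]
        outcomes.append((i, a2, x2, y2))
    return outcomes


print(action_outcomes(2, 2, 0))  # [(-1, 3, 2, 3), (0, 0, 1, 2), (1, 1, 2, 1)]
```

Commanding "up" from (2, 2) can therefore end up to the right, straight up, or to the left of the starting cell.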
Now I need to assign the probability of this outcome.
If the modifier is 0, I take the success probability.
If it's not 0, I take 1 minus the success probability, divided by 2, because there are 2 possible undesired outcomes.
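The probability rule can be sketched as a small helper. The names `outcome_probability` and `success_prob` are hypothetical; the point is that the three outcome probabilities always sum to 1.

```python
def outcome_probability(i, success_prob):
    """Probability of the outcome with modifier i (-1, 0 or +1)."""
    if i == 0:
        return success_prob              # the intended motion
    return (1.0 - success_prob) / 2.0    # remaining mass split over the 2 undesired outcomes


probs = [outcome_probability(i, 0.5) for i in (-1, 0, 1)]
print(probs)  # [0.25, 0.5, 0.25]
```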
Then the test checks whether this is a legal grid cell,
that is, it's inside the grid and the grid value is 0.
If it is, then like before, I add the value of that grid cell,
but now multiplied by the probability of that specific action outcome.
Otherwise, I add the collision cost, again multiplied by that probability.
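Putting the legality test and the weighting together, the expected cost of one commanded action might look like the sketch below. The function name `expected_action_value`, the parameters `cost_step` and `collision_cost`, and the action ordering are assumptions for illustration. A collision contributes only its weighted cost, with no value from a next cell.

```python
# Assumed action order: up, left, down, right.
delta = [[-1, 0], [0, -1], [1, 0], [0, 1]]


def expected_action_value(grid, value, x, y, a, success_prob, cost_step, collision_cost):
    """Expected cost of commanding action a from (x, y) under stochastic motion."""
    v2 = cost_step                                   # start from the cost of motion
    for i in (-1, 0, 1):
        a2 = (a + i) % len(delta)
        x2 = x + delta[a2][0]
        y2 = y + delta[a2][1]
        p2 = success_prob if i == 0 else (1.0 - success_prob) / 2.0
        legal = 0 <= x2 < len(grid) and 0 <= y2 < len(grid[0]) and grid[x2][y2] == 0
        if legal:
            v2 += p2 * value[x2][y2]                 # weighted value of the cell we land in
        else:
            v2 += p2 * collision_cost                # weighted penalty: off-grid or obstacle
    return v2


grid = [[0, 0], [0, 0]]
value = [[10, 0], [20, 30]]
print(expected_action_value(grid, value, 1, 0, 0, 0.5, 1, 100))  # 38.5
```

Commanding "up" from the bottom-left cell gives 1 + 0.25 * 30 + 0.5 * 10 + 0.25 * 100 = 38.5: a veer to the right lands on the cell worth 30, the intended move reaches the cell worth 10, and a veer to the left leaves the grid.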
Finally, I take my cumulative value v2, which I initialized with the cost of motion.
You can't see the initialization right here, but by now v2 has been filled up with all the weighted outcomes.
I use it to update my value function just like before.
You can see the code over here.
This is what you should have programmed.
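Written as an equation, one update of a free, non-goal cell is roughly the following backup. The symbols p_s (success probability), c_step (cost of motion), and c_col (collision cost) are notation introduced here for illustration:

$$
V(x, y) \leftarrow \min\Big( V(x, y),\ \min_{a} \Big[ c_{\text{step}} + \sum_{i \in \{-1, 0, 1\}} p_i \, \tilde{V}_i \Big] \Big),
\qquad
p_i = \begin{cases} p_s, & i = 0 \\ \frac{1 - p_s}{2}, & i = \pm 1 \end{cases}
$$

Here \tilde{V}_i is V(x_i, y_i) when the cell (x_i, y_i) reached with action (a + i) mod 4 is legal, and c_col otherwise; the goal cell is held at V = 0.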
The key difference from our example in class is this inner loop,
where I go over the different possible action outcomes,
compute the actual action for each outcome,
and then add up the outcomes weighted by their probabilities rather than considering just one outcome.
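For reference, here is a self-contained sketch of the whole procedure. It is not the code from the video: the grid, goal, starting value of 1000, parameter names, and action ordering are all assumptions chosen for illustration. Each cell is updated in place during a sweep, and sweeps repeat until no value improves.

```python
def stochastic_value(grid, goal, success_prob, cost_step, collision_cost):
    """Value iteration on a grid with stochastic motion (illustrative sketch)."""
    delta = [[-1, 0], [0, -1], [1, 0], [0, 1]]     # assumed order: up, left, down, right
    delta_name = ['^', '<', 'v', '>']
    rows, cols = len(grid), len(grid[0])
    value = [[1000.0] * cols for _ in range(rows)]  # assumed "large" starting value
    policy = [[' '] * cols for _ in range(rows)]

    change = True
    while change:                                   # sweep until nothing improves
        change = False
        for x in range(rows):
            for y in range(cols):
                if (x, y) == tuple(goal):
                    if value[x][y] > 0:
                        value[x][y] = 0.0
                        policy[x][y] = '*'
                        change = True
                elif grid[x][y] == 0:
                    for a in range(len(delta)):
                        v2 = cost_step              # cost of motion
                        for i in (-1, 0, 1):        # inner loop over action outcomes
                            a2 = (a + i) % len(delta)
                            x2, y2 = x + delta[a2][0], y + delta[a2][1]
                            p2 = success_prob if i == 0 else (1.0 - success_prob) / 2.0
                            if 0 <= x2 < rows and 0 <= y2 < cols and grid[x2][y2] == 0:
                                v2 += p2 * value[x2][y2]
                            else:
                                v2 += p2 * collision_cost
                        if v2 < value[x][y]:        # keep the best action found so far
                            change = True
                            value[x][y] = v2
                            policy[x][y] = delta_name[a]
    return value, policy


grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
value, policy = stochastic_value(grid, goal=[0, 3], success_prob=0.5,
                                 cost_step=1, collision_cost=100)
for row in policy:
    print(' '.join(row))
```

Values only ever decrease and never drop below zero, so the loop stops once a full sweep changes nothing. The printed policy shows one arrow per free cell, `*` at the goal, and blanks for obstacles.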