YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← 04-30 Value Program Solution

Get Embed Code
3 Languages

Showing Revision 1 created 06/29/2012 by Amara Bot.

  1. Here's my implementation, which should be relatively straight forward.
  2. We have a value function that is the same size as my world,
  3. and I initialize with 99 everywhere.
  4. This has to be evaluated as large enough it doesn't conflict with any actual value.
  5. I now update the value function a number times--I don't know how often--
  6. but as long as change something, I update it.
  7. Therefore, I introduced the variable "change," which I set to True in the beginning.
  8. While change is the case, I update, but I neatly set change to False.
  9. The only way to come back to True is that I actually changed something.
  10. Now I go through all the grid cells in a fixed order.
  11. It happens to be not very efficient, but certainly gets the job done.
  12. I first check if the grid cell I'm considering is the goal.
  13. Here is a typical case where I check for change.
  14. If the value is presently correctly set to 0, I don't do anything.
  15. If it's larger than 0, such as 99, then I set it down to 0, and I've just changed something.
  16. Therefore, I set the change flag back to True.
  17. If it's not a goal cell, then here is my full update function.
  18. I go through all the actions.
  19. I project a potential next state upon executing an action
  20. by adding the corresponding delta to the x and y.
  21. That gives me x2 and y2.
  22. I test whether x2 and y2 are legitimate states.
  23. For that they have to be inside the grid.
  24. I check whether the numbers are larger than 0 and smaller than the dimension of the grid.
  25. And it has to be an action that action navigable grid cell.
  26. Therefore, I check that the coordinates in the grid has a 0.
  27. If that's the case, I can propagate back the value.
  28. My new value is the value of this future grid cell plus the cost per step,
  29. which happens to be 1.
  30. Now, if this value is better than the value I have already, which means it is smaller,
  31. then I assign this new value to my original grid cell x and y, plus of course the cost step.
  32. Then I know I've changed something.
  33. Therefore I set change to "True," and the procedure repeats.
  34. The only thing missing at the very end when I'm done,
  35. I print out the value function using these commands over here.
  36. I should warn you this is not very efficient.
  37. The reason why it is not efficient is that value slowly propagates
  38. from the end towards the beginning.
  39. But leaving this concern aside, it actually computes the correct value function.
  40. There are ways to make it more efficient.
  41. It's also interesting to see what happens if I cut off any path to the goal.
  42. The the resulting value function will retain 99s for most of the state variables--
  43. exactly those where there is no valid path to the goal.