Homework 5 3 Passive RL Agent ANSWER

  1. The answer is that, according to the policy, the agent would prefer to follow this straight line,
  2. because it is the most direct route to the goal.
  3. Now, at any point he might slip off to one of these squares.
  4. Those would all potentially be explored,
  5. but if he did he would go back down onto the road.
  6. Likewise, he might fall off onto any of these squares,
  7. but if he did, he would also go back towards the road.
  8. That's certainly true in this situation, when he's off the road,
  9. but it also turns out to be true here and here,
  10. because the closest way to get to the goal would be to go in the north direction.
  11. Therefore, these three rows could all potentially be explored,
  12. but the bottom two rows would never be explored under any conditions
  13. no matter what happens stochastically as long as the agent is following this fixed policy.
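
The reachability argument above can be checked with a small sketch. Everything concrete here is an assumption made for illustration, since the transcript only points at the figure: a hypothetical 5-by-6 grid, a road along row 1 running from a start square to a goal square, a fixed policy that goes east on the road, south from the row above it, and north from every row below it, and a slip model in which an action may instead move the agent to either perpendicular side. The sketch collects every square that has a nonzero chance of being visited while the agent follows that fixed policy.

```python
from collections import deque

# Hypothetical layout (not taken from the original figure).
ROWS, COLS = 5, 6
START, GOAL = (1, 0), (1, 5)   # the "road" is row 1

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
# Assumed slip model: an action can also end up going to either side.
SLIPS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def policy(state):
    """Fixed policy: follow the road east, otherwise head back to the road."""
    row, _ = state
    if row == 1:
        return "E"
    if row == 0:
        return "S"
    return "N"  # rows 2, 3 and 4 all point north, toward the road


def outcomes(state):
    """All squares the agent can land in after one step of the fixed policy."""
    action = policy(state)
    result = set()
    for direction in (action, *SLIPS[action]):
        d_row, d_col = MOVES[direction]
        row, col = state[0] + d_row, state[1] + d_col
        if 0 <= row < ROWS and 0 <= col < COLS:
            result.add((row, col))
        else:
            result.add(state)  # bumping into the edge leaves the agent in place
    return result


def reachable_states(start=START):
    """Breadth-first search over every outcome with nonzero probability."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            continue  # the episode ends at the goal
        for nxt in outcomes(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


explored_rows = sorted({row for row, _ in reachable_states()})
print(explored_rows)  # [0, 1, 2]
```

Under these assumptions only rows 0 through 2 are ever visited. A slip can push the agent one row off the road, but the policy there always points back toward the road, so a second slip moves him sideways rather than farther away, and rows 3 and 4 stay unexplored.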