Unit 10 08 Agents of Reinforcement Learning.mp4


Showing Revision 1 created 11/28/2012 by Amara Bot.

Now here's where reinforcement learning comes into play. What if you don't know R, the reward function? What if you don't even know P, the transition model of the world? Then you can't solve the Markov decision process, because you don't have what you need to solve it. However, with reinforcement learning, you can learn R and P by interacting with the world, or you can learn substitutes that tell you as much as you need to know, so that you never actually have to compute with R and P. What you learn, exactly, depends on what you already know and what you want to do.

So we have several choices. We're going to list agent types based on what we know, what we want to learn, and what we then use once we've learned it. One choice is to build a utility-based agent: if we already know P, the transition model, but we don't know R, the reward function, then we can learn R and use it, along with P, to learn our utility function, and then go ahead and use that utility function just as we did in normal Markov decision processes. So that's one agent design.
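As a minimal sketch of this first design, consider a tiny two-state MDP where the transition model P is known but the rewards are not. Everything here (the states, the transitions, the hidden rewards, the discount) is a hypothetical example, not from the lecture: the agent estimates R by averaging observed rewards, then solves for utilities with the known P, just as in a normal MDP.

```python
import random

# Hypothetical two-state MDP: P[s][a] is a known list of (prob, next_state);
# the reward function is hidden and must be estimated from experience.
P = {
    "A": {"stay": [(1.0, "A")], "go": [(0.9, "B"), (0.1, "A")]},
    "B": {"stay": [(1.0, "B")], "go": [(0.9, "A"), (0.1, "B")]},
}
TRUE_R = {"A": 0.0, "B": 1.0}  # hidden from the agent
GAMMA = 0.9

def estimate_rewards(samples=1000):
    """Learn R by visiting states and averaging the rewards observed there."""
    totals = {s: 0.0 for s in P}
    counts = {s: 0 for s in P}
    for _ in range(samples):
        s = random.choice(list(P))
        totals[s] += TRUE_R[s]  # observe the (here noiseless) reward
        counts[s] += 1
    return {s: totals[s] / max(counts[s], 1) for s in P}

def value_iteration(R, iters=100):
    """With P known and R learned, compute utilities as in a normal MDP."""
    U = {s: 0.0 for s in P}
    for _ in range(iters):
        U = {s: R[s] + GAMMA * max(
                 sum(p * U[s2] for p, s2 in P[s][a]) for a in P[s])
             for s in P}
    return U

R_hat = estimate_rewards()
U = value_iteration(R_hat)
```

Once U is in hand, the agent acts by picking the action with the highest expected utility under P, exactly as a solved MDP would.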
Another design that we'll see in this unit is called a Q-learning agent. In this one, we don't have to know P or R, and we learn a value function, which is usually denoted by Q. That's a type of utility, but rather than being a utility over states, it's a utility of state-action pairs, and that tells us, for any given state and any given action, what's the utility of that result, without knowing the utilities and rewards individually. Then we can just use that Q directly, so with a Q-learning agent we never actually have to learn the transition model P.
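To make this concrete, here is a minimal sketch of tabular Q-learning on a hypothetical four-state chain. The environment, step size, discount, and exploration rate are all assumptions for illustration; the point is that the agent never reads P or R directly, it only samples transitions by acting and updates Q(s, a) from what it observes.

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
STATES, ACTIONS = range(4), ("left", "right")

def step(s, a):
    """The environment: unknown to the agent, queried only by acting.
    Moving right from state 2 reaches state 3, which pays reward 1."""
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(2000):
    s = random.choice([0, 1, 2])  # start each episode short of the goal
    for _ in range(20):
        # Epsilon-greedy action selection from the current Q estimates.
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Acting with Q is just a lookup: no transition model P is ever built.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

Notice that the learned Q already encodes everything needed to act: the greedy policy falls out of a per-state argmax over actions.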
And finally, we can have a reflex agent where, again, we don't need to know P or R to begin with; we learn the policy, pi of s, directly, and then we just go ahead and apply pi. It's called a reflex agent because it's pure stimulus-response: I'm in a certain state, I take a certain action. I don't have to think about modeling the world, in terms of what the transitions are or where I'm going to go next; I just go ahead and take that action.
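In code, a reflex agent reduces to a lookup in the learned policy table. The states and actions below are hypothetical stand-ins; in practice pi would come from some learning procedure, such as policy search or a greedy readout of a learned Q.

```python
# A reflex agent is just a learned policy table: state in, action out.
# This pi is a made-up example of what such a learned table might hold.
pi = {"low_battery": "recharge", "dirty": "clean", "clean_room": "move_on"}

def reflex_agent(state):
    """Pure stimulus-response: no transition model, no reward lookup."""
    return pi[state]

action = reflex_agent("dirty")
```

The contrast with the earlier designs is that nothing here computes an expectation over next states; the mapping from state to action is the whole agent.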