-
Title:
10-08 Agents Of Reinforcement Learning
-
Description:
Unit 10 08 Agents of Reinforcement Learning.mp4
-
Now here's where reinforcement learning comes into play:
-
What if you don't know R--the Reward function?
-
What if you don't even know P--the transition model of the world?
-
Then you can't solve the Markov Decision Process
-
because you don't have what you need to solve it.
-
However, with reinforcement learning,
-
you can learn R and P by interacting with the world
-
or you can learn substitutes that will tell you
-
as much as you need to know, so that you never actually have to compute with R and P.
-
What you learn, exactly, depends on what you already know and what you want to do.
-
So we have several choices.
-
One choice is we can build a utility-based agent.
-
So we're going to list agent types, based on what we know,
-
what we want to learn,
-
and what we then use once we've learned.
-
So for a utility-based agent,
-
if we already know P, the transition model,
-
but we don't know R, the Reward model,
-
then we can learn R--and use that,
-
along with P, to learn our utility function;
-
and then go ahead and use the utility function
-
just as we did in normal Markov Decision Processes.
-
So that's one agent design.
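-
As a rough illustration of the utility-based design just described, here is a minimal sketch. Everything in it is an assumption made for the example and not part of the lecture: the three-state chain world, the reward values, the discount GAMMA, rewards attached to states as R(s), and the helper names learn_rewards, value_iteration, and best_action. The agent is given P, estimates R by averaging the rewards it observes, and then runs ordinary value iteration to get a utility function.

```python
import random

# Hypothetical 3-state chain world (not from the lecture).
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # assumed discount factor


def P(s, a):
    """Known transition model: list of (probability, next_state) pairs."""
    step = 1 if a == "right" else -1
    intended = min(max(s + step, 0), 2)
    return [(0.8, intended), (0.2, s)]  # 20% chance the move fails


def true_reward(s):
    """The world's reward; the agent only sees it by visiting states."""
    return 1.0 if s == 2 else -0.1


def learn_rewards(episodes=200, seed=0):
    """Estimate R(s) by averaging rewards observed while wandering randomly."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in STATES}
    counts = {s: 0 for s in STATES}
    for _ in range(episodes):
        s = rng.choice(STATES)
        for _ in range(10):
            totals[s] += true_reward(s)
            counts[s] += 1
            a = rng.choice(ACTIONS)
            probs, nexts = zip(*P(s, a))
            s = rng.choices(nexts, weights=probs)[0]
    # Unvisited states fall back to 0.0 (an arbitrary choice for this sketch).
    return {s: totals[s] / counts[s] if counts[s] else 0.0 for s in STATES}


def expected_utility(U, s, a):
    """Expected utility of the next state, using the known P."""
    return sum(p * U[s2] for p, s2 in P(s, a))


def value_iteration(R, iterations=100):
    """Compute U from the learned R and the known P."""
    U = {s: 0.0 for s in STATES}
    for _ in range(iterations):
        U = {s: R[s] + GAMMA * max(expected_utility(U, s, a) for a in ACTIONS)
             for s in STATES}
    return U


def best_action(U, s):
    """Use the utility function: pick the action with the best expected outcome."""
    return max(ACTIONS, key=lambda a: expected_utility(U, s, a))
```

Calling value_iteration(learn_rewards()) and then best_action(U, s) gives the action for a state. Note that this agent still needs P at decision time, because it looks one step ahead through the transition model.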
-
Another design that we'll see in this Unit
-
is called a Q-learning agent.
-
In this one, we don't have to know P or R;
-
and we learn a value function, which is usually denoted by Q.
-
And that's a type of utility
-
but, rather than being a utility over states,
-
it's a utility of state-action pairs--and that tells us:
-
For any given state and any given action,
-
what's the utility of that result--
-
without knowing the utilities and rewards, individually?
-
And then we can just use that Q directly.
-
So we don't actually have to ever learn the transition model, P,
-
with a Q-learning agent.
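-
A minimal sketch of a tabular Q-learning agent may make this concrete. The four-state chain world, the step function, the reward values, and the parameters alpha, gamma, and epsilon are all assumptions chosen for the example, not details from the lecture. The point to notice is that the agent only calls step to act in the world and never reads a transition model or a reward function.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 3  # hypothetical chain world with states 0..3; state 3 is terminal


def step(s, a, rng):
    """The world itself: returns (next_state, reward). The agent can only sample it."""
    move = 1 if a == "right" else -1
    if rng.random() < 0.2:  # 20% of the time the move fails
        move = 0
    s2 = min(max(s + move, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else -0.1)


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(s, a) from experience alone, with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])  # exploit
            s2, r = step(s, a, rng)
            # Move Q(s, a) toward the observed reward plus the discounted
            # value of the best action in the next state.
            best_next = max(Q[(s2, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q


def greedy_action(Q, s):
    """Use Q directly: no transition model is needed to choose an action."""
    return max(ACTIONS, key=lambda act: Q[(s, act)])
```

After Q = q_learning(), calling greedy_action(Q, s) picks an action by comparing Q values for that state, with no lookahead through P.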
-
And finally, we can have a reflex agent
-
where, again, we don't need to know P and R to begin with;
-
and we learn the policy, pi of S, directly;
-
and then we just go ahead and apply pi.
-
So it's called a reflex agent because it's pure stimulus response:
-
I'm in a certain state, I take a certain action.
-
I don't have to think about modeling the world, in terms of:
-
What are the transitions--where am I going to go next?
-
I just go ahead and take that action.
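-
One possible way to learn a policy directly is a simple hill-climbing search over policies, sketched below. This particular technique is swapped in for illustration; the lecture does not say how the reflex agent learns pi. The chain world, the step function, the reward values, and the names average_return and learn_policy are likewise assumptions. The agent keeps only a table from states to actions, tries changing one entry at a time, and keeps the change if the average return it experiences goes up.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 3  # hypothetical chain world with states 0..3; state 3 is terminal


def step(s, a, rng):
    """The world itself: returns (next_state, reward)."""
    move = 1 if a == "right" else -1
    if rng.random() < 0.2:  # 20% of the time the move fails
        move = 0
    s2 = min(max(s + move, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else -0.1)


def average_return(policy, rng, episodes=200, horizon=20):
    """Run the policy from random start states and average the total reward."""
    total = 0.0
    for _ in range(episodes):
        s = rng.randrange(GOAL)
        for _ in range(horizon):
            if s == GOAL:
                break
            s, r = step(s, policy[s], rng)  # pure stimulus-response
            total += r
    return total / episodes


def learn_policy(sweeps=5, seed=0):
    """Hill climbing: flip one state's action, keep the flip if returns improve."""
    rng = random.Random(seed)
    policy = {s: "left" for s in range(GOAL)}
    score = average_return(policy, rng)
    for _ in range(sweeps):
        for s in range(GOAL):
            candidate = dict(policy)
            candidate[s] = "right" if policy[s] == "left" else "left"
            cand_score = average_return(candidate, rng)
            if cand_score > score:
                policy, score = candidate, cand_score
    return policy
```

Once learn_policy() returns, acting is just a table lookup, policy[s]. No utilities, Q values, or transition model are stored or consulted.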