
Title:
1008 Agents Of Reinforcement Learning

Description:
Unit 10 08 Agents of Reinforcement Learning.mp4

Now here's where reinforcement learning comes into play:

What if you don't know R, the reward function?

What if you don't even know P, the transition model of the world?

Then you can't solve the Markov Decision Process

because you don't have what you need to solve it.

However, with reinforcement learning,

you can learn R and P by interacting with the world

or you can learn substitutes that will tell you

as much as you need to know, so that you never actually have to compute with R and P.

What you learn, exactly, depends on what you already know and what you want to do.

So we have several choices.

One choice is we can build a utility-based agent.

So we're going to list agent types, based on what we know,

what we want to learn,

and what we then use once we've learned.

So for a utility-based agent,

if we already know P, the transition model,

but we don't know R, the Reward model,

then we can learn R and use that,

along with P, to learn our utility function;

and then go ahead and use the utility function

just as we did in normal Markov Decision Processes.

So that's one agent design.
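The utility-based design just described can be sketched in a few lines; the tiny chain world, its transition table, and the observed reward samples below are all illustrative assumptions, not anything from the lecture:

```python
# Utility-based agent sketch: P is known, R is learned from observed
# reward samples, then value iteration yields the utility function U.
# The 3-state chain world here is a made-up example.

# Known transition model: P[s][a] = list of (probability, next_state).
P = {
    0: {"go": [(1.0, 1)], "stay": [(1.0, 0)]},
    1: {"go": [(1.0, 2)], "stay": [(1.0, 1)]},
    2: {"go": [(1.0, 2)], "stay": [(1.0, 2)]},
}

# Step 1: learn R by averaging the rewards observed in each state
# (these sample lists stand in for actual interaction with the world).
observed = {0: [0.0, 0.0], 1: [0.0], 2: [1.0, 1.0, 1.0]}
R = {s: sum(rs) / len(rs) for s, rs in observed.items()}

# Step 2: with R learned and P known, run ordinary value iteration,
# just as for a fully specified Markov Decision Process.
gamma = 0.9
U = {s: 0.0 for s in P}
for _ in range(100):
    U = {s: R[s] + gamma * max(
             sum(p * U[s2] for p, s2 in P[s][a]) for a in P[s])
         for s in P}
```

Once U has converged, the agent acts by picking, in each state, the action whose expected utility under P is highest.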

Another design that we'll see in this Unit

is called a Q-learning agent.

In this one, we don't have to know P or R;

and we learn a value function, which is usually denoted by Q.

And that's a type of utility

but, rather than being a utility over states,

it's a utility of state-action pairs, and that tells us:

For any given state and any given action,

what's the utility of that result

without knowing the utilities and rewards, individually?

And then we can just use that Q directly.

So we don't actually have to ever learn the transition model, P,

with a Q-learning agent.
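A minimal tabular Q-learning sketch of that idea follows; the function names, hyperparameters, and the toy chain environment are my own illustrative choices, not the lecture's code. The key point is that the agent only ever sees (state, action, reward, next state) samples; P and R stay hidden inside the environment:

```python
import random

random.seed(0)  # for reproducibility of this toy run

def q_learning(step, states, actions, start, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, horizon=50):
    """step(s, a) -> (reward, next_state, done) is the agent's only
    access to the world; it never queries P or R directly."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            # Epsilon-greedy: mostly exploit Q, occasionally explore.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            r, s2, done = step(s, a)
            # Temporal-difference update toward the sampled target.
            target = r + (0.0 if done else
                          gamma * max(Q[(s2, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

# Toy 1-D chain world: states 0..4, actions +1/-1, reward 1.0 for
# reaching state 4, which ends the episode.
def step(s, a):
    s2 = max(0, min(4, s + a))
    done = (s2 == 4)
    return (1.0 if done else 0.0), s2, done

Q = q_learning(step, states=range(5), actions=[+1, -1], start=0)
# Acting with Q directly: in each state, take the highest-Q action.
policy = {s: max([+1, -1], key=lambda a: Q[(s, a)]) for s in range(4)}
```

Note that the learned policy falls out of Q by a simple argmax over actions, with no transition model anywhere in sight.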

And finally, we can have a reflex agent

where, again, we don't need to know P and R to begin with;

and we learn the policy, pi of S, directly;

and then we just go ahead and apply pi.

So it's called a reflex agent because it's pure stimulus response:

I'm in a certain state, I take a certain action.

I don't have to think about modeling the world, in terms of:

What are the transitions, where am I going to go next?

I just go ahead and take that action.
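At run time, then, a reflex agent is nothing more than a table lookup on the learned policy; the particular pi below is a made-up example, not one learned in the lecture:

```python
# Reflex agent sketch: pi maps each state directly to an action.
# No P, no R, no lookahead, pure stimulus-response.
pi = {0: "go", 1: "go", 2: "stay"}  # learned policy pi(s) -> action

def reflex_agent(state):
    return pi[state]
```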