09-30 Value Iterations And Policy 1

  • 0:00 - 0:03
    So, now that we have a value backup function
  • 0:03 - 0:05
    that we discussed in depth, the question now becomes
  • 0:05 - 0:07
    what's the optimal policy?
  • 0:07 - 0:10
    And it turns out this value backup function defines
  • 0:10 - 0:12
    the optimal policy by specifying
  • 0:12 - 0:14
    which action to pick,
  • 0:14 - 0:18
    which is just the action that maximizes this expression over here.
  • 0:18 - 0:22
    For any state S, any value function V,
  • 0:22 - 0:24
    we can define a policy,
  • 0:24 - 0:28
    and that's the one that picks the action under argmax
  • 0:28 - 0:31
    that maximizes the expression over here.
  • 0:31 - 0:35
    For the maximization, we can safely drop gamma and R(s).
  • 0:35 - 0:38
    Baked into the value iteration function was already
  • 0:38 - 0:41
    an action choice that picks the best action.
  • 0:41 - 0:43
    We just made it explicit.
  • 0:43 - 0:45
    This is the way of backing up values,
  • 0:45 - 0:48
    and once values have been backed up,
  • 0:48 -
    this is the way to find the optimal thing to do.
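
The policy extraction described in the captions above can be sketched in Python. This is a minimal illustration and not the lecture's own code: the names extract_policy, transition, and value, and the two-state example, are all hypothetical. It assumes the backup has the form V(s) = R(s) + gamma * max over a of the sum over s' of P(s'|s,a) V(s'), so the argmax can ignore R(s) and gamma because neither depends on the action (with gamma positive).

```python
def extract_policy(states, actions, transition, value):
    """Pick, for each state, the action with the highest expected next-state value.

    transition[(s, a)] is assumed to be a dict {next_state: probability}.
    R(s) and gamma are omitted: they do not change which action wins the argmax.
    """
    policy = {}
    for s in states:
        def expected_value(a):
            # Sum over successor states of P(s' | s, a) * V(s')
            return sum(p * value[s2] for s2, p in transition[(s, a)].items())
        policy[s] = max(actions, key=expected_value)
    return policy


# Hypothetical two-state world, with values assumed to be already backed up.
states = ["A", "B"]
actions = ["stay", "move"]
transition = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"A": 0.2, "B": 0.8},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 0.8, "B": 0.2},
}
value = {"A": 0.0, "B": 10.0}

policy = extract_policy(states, actions, transition, value)
print(policy)
```

With these assumed numbers, state A prefers "move" (expected value 8 versus 0) and state B prefers "stay" (10 versus 2).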
Team:
Udacity
Project:
CS271 - Intro to Artificial Intelligence
Duration:
0:51