05_Reinforcement Learning

Showing Revision 1 created 01/18/2014 by Cogi-Admin.

  1. All right. So, that's supervised learning and unsupervised learning.
  2. That's pretty good. The last one is reinforcement learning.
  3. >> Whoo!
  4. >> Now, reinforcement learning is what we both do. So, Michael does a little
  5. bit of reinforcement learning here and there.
  6. You've got how many papers published in reinforcement learning?
  7. >> All of them.
  8. >> [LAUGH]
  9. >> Several. I have several.
  10. >> The man has like 100 papers on reinforcement learning.
  11. In fact, he wrote, with his
  12. colleagues, the great summary journal article bringing
  13. everyone up to date what reinforcement learning was like back in 19?
  14. >> Like 112 years ago.
  15. >> 1992 or something like that?
  16. >> People were saying, yeah we should probably, somebody should write a new
  17. one because the other one is getting a little long in the tooth.
  18. >> But there have been books written on machine learning since then.
  19. >> That's right.
  20. >> It's a very popular field. That's why we're both
  21. in it. Michael tends to prove a lot of things.
  22. >> It is not, that is not why I'm in it.
  23. >> Why, I'm, wait, what?
  24. >> You said it's a very popular field, and that's why we're in it.
  25. >> No, no, no, no, no, no. Did I say that?
  26. >> That's what I heard.
  27. >> I didn't mean
  28. to say that.
  29. >> Ooooh! Let's run it back, let's see.
  30. >> Yeah, let's do that again. Because I did not mean to say
  31. that. It is a very popular field. Perhaps because you're in it, Michael.
  32. >> I don't think that's it. When I was
  33. an undergraduate I thought, the thing that I really want
  34. to understand, I liked AI, I liked the whole
  35. idea of AI. But what I really want to understand
  36. is, how can you learn to be better from
  37. experience? Like, I built a tic-tac-toe playing program. And like,
  38. I want this tic-tac-toe playing program to get really good
  39. at tic-tac-toe. Because I was always interested in the most
  40. practical, society impacting problems.
  41. >> Tic-tac-toe generalizes pretty well to world hunger.
  42. >> Eventually. So that is what got me interested in
  43. it. And I was, I didn't even know what it
  44. was called for a long time. So I started doing
  45. reinforcement learning and then discovered that it was interesting and popular.
  46. >> Right. Well. I certainly wouldn't suggest
  47. we are doing the science that we are
  48. doing because it is popular. We are doing it because we are interested in it.
  49. >> Yes.
  50. >> And I'm interested in reinforcement learning
  51. because, in some sense it kind of encapsulates all the things I
  52. happen to care about. I come from a sort of general AI
  53. background, and I care about modeling people. I care about building smart
  54. agents that have to live in a world with other smart agents, thousands
  55. of them, hundreds of thousands of them, thousands of them. Some of
  56. whom might be human. And I have to figure out some way to predict
  57. what to do over time. So, from a sort of technical point
  58. of view, if we can think of supervised learning as function approximation, and
  59. unsupervised learning as, you know,
  60. concise, compact description, what's the difference
  61. between something like reinforcement learning and
  62. those two? Supervised learning in particular.
  63. >> So, often the way that supervised
  64. learning, sorry, reinforcement learning is described is
  65. as learning from delayed reward. So instead of the feedback that you get in
  66. supervised learning, which is, here's what you
  67. should do, and the feedback you get
  68. in unsupervised learning, which is [INAUDIBLE], the
  69. feedback in reinforcement learning may come several
  70. steps after the decisions that you've actually made. So, a good example
  71. of that, or an easy example of that would be actually your tic-tac-toe
  72. program, right? So, you do something in tic-tac-toe. You put an X
  73. in the center and then you put let's say, an O over here.
  74. >> Ooh.
  75. >> And then I put an X right here.
  76. >> Nice.
  77. >> And then you ridiculously put an O in the center.
  78. >> Aw, come on.
  79. >> Which allows me to put an X over here. And I win.
  80. >> All right.
  81. >> Now what's interesting about that is, I didn't
  82. tell you what happened until the very end, when I said, X wins.
  83. >> Right, and now I know I made a mistake somewhere along the
  84. way, but I don't know exactly where. I may have to kind of
  85. roll back the game in my mind and eventually figure out where it
  86. is that I went off track, and what it is that I did wrong.
  87. >> And in the full generality of reinforcement learning, you may have never
  88. made a mistake. It may simply be that, that's the way games go. But
  89. you would like to know which of the moves you made mattered. Now,
  90. if it were a supervised learning problem, I would have put the X here,
  91. you would put the O there, and you would have
  92. been told, that's good. I would have put the X
  93. here, and then when you put the O there,
  94. it would have been, that's bad. The O goes here.
  95. >> Or something like that, or were told where he should have
  96. put the O. But, here all he gets is, eventually some kind of
  97. signal saying, you did something well, you did something poorly. And even
  98. then it's only relative to the other signals that you might have gotten.
  99. >> Right, so then reinforcement learning is, in some
  100. sense, harder, because nobody's telling you what to do.
  101. You have to work it out on your own.
  102. >> Yeah, it's like playing a game without knowing any of the rules.
  103. Or at least knowing how you win or lose. But being told
  104. every once in a while that you've won or you've lost. Okay, now.
  105. >> Sometimes I feel like that.
  106. >> I know, man.
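
The contrast drawn in the tic-tac-toe example, per-move feedback in supervised learning versus a single win/lose signal at the end in reinforcement learning, can be sketched in a few lines of Python. This is only an illustration, not something from the lecture: the function names, the reward values of +1 and -1, and the discount factor `gamma` are all assumptions. Discounting is one common way to spread a delayed reward back over the earlier moves; the conversation itself does not name a method.

```python
from typing import List


def delayed_rewards(num_decisions: int, outcome: float) -> List[float]:
    """Reward sequence in which only the final decision is followed by feedback.

    Hypothetical helper: every earlier decision gets a reward of 0.
    """
    if num_decisions < 1:
        raise ValueError("need at least one decision")
    return [0.0] * (num_decisions - 1) + [outcome]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """For each decision, the discounted sum of all rewards that follow it."""
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Supervised-style feedback for the two O moves in the game: one label per move.
supervised_feedback_for_o = ["good", "bad"]

# Reinforcement-style feedback for the same two moves: nothing, then "X wins".
# The value -1.0 for a loss and gamma = 0.5 are assumed for illustration.
o_rewards = delayed_rewards(num_decisions=2, outcome=-1.0)
o_returns = discounted_returns(o_rewards, gamma=0.5)

print("supervised labels:", supervised_feedback_for_o)
print("delayed rewards:  ", o_rewards)
print("credit per move:  ", o_returns)
```

Note that the discounted returns blame both O moves to some degree, even though only the second was described as the mistake. That is the point made in the conversation: the final signal alone does not say which move mattered.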