1 00:00:00,000 --> 00:00:06,000 And here are the answers: For the error introduced by a lack of enough sampling, 2 00:00:06,000 --> 00:00:08,000 all these problems are true. 3 00:00:08,000 --> 00:00:10,000 If you don't have enough samples, 4 00:00:10,000 --> 00:00:13,000 it might make the utility too high; it might make the utility too low-- 5 00:00:13,000 --> 00:00:16,000 and it could certainly be improved by taking more trials. 6 00:00:16,000 --> 00:00:20,000 But with the differences due to having not quite the right policy, 7 00:00:20,000 --> 00:00:22,000 The answers aren't the same. 8 00:00:22,000 --> 00:00:24,000 So yes, if you don't have the right policy, 9 00:00:24,000 --> 00:00:28,000 that could make the utilities too low--if you're doing something silly, 10 00:00:28,000 --> 00:00:31,000 like starting in this state and the policy says, 11 00:00:31,000 --> 00:00:34,000 "Drive straight into the minus 1" 12 00:00:34,000 --> 00:00:37,000 that could make the utility of this state lower than it really should be. 13 00:00:37,000 --> 00:00:40,000 But it can't make the utility too high. 14 00:00:40,000 --> 00:00:43,000 So we really have a bound on the utility here. 15 00:00:43,000 --> 00:00:47,000 The bound is: what does the optimal policy do? 16 00:00:47,000 --> 00:00:49,000 And no matter what policy we have, 17 00:00:49,000 --> 00:00:51,000 it's not going to be better than the optimal policy; 18 00:00:51,000 --> 00:00:54,000 and so we can only be making things worse 19 00:00:54,000 --> 00:00:56,000 with our policy, not making them better. 20 00:00:56,000 --> 00:01:00,000 And finally, having more N won't necessarily improve things. 21 00:01:00,000 --> 99:59:59,999 It will decrease the variance, but it won't decrease or improve the mean.