And here are the answers. For the error introduced by not having enough samples, all three statements are true. If you don't have enough samples, the estimated utility might come out too high, it might come out too low, and it could certainly be improved by running more trials. But for the error due to not having quite the right policy, the answers aren't the same. Yes, not having the right policy could make the utilities too low. If you're doing something silly, like starting in this state when the policy says, "Drive straight into the minus 1," that could make the utility of this state lower than it really should be. But it can't make the utility too high. So we really have a bound on the utility here. The bound is what the optimal policy achieves. No matter what policy we have, it's not going to be better than the optimal policy, so our policy can only make things worse, not better. And finally, increasing N won't necessarily improve things. It will decrease the variance of the estimate, but it won't move its mean any closer to the true utility.
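
To make the two kinds of error concrete, here is a minimal sketch in Python. It does not use the grid world from the lecture. It assumes a hypothetical one-step problem in which a policy ends in the +1 state with some probability and in the -1 state otherwise. The policy names, the probabilities in `P_WIN`, and all function names are made up for illustration.

```python
import random
import statistics

# Assumed toy problem (not the lecture's grid world): following a policy
# ends in +1 with probability P_WIN[policy], otherwise in -1.
P_WIN = {"optimal": 0.8, "silly": 0.3}


def true_utility(policy):
    """Exact expected reward of a policy in the toy problem."""
    p = P_WIN[policy]
    return p * 1.0 + (1.0 - p) * -1.0


def estimate_utility(policy, n_trials, rng):
    """Average the observed reward over n_trials simulated runs."""
    p = P_WIN[policy]
    total = sum(1.0 if rng.random() < p else -1.0 for _ in range(n_trials))
    return total / n_trials


def spread_of_estimates(policy, n_trials, repeats, seed=0):
    """Repeat the N-trial estimate many times; return (mean, std) of the estimates."""
    rng = random.Random(seed)
    estimates = [estimate_utility(policy, n_trials, rng) for _ in range(repeats)]
    return statistics.mean(estimates), statistics.pstdev(estimates)


if __name__ == "__main__":
    print("true utility, optimal policy:", true_utility("optimal"))
    print("true utility, silly policy:  ", true_utility("silly"))
    for n in (10, 1000):
        mean, std = spread_of_estimates("silly", n, repeats=200)
        print(f"silly policy, N={n}: mean of estimates={mean:.3f}, std={std:.3f}")
```

Under these assumptions, raising N shrinks the spread of the estimates for the silly policy, but their mean stays near that policy's own utility, which sits below the utility of the optimal policy. More trials fix the sampling error, not the policy error.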