And here are the answers: For the error introduced by not having enough samples,
all of these are true.
If you don't have enough samples,
it might make the utility too high; it might make the utility too low--
and it could certainly be improved by taking more trials.
But for the error due to not having quite the right policy,
the answers aren't the same.
So yes, if you don't have the right policy,
that could make the utilities too low--if you're doing something silly,
like starting in this state and the policy says,
"Drive straight into the minus 1,"
that could make the utility of this state lower than it really should be.
But it can't make the utility too high.
So we really have a bound on the utility here.
The bound is: what does the optimal policy do?
And no matter what policy we have,
it's not going to be better than the optimal policy;
and so we can only be making things worse
with our policy, not making them better.
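The bound just described can be written compactly. The notation here is an assumption and does not come from the lecture: $U^{\pi}(s)$ is the true utility of state $s$ when following policy $\pi$, and $\pi^{*}$ is the optimal policy.

```latex
U^{\pi}(s) \;\le\; U^{\pi^{*}}(s) \;=\; \max_{\pi'} U^{\pi'}(s)
\qquad \text{for every state } s \text{ and every policy } \pi .
```

So the error from using the wrong policy, $U^{\pi}(s) - U^{\pi^{*}}(s)$, is never positive: it can push the utility down, but never up.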
And finally, having a larger N won't necessarily improve things.
It will decrease the variance of the estimate, but it won't move the mean any closer to the true utility.
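To see both effects together, here is a minimal simulation sketch. Everything in it is a made-up illustration rather than the grid world from the lecture: a one-step world where the chosen move succeeds with probability 0.8, a "good" policy that aims at +1, and a "bad" policy that aims at -1. The names `P_SUCCESS`, `sample_return`, `estimate_utility`, and `spread_of_estimates` are hypothetical.

```python
import random
import statistics

# Assumption for illustration: the intended move succeeds 80% of the time.
P_SUCCESS = 0.8

def sample_return(policy, rng):
    """Run one trial and return the reward (+1 or -1) that was reached."""
    intended = 1.0 if policy == "good" else -1.0
    return intended if rng.random() < P_SUCCESS else -intended

def estimate_utility(policy, n_trials, rng):
    """Estimate the utility as the average reward over n_trials trials."""
    total = sum(sample_return(policy, rng) for _ in range(n_trials))
    return total / n_trials

def spread_of_estimates(policy, n_trials, repeats, rng):
    """Repeat the estimate many times; return (mean, std dev) of the estimates."""
    estimates = [estimate_utility(policy, n_trials, rng) for _ in range(repeats)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

# True expected utilities in this toy world.
true_good = P_SUCCESS * 1.0 + (1 - P_SUCCESS) * -1.0   # best achievable here
true_bad = -true_good                                   # what the bad policy earns

rng = random.Random(0)
results = {n: spread_of_estimates("bad", n, 200, rng) for n in (10, 1000)}

for n, (mean_est, sd_est) in results.items():
    print(f"N={n:5d}  mean of estimates={mean_est:+.3f}  spread={sd_est:.3f}")
print(f"bad policy's true utility={true_bad:+.3f}, optimal utility={true_good:+.3f}")
```

With small N the individual estimates scatter widely, so a single one can land too high or too low. With large N the scatter shrinks, but the estimates settle on the bad policy's own utility, which stays below the optimal one no matter how many trials are run.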