[Computer-go] Using RAVE statistics during playout
drake at lclark.edu
Fri Mar 29 10:46:51 PDT 2013
The "Last Good Reply" approach is similar (although not identical) to this.
We (Orego) got an improvement from it. Some others have, some haven't.
On Fri, Mar 29, 2013 at 10:40 AM, Alexander Kozlovsky <
alexander.kozlovsky at gmail.com> wrote:
> I know that RAVE data is typically used during tree traversal.
> But is it possible to use it during random playouts as well, in order to
> increase playout quality?
> At first sight this seems like a dangerous idea, because
> RAVE statistics are gathered incrementally from the same
> playouts, and this can lead to a problematic positive feedback
> loop, as in the saying "The rich get richer and the poor get poorer".
> That is, random initial fluctuations can be amplified over time
> and the statistics become skewed, because good moves that
> receive unlucky initial RAVE data will be ignored
> in future random playouts.
> But what if we treat move selection during a random playout
> as a typical multi-armed bandit problem? Then the algorithm
> for selecting the next playout move could be as follows:
> 1) Select several (say, 4) valid candidate moves for the playout.
> 2) Choose the next move using a multi-armed bandit formula.
> We can do this because for each candidate move we
> know (a) the number of RAVE wins for this move, (b) the number
> of playouts with this move, and (c) the total number of playouts
> (all of these numbers are tied to the current UCT node).
> I think this should add an element of exploration to next-move
> selection and prevent skewing of the RAVE statistics.
> I suspect using RAVE data can improve playout strength.
> Has anybody tried something like this, or is it just a crazy idea?
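For readers who want something concrete, the two-step selection proposed in the quoted message could be sketched roughly as follows. This is only an illustration under assumptions: the message does not say which bandit formula is meant, so UCB1 is used here as one common choice, and the names `ucb1_score`, `pick_playout_move`, and `rave_stats` (a dict mapping a move to its RAVE wins and RAVE playout count at the current UCT node) are hypothetical, not taken from Orego or any other program.

```python
import math
import random


def ucb1_score(rave_wins, rave_visits, total_playouts, c=1.0):
    """UCB1 value of one candidate move (assumed formula, not from the post)."""
    if rave_visits == 0:
        # Untried moves get maximal priority so they are explored at least once.
        return math.inf
    mean = rave_wins / rave_visits
    # max(..., 1) guards against log(0) if the node has no recorded playouts.
    bonus = c * math.sqrt(math.log(max(total_playouts, 1)) / rave_visits)
    return mean + bonus


def pick_playout_move(legal_moves, rave_stats, total_playouts, k=4, c=1.0, rng=random):
    """Step 1: sample up to k candidates. Step 2: pick the best by UCB1."""
    candidates = rng.sample(legal_moves, min(k, len(legal_moves)))

    def score(move):
        wins, visits = rave_stats.get(move, (0, 0))
        return ucb1_score(wins, visits, total_playouts, c)

    return max(candidates, key=score)


if __name__ == "__main__":
    # Hypothetical statistics for three moves at one UCT node.
    stats = {"D4": (9, 10), "Q16": (1, 10), "K10": (5, 10)}
    print(pick_playout_move(["D4", "Q16", "K10"], stats, total_playouts=30, k=2))
```

The exploration constant `c` and the candidate count `k` are tuning knobs; sampling only a few candidates keeps the per-move cost low, while the exploration bonus is what is supposed to counteract the feedback loop described in the message. Whether this actually helps playing strength would have to be tested.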