[computer-go] Improvement of UCT search algorithm
Don Dailey
drd at mit.edu
Sun Oct 8 20:27:34 PDT 2006
On Sun, 2006-10-08 at 15:37 +0200, Łukasz Lew wrote:
> I would like to add that in my experiments with a small number of
> playouts ~30000, without all as first heuristics, the common situation
> was that a bot explored the tree more or less properly, but in the end
> it finished with a wrong move, because last refutation found didn't
> manage to make a significant change to the root decision.
>
> How do You deal with this problem?
> How do You choose the final move?
This is no general solution other than more time. More time is always
the answer and is why this algorithm is scalable.
But there may be things that will help a bit. I don't select a move
unless it is both best and most sampled unless I have reach a maximum
time threshold, in which case I select the most sampled move instead of
the best. I don't apply this rule if the score is close to the
extremes, in other words the position is pretty much won or lost because
my algorithm tends to sample all the moves evenly in lost game
situations and in won game situations there is little to gained and I
want my time to be applied mostly to critical moves.
It would be nice to know if a move is best overall and also best based
on the last N samples or using a heuristic such as yours. Then you
might be in a position to say, "hey, this new move looks like it might
be a winner - let's wait to see if this is just noise."
But this is a recursive problem. Some subtree from the root may have a
new best move coming along slowly and it's difficult to tell noise from
a truly good move.
- Don
More information about the computer-go
mailing list