[computer-go] Fast UCT (epsilon trick)
Łukasz Lew
lukasz.lew at gmail.com
Sun Jan 14 01:50:03 PST 2007
Hi
I've looked back into the article and I see that it is not trivial to
understand how does the epsilon trick (ET) works. Therefore I will
describe it here on the example of UCT.
The bottleneck of UCT algorithm is choosing a move. Program has to
iterate over all children
evaluating their UCB value. We want to use of this loop rarely.
One way to do it is to group playouts together. I.e. play 2 or more MC
simulations per each tree traversal. But this is not good enough as it
doesn't distribute
the playouts over the tree and some parts will not get enough of them.
The ET works as follows:
With each node of the tree program keeps additional information:
- child_to_visit
- visits_to_go
When we visit node,
if node.visits_to_go > 0 then
node->visits_to_go <- node.visits_to_go - 1
node = child_to_visit
So we have predetermined strategy to go to particular child visits_to_go times.
But
if node.visis_to_go = 0 then // we have to set the strategy first
node->child_to_visit = UCT_find_child (node)
node->visits_to_go = Epsilon * node->visited_count // round up
As we can see, when Epsilon is small or when a particular node is
visited not too many times, UCT-ET is identical to UCT.
But when node was visited more times (near the root, far from leaves),
then UCT-ET may save some time and go to precalculated child.
That's it.
I hope You like it :)
Łukasz Lew
PS.
In Df-PN, good Epsilon was 1/4 I believe that here it will be much
smaller. 1/64 maybe.
Or even one may change the Epsilon equation to:
node->visits_to_go = Epsilon * sqrt(node->visited_count)
If someone will do some experiments, I'm eager to hear the results.
PS2
float InvSqrt (float x) {
float xhalf = 0.5f * x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x * (1.5f - xhalf * x * x);
return x;
}
;-)
More information about the computer-go
mailing list