[computer-go] Fast UCT (epsilon trick)

Łukasz Lew lukasz.lew at gmail.com
Sun Jan 14 01:50:03 PST 2007


Hi
I've looked back into the article and I see that it is not trivial to
understand how does the epsilon trick (ET) works. Therefore I will
describe it here on the example of UCT.

The bottleneck of UCT algorithm is choosing a move. Program has to
iterate over all children
evaluating their UCB value. We want to use of this loop rarely.
One way to do it is to group playouts together. I.e. play 2 or more MC
simulations per each tree traversal. But this is not good enough as it
doesn't distribute
the playouts over the tree and some parts will not get enough of them.

The ET works as follows:
With each node of the tree program keeps additional information:
- child_to_visit
- visits_to_go


When we visit node,
if node.visits_to_go > 0 then
  node->visits_to_go <- node.visits_to_go - 1
  node = child_to_visit

So we have predetermined strategy to go to particular child visits_to_go times.

But
if node.visis_to_go = 0 then      // we have to set the strategy first
  node->child_to_visit = UCT_find_child (node)
  node->visits_to_go = Epsilon * node->visited_count   // round up

As we can see, when Epsilon is small or when a particular node is
visited not too many times, UCT-ET is identical to UCT.

But when node was visited more times (near the root, far from leaves),
then UCT-ET may save some time and go to precalculated child.

That's it.
I hope You like it :)
Łukasz Lew

PS.

In Df-PN, good Epsilon was 1/4 I believe that here it will be much
smaller. 1/64 maybe.
Or even one may change the Epsilon equation to:
node->visits_to_go = Epsilon * sqrt(node->visited_count)

If someone will do some experiments, I'm eager to hear the results.

PS2

float InvSqrt (float x) {
   float xhalf = 0.5f * x;
   int i = *(int*)&x;
   i = 0x5f3759df - (i>>1);
   x = *(float*)&i;
   x = x * (1.5f - xhalf * x * x);
   return x;
}

;-)


More information about the computer-go mailing list