[Computer-go] Progressive bias
Detlef Schmicker
ds2 at physik.de
Sun Feb 23 02:42:20 PST 2014
Hi,
As a result of my oakfoam scaling tests, I had a look at our progressive
bias implementation.
I noticed that the playing strength is quite sensitive to the exact
form of the progressive bias term. I looked into pachi and the "Progressive
Strategies for Monte-Carlo Tree Search" paper.
I could not find a mathematical justification for the approaches used.
Pachi has an implementation which was justified by implementation
efficiency (if I understood correctly), and
"Progressive Strategies for Monte-Carlo Tree Search"
uses an additive term: H_B/n_i, with H_B representing heuristic
knowledge and n_i the number of playouts of the node.
On the one hand, I suspected that using the playouts of the node (and not
the playouts of the parent) interferes with the UCT term sqrt(log(N)/n_i),
which led me to change this. I also do not see a mathematical reason for
scaling with 1/N; why not 1/N^2 or something like exp(-c*N)?
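To make the candidate decay schedules concrete, here is a minimal Python sketch that puts 1/N, 1/N^2 and exp(-c*N) next to the UCT term. The function names and the default constants (c_uct=1.0, c=0.01) are illustrative assumptions, not taken from oakfoam, pachi or the paper.

```python
import math

def uct_term(parent_playouts, child_playouts, c_uct=1.0):
    """UCT exploration term c_uct * sqrt(log(N) / n_i).

    c_uct is an assumed tuning constant; both counts must be >= 1.
    """
    return c_uct * math.sqrt(math.log(parent_playouts) / child_playouts)

def decay_inverse(n):
    """Bias weight that decays as 1/N."""
    return 1.0 / n

def decay_inverse_square(n):
    """Bias weight that decays as 1/N^2."""
    return 1.0 / (n * n)

def decay_exponential(n, c=0.01):
    """Bias weight that decays as exp(-c*N); c is an assumed rate constant."""
    return math.exp(-c * n)

if __name__ == "__main__":
    # Print how quickly each weight shrinks as the playout count grows.
    for n in (1, 10, 100, 1000):
        print(n, decay_inverse(n), decay_inverse_square(n), decay_exponential(n))
```

All three weights tend to zero as the playout count grows, so each of them lets the heuristic dominate early and the statistics dominate late. They differ only in how fast the heuristic is switched off, which is the quantity that would have to be tuned or justified.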
On the other hand, H_B is not specified at all. One might be tempted to use
gammas (from "Computing Elo Ratings of Move Patterns in the Game of Go"), but
as gammas are products, I thought it might be more correct to use their
log as an additive term.
So my current progressive term is
log(gamma)/N,
with gamma from the Elo paper and N being the playouts of the parent
node. (This gives about 80 Elo improvement over the term gamma/n_i, tested
on 9x9 with 5000 playouts/move against pachi.)
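The two bias terms compared here can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the c_uct constant and the +1 added to each playout count (to avoid division by zero and log(0) on unvisited nodes) are hypothetical and are not taken from the oakfoam or pachi code.

```python
import math

def bias_gamma_over_child(gamma, child_playouts):
    """Baseline term gamma / n_i (n_i = playouts of the child node).

    The +1 is an assumption to keep unvisited nodes finite.
    """
    return gamma / (child_playouts + 1)

def bias_log_gamma_over_parent(gamma, parent_playouts):
    """Modified term log(gamma) / N (N = playouts of the parent node).

    The +1 is an assumption to keep unvisited nodes finite.
    """
    return math.log(gamma) / (parent_playouts + 1)

def selection_value(win_rate, gamma, parent_playouts, child_playouts, c_uct=1.0):
    """Win rate + UCT exploration term + log(gamma)/N bias.

    c_uct is an assumed tuning constant.
    """
    explore = c_uct * math.sqrt(
        math.log(parent_playouts + 1) / (child_playouts + 1)
    )
    return win_rate + explore + bias_log_gamma_over_parent(gamma, parent_playouts)

if __name__ == "__main__":
    # A gamma is a product of feature strengths, so its log is a sum of
    # per-feature terms, which fits an additive bias.
    features = [2.0, 3.0, 0.5]
    gamma = math.prod(features)
    print(math.log(gamma), sum(math.log(g) for g in features))
    print(selection_value(0.5, gamma, parent_playouts=100, child_playouts=10))
```

One consequence of taking the log: a move with gamma below 1 receives a negative bias, and gamma = 1 is neutral, whereas gamma/n_i is always positive.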
But I would feel better with mathematical arguments for using 1/N and
log(gamma).
Any hints would be greatly appreciated :)
Detlef