[Computer-go] UCB-1 tuned policy

Igor Polyakov weiqiprogramming at gmail.com
Tue Apr 14 03:36:39 PDT 2015


I implemented UCB1-Tuned in my basic UCB1 Go player, but it doesn't 
seem to make a difference in self-play.

It seems to be able to run 5-25% more simulations, which means it's 
probably exploiting deeper (and has fewer steps until it runs out of room 
to play legal moves), but I have yet to see any strength improvement on 
9x9 boards.

As far as I understand, the only thing that's different is the formula. 
Has anyone actually seen any difference between the two algorithms?
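
For reference, here is a minimal sketch of the two formulas in their 
standard form (Auer, Cesa-Bianchi and Fischer, 2002), assuming rewards 
in [0, 1], e.g. 1 for a win and 0 for a loss. The function and parameter 
names are hypothetical and not taken from any particular Go program, and 
an actual player may use a different exploration constant for UCB1:

```python
import math


def ucb1(mean, n_total, n_arm):
    """UCB1 score: mean reward plus sqrt(2 * ln(n) / n_j).

    mean    -- average reward of this move so far (assumed in [0, 1])
    n_total -- total number of simulations over all moves (n)
    n_arm   -- number of simulations through this move (n_j), must be > 0
    """
    return mean + math.sqrt(2.0 * math.log(n_total) / n_arm)


def ucb1_tuned(mean, sum_sq, n_total, n_arm):
    """UCB1-Tuned score: the constant 2 is replaced by min(1/4, V_j).

    sum_sq -- sum of squared rewards for this move; with 0/1 rewards
              this is simply the number of wins
    """
    log_term = math.log(n_total) / n_arm
    # Upper confidence bound on the variance of this move's reward:
    # sample variance plus an exploration term.
    variance_bound = sum_sq / n_arm - mean * mean + math.sqrt(2.0 * log_term)
    # 1/4 is the largest possible variance of a reward in [0, 1].
    return mean + math.sqrt(log_term * min(0.25, variance_bound))


if __name__ == "__main__":
    # Hypothetical move: 5 wins in 10 simulations, 100 simulations in total.
    print(ucb1(0.5, 100, 10))
    print(ucb1_tuned(0.5, 5, 100, 10))
```

With these definitions the exploration term of UCB1-Tuned is at most 
sqrt(1/8), roughly 0.35, times that of UCB1 for the same counts, so it 
explores less and exploits more.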


