[Computer-go] UCB-1 tuned policy
weiqiprogramming at gmail.com
Tue Apr 14 03:36:39 PDT 2015
I implemented UCB1-tuned in my basic UCB-1 go player, but it doesn't
seem like it makes a difference in self-play.
It seems like it's able to run 5-25% more simulations, which means it's
probably exploiting deeper (and has fewer steps until it runs out of room
to play legal moves), but I have yet to see any strength improvements on
As far as I understand, the only thing that's different is the formula.
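
For reference, here is a minimal sketch of the two selection formulas as given in the original UCB paper (Auer, Cesa-Bianchi and Fischer). It assumes playout results are 0 (loss) or 1 (win), so the sum of squared rewards equals the win count. The function and parameter names are made up for illustration and are not taken from my player.

```python
import math


def ucb1(wins, visits, total_visits):
    """UCB1 score for one move: mean reward plus sqrt(2 ln n / n_j)."""
    mean = wins / visits
    return mean + math.sqrt(2.0 * math.log(total_visits) / visits)


def ucb1_tuned(wins, visits, total_visits):
    """UCB1-tuned score: the constant 2 is replaced by min(1/4, V_j)."""
    mean = wins / visits
    log_ratio = math.log(total_visits) / visits
    # Assumption: rewards are 0/1, so the mean of squared rewards equals
    # the mean itself and the sample variance is mean - mean^2.
    variance_bound = mean - mean * mean + math.sqrt(2.0 * log_ratio)
    return mean + math.sqrt(log_ratio * min(0.25, variance_bound))
```

Since min(1/4, V_j) never exceeds 1/4, the UCB1-tuned exploration bonus is at most about 0.35 times the UCB1 bonus for the same counts, so it should explore less, which would fit with what I'm seeing.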
Has anyone actually seen any difference between the two algorithms?