[Computer-go] naive thoughts on enthalpy and dynamic komi

Nick Wedd nick at maproom.co.uk
Fri Feb 4 05:02:27 PST 2011


In a Monte-Carlo program, the amount of information (in bits) derived
from one playout is given by its entropy
   -p(win)*log_2(p(win)) - p(lose)*log_2(p(lose)).
This has a maximum of 1 bit at p(win) = 0.5, and is 0 if p(win) is 0 or 1.
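
As a minimal sketch of that formula (the function name
playout_information is my own, not from any Go program), with the
0*log(0) case taken as 0:

```python
import math

def playout_information(p_win: float) -> float:
    """Binary entropy, in bits, of a playout won with probability p_win."""
    # By convention 0 * log2(0) is treated as 0, so a certain
    # win or a certain loss carries no information.
    if p_win <= 0.0 or p_win >= 1.0:
        return 0.0
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)
```

Calling playout_information(0.5) gives 1.0, the maximum, and the value
falls towards 0 as p_win approaches 0 or 1.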

So, suppose your MC Go-playing program is doing its playouts, and has
found several moves which have all won more than 90% of the time.  It
can do more playouts with these moves, but this is a poor way of getting
more information about which of them is best.  If instead it pretends
that it will have to give an extra 10 points of komi, maybe it finds
that these moves now win, on average, only 60% of the time.  Now the
entropy of the playouts is greater, so it is gathering information
faster.  The information is not as good, because it is measuring the
wrong thing; but it is being gathered more than twice as fast, which
should more than compensate.
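
To check the "more than twice as fast" figure, here is a small sketch
that compares the information per playout at the two win rates used
above (90% and 60%).  The helper name binary_entropy is my own.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy in bits of an event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

bits_at_90 = binary_entropy(0.9)   # about 0.47 bits per playout
bits_at_60 = binary_entropy(0.6)   # about 0.97 bits per playout
speedup = bits_at_60 / bits_at_90  # about 2.07

print(f"{bits_at_90:.3f} bits vs {bits_at_60:.3f} bits, ratio {speedup:.2f}")
```

The ratio is a little over 2 at exactly 90%, and grows if the original
win rate is higher than 90%.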

Similarly, suppose its best move has won less than 10% of the playouts. 
It could resign, but let's say it is giving a handicap to a weaker 
player.  Instead of just doing more playouts, it can pretend that it 
will be receiving extra komi.  Again, the quality of the information per 
playout then drops, but the quantity goes up, hopefully by more than 
enough to compensate.

This seems like an argument for using dynamic komi, adjusted from time 
to time during each game move.
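
One hypothetical way to do that adjustment is sketched below.  The
function name, the 40%/60% thresholds and the one-point step are all
assumptions of mine for illustration, not something taken from an
existing program.  Positive extra_komi means the program pretends to
give komi; negative means it pretends to receive it.

```python
def adjust_komi(extra_komi: float, win_rate: float,
                low: float = 0.4, high: float = 0.6,
                step: float = 1.0) -> float:
    """Nudge the pretended extra komi so the win rate drifts towards 50%.

    All thresholds and the step size are illustrative assumptions.
    """
    if win_rate > high:
        # Winning too easily: pretend to give more komi.
        return extra_komi + step
    if win_rate < low:
        # Losing too often: pretend to receive more komi.
        return extra_komi - step
    # Win rate is already in the informative range: leave komi alone.
    return extra_komi
```

Called every so often during the search for one move, with the win
rate of the current best move, this would raise the pretended komi
when the playouts are nearly all wins and lower it when they are
nearly all losses.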

Nick
-- 
Nick Wedd    nick at maproom.co.uk


