[computer-go] Floating komi

Raymond Wold computergolist at w-wins.com
Wed Mar 5 10:06:51 PST 2008


Don Dailey wrote:
> 
> Hideki Kato wrote:
>> I'd like to give here an example to make things clear.
>>
>> The conditions are:
>> 1) Using a digitizing scheme that maps the real score to [0,1] (or [-1,1]) 
>> so that the program cannot distinguish losing/winning by 0.5 or 10.5 
>> pt at all.
>> 2) Playouts include some foolish moves (usually with low 
>> but not zero probability), such as not connecting large groups in 
>> atari, in order to keep their randomness.
>> 3) The position is at early endgame where there are no moves that 
>> gain greater than 2 pt, for example, in perfect play.
>> 4) Black is behind by 0.5 pt.
>>
>> The playouts may return a winning but gambling move (perhaps with low 
>> probability) under the above conditions, especially when the number 
>> of playouts is small, which is usually true on 19x19, and UCT will 
>> choose it.
>>
>> The question is, which is better: to stay 0.5 pt behind, or to play 
>> gambling moves (by which I mean moves that lose B many pts if W 
>> answers correctly) in the hope of W's (stupid) mistakes?
>>
>>   
> The assumption is that you suddenly cannot trust MC to do what it does
> best even though you did for the entire game up until this point.   MC
> of course will choose the "gambling" move.      The whole concept of MC
> is to do what is most likely to produce a win.      

Not entirely, no. The concept of MC is to do what has most lines leading
to a win, which is slightly different. There's obviously a strong
correlation, or MC wouldn't work at all, but I think it's dangerous to
assume that MC by definition plays the best move. For one thing, that
assumption makes it very hard to argue about how to improve MC programs:
it creates lots of noise along the lines of "don't do that, it will only
make your program weaker".
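
To make the difference between "most lines leading to a win" and "best
move" concrete, here is a minimal sketch in Python. The toy position is
hypothetical and so are all the names in it: each candidate move is
reduced to a list with one entry per White reply, True meaning Black
ends up winning. It is not how any real MC program is written, only an
illustration of the two quantities.

```python
from fractions import Fraction

def playout_win_rate(reply_outcomes):
    """Fraction of White replies after which Black wins.

    This is what a uniformly random playout estimates: it counts
    winning lines, treating every White reply as equally likely.
    """
    return Fraction(sum(reply_outcomes), len(reply_outcomes))

def wins_against_best_reply(reply_outcomes):
    """True only if Black wins whatever White answers."""
    return all(reply_outcomes)

# Hypothetical toy position (assumption, not taken from a real game).
safe_move = [False, False, False, False]    # stays 0.5 pt behind in every line
gambling_move = [True, True, True, False]   # refuted by exactly one reply

for name, outcomes in (("safe", safe_move), ("gambling", gambling_move)):
    print(name, playout_win_rate(outcomes), wins_against_best_reply(outcomes))
```

Counting lines ranks the gambling move far above the safe one, although
neither wins against a White who finds the single correct answer.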

> We should think twice before asking it to choose the moves that produce
> the surer loss.    We are the ones that have a bias about this, not
> the MC programs.  
> 
> 
>> In addition to the above, there is one more issue to consider. If the 
>> playout has a systematic error, nakade for example, it's not good to 
>> stay just 0.5 pt ahead.  Having more margin is clearly better.
>>   
> I believe nakade is a strawman.    There are lots of things MC does
> better and lots of things it does less well.    You can always find
> positions that are hard or easy for your program to solve, but it isn't
> intrinsic to this issue.     I don't think you should weaken this
> concept of playing for the best winning chances for the very few
> positions where MC programs take longer to resolve the endgame and there
> is a slight chance that it will win if it just happens to be enough to
> cover the exact situation.   Because this is no solution - it is at best
> a patch and would only work in some cases.

I don't think patching one thing at a time is such a bad way to write a
go program. Small steps, one at a time, and you suddenly have a much
stronger program. And again you're making the assumption that to deviate
from accurate MC means fewer winning chances. It might mean fewer winning
LINES, but the probability of a loss or win is entirely dependent on how
the opponent plays, which is (hopefully) never random. And this does not
mean you're doing opponent modeling, or - if you define opponent
modeling very loosely so it includes this - that what you're doing is bad.
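
As a rough illustration of why fewer winning lines need not mean fewer
winning chances, consider this sketch in Python. It assumes a made-up
opponent who finds a refutation with probability `skill` and otherwise
picks a reply uniformly at random; the function name, the parameters and
the model itself are assumptions for the sake of the example, and the
position is assumed to contain at least one refuting reply.

```python
def black_win_chance(winning_replies, total_replies, skill):
    """Black's chance to win after a move with the given reply counts.

    Assumed opponent model (hypothetical): with probability `skill`
    White plays a refutation and Black loses; otherwise White replies
    uniformly at random, as in a plain playout.
    """
    if not 0.0 <= skill <= 1.0:
        raise ValueError("skill must be between 0 and 1")
    if not 0 <= winning_replies < total_replies:
        raise ValueError("need at least one refuting reply")
    line_fraction = winning_replies / total_replies
    return (1.0 - skill) * line_fraction

# Same move, 3 winning lines out of 4, against different opponents.
for skill in (0.0, 0.5, 1.0):
    print(skill, black_win_chance(3, 4, skill))
```

Only for `skill = 0`, a purely random opponent, does the real winning
chance equal the fraction of winning lines; the count of lines stays the
same while the chance shrinks as the opponent gets better.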

> If I could do something
> that didn't hurt the program in other ways, but might help certain
> positions once in a while,  I would go for it. 

I don't think you'll find ANY improvement to ANY non-trivial program
that doesn't, in some cases, make it play worse. What matters is how it
does in the average case.

> I've been in game
> programming a long time - if you have a problem with certain types of
> positions you really want a pointed solution that has little or no
> impact on other positions.   You don't want to be going back and forth
> fixing things up but you want to solve the problem as correctly as
> possible the first time.       I'll call this principle, "every solution
> has a side effect" but this is a pretty bad side effect.    (I can't
> tell you how many times I "fixed" something in my chess program with
> some evaluation change only to find that I broke many other things at
> the same time.)

A good understanding of _why_ your program works can help this a lot,
ensuring that you know how to fix a problem without causing bigger
problems elsewhere. I think part of the problem here is that few (if
any) people know _why_ MC works. Why does the cumulative result of
random play-outs correlate so strongly with the strength of the
position? In what ways does it NOT correlate, that can be fixed?



