[computer-go] Way MC plays

Jacques Basaldúa jacques at dybot.com
Sat Mar 1 06:03:42 PST 2008


Don Dailey wrote:

> I personally have serious doubts about knowledge extraction from human
> games, but I hope you have success.    I think you can get more from
> computer games of strong players even though the level is weaker. Here
> is why I say that:

> 1.  A strong computer still plays a lot of good moves - so the delta
> between human and computer games is not as high as you think.

> 2.  A certain consistency in computer games that humans don't possess. 

> 3.  You have access to the internals, such as a score that quantifies
> moves.

My idea combines offline knowledge with MC methods. So there is a lot
of online computer generated knowledge. I don't pretend to make a dumb
savant program, just playing the only move it finds in a joseki database.
But if the program considers the joseki stored in (state, action[i-1]) 
-> action[i] records. It will immediately start searching a tree whose
first move at all depths is the joseki sequence. If the joseki depends
on a ladder, it will find that ladder much earlier and if it is not good
it will hopefully play something else. At blitz time settings, playing 
the wrong  joseki is bad but not terrible. That would be about the level 
of a human dan making a mistake in a blitz game, still better than current 
programs.

In this context, human knowledge produces a priori values for UCT. (The 
exact formula of the MC tree search is not yet decided.) There is enough
computer generated knowledge in the search. When the end of the game 
is still 200+ moves ahead, the search alone does not find enough 
difference between good and not so good moves.

The knowledge, I think, makes sense to extract is:

1. (state, last move) -> (next moves list) records (joseki or places
to play when the last move is somewhere else.)
2. urgency of shapes
3. distribution of statistics (e.g. proportion of times tenuki is played 
at move n) that helps making playouts more humanlike while still fast.
I finally abandoned full board Bradley-Terry scored random playouts. I 
draw a random number to decide if tenuki should be played, if true I 
invent a random move and use it as if it was the previous move. Else,
I play a Bradley-Terry scored random move in the 40 neighbors of the 
last move. That is almost immediate in my hologram board system because
I store the mask. On other implementations it may not be so good.

About urgency of shapes:

The urgency of a BT adjusted 40 neighbors shape hits 40% in a 10 M+ sample 
with about 160 K patterns. (Overlearning should not be very important
given the lack of degrees of liberty.) It hits 66% with the first 5 moves.

 Hits of move  1 =  0.402048 acc =  0.402048
 Hits of move  2 =  0.117102 acc =  0.519151
 Hits of move  3 =  0.065450 acc =  0.584600
 Hits of move  4 =  0.045592 acc =  0.630193
 Hits of move  5 =  0.035190 acc =  0.665382

That sounds great news. But there is bad news of course. That doesn't go any 
further. To reach 80% it needs 14 moves

 Hits of move 14 =  0.006206 acc =  0.801095

And to reach 90% it needs 96 moves!

 Hits of move 96 =  0.001323 acc =  0.900994

Shape is a factor that explains, perhaps 2/3 of the moves (shape alone about 
40%, but shape "in the right place" a little more than 66%). Not less and
_not more_ than that. It is naive to think that the 12th move in terms of 
shape is better than the 50th. But, imagine there are 250 legal moves, the 
248th is a really bad move. 

Can shape make a difference? I don't know. If its slow, surely not. If it 
is almost free in terms of performance as in my board system, I hope it does.


Jacques.




More information about the computer-go mailing list