[computer-go] Genetic playout algorithms

Jason House jason.james.house at gmail.com
Thu Jul 5 17:01:44 PDT 2007


Darren Cook wrote:
> I've been toying with the idea of having a set of playout algorithms and
> allowing black and white to choose different algorithms in that playout.
>  (The idea came from trying to think how I could apply genetic
> algorithms to UCT playouts.)
>
> Here's how it would work. Assume you have 4 algorithms, A/B/C/D, some
> aggressive, some defensive, etc. All with a random element. For the
> first 16 playouts you try all combinations:
>   Black uses A, White uses A;
>   Black uses A, White uses B;
>   ...
>   Black uses D, White uses D;
>
> Now, if you notice any trends, emphasize them in the choice of
> future playout algorithms. So if black never won with algorithm A, but
> always won with B, and won about half with C and D, then he'd choose A
> 5% of the time, B 45% of the time, C 25% and D 25% of the time. White
> may have found playout algorithms A and B won 3 out of 4, whereas C and
> D only won 1 out of 4. So white would choose A and B 35% of the time
> each, and C and D 15% of the time each.
>
> In go terms, white may be ahead on territory but have a lot of
> weaknesses. Algorithm A might be weighting responding to the last enemy
> move. Algorithm B might be encouraging making good shape. Algorithm C
> might encourage capture or atari when you can. D might prefer areas of
> equal influence.
>
> So, after those initial 16 playouts each side would be choosing a
> playout algorithm that better exploits their current position.
>
> I have also considered, instead of 4 distinct algorithms, one algorithm
> with some tunable parameters. The idea then would be to adjust the
> parameters, based on what wins and loses, before doing the next playout.
>
> Darren
>
I've been thinking about that type of thing too. It seems like Remi's
framework for learning from professional games could be applied in a
similar manner.
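
For concreteness, the selection step in the quoted proposal could be
sketched roughly as below. This is only an illustration, not code from
either poster: the names selection_probs and pick_algorithm, the 5%
floor, and the linear mixing rule are all assumptions. They were chosen
because they happen to reproduce the percentages in the quoted example
(5/45/25/25 for black, 35/35/15/15 for white); other weightings would
fit the description equally well.

```python
import random


def selection_probs(win_rates, floor=0.05):
    """Turn per-algorithm win rates into selection probabilities.

    Assumed rule (not from the original post): every algorithm keeps a
    small floor probability so it is never ruled out entirely, and the
    remaining probability mass is shared in proportion to win rate.
    """
    n = len(win_rates)
    total = sum(win_rates.values())
    if total == 0:
        # No algorithm has won yet, so fall back to a uniform choice.
        return {name: 1.0 / n for name in win_rates}
    spare = 1.0 - n * floor
    return {name: floor + spare * w / total for name, w in win_rates.items()}


def pick_algorithm(probs, rng=random):
    """Sample one algorithm name according to its selection probability."""
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Win rates from the quoted example, after the first 16 playouts.
black = selection_probs({"A": 0.0, "B": 1.0, "C": 0.5, "D": 0.5})
white = selection_probs({"A": 0.75, "B": 0.75, "C": 0.25, "D": 0.25})

# Each side then draws its playout algorithm independently.
black_choice = pick_algorithm(black)
white_choice = pick_algorithm(white)
```

In a real engine the win rates would presumably keep being updated
after every playout rather than frozen after the first 16, but the
quoted post does not specify that, so the sketch leaves it out.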

