[Computer-go] learning patterns for mc go
pasky at ucw.cz
Thu May 6 07:45:25 PDT 2010
On Tue, Apr 27, 2010 at 05:54:33PM +0200, Olivier Teytaud wrote:
> > My problem is that I can't find many papers about learning of MC playout
> > policies, in particular patterns.
> A just published paper about learning MC policies:
> It works quite well for Havannah (not tested on hex I think).
Why have you restricted the tiling to actions performed by the same
player? This captures "if I played X before, following up with Y will
be good", but wouldn't "if the opponent played X before, replying with
Y will be good" be at least as useful? Have you also considered
discouraging replies that give very bad results?
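To make the question concrete, here is a minimal sketch (my own, not
from the paper under discussion) of a reply table keyed by the previous
move *and* who played it, so that "opponent played X -> reply Y"
patterns are learned alongside "I played X -> follow up with Y", and
bad replies are discouraged by their falling win rate. All names here
are hypothetical:

```python
from collections import defaultdict

class ReplyTable:
    """Win statistics for replies, keyed by (prev_player, prev_move, reply)."""

    def __init__(self):
        # (prev_player, prev_move, reply) -> [wins, tries]
        self.stats = defaultdict(lambda: [0, 0])

    def update(self, prev_player, prev_move, reply, won):
        """Record the outcome of one simulation that used this reply."""
        s = self.stats[(prev_player, prev_move, reply)]
        if won:
            s[0] += 1
        s[1] += 1

    def score(self, prev_player, prev_move, reply, prior=0.5):
        """Smoothed win rate; unseen replies fall back to the prior.

        Replies that give very bad results drift below the prior and
        would be sampled less often by a playout policy using this score.
        """
        wins, tries = self.stats[(prev_player, prev_move, reply)]
        return (wins + prior) / (tries + 1)
```

A playout policy could then weight candidate moves by
`table.score(opponent, last_move, candidate)` instead of only by
same-player follow-up statistics.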
One thing I hit when trying to implement something like this is that
minimax prunes a lot of interesting situations: if the sequence A-B-C
is good, minimax will quickly redirect to a less good A-X-C, even if
B is very likely to be played in simulations.
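A toy illustration of that pruning effect (my own construction, not
from the thread): after our move A the opponent can answer B or X; B
allows the strong follow-up C (value +1 for us), while X leads to a
merely even position. Minimax drops the A-B-C line even though playouts
may sample B frequently:

```python
def minimax(node, maximizing):
    """Plain minimax over a nested dict; leaves are static values for us."""
    if not isinstance(node, dict):
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)

# After A, the opponent (minimizer) chooses between B and X.
tree = {"B": {"C": 1},    # the good sequence A-B-C
        "X": {"C2": 0}}   # the opponent's refutation
# The opponent answers A with X, so the value after A is 0 and the
# A-B-C continuation vanishes from the principal variation.
print(minimax(tree, maximizing=False))
```

So statistics gathered only along the minimax-preferred lines miss
replies like B that dominate the simulations themselves.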
> But in the case of Go, the Wang-policy is too strong for being improved like
Does this imply that you have tried to implement it but weren't
successful, or is this just a feeling?
> (fill board and nakade in http://hal.inria.fr/inria-00386477/)
Thanks; I had thought I knew about all the recent computer-go papers.
Petr "Pasky" Baudis
When I feel like exercising, I just lie down until the feeling
goes away. -- xed_over