[Computer-go] learning patterns for mc go

Petr Baudis pasky at ucw.cz
Thu May 6 07:45:25 PDT 2010


On Tue, Apr 27, 2010 at 05:54:33PM +0200, Olivier Teytaud wrote:
> > My problem is that I can't find many papers about learning of MC playout
> > policies, in particular patterns.
> >
> 
> A just published paper about learning MC policies:
> http://hal.inria.fr/inria-00456422/fr/
> It works quite well for Havannah (not tested on hex I think).

Very interesting!

Why have you restricted the tiling to pairs of actions performed by the
same player? This captures "if I played X before, following up with Y
will be good" - but wouldn't "if the opponent played X before, replying
with Y will be good" be at least as useful? Have you also considered
discouraging replies that give very bad results?
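
For concreteness, here is a minimal sketch of the kind of opponent-reply
table I have in mind - the table layout, the weak prior and the
selection rule are all my own assumptions for illustration, not anything
taken from the paper:

import random
from collections import defaultdict

# Win/try statistics for the pair (opponent's previous move, our reply).
# The weak prior of 1 win in 2 tries is an arbitrary choice.
reply_stats = defaultdict(lambda: [1, 2])  # [wins, tries]

def update_reply(prev_opponent_move, our_reply, won):
    """Credit (or discredit) a reply after a finished playout."""
    stats = reply_stats[(prev_opponent_move, our_reply)]
    if won:
        stats[0] += 1
    stats[1] += 1

def pick_reply(prev_opponent_move, legal_moves, temperature=1.0):
    """Pick a reply with weight proportional to its win rate; replies
    with a consistently bad record get correspondingly discouraged."""
    weights = []
    for mv in legal_moves:
        wins, tries = reply_stats[(prev_opponent_move, mv)]
        weights.append((wins / tries) ** (1.0 / temperature))
    return random.choices(legal_moves, weights=weights)[0]

Because losses are recorded too, a reply that keeps failing gets
progressively suppressed, rather than merely not being reinforced.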

One thing I have hit when trying to implement something like this is
that minimax prunes a lot of interesting situations: if the sequence
A-B-C is good for me, minimax will quickly redirect the tree to a less
favourable A-X-C (since X is a better reply for the opponent than B),
even though in the simulations B is still very likely to be played.
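
Just to make the mismatch concrete, a toy sketch with made-up values -
not from any actual implementation:

# Toy numbers only: at the opponent's node minimax prefers X over B,
# so the tree's principal variation drops the "interesting" reply B,
# while the pattern-learned playout policy keeps playing B.

# Value of the position after move A, from my point of view (made up).
value_after_A = {"B": 0.9, "X": 0.4}

# In the tree, the opponent minimizes my value, so the PV goes A-X.
tree_reply = min(value_after_A, key=value_after_A.get)

# In the playouts, the learned reply distribution heavily favours B.
playout_prob = {"B": 0.8, "X": 0.2}
playout_reply = max(playout_prob, key=playout_prob.get)

print(tree_reply, playout_reply)  # "X B": the tree studies C after A-X
                                  # while the playouts mostly reach A-B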

> But in the case of Go, the Wang-policy is too strong for being improved like
> that.

Does this imply that you have tried to implement it but weren't
successful, or is this just a feeling?

> (fill board and nakade in http://hal.inria.fr/inria-00386477/)

Thanks - and here I thought I knew about all the recent computer-go
papers... :-)

-- 
				Petr "Pasky" Baudis
When I feel like exercising, I just lie down until the feeling
goes away.  -- xed_over


