[Computer-go] mini-max with Policy and Value network
sheppardco at aol.com
Mon May 22 05:48:04 PDT 2017
My reaction was "well, if you are using alpha-beta, then at least use LMR rather than hard pruning." Your reaction is "don't use alpha-beta", and you would know better than anyone!
Yes, LMR in Go differs from LMR in chess in one big way: Go tactics take many moves to play out, whereas chess tactics are often pretty immediate. So LMR could hurt Go tactics much more than it hurts chess tactics. Compare that with the benefit of a Monte Carlo playout, which is forced to run to the end of the game.
From: Computer-go [mailto:computer-go-bounces at computer-go.org] On Behalf Of Gian-Carlo Pascutto
Sent: Monday, May 22, 2017 4:08 AM
To: computer-go at computer-go.org
Subject: Re: [Computer-go] mini-max with Policy and Value network
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning. Given
> the accuracy rate of the policy network, I would guess that even move
> 2 should be reduced.
The question I always ask is: what's the real difference between MCTS with a small UCT constant and an alpha-beta search with heavy Late Move Reductions? Are the explored trees really so different?
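To make the comparison concrete, here is a minimal sketch of how the exploration constant shapes where the search effort goes. It uses plain UCB1 at a single node with fixed, noise-free move values; the function name ucb1_visits, the values, and the constants are illustrative assumptions, not taken from any program discussed here.

```python
import math


def ucb1_visits(values, c, iterations):
    """Count how often UCB1 selects each move at one node.

    values: fixed mean value per move (a noise-free stand-in for playout results).
    c: exploration constant (the "UCT constant").
    """
    visits = [0] * len(values)
    for t in range(1, iterations + 1):
        untried = [i for i, n in enumerate(visits) if n == 0]
        if untried:
            choice = untried[0]  # visit every move once before applying the formula
        else:
            choice = max(
                range(len(values)),
                key=lambda i: values[i] + c * math.sqrt(math.log(t) / visits[i]),
            )
        visits[choice] += 1
    return visits


values = [0.6, 0.5, 0.4]  # moves ordered from best to worst
small_c = ucb1_visits(values, c=0.05, iterations=1000)
large_c = ucb1_visits(values, c=1.4, iterations=1000)
print("small c:", small_c)
print("large c:", large_c)
```

With a small constant the later moves are visited only a handful of times but are never excluded outright, which is roughly the shape of tree that heavy reductions produce.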
In any case, in my experiments Monte Carlo still gives a strong benefit, even with a not-so-strong Monte Carlo part. IIRC this was the case for AlphaGo too, even though they used more training data for the value network than is publicly available. Zen reported the same: Monte Carlo is important.
The main problem is the "only top x moves" part. Late Move Reductions are very nice because there is never a full pruning. This heavy pruning by the policy network, OTOH, seems to be an issue for me. My program has big tactical holes.
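The difference between the two schemes can be sketched on a toy tree. This is a minimal negamax illustration, not the search of any program mentioned in this thread: the Node class, the parameters top_k and lmr_from, and all numbers are assumptions made up for the example. Here value stands in for a value network (from the side to move) and prior for a policy network.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    value: float = 0.0  # static evaluation for the side to move (value-network stand-in)
    prior: float = 0.0  # probability of the move leading here (policy-network stand-in)
    children: List["Node"] = field(default_factory=list)


def search(node, depth, alpha, beta, top_k=None, lmr_from=None):
    """Negamax alpha-beta with optional hard top-k pruning or late-move reductions."""
    if depth <= 0 or not node.children:
        return node.value
    moves = sorted(node.children, key=lambda c: -c.prior)  # policy ordering
    if top_k is not None:
        moves = moves[:top_k]  # hard pruning: later moves are never examined
    best = -math.inf
    for i, child in enumerate(moves):
        # LMR: moves late in the ordering are first searched one ply shallower.
        reduce_by = 1 if (lmr_from is not None and i >= lmr_from and depth >= 2) else 0
        score = -search(child, depth - 1 - reduce_by, -beta, -alpha, top_k, lmr_from)
        if reduce_by and score > alpha:
            # The reduced search looked promising: re-search at full depth.
            score = -search(child, depth - 1, -beta, -alpha, top_k, lmr_from)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


def leaves(*vals):
    return [Node(value=v, prior=1.0 / len(vals)) for v in vals]


# The third move has the lowest prior but is actually the best one.
root = Node(children=[
    Node(value=-0.1, prior=0.6, children=leaves(0.2, 0.4)),
    Node(value=0.1, prior=0.3, children=leaves(0.1, 0.3)),
    Node(value=-0.5, prior=0.1, children=leaves(0.7, 0.9)),
])

hard = search(root, 2, -math.inf, math.inf, top_k=2)    # 0.2: the best move is cut
lmr = search(root, 2, -math.inf, math.inf, lmr_from=1)  # 0.7: the best move is found
print(hard, lmr)
```

In this toy case the reduced search still sees the third move and re-searches it, while the top-2 cut never looks at it. The caveat from the reply above still applies: if the static evaluation at reduced depth does not flag the move, as can happen with long Go tactics, LMR will miss it too.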