[Computer-go] mini-max with Policy and Value network

Gian-Carlo Pascutto gcp at sjeng.org
Mon May 22 07:09:36 PDT 2017


On 22-05-17 14:48, Brian Sheppard via Computer-go wrote:
> My reaction was "well, if you are using alpha-beta, then at least use
> LMR rather than hard pruning." Your reaction is "don't use 
> alpha-beta", and you would know better than anyone!

There are two aspects to my answer:

1) Unless you've made a breakthrough with value nets, there appears to
be a benefit to keeping the Monte Carlo simulations.

2) I am not sure the practical implementations of both algorithms end up
searching in a different manner.

(1) is an argument against using alpha-beta. If we want to get rid of
the MC simulations - for whatever reason - it disappears. (2) isn't an
argument against. Stating the algorithm in a different manner may make
some heuristics or optimizations more obvious.

> Yes, there is a big difference between LMR in Go and LMR in chess:
> Go tactics take many moves to play out, whereas chess tactics are
> often pretty immediate.

Not sure I agree with the basic premise here.

> So LMR could hurt Go tactics much more than it hurts chess tactics.
> Compare the benefit of forcing the playout to the end of the game.

LMR doesn't prune anything, it just reduces the remaining search depth
for moves that aren't highly rated. So it's certainly not going to make
the search tactically weaker than hard pruning would. If you're talking
about not pruning or reducing at all, you get the issue of the
branching factor again.
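To make the distinction concrete, here is a minimal sketch of LMR inside a negamax alpha-beta search. The toy game-tree format (nested lists with numeric leaves, scored from the side to move's perspective) and the reduction trigger are hypothetical simplifications, not anything from this thread; real engines decide reductions from move ordering, history heuristics, and so on.

```python
# Late Move Reductions in negamax alpha-beta: late moves are searched
# at reduced depth, and only re-searched at full depth if they
# unexpectedly beat alpha. Nothing is hard-pruned.
INF = float("inf")

def negamax(node, depth, alpha=-INF, beta=INF):
    if isinstance(node, (int, float)):
        return node                     # leaf: static score
    if depth == 0:
        return evaluate(node)           # horizon reached
    best = -INF
    for i, child in enumerate(node):
        if i < 2 or depth < 3:
            # Early (presumably well-ordered) moves get full depth.
            score = -negamax(child, depth - 1, -beta, -alpha)
        else:
            # Late moves are searched with reduced depth first...
            score = -negamax(child, depth - 2, -beta, -alpha)
            if score > alpha:
                # ...and re-searched at full depth if they surprise us.
                score = -negamax(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                       # beta cutoff
    return best

def evaluate(node):
    # Crude stand-in static evaluation for unexpanded nodes:
    # walk down first children until a leaf score is found.
    while not isinstance(node, (int, float)):
        node = node[0]
    return node
```

The key point is visible in the late-move branch: a reduced-depth search can miss a deep tactic, but the full-depth re-search recovers any move that still manages to raise alpha, which is why LMR is softer than outright pruning.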

In chess you have quiescence search to filter out the simpler tactics.
I guess Monte Carlo simulations may act similarly, in that they will
raise or lower the score if tactical shenanigans happen in some of the
simulations.
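For readers less familiar with the chess side, a bare-bones quiescence search looks roughly like this. The position format (a dict with a static score and a list of capture replies) is a made-up toy for illustration; real engines generate captures and checks from the actual position.

```python
# Quiescence search: at the nominal horizon, keep searching "noisy"
# moves (here, captures) until the position is quiet, so the static
# evaluation is never taken mid-exchange.
INF = float("inf")

def quiesce(pos, alpha, beta):
    # "Stand pat": the side to move may decline all captures and
    # accept the static evaluation as a lower bound.
    stand_pat = pos["eval"]
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    # Otherwise, try only the tactical replies.
    for capture in pos.get("captures", []):
        score = -quiesce(capture, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```

The stand-pat bound is what lets the search decline losing exchanges, which is the "filter out the simpler tactics" effect mentioned above; MC playouts arguably achieve something similar statistically by playing the exchanges out.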

-- 
GCP
