[Computer-go] 9x9 is last frontier?

Dan dshawul at gmail.com
Tue Mar 6 10:46:07 PST 2018


I am pretty sure it is an MCTS problem and I suspect not something that
could be easily solved with a policy network (could be wrong hree). My
opinon is that DCNN is not
a miracle worker (as somebody already mentioned here) and it is going to
fail  resolving tactics.  I would be more than happy with it if it has same
power as a qsearch to be honest.

Search traps are the major problem with games like Chess, and what makes
transitioning the success of DCNN from Go to Chess non trivial.
The following paper discusses shallow traps that are prevalent in chess. (
https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571 )
They mention traps make MCTS very inefficient.  Even if the MCTS is given
50x more time is needed by an exhaustive minimax tree, it could fail to
find a level-5 or level-7 trap.
It will spend, f.i, 95% of its time searching an asymetric tree of depth >
7 when a shallow trap of depth-7 exists, thus, missing to find the level-7
trap.
This is very hard to solve even if you have unlimited power.

The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in
my opinion and i have hard a hard time seeing how it can be competitive
with Stockfish tactically.

My MCTS chess engine with  AlphaZero like MCTS was averaging was missing a
lot of tactics. I don't use policy or eval networks but qsearch() for eval,
and the policy is basically
choosing which ever moves leads to a higher eval.

a) My first improvement to the MCTS is to use minimax backups instead of
averaging. This was an improvmenet but not something that would solve the
traps

b) My second improvment is to use alphabeta rollouts. This is a rollouts
version that can do nullmove and LMR etc... This is a huge improvment and
none of the MCTS
versons can match it. More on alpha-beta rollouts here (
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
)

So AlphaZero used none of the above improvements and yet it seems to be
tactically strong. Leela-Zero suffered from tactical falls left and right
too as I expected.

So the only explanation left is the policy network able to avoid traps
which I find hard to believe it can identify more than a qsearch level
tactics.

All I am saying is that my experience (as well as many others) with MCTS
for tactical dominated games is bad, and there must be some breakthrough in
that regard in AlphaZero
for it to be able to compete with Stockfish on a tactical level.

I am curious how Remi's attempt at Shogi using AlphaZero's method will
turnout.

regards,
Daniel








On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
computer-go at computer-go.org> wrote:

> Training on Stockfish games is guaranteed to produce a blunder-fest,
> because there are no blunders in the training set and therefore the policy
> network never learns how to refute blunders.
>
>
>
> This is not a flaw in MCTS, but rather in the policy network. MCTS will
> eventually search every move infinitely often, producing asymptotically
> optimal play. But if the policy network does not provide the guidance
> necessary to rapidly refute the blunders that occur in the search, then
> convergence of MCTS to optimal play will be very slow.
>
>
>
> It is necessary for the network to train on self-play games using MCTS.
> For instance, the AGZ approach samples next states during training games by
> sampling from the distribution of visits in the search. Specifically: not
> by choosing the most-visited play!
>
>
>
> You see how this policy trains both search and evaluation to be internally
> consistent? The policy head is trained to refute the bad moves that will
> come up in search, and the value head is trained to the value observed by
> the full tree.
>
>
>
> *From:* Computer-go [mailto:computer-go-bounces at computer-go.org] *On
> Behalf Of *Dan
> *Sent:* Monday, March 5, 2018 4:55 AM
> *To:* computer-go at computer-go.org
> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>
>
>
> Actually prior to this it was trained with hundreds of thousands of
> stockfish games and didn’t do well on tactics (the games were actually a
> blunder fest). I believe this is a problem of the MCTS used and not due to
> for lack of training.
>
>
>
> Go is a strategic game so that is different from chess that is full of
> traps.
>
> I m not surprised Lela zero did well in go.
>
>
>
> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <gcp at sjeng.org> wrote:
>
> On 02-03-18 17:07, Dan wrote:
> > Leela-chess is not performing well enough
>
> I don't understand how one can say that given that they started with the
> random network last week only and a few clients. Of course it's bad!
> That doesn't say anything about the approach.
>
> Leela Zero has gotten strong but it has been learning for *months* with
> ~400 people. It also took a while to get to 30 kyu.
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://computer-go.org/pipermail/computer-go/attachments/20180306/0fdbdfef/attachment.html>


More information about the Computer-go mailing list