[Computer-go] 9x9 is last frontier?

uurtamo . uurtamo at gmail.com
Tue Mar 6 14:02:40 PST 2018


Summarizing the objections to my (non-evidence-based, but hand-wavy
observationally-based) assertion that 9x9 is going down anytime someone
really wants it to go down, I get the following:

* value networks can't hack it (okay, maybe? does this make it less likely?
-- we shouldn't expect to cut-and-paste.)
* double ko is some kind of super special problem on 9x9
* the margins are way narrower

the second issue (ko in general, but multi-ko) is exactly why go is
pspace-complete (I made a rough argument for the fact that under slight
relaxation it isn't, but didn't flesh it out or publish it).

I am not feeling strong arguments about the overall picture (i.e. it's
super much harder than 19x19 to beat humans at) other than that the margins
are narrower.

Does anyone else have a different synopsis?

Thanks,

Steve




On Tue, Mar 6, 2018 at 12:17 PM, Brian Sheppard via Computer-go <
computer-go at computer-go.org> wrote:

> Well, AlphaZero did fine at chess tactics, and the papers are clear on the
> details. There must be an error in your deductions somewhere.
>
>
>
> *From:* Computer-go [mailto:computer-go-bounces at computer-go.org] *On
> Behalf Of *Dan
> *Sent:* Tuesday, March 6, 2018 1:46 PM
>
> *To:* computer-go at computer-go.org
> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>
>
>
> I am pretty sure it is an MCTS problem and I suspect not something that
> could be easily solved with a policy network (could be wrong hree). My
> opinon is that DCNN is not
>
> a miracle worker (as somebody already mentioned here) and it is going to
> fail  resolving tactics.  I would be more than happy with it if it has same
> power as a qsearch to be honest.
>
>
>
> Search traps are the major problem with games like Chess, and what makes
> transitioning the success of DCNN from Go to Chess non trivial.
>
> The following paper discusses shallow traps that are prevalent in chess. (
> https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
> )
>
> They mention traps make MCTS very inefficient.  Even if the MCTS is given
> 50x more time is needed by an exhaustive minimax tree, it could fail to
> find a level-5 or level-7 trap.
>
> It will spend, f.i, 95% of its time searching an asymetric tree of depth >
> 7 when a shallow trap of depth-7 exists, thus, missing to find the level-7
> trap.
>
> This is very hard to solve even if you have unlimited power.
>
>
>
> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version in
> my opinion and i have hard a hard time seeing how it can be competitive
> with Stockfish tactically.
>
>
>
> My MCTS chess engine with  AlphaZero like MCTS was averaging was missing a
> lot of tactics. I don't use policy or eval networks but qsearch() for eval,
> and the policy is basically
>
> choosing which ever moves leads to a higher eval.
>
>
>
> a) My first improvement to the MCTS is to use minimax backups instead of
> averaging. This was an improvmenet but not something that would solve the
> traps
>
>
>
> b) My second improvment is to use alphabeta rollouts. This is a rollouts
> version that can do nullmove and LMR etc... This is a huge improvment and
> none of the MCTS
>
> versons can match it. More on alpha-beta rollouts here (
> https://www.microsoft.com/en-us/research/wp-content/
> uploads/2014/11/huang_rollout.pdf )
>
>
>
> So AlphaZero used none of the above improvements and yet it seems to be
> tactically strong. Leela-Zero suffered from tactical falls left and right
> too as I expected.
>
>
>
> So the only explanation left is the policy network able to avoid traps
> which I find hard to believe it can identify more than a qsearch level
> tactics.
>
>
>
> All I am saying is that my experience (as well as many others) with MCTS
> for tactical dominated games is bad, and there must be some breakthrough in
> that regard in AlphaZero
>
> for it to be able to compete with Stockfish on a tactical level.
>
>
>
> I am curious how Remi's attempt at Shogi using AlphaZero's method will
> turnout.
>
>
>
> regards,
>
> Daniel
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
> computer-go at computer-go.org> wrote:
>
> Training on Stockfish games is guaranteed to produce a blunder-fest,
> because there are no blunders in the training set and therefore the policy
> network never learns how to refute blunders.
>
>
>
> This is not a flaw in MCTS, but rather in the policy network. MCTS will
> eventually search every move infinitely often, producing asymptotically
> optimal play. But if the policy network does not provide the guidance
> necessary to rapidly refute the blunders that occur in the search, then
> convergence of MCTS to optimal play will be very slow.
>
>
>
> It is necessary for the network to train on self-play games using MCTS.
> For instance, the AGZ approach samples next states during training games by
> sampling from the distribution of visits in the search. Specifically: not
> by choosing the most-visited play!
>
>
>
> You see how this policy trains both search and evaluation to be internally
> consistent? The policy head is trained to refute the bad moves that will
> come up in search, and the value head is trained to the value observed by
> the full tree.
>
>
>
> *From:* Computer-go [mailto:computer-go-bounces at computer-go.org] *On
> Behalf Of *Dan
> *Sent:* Monday, March 5, 2018 4:55 AM
> *To:* computer-go at computer-go.org
> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>
>
>
> Actually prior to this it was trained with hundreds of thousands of
> stockfish games and didn’t do well on tactics (the games were actually a
> blunder fest). I believe this is a problem of the MCTS used and not due to
> for lack of training.
>
>
>
> Go is a strategic game so that is different from chess that is full of
> traps.
>
> I m not surprised Lela zero did well in go.
>
>
>
> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <gcp at sjeng.org> wrote:
>
> On 02-03-18 17:07, Dan wrote:
> > Leela-chess is not performing well enough
>
> I don't understand how one can say that given that they started with the
> random network last week only and a few clients. Of course it's bad!
> That doesn't say anything about the approach.
>
> Leela Zero has gotten strong but it has been learning for *months* with
> ~400 people. It also took a while to get to 30 kyu.
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://computer-go.org/pipermail/computer-go/attachments/20180306/952cbc92/attachment.html>


More information about the Computer-go mailing list