[Computer-go] mini-max with Policy and Value network
gcp at sjeng.org
Tue May 23 02:55:33 PDT 2017
On 23-05-17 03:39, David Wu wrote:
> Leela playouts are definitely extremely bad compared to competitors like
> Crazystone. The deep-learning version of Crazystone has no value net as
> far as I know, only a policy net, which means it's going on MC playouts
> alone to produce its evaluations. Nonetheless, its playouts often have
> noticeable and usually correct opinions about early midgame
> positions (as confirmed by the combination of my own judgment as a dan
> player and Leela's value net). Which I find amazing - that it can even
> approximately get these right.
Leela's Monte Carlo playouts were designed and implemented in 2007,
before most of the current literature around them was public. Back then,
they were very "thick" and good enough to make the program one of the
strongest around. Needless to say, in the roughly 9 years when I was
absent from go programming, others made substantial progress in that
area, especially since, before value nets, this was clearly one of the
most important components of strength. Leela's Monte Carlo playouts are
certainly weaker than those of Crazy Stone and Zen, and even Pachi. I have
done work on this in the last year, but a more complete overhaul isn't
in 0.10.0 yet.
Nevertheless (as you also observe below) they still contribute a benefit
to the strength of the engine. That's why I've been consistently saying
that dropping them doesn't seem to be a good idea, and why I like the
orthogonality they provide with the value net (and am generally wary of
methods that tune the playouts with or towards the value net).
> So clearly what's going on is that the playouts allow suicide,
I'll need to reconstruct the position you set up, but this is something
that shouldn't happen. Thank you for pointing it out, I'll try to
confirm on my side.
> Now I'm just speculating. My guess is that somehow 3% of the time, the
> game is scored without black having captured white's group. As in -
> black passes, white passes, white's dead group is still on the board, so
> white wins. The guess would be that liberties and putting it in atari
> increases the likelihood that the playouts kill the group before having
> both players pass and score. But that's just a guess, maybe there's also
> more black magic involving adjusting the "value" of a win depending on
> unknown factors beyond just having a "big win". Would need Gian-Carlo to
> actually confirm or refute this guess though.
Leela allows passes with a very low probability, so your analysis is
> given that they're a significant weight in the evaluation
> alongside the value net, they're probably one of the major things
> holding Leela back at this point.
I assume that as well, which is why I've been doing some work on them,
but I'm also prepared to be disappointed. Note that I didn't choose the
significant weighting arbitrarily: it's set to what gave the maximum
I suspect that when there are multiple options that seem objectively
equally good (from the value net), the playouts also help play towards
the option where it is harder to mess up. In this case, a larger amount
of stochasticity is not a bad thing.
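
To make the tie-breaking point concrete, here is a minimal sketch of a linear blend of the two evaluations. This is an illustration under stated assumptions, not Leela's actual code: the names mixed_eval and pick_move, the 0.5 weight, and the win rates are all hypothetical.

```python
# Hypothetical sketch: blend a value-net estimate with a playout win rate.
# The weight (mix) and all numbers below are illustrative assumptions,
# not values taken from Leela.

def mixed_eval(value_net: float, playout_winrate: float, mix: float = 0.5) -> float:
    """Return a weighted average of two win-probability estimates.

    mix is the weight given to the playouts; (1 - mix) goes to the value net.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be in [0, 1]")
    return (1.0 - mix) * value_net + mix * playout_winrate


def pick_move(candidates: dict, mix: float = 0.5) -> str:
    """Pick the move with the highest blended evaluation.

    candidates maps a move name to (value_net, playout_winrate).
    """
    return max(candidates, key=lambda m: mixed_eval(*candidates[m], mix))


# Two moves the value net rates equally; the playouts win more often
# after "A", i.e. it is harder to mess up in random continuations.
candidates = {
    "A": (0.60, 0.70),
    "B": (0.60, 0.52),
}
best = pick_move(candidates)
```

With the value net tied, the playout term alone decides, so the blend prefers the line that survives noisy continuations more often.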