# [Computer-go] mini-max with Policy and Value network

Hideki Kato hideki_katoh at ybb.ne.jp
Tue May 23 01:51:57 PDT 2017

```
Agree.

(1) To solve L&D, some search is necessary in practice, so the
value net cannot solve some of them.
(2) The number of possible positions (inputs to the value net) in
real games is at least 10^30 (10^170 in theory).  Can the value
net recognize them all?  L&Ds depend on very small differences in
the placement of stones or liberties.  Can we provide the necessary
amount of training data?  Does the network have enough capacity?
The answer is almost obvious from the theory of function
approximation.  (An ANN is just a non-linear function
approximator.)
(3) CNNs cannot easily learn the exclusive-or function due to their
activation functions (ReLU or hyperbolic tangent).  CNNs are good at
approximating continuous (analog) functions but not Boolean (digital) ones.
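On point (3), the separability issue behind the XOR example can be sketched in a few lines of Python (my own illustration, not from the thread): no single linear unit can classify XOR, while a tiny two-layer ReLU net can represent it exactly, though learning many such Boolean distinctions from limited data is another matter.

```python
# XOR is not linearly separable: no single linear unit sign(w1*x1 + w2*x2 + b)
# classifies all four inputs correctly. A hidden ReLU layer fixes this; the
# construction below represents XOR exactly.

def relu(x):
    return max(0.0, x)

def xor_net(x1, x2):
    # Two hidden ReLU units, one linear output: relu(x1+x2) - 2*relu(x1+x2-1)
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1)
    return h1 - 2 * h2

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_net(x1, x2) == (x1 ^ x2)

# By contrast, a coarse search over single linear units finds no separator.
import itertools
found = False
for w1, w2, b in itertools.product([-2, -1, -0.5, 0.5, 1, 2], repeat=3):
    ok = all((w1 * x1 + w2 * x2 + b > 0) == bool(x1 ^ x2)
             for x1 in (0, 1) for x2 in (0, 1))
    found = found or ok
print(found)  # False: no linear separator exists for XOR
```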

Hideki

David Wu: <CAGEydYud7otGtD6u-VPv3O1u7n9d44q30fYQkgMGKGKpB-F1Dw at mail.gmail.com>:
>
>Some additional playing around with the same position can flip the roles of
>the playouts and value net - so now the value net is very wrong and the
>playouts are mostly right. I think this gives good insight into what the
>value net is doing and why as a general matter playouts are still useful.
>
>Here's how:
>Play black moves as detailed in the previous email in the "leela10.sgf"
>game that Marc posted to resolve all the misunderstandings of the playouts
>and get it into the "3-10% white win" phase, but otherwise leave white's
>dead group on-board with tons of liberties. Let white have an absolute
>feast on the rest of the board while black simply connects his stones
>solidly. White gets to cut through Q6 and get pretty much every point
>available.
>
>Black is still winning even though he loses almost the entire rest of the
>board, as long as the middle white group dies. But with some fiddling
>around, you can arrive at a position where the value net is reporting 90%
>white win (wrong), while the playouts are rightly reporting only 3-10%
>white win.
>
>Intuitively, the value net only fuzzily evaluates white's group as probably
>dead, but isn't sure that it's dead, so it counts some value
>for white's group "in expectation" for the small chance it lives. And the
>score is otherwise not too far off on the rest of the board - the way I
>played it out, black wins by only ~5 points if white dies. So the small
>uncertainty that the huge white group might actually be alive produces
>enough "expected value" for white to overwhelm the 5 point loss margin,
>such that the value net is 90% sure that white wins.
>
>What the value net has failed to "understand" here is that white's group
>surviving is a binary event. I.e. a 20% chance of the group being alive and
>white winning by 80 points along with an 80% chance that it's dead and white
>losing by 5 points does not average out to white being (0.2 * 80) - (0.8 *
>5) = 12 points ahead overall (although probably the value net doesn't
>exactly "think" in terms of points but rather something fuzzier). The
>playouts provide the much-needed "understanding" that win/loss is binary
>and that the expectation operator should be applied after mapping to
>win/loss outcomes, rather than before.
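The binarization argument above can be made concrete with a toy calculation (an illustration of the reasoning only, not of Leela's internals): averaging the score first and then judging the winner gives a very different answer from judging each outcome first and then averaging.

```python
# Toy model of the position: with probability 0.2 the white dragon lives and
# white wins by 80 points; with probability 0.8 it dies and white loses by 5.
outcomes = [(0.2, +80), (0.8, -5)]  # (probability, score from white's view)

# Wrong order: average the score, then judge the winner from the mean.
expected_score = sum(p * s for p, s in outcomes)   # 0.2*80 - 0.8*5 = 12
naive_white_wins = expected_score > 0               # True: looks won for white

# Right order: map each outcome to win/loss first, then take the expectation.
p_white_win = sum(p for p, s in outcomes if s > 0)  # 0.2

print(expected_score, naive_white_wins, p_white_win)
```

The playouts effectively compute the second quantity, which is why they get this position right while the value net's fuzzier, averaged evaluation does not.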
>
>It seems intuitive to me that a neural net would compute things in too much
>of a fuzzy and averaged way and thereby be vulnerable to this mistake. I
>wonder if it's possible, with the right training, to get a value net to
>handle these things more correctly without otherwise weakening it. As it is,
>I suspect this is a systematic flaw in the value net's ability to produce
>good probabilities of winning in games where the game hinges on the life
>and death chances of a single large dragon, and where the expected score
>could be wildly uncorrelated with the probability of winning.
>
>
>On Mon, May 22, 2017 at 9:39 PM, David Wu <lightvector at gmail.com> wrote:
>
>> Leela playouts are definitely extremely bad compared to competitors like
>> Crazystone. The deep-learning version of Crazystone has no value net as far
>> as I know, only a policy net, which means it's going on MC playouts alone
>> to produce its evaluations. Nonetheless, its playouts often have noticeable
>> and usually correct opinions about early midgame positions (as
>> confirmed by the combination of my own judgment as a dan player and Leela's
>> value net). Which I find amazing - that it can even approximately get these
>> right.
>>
>> On to the game:
>>
>> Analyzing with Leela 0.10.0 in that second game, I think I can infer
>> pretty exactly what the playouts are getting wrong. Indeed the upper left
>> is being disastrously misplayed by them, but that's not all - I'm finding
>> that multiple different things are all being played out wrong. All of the
>> following numbers are on white's turn, to give white the maximal chance to
>> distract black from resolving the tactics in the tree search, thus forcing
>> the playouts to do the work - the numbers are sometimes better if it's
>> black's turn.
>>
>> * At move 186, the playouts as they stand show on the order of 60% for
>> white, despite black absolutely having won the game. I partly played out
>> the rest of the board in a very slow and solid way just in case it was
>> confusing things, but not entirely, so that the tree search would still
>> have plenty of endgame moves to be distracted by. Playouts stayed at 60%
>> for white.
>>
>> * Add bA15, putting white down to 2 liberties: the PV shows an exchange of
>> wA17 and bA14, keeping white at 2 liberties, and it drops to 50%.
>> * Add bA16, putting white in atari: it drops to 40%.
>>
>> So clearly there's some funny business with black in the playouts
>> self-atariing or something, and the chance that black does this lessens as
>> white has fewer liberties and therefore is more likely to die first. Going
>> further:
>>
>> * Add bE14, it drops to 30%.
>>
>> So the potential damezumari is another problem - I guess like 10% of the
>> playouts are letting white kill the big black lump at E13. Now, 30% is
>> still way too high. Where does the rest come from?
>>
>> * Add bA17, having black actually capture: it drops to 20%. So it looks
>> like even having white in atari doesn't stop the playouts from misplaying.
>> Black misplays that corner despite being able to capture white at any
>> time!
>>
>> Now where are the remaining 20% losses coming from?
>>
>> * Have black physically capture the 5 white stones at K18: still 20%
>> * Solidly connect everything for black around the board: still 20%.
>> * Add bK18: drops to 10%.
>>
>> Oh! So 10% more came from black dying at the top. Even after black
>> has physically captured the 5 white stones there and has a living
>> 5-space eye, the playouts have black die there maybe 10% of the time.
>>
>> Okay, now how is black still losing 10% of the time? White literally
>> cannot make 2 eyes even if he gets infinitely many moves in a row. I've
>> already solidly connected everything so that even if black passes forever,
>> white can kill literally nothing of black's and black will have enough
>> points.
>>
>> * Add wK14 or wM14: 3%
>> * Instead of that, add bK14: increases to 20% (!!!). Then have white
>> capture those 2 stones with wM14: 3%.
>> * Instead of all that, add both K16 and M16: 3%.
>>
>> So clearly what's going on is that the playouts allow suicide, despite the
>> Leela GUI not allowing it. That's the only way that white could possibly
>> live - black suicides 3 stones, then white makes 2 eyes. And this neatly
>> explains these observations - adding bK14 puts black one step closer to
>> suiciding 3 stones, so the MC winrate for white rises to 20%. Having white
>> play removes the possibility of black handing white 2 eyes via suicide. And
>> similarly, filling K16 and M16 makes it so that when black plays 3 stones,
>> it's a capture and not a suicide. So the playouts allowing suicide here is
>> the culprit.
>>
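For reference, forbidding suicide in playouts is a cheap check (a generic sketch, not Leela's actual code): after placing the stone, if the move captured nothing and the new group has no liberties, the move is illegal.

```python
# Minimal suicide check on a Go board stored as a dict {(x, y): 'b' or 'w'}.
# Generic illustration only - not Leela's implementation.

def neighbors(p, size):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < size and 0 <= y + dy < size]

def group_liberties(board, p, size):
    color, group, libs, stack = board[p], {p}, set(), [p]
    while stack:
        q = stack.pop()
        for n in neighbors(q, size):
            if n not in board:
                libs.add(n)
            elif board[n] == color and n not in group:
                group.add(n)
                stack.append(n)
    return libs

def is_suicide(board, move, color, size=19):
    b = dict(board)
    b[move] = color
    # The move is legal if it captures an adjacent enemy group...
    for n in neighbors(move, size):
        if n in b and b[n] != color and not group_liberties(b, n, size):
            return False
    # ...otherwise it is suicide when the new group has no liberties.
    return not group_liberties(b, move, size)

# Example: black fills the last liberty of a point with no capture available.
board = {(0, 1): 'w', (1, 0): 'w', (1, 1): 'w'}
print(is_suicide(board, (0, 0), 'b'))  # True: the black stone would have no liberty
```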
>> Okay, so now that we removed that possibility, how is black still losing
>> 3% of the time?
>>
>> * Remove liberties from white: once most of them are gone, drops to 2%.
>> Then drops to 1%, then once white is in atari, drops to -1%.
>> (the -1% presumably comes from the expected thing documented in the Leela
>> FAQ where Leela counts big wins as a bit more than 100%).
>>
>> Now I'm just speculating. My guess is that somehow 3% of the time, the
>> game is scored without black having captured white's group. As in - black
>> passes, white passes, white's dead group is still on the board, so white
>> wins. The guess would be that removing liberties and putting it in atari
>> increases the likelihood that the playouts kill the group before both
>> players pass and score. But that's just a guess; maybe there's also more
>> black magic involving adjusting the "value" of a win depending on unknown
>> factors beyond just having a "big win". Would need Gian-Carlo to actually
>> confirm or refute this guess though.
>>
>> ----
>>
>> Phew! That was some long and fun exploration of how light Leela's playouts
>> are.
>> Hopefully some of this helps improve the playouts for future versions of
>> Leela - given that they're a significant weight in the evaluation alongside
>> the value net, they're probably one of the major things holding Leela back
>> at this point.
>>
>>
>> On Mon, May 22, 2017 at 3:01 PM, Marc Landgraf <mahrgell87 at gmail.com>
>> wrote:
>>
>>> And talking about tactical mistakes:
>>> Another game, where a trick joseki early in the game (top right)
>>> completely fools Leela. Leela here plays this like it would be done in
>>> similar shapes, but then gets completely blindsided. But to make things
>>> worse, it finds the one way to make the loss the biggest. (Note: this is
>>> not reliable when trying this trick joseki; Leela will often lose the 4/5
>>> stones on the left, but will at least take the one stone on top in sente
>>> instead of screwing up like it did here.) Generally this "trick" is not
>>> that deep reading-wise, but given its similarity to more common shapes I
>>> can understand how the bot falls for it.
>>> Anyway, Leela manages to fully stabilize the game (given our general
>>> difference in strength, this should come as no surprise), just to throw
>>> away the center group.
>>>
>>> But what you should really look at here is Leela's evaluation of the game.
>>>
>>> Even very late in the game, the MC part of Leela considers Leela well
>>> ahead, completely misreading the L&D here. Usually, in the games Leela
>>> loses to me, the issue comes up the other way around: Leela's NN strongly
>>> believes the game to be won, while the MC part notices the real trouble.
>>> But not here. Now of course this kind of misjudgement could also serve as
>>> an explanation of how this group could die in the first place.
>>>
>>> But having had my own MC bot, I really wonder how it could misevaluate so
>>> badly here. Really losing this game as Black requires either substantial
>>> self-ataris by Black or large unanswered self-ataris by White.
>>> Does Leela have such light playouts that those groups can really flip
>>> status in 60%+ of the MC evaluations?
>>>
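A back-of-envelope answer to Marc's question (my own arithmetic, not a measurement of Leela): with playouts lasting on the order of a hundred moves, even a small per-move chance of a status-flipping blunder compounds quickly.

```python
# Probability that at least one status-flipping blunder occurs in a playout,
# assuming (hypothetically) each move independently blunders with probability p.
def flip_rate(p_blunder, n_moves):
    return 1 - (1 - p_blunder) ** n_moves

# A 1% per-move blunder rate over 100 remaining moves already flips ~63%
# of playouts, so 60%+ misevaluation needs no exotic explanation.
print(round(flip_rate(0.01, 100), 2))
```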
>>> 2017-05-22 20:46 GMT+02:00 Marc Landgraf <mahrgell87 at gmail.com>:
>>>
>>>> Leela has surprisingly large tactical holes. Right now it is throwing a
>>>> good number of games against me in completely won endgames by fumbling
>>>> away entirely alive groups.
>>>>
>>>> As an example I attached one game of myself (3d), even vs Leela10 @7d.
>>>> But this really isn't a one-time occurrence.
>>>>
>>>> If you look around move 150, the game is completely over by human
>>>> standards as well as by Leela's evaluation (Leela will give itself >80%
>>>> here). But then Leela starts doing weird things.
>>>> 186 is a minor mistake, but by itself does not yet throw the game. But it
>>>> is the start of a series of bad turns.
>>>> 236 then is a non-threat in a ko fight, and checking Leela's evaluation,
>>>> Leela doesn't even consider the possibility of it being ignored. This is
>>>> btw a common topic with Leela in ko fights - it does not look at all at
>>>> what happens if the ko threat is ignored.
>>>> 238 follows up the "ko threat", but this move isn't doing anything
>>>> either! So Leela passed twice now.
>>>> Suddenly some ko appears at the top right.
>>>> Leela plays this ko fight in a somewhat suboptimal way, not fully
>>>> utilizing local ko threats, but this is a concept rather difficult for
>>>> AIs to grasp afaik.
>>>> I cannot 100% judge whether ignoring the black threat of 253 is correct
>>>> for Leela; I have some doubts on this one too.
>>>> With 253 ignored, the game is now heavily swinging, but by my judgement,
>>>> playing the hane instead of 256 would still keep it rather close and I'm
>>>> not 100% sure who would win it now. But Leela decides to completely bury
>>>> itself here with 256, while still giving itself 70% to win.
>>>> As realization of the real game state slowly kicks in, the rest of the
>>>> game is then the usual MC throw-away style we have known for years.
>>>>
>>>> Still... in this game you can see how a series of massive tactical
>>>> blunders leads to throwing away a completely won game. And this is just
>>>> one of many examples. And it cannot all be pinned on kos. I have seen a
>>>> fair number of games where Leela makes similar mistakes without a ko
>>>> involved, even though kos drastically increase Leela's fumble chance.
>>>> At the same time, Leela is completely and utterly outplaying me on a
>>>> strategic level, and whenever it manages not to make screwups like the
>>>> ones shown, I stand no chance at all. Even 3 stones is a serious
>>>> challenge for me then. But those mistakes are common enough to keep me
>>>> around even.
>>>>
>>>> 2017-05-22 17:47 GMT+02:00 Erik van der Werf <erikvanderwerf at gmail.com>:
>>>>
>>>>> On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto <gcp at sjeng.org>
>>>>> wrote:
>>>>>
>>>>>> On 22-05-17 11:27, Erik van der Werf wrote:
>>>>>> > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto <gcp at sjeng.org
>>>>>> > <mailto:gcp at sjeng.org>> wrote:
>>>>>> >
>>>>>> >     ... This heavy pruning
>>>>>> >     by the policy network OTOH seems to be an issue for me. My
>>>>>> >     program has big tactical holes.
>>>>>> >
>>>>>> >
>>>>>> > Do you do any hard pruning? My engines (Steenvreter, Magog) always
>>>>>> > used a move predictor (a.k.a. policy net), but I never saw the need
>>>>>> > to do hard pruning. Steenvreter uses the predictions to set priors,
>>>>>> > and it is very selective, but with infinite simulations eventually
>>>>>> > all potentially relevant moves will get sampled.
>>>>>>
>>>>>> With infinite simulations everything is easy :-)
>>>>>>
>>>>>> In practice moves with, say, a prior below 0.1% aren't going to get
>>>>>> searched, and I still regularly see positions where they're the
>>>>>> winning move, especially with tactics on the board.
>>>>>>
>>>>>> Making the search wider without losing playing strength appears
>>>>>> to be hard.
>>>>>>
>>>>>>
>>>>> Well, I think that's fundamental; you can't be wide and deep at the
>>>>> same time, but at least you can choose an algorithm that (eventually)
>>>>> explores all directions.
>>>>>
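The prior-as-soft-pruning point can be illustrated with the generic PUCT selection rule from the AlphaGo paper (a sketch under that assumption, not Leela's exact implementation): a 0.1% prior keeps a move's selection score tiny until the parent visit count grows enormously, which is also why, with unbounded simulations, every move is eventually sampled.

```python
# Generic PUCT selection score: Q(a) + c * P(a) * sqrt(N_parent) / (1 + N(a)).
# Not Leela's exact formula - an illustration of prior-driven selectivity.
import math

def puct(q, prior, parent_visits, child_visits, c_puct=1.0):
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# An unvisited move with prior 0.001 versus a decent move with prior 0.4:
# after 10,000 simulations the low-prior move's score is still far behind,
# so in any finite search it is effectively pruned.
low  = puct(q=0.0, prior=0.001, parent_visits=10_000, child_visits=0)
high = puct(q=0.5, prior=0.4, parent_visits=10_000, child_visits=9_000)
print(low, high)
```

Only as the parent visit count grows without bound does the low-prior exploration term catch up, which is the "eventually all moves get sampled" behaviour described above.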
>>>>> BTW I'm a bit surprised that you are still able to find 'big tactical
>>>>> holes' with Leela now playing as 8d KGS.
>>>>>
>>>>> Best,
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Computer-go mailing list
>>>>> Computer-go at computer-go.org
>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
--
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>
```