[Computer-go] Significance of resignation in AGZ

Chaz G. chaz.gwenapp at gmail.com
Sun Dec 3 01:04:59 PST 2017


Hi Brian,

Thanks for sharing your genuinely interesting result. One question though:
why would you train a non-"zero" program? Do you think your program would
perform better than zero as a result of your rules, or is imitating the
best known algorithm inconvenient for your purposes?

Best,
-Chaz

On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go <
computer-go at computer-go.org> wrote:

> I implemented the ad hoc rule of not training on positions after the first
> pass, and my program is basically playing moves until the first pass is
> forced. (It is not a “zero” program, so I don’t mind ad hoc rules like
> this.)
>
>
>
> *From:* Computer-go [mailto:computer-go-bounces at computer-go.org] *On
> Behalf Of *Xavier Combelle
> *Sent:* Saturday, December 2, 2017 12:36 PM
> *To:* computer-go at computer-go.org
>
> *Subject:* Re: [Computer-go] Significance of resignation in AGZ
>
>
>
> It might make sense to enable the resignation threshold even at a stupid
> level. That way, the first thing the network would learn is not to resign
> too early (even before it learns not to pass).
>
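For concreteness, a resignation threshold of the kind discussed above can be sketched as a single comparison. This is only an illustration: the function name, the [-1, 1] value range, and the default threshold of -0.9 are assumptions, not values taken from this thread.

```python
def should_resign(value_estimate: float,
                  threshold: float = -0.9,
                  enabled: bool = True) -> bool:
    """Decide whether the side to move should resign.

    value_estimate is assumed to lie in [-1, 1] from the point of view of
    the side to move, with -1 meaning a certain loss. The default
    threshold is an arbitrary placeholder.
    """
    return enabled and value_estimate < threshold


# A weak network produces unreliable estimates, so its resignations are
# often mistakes; passing enabled=False lets a game be played out instead.
print(should_resign(-0.95))
print(should_resign(-0.95, enabled=False))
```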
>
>
> On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:
>
> I have some hard data now. My network’s initial training reached the same
> performance in half the iterations. That is, the steepness of skill gain in
> the first day of training was twice as great when I avoided training on
> fill-ins.
>
>
>
> This has all the usual caveats: only one run before/after, YMMV, etc.
>
>
>
> *From:* Brian Sheppard [mailto:sheppardco at aol.com]
> *Sent:* Friday, December 1, 2017 5:39 PM
> *To:* 'computer-go' <computer-go at computer-go.org>
> *Subject:* RE: [Computer-go] Significance of resignation in AGZ
>
>
>
> I didn’t measure precisely because as soon as I saw the training artifacts
> I changed the code. And I am not doing an AGZ-style experiment, so there
> are differences for sure. So I will give you a swag…
>
>
>
> Speed difference is maybe 20%-ish for 9x9 games.
>
>
>
> A frequentist approach will overstate the frequency of fill-in plays by a
> pretty large factor, because fill-in plays are guaranteed to occur in every
> game but are not best in the competitive part of the game. This will affect
> the speed of learning in the early going.
>
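The frequency argument above can be illustrated with a toy count. The records below are made up for the illustration (they are not data from any real run), and the move labels, phase labels, and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each record is (move label, game phase). Toy values only.
Record = Tuple[str, str]


def move_frequencies(records: List[Record],
                     skip_fill_in: bool) -> Dict[str, float]:
    """Empirical move frequencies, optionally ignoring fill-in positions."""
    counts = Counter(
        move for move, phase in records
        if not (skip_fill_in and phase == "fill_in")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {move: n / total for move, n in counts.items()}


records = (
    [("contested_area", "competitive")] * 6
    + [("edge", "competitive")] * 2
    + [("edge", "fill_in")] * 8
)

with_fill_in = move_frequencies(records, skip_fill_in=False)
without_fill_in = move_frequencies(records, skip_fill_in=True)
print(with_fill_in["edge"], without_fill_in["edge"])
```

In this toy set, the "edge" label accounts for 10 of 16 positions when fill-ins are counted but only 2 of 8 when they are skipped, which is the kind of overstatement a frequency-based move ordering would absorb.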
>
>
> The network will use some fraction (almost certainly <= 20%) of its
> capacity to improve accuracy on positions that will not contribute to its
> ultimate strength. This applies to both ordering and evaluation aspects.
>
>
>
>
> *From:* Andy [mailto:andy.olsen.tx at gmail.com]
> *Sent:* Friday, December 1, 2017 4:55 PM
> *To:* Brian Sheppard <sheppardco at aol.com>;
> computer-go <computer-go at computer-go.org>
> *Subject:* Re: [Computer-go] Significance of resignation in AGZ
>
>
>
> Brian, do you have any experiments showing what kind of impact it has? It
> sounds like you have tried both with and without your ad hoc first pass
> approach?
>
>
>
> 2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go <
> computer-go at computer-go.org>:
>
> I have concluded that AGZ's policy of resigning "lost" games early is
> somewhat significant. Not as significant as using residual networks, for
> sure, but you wouldn't want to go without these advantages.
>
> The benefit cited in the paper is speed. Certainly a factor. I see two
> other advantages.
>
> First is that training does not include the "fill in" portion of the game,
> where every move is low value. I see a specific effect on the move ordering
> system, since it is based on frequency. By eliminating training on
> fill-ins, the prioritization function will not be biased toward moves that
> are not relevant to strong play. (That is, there are a lot of fill-in
> moves, which are usually not best in the interesting portion of the game,
> but occur a lot if the game is played out to the end, and therefore the
> move prioritization system would predict them more often.) My ad hoc
> alternative is to not train on positions after the first pass in a game.
> (Note that this does not qualify as "zero knowledge", but that is OK with
> me since I am not trying to reproduce AGZ.)
>
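A minimal sketch of the "stop training at the first pass" rule described above. The move representation (None for a pass) and the function name are assumptions made for the example. It also assumes that the position where the first pass was chosen is dropped along with everything after it; the rule as stated could equally be read as keeping that one position.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a move is a (row, col) pair, None is a pass.
Move = Optional[Tuple[int, int]]


def positions_before_first_pass(moves: List[Move]) -> List[int]:
    """Return the indices of positions to keep for training.

    Position i is the board state from which moves[i] was played.
    Everything from the first pass onward is discarded, on the assumption
    that the rest of the game is fill-in.
    """
    for i, move in enumerate(moves):
        if move is None:
            return list(range(i))
    # No pass in the game record: keep every position.
    return list(range(len(moves)))


game = [(2, 2), (6, 6), (2, 6), None, (0, 0), None, None]
print(positions_before_first_pass(game))
```

Only the three positions before the first pass survive here; the later stone at (0, 0) is treated as fill-in even though it is not itself a pass.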
> Second is that the positional evaluation is not trained on situations
> where everything is decided, so less of the NN capacity is devoted to
> situations in which nothing can be gained.
>
> As always, YMMV.
>
> Best,
> Brian
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>