[Computer-go] AlphaGo Zero

Petr Baudis pasky at ucw.cz
Fri Oct 20 11:13:20 PDT 2017


  A few open questions I currently have; comments welcome:

  - there is no input representing the number of captures; is this
    information somehow implicit, or can the learned winrate predictor
    never truly approximate the true values because of this?

  - what ballpark values for c_{puct} are reasonable?

  - why is the Dirichlet noise applied only at the root node?  If it is
    useful, why not deeper in the tree as well?

  - the training process is quite lazy: the network does not see each
    game immediately and adjust; instead, it looks at the last 500k games
    and samples 1000*2048 positions from them, which means about 4
    positions per game (if I understood this right).  I wonder what would
    happen if we trained it more aggressively, and what AlphaGo does
    during the initial 500k games.  Currently, I'm training on all
    positions immediately; I guess I should at least shuffle them ;)
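
A minimal Python sketch of the PUCT selection rule and the root-only noise, as described in the AlphaGo Zero paper: a child is picked by argmax of Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)), and before searching from the root the priors are mixed as (1 - eps) * p + eps * eta with eta ~ Dir(alpha) (the paper uses eps = 0.25 and alpha = 0.03 for 19x19).  The function names and the (P, N, W) tuple layout are assumptions for illustration; the c_puct values below are placeholders, not recommendations, and alpha for 5x5 is something to tune.

```python
import math
import random


def add_root_noise(priors, epsilon=0.25, alpha=0.03, rng=random):
    """Hypothetical helper: return root priors mixed as (1 - eps) * p + eps * eta."""
    moves = list(priors)
    # eta ~ Dir(alpha, ..., alpha), sampled by normalizing independent Gamma(alpha, 1) draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in moves]
    total = sum(draws)
    if total == 0.0:  # guard against extreme underflow: fall back to uniform noise
        draws, total = [1.0] * len(moves), float(len(moves))
    return {m: (1 - epsilon) * priors[m] + epsilon * d / total
            for m, d in zip(moves, draws)}


def puct_select(children, c_puct):
    """Hypothetical helper: argmax_a Q(s,a) + U(s,a); children maps move -> (P, N, W)."""
    sqrt_total = math.sqrt(sum(n for _, n, _ in children.values()))

    def score(move):
        p, n, w = children[move]
        q = w / n if n > 0 else 0.0  # unvisited children start at Q = 0
        u = c_puct * p * sqrt_total / (1 + n)
        return q + u

    return max(children, key=score)


# Toy example (made-up numbers): move -> (prior P, visits N, total value W).
children = {"a": (0.5, 10, 6.0), "b": (0.3, 0, 0.0), "c": (0.2, 5, 1.0)}
print(puct_select(children, c_puct=1.0))  # prints b: the unvisited move wins on exploration
print(puct_select(children, c_puct=0.1))  # prints a: a smaller c_puct trusts Q more
```

Since U shrinks as 1/(1 + N) while Q does not, c_puct sets how long the search keeps trying low-visit moves before trusting the value estimates.  In this sketch the noise touches only the root's priors, so it diversifies which first moves get explored while deeper nodes keep the raw network priors.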
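
The arithmetic in the last item checks out: 1000 * 2048 = 2,048,000 sampled positions over a 500,000-game window is about 4.1 positions per game.  A minimal sketch of such a sliding window with uniform sampling over positions follows; the class and function names, sampling with replacement, and the length-weighted game choice are assumptions for illustration, not a description of DeepMind's code.

```python
import random
from collections import deque


class ReplayWindow:
    """Hypothetical helper: keep the most recent `max_games` games, sample positions."""

    def __init__(self, max_games=500_000):
        # deque(maxlen=...) silently drops the oldest game once the window is full.
        self.games = deque(maxlen=max_games)

    def add_game(self, positions):
        self.games.append(list(positions))

    def sample_batch(self, batch_size, rng=random):
        games = [g for g in self.games if g]  # skip empty games
        # Pick a game with probability proportional to its length, then a position
        # uniformly inside it; overall every stored position is equally likely.
        picked = rng.choices(games, weights=[len(g) for g in games], k=batch_size)
        return [rng.choice(g) for g in picked]


def expected_samples_per_game(steps, batch_size, window_games):
    """Average number of times each game in the window is sampled over `steps` batches."""
    return steps * batch_size / window_games


print(expected_samples_per_game(1000, 2048, 500_000))  # prints 4.096
```

Choosing a game with probability len(g)/total and then a position with probability 1/len(g) gives every stored position probability 1/total, so this matches uniform sampling over all positions without flattening the whole window for every batch.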

On Fri, Oct 20, 2017 at 03:23:49PM +0200, Petr Baudis wrote:
>   I tried to reimplement the system in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousand self-play
> games.  It turns out that several components are important for avoiding
> some obvious attractors (like the network predicting that black loses
> on every move from its second game on):
> 
>   - disabling resignation in a portion of games is essential not just
>     for tuning the resignation threshold (if you even want to do that),
>     but also to keep correcting the prediction signal with actual
>     scoring, instead of the system settling into always resigning early
>     in the game
> 
>   - Dirichlet (or other) noise is essential to keep the network from
>     getting looped into the same game, which is also self-reinforcing
> 
>   - I have my doubts about the idea of high-temperature move choices
>     at the beginning of the game, especially with T=1 ... maybe that's
>     just bad very early in the training
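
A minimal sketch of the two self-play settings from the list above, using the values from the AlphaGo Zero paper: moves are sampled from pi(a) proportional to N(a)^(1/T), with T = 1 for the first 30 moves and greedy choice afterwards (T = 0 here stands in for the paper's infinitesimal temperature), and resignation is disabled in 10% of games.  The function names and defaults are assumptions; on 5x5, a much shorter T = 1 phase (a smaller temp_moves) would be one way to test whether high temperature is what hurts early in training.

```python
import random


def visit_distribution(visits, temperature):
    """pi(a) proportional to N(a)^(1/T); T == 0 means argmax over visit counts."""
    moves = list(visits)
    if temperature == 0:
        best = max(visits[m] for m in moves)
        winners = [m for m in moves if visits[m] == best]
        return {m: (1.0 / len(winners) if m in winners else 0.0) for m in moves}
    weights = [visits[m] ** (1.0 / temperature) for m in moves]
    total = sum(weights)
    return {m: w / total for m, w in zip(moves, weights)}


def choose_move(visits, move_number, temp_moves=30, rng=random):
    """Hypothetical helper: T = 1 for the first `temp_moves` moves, then greedy."""
    temperature = 1.0 if move_number < temp_moves else 0.0
    pi = visit_distribution(visits, temperature)
    moves = list(pi)
    return rng.choices(moves, weights=[pi[m] for m in moves])[0]


def resign_allowed(disable_fraction=0.1, rng=random):
    """Hypothetical helper, called once per game: a fraction of games play to the end."""
    return rng.random() >= disable_fraction
```

The visit counts passed in are assumed to come from a finished search, so at least one move has N > 0.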
> 
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> >   The order of magnitude matches my parameter numbers.  (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
> 
> -- 
> 					Petr Baudis, Rossum
> 	Run before you walk! Fly before you crawl! Keep moving forward!
> 	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go


