[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13
ds2 at physik.de
Sun Nov 20 02:16:25 PST 2016
> Now I'm making 13x13 self-play games like in the AlphaGo paper:
> 1. make a position by sampling from the Policy(SL) probabilities,
> starting from the initial position. 2. play one move uniformly at
> random from the available moves. 3. play the remaining moves with
> Policy(RL) to the end. Step (2) usually plays a very bad move.
> Maybe the point is to create a completely different position? I
> don't understand why this step (2) is needed.
I did not read the AlphaGo paper that way.
I read it as using the RL policy the "usual" way (which I take to
mean something like randomizing over the net probabilities of the
best 5 moves or so), but randomizing the opponent uniformly: the
opponent's net weights are taken from an earlier step of the
reinforcement learning, e.g. step 10000 playing against step 7645
in the reinforcement history.
Or did I understand you wrong?
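My reading above amounts to two ingredients, which can be sketched like this. Both function names and the `{move: probability}` dict interface are assumptions for illustration, not anything from the paper:

```python
import random


def sample_top_k(probs, k=5):
    """Sketch of "randomizing with the net probabilities for the best
    5 moves": keep the k most probable moves and sample among them in
    proportion to their network probabilities.
    `probs` is assumed to be a dict {move: probability}."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    moves, weights = zip(*top)
    return random.choices(moves, weights=weights)[0]


def pick_opponent(rl_checkpoints):
    """Sketch of "randomize the opponent uniformly": the opponent's
    weights are drawn uniformly at random from the pool of earlier
    RL iterations (e.g. step 10000 plays against step 7645)."""
    return random.choice(rl_checkpoints)
```

Playing against a randomly chosen earlier checkpoint, rather than always the latest one, is usually motivated as a way to stabilize self-play training and avoid overfitting to a single opponent.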