[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13
yss at bd.mbn.or.jp
Sat Nov 19 19:51:45 PST 2016
> You did not try reinforcement learning I think. Do you have any idea,
> why this would make the policy network 250ELO stronger, as mentioned
> in the alphago paper (80% winrate)?
I have not tried reinforcement learning, but my guess: if there are two
candidate moves, the SL policy's probabilities might be
  taking 5 stones (35%), good shape (37%).
RL may change this to
  taking 5 stones (80%), good shape (10%).
For a weaker player, taking the 5 stones is probably the safer choice.
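The kind of probability shift above can be illustrated with a toy
REINFORCE-style update. Everything here is a made-up illustration: the two
moves, their assumed self-play win rates, and the learning rate are all
hypothetical, not taken from Aya or AlphaGo. The point is only that
policy-gradient training moves probability mass toward the move that wins
self-play games more often, even if the SL probabilities started close.

```python
import math
import random

def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Hypothetical two-move position; assumed win rates for illustration only.
logits = {"take_5_stones": 0.0, "good_shape": 0.0}
win_rate = {"take_5_stones": 0.60, "good_shape": 0.45}
lr = 0.1
random.seed(0)

for _ in range(2000):
    p = softmax(logits)
    move = random.choices(list(p), weights=list(p.values()))[0]
    # Self-play outcome as a +1/-1 reward, drawn from the assumed win rate.
    reward = 1.0 if random.random() < win_rate[move] else -1.0
    # REINFORCE: d/d_logit log pi(move) = 1{k == move} - p(k)
    for k in logits:
        grad = (1.0 if k == move else 0.0) - p[k]
        logits[k] += lr * reward * grad

print(softmax(logits))  # mass shifts toward the higher-win-rate move
```

With the assumed 60% vs 45% win rates, the updated policy concentrates on
"take 5 stones", mimicking the 35%/37% to 80%/10% shift described above.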
> Do you think playing strength would be better, if one only takes into
> account the moves of the winning player?
I think learning only from the winning player's moves will give a better result.
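Filtering a game record down to the winner's moves is simple; this is a
hypothetical sketch (the record format and helper name are my own, not
Aya's):

```python
def winner_moves(game):
    """Return (ply_index, move) pairs played by the winning side.

    game: {"winner": "B" or "W",
           "moves": [(color, move), ...] in play order}
    """
    return [(i, mv) for i, (color, mv) in enumerate(game["moves"])
            if color == game["winner"]]

game = {"winner": "B",
        "moves": [("B", "C3"), ("W", "G7"), ("B", "E5"), ("W", "C7")]}
print(winner_moves(game))  # [(0, 'C3'), (2, 'E5')]
```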
Now I'm making 13x13 selfplay games like in the AlphaGo paper:
1. Make a position by sampling moves from the Policy(SL) probabilities,
   starting from the initial position.
2. Play one move uniformly at random from the available moves.
3. Play the remaining moves with Policy(RL) to the end of the game.
Step (2) usually means playing a very bad move. Maybe the point is to create
a completely different position? I don't understand why step (2) is needed.
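The three steps above can be sketched as follows. This is a toy,
self-contained version: the real engine's board, policies, and scoring are
replaced by stand-ins (filling a 9-point board, with dummy policies and a
dummy result), so only the control flow of the pipeline is meaningful.

```python
import random

POINTS = set(range(9))  # stand-in "board": 9 points to fill

def legal_moves(pos):          # pos: tuple of points already played
    return sorted(POINTS - set(pos))

def play(pos, move):
    return pos + (move,)

def sl_policy(pos):            # stand-in for sampling from Policy(SL)
    return legal_moves(pos)[0]

def rl_policy(pos):            # stand-in for Policy(RL)
    return legal_moves(pos)[-1]

def game_result(pos):          # stand-in outcome, just for the sketch
    return 1 if pos[0] % 2 == 0 else -1

def make_training_example():
    pos = ()
    u = random.randrange(1, len(POINTS))           # pick the random ply U
    for _ in range(u - 1):                         # 1. prefix from Policy(SL)
        pos = play(pos, sl_policy(pos))
    pos = play(pos, random.choice(legal_moves(pos)))  # 2. one uniform move
    sample = pos                                   # state kept for training
    while legal_moves(pos):                        # 3. finish with Policy(RL)
        pos = play(pos, rl_policy(pos))
    return sample, game_result(pos)                # one (state, outcome) pair

random.seed(1)
state, outcome = make_training_example()
print(state, outcome)
```

On the question of why step (2) exists: one common reading of the paper is
that the single off-policy move pushes the sampled position off the narrow
distribution either policy would reach on its own, so the value network sees
more diverse states, and only one position per game is kept so that training
examples are not strongly correlated. That is an interpretation, not
something stated in this thread.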