[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

Hiroshi Yamashita yss at bd.mbn.or.jp
Sat Nov 19 19:51:45 PST 2016


Hi Detlef,

> You did not try reinforcement learning I think. Do you have any idea,
> why this would make the policy network 250ELO stronger, as mentioned
> in the alphago paper (80% winrate)?

I have not tried reinforcement learning, but my guess is this: if there are
two candidate moves, the SL probabilities might be
 taking 5 stones (35%), good shape (37%).
RL may change this to
 taking 5 stones (80%), good shape (10%).
For a weaker player, taking the 5 stones is probably the safer choice.
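For intuition, here is a minimal toy sketch (numpy only; all numbers and
names are made up, not from Aya or AlphaGo) of how a REINFORCE-style
policy-gradient update concentrates probability on the move that wins
self-play games:

    import numpy as np

    moves  = ["take 5 stones", "good shape", "other"]
    logits = np.array([0.0, 0.06, -0.02])   # ~33% / 35% / 32% after softmax

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    lr = 0.5
    for step in range(200):
        p = softmax(logits)
        a = rng.choice(len(p), p=p)       # sample a move from the policy
        # toy self-play outcome: taking 5 stones usually wins (+1),
        # the quiet shape move tends to lose later (-1)
        z = +1.0 if a == 0 else -1.0
        grad = -p                          # d log p(a) / d logits
        grad[a] += 1.0
        logits += lr * z * grad            # REINFORCE: move logits toward wins

    print({m: round(float(pr), 2) for m, pr in zip(moves, softmax(logits))})

After a few hundred updates the probability mass piles onto the rewarded
move, which is the kind of shift (35% -> 80%) I mean above.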

> Do you think playing strength would be better, if one only takes into
> account the moves of the winning player?

I think learning only from the winning player's moves would give a better result.
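To make that concrete, the filter I have in mind would look something like
this (the record layout is just an assumption, not Aya's actual format):

    # Hypothetical sketch: keep only moves played by the eventual winner
    # before building the SL training set.
    def winner_moves_only(records):
        # each record: (position, move, player_to_move, winner)
        return [(pos, mv) for (pos, mv, player, winner) in records
                if player == winner]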


Now I'm making 13x13 self-play games as in the AlphaGo paper:
 1. Make a position by sampling from the Policy(SL) probabilities, starting
    from the initial position.
 2. Play one move uniformly at random from the legal moves.
 3. Play the remaining moves with Policy(RL) to the end of the game.
Step (2) usually plays a very bad move. Maybe the point is to create a
completely different kind of position? I don't understand why this step (2)
is needed.
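In case the control flow is unclear, here is a minimal, self-contained
sketch of that generation procedure. The two "policies" are dummy uniform
samplers and the game logic is a toy (no captures, ko, or passes); a real
engine would plug in its SL/RL networks and rules:

    import random

    BOARD_SIZE = 13
    POINTS = BOARD_SIZE * BOARD_SIZE

    def legal_moves(game):
        # dummy legality: any unplayed point
        return [p for p in range(POINTS) if p not in game]

    # stand-ins for sampling from the SL / RL policy networks
    def sample_sl(game): return random.choice(legal_moves(game))
    def sample_rl(game): return random.choice(legal_moves(game))

    def make_training_example():
        game = set()
        u = random.randint(1, POINTS - 2)       # time step of the random move
        for _ in range(u - 1):                  # 1. play to step u-1 with Policy(SL)
            game.add(sample_sl(game))
        game.add(random.choice(legal_moves(game)))  # 2. one uniformly random move
        position = frozenset(game)              # the single position kept per game
        while legal_moves(game):                # 3. finish the game with Policy(RL)
            game.add(sample_rl(game))
        z = random.choice([+1, -1])             # stand-in for the true game result
        return position, z                      # (s, z) value-network training pair

Note each game contributes exactly one (position, outcome) pair.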

Thanks,
Hiroshi Yamashita



