[Computer-go] mini-max with Policy and Value network

Hiroshi Yamashita yss at bd.mbn.or.jp
Sat May 20 12:41:55 PDT 2017


Hi,

The author of HiraBot has reported results for a mini-max search with Policy and Value networks.
It does not use Monte Carlo.
Only the top 8 moves from the Policy network are searched at the root node;
at every other depth, only the top 4 moves are searched.
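
In outline, the scheme is something like the sketch below. This is my own
reconstruction, not HiraBot's actual code: policy_net, value_net, and the
game-state methods legal_moves/play/is_terminal are hypothetical placeholders.

    ROOT_WIDTH = 8   # top-8 policy moves searched at the root
    NODE_WIDTH = 4   # top-4 policy moves at every other depth

    def top_moves(state, width):
        # Keep only the `width` moves the policy network rates highest.
        probs = policy_net(state)   # hypothetical: maps move -> probability
        moves = sorted(state.legal_moves(),
                       key=lambda m: probs[m], reverse=True)
        return moves[:width]

    def negamax(state, depth):
        # Leaves are scored by the value network, from the side to move.
        if depth == 0 or state.is_terminal():
            return value_net(state)  # hypothetical evaluation in [-1, 1]
        best = float("-inf")
        for move in top_moves(state, NODE_WIDTH):
            best = max(best, -negamax(state.play(move), depth - 1))
        return best

    def search_root(state, max_depth):
        # Root: examine the top-8 policy moves, pick the best negamax score.
        return max(top_moves(state, ROOT_WIDTH),
                   key=lambda m: -negamax(state.play(m), max_depth - 1))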

Game results against the Policy network's best move (without search):

             Wins-Losses  Winrate   Elo
MaxDepth=1   (558-442)     0.558    +40
MaxDepth=2   (351-150)     0.701   +148
MaxDepth=3   (406-116)     0.778   +218
MaxDepth=4   (670- 78)     0.896   +374
MaxDepth=5   (490- 57)     0.896   +374
MaxDepth=6   (520- 20)     0.963   +566
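
The Elo column is consistent with the standard logistic conversion from
winrate p (my inference; the post does not state the formula):

    \[ \text{Elo} = 400 \,\log_{10}\frac{p}{1-p} \]

For example, p = 0.896 gives 400 * log10(0.896/0.104) ~ +374, and by the
same formula p = 0.963 gives ~ +566, the figure in the last row.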

The search is simple alpha-beta. There is a modification so that moves
with a high Policy network probability tend to be selected.
MaxDepth=6 takes one second per move on an i7-4790K + GTX 1060.
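
The post does not say how that modification works. One plausible reading,
purely an illustrative guess reusing the helpers from the sketch above, is
to fold a small policy-probability bonus into the alpha-beta score, so
that among near-equal lines the higher-probability move wins out:

    POLICY_BONUS = 0.01  # hypothetical weight; the real mechanism is not given

    def alphabeta(state, depth, alpha, beta):
        # Negamax-form alpha-beta over the policy-pruned move list.
        if depth == 0 or state.is_terminal():
            return value_net(state)
        probs = policy_net(state)
        for move in top_moves(state, NODE_WIDTH):  # ROOT_WIDTH at the root
            score = -alphabeta(state.play(move), depth - 1, -beta, -alpha)
            score += POLICY_BONUS * probs[move]    # nudge toward likely moves
            alpha = max(alpha, score)
            if alpha >= beta:
                break                              # beta cutoff
        return alpha

Whatever the real mechanism is, expanding moves in policy order also gives
alpha-beta favorable move ordering for free, which helps the deeper searches.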

His nega-max code:
http://kiyoshifk.dip.jp/kiyoshifk/apk/negamax.zip
CGOS result, MaxDepth=6:
http://www.yss-aya.com/cgos/19x19/cross/minimax-depth6.html
His Policy network (without search) is probably:
http://www.yss-aya.com/cgos/19x19/cross/DCNN-No336-tygem.html
His Policy and Value network (MCTS) is probably:
http://www.yss-aya.com/cgos/19x19/cross/Hiratuka10_38B100.html

Thanks,
Hiroshi Yamashita


