[Computer-go] mini-max with Policy and Value network

Hiroshi Yamashita yss at bd.mbn.or.jp
Sat May 20 12:41:55 PDT 2017


The HiraBot author reported a mini-max search with Policy and Value networks.
It does not use Monte-Carlo.
Only the top 8 moves from the Policy network are searched at the root node;
at other depths, the top 4 moves are searched.

Game results against the Policy network's best move (without search):

             (Win-Loss) Winrate  Elo
MaxDepth=1, (558-442) 0.558   +40 Elo
MaxDepth=2, (351-150) 0.701  +148 Elo
MaxDepth=3, (406-116) 0.778  +218 Elo
MaxDepth=4, (670- 78) 0.896  +374 Elo
MaxDepth=5, (490- 57) 0.896  +374 Elo
MaxDepth=6, (520- 20) 0.963  +556 Elo

The search is simple alpha-beta.
There is a modification so that moves with high Policy network probability
tend to be selected.
MaxDepth=6 takes one second per move on an i7-4790K + GTX 1060.
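The search described above can be sketched roughly as follows. This is a
minimal toy illustration, not HiraBot's actual code: the policy_topk and
value functions are stand-ins for the real networks, and the position
representation is invented for the example.

```python
# Depth-limited negamax with alpha-beta pruning, expanding only the
# top-k moves suggested by a policy network and scoring leaves with a
# value network.  Positions here are plain dicts:
#   {"value": <value-net score from side to move>,
#    "moves": [{"prior": <policy probability>, "child": <position>}, ...]}

ROOT_WIDTH = 8   # top-8 policy moves at the root
NODE_WIDTH = 4   # top-4 policy moves at other depths

def policy_topk(position, k):
    """Stand-in for the policy network: the k most probable moves."""
    moves = sorted(position["moves"], key=lambda m: m["prior"], reverse=True)
    return moves[:k]

def value(position):
    """Stand-in for the value network: score from the side to move."""
    return position["value"]

def negamax(position, depth, alpha, beta, is_root=False):
    # At the depth limit (or a terminal node), fall back to the value net.
    if depth == 0 or not position["moves"]:
        return value(position)
    width = ROOT_WIDTH if is_root else NODE_WIDTH
    best = -float("inf")
    for move in policy_topk(position, width):
        # The child's value is from the opponent's view, so negate it.
        score = -negamax(move["child"], depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:   # beta cutoff: opponent avoids this line
            break
    return best
```

A real engine would generate children lazily from a board state instead of
holding the whole tree in memory, but the control flow is the same.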

His nega-max code:
CGOS result, MaxDepth=6:
His Policy network (without search) is maybe
His Policy and Value network (MCTS) is maybe

Hiroshi Yamashita
