[Computer-go] mini-max with Policy and Value network

Hideki Kato hideki_katoh at ybb.ne.jp
Tue May 23 08:19:02 PDT 2017

Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c7589a at sjeng.org>:

>Now, even the original AlphaGo played moves that surprised human pros
>and were contrary to established sequences. So where did those come
>from? Enough computation power to overcome the low probability?
>Synthesized by inference from the (much larger than mine) policy network?

Demis Hassabis said in a talk:
After the match with Lee Sedol, the team used "adversarial learning" in
order to fill the holes in the policy net (such as Lee Sedol's winning
move in game 4).


Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>
