[Computer-go] mini-max with Policy and Value network

Hideki Kato hideki_katoh at ybb.ne.jp
Tue May 23 08:19:02 PDT 2017


Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c7589a at sjeng.org>:

>Now, even the original AlphaGo played moves that surprised human pros
>and were contrary to established sequences. So where did those come
>from? Enough computation power to overcome the low probability?
>Synthesized by inference from the (much larger than mine) policy network?

Demis Hassabis said in a talk:
After the match with Sedol, the team used "adversarial learning" in 
order to fill the holes in the policy net (such as Sedol's winning 
move in game 4).
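
For what it's worth, a toy sketch of what "filling a hole" could mean 
in practice: take a position where the policy net gave low probability 
to a move later shown to be strong, and fine-tune the policy toward 
that move. Everything below (the tiny softmax "policy", the data, the 
update rule) is hypothetical illustration, not DeepMind's actual method.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical: logits of a tiny "policy net" over 5 candidate moves.
rng = np.random.default_rng(0)
logits = rng.normal(size=5)

hole_move = 3   # a move the policy underrates but which turned out to win
lr = 0.5        # learning rate for the illustrative fine-tuning

for _ in range(100):
    p = softmax(logits)
    # Cross-entropy gradient toward the target move: dL/dlogits = p - onehot
    grad = p.copy()
    grad[hole_move] -= 1.0
    logits -= lr * grad

# After fine-tuning, the formerly underrated move dominates the policy.
print(softmax(logits)[hole_move])
```

The real system would of course do this over a deep network and many 
self-play positions, but the gradient step is the same shape.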

Hideki

-- 
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>
