[Computer-go] Training the value network (a possibly more efficient approach)

John Tromp john.tromp at gmail.com
Tue Jan 10 15:20:28 PST 2017

hi Bo,

> Let me know if there are any silly mistakes :)

You say "the perfect policy network can be
derived from the perfect value network (the best next move is the move
that maximises the value for the player, if the value function is
perfect), but not vice versa.", but a perfect policy for both players
can be used to generate a perfect playout which yields the perfect
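
To make the two directions concrete, here is a minimal sketch on a toy game. It is an illustration only, not anything from Bo's write-up. The game (one pile, take 1 to 3 stones, whoever takes the last stone wins) and the names value, policy_from_value and value_from_policy are all assumptions chosen for the example. One function derives a policy from an exact value function by maximising over moves, the other recovers a value from a policy by letting it play both sides to the end.

```python
from functools import lru_cache

# Hypothetical toy game: one pile of stones, a move removes 1-3 stones,
# and the player who takes the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def value(n):
    """Exact value for the player to move with n stones left: +1 win, -1 loss."""
    if n == 0:
        return -1  # the opponent just took the last stone, so the mover has lost
    # Negamax: my value is the best of the negated values of the successor positions.
    return max(-value(n - m) for m in MOVES if m <= n)


def policy_from_value(n):
    """Value -> policy: choose the move that maximises the mover's value."""
    legal = [m for m in MOVES if m <= n]
    return max(legal, key=lambda m: -value(n - m))


def value_from_policy(n, policy):
    """Policy -> value: let the same policy play both sides and read off the result."""
    sign = 1  # +1 while the original player is to move, -1 otherwise
    while n > 0:
        n -= policy(n)
        sign = -sign
    # At n == 0 the player to move has lost, so the original player's result is -sign.
    return -sign


if __name__ == "__main__":
    for n in range(13):
        print(n, value(n), value_from_policy(n, policy_from_value))
```

In this toy game the playout result matches the exact value for every starting pile, but it costs a whole game of policy calls per position, whereas the value-to-policy direction needs only one value lookup per legal move.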


