[Computer-go] Training the value network (a possibly more efficient approach)

Bo Peng bo at withablink.com
Wed Jan 11 02:48:59 PST 2017


Hi Remi,

Thanks for sharing your experience.

As I write this, it seems there could be a third method: the perfect
value function should have the minimax property in the obvious way (the
value of a position equals the best value among its successors, from the
point of view of the player to move). So we can also train our value
function to satisfy the minimax property. In fact, we can train it so that
a shallow MCTS gives a result as close as possible to that of a deeper
MCTS. This can be regarded as a kind of bootstrapping.

I wonder if you have tried this. It seems like a natural idea...

Bo
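The minimax-consistency idea in the message above can be sketched under loudly stated assumptions. This uses a hypothetical toy subtraction game instead of Go, a lookup table instead of a value network, and fixed-depth negamax instead of MCTS. Each position's value is nudged toward what a deeper search backs up from the same table, which is similar in spirit to TD-Leaf-style search bootstrapping. The names `negamax`, `train_minimax_consistent`, and `MOVES` are illustrative only and do not come from the linked PDF.

```python
from __future__ import annotations

# Hypothetical toy setting standing in for Go: a pile holds n stones,
# a move removes 1, 2 or 3 stones, and whoever takes the last stone wins.
MOVES = (1, 2, 3)


def negamax(n: int, depth: int, V: list[float]) -> float:
    """Depth-limited negamax value of a pile of n stones for the player to move.

    Terminal positions are scored exactly; positions at the search horizon
    are scored with the learned value table V (standing in for a value net).
    """
    if n == 0:
        return -1.0  # no stones left: the opponent took the last one and won
    if depth == 0:
        return V[n]
    return max(-negamax(n - m, depth - 1, V) for m in MOVES if m <= n)


def train_minimax_consistent(n_max: int = 20, depth: int = 3,
                             lr: float = 0.5, sweeps: int = 100) -> list[float]:
    """Pull each V[n] toward the value a deeper search backs up from V itself."""
    V = [0.0] * (n_max + 1)  # V[0] is never read: negamax scores n == 0 exactly
    for _ in range(sweeps):
        for n in range(1, n_max + 1):
            target = negamax(n, depth, V)  # deeper search, bootstrapped from V
            V[n] += lr * (target - V[n])   # the depth-0 evaluation moves toward it
    return V


if __name__ == "__main__":
    V = train_minimax_consistent()
    for n in range(1, 9):
        print(n, round(V[n], 3), negamax(n, 1, V), negamax(n, 3, V))
```

In this toy game the player to move loses exactly when n is a multiple of 4, so after training `V[n]` should sit close to -1 for those piles and close to +1 otherwise. A depth-1 search and a depth-3 search then return nearly the same value, which is the "shallow search agrees with deeper search" property described above.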

On 1/11/17, 18:35, "Computer-go on behalf of Rémi Coulom"
<computer-go-bounces at computer-go.org on behalf of remi.coulom at free.fr>
wrote:

>Hi,
>
>Thanks for sharing your idea.
>
>In my experience it is rarely efficient to train value functions from
>very short-term data (i.e., the next move). TD(lambda), or training from
>the final outcome of the game, is often better because it uses a longer
>horizon. But of course, it is difficult to tell without experiments
>whether your idea would work or not. The advantage of your ideas is that
>you can collect a lot of training data more easily.
>
>Rémi
>
>----- Original message -----
>From: "Bo Peng" <bo at withablink.com>
>To: computer-go at computer-go.org
>Sent: Tuesday, 10 January 2017 23:25:19
>Subject: [Computer-go] Training the value network (a possibly more
>efficient approach)
>
>
>Hi everyone. It occurs to me there might be a more efficient method to
>train the value network directly (without using the policy network).
>
>
>You are welcome to check my method:
>http://withablink.com/GoValueFunction.pdf
>
>
>Let me know if there are any silly mistakes :)
>
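Rémi's contrast between next-move targets and longer horizons can be illustrated with lambda-returns, the targets behind TD(lambda). The sketch below is hypothetical: it assumes one finished game with value estimates `values[t]` for each position, all from a single fixed player's point of view, no discounting, no intermediate rewards, and a final `outcome` such as +1 or -1. The function name `lambda_returns` is illustrative. With `lam=0` each target is just the next position's estimate (the short-term target), and with `lam=1` every target is the final outcome of the game.

```python
from __future__ import annotations


def lambda_returns(values: list[float], outcome: float, lam: float) -> list[float]:
    """Return the lambda-return training target for every position of one game.

    Uses the backward recursion G_t = (1 - lam) * values[t + 1] + lam * G_{t + 1},
    where the target of the last position is the final outcome itself.
    """
    targets = [0.0] * len(values)
    g = outcome  # target for the last position: the game result
    for t in reversed(range(len(values))):
        targets[t] = g
        # Blend the estimate at t with the longer-horizon return, giving G_{t-1}.
        g = (1.0 - lam) * values[t] + lam * g
    return targets


if __name__ == "__main__":
    estimates = [0.1, 0.3, 0.6]  # hypothetical value estimates along one game
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_returns(estimates, outcome=1.0, lam=lam))
```

Intermediate values of `lam` trade off the two extremes: the targets still use the network's own estimates, but each one also carries information from further along the game, which is the longer horizon Rémi refers to.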




