[Computer-go] Training the value network (a possibly more efficient approach)

Rémi Coulom remi.coulom at free.fr
Wed Jan 11 02:35:41 PST 2017


Hi,

Thanks for sharing your idea.

In my experience, it is rarely efficient to train value functions from very short-term data (i.e., the next move). TD(lambda), or training from the final outcome of the game, is often better because it uses a longer horizon. But of course, it is difficult to tell without experiments whether your idea would work. The advantage of your idea is that you can collect a lot of training data more easily.
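To make the difference in horizon concrete, here is a minimal sketch of how the training target for each position of one finished game changes with lambda. It assumes no intermediate rewards, no discounting, and value estimates expressed from a single fixed player's point of view. The function name `lambda_returns` and the numbers are hypothetical, chosen only for illustration. The lam = 0 case is plain one-step bootstrapping and is not meant to reproduce the method in the linked PDF.

```python
def lambda_returns(values, outcome, lam):
    """Lambda-return targets for every position of one finished game.

    values[t] : current value estimate of position t (illustrative numbers)
    outcome   : final result of the game, e.g. +1.0 for a win, -1.0 for a loss
    lam       : 0.0 bootstraps from the next position only,
                1.0 uses the final outcome for every position
    Assumes zero intermediate reward and no discounting.
    """
    n = len(values)
    targets = [0.0] * n
    # The last position is followed directly by the final result.
    targets[n - 1] = outcome
    # Work backwards: G_t = (1 - lam) * V(s_{t+1}) + lam * G_{t+1}
    for t in range(n - 2, -1, -1):
        targets[t] = (1.0 - lam) * values[t + 1] + lam * targets[t + 1]
    return targets


if __name__ == "__main__":
    estimates = [0.1, 0.2, 0.6, 0.8]  # hypothetical value estimates
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_returns(estimates, 1.0, lam))
```

With lam = 0 each target is just the estimate of the following position, so errors in the current estimates feed straight back into training. With lam = 1 every position is trained toward the game result, which is noisier but unbiased. Intermediate values blend the two.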

Rémi

----- Original message -----
From: "Bo Peng" <bo at withablink.com>
To: computer-go at computer-go.org
Sent: Tuesday, 10 January 2017 23:25:19
Subject: [Computer-go] Training the value network (a possibly more efficient approach)


Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). 


You are welcome to check my method: http://withablink.com/GoValueFunction.pdf 


Let me know if there are any silly mistakes :)

_______________________________________________
Computer-go mailing list
Computer-go at computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


