[Computer-go] Training the value network (a possibly more efficient approach)
remi.coulom at free.fr
Wed Jan 11 02:35:41 PST 2017
Thanks for sharing your idea.
In my experience, it is rarely efficient to train value functions from very short-term data (i.e., the next move). TD(lambda), or training from the final outcome of the game, is often better because it uses a longer horizon. But of course, it is difficult to tell without experiments whether your idea would work. The advantage of your idea is that you can collect a lot of training data more easily.
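To make the horizon point concrete, here is a minimal sketch of how the training targets differ between one-step data, TD(lambda), and the final outcome. It is only an illustration, not the method from the linked PDF: the function name lambda_targets and the example numbers are made up, and it assumes zero intermediate rewards, no discounting, and values expressed from one fixed player's point of view.

```python
def lambda_targets(values, outcome, lam):
    """Compute TD(lambda) training targets for the positions of one game.

    values[t] is the current value estimate of position t (hypothetical
    network output), outcome is the final game result (e.g. +1 or -1),
    both from the same player's perspective. Assumes no intermediate
    rewards and no discounting.
    """
    n = len(values)
    targets = [0.0] * n
    next_return = outcome  # lambda-return of the position after t
    for t in range(n - 1, -1, -1):
        # The successor of the last position is terminal, so its value
        # is the game outcome itself.
        next_value = outcome if t == n - 1 else values[t + 1]
        # Backward recursion: G_t = (1 - lam) * V(s_{t+1}) + lam * G_{t+1}
        targets[t] = (1.0 - lam) * next_value + lam * next_return
        next_return = targets[t]
    return targets


values = [0.1, 0.3, -0.2, 0.6]  # made-up estimates for a 4-position game
outcome = 1.0                   # the game was eventually won

one_step = lambda_targets(values, outcome, lam=0.0)   # next-position targets
blended = lambda_targets(values, outcome, lam=0.5)    # intermediate horizon
final_only = lambda_targets(values, outcome, lam=1.0) # final outcome only
```

With lam=0 each position is regressed onto the estimate of the very next position, so the final result only enters through the last move. With lam=1 every position is regressed onto the final outcome. Values in between mix the two.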
----- Original message -----
From: "Bo Peng" <bo at withablink.com>
To: computer-go at computer-go.org
Sent: Tuesday, 10 January 2017 23:25:19
Subject: [Computer-go] Training the value network (a possibly more efficient approach)
Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network).
You are welcome to check my method: http://withablink.com/GoValueFunction.pdf
Let me know if there are any silly mistakes :)
Computer-go mailing list
Computer-go at computer-go.org