[Computer-go] Training the value network (a possibly more efficient approach)
bo at withablink.com
Wed Jan 11 07:14:14 PST 2017
>How do you get the V(s) for those datasets? You play out the endgame
>with the Monte Carlo playouts?
>I think one problem with this approach is that errors in the data for
>V(s) directly correlate to errors in MC playouts. So a large benefit of
>"mixing" the two (otherwise independent) evaluations is lost.
Yes, that is a problem for the human games dataset.
On the other hand, currently the SL part is relatively easy (it seems
everyone arrives at 50-60% move-prediction accuracy), and the main challenge
of the RL part is generating the huge number of self-play games.
In self-play games we have an exact end-game outcome to use as v(s) / V(s),
with no playout noise in the target. Training on v(s) / V(s) can therefore
use the information in self-play games more efficiently, which I think can
be helpful.
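To make the idea concrete, here is a minimal sketch of training a value
function on exact self-play outcomes. Everything in it (the toy feature
encoding, the linear-tanh model, the synthetic data) is an illustrative
assumption, not anyone's actual Go code; the point is only that the target
z is the finished game's result rather than a Monte Carlo playout estimate.

```python
import numpy as np

# Hypothetical setup: each position s is a small feature vector, and the
# label z is the exact result (+1 win, -1 loss) read off the *finished*
# self-play game, so the training target carries no playout noise.
rng = np.random.default_rng(0)

N_FEATURES = 32        # toy board-feature size (assumption)
N_POSITIONS = 2000     # positions sampled from self-play games

true_w = rng.normal(size=N_FEATURES)
X = rng.normal(size=(N_POSITIONS, N_FEATURES))
z = np.where(X @ true_w + 0.1 * rng.normal(size=N_POSITIONS) > 0, 1.0, -1.0)

# Linear value model V(s) = tanh(w . s), fitted by gradient descent on the
# mean squared error between V(s) and the exact outcome z.
w = np.zeros(N_FEATURES)
lr = 0.05
for _ in range(300):
    v = np.tanh(X @ w)
    grad = X.T @ ((v - z) * (1.0 - v**2)) / N_POSITIONS
    w -= lr * grad

accuracy = np.mean(np.sign(np.tanh(X @ w)) == z)
print(f"sign-agreement with game outcomes: {accuracy:.2f}")
```

A real value network would replace the linear model with a deep net over
board planes, but the training signal is the same: regress V(s) toward the
exact terminal result of each self-play game.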