[Computer-go] Training the value network (a possibly more efficient approach)

Xavier Combelle xavier.combelle at gmail.com
Wed Jan 11 09:09:19 PST 2017



On 11/01/2017 at 16:14, Bo Peng wrote:
> Hi,
>
>> How do you get the V(s) for those datasets? You play out the endgame
>> with the Monte Carlo playouts?
>>
>> I think one problem with this approach is that errors in the data for
>> V(s) directly correlate to errors in MC playouts. So a large benefit of
>> "mixing" the two (otherwise independent) evaluations is lost.
> Yes, that is a problem for the human games dataset.
>
> On the other hand, currently the SL part is relatively easy (it seems
> everyone arrives at 50-60% accuracy), and the main challenge of the RL
> part is generating the huge number of self-play games.
>
> In self-play games we have an accurate end-game v(s) / V(s). And v(s) /
> V(s) is able to use the information in self-play games more efficiently. I
> think this can be helpful.
>
Could a distributed workload system, such as fishtest for Stockfish, help to
generate a huge number of self-play games?

If so, I could build the framework for it. It is classical programming,
so I should be able to do it (unlike computer Go software itself, which
is hard for me for lack of practice).
Of course, it means distributing at least the binary, or the source, so
authors of proprietary software might be reluctant to share it.
But for free software there should not be any problem.
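
As a rough illustration of the idea, here is a minimal sketch in Python of
a coordinator that hands out self-play games to workers and collects the
results. Everything in it is an assumption made for illustration: the names
Coordinator, play_game and run_worker are hypothetical, play_game is a
placeholder that returns a pseudo-random outcome instead of running a real
engine, and everything runs in one process with no networking. It is not a
description of how fishtest itself works.

```python
import random
from collections import deque


class Coordinator:
    """Hypothetical server side: hands out game IDs, collects outcomes."""

    def __init__(self, total_games):
        self.total_games = total_games
        self.pending = deque(range(total_games))  # games not yet assigned
        self.results = {}                         # game_id -> outcome

    def claim(self, batch_size):
        """Give a worker up to batch_size game IDs to play."""
        batch = []
        while self.pending and len(batch) < batch_size:
            batch.append(self.pending.popleft())
        return batch

    def report(self, game_id, outcome):
        """Record the final outcome (+1 or -1) of one finished game."""
        self.results[game_id] = outcome

    def done(self):
        return len(self.results) == self.total_games


def play_game(game_id):
    """Placeholder for launching the distributed engine binary.

    A real worker would run a full self-play game here; this stub only
    returns a reproducible pseudo-random win/loss.
    """
    rng = random.Random(game_id)
    return rng.choice([+1, -1])


def run_worker(coordinator, batch_size=4):
    """Hypothetical volunteer machine: claim batches until none are left."""
    while True:
        batch = coordinator.claim(batch_size)
        if not batch:
            break
        for game_id in batch:
            coordinator.report(game_id, play_game(game_id))


if __name__ == "__main__":
    coordinator = Coordinator(total_games=10)
    run_worker(coordinator, batch_size=3)
    print(len(coordinator.results), "games collected")
```

In a real deployment claim and report would be network requests, and the
coordinator would have to re-issue games claimed by workers that never
report back.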

If anyone is interested in my proposal, I would be pleased to
implement it.

Xavier
