Yes, I think the important job of the value function is to detect 
moves that are very bad, so that MC-eval does not have to sample more 
than once for many variations.

If the evaluation function were trained on pro moves only, it would not 
know what a bad move looks like. At the very least, it would not be 
able to see the difference between "very bad", "never good" and 
"sometimes possible".


On 2016-11-21 15:22, Gian-Carlo Pascutto wrote:
> For the Value Network indeed the procedure is as described, with one
> move at time U being uniformly sampled from {1,361} until it is legal.
> I think it's because we're not interested (only) in playing good moves,
> but also analyzing as diverse as possible positions to learn whether
> they're won or lost. Throwing in one totally random move vastly
> increases the diversity and the number of odd positions the network
> sees, while still not leading to totally nonsensical positions.
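
For illustration, the "sample until legal" step quoted above could be
sketched roughly as below. This is only a sketch under assumptions: the
function name, the is_legal predicate and the max_tries guard are
hypothetical and not taken from any actual engine; points are simply
numbered 1..361 on a 19x19 board.

```python
import random

BOARD_POINTS = 361  # 19 x 19 board, points numbered 1..361


def sample_random_legal_move(is_legal, rng=random, max_tries=100000):
    """Draw points uniformly from 1..361 until one is legal.

    is_legal is a hypothetical caller-supplied predicate (point -> bool).
    max_tries is an added guard (an assumption) so the loop cannot spin
    forever on a position with no legal point.
    """
    for _ in range(max_tries):
        point = rng.randint(1, BOARD_POINTS)  # both ends inclusive
        if is_legal(point):
            return point
    raise ValueError("no legal point found within max_tries draws")


# Toy usage: pretend occupied points are the only illegal ones.
occupied = {1, 2, 3}
move = sample_random_legal_move(lambda p: p not in occupied,
                                rng=random.Random(0))
```

Rejecting and redrawing gives a uniform distribution over the legal
points, the same as picking directly from a list of legal moves.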

