[Computer-go] Fw: Re: AlphaZero tensorflow implementation/tutorial

cody2007 cody2007 at protonmail.com
Sun Dec 9 18:19:41 PST 2018


(resending because I forgot to send this to the mailing list originally)
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, December 9, 2018 8:59 PM, cody2007 <cody2007 at protonmail.com> wrote:

> Hi Daniel,
> Thanks for your thoughts/questions.
>
>>a) Do you use Dirichlet noise during training, if so is it limited to first 30 or so plies ( which is the opening phase of chess) ?
>>The alphazero paper is not clear about it.
> I don't. I had implemented it at one point, but wasn't entirely sure I had it right (the paper is unclear on the details), so I've disabled it. Have you found the noise useful?
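For reference, here is a minimal sketch of the root-noise step as I read the AlphaZero paper: the root priors are mixed with a Dirichlet sample, P(s,a) = (1 - eps) * p_a + eps * eta_a, with eps = 0.25 and alpha = 0.03 for Go (0.3 for chess). The function name and defaults are mine, not from any particular implementation.

```python
import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.03, rng=None):
    """Mix Dirichlet noise into the root prior probabilities.

    Per the AlphaZero paper: P(s,a) = (1 - eps) * p_a + eps * eta_a,
    eta ~ Dir(alpha), applied only at the root of each search.
    alpha = 0.03 for Go, 0.3 for chess (values are game-dependent).
    """
    rng = rng or np.random.default_rng()
    priors = np.asarray(priors, dtype=np.float64)
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise
```

Since the mix is a convex combination of two distributions, the result still sums to one, so it can be dropped in wherever the root priors are read.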
>
>>b) Do you need to shuffle batches if you are doing one epoch?
> To clarify: I run 128 games in parallel (my batch size) out to 20 turns per player (with 1,000 simulations per move). So I get 20 turns times 128 games worth of positions. I train on *all* of those turns in random order, i.e. 20 gradient-descent steps; otherwise I'd be training on turns 1, 2, 3... in order. Then I repeat with new self-play games. Is that what you mean by an epoch?
>
> I've been concerned I could be overfitting by training on all 20 turns, so I've been running a test where I randomly select 10 and discard the rest. So far, no difference in performance. It could be that I haven't trained long enough to see a difference, or that I'd have to reduce the turn sampling even further (or pool the training examples over larger numbers of games to randomize further, which I think the AlphaGo papers did, if I recall).
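The pooling idea mentioned above (sampling minibatches from a window of recent games rather than training on each game immediately) can be sketched with a plain ring buffer. The class name and capacity are illustrative, not from the posted implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Pool (state, policy, value) examples across many self-play games
    and sample uniformly, so a minibatch mixes turns and games instead
    of coming from a single batch of games."""

    def __init__(self, capacity=500_000):
        # Oldest examples fall off the back once capacity is reached.
        self.buf = deque(maxlen=capacity)

    def add_game(self, examples):
        self.buf.extend(examples)

    def sample(self, batch_size):
        # Uniform sample without replacement, capped at the buffer size.
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))
```

Because consecutive positions from one game are highly correlated, sampling across a large window like this is the usual way to decorrelate minibatches.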
>
>>do you shuffle those positions? I found the latter to be very important to avoid overfitting.
> If you mean random rotations and reflections, yes, I do.
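For anyone following along, the eight symmetries of a square Go board (four rotations, each with a reflection) can be generated like this; the key point is that the policy target must be transformed together with the board so move probabilities stay aligned. The function is a generic sketch, with a pass move assumed to be handled separately:

```python
import numpy as np

def board_symmetries(board, policy):
    """Yield all 8 dihedral symmetries of a square board together with
    the policy reshaped as an (N, N) plane, keeping moves aligned.

    board:  (N, N) array of stone encodings
    policy: (N, N) array of move probabilities (pass handled elsewhere)
    """
    for k in range(4):
        b = np.rot90(board, k)   # rotate both arrays by the same k
        p = np.rot90(policy, k)
        yield b, p
        yield np.fliplr(b), np.fliplr(p)  # and the mirrored version
```

During training each example can be replaced by a random symmetry; during search, evaluating a random symmetry per simulation also averages out any directional bias in the network.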
>
>>c) Do you think there is a problem with using Adam Optimizer instead of SGD with learning rate drops?
> I haven't tried--have you? In other domains (some work I've done with A3C objectives), Adam felt unstable in my hands, but maybe that's just me. A3C in general (on other games I've tried it on) has been unstable for me, which is partly why I've gone down the route of exploring the AlphaGo approach.
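For context on the SGD side of the question: the AlphaZero paper uses SGD with momentum and a stepwise learning rate starting at 0.2 and dropped by 10x three times over training. A schedule like that is trivial to express; the boundary step counts below are placeholders, not the paper's values:

```python
def alphazero_lr(step, boundaries=(100_000, 300_000, 500_000),
                 rates=(0.2, 0.02, 0.002, 0.0002)):
    """Stepwise learning-rate schedule in the AlphaZero style:
    a constant rate, dropped by 10x at fixed step counts.
    The boundary values here are illustrative placeholders."""
    for boundary, rate in zip(boundaries, rates):
        if step < boundary:
            return rate
    return rates[-1]
```

With Adam the schedule largely disappears into the adaptive per-parameter rates, which is exactly the trade-off being asked about: less tuning, but also less of the late-training sharpening that the big LR drops provide.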
>
> Have you written up anything about your experiments?
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, December 9, 2018 8:34 PM, Dani <dshawul at gmail.com> wrote:
>
>> Thanks for the tutorial! I have some questions about training
>>
>> a) Do you use Dirichlet noise during training, if so is it limited to first 30 or so plies ( which is the opening phase of chess) ?
>> The alphazero paper is not clear about it.
>>
>> b) Do you need to shuffle batches if you are doing one epoch? Also after generating game positions from each game,
>> do you shuffle those positions? I found the latter to be very important to avoid overfitting.
>>
>> c) Do you think there is a problem with using Adam Optimizer instead of SGD with learning rate drops?
>>
>> Daniel
>>
>> On Sun, Dec 9, 2018 at 6:23 PM cody2007 via Computer-go <computer-go at computer-go.org> wrote:
>>
>>> Thanks for your comments.
>>>
>>>>Looks like you made it work on 7x7. 19x19 would probably give better results, especially against yourself if you are a complete novice
>>> I'd expect that would make me win even more against the algorithm, since it would explore a far smaller fraction of the search space, right?
>>> Certainly something I'd be interested in testing, though I'd expect it to take many more months of training. It would be interesting to see how much performance falls apart, if at all.
>>>
>>>>To avoid cheating against GNU Go, use GNU Go's --play-out-aftermath parameter
>>> Yep, I evaluate with that parameter. The problem is more that I only play 20 turns per player per game, and the network seems to like placing stones in territories "owned" by the other player. My scoring system then no longer counts that area as owned by the player. Playing out more turns and/or using a more sophisticated scoring system would probably fix this.
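The "more sophisticated scoring" mentioned above could be a Tromp-Taylor-style area count: each side scores its stones plus every empty region whose border touches only its colour. This is a generic sketch, not the scoring code from the posted implementation; note it shows exactly why a stray stone inside the opponent's territory spoils the count, since the region then borders both colours and scores for neither:

```python
def area_score(board):
    """Tromp-Taylor-style area scoring for a finished position.

    board: list of lists with 'B', 'W', or '.' entries.
    Returns (black_points, white_points): stones, plus empty regions
    bordered by exactly one colour.
    """
    n = len(board)
    score = {'B': 0, 'W': 0}
    seen = set()
    for r in range(n):
        for c in range(n):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1          # each stone scores one point
            elif (r, c) not in seen:
                # Flood-fill the empty region, recording border colours.
                region, borders, stack = [], set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < n:
                            if board[ny][nx] == '.':
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                if len(borders) == 1:
                    # Region touches only one colour: it is territory.
                    score[borders.pop()] += len(region)
    return score['B'], score['W']
```

On a truncated 20-turn game, regions bordering both colours are common, so this scorer understates both sides; playing games out further is what makes area scoring converge.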
>>>
>>>>If I'm not mistaken, a competitive AI would need a lot more training, such as what Leela Zero does: https://github.com/gcp/leela-zero
>>> Yeah, I agree more training is probably the key here. I'll take a look at leela-zero.
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Sunday, December 9, 2018 7:41 PM, Xavier Combelle <xavier.combelle at gmail.com> wrote:
>>>
>>>> Looks like you made it work on 7x7. 19x19 would probably give better results, especially against yourself if you are a complete novice
>>>>
>>>> To avoid cheating against GNU Go, use GNU Go's --play-out-aftermath parameter
>>>>
>>>> If I'm not mistaken, a competitive AI would need a lot more training, such as what Leela Zero does: https://github.com/gcp/leela-zero
>>>>
>>>> Le 10/12/2018 à 01:25, cody2007 via Computer-go a écrit :
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've posted an implementation of the AlphaZero algorithm along with a brief tutorial. The code runs on a single GPU. Performance is not that great, but I suspect it's mostly been limited by hardware (all training and evaluation has been on a single Titan X). The network beats GNU Go about 50% of the time, although it "abuses" the scoring a little bit, which I talk a little more about in the article:
>>>>>
>>>>> https://medium.com/@cody2007.2/alphazero-implementation-and-tutorial-f4324d65fdfc
>>>>>
>>>>> -Cody
>>>>>
>>>>> _______________________________________________
>>>>> Computer-go mailing list
>>>>> Computer-go at computer-go.org
>>>>>
>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>

