[Computer-go] dealing with multiple local optima

Minjae Kim xiver77 at gmail.com
Fri Feb 24 00:39:23 PST 2017


I recently read the AlphaGo paper, which used gradient-based
reinforcement learning to get stronger. The learning was successful enough
to beat a human master, but in this case supervised learning on a large
database of master-level human games preceded the reinforcement
learning. For a game as complex as Go, one can expect that the search
space for the policy function would not be smooth at all. So presumably
supervised learning was necessary to guide the policy function to a good
starting point before reinforcement learning. Without it, applying
reinforcement learning directly to a random policy can easily leave the
policy stuck at a bad local optimum. I may be misunderstanding something
here; correct me if so. But to continue: if it is hard to obtain such a
"good starting point", e.g. a policy trained on human expert game records,
what is a way to devise one? I've had a look at NEAT and HyperNEAT, which
are evolutionary
methods. Do these evolutionary algorithms scale well to complex strategic
decision processes, and not just to simple linear decisions such as food
gathering and danger avoidance? If not, what alternatives are known?
Is there any successful case of a chess, Go, or other complex strategic
game-playing program that reached expert strength without domain
knowledge such as expert game examples?
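
To make the local-optimum concern concrete, here is a minimal toy sketch. It
is not anything from the AlphaGo paper: the one-dimensional "reward" function,
the step size, and the starting points are all made up for illustration.
Plain gradient ascent on a bumpy objective ends up at a different peak
depending on where it starts, which is the effect I am worried about for a
randomly initialized policy.

```python
import math

def reward(x):
    # Hypothetical bumpy objective (an assumption for illustration only):
    # a broad hill centered near x = 2 with ripples on top of it.
    return -0.1 * (x - 2.0) ** 2 + 0.5 * math.cos(5.0 * x)

def grad(x):
    # Analytic derivative of reward(x).
    return -0.2 * (x - 2.0) - 2.5 * math.sin(5.0 * x)

def ascend(x, lr=0.01, steps=5000):
    # Plain gradient ascent: follow the local slope uphill.
    for _ in range(steps):
        x += lr * grad(x)
    return x

# A "random" start far from the best region vs. a "guided" start near it.
bad = ascend(-3.0)
good = ascend(2.4)

print("start -3.0 -> x = %.3f, reward = %.3f" % (bad, reward(bad)))
print("start  2.4 -> x = %.3f, reward = %.3f" % (good, reward(good)))
```

Both runs converge to a point where the gradient vanishes, but the first one
stops on a much lower ripple. A real policy network has millions of
parameters rather than one, so this only illustrates the idea and says
nothing about how severe the problem is in practice.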