[Computer-go] dealing with multiple local optima
jim.oflaherty.jr at gmail.com
Fri Feb 24 13:12:46 PST 2017
NEAT and HyperNEAT are awesome when "evolving" fairly simple networks with
a very limited number of input and output dimensions. However, scaling the
NEAT method up to the level needed for AlphaGo's ANNs, with their current
input-layer encoding and output-layer decoding methods, takes serious
computational power. It is likely not feasible for anything less than a
Google-sized team and investment, i.e., half a dozen people and millions of
dollars of computational access on Google Cloud's distributed computing
architecture. Amazon (with AWS) and Microsoft (with Azure) are the only two
other companies with the spare capacity to cover both the personnel and the
distributed computation costs. A distant second would be companies like IBM,
which could carry the personnel while leveraging AWS, Azure, Google Cloud,
etc.
Once there is sufficient investment in this kind of evolutionary
meta-modeling, it will be a very useful starting point for others. However,
until someone is willing to play the extremely long game and pony up the
HUGE up-front costs of evolutionary bootstrapping out of the simple models
NEAT handles today, it is a short-term dead end.
On Fri, Feb 24, 2017 at 2:39 AM, Minjae Kim <xiver77 at gmail.com> wrote:
> I've recently read the AlphaGo paper, which used gradient-based
> reinforcement learning to get stronger. The learning was successful enough
> to beat a human master, but in this case, supervised learning with a large
> database of master-level human games preceded the reinforcement
> learning. For a game as complex as go, one can expect that the search
> space for the policy function would not be smooth at all. So supposedly
> supervised learning was necessary to guide the policy function to a good
> starting point before reinforcement. Without such, applying reinforcement
> learning directly to a random policy can easily make the policy stuck at a
> bad local optimum. I could have a misunderstanding at this point; correct me
> if so, but to continue on: if it is hard to have "the good starting point"
> such as a trained policy from human expert game records, what is a way to
> devise one? I've had a look at NEAT and HyperNEAT, which are evolutionary
> methods. Do these evolutionary algorithms scale well on complex strategic
> decision processes and not just on simple linear decisions such as food
> gathering and danger avoidance? In case not, what alternatives are known?
> Is there any successful case of a chess, go, or any other complex strategic
> game-playing algorithm that gained expert strength without domain
> knowledge such as expert game examples?