[Computer-go] dealing with multiple local optima

Minjae Kim xiver77 at gmail.com
Fri Feb 24 08:59:47 PST 2017


But those video games have a very simple optimal policy. Consider Super
Mario: if you see an enemy, step on it; if you see a hole, jump over it;
if you see a pipe sticking up, also jump over it; etc.
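
To make the point concrete, here is a minimal sketch of such a hand-written
reactive policy. The names (Obstacle, Action, RULES, reactive_policy) are
hypothetical and invented for illustration; they do not come from any real
emulator or from the DeepMind work, and a real game would need far more
state than a single "obstacle ahead" observation.

```python
from enum import Enum


class Obstacle(Enum):
    """What the agent sees directly ahead (hypothetical observation space)."""
    NONE = "none"
    ENEMY = "enemy"
    HOLE = "hole"
    PIPE = "pipe"


class Action(Enum):
    """Hypothetical action space for the sketch."""
    RUN_RIGHT = "run_right"
    JUMP = "jump"
    STOMP = "stomp"


# One fixed rule per obstacle, mirroring the rules listed in the message.
RULES = {
    Obstacle.ENEMY: Action.STOMP,  # step on it
    Obstacle.HOLE: Action.JUMP,    # jump over it
    Obstacle.PIPE: Action.JUMP,    # also jump over it
}


def reactive_policy(obstacle_ahead):
    """Map the current observation to an action; no search, no lookahead."""
    return RULES.get(obstacle_ahead, Action.RUN_RIGHT)
```

The whole policy is a lookup on the current observation, which is the sense
in which it is "simple": nothing comparable works for Go, where the right
move depends on the whole board.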

On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook <darren at dcook.org> wrote:

> > ...if it is hard to have "the good starting point" such as a trained
> > policy from human expert game records, what is a way to devise one.
>
> My first thought was to look at the DeepMind research on learning to
> play video games (which I think either pre-dates the AlphaGo research,
> or was done in parallel with it): https://deepmind.com/research/dqn/
>
> It just learns from trial and error, no expert game records:
>
> http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge
>
> Darren
>
>
>
> --
> Darren Cook, Software Researcher/Developer
> My New Book: Practical Machine Learning with H2O:
>   http://shop.oreilly.com/product/0636920053170.do
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go