<div dir="ltr"><div>But those video games have a very simple optimal policy. Consider Super Mario: if you see an enemy, step on it; if you see a hole, jump over it; if you see a pipe sticking up, also jump over it; etc.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook <span dir="ltr"><<a href="mailto:darren@dcook.org" target="_blank">darren@dcook.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> ...if it is hard to have "the good starting point" such as a trained<br>
> policy from human expert game records, what is a way to devise one.<br>
<br>
My first thought was to look at the DeepMind research on learning to<br>
play video games (which I think either pre-dates the AlphaGo research,<br>
or was done in parallel with it): <a href="https://deepmind.com/research/dqn/" rel="noreferrer" target="_blank">https://deepmind.com/research/<wbr>dqn/</a><br>
<br>
It just learns from trial and error, no expert game records:<br>
<br>
<a href="http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge" rel="noreferrer" target="_blank">http://www.theverge.com/2016/<wbr>6/9/11893002/google-ai-<wbr>deepmind-atari-montezumas-<wbr>revenge</a><br>
<span class="HOEnZb"><font color="#888888"><br>
Darren<br>
<br>
<br>
<br>
--<br>
Darren Cook, Software Researcher/Developer<br>
My New Book: Practical Machine Learning with H2O:<br>
  <a href="http://shop.oreilly.com/product/0636920053170.do" rel="noreferrer" target="_blank">http://shop.oreilly.com/<wbr>product/0636920053170.do</a><br>
______________________________<wbr>_________________<br>
Computer-go mailing list<br>
<a href="mailto:Computer-go@computer-go.org">Computer-go@computer-go.org</a><br>
<a href="http://computer-go.org/mailman/listinfo/computer-go" rel="noreferrer" target="_blank">http://computer-go.org/<wbr>mailman/listinfo/computer-go</a></font></span></blockquote></div><br></div>
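The simple reactive policy described at the top of the thread can be sketched in a few lines. This is purely illustrative: the observation strings and action names below are invented for the sketch, not taken from any real Mario emulator or RL library API.

```python
def mario_policy(observation):
    """Map what is directly ahead to an action, per the rules in the email:
    enemy -> step on it; hole or pipe -> jump over it; otherwise keep running."""
    if observation == "enemy":
        return "stomp"
    if observation in ("hole", "pipe"):
        return "jump"
    return "run_right"

# Walking a short stretch of level with this policy:
actions = [mario_policy(o) for o in ["clear", "enemy", "hole", "pipe"]]
# actions == ["run_right", "stomp", "jump", "jump"]
```

The point of the sketch is that the whole policy is a handful of stimulus-response rules, which is why trial-and-error methods like DQN can discover it without expert records.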