[computer-go] producing a good probability distribution over
legal moves
Daniel Burgos
dburgos at gmail.com
Thu May 17 10:05:27 PDT 2007
But it is very difficult that a board position is repeated between games. I
don't see how you will use the training pairs in the new games.
2007/5/17, George Dahl <george.dahl at gmail.com>:
>
> What I am actually proposing is collapsing the results of the playouts
> offline and then having a function that maps board positions to
> playout values without actually doing playouts. So I would use an MC
> player to generate a lot of training pairs of the form (position,
> score) where position would be some sort of vector representation of
> the board position and score would be a single scalar value that
> corresponded to the value the Monte Carlo program decided after many
> simulations that the position had. So I would create something that
> learned a value function for positions. Then, once training was
> complete, this function could be evaluated very quickly and hopefully
> give the same results that, say, 100k playouts would give. But since
> it was so fast, the program could use the extra time to actually do
> more playouts. The difference would be that the new playouts would be
> biased by the value function. So the probability of picking a move
> would be proportional to the value function evaluated on the resulting
> position. This way I would be bootstrapping a general global position
> evaluator and using it to improve monte carlo simulations.
>
> Imagine if you had a monte carlo program that took almost no time to
> run. You would use it to do "heavy" playouts for another monte carlo
> program to make it even stronger.
>
> The reason this might be easier than learning from a database of
> professional games is that it is easy to generate scalar scores of
> positions with a monte carlo program. Presumably it is also easier to
> learn how to play like a mediocre monte carlo program than like a
> professional player.
>
> - George
>
>
>
> On 5/17/07, Zach Keatts <shjanzey at gmail.com> wrote:
> > What you would have after your training/evaluator phase is a hueristic
> > knowlege of possibly better montecarlo trees to consider. This will
> > definitely cut down on the search space, but could also alienate a
> strong
> > search path. I have been thinking along these same line for some
> time. The
> > problem then lies in where you can decide what trees would be worth
> looking
> > at initially. What about a database of professional games? Take the
> > "winning" games as examples of strong searches that ended in a win. The
> > problem is even more complex because where in the "winning tree" do you
> tell
> > montecarlo to start searching? Will you assign a higher probability to
> each
> > move in those games (defining a known probabilistically stronger
> predicted
> > result)?
> >
> > That is one approach. The other approach is the purely simulated
> approach
> > where you run simulations and gradually allow your probability function
> to
> > evolve based on the results. Although this is a purer approach I think
> the
> > aforementioned strategy may yield some useful experimentation. The
> strong
> > point is that it will take advantage of the human brain's innate pattern
> > recognition and calculation skills. Since we have recorded games we
> have
> > plenty of examples of this "thought process". For a 9Dan winning game,
> > those trees surely are worth investigating...
> >
> >
> > On 5/16/07, George Dahl <george.dahl at gmail.com> wrote:
> > >
> > > I find Monte-Carlo Go a fascinating avenue of research, but what pains
> > > me is that a huge number of simulations are performed each game and at
> > > the end of the game the results are thrown out. So what I was
> > > thinking is that perhaps the knowledge generated by the simulations
> > > could be collapsed in some way.
> > >
> > > Suppose that epsilon greedy versions of a reasonably strong MC Go
> > > program were to play a very large number of games against themselves.
> > > By epsilon greedy versions I mean that with probability epsilon a
> > > random move is played and with probability 1- epsilon the move the MC
> > > Player would normally play is played. Each position in these games
> > > would be stored along with the Monte Calro/UCT evaluation for that
> > > position's desirability. This would produce an arbitrarily large
> > > database of position/score pairs. At this point a general function
> > > approximator / learning algorithm (such as a neural network) could be
> > > trained to map positions to scores. If this was successful, it would
> > > produce something that could very quickly (even a large neural net
> > > evaluation or what have you would be much faster than doing a large
> > > number of MC playouts) map positions to scores. Obviously the scores
> > > would not be perfect since the monte carlo program did not play
> > > anywhere near perfect Go. But this static evaluator could then be
> > > plugged back into the monte carlo player and used to bias the random
> > > playouts. Wouldn't it be useful to be able to quickly estimate the MC
> > > score without doing any playouts?
> > >
> > > Clearly this idea could be extended recursively with a lot of offline
> > > training. What makes this formulation more valuable is that given
> > > enough time and effort someone familiar with machine learning should
> > > be able to produce a learning architecture that can actually learn the
> > > MC scores. It would be a straightforward, if potentially quite
> > > difficult, supervised learning task with effectively unlimited data
> > > since more could be generated at will. Such a learning architecture
> > > could be used in the manner I described above or thrown at the more
> > > general reinforcement learning problem.
> > >
> > >
> > > Does anyone have any thoughts on this idea? Does anyone know of it
> > > being tried before?
> > >
> > > - George
> > > _______________________________________________
> > > computer-go mailing list
> > > computer-go at computer-go.org
> > > http://www.computer-go.org/mailman/listinfo/computer-go/
> > >
> >
> >
> > _______________________________________________
> > computer-go mailing list
> > computer-go at computer-go.org
> > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20070517/813560e4/attachment.htm
More information about the computer-go
mailing list