[computer-go] Re: Explanation to MoGo paper wanted.

David Silver silver at cs.ualberta.ca
Tue Jul 3 10:22:37 PDT 2007


> > Hello all,
> >
> > We just presented our paper describing MoGo's improvements at ICML,
> > and we thought we would pass on some of the feedback and corrections
> > we have received.
> > (http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)
> >
> I have the feeling that the paper is important, but it is completely
> obfuscated by the strange reinforcement learning notation and jargon.

I am sorry if the paper is not clear to you or other people on this  
mailing list. However, we chose this notation for several good reasons:

1. We wish to reach a wide audience - the whole machine learning  
community, for whom this notation is well-known.
2. We want other communities to find out about UCT, and start using
it in many different domains. It is not just a Go-programming algorithm!
3. We want to point out that UCT is a reinforcement learning  
algorithm, and fits into an existing framework. This is an important  
point for all of us - the established ideas and methods of RL can be  
applied to our UCT Go programs.
4. There are already many papers describing UCT in the games  
literature. There are very few papers describing UCT to the machine  
learning community. So we hope to make a clear presentation of UCT to  
them, and show that it can achieve good performance.

> Can anyone explain it in Go-programming words?

Maybe I can explain it using pictures :-)
I just updated my website, so you can see our ICML presentation. It
may help to understand the ideas:
http://www.cs.ualberta.ca/~silver/research/presentations/files/sylvain-silver.pdf

> Is the RLGO evaluation function used for evaluation or for just selecting
> the best move (by doing a 1-ply search)?

We used the RLGO evaluation function in two different ways.

1. We tried using it for play-outs (as a "heavy" simulation), which  
didn't work as well as MoGo's handcrafted play-outs. This is  
surprising, because RLGO is much stronger than MoGo's simulation player.

2. We tried using it for new nodes in the tree. When a new position  
is encountered, and we add it to the UCT tree, what should the initial
value be? We use RLGO to provide an initial value, and we specify how  
many games of simulation this initial value is worth. The RLGO value  
function does better than any of the other heuristics we tried.
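
As a rough illustration of point 2, here is a minimal Python sketch of seeding a new node with a prior value that counts as a fixed number of simulated games. The names `Node`, `new_node`, `prior_value`, and `equivalent_games` are hypothetical, and the 0-to-1 outcome scale is an assumption; this is not MoGo's actual code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Running statistics for one position in the search tree."""
    visits: float = 0.0   # number of real or virtual simulated games
    value: float = 0.0    # mean outcome, assumed to lie in [0, 1]

    def update(self, outcome: float) -> None:
        """Fold one simulation result into the running mean."""
        self.visits += 1
        self.value += (outcome - self.value) / self.visits


def new_node(prior_value: float, equivalent_games: float) -> Node:
    """Create a node as if `equivalent_games` simulations had already
    returned `prior_value` on average (hypothetical helper)."""
    return Node(visits=equivalent_games, value=prior_value)


# A heuristic estimate of 0.6, trusted as much as 50 simulated games.
node = new_node(prior_value=0.6, equivalent_games=50)
node.update(1.0)   # one real simulation, a win
```

The larger `equivalent_games` is, the more slowly real simulations move the estimate away from the heuristic value, so that number expresses how much the heuristic is trusted.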

> Can anyone explain to me why it is necessary to obfuscate things at all?
> Why is a move an action and not just a move, a game an episode and not
> a game?
> Is it less scientific if coders than myself can understand it?

Not less scientific, but less general. We hope to make the point that  
our ideas are not restricted to games.

>
> It was pointed out by Donald Knuth in his paper on Alpha-Beta that the
> - simple - algorithm was not understood for a long time, because of the
> inappropriate mathematical notation. For recursive functions, (pseudo-)code
> is much better suited than the mathematical notation. Actually it's
> pseudo-mathematical notation.
> Why is this inappropriate notation still used?

Actually I think the best notation would be: description in plain  
text + mathematical notation + pseudocode + many diagrams. But in a  
conference paper we have just 8 pages to describe everything, so we  
must make some compromises.

>
> I have built, just for fun, a simple BackGammon engine. I think it does what
> the paper proposes for the Monte-Carlo part. It uses a simple evaluation
> function to select the next move in the rollout, aka Monte-Carlo simulation.
> The engine does not build up a UCT tree. It uses UCT only at the root. The
> rollout always starts at the first ply.
> The 1-ply engine has not the slightest chance against a sophisticated
> BackGammon program. But the simple-minded UCT version is already a serious
> opponent.

Why do you call this UCT if there is no tree? Isn't this just
roll-out simulation, as used by Tesauro and Galperin in 1996?
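
For concreteness, the root-only scheme described in the quoted message might look like the Python sketch below: a UCB1 rule chooses among the root moves, each choice is scored by one roll-out, and nothing is stored below the root. The function `root_ucb_search`, the `rollout` callback, and the exploration constant `c` are all assumptions made for illustration, not code from either program.

```python
import math
import random
from typing import Callable, Sequence


def root_ucb_search(moves: Sequence[str],
                    rollout: Callable[[str], float],
                    simulations: int,
                    c: float = math.sqrt(2)) -> str:
    """Pick a root move using UCB1 over roll-out results; no tree is built."""
    visits = {m: 0 for m in moves}
    total = {m: 0.0 for m in moves}
    for t in range(1, simulations + 1):
        untried = [m for m in moves if visits[m] == 0]
        if untried:
            move = untried[0]          # try every root move once first
        else:
            # Mean outcome plus an exploration bonus that shrinks with visits.
            move = max(moves, key=lambda m: total[m] / visits[m]
                       + c * math.sqrt(math.log(t) / visits[m]))
        total[move] += rollout(move)   # outcome assumed to lie in [0, 1]
        visits[move] += 1
    return max(moves, key=lambda m: visits[m])   # most-visited root move


# Toy usage: hypothetical win probabilities for three root moves.
rng = random.Random(0)
win_prob = {"a": 0.3, "b": 0.7, "c": 0.5}
best = root_ucb_search(list(win_prob),
                       lambda m: float(rng.random() < win_prob[m]),
                       simulations=2000)
```

UCT would apply the same selection rule at every stored node below the root as well, which is what the sketch leaves out.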

-Dave