[computer-go] Explanation to MoGo paper wanted.
George Dahl
george.dahl at gmail.com
Wed Jul 4 13:57:23 PDT 2007
On 7/4/07, Don Dailey <drd at mit.edu> wrote:
> On Wed, 2007-07-04 at 11:34 +0200, Magnus Persson wrote:
> > but what really will make a
> > difference is in the quality in the playouts.
>
> I would like to suggest a more abstract view of things. In the purest
> form of the algorithm there isn't an artificial distinction between the
> tree and the play-outs. The algorithm is applied as if the whole tree
> already exists (conceptually) and nodes are updated to the end of the
> game.
>
> We had to impose end nodes and a tree that grows in depth due to the
> fact that it's impractical to store the whole tree in memory. So we
> have a tree phase on the one hand, and on the other hand we have a
> play-out phase that simulates an unexplored tree (but without updates
> which introduces out of necessity a small inefficiency.)
>
> This makes everything a bit of a compromise but a well advised one due
> to hardware limitations. But then we started imposing our will on the
> play-outs in order to make them "smarter." But we didn't do the same
> to the tree portion because we now believe they are 2 separate things
> (even though they really are not.)
>
> So I prefer to think of the play-outs and the tree as the same thing. I
> think whatever is done can be applied to both. For instance Lazarus
> does a lot of pruning and the pruning rules are the same for tree
> portion and the play-out portion. Actually, Lazarus saw most of the
> improvement from the tree pruning when I test each without the other.
>
> But I notice that we are now looking at the tree as the "search" portion
> and the play-outs as the "evaluation function." I think that is
> incredible because I have always believed that "tree search" and
> "evaluation" are the same thing, just different forms or states. Like
> water and ice, or matter and energy.
>
> It's interesting that chess has this too. Traditionally programs have
> always had these 3 very crude phases, search, quiescence, evaluation.
> Modern programs have somewhat blurred these distinctions but it hasn't
> changed very much.
>
> UCT comes along and finally does away with the distinction altogether.
> Now you can call it all evaluation or search, whatever pleases you.
> But in it's purest form, UCT with totally random play-outs is a
> beautiful thing - a recursive evaluation function with zero (almost)
> domain specific knowledge.
>
> Of course now we just had to go and spoil it all by imposing domain
> specific rules. I have done the same and I admit it. It would be fun
> to see how far we could go if domain specific knowledge was forbidden as
> an experiment. Once patterns are introduced along with other direct Go
> knowledge, it's still fun but it feels a bit "wrong", kind of like
> cheating.
Is it still cheating if the program learns and discover's the patterns
itself? Then isn't it just precomputing a few things?
It's clear that when we do this, we introduce strengths and
> weaknesses to the program, making it a bit more fragile, less
> "universal" or robust. Stronger too, but more susceptible to
> in-transitivity.
>
> Of course we do this in Chess programs in a big way. We very tediously
> "tell" the program what is good and what is bad. It has no choice, it
> must accept our definition of right and wrong, our morality. However
> in our great wisdom we provide a search mechanism in order to correct
> our bad judgments. The search mechanism is an admission that we know
> we are wrong about many things.
>
> Of course you are right - if the play-outs are improved, the quality of
> the moves will also improve.
>
> - Don
>
>
>
>
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
More information about the computer-go
mailing list