[computer-go] Re: Explanation to MoGo paper wanted.
(BackGammonCode)
Chris Fant
chrisfant at gmail.com
Tue Jul 3 23:22:37 PDT 2007
At first, I was cool with the whole "mathematical notation is more
general" argument. But the fact of the matter is, these results don't
hold water in a general sense. They only hold water in the
environment in which they were tested -- Computer Go. Seems like it
should be up to the person in the other environment to adapt your
successful algorithm (and notation/terminology) to their environment.
On 7/4/07, chrilly <c.donninger at wavenet.at> wrote:
>
>
> Thanks, the dictionary is really great.
>
> Chrilly
>
> ----- Original Message -----
> From: David Silver
> To: computer-go at computer-go.org
> Sent: Tuesday, July 03, 2007 11:29 PM
> Subject: [computer-go] Re: Explanation to MoGo paper wanted.
> (BackGammonCode)
>
>
>
>
> > It's because Go is not the only game in the world and certainly not the
> > only reinforcement learning problem. They are using a widely accepted
> > terminology.
> >
> But a very inappropriate one. I have read Sutton's book and all the things I
> know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable to
> present general concepts, but it is extremely complicated to formulate an
> algorithm in this framework.
>
> Here is a quick and dirty RL<->Computer Go translation kit to try and help
> bridge the gap!
>
>
> RL terminology                       Go terminology
>
> State                                Position
> Action                               Move
> Reward                               Win/Loss
> Return                               Win/Loss
> Episode                              Game
> Time-step                            One move
> Agent                                Program
> Value function                       Evaluation function
> Policy                               Player
> Default policy                       Simulation player
> Uniform random policy                Light simulation player
> Other stochastic policy              Heavy simulation player
> Greedy policy                        1-ply search player
> Epsilon-greedy policy                1-ply search player with some random moves
> Feature                              Factor used for position evaluation
> Weight                               Weight of each factor in evaluation function
> Tabular representation               One weight for each complete position
> Partial tabular representation       UCT tree
> State abstraction                    One weight for many positions
> Linear value function approximation  Evaluation function using weighted sum of various factors
> Feature discovery                    Learning new factors for the evaluation function
> Sample-based search                  Simulation (Monte-Carlo methods, etc.)
> Transition function                  Rules of the game
> Environment                          Rules of the game + opponent
> Trajectory                           Move sequence
> Online                               During actual play
> Offline                              Before/after actual play (e.g. preprocessing)
> On-policy                            If both players play as normal
> Off-policy                           If either player behaves differently
>
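For readers who find code easier to follow than a glossary, here is a rough Python sketch of three rows from the table above: the uniform random policy (light simulation player), the greedy policy (1-ply search player), and the epsilon-greedy policy (1-ply search player with some random moves). The function names, the `evaluate` callback and the toy move values are all hypothetical, made up for illustration only. Nothing here is taken from MoGo or any other program.

```python
import random


def uniform_random_policy(moves, rng):
    """Light simulation player: every legal move is equally likely."""
    return rng.choice(moves)


def greedy_policy(moves, evaluate):
    """1-ply search player: play the move with the highest evaluation."""
    return max(moves, key=evaluate)


def epsilon_greedy_policy(moves, evaluate, epsilon, rng):
    """1-ply search player that plays a random move with probability epsilon."""
    if rng.random() < epsilon:
        return uniform_random_policy(moves, rng)
    return greedy_policy(moves, evaluate)


# Hypothetical toy data: three candidate moves with made-up evaluations.
toy_values = {"A1": 0.2, "B2": 0.7, "C3": 0.5}
moves = list(toy_values)
rng = random.Random(0)  # fixed seed so repeated runs behave the same

greedy_move = greedy_policy(moves, toy_values.get)
mixed_move = epsilon_greedy_policy(moves, toy_values.get, 0.1, rng)
```

With `epsilon` set to 0 the epsilon-greedy player is the same as the greedy player, and with `epsilon` set to 1 it is the same as the light simulation player, which is one way to see how the three terms relate.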
>
> -Dave
>
>
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/