[computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

Chris Fant chrisfant at gmail.com
Tue Jul 3 23:22:37 PDT 2007


At first, I was cool with the whole "mathematical notation is more
general" argument.  But the fact of the matter is, these results don't
hold water in a general sense.  They only hold water in the
environment in which they were tested -- Computer Go.  Seems like it
should be up to the person in the other environment to adapt your
successful algorithm (and notation/terminology) to their environment.


On 7/4/07, chrilly <c.donninger at wavenet.at> wrote:
>
>
> Thanks, the dictionary is really great.
>
> Chrilly
>
> ----- Original Message -----
> From: David Silver
> To: computer-go at computer-go.org
> Sent: Tuesday, July 03, 2007 11:29 PM
> Subject: [computer-go] Re: Explanation to MoGo paper wanted.
> (BackGammonCode)
>
>
>
>
> > It's because Go is not the only game in the world and certainly not
> > the only reinforcement learning problem. They are using a widely
> > accepted terminology.
> >
> But a very inappropriate one. I have read Sutton's book and all the things I
> know (e.g. TD-Gammon) are completely obfuscated. It's maybe suitable to
> present general concepts, but it is extremely complicated to formulate an
> algorithm in this framework.
>
> Here is a quick and dirty RL<->Computer Go translation kit to try and help
> bridge the gap!
>
>
> RL terminology            Go terminology
>
> State                     Position
> Action                    Move
> Reward                    Win/Loss
> Return                    Win/Loss
> Episode                   Game
> Time-step                 One move
> Agent                     Program
> Value function            Evaluation function
> Policy                    Player
> Default policy            Simulation player
> Uniform random policy     Light simulation player
> Other stochastic policy   Heavy simulation player
> Greedy policy             1-ply search player
> Epsilon-greedy policy     1-ply search player with some random moves
> Feature                   Factor used for position evaluation
> Weight                    Weight of each factor in evaluation function
> Tabular representation    One weight for each complete position
> Partial tabular           UCT tree
>     representation
> State abstraction         One weight for many positions
> Linear value function     Evaluation function
>     approximation             using weighted sum of various factors
> Feature discovery         Learning new factors for the evaluation function
> Sample-based search       Simulation (Monte-Carlo methods, etc.)
> Transition function       Rules of the game
> Environment               Rules of the game + opponent
> Trajectory                Move sequence
> Online                    During actual play
> Offline                   Before/after actual play (e.g. preprocessing)
> On-policy                 If both players play as normal
> Off-policy                If either player behaves differently
>
>
> -Dave
>
>
>
>  ________________________________
>
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
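
To make a few rows of the quoted translation table concrete, here is a
rough Python sketch of three of them: "linear value function
approximation" as a weighted sum of factors, "greedy policy" as a 1-ply
search player, and "epsilon-greedy policy" as the same player with some
random moves. None of this comes from the MoGo paper or from any Go
program. The names legal_moves, play and evaluate are hypothetical
callbacks made up for illustration, and the sketch assumes evaluate
scores a position from the point of view of the player who just moved.

```python
import random


def linear_value(features, weights):
    """Linear value function: weighted sum of the evaluation factors."""
    return sum(w * f for w, f in zip(weights, features))


def greedy_move(position, legal_moves, play, evaluate):
    """Greedy policy: 1-ply search, keep the move with the best successor.

    legal_moves, play and evaluate are hypothetical callbacks supplied
    by the caller; evaluate is assumed to score for the player who moved.
    """
    return max(legal_moves(position),
               key=lambda move: evaluate(play(position, move)))


def epsilon_greedy_move(position, legal_moves, play, evaluate,
                        epsilon=0.1, rng=random):
    """Epsilon-greedy policy: random move with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves(position))
    return greedy_move(position, legal_moves, play, evaluate)


if __name__ == "__main__":
    # Toy stand-in for a game: a position is an integer, a move adds to it,
    # and the evaluation is simply the resulting integer.
    toy_moves = lambda position: [1, 2, -1]
    toy_play = lambda position, move: position + move
    toy_eval = lambda position: position
    print(greedy_move(0, toy_moves, toy_play, toy_eval))
    print(epsilon_greedy_move(0, toy_moves, toy_play, toy_eval, epsilon=0.5))
```

With epsilon set to 0 the second player is identical to the greedy
player, and with epsilon set to 1 it becomes the "uniform random policy"
(light simulation player) from the table.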

