[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)

David Silver silver at cs.ualberta.ca
Tue Jul 3 14:29:08 PDT 2007


> > It's because Go is not the only game in the world, and certainly not
> > the only reinforcement learning problem. They are using a widely
> > accepted terminology.
> >
> But a very inappropriate one. I have read Sutton's book, and all the
> things I know (e.g. TD-Gammon) are completely obfuscated. It's maybe
> suitable for presenting general concepts, but it is extremely
> complicated to formulate an algorithm in this framework.

Here is a quick and dirty RL <-> Computer Go translation kit to try to
help bridge the gap!

RL terminology            Go terminology

State                     Position
Action                    Move
Reward                    Win/Loss
Return                    Win/Loss
Episode                   Game
Time-step                 One move
Agent                     Program
Value function            Evaluation function
Policy                    Player
Default policy            Simulation player
Uniform random policy     Light simulation player
Other stochastic policy   Heavy simulation player
Greedy policy             1-ply search player
Epsilon-greedy policy     1-ply search player with some random moves
Feature                   Factor used for position evaluation
Weight                    Weight of each factor in evaluation function
Tabular representation    One weight for each complete position
Partial tabular           UCT tree
     representation
State abstraction         One weight for many positions
Linear value function     Evaluation function
     approximation             using weighted sum of various factors
Feature discovery         Learning new factors for the evaluation function
Sample-based search       Simulation (Monte-Carlo methods, etc.)
Transition function       Rules of the game
Environment               Rules of the game + opponent
Trajectory                Move sequence
Online                    During actual play
Offline                   Before/after actual play (e.g. preprocessing)
On-policy                 If both players play as normal
Off-policy                If either player behaves differently
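
To make a few of these rows concrete, here is a rough Python sketch of
how "linear value function approximation", "greedy policy" and
"epsilon-greedy policy" read in game-programming terms. Everything in it
is made up for illustration: the toy counting game (toy_legal_moves,
toy_play, toy_features), the weights, and all function names are
assumptions, not code from MoGo or any other program.

```python
import random

def evaluate(features, weights):
    """Linear value function approximation:
    evaluation = weighted sum of the factors (features) of a position."""
    return sum(w * f for w, f in zip(weights, features))

def greedy_move(position, legal_moves, play, featurize, weights):
    """Greedy policy = 1-ply search player:
    try every legal move and keep the one whose resulting position
    gets the highest evaluation."""
    return max(
        legal_moves(position),
        key=lambda move: evaluate(featurize(play(position, move)), weights),
    )

def epsilon_greedy_move(position, legal_moves, play, featurize, weights,
                        epsilon=0.1, rng=random):
    """Epsilon-greedy policy = 1-ply search player with some random moves:
    with probability epsilon play a uniformly random legal move,
    otherwise play the greedy move."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves(position))
    return greedy_move(position, legal_moves, play, featurize, weights)

# --- Hypothetical toy game, only here so the sketch runs ---------------
# A position is a stone count; a move adds 1, 2 or 3 stones.
def toy_legal_moves(position):
    return [1, 2, 3]

def toy_play(position, move):
    return position + move

def toy_features(position):
    # Two made-up factors: "count is a multiple of 4" and a constant bias.
    return [1.0 if position % 4 == 0 else 0.0, 1.0]

TOY_WEIGHTS = [1.0, 0.0]  # made-up weights: only the first factor matters

if __name__ == "__main__":
    move = greedy_move(1, toy_legal_moves, toy_play, toy_features, TOY_WEIGHTS)
    print("greedy move from position 1:", move)
```

With epsilon=0 the epsilon-greedy player is exactly the 1-ply search
player, and with epsilon=1 it is the light (uniform random) simulation
player, so the one parameter slides between two rows of the table.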

-Dave
