[computer-go] thoughts on 100,000 cgos games

Don Dailey drd at mit.edu
Mon May 8 05:37:15 PDT 2006


You are talking about the credit assignment problem.   Given a single
game record where one side loses (obviously), how do you identify the
losing move?

One "solution" that is commonly used in machine learning is to find the
point where you noticed the problem (score suddenly drops) and assign
some "blame" to all the previous moves - giving more of the blame to the
most recent moves and working your way backwards, giving less and less
blame.
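
A minimal sketch of that backward blame scheme, under assumptions that
are mine and not from any particular program: the function name
assign_blame, the geometric decay factor, and the normalization to a
total of 1 are all illustrative choices. drop_index is the (0-based)
move at which the score drop was noticed; only moves before it receive
blame.

```python
def assign_blame(num_moves, drop_index, decay=0.5):
    """Spread blame over the moves played before the score drop.

    Hypothetical helper: the move just before drop_index gets the
    largest share, and each earlier move gets `decay` times the share
    of the move after it. Shares are normalized to sum to 1.
    """
    if not 0 < drop_index <= num_moves:
        raise ValueError("drop_index must be in 1..num_moves")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")

    blame = [0.0] * num_moves
    weight = 1.0
    # Walk backwards from the move just before the drop was noticed.
    for i in range(drop_index - 1, -1, -1):
        blame[i] = weight
        weight *= decay

    total = sum(blame)
    return [b / total for b in blame]


if __name__ == "__main__":
    # Five moves played, drop noticed at move index 3.
    for move, share in enumerate(assign_blame(5, 3)):
        print(move, round(share, 3))
```

The decay factor controls how far back the blame reaches: a value near
1 spreads it almost evenly, a small value puts nearly all of it on the
last move or two before the drop.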

With temporal difference learning you assign credit to every single move
by score - but you work your way backwards from the final position which
you can score perfectly.   You are essentially making adjustments to the
evaluation function at each move - to push it in the direction of the
NEXT position evaluated.
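
Here is a rough sketch of that backward pass, assuming a linear
evaluation function (a weighted sum of position features) and a plain
TD(0)-style update. The name td_backward_pass, the feature vectors,
and the learning rate alpha are hypothetical; a real program would
have its own evaluator and feature set.

```python
def td_backward_pass(weights, positions, final_score, alpha=0.1):
    """One backward sweep over a finished game.

    positions   -- feature vectors, one per position, in game order;
                   the last entry is the final position
    final_score -- exact score of the final position
    Returns a new weight list; the input list is not modified.
    """
    w = list(weights)

    def evaluate(features):
        # Assumed linear evaluation: dot product of weights and features.
        return sum(wi * xi for wi, xi in zip(w, features))

    last = len(positions) - 1
    for t in range(last, -1, -1):
        if t == last:
            # The final position can be scored perfectly.
            target = final_score
        else:
            # Otherwise aim at the evaluation of the NEXT position.
            target = evaluate(positions[t + 1])
        error = target - evaluate(positions[t])
        # Nudge each weight in the direction that reduces the error.
        for i, xi in enumerate(positions[t]):
            w[i] += alpha * error * xi
    return w


if __name__ == "__main__":
    game = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    print(td_backward_pass([0.0, 0.0], game, final_score=1.0))
```

Because the sweep runs from the end of the game toward the start, the
exact final score is used first and then propagates to earlier
positions through the updated evaluations.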



On Sun, 2006-05-07 at 19:51 -0700, David G Doshay wrote:
> With respect to learning, about the best idea so far is that if I
> note that some evaluation of the board goes from SlugGo winning to
> SlugGo losing, and SlugGo does eventually lose, then it is likely
> that the error was before that. I am not at all sure that this must
> be true. But that does narrow it down to "some move near here." In
> my opinion at this time, that is not enough to do automated learning.


