[computer-go] Re: Amsterdam 2007 paper
Don Dailey
drd at mit.edu
Fri May 18 11:19:51 PDT 2007
On Fri, 2007-05-18 at 11:43 -0600, David Silver wrote:
> I also use an online learning algorithm in RLGO to adjust feature
> weights during the game. I use around a million features (all possible
> patterns from 1x1 up to 3x3 at all locations on the board) and update
> the weights online from simulated games using temporal difference
> learning. I also use the sum of feature weights to estimate the value
> of a move, rather than a multiplicative estimate. The learning signal
> is only win/lose at the end of each simulation, rather than supervised
> learning like Remi. The results are encouraging (currently ~1820 Elo
> on CGOS, based on 5000 simulations per move) for a program that does
> not use UCT or Monte-Carlo Tree Search in any way.
This is impressive! Does your program use an alpha/beta tree search?
I'm not clear on how it selects a move.
- Don
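
For readers following the thread, the scheme in the quoted message (binary pattern features, a position value taken as the sum of the active feature weights, and temporal-difference updates driven only by the win/lose result of each simulation) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not RLGO's actual code: the class name LinearTDValue, the step_size parameter, the plain TD(0) update, and the use of string feature ids are all hypothetical choices made here. How moves are selected from these values is not described in the quoted message, so it is left out.

```python
class LinearTDValue:
    """Linear value estimate over binary features, trained by TD(0).

    Hypothetical sketch: names and the exact update rule are assumptions.
    """

    def __init__(self, step_size=0.1):
        self.weights = {}          # feature id -> weight, missing means 0.0
        self.step_size = step_size

    def value(self, active_features):
        # Additive estimate: sum the weights of the features present.
        return sum(self.weights.get(f, 0.0) for f in active_features)

    def update_from_simulation(self, positions, outcome):
        """Update weights from one simulated game.

        positions: list of feature-id lists, one per position in game order.
        outcome:   1.0 for a win, 0.0 for a loss (the only learning signal).
        """
        for t, feats in enumerate(positions):
            if t + 1 < len(positions):
                # Bootstrap from the current estimate of the next position.
                target = self.value(positions[t + 1])
            else:
                # The final position is trained toward the game result.
                target = outcome
            td_error = target - self.value(feats)
            for f in feats:
                self.weights[f] = self.weights.get(f, 0.0) + self.step_size * td_error


# Usage: one simulated game with two positions, ending in a win.
model = LinearTDValue(step_size=0.5)
model.update_from_simulation([["a", "b"], ["b", "c"]], outcome=1.0)
```

In this toy run the first position gets no update (both estimates start at zero), while the features of the final position each move halfway toward the win signal. Over many simulations the result propagates back to earlier positions through the bootstrapped targets.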