[computer-go] Re: Amsterdam 2007 paper

Rémi Coulom Remi.Coulom at univ-lille3.fr
Fri May 18 23:47:25 PDT 2007


John Tromp wrote:
> On 5/18/07, Rémi Coulom <Remi.Coulom at univ-lille3.fr> wrote:
>
>> My idea was very similar to what you describe. The program built a
>> collection of rules of the kind "if condition then move". A condition
>> could be anything from a "tree-search rule" of the kind "in this
>> particular position play x" to a general rule such as "in atari, extend".
>> It could also be anything in between, such as a miai specific to the
>> current position. The strengths of moves were updated with an
>> incremental Elo-rating algorithm, from the outcomes of random
>> simulations.
>
> The obvious way to update weights is to reward all the rules that fired
> for the winning side, and penalize all rules that fired for the losing
> side, with rewards and penalties decaying toward the end of the playout.
> But this is not quite Elo-like, since it doesn't consider rules to beat
> each other. So one could make the reward dependent on the relative
> weight of the chosen rule versus all alternatives, increasing the reward
> if the alternatives carried a lot of weight.
> Is that how your ratings worked?

It is Elo-like in the generalized Bradley-Terry sense I describe in my 
paper: you have one team of one color beating one team of the other 
color. What I do exactly is compute the total Elo rating of black moves 
(with a decay so that clean-up moves don't count, and moves close to the 
root count more), and the total Elo rating of white moves. Then I 
compute the difference between the real outcome and the expected outcome 
according to Elo ratings, and correct Elo ratings proportionally to that 
difference.

Rémi
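
The update described above can be sketched as follows. This is only a minimal illustration, not the code of the program discussed: the geometric decay by depth, the 400-point logistic scale, the step size k, the choice to scale each correction by the same decay weight, and all function and parameter names are assumptions made for the example.

```python
def expected_black_score(black_total, white_total, scale=400.0):
    """Expected outcome for Black (between 0 and 1) from the two team totals.

    Assumes the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((white_total - black_total) / scale))


def update_ratings(ratings, black_rules, white_rules, black_won,
                   k=16.0, decay=0.9):
    """Apply one incremental update after a single random simulation.

    ratings      dict mapping a rule id to its Elo rating (missing means 0)
    black_rules  rules that fired for Black, index 0 closest to the root
    white_rules  same for White
    black_won    True if Black won the simulation
    k, decay     hypothetical step size and per-move decay factor
    """
    def weighted_total(rules):
        # Moves close to the root get weight near 1; late clean-up moves
        # get a weight that shrinks toward 0.
        weights = [decay ** depth for depth in range(len(rules))]
        total = sum(w * ratings.get(rule, 0.0)
                    for rule, w in zip(rules, weights))
        return total, weights

    black_total, black_w = weighted_total(black_rules)
    white_total, white_w = weighted_total(white_rules)

    # Difference between the real outcome and the expected outcome.
    outcome = 1.0 if black_won else 0.0
    error = outcome - expected_black_score(black_total, white_total)

    # Correct ratings proportionally to that difference: Black's rules move
    # with the error, White's rules move against it.
    for rule, w in zip(black_rules, black_w):
        ratings[rule] = ratings.get(rule, 0.0) + k * w * error
    for rule, w in zip(white_rules, white_w):
        ratings[rule] = ratings.get(rule, 0.0) - k * w * error
    return error
```

With all ratings starting at 0, the expected outcome is 0.5, so a Black win gives an error of 0.5 and each Black rule gains k times its decay weight times 0.5, while each White rule loses the corresponding amount.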