[computer-go] cgos

Don Dailey drd at mit.edu
Mon Sep 18 12:32:25 PDT 2006


Łukasz,

The constant in the ELO system is not ad hoc.

I have more comments:

On Mon, 2006-09-18 at 20:51 +0200, Łukasz Lew wrote:
> On 9/18/06, Don Dailey <drd at mit.edu> wrote:
> > CGOS has 2 numbers associated with each player,  a rating and an
> > uncertainty value.   I'm probably doing the same  thing "TrueSkill" is
> > doing.
> 
> If you are referring to K as the uncertainty measure, then there are
> plenty of such systems, some of them created ad hoc, some of them
> based on statistical analysis, like ELO
> (but the K factor is added ad hoc).
> 
> I advise TrueSkill because:
> - it is heavily tested
> - it is developed at Microsoft Research and used in Xbox Live.

Is that supposed to be an argument in favor?  


Here is what I read on a webpage about TrueSkill:

        The mean skill update equations of the TrueSkill ranking system
        are similar to the update equations of the ELO algorithm. The
        key difference is that a variable Kfactor is used for both
        players mainly depending on the ratio of the uncertainties of
        the two players. Hence, playing against a very certain player in
        the TrueSkill ranking system allows the uncertain player to move
        up or down in larger steps than in the case when playing against
        another uncertain player.
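
As a rough illustration of the idea in that passage, here is a minimal
Python sketch of an ELO-style update in which each player's K factor
depends on the ratio of the two uncertainties. It is neither the actual
TrueSkill update nor the CGOS code: the function names, the total step
size k_total, and the variance-ratio split are all assumptions made for
the example.

```python
def expected_score(rating_a, rating_b):
    """Standard ELO expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, sigma_a, rating_b, sigma_b, score_a, k_total=32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    sigma_a and sigma_b are the players' uncertainty values (must be > 0).
    k_total and the variance-ratio split below are illustrative
    assumptions, not the TrueSkill or CGOS formulas.
    """
    surprise = score_a - expected_score(rating_a, rating_b)
    var_a, var_b = sigma_a ** 2, sigma_b ** 2
    # The more uncertain player takes the larger share of the step.
    k_a = k_total * var_a / (var_a + var_b)
    k_b = k_total * var_b / (var_a + var_b)
    return rating_a + k_a * surprise, rating_b - k_b * surprise
```

With this split, an uncertain player who beats a well-established
opponent moves further than one who beats an equally uncertain opponent,
and the established opponent's rating barely changes.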

So there you have it.  It turns out that I re-invented the wheel by
accident.   But embarrassingly,  I must have been thinking along the
same lines as Microsoft.  In my defense I have to say that occasionally
Microsoft comes up with something pretty good.

- Don
 



> - evaluation of both rating and uncertainty is theoretically supported,
>    while in ELO only the rating update is based on a theoretical model
> - the TrueSkill site gives explicit equations for rating updates in the
>   two-player case, so it should be relatively straightforward to
>   implement.
> 
> 
> 
> >
> > The uncertainty probably changes too fast, and I can improve the early
> > rating estimates significantly - I will make those improvements in the
> > next CGOS.
> >
> > I could fix some things now - but I have too much to do and I want to
> > focus the time I spend for this on the new CGOS.    I probably will make
> > the one change you request, to show ALL the matches in the cross-tables.
> 
> That is so great for me!
> 
> BTW
> I want to support the feature request for sending the opponent's name
> and version via GTP.
> 
> 
> >
> > I get a lot of requests, usually by private email to change things and
> > people don't realize this.  The requests are often conflicting - a lot
> > of this is a matter of personal taste and judgment.
> >
> > The changes I make will improve the rating drift situation.  But even
> > the current CGOS will eventually correct itself - it's just a little
> > sluggish at doing so.   This will be improved with better ways of
> > getting initial rating estimates in the new CGOS.
> 
> I'm afraid that new versions of players may appear too fast.
> 
> Moreover, the drift itself is not a big problem. The problem is its
> effect: ratings of programs that played in different environments are
> incomparable.
> 
> For instance, ZG1bot-MC-100k escaped just after the deflation started
> and was affected only slightly. The stronger ZG1bot-MC-200k played
> during those happy deflation times and got a lower rating than the
> 100k version.
> 
> Valkyria UCT3 vs UCT4 is probably experiencing a similar problem.
> This way the value of CGOS as a tool for evaluating programs and
> motivating programmers is diminishing, especially for strong
> programs, where the Anchor has no direct effect.
> 
> I vote for fixing the rating of some strong gnugo version, but CPU is
> probably a problem.
> 
> Lukasz
> 
> >
> > - Don
> >
> >
> > On Mon, 2006-09-18 at 15:11 +0200, Łukasz Lew wrote:
> > > Another solution is to implement the TrueSkill rating system.
> > > The main difference is that it has two numbers per player:
> > > its strength and the uncertainty about it.
> > > This way MoGo, Valkyria_UCT3/4, etc. would still have a large
> > > uncertainty, which would solve both problems: they would reach
> > > their "destination rating" faster and would not drain points from
> > > their opponents so badly until they got there.
> > >
> > >
> >
> >


