[computer-go] Re: ZG1-trunMCx-50k

Łukasz Lew lukasz.lew at gmail.com
Mon Sep 25 16:05:05 PDT 2006


On 9/26/06, Don Dailey <drd at mit.edu> wrote:
> Łukasz did the experiment with 5 players, which is misleading.  If you
> put several identical players on CGOS like he did, you can pick and
> choose the very worst case and present it as a horrible anomaly.   Even
> so, he didn't find much difference,  his spread is about 54 rating
> points which indicates that we may be off about 27 rating points or
> so.
>
> Really, the most important constant in CGOS ratings is the K-factor.
> We can make the ratings arbitrarily stable by choosing lower K-factors
> but I happen to know that some programmers improve their programs
> without changing the names or version numbers on CGOS, so if I make the
> K-factor too low,  these programs will take a very long time to "reset"
> to their realistic ratings.   CGOS could become less practical as a tool
> for programmers if the K-factors are so low that it takes many days to
> get an established rating.

What do you think about advising all CGOS users to use one account per
version of their bot?

Then K could be allowed to decrease indefinitely, at a rate chosen to
guarantee convergence.

K would be increased again after each reconnect and after any
version/name change of the bot.
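
A minimal sketch of the idea, assuming the standard Elo update. The
starting K of 32, the 1/(1+n) decay, and the names Account and reset_k
are illustrative assumptions, not CGOS's actual code or constants. A
harmonic decay is one schedule whose K values sum to infinity while
their squares stay finite, which is the usual condition for this kind
of update to converge.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


class Account:
    """One account per bot version, with a K-factor that decays per game."""

    K_START = 32.0  # assumed starting K, not a CGOS constant

    def __init__(self, rating=1500.0):
        self.rating = rating
        self.games_since_reset = 0

    def k(self):
        # Harmonic decay: K_START, K_START/2, K_START/3, ...
        return self.K_START / (1 + self.games_since_reset)

    def reset_k(self):
        # Call on reconnect or on a version/name change of the bot.
        self.games_since_reset = 0

    def update(self, opponent_rating, score):
        # score is 1.0 for a win, 0.0 for a loss.
        expected = expected_score(self.rating, opponent_rating)
        self.rating += self.k() * (score - expected)
        self.games_since_reset += 1
```

With this schedule a stable bot's rating settles down over time, while
a bot that changed can call reset_k() and move quickly again.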

Best Regards,
Lukasz
>
> It's almost ironic that this is the same issue that Monte Carlo programs
> deal with.  When you try to do alpha-beta searching using Monte Carlo
> evaluations at end nodes,  the score that gets returned is very
> optimistic and the problem is worse if there are many legal moves
> available.  This is because the move that is best has to compete with
> the move that is "luckiest" in the simulations.   The score that is
> passed up the tree is usually too high unless there is only 1 clearly
> best move.
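
The effect described above can be reproduced with a small simulation.
This is only a sketch under stated assumptions: every move has the same
true win probability p, each move is evaluated with an independent
batch of playouts, and the function names and default parameters
(50 playouts, 1000 trials) are made up for illustration.

```python
import random


def mc_estimate(p, playouts, rng):
    """Fraction of simulated playouts won when the true win rate is p."""
    wins = sum(rng.random() < p for _ in range(playouts))
    return wins / playouts


def mean_backed_up_max(num_moves, p=0.5, playouts=50, trials=1000, seed=0):
    """Average of the maximum estimate over num_moves equally good moves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(mc_estimate(p, playouts, rng) for _ in range(num_moves))
    return total / trials
```

Comparing mean_backed_up_max(1) with mean_backed_up_max(20) shows the
backed-up score drifting above the true value p as the number of legal
moves grows, even though no move is actually better than the others.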
>
> By the way, I don't understand your data because I don't know how you
> calculated the ratings.   Did you do performance ratings as a whole, or
> did you run the simulation incrementally, rating them 1 game at a time?
> And what K-factor did you use if you did this incrementally?
>
> - Don
>
> On Mon, 2006-09-25 at 13:31 -0700, Christoph Birk wrote:
> > >> ELO is a statistical rating system.   There is only a 44 rating point
> > >> difference in the worst case, which is hardly significant.   44 rating
> > >> points isn't nearly enough to say with serious confidence that you are a
> > >> better player.
> >
> > I did a Monte-Carlo simulation of the (ELO) rating of identical
> > programs with P(win)=0.5. (like the ZG1-trunMCxxx group on CGOS).
> >
> > After about 600 games the rating spread (best-worst) of the ZG1-group
> > is (1505,1526,1530,1547,1566) 51 points.
> > The likelihood of a spread of more than 50 points is about 12% ...
> > IMHO not too unlikely.
> >
> > Christoph
> > _______________________________________________
> > computer-go mailing list
> > computer-go at computer-go.org
> > http://www.computer-go.org/mailman/listinfo/computer-go/

