[Computer-go] Shodan Go Bet

Raymond Wold computergolist at w-wins.com
Sat Jul 24 09:47:06 PDT 2010


On 24.07.2010 15:13, Gian-Carlo Pascutto wrote:
> Raymond Wold wrote:
>
>    
>>> Playing strength is a function of everything you know and don't know
>>> and all your strengths and weaknesses and how you put them together.
>>>    Every player, even the very best, has weaknesses, but those do not
>>> determine how strong he is.
>>>        
>> These two sentences contradict each other to me. If a program plays as a
>> 9 dan professional as long as there are never any ladders, but has a bug
>> that flips the status of every ladder, then that weakness /does/
>> determine how strong he is. As incredible as it would be to make such a
>> program, it would not deserve to be classed along with the human 9 dan
>> professionals; it would probably be more correct to consider it double
>> digit kyu.
>>      
> This makes no sense. The rating of the program is the strength given
> this weakness. If this turns out to be 9 dan pro despite ladder
> misreading, for example because it turns out to be very hard to set up
> game-result-defining ladders, the program is 9 dan pro.
>
>    

I was simply not clear enough. I meant a program that would be rated
9 dan pro if you took only the games where no ladders appeared and based
a rank on those (cherry-picking the games where the flaw in the program
never shows). Of course the program would not be 9 dan pro in any
reasonable sense, and that was my point: all the games with ladders
would ruin its statistics. The weakness would determine its strength.
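
To put rough, purely illustrative numbers on that (the win rates, the
ladder frequency, and the Elo-style logistic conversion below are all
assumptions of mine, nothing measured), here is a small Python sketch of
the blending effect:

    # Illustrative only: every number below is assumed.
    import math

    def elo_gap(p):
        # Rating advantage (in Elo points) implied by win rate p
        # against one fixed opponent, standard logistic model.
        return 400 * math.log10(p / (1 - p))

    p_no_ladder = 0.99   # assumed: near-certain win when no decisive ladder appears
    p_ladder    = 0.01   # assumed: near-certain loss when one does
    f_ladder    = 0.30   # assumed: fraction of games decided by such a ladder

    p_overall = (1 - f_ladder) * p_no_ladder + f_ladder * p_ladder

    print(round(elo_gap(p_no_ladder)))  # ~ +798 Elo on the cherry-picked games
    print(round(elo_gap(p_overall)))    # ~ +144 Elo over all games

And if opponents learn to steer games into ladders on purpose, f_ladder
climbs toward 1 and the measured strength collapses entirely.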

> There is a tendency for humans to rate computers according to the flaws
> they see and can understand. If a 10 kyu sees a program making a mistake
> he understands and could avoid, he will think the program is worse than
> 10 kyu. You are also falling into this trap by giving that example.
>
>    
I should probably never have mentioned my goal of eliminating bias and
promoting intellectual honesty; now everyone takes that as a personal
attack and tries their hardest to find such bias in my arguments. I
don't mind people finding my biases so I can counteract them, but this
will just lead to people reading things into my arguments that are not
there, like you seem to have done.

>>> Basically your idea of fair is that the first few games shouldn't
>>> count - you just said it differently and it's a ridiculous idea.
>>>        
>> That is indeed what I am saying, and I don't think it is so ridiculous.
>>      
> You are saying that a known weakness of human players is that they need
> warmup time and that they should not be rated according to their weakest
> performance, which is coincidentally what you are arguing *against*.
>
>    

No, I am saying that a human's weaknesses aren't very important, since
the human will notice and learn to avoid them within a very few games of
one being exploited, without gaining significant rank in the process. A
program has no such benefit: once a version of a program has a flaw,
that flaw remains there until a new version is made attempting to fix
it. Any claim about the program's strength will be undermined by the set
of people who know of its flaws. Any challenge demonstrating the
program's skill (such as John Tromp's bet) can be called into question
in either direction, with speculation on whether the opponent knows of
the flaw(s) or not. The game turns into something that is not go, but
rather "do you know this program or not?" I am more interested in the
game of go.

>> A human can much more easily detect and correct his flaws, especially
>> when they are being exploited. Thus the effect of weaknesses isn't as
>> important for humans.
>>      
> A weakness of human chess players against computers is that they don't
> perform well at very fast time controls, which led GMs to lose to
> computers even when the latter were far behind in playing strength at
> slow time controls.
>
> I'm curious what you suggest to fix this human weakness.
>    
> Another weakness is that they have problems reliably visualizing
> positions 20 ply out and identifying all the tactical possibilities
> there, and backtracking that to the current position. The nasty
> computers exploit this in almost every game.
>
> I'm also curious how you suggest to adapt to that.
>
>    
I was talking about weaknesses that can be exploited by players with a
rank lower than the "ordinary rank" of the opponent in the game in
question, whether that is blitz chess too fast for human motor control,
chess at reasonable time controls, or go. A go program ranked 1 dan on
KGS, for instance, should not have flaws that a 4 kyu can reliably
exploit to win every even game if its authors want to claim a true 1 dan
playing strength for it.
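
As a rough sanity check on what "reliably exploit to win every even
game" would mean for such a claim (the 2:1 odds per rank below is an
assumption for illustration, not a property of the KGS rating system):

    # Illustrative sketch: back out the rank gap implied by an observed
    # win rate, assuming each rank of difference gives the stronger side
    # 2:1 odds (about 67%) in even games. The 2:1 figure is assumed.
    import math

    def implied_rank_gap(winrate, odds_per_rank=2.0):
        odds = winrate / (1 - winrate)
        return math.log(odds) / math.log(odds_per_rank)

    # A 4 kyu who wins about 95 of 100 even games against the program:
    print(round(implied_rank_gap(0.95), 1))  # ~ 4.2 ranks above the program

Under that assumption the program is performing at around 8 kyu against
that particular opponent, which is hard to reconcile with a claimed 1
dan.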

>> If I am anti-anything, it would be against bias in program authors and
>> testers. I am for intellectual honesty. If you think a program should be
>> compensated in its ranking for the handicap that it can't learn, you
>> should give the program a much higher ranking than it would get on a go
>> server.
>>      
> As already explained, this argument works perfectly well the other way
> around (warmup time for humans and you wanting to drop the first games).
> If you think you are unbiased or intellectually honest when making such
> an argument, you're fooling yourself.
>
>    

I would think that two humans ranked the same would, over many games, 
get an even result. I would not mind putting this to the test with my 
own ranking. Another human can play me in a hundred games where we both 
do our best, and even if we disregard those first hundred games, I will 
not have exposed any weaknesses that can be exploited by an even 
opponent. People don't do this when playing go simply because they know 
that learning the weaknesses of the opponent isn't a viable strategy - 
they will just learn yours in return, and fix their own. A computer 
program is unable to do this. Thus the difference.

>> I want programs to improve under /my/ standard. I am kind of hoping
>> other go coders feel the same. Isn't handling the hard problem of
>> playing go what this is all about, and not just getting a high rating
>> for kudos and commercial gain? Surely tackling what your program is
>> worst at does this the best?
>>      
> Handling the hard problem of go means maximizing playing strength. That
> *is* my interest, and this does *NOT* entail fixing every possible
> weakness. Practice has demonstrated this convincingly for Go, for chess,
> and for other games. The strength of a program is *NOT* solely
> determined by its weakest part.
>    

So are you saying that I am wrong in claiming that a human ranked lower
than a program can learn its flaws over a lot of games and reliably beat
it, without having gained correspondingly in skill against other humans?
Or are you saying that this does not matter, that it is entirely
irrelevant to a fair judgement of skill?

> Your "commercial gain" argument is very lame and silly. If this were the
> interest, fixing the weaknesses would be more important than making the
> program play well. To understand why, see the second paragraph of this mail.
>    

So if you are marketing your go program as a learning aid, for instance,
it makes no sense to want to cover up the fact that after twenty games
the customer will know how to beat it soundly, having learned the wrong
lesson (how to beat that specific program, rather than how to get better
at go)? If a customer is browsing for programs by skill, it makes no
sense to quote a rating from a popular server where people don't play
your program repeatedly, rather than the rating of the players who can
reliably beat it after some practice?

I don't know that this actually happens (not having tried any of the
commercial programs myself; authors are welcome to donate copies for me
to try out and rate as *I* think they deserve), but I don't think the
concern sounds lame and silly.

On 24.07.2010 15:13, Don Dailey wrote:
> I don't want this to get into a big debate,  but here is what it comes 
> down to in my opinion.      You are trying to isolate the learning 
> process from some hypothetical state where "learning is complete."     
> This is impossible to do and it's not what the human state is all 
> about.   We constantly learn and you cannot isolate the two things.
>
No, I am not.
> There are players who start young and improve during their entire 
> lives,  and when they lose we would consider it pretty strange if they 
> said this game shouldn't count because they are still "learning."   
>   Wouldn't you consider that intellectually dishonest on their part?
>
If two players play a hundred games where the result turns out even,
both starting out as 4 kyu and ending up as 2 kyu by the last game, they
would still be even. If only one of them has a reliable rank, I would
still think it honest of the other player to claim the 2 kyu rank at the
end.

If a human player plays a hundred games against a go program, where the
human progresses from 4 kyu to 2 kyu over those games, and the human at
the end has dominated the go program, I would not think it honest of the
authors of the go program to claim that it has 2 kyu strength or higher.

Normal growth and learning is not the issue - the issue is flaws that 
can be learned /without/ significant advancement in rank.

> That's why I consider this idea ill-conceived.   It's rather like a 
> foot race where you let the slow starter have a running 
> start because you consider it unfair that he gets penalized for being a 
> slow starter.   Whatever you think you are measuring,  it's not a fair 
> race.
>
> Tournament play is what it is and if the conditions are not the same 
> for every player  then it's inherently unfair.  I think it's pretty 
> silly to consider games invalid because you believe you have not yet 
> learned enough about the opponent to take full advantage of his 
> ignorance.   That is what I consider intellectually dishonest (to 
> borrow your own phrase.)
>
> For some reason I think of Las Vegas,  another rigged game.   I do not 
> gamble as I consider it a form of ugly greed, but I know that the games 
> are rigged so that you cannot win.    And if someone actually develops 
> the skill to win at something,  such as by counting cards - they will 
> kick you out.     It's a bizarre situation where you are allowed to 
> play as long as you are not very good at it.       This is rather like 
> that,   let the human keep playing the computer until he figures out 
> how to beat it,  then start rating the games.    And of course if the 
> computer wins anyway,  the human just stops playing it.     It sounds 
> like an honest rating system to me.
>

But I am not talking about a human progressing in ranking until he is 
ranked higher than the claimed ranking of the program. I am talking 
about weaknesses he can exploit long before he learns to beat actual 
human players of that ranking. If a human player loses significantly in 
a hundred-game match against a program, I would not object very hard to 
giving the program even the rank of the player at the end of the series, 
regardless of whether he's progressed in skill. If John Tromp knows in 
advance which program he will play, and practices up to the match 
against it, trying his best to learn its flaws, and the program still 
wins, I would have no problem admitting that the program is at least 2 
or 3 dan (or whatever rank John Tromp has at the time of playing). 
/That/ would be an interesting result.

As I said, I do not think you are unreasonable for wanting the exact 
same conditions for every player. I see the logic. Equality is one nice 
measure of fairness. I just have a different standard for judging 
programs, because of the nature of those programs, which is very 
different from the nature of humans.


I'm not sure it's relevant, but an interesting thought experiment might
be to consider how you would feel about a go program with a huge library
of trick plays that it employed whenever it thought it was behind, or a
go program that tried its best to get the opponent to lose on time.
Given that tournament play is the same for everyone, would you still not
feel that such a program's rating was at least a little undeserved?


