[computer-go] thoughts on 100,000 cgos games

Aidan Karley aidan_karley at mail.ru
Mon May 8 01:00:23 PDT 2006


In article <1147049090.8544.269.camel at localhost.localdomain>, Don Dailey 
wrote:
> Any kind of learning needs a "training signal" - some kind of way of
> determining what to reward and what to punish.
>
       I detect the sound of (AI) text-book pages being quoted. <G>
       
> What immediately comes to mind is
> to consider that the moves of the winner are likely to be better on the
> whole than the moves of the loser.
>
       An implicit assumption here is that the values of moves fall in a 
fairly narrow, consistent band. But doesn't the concept of "good play" 
imply that many moves are just of workman-like quality, while some are 
important, and a few are [brilliant | important]? Examples (in reverse 
order) would be launching an invasion on a large moyo, backing up the 
invasion with a couple of solid supporting plays to establish a framework 
for your group, then doing a competent job of filling the walls in and 
pushing for eyes and territory. Intuitively I think the distribution is 
more like this (+++++) than this (xxxxx):
       frequency ^     |+++++++
                 |     |       |
                       |       |
                       |       |
                       |       |
                       |       |
                       |       |
                       |xxxxxxx|
                       |       |
                       |       |xxxxxxx
                       |       |+++++++
                       |       |       |xxxxxxx
                       |       |       |       |+++++++
                       |_______|_______|_______|_______|_
                        OK   fair    good      great 
                                       [brilli- | import-]-ance ->
       Hmmm, but how do you weight the "great" moves compared to the "OK" 
moves so you can calculate a mean? I can't recall enough of my 
non-parametric statistics course of 20-odd years ago to say whether 
it's possible without a relationship of the form "great is 4 times as 
valuable as good". I think it should be possible to do it 
non-parametrically, but I'm damned if I can remember how (I could do the 
compulsory exam questions, but I think I ducked the NP optional 
questions).
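
For what it's worth, rank-based statistics need only the ordering OK < fair < good < great, not a ratio such as "great is 4 times as valuable as good". Below is a minimal sketch in Python of two such measures: the median category, and the probability that a move drawn from one set ranks above a move drawn from another (the Mann-Whitney U statistic divided by the number of pairs). The move labels for `winner` and `loser` are invented purely for illustration; nothing here is measured from real games.

```python
from itertools import product

# Ordinal scale only: the integers encode order, not relative value.
ORDER = {"OK": 0, "fair": 1, "good": 2, "great": 3}

def ordinal_median(labels):
    """Return the (lower) median category; needs only the ordering."""
    ranked = sorted(labels, key=ORDER.__getitem__)
    return ranked[(len(ranked) - 1) // 2]

def prob_superiority(a, b):
    """Estimate P(a move from a ranks above a move from b); ties count half.

    This equals the Mann-Whitney U statistic divided by len(a) * len(b).
    """
    wins = 0.0
    for x, y in product(a, b):
        if ORDER[x] > ORDER[y]:
            wins += 1.0
        elif ORDER[x] == ORDER[y]:
            wins += 0.5
    return wins / (len(a) * len(b))

# Hypothetical labels, invented for illustration only.
winner = ["OK", "OK", "fair", "good", "great"]
loser = ["OK", "OK", "OK", "fair", "good"]

print(ordinal_median(winner), ordinal_median(loser))
print(prob_superiority(winner, loser))
```

Any relabelling that preserves the order (say 0, 1, 10, 1000 instead of 0, 1, 2, 3) gives the same answers, which is the point: no weighting of "great" against "OK" is required. A mean, by contrast, does need such weights.
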
       Given a board scoring engine and a couple of random (or just 
/identical/) bots, wouldn't it be plausible to actually try to measure 
how the outcome changes for each move in a number of games and generate a 
number of estimates for the frequency-"-ance" distribution as above? Has 
it been done, and were the results useful, or even consistent?
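
A rough sketch of how such a measurement might be set up, in Python. Everything named here is an assumption for illustration: the "game" is a toy stand-in (players alternately claim cells of a 1-D board, and the score is the first player's total minus the second's), not Go, and `CELL_VALUES`, `playout_mean`, `move_swings` and `histogram` are hypothetical names. The idea is to estimate the expected final score by random playouts before and after each move, and to treat the difference as that move's effect on the outcome.

```python
import random
from collections import Counter

# Toy stand-in for a Go engine (an assumption, not real Go): players
# alternately claim cells of a 1-D board; score = first player's total
# minus second player's total. Owners are 0 (empty), 1 or -1.
CELL_VALUES = [1, 1, 1, 1, 1, 1, 2, 2, 5, 9]

def legal_moves(state):
    return [i for i, owner in enumerate(state) if owner == 0]

def play(state, move, player):
    nxt = list(state)
    nxt[move] = player
    return tuple(nxt)

def score(state):
    return sum(v * o for v, o in zip(CELL_VALUES, state))

def playout_mean(state, player, rng, n=200):
    """Mean final score over n random continuations from state."""
    total = 0
    for _ in range(n):
        s, p = state, player
        moves = legal_moves(s)
        rng.shuffle(moves)  # random play: every empty cell stays legal
        for m in moves:
            s = play(s, m, p)
            p = -p
        total += score(s)
    return total / n

def move_swings(rng, n=200):
    """Play one random game; return the outcome swing of each move."""
    state, player, swings = (0,) * len(CELL_VALUES), 1, []
    while legal_moves(state):
        before = playout_mean(state, player, rng, n)
        move = rng.choice(legal_moves(state))
        state = play(state, move, player)
        after = playout_mean(state, -player, rng, n)
        swings.append(player * (after - before))  # from the mover's side
        player = -player
    return swings

def histogram(swings, width=2.0):
    """Bucket swings into bins of the given width (keyed by bin index)."""
    return Counter(int(s // width) for s in swings)

rng = random.Random(0)
all_swings = [s for _ in range(20) for s in move_swings(rng)]
for b, count in sorted(histogram(all_swings).items()):
    print(f"{b * 2.0:+5.1f} .. {(b + 1) * 2.0:+5.1f}: {'#' * count}")
```

With a real engine, `legal_moves`, `play` and `score` would be replaced by the engine's own move generator and board scorer. The playout estimates are noisy, so small swings are dominated by sampling error unless `n` is large.
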
       
       On r.g.g. yesterday, someone (Richard Mullins?) talked about a 
hypothetical 19x19 parallel-processor as possibly being a useful 
go-engine. I think I disposed of that idea after a couple of seconds' 
thought, but it did raise the question in my mind of software (or even 
hardware, in a few decades) modules for doing things like score 
estimation on a "whole board snapshot" basis. I'm sure we've almost all 
had to do it at the club - "Hey Fred, who do you think is winning here?" 
I understand that it's quite common for the KGS scoring tool to be used 
for this, but it obviously depends on the ruleset implemented (cue 
Jasiek. Hi Robert!) Do you think that an agreed standard for doing this 
might be useful? After all, computer go is not yet at a level where 
I've heard of people trying to tune bots for a particular commonly used 
ruleset.
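
To make the ruleset dependence concrete, here is a minimal sketch in Python of one possible convention for scoring a whole-board snapshot: plain area scoring, where each stone counts one point and an empty region counts for a colour only if it borders that colour alone. It assumes every stone on the board is alive, which is a big simplification, and it is not a description of how the KGS tool or any other existing tool works. The function name `area_score` and the board encoding are assumptions made for this example.

```python
def area_score(board):
    """Area-score a position given as a list of equal-length strings.

    'X' = black stone, 'O' = white stone, '.' = empty. Assumes every
    stone on the board is alive (a big simplification).
    Returns (black_points, white_points).
    """
    rows, cols = len(board), len(board[0])
    points = {"X": 0, "O": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in points:
                points[cell] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region, noting bordering colours.
                region, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    region += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            n = board[ny][nx]
                            if n == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(n)
                # Regions touching both colours (or none) stay neutral.
                if len(borders) == 1:
                    points[borders.pop()] += region
    return points["X"], points["O"]

position = [
    ".X.O.",
    ".X.O.",
    ".XOO.",
    ".X.O.",
    ".XXO.",
]
print(area_score(position))
```

A territory-style count, or any treatment of dead stones, would give different numbers for the same snapshot, which is why an agreed convention would matter.
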
       
-- 
 Aidan Karley,
 Aberdeen, Scotland,
 Location: 57°10' N,  02°09'  W (sub-tropical Aberdeen), 0.021233
 Written at Mon, 08 May 2006 07:13 +0100



