[computer-go] thoughts on 100,000 cgos games
Aidan Karley
aidan_karley at mail.ru
Mon May 8 01:00:23 PDT 2006
In article <1147049090.8544.269.camel at localhost.localdomain>, Don Dailey
wrote:
> Any kind of learning needs a "training signal" - some kind of way of
> determining what to reward and what to punish.
>
I detect the sound of (AI) text-book pages being quoted. <G>
> What immediately comes to mind is
> to consider that the moves of the winner are likely to be better on the
> whole than the moves of the loser.
>
An implicit assumption here is that the values of moves fall in a
fairly narrow, consistent band. But doesn't the concept of "good play"
imply that many moves are just of workman-like quality, while some are
important, and a few are [brilliant | important]? Examples (in reverse
order) would be launching an invasion on a large moyo, backing up the
invasion with a couple of solid supporting plays to establish a framework
for your group, then doing a competent job of filling the walls in and
pushing for eyes and territory. Intuitively I think the distribution is
more like this (+++++) than this (xxxxx):
frequency ^ |+++++++
            |       |
            |       |
            |       |
            |       |
            |       |
            |       |
            |xxxxxxx|
            |       |
            |       |xxxxxxx
            |       |+++++++
            |       |       |xxxxxxx
            |       |       |       |+++++++
            |_______|_______|_______|_______|_
               OK     fair    good    great
            [brilli- | import-]-ance ->
Hmmm, but how do you weight the "great" moves compared to the "OK"
moves so you can calculate a mean? I can't remember enough of my
non-parametric statistics course of 20-odd years ago to say whether
it's possible without a relationship of the form "great is 4 times as
valuable as good". I think it should be possible to do it
non-parametrically, but I'm damned if I can remember how (I could do the
compulsory exam questions, but I think I ducked the NP optional
questions).
Given a board scoring engine and a couple of random (or just
/identical/) bots, wouldn't it be plausible to actually try to measure
how the outcome changes for each move in a number of games and generate a
number of estimates for the frequency-"-ance" distribution as above? Has
it been done, and were the results useful, or even consistent?
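
A rough sketch of the measurement suggested above, in Python. It assumes
that some board scoring engine has already produced a list of score
estimates from Black's point of view, one before the first move and one
after each move. The function names, the bucket thresholds (2, 5 and 10
points) and the sample numbers are all made up for illustration; nothing
here comes from a real engine or a real game.

```python
from collections import Counter

def move_swings(scores):
    """Score change caused by each move, seen from the mover's side.

    scores[0] is the estimate before any move, scores[i] the estimate
    after move i, always from Black's point of view (an assumption).
    """
    swings = []
    for i in range(1, len(scores)):
        delta = scores[i] - scores[i - 1]
        # Black plays moves 1, 3, 5, ...; flip the sign for White's moves.
        swings.append(delta if i % 2 == 1 else -delta)
    return swings

def bucket(swing, edges=(2.0, 5.0, 10.0)):
    """Map a swing to a label; the thresholds are arbitrary placeholders.

    Anything below the first edge, including moves that lose points,
    lands in "OK" because the scale above has no lower category.
    """
    labels = ("OK", "fair", "good", "great")
    for label, edge in zip(labels, edges):
        if swing < edge:
            return label
    return labels[-1]

def importance_histogram(scores):
    """Count how many moves of one game fall into each bucket."""
    return Counter(bucket(s) for s in move_swings(scores))

# Made-up score estimates for a five-move fragment.
sample = [0.0, 1.0, -2.0, 5.0, 4.0, 16.0]
print(importance_histogram(sample))
```

Summing such histograms over many games between identical bots would
give one estimate of the frequency distribution drawn above; whether the
estimates come out consistent is exactly the open question.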
On r.g.g. yesterday, someone (Richard Mullins?) talked about a
hypothetical 19x19 parallel-processor as possibly being a useful
go-engine. I think I disposed of that idea after a couple of seconds'
thought, but it did raise the question in my mind of software (or even
hardware, in a few decades) modules for doing things like score
estimation on a "whole board snapshot" basis. I'm sure we've almost all
had to do it at the club - "Hey Fred, who do you think is winning here?"
I understand that it's quite common for the KGS scoring tool to be used
for this, but it obviously depends on the ruleset implemented (cue
Jasiek. Hi Robert!) Do you think that an agreed standard for doing this
might be useful? After all, the current state of computer go is not high
enough that I've heard of people trying to tune bots for a particular
commonly used ruleset.
--
Aidan Karley,
Aberdeen, Scotland,
Location: 57°10' N, 02°09' W (sub-tropical Aberdeen), 0.021233
Written at Mon, 08 May 2006 07:13 +0100