[Computer-go] replacing dynamic komi with a scoring function
sheppardco at aol.com
Mon Jan 9 12:13:39 PST 2012
I gather that the goal of keeping a histogram is to maximize the median
score. Do I have that right?
It wasn't obvious to me that this would work as a tree exploration policy.
Can you clarify? For example, what move should A explore to ensure that its
value converges to the largest median score?
I am asking because an exploration policy that maximizes Prob(score >= 0.5)
will not necessarily maximize Prob(score >= 1.5).
From: computer-go-bounces at dvandva.org
[mailto:computer-go-bounces at dvandva.org] On Behalf Of Zach Wegner
Sent: Monday, January 09, 2012 1:54 PM
To: computer-go at dvandva.org
Subject: Re: [Computer-go] replacing dynamic komi with a scoring function
On Mon, Jan 9, 2012 at 7:26 AM, Vlad Dumitrescu <vladdu55 at gmail.com> wrote:
> On Mon, Jan 9, 2012 at 13:17, Don Dailey <dailey.don at gmail.com> wrote:
>> I believe a more correct scoring function won't be based on how much you
>> by OR how often you win but will incorporate some other more relevant
>> concept and it will be dynamic. And it will not matter if the game is
>> a handicap game or otherwise because the scoring function will always be
>> relevant. The goal will be to maximize your winning chances but it
>> will incorporate something more sophisticated that just counting how
>> you win or how much you win by.
> I hope I may interfere with something that Don's nice description
> revealed to me. It feels rather obvious, but since nobody stated it
> explicitly, maybe it's news for at least some people here.
> MCTS is maximizing the chances of winning. These chances are largest
> for a minimal score difference because this allows for making some
> errors. Winning by the largest possible score has rather small chances
> to happen because every move has to be perfect.
> The curve describing the probability of ending the game with a certain
> score is bell-shaped and MCTS explores the area beneath it, looking
> for winning moves. With handicap, the disadvantaged side is getting
> less samples explored, making it less likely to discover the really
> good moves. Dynamic komi shifts the bell left or right in order to
> equalize the sampling on both sides, but as mentioned it isn't dynamic
> enough (the curve changes after each move) and also is actually using
> a different shape for the curve than the real "handicap curve".
> In theory, I think that the solution for keeping the same level of
> play with handicap as without would be to make sure that the the
> disadvantaged side gets just as many samples with or without handicap.
> That is, use more playouts when playing with handicap. In practice,
> this is probably prohibitive...
> I wonder if it might be possible to estimate the shape of this curve
> after each move and use that estimate to dynamically adjust the number
> of playouts. One might have to use higher precision calculations, too,
> so that the noise doesn't get too loud.
> Does this make any sense? Has anyone tried something like this?
> best regards,
> Computer-go mailing list
> Computer-go at dvandva.org
This is similar to ideas I've had to solve this. Of course, I'm not a
go programmer, so it's really just idle curiosity for me when reading
this list. Perhaps an ideal scoring function is "highest median
score". You can do this in a slow/memory intensive way by storing a
histogram of final score in each node instead of wins/losses.
Wins/losses can then be computed by summing the histogram buckets
above/below the komi level. Computing the median is simply finding the
bucket where the sum crosses 50%. Other percentages could be used here
as well, though I'm not quite sure what effect this might have.
This solves the problem of getting far less than one bit of
information per playout when the win rate is very high or low, and
also supersedes dynamic komi--if you have the full histogram at any
node, you can compute the exact win rate for any arbitrary komi.
Using this in practice might be fairly difficult, and would probably
involve making the histogram "lossy" in some sense. I would be
interested if somebody here could run tests with this as a limit study
though--keep the number of nodes allocated and playouts per move
constant, and see the effect on strength.
Computer-go mailing list
Computer-go at dvandva.org
More information about the Computer-go