[Computer-go] Monte Carlo (upper confidence bounds appliedto trees)
Aja
ajahuang at gmail.com
Tue Oct 26 12:30:34 PDT 2010
In Arpad Rimmel's PhD thesis
(http://tao.lri.fr/Papers/thesesTAO/RimmelPhD.pdf), Mogo's UCT formula (page
50) is really too complicated to try for me. Have anyone try that formula?
Aja
----- Original Message -----
From: "David Fotland" <fotland at smart-games.com>
To: <computer-go at dvandva.org>
Sent: Tuesday, October 26, 2010 11:44 PM
Subject: Re: [Computer-go] Monte Carlo (upper confidence bounds appliedto
trees)
> Erica and many Faces do not set c = 0, so there are many strong go
> programs that do not set c to zero. Fuego and Mogo use c = 0. I'm not
> sure about others.
>
> David
>
>> -----Original Message-----
>> From: computer-go-bounces at dvandva.org [mailto:computer-go-
>> bounces at dvandva.org] On Behalf Of Jason House
>> Sent: Tuesday, October 26, 2010 6:01 AM
>> To: ???? ??????; computer-go at dvandva.org
>> Subject: Re: [Computer-go] Monte Carlo (upper confidence bounds applied
>> to
>> trees)
>>
>> It looks like your example is using average win rate instead of an upper
>> bound. Rather than M/N UCT uses M/N + c*log(N1+N2+N3+...)/sqrt(N)
>>
>> where Ni is the total simulations of the ith move. That extra term means
>> moves with initially bad results will be revisited. Stronger go programs
>> add
>> in some other terms and set c=0, but that isn't UCT.
>>
>> Sent from my iPhone
>>
>> On Oct 26, 2010, at 6:43 AM, ???? ?????? <petr.smolov83 at mail.ru> wrote:
>>
>> > Hello all!
>> > I would much appreciate if someone could explain me what exactly "upper
>> confidence bounds applied to trees" is.
>> >
>> > I'm still not sure that UCT works good.
>> >
>> > For example, I have a position with two possible moves (a1 and b1):
>> >
>> > O
>> > / \
>> > (a1) (b1)
>> >
>> >
>> > First random game (using a1 as first move) returns 0, second (b1)
>> > returns
>> 1:
>> >
>> > O
>> > / \
>> > (a1) (b1)
>> > (0/1)(1/1)
>> >
>> >
>> > Ok, let's try to explore b1 move (the program likes it, because uct of
>> > a1
>> = 0):
>> >
>> > O
>> > / \
>> > (a1) (b1)
>> > (0/1)(1/1)
>> > / | \
>> > a2 b2 c2
>> >
>> >
>> > a2 returns 0, b2 and c2 returns 1:
>> >
>> > O
>> > / \
>> > (a1) (b1)
>> > (0/1)(3/4)
>> > / | \
>> > a2 b2 c2
>> > 0 1 1
>> >
>> >
>> > So the program will explore b2 and c2 (starting from b1), but maybe a2
>> > is
>> the only refuting move. Maybe a1 is better, but the program will continue
>> to
>> explore b1, because utc of a1 is 0 and utc of b1 is more than 0.
>> > _______________________________________________
>> > Computer-go mailing list
>> > Computer-go at dvandva.org
>> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go at dvandva.org
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
More information about the Computer-go
mailing list