[Computer-go] Idea how to improve RAVE
Aja Huang
ajahuang at gmail.com
Mon Apr 1 12:08:55 PDT 2013
Hello Alexander,
Your idea is interesting. If I understand correctly, your idea is similar
to Fuego's "weight RAVE updates", see
http://fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/classSgUctSearch.html#a158df0720f5b23d7a2cb7381c1355214
Weight RAVE updates.
Gives more weight to moves that are closer to the position for which the
RAVE statistics are stored. The weighting function is linearly decreasing
from 2 to 0 with the move number from the position for which the RAVE
statistics are stored to the end of the simulated game.
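To make the description above concrete, here is a minimal sketch of such a linearly decreasing weight. It is not Fuego's actual code: the function name `rave_weight`, the 0-based move index, and the handling of the endpoints are assumptions made only for illustration.

```python
def rave_weight(move_index: int, num_moves: int) -> float:
    """Weight for the move at `move_index` (0-based, assumed convention)
    among the `num_moves` moves played after the position for which the
    RAVE statistics are stored.

    The weight falls linearly from 2.0 for the first move towards 0.0 at
    the end of the simulated game; the mean weight is 1 + 1/num_moves.
    """
    if num_moves <= 0:
        raise ValueError("num_moves must be positive")
    if not 0 <= move_index < num_moves:
        raise ValueError("move_index out of range")
    return 2.0 - 2.0 * move_index / num_moves


# Example: weights for a 4-move simulation.
weights = [rave_weight(i, 4) for i in range(4)]
```

With this convention, a move played right after the stored position counts double, while a move played near the end of the simulation contributes almost nothing.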
Regards,
Aja
2013/3/31 Alexander Kozlovsky <alexander.kozlovsky at gmail.com>
> Hi!
> I have an idea how to improve RAVE, but it is still rough.
> So I want to describe it here in the hope that it leads to some
> interesting discussion. I hope my not-so-good English
> allows me to describe the idea adequately.
>
> If you don't want to read all the details, you can first scroll down
> to "Use cases" to read when this improvement may be useful.
>
>
> --- Current RAVE statistics implementation ---
>
> Let's say we want to accumulate RAVE statistics, and
> we have four arrays for this: black_rave_total[], black_rave_wins[],
> white_rave_total[], white_rave_wins[].
>
> We increment black_rave_total[intersection] += 1
> if, during the last simulation, black was the first to play on this intersection.
>
> We increment black_rave_wins[intersection] += 1
> if, during the last simulation, black was the first to play on this intersection
> and the simulation result is "black wins".
>
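The bookkeeping described above could be sketched roughly as follows. The function name `update_rave`, the `(color, intersection)` move representation, and the use of dictionaries instead of fixed-size arrays are assumptions for illustration, not taken from any particular engine.

```python
from collections import defaultdict

# Per-intersection counters, named as in the description above.
black_rave_total = defaultdict(int)
black_rave_wins = defaultdict(int)
white_rave_total = defaultdict(int)
white_rave_wins = defaultdict(int)


def update_rave(moves, winner):
    """Update RAVE counters after one simulation.

    moves  -- list of (color, intersection) pairs in play order,
              where color is "black" or "white" (assumed format)
    winner -- "black" or "white"
    """
    # Only the first play on each intersection counts.
    first_player = {}
    for color, point in moves:
        first_player.setdefault(point, color)

    for point, color in first_player.items():
        if color == "black":
            total, wins = black_rave_total, black_rave_wins
        else:
            total, wins = white_rave_total, white_rave_wins
        total[point] += 1
        if color == winner:
            wins[point] += 1


# Example: black plays B4 first, white plays C3 first, black wins.
update_rave([("black", "B4"), ("white", "C3"), ("black", "C3")], "black")
```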
> --- New proposal ---
>
> What if we add two arrays: black_rave_win_move_sum[]
> and white_rave_win_move_sum[]?
> These arrays will accumulate the sum of the move numbers at which black
> was the first to play on the intersection and black won the simulation
> (and likewise for white).
>
> Concrete example:
>
> Let's say we have already done ten simulations for the current node.
> In six of them, black was the first to play on the B4 intersection.
> In three of these six simulations black won.
>
> In the first of these three simulations, the move number at which black
> played on B4 was 20 (this move number is counted from the
> start of the random simulation, not from the start of the game).
>
> In the second of the three simulations, the move number for B4 was 25.
> In the third simulation where black played on B4 and won,
> the move number was 72.
>
> In this case, black_rave_win_move_sum for B4 will be
> 20 + 25 + 72 = 117
>
> This number allows us to calculate the average move number
> for B4 in simulations that were successful for black:
> 117 / 3 = 39.
>
> I denote this as black_rave_avg_win_move_num:
> black_rave_avg_win_move_num[pos] =
> black_rave_win_move_sum[pos] / black_rave_wins[pos]
>
> In current RAVE, we use the winrate to determine the "best" move:
> black_rave_winrate[pos] = black_rave_wins[pos] / black_rave_total[pos]
>
> I propose to use a "weighted winrate" instead:
> black_weighted_rave_winrate[pos] =
> black_rave_winrate[pos] / black_rave_avg_win_move_num[pos]
>
> In the current example, the winrate for B4 is 3/6 = 0.5 and the
> weighted winrate is (3/6) / (117/3) = 0.0128205.
>
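The B4 arithmetic above can be checked with a small sketch. The function name `weighted_rave_winrate` is hypothetical, move numbers are assumed to start at 1 (so the average is never zero), and returning 0.0 for a move with no recorded wins is an assumption, since the proposal does not say what to do in that case.

```python
def weighted_rave_winrate(total: int, wins: int, win_move_sum: int) -> float:
    """Winrate divided by the average move number in winning simulations.

    total        -- simulations in which this side played the point first
    wins         -- how many of those simulations this side won
    win_move_sum -- sum of the (1-based) move numbers over the winning ones
    """
    if total == 0 or wins == 0:
        return 0.0  # assumed fallback: no data or no wins yet
    winrate = wins / total
    avg_win_move_num = win_move_sum / wins
    return winrate / avg_win_move_num


# B4 from the example: played first by black in 6 simulations, 3 of them
# won, at move numbers 20, 25 and 72.
b4 = weighted_rave_winrate(total=6, wins=3, win_move_sum=20 + 25 + 72)
print(round(b4, 7))  # 0.0128205
```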
> The weighted winrate will be bigger for successful moves which must be played
> earlier during the simulation. Good endgame moves will have a low weighted winrate.
>
>
> --- Use cases ---
>
> 1) Let's say we have two moves with a good RAVE winrate: E5 and A4.
> A4 has the bigger winrate, because A4 is inside safe territory, and each
> successful simulation contains A4. E5 is critical, and must be played
> very early for the result to be successful. Each simulation with E5 also
> contains A4, but some simulations without E5 were also successful
> because of dumb opponent play during the simulation.
>
> So A4 has the bigger RAVE winrate. But E5 has the bigger
> weighted winrate, because A4 can be played at any time during
> the simulation, while E5 must be played early or it will be useless.
>
> By using the weighted RAVE winrate we can determine that
> E5 is more important than A4, despite the fact that A4 has the bigger
> RAVE winrate.
>
> 2) Let's say black must make three moves during the simulation
> in order to win: B2, B3, C3, in exactly this order. Without
> these moves black cannot win the simulation.
>
> All of these moves have the same winrate, because the
> simulation is successful for black only if all three moves
> are played during the simulation.
>
> So, if we use the simple RAVE winrate, we can have problems
> determining the correct move order.
>
> But B2 has a bigger weighted winrate than B3 and C3
> (and B3 has a bigger weighted winrate than C3), because
> in all successful simulations B2 is played before B3,
> and hence the average move number for B2 is strictly less
> than the average move number for B3 and C3. So, when using
> the weighted winrate, we can determine the correct move order.
>
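A toy illustration of the second use case, with made-up numbers: the three winning simulations, their move numbers, and the count of five simulations per point are all hypothetical and serve only to show how equal winrates can still yield a move order.

```python
# Hypothetical data: 1-based move numbers (within the simulation) at which
# black played each point in three winning simulations. In every one of
# them the order is B2, then B3, then C3.
winning_sims = [
    {"B2": 1, "B3": 5, "C3": 9},
    {"B2": 3, "B3": 7, "C3": 11},
    {"B2": 2, "B3": 4, "C3": 20},
]
total = 5                 # assumed: black played each point first in 5 simulations
wins = len(winning_sims)  # 3 of them were won, so every winrate is 3/5


def weighted(point: str) -> float:
    """Weighted winrate: winrate / average move number in winning simulations."""
    move_sum = sum(sim[point] for sim in winning_sims)
    return (wins / total) / (move_sum / wins)


# Sorting by weighted winrate recovers the required move order even though
# the plain winrates are identical.
order = sorted(["C3", "B3", "B2"], key=weighted, reverse=True)
print(order)  # ['B2', 'B3', 'C3']
```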
>
> What do you think, am I missing something?
>