[computer-go] UCT memory issues and scalability

Łukasz Lew lukasz.lew at gmail.com
Tue Jan 16 11:23:52 PST 2007


There is a good argument why 100 is ok.
When You have about 50 children, then waiting 100 playouts before
start of attaching them results only in 2 playouts per child loss, so
I guess even higher threshold should be OK.

Lukasz

On 1/16/07, Don Dailey <drd at mit.edu> wrote:
> I have been doing a lot of experiments with the scalability and
> memory usage of UCT.    I'm using the exact scheme that was
> described like this in a previous posting by someone:
>
>  Here is a summary of how it works:
>   - Use probability of winning as score, not territory
>   - Use the average outcome as position value
>   - Select the move that maximizes v + sqrt((2*log(t))/(10*n))
>
>   v is the value of the move (average outcome, between 0 and 1), n the
>   number of simulations of this move, and t the total number of
>   simulations at the current position. In case a move has n = 0, it is
>   selected first.
>
>
> For these experiments, I use random play-outs.   This version should
> be a relatively generic implementation of what many of us are doing,
> without the frills and a relatively stable bug-free implementation.
>
> Each version does 2x the number of play-outs as previous version
> and the level of the program is defined as the number of play-outs
> * 1024 - so level 2 does 2048 playouts, etc.   I am going to test
> as far as I can stand it like this:
>
>    level 1
>    level 2
>    level 4
>    level 8
>    level 16
>    ....
>
> Additionally,  I am testing 2 variants, because I am interesting in
> saving memory.   The base version does the basic thing - always
> expanding a node in the tree once it has been visited one time,
> or in other words each probe is guaranteed to add 1 new node to
> the tree.
>
> The second variant will not expand a node until it has been
> visited 5 times - expanding on the 6th visit.   This version is
> significantly more stingy about memory use.
>
> So far, after running well over 1000 games I cannot detect an
> advantage over using the less memory hungry version.   There
> seems to be no point in wasting the memory.
>
> In a separate less systematic test I did last week,  I also
> tested values as high as 100 visits before expansion at levels
> comparable or higher to CGOS time controls.   In that test,
> I tested 1 visit, 5 visits, 10 visits and 100 visits before
> expansion.
>
> After playing a massive round robin (where each player played
> a total of over 200 games)  I could not prove empirically that
> even 100 visits is too much.  The ELO rating difference varied
> only 40 ELO points from high to low but the 1 visit version,
> which is supposed to be the best,  was the second lowest rated
> version and the 100 visit version score 10 ELO points higher.
> This is all statistical noise.
> In this implementation the 100 visit version was substantially
> faster which makes it a no-brainer - however I replaced my
> node management routines with a fixed pool of nodes so there
> is no malloc or free operations.   Now the slowdown is very
> minor due to excessive node expansions.
>
> Does anyone have any reason to believe we should be expanding
> so aggressively as after only 1 visit?   My tests indicate
> that you can easily get away with anything less than 100.
>
> Here are the results of the current more systematic test.  I
> will continue to expand this by introducing higher levels
> to the autotester and dropping the lower ones in order to
> perform a scalability study.    We are seeing about 200 ELO
> per doubling at these levels.
>
> s05_032 means:  5 visits before expand - level 32 which
>                 is 32K play-outs per move.
>
>  Perf
>  Rating  Win perc  Tot Gms  Ave Time  Player
> -------  --------  -------  --------  -------
>  2107.1    90.749      227      50.3  s05_032
>  2102.7    90.323      217      52.2  s01_032
>  1942.1    79.452      219      32.1  s05_016
>  1916.3    78.027      223      32.6  s01_016
>  1764.3    65.447      246      16.1  s01_008
>  1753.1    61.200      250      15.3  s05_008
>  1542.1    40.000      245       6.8  s01_004
>  1526.5    39.914      233       6.3  s05_004
>  1284.3    22.273      220       1.8  s05_002
>  1274.8    22.594      239       1.8  s01_002
>  1000.0     7.623      223       0.2  s05_001
>   972.4     5.932      236       0.3  s01_001
>
> Black wins:      652      46.9 %
> White wins:      737      53.1 %
>
>
>
> - Don
>
>
>
>
>
>
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>


More information about the computer-go mailing list