[computer-go] UCT memory issues and scalability
Łukasz Lew
lukasz.lew at gmail.com
Tue Jan 16 11:23:52 PST 2007
There is a good argument why 100 is ok.
When You have about 50 children, then waiting 100 playouts before
start of attaching them results only in 2 playouts per child loss, so
I guess even higher threshold should be OK.
Lukasz
On 1/16/07, Don Dailey <drd at mit.edu> wrote:
> I have been doing a lot of experiments with the scalability and
> memory usage of UCT. I'm using the exact scheme that was
> described like this in a previous posting by someone:
>
> Here is a summary of how it works:
> - Use probability of winning as score, not territory
> - Use the average outcome as position value
> - Select the move that maximizes v + sqrt((2*log(t))/(10*n))
>
> v is the value of the move (average outcome, between 0 and 1), n the
> number of simulations of this move, and t the total number of
> simulations at the current position. In case a move has n = 0, it is
> selected first.
>
>
> For these experiments, I use random play-outs. This version should
> be a relatively generic implementation of what many of us are doing,
> without the frills and a relatively stable bug-free implementation.
>
> Each version does 2x the number of play-outs as previous version
> and the level of the program is defined as the number of play-outs
> * 1024 - so level 2 does 2048 playouts, etc. I am going to test
> as far as I can stand it like this:
>
> level 1
> level 2
> level 4
> level 8
> level 16
> ....
>
> Additionally, I am testing 2 variants, because I am interesting in
> saving memory. The base version does the basic thing - always
> expanding a node in the tree once it has been visited one time,
> or in other words each probe is guaranteed to add 1 new node to
> the tree.
>
> The second variant will not expand a node until it has been
> visited 5 times - expanding on the 6th visit. This version is
> significantly more stingy about memory use.
>
> So far, after running well over 1000 games I cannot detect an
> advantage over using the less memory hungry version. There
> seems to be no point in wasting the memory.
>
> In a separate less systematic test I did last week, I also
> tested values as high as 100 visits before expansion at levels
> comparable or higher to CGOS time controls. In that test,
> I tested 1 visit, 5 visits, 10 visits and 100 visits before
> expansion.
>
> After playing a massive round robin (where each player played
> a total of over 200 games) I could not prove empirically that
> even 100 visits is too much. The ELO rating difference varied
> only 40 ELO points from high to low but the 1 visit version,
> which is supposed to be the best, was the second lowest rated
> version and the 100 visit version score 10 ELO points higher.
> This is all statistical noise.
> In this implementation the 100 visit version was substantially
> faster which makes it a no-brainer - however I replaced my
> node management routines with a fixed pool of nodes so there
> is no malloc or free operations. Now the slowdown is very
> minor due to excessive node expansions.
>
> Does anyone have any reason to believe we should be expanding
> so aggressively as after only 1 visit? My tests indicate
> that you can easily get away with anything less than 100.
>
> Here are the results of the current more systematic test. I
> will continue to expand this by introducing higher levels
> to the autotester and dropping the lower ones in order to
> perform a scalability study. We are seeing about 200 ELO
> per doubling at these levels.
>
> s05_032 means: 5 visits before expand - level 32 which
> is 32K play-outs per move.
>
> Perf
> Rating Win perc Tot Gms Ave Time Player
> ------- -------- ------- -------- -------
> 2107.1 90.749 227 50.3 s05_032
> 2102.7 90.323 217 52.2 s01_032
> 1942.1 79.452 219 32.1 s05_016
> 1916.3 78.027 223 32.6 s01_016
> 1764.3 65.447 246 16.1 s01_008
> 1753.1 61.200 250 15.3 s05_008
> 1542.1 40.000 245 6.8 s01_004
> 1526.5 39.914 233 6.3 s05_004
> 1284.3 22.273 220 1.8 s05_002
> 1274.8 22.594 239 1.8 s01_002
> 1000.0 7.623 223 0.2 s05_001
> 972.4 5.932 236 0.3 s01_001
>
> Black wins: 652 46.9 %
> White wins: 737 53.1 %
>
>
>
> - Don
>
>
>
>
>
>
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
More information about the computer-go
mailing list