[Computer-go] GPU Go engine (re-sent with fixed quotes!!!)
fotland at smart-games.com
Tue Jan 8 22:52:38 PST 2013
> -----Original Message-----
> From: computer-go-bounces at dvandva.org [mailto:computer-go-
> bounces at dvandva.org] On Behalf Of Petr Baudis
> Sent: Tuesday, January 08, 2013 8:48 AM
> To: computer-go at dvandva.org
> Subject: Re: [Computer-go] GPU Go engine (re-sent with fixed quotes!!!)
> On Tue, Jan 08, 2013 at 04:34:15PM +0400, Alexander Kozlovsky wrote:
> > Is it possible to completely ignore superko during playouts and detect
> > it during UTC tree descent only, or maybe, detect only relatively
> > short superko cycles during playouts?
> > I think that probability of long superko cycle are very small and
> > statistically its effect during long series of random playout will be
> > insignificant.
Many faces only checks for simple ko in the playouts, and superko in the
> Short superko cycles might be enough, but this will need careful
> debugging. (Hidden) superkos may matter surprisingly often in endgame
> positions, at least on 9x9 in such a way that guarding against superko
> in some way is essential for good performance.
> > > > 2) What is a typical data size of structures used in modern > >
> > Monte-Carlo Go engines?
> > > > - size of UCT tree node;
> > >
> > > You will need at least UCT and RAVE values (#of playouts
> > > and value or #of wins), move coordinates and say another > extra
> > byte for some flags, per child, and pointer to its > children list.
> > I have a plan to completely avoid pointers to child nodes.
> > Maybe I don't fully understand something, but if each UCT tree node
> > have unique Zobrist hash and all nodes reside in gigantic hash table,
> > then address of hash table bucket with specific child node can be
> > calculated on the fly by calculating Zobrist hash of child position
> > from parent Zobrist hash. So, it is enough for each node to keep
> > relatively small header of node-specific information and array of
> > UTCValues of chid positions,
Many Faces does this (hash table only, no tree). I think also Aya.
> This is usually called transposition tables. It has its pros and cons;
> you will save on child pointers and allow merging transposed positions,
> but you will get all the hash table worries, mainly fixed capacity.
> There's no clear answer here which is better, I think. But you must have
> good entropy in your hashes as full collisions are catastrophic.
> > where index in the array corresponds to coordinates on the board.
> This seems somewhat wasteful, though; by middlegame, you will have a
> significant overhead, and you will want to have a way to skip unused
> positions during iteration, which will mean some more overhead.
> > So, I think is is possible to keep size of UTC node less then 1KB for
> > 19x19 board assuming each UTC value in array of child UTC values fit
> > in 2 bytes (I don't sure is 2 bytes is enough for UTC value, but why
> > not).
Many Faces is about 5 KB per node. You need to count wins and losses, and
rave wins and rave losses at least. I prefer to keep integer counts rather
than floating point win rates, to avoid accumulated rounding error.
> With longer thinking times, even single precision floats aren't
> enough. Careful here, typically many tree branches may have very similar
> > This assume some calculations during tree descent, but I think it is
> > not critical, because I think most processor time will be spend not in
> > tree descent, but in random playouts.
Most time is spent in move generation, since the heavy playouts are very
nonuniform, and the tree has a prior per move.
> I find that surprising amount of CPU time is spent in the tree descent
> (due to cache misses and mathematic operations) in Pachi, but with
> efficient representation and superscalar evaluation you should be able
> to push this down.
> > Also, I want to do not
> > not one, but several playouts after each tree descent (I do not know
> > whether this is a common practice), so tree walking time seems even
> > more minor compared to playouts.
Not common practice, since it is not a good use of CPU resources.
> This (or rather a variant of this) is sometimes called leaf
> parallelisation. It's not been found to work well so far, but I
> encourage you to do your own experiments.
> > I want to store Zobrist hash values not in 6 KB of shared memory used
> > for playout, but in 64 KB of GPU constant memory, which have 8 KB
> > cache per SM and to me looks very suitable for Zobrist hash storage,
> > because it seems optimized for both of
> Huh. Isn't the constant memory space shared with the shared memory?
> (Or worse, offloaded at will to main memory.)
> > By the way, I have read several time already about keeping edges in
> > array together with positions to avoid analyze of position index to
> > determine is it lie near board, but I just don't understand this. Is
> > it really so slow to check if position lie near board? As it seems to
> > me, many board representation (such as bit masks) imply much more
> > performance hit comparing to edge check. At the least, I can have a
> > small shared lookup table, which will tell for each index is it lie
> > near the edge, and near which edge exactly
The old Many Faces ran in 400 KB (yes KB, not MB), so it did not have space
for any off-board data. Board data arrays were 361 shorts. It can be made
efficient, so performance is not an issue. If you have extra cells off the
edge it makes the code smaller and simpler, which is why I switched to
20x21-ish for the MCTS engine.
> I'm not sure if anyone really uses bitmasks when there's no bad memory
> pressure. I don't. I just talk about bits of information, it's up to you
> if you round that up to nearest 2^n, n>2. :-)
> Having an edge frame allows us to avoid branching in many common for-
> each-neighbor loops, for example when decreasing/increasing liberty
> count, cached information like 3x3 pattern codes, and so on.
> Branching is expensive. But again, I encourage you to experiment. :)
> Petr "Pasky" Baudis
> Computer-go mailing list
> Computer-go at dvandva.org
More information about the Computer-go