[Computer-go] GPU Go engine (re-sent with fixed quotes!!!)

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Sat Jan 12 07:14:50 PST 2013

On Tue, Jan 8, 2013 at 8:48 PM, Petr Baudis <pasky at ucw.cz> wrote:
>   Short superko cycles might be enough, but this will need careful
> debugging. (Hidden) superkos may matter surprisingly often in
> endgame positions, at least on 9x9 in such a way that guarding
> against superko in some way is essential for good performance.

I have read some previous discussions and concluded most practical
superko cycles have length 4 or 6 so for memory-restricted program
it may be enough to have round buffer of 8 last hashes, and
for pathological cases like mentioned in
restrict the total game length (tree depth + playot length) to some

>   This is usually called transposition tables. It has its pros and cons;
> you will save on child pointers and allow merging transposed positions,
> but you will get all the hash table worries, mainly fixed capacity.
> There's no clear answer here which is better, I think. But you must
> have good entropy in your hashes as full collisions are catastrophic.

It may be false memory, but I think I have read somewhere about
full collisions may be not so catastrophic as it seems at first.
If I remember correctly, the reasoning was, most of nodes represents
uninteresting moves with low winrate, and nodes with good winrate are
relatively seldom, so most collisions happens with two uninteresting
nodes, and collisions with two nodes with high winrate almost never

I have very little experience with Monte-Carlo and cannot comment
about validity of above, but suspect there are may be some truth
in this. So, I want to try write an engine with transposition tables
which will be relatively stable in presence of full collisions, by
having two properties:

- Propagation of total playouts count from transposition node
back to the tree root are disallowed, count of child node playouts
stored in parent node, inside array of child nodes, and each parent
of transposed (and, possibly, collided) node have independent
count of playouts for this child node

- If node detected as having multiple direct "parents", it
threat more optimistically to prevent negative effect when
new "good" node collides with previously existed "bad" node

>> I want to store Zobrist hash values not in 6 KB of shared memory
>> used for playout, but in 64 KB of GPU constant memory,
>> which have 8 KB cache per SM and to me looks very suitable
>> for Zobrist hash storage, because it seems optimized for both of
>   Huh. Isn't the constant memory space shared with the shared memory?
> (Or worse, offloaded at will to main memory.)

>From what I know, it is separate and very fast when multiple threads
inside warp request the same constant, at least on Fermi, but I
had no practical experience with it yet

More information about the Computer-go mailing list