[Computer-go] Direct DX11 and graphics cards for cheaper simulation hardware?

Hideki Kato hideki_katoh at ybb.ne.jp
Wed May 25 10:47:24 PDT 2011

Petr Baudis: <20110525150316.GI25968 at machine.or.cz>:
>On Wed, May 25, 2011 at 05:11:58PM +0900, Hideki Kato wrote:
>> Just avoid synchronization.  The tree part updates the info in the 
>> search tree as soon as a result arrives, starts descending the tree, and 
>> sends the leaf position to be simulated.
>> # I used broadcasting (UDP/IP), but point-to-point is also possible.
>> For details, see 
>> <http://www.geocities.jp/hideki_katoh/publications/gpw08-private.pdf>.
>This is where I'm not clear whether this is possible to do with current
>GPUs at all. I *think* you cannot do it this way, at least with anything
>but Fermi, which should support independent execution of multiple kernels.

Since I have no experience with current GPUs, I'm not sure either.
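The asynchronous scheme quoted above can be sketched roughly as follows.  
This is a toy in-process sketch, not the original implementation: queues 
stand in for the UDP broadcast, and the "descend" and "simulate" steps are 
trivial stand-ins.  The point is that only the tree thread ever touches 
the tree, so no locking is needed.

```python
import queue
import random
import threading

leaf_q = queue.Queue()     # tree part -> simulators (leaf positions)
result_q = queue.Queue()   # simulators -> tree part (playout results)

visits = {}                # per-"leaf" stats, touched only by the tree thread

def tree_part(num_playouts):
    # Keep a few playouts in flight; as each result arrives, update the
    # tree, immediately "descend" again, and dispatch the next leaf.
    for _ in range(4):
        leaf_q.put(random.randrange(361))
    for _ in range(num_playouts):
        leaf, won = result_q.get()                # a result arrives
        visits[leaf] = visits.get(leaf, 0) + 1    # update the tree (no lock)
        leaf_q.put(random.randrange(361))         # descend, send next leaf
    leaf_q.put(None)                              # stop the simulator

def simulator():
    while True:
        leaf = leaf_q.get()
        if leaf is None:
            break
        result_q.put((leaf, random.random() < 0.5))  # fake playout result

t = threading.Thread(target=tree_part, args=(100,))
s = threading.Thread(target=simulator)
t.start(); s.start(); t.join(); s.join()
print(sum(visits.values()))  # prints 100
```

The simulator side here could equally be a cluster node or a GPU host 
process; the tree side never blocks waiting for any particular simulation.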

>But still, if each thread is separate simulation, you have only
>multiprocessor granularity and still have to treat simulations within
>single multiprocessor all in a single block; depending on the number
>of memory stalls, this may need to be much more than a single warp
>(32 threads), but that's the effective minimum.

The problems with GPUs are not only their SIMD architecture but also 
their very long memory latency.  To hide that latency, GPUs run very fast 
only on graphics applications, which have millions of pixels.  In the case 
of Go, even a 19x19 board has only 361 intersections, and 361 is not 
enough to hide the latency.  Alternatively, running thousands (or 
millions) of simulations in parallel might perform better than CPUs, but 
(unlike the case of computer clusters) the host has to wait for the 
longest simulation, as you pointed out.  So I'm very suspicious of using 
GPUs for MCTS at the moment.
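A rough back-of-envelope calculation (Little's law) shows why 361 threads 
fall short.  The cycle counts and multiprocessor count below are 
illustrative assumptions on my part, not figures from the post:

```python
# How many threads per multiprocessor are needed to cover memory stalls?
MEM_LATENCY = 600    # cycles for a global-memory load (assumed)
WORK_PER_LOAD = 2    # independent ALU cycles between dependent loads (assumed)
SMS = 14             # multiprocessors on a Fermi-class GPU (assumed)

# Little's law: concurrency needed = latency / work between stalls.
threads_needed = MEM_LATENCY // WORK_PER_LOAD    # per multiprocessor

# One thread per intersection, spread across all multiprocessors:
threads_available = 361 // SMS

print(threads_needed, threads_available)  # prints 300 25
assert threads_available < threads_needed  # an order of magnitude short
```

Even if the assumed numbers are off by a factor of two either way, 
per-intersection parallelism within one simulation leaves each 
multiprocessor mostly stalled; only thousands of concurrent playouts 
would close the gap, which brings back the longest-simulation problem.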

Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>
