[Computer-go] Direct DX11 and graphics cards for cheaper simulation hardware?
hideki_katoh at ybb.ne.jp
Wed May 25 10:47:24 PDT 2011
Petr Baudis: <20110525150316.GI25968 at machine.or.cz>:
>On Wed, May 25, 2011 at 05:11:58PM +0900, Hideki Kato wrote:
>> Just simply avoid synchronization. Tree-part updates the info in the
>> search tree as soon as a result arrives, start descending tree, and send
>> the leaf position to be simulated.
>> # I used broadcasting (udp/ip) but point-to-point is also possible.
>> For detail, see
>This is where I'm not clear if this is possible to do with current GPUs
>at all. I *think* you cannot do it this way, at least with anything but
>Fermi which should support independent execution of multiple kernels
Since I have no experience with current GPUs, I'm not sure either.
>But still, if each thread is separate simulation, you have only
>multiprocessor granularity and still have to treat simulations within
>single multiprocessor all in a single block; depending on the number
>of memory stalls, this may need to be much more than a single warp
>(32 threads), but that's the effective minimum.
The problems with GPUs are not only their SIMD architecture but also
their very long memory latency. To hide that latency, GPUs run fast
only on graphics applications, which have millions of pixels. In the case
of Go, even a 19x19 board has only 361 intersections; 361 is not
enough to hide the latency. Alternatively, running thousands (or millions)
of simulations in parallel might perform better than CPUs, but (unlike
the case of computer clusters) the host has to wait for the longest
simulation, as you pointed out. So, I'm very suspicious of using GPUs
for MCTS now.
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>