[computer-go] Re: Mogo scalability

Hideki Kato hideki_katoh at ybb.ne.jp
Tue May 6 07:40:02 PDT 2008


Mark Boon: <6C5C6A3B-294E-4251-94CD-27373AD98F24 at gmail.com>:
>
>On 4-mei-08, at 14:57, Hideki Kato wrote:
>
>>  By my observation (they are running on my PCs and
>> both are Q6600/3GHz with different motherboards), mogo_big_4core's
>> parallelism is around 300% (by the top command), perhaps due to its
>> heavier UCT part (just my guess).
>
>Of course the CPU load doesn't really say how effective  
>parallelization is. Recently I bought an octo-core Mac and have been  
>running some tests. It takes time to get real conclusive data but I  
>have some observations that come purely from some testing and  
>watching. When using eight cores I get a speed-up of around six  
>times. That is in number of playouts per second. I think that's a  
>much more useful metric than looking at the CPU load.

Yes, of course.  It's just as I wrote.
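
To put the two numbers side by side, here is a rough sketch (not from 
either message) of the arithmetic.  The function name is made up for 
illustration, and treating 300% in top as "three cores' worth of CPU 
time" is an assumption about how top reports multi-core load.

```python
def fraction_of_ideal(achieved: float, cores: int) -> float:
    """Ratio of what was achieved to perfect linear scaling on `cores` cores."""
    return achieved / cores

# Mark's figure: about 6x the playouts per second when using 8 cores.
playout_efficiency = fraction_of_ideal(6.0, 8)

# The top figure: ~300% on a 4-core Q6600, read here as ~3 busy cores.
# This only says the cores were busy, not that the work was useful.
cpu_utilization = fraction_of_ideal(3.0, 4)

print(f"playout efficiency on 8 cores: {playout_efficiency:.0%}")
print(f"CPU utilization on 4 cores:    {cpu_utilization:.0%}")
```

The two ratios measure different things: the first is throughput in 
playouts, the second is only how busy the CPUs are.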

>Still, even number of playouts is not the end-all I believe. I have  
>the distinct impression that eight cores running for one second plays  
>considerably worse than one core running for six seconds, even though  
>the number of playouts is in the same ball-park. I haven't had the  
>time to do an extensive test on that yet but I'm convinced that the  
>picture is more complicated than just looking at total computing power.

I wrote a paper about this issue for GPW 2007 (in Japanese).

Its English abstract follows.  The latter half addresses this problem, 
in which parallel implementations of UCT perform worse than 
single-threaded ones.  The cause is that the UCT part creates and 
evaluates positions _before_ the MC part (threads) has completely 
finished its simulations.
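
A toy sketch of that effect follows.  It is not the code from the 
paper: it is a flat UCB1 choice among three children, the statistics, 
the constant c=1.4 and the "every playout loses" outcome are all made 
up for illustration, and `in_flight` is a hypothetical name for the 
number of simulations started before any result is backed up.

```python
import math

def ucb1(wins, visits, total, c=1.4):
    """UCB1 value of one child; unvisited children are tried first."""
    if visits == 0:
        return float("inf")
    return wins / visits + c * math.sqrt(math.log(total) / visits)

def select(stats):
    """Index of the child with the highest UCB1 value."""
    total = sum(visits for _, visits in stats)
    return max(range(len(stats)),
               key=lambda i: ucb1(stats[i][0], stats[i][1], total))

def run(initial, n_playouts, in_flight, outcome):
    """Select `in_flight` children from the same statistics, then back up."""
    stats = [list(s) for s in initial]
    picks = []
    for start in range(0, n_playouts, in_flight):
        # Every selection in this batch sees the same, not yet updated, counts.
        batch = [select(stats) for _ in range(min(in_flight, n_playouts - start))]
        for child in batch:
            stats[child][0] += outcome(child)  # wins
            stats[child][1] += 1               # visits
        picks.extend(batch)
    return picks

initial = [(5, 10), (4, 10), (3, 10)]   # (wins, visits) per child, hypothetical
always_lose = lambda child: 0           # hypothetical playout result

fresh = run(initial, 4, 1, always_lose)  # one playout at a time
stale = run(initial, 4, 4, always_lose)  # four playouts in flight

print("one playout at a time:  ", fresh)
print("four playouts in flight:", stale)
```

With four simulations in flight, all four go to the same child because 
none of them sees the others' results; run one at a time, the losses 
shift the choice to another child sooner.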
----------------------------------
	A Study on Implementing Parallel MC/UCT Algorithm

		HIDEKI KATO and IKUO TAKEUCHI

We have developed a parallel MC/UCT computer Go program as a test bed for our research,
applied recurrent neural networks. We measured the execution time of both commonly used
shared-tree and client-server implementations on two different types of systems, Intel Core
2 Quad on a PC and Cell Broadband Engine on a SONY PLAYSTATION 3. The client-server
implementation runs three times faster and 10% slower than shared-tree on the Playstation
3 and PC, respectively. Also, the effect of a well-known problem that parallelizing Monte
Carlo simulations may make UCT algorithm behave differently was evaluated with the winning
rates against GNU GO. Our experiments using four cores show that the winning rates
decrease by 35 Elo at most and can be improved to 20 Elo.
-----------------------------------
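
For a feel of what those Elo figures mean, here is a small sketch 
using the standard logistic Elo formula.  Two assumptions: that "20 
ELO" means the loss is reduced to 20 points, and that the baseline is 
an even match, which is only for illustration (the actual winning 
rates against GNU Go are in the paper, not here).

```python
def expected_score(elo_diff: float) -> float:
    """Expected score for a player rated `elo_diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Hypothetical baseline: an even match (50%), then subtract the Elo loss.
for loss in (35, 20):
    print(f"{loss} Elo weaker: expected score {expected_score(-loss):.3f}")
```

Against an equal opponent, both losses amount to a few percentage 
points of winning rate.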

-Hideki

>Mark
>
--
gg at nue.ci.i.u-tokyo.ac.jp (Kato)

