[computer-go] Re: OpenMP / Quad Core experiments
Hideki Kato
hideki_katoh at ybb.ne.jp
Tue Jan 1 19:05:46 PST 2008
Happy New Year, all.
terry mcintyre: <527549.61247.qm at web39809.mail.mud.yahoo.com>:
>I have been tinkering with OpenMP and my new HP Quad
>Intel 6600. Wrote a small program to compute the
>Taylor series of e and pi, just for exploration, and
>I've found some interesting data points.
>
>I am using gcc 4.2 and 4.3.1 - the latter being the
>head of the SVN repository. Kubuntu 7.10, both 32 and
>64 bit versions. One of my test programs is attached.
>
>Oddly, the OpenMP version is no faster than the
>single-threaded version - but it does keep the cores
>busier. It is possible that I am doing something
>wrong, as I am new to OpenMP.
>
>I was so puzzled by the results that I tried the same
>program on my AMD Athlon X2. The older AMD Athlon duo,
>with a 1 GHz clock, 64-bit Fedora Core 7, is 20%
>faster than the 1.6GHz quad 6600. I've also run the
>--monte-carlo version of GnuGo 4.7.11 on both
>machines, with similar results.
>
>The compilation line is:
>gcc -Wall -fopenmp -O3 -march=native -lgomp taylor3.c -o taylor3
>
>( the code is an adaptation of code from the OpenMP
>tutorial at http://kallipolis.com/openmp/ - which
>leads to another interesting discovery. The original
>code yields incorrect results for pi; the two parallel
>branches use the same index variable i,
>and one stomps on the other. Is this a feature of the
>gcc version of OpenMP? I'll be testing Intel's icc
>soon. )
>
>I'll be doing more testing this weekend, but I'd like
>to know if anyone has compared the Intel 6600 to other
>processors. So far, it sure looks like a tired old nag
>on her last ride to the glue factory; I'm wishing that
>I had waited for the Penryn version.
>
>One more puzzle: this processor is rated at 2.4GHz,
>but cpuinfo tells a different story:
That is because SpeedStep is active and has lowered the clock. You can disable it in the BIOS settings.
http://en.wikipedia.org/wiki/SpeedStep
-Hideki
>terry at terry-quad-64:/proc$ cat cpuinfo
>processor : 0
>vendor_id : GenuineIntel
>cpu family : 6
>model : 15
>model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
>stepping : 11
>cpu MHz : 1596.000
>cache size : 4096 KB
>physical id : 0
>siblings : 4
>core id : 0
>cpu cores : 4
>fpu : yes
>fpu_exception : yes
>cpuid level : 10
>wp : yes
>flags : fpu vme de pse tsc msr pae mce cx8
>apic sep mtrr pge mca cmov pat pse36 clflush dts acpi
>mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
>pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
>bogomips : 4804.08
>clflush size : 64
>cache_alignment : 64
>address sizes : 36 bits physical, 48 bits virtual
>power management:
>
>
>Terry McIntyre <terrymcintyre at yahoo.com>
>
>Wherever is found what is called a paternal government, there is found state education. It
>has been discovered that the best way to insure implicit obedience is to commence tyranny in
>the nursery.
>
>Benjamin Disraeli, Speech in the House of Commons [June 15, 1874]
>
>
>/*
> * taylor.c
> *
> * calculate e and pi by their taylor expansions and multiply them
> * together.
> *
> * moved local variables inside parallel blocks ( performance tweak? )
> */
>
>#include <omp.h>
>#include <stdio.h>
>#include <time.h>
>
>#define num_steps 20000000
>
>int main(int argc, char *argv[])
>{
>    double start, stop; /* times of beginning and end of procedure */
>    double efinal, pifinal, product;
>
>    /* start the timer */
>    start = clock();
>
>    /* calculate e and pi in parallel */
>#pragma omp parallel sections shared(efinal,pifinal)
>    {
>#pragma omp section
>        { /* calculate e using Taylor approximation */
>            register double e, factorial;
>            register int j;
>
>            e = 1;
>            factorial = 1;
>            for (j = 1; j<num_steps; j++) {
>                factorial *= j;
>                e += 1.0/factorial;
>            }
>            efinal=e;
>        } /* e section */
>
>#pragma omp section
>        { /* calculate pi expansion */
>            register int i;
>            register double pi;
>
>            pi = 0;
>            for (i = 0; i < num_steps*10; i++) {
>                /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
>                   therefore we count by fours (0, 4, 8, 12...) and take
>                     1/(0+1) = 1/1
>                   - 1/(0+3) = -1/3
>                     1/(4+1) = 1/5
>                   - 1/(4+3) = -1/7 and so on */
>                pi += 1.0/(i*4.0 + 1.0);
>                pi -= 1.0/(i*4.0 + 3.0);
>            }
>            pi = pi * 4.0;
>            pifinal=pi;
>        } /* pi section */
>
>    } /* omp sections */
>    /* threads rejoin here */
>
>    product = efinal * pifinal;
>
>    stop = clock();
>
>    printf("e %f pi %f products = %f reached in %.3f seconds\n",
>           efinal, pifinal, product, (double)(stop-start)/CLOCKS_PER_SEC);
>
>    return 0;
>}
>---- inline file
>_______________________________________________
>computer-go mailing list
>computer-go at computer-go.org
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
gg at nue.ci.i.u-tokyo.ac.jp (Kato)