[computer-go] Pay-as-you-go cluster?
Chrilly
c.donninger at wavenet.at
Sun Oct 1 11:31:59 PDT 2006
Hydra is a distributed cluster consisting of 32 Dual-Nodes, which are
connected by a Myrnet-network. We use MPI (Message Passing Interface) for
the paralleziation of the code. MPI is relative straightforward to use. I
think the question is not shared memory or distributed cluster. The question
is if the algorithm is scalable. In case of Hydra a shared memory
64-processor cluster would not help much. The main problem is starvation. In
Alpha-Beta one uses the so called Young-Brothers-Wait concept. The first
move in each node must be computed before the siblings/brothers are
distributed. Without Young-Brothers one solves the starvation problem, but
gets a massive parallel overhead. Moves are searched in parallel which would
have never been considered on a single processor. The Young-Brothers concept
introduces a non-parallelizable part in the search. Amdahls law puts an
upper bound to the parallel speedup. MPI makes it a little bit worse,
because in Hydra the minimum height of a subtree must be 7 Plies.
Distributing a smaller subtree does not pay. It takes longer to distribute
and collect the result than to compute the subtree locally. Another inherent
problem of computer chess is, that the number of nodes in e.g. a 10 Ply
subtree can differ by a factor of 10**4. It happens frequently that result
of all the distributed siblings are long available, but 1 move is pending.
In this case all the siblings have to wait for the result. This problem is
somewhat solved by dynamic work-balancing. The processor with the remaining
subtree can split up the tree to the waiting siblings or the processors
working on the siblings can search in other parts of the tree for new work.
But it is still a bottleneck.
As said already above: The question is not shared or distributed memory. The
question is the structure of the algorithm.
For massive parallel systems there exists also no real shared memory.
Massive parallel systems with shared memory are NUMA (Non-Uniform
Memory-Access) Machines. The shared memory is on the physical level a
network similar to the Hydra Cluster. The shared memory is an illusion for
the the programmer. This makes things usually worse, because the programmer
is not aware of the hidden costs. A memory access to a remote node can be by
a factor of 10**3 more expensive than a accessing local memory. In case of
MPI one knows that message passing is an expensive operation. E.g. In Hydra
Message Passing is only done, when the CPU waits on the PCI-Bus on the
result of the FPGA-chess card. In this case message passing is almost free.
Only the latency puts an restriction on the height of the subtree.
Chrilly
More information about the computer-go
mailing list