[Computer-go] pachi2
Hideki Kato
hideki_katoh at ybb.ne.jp
Fri Jan 14 19:00:15 PST 2011
Thank you for the info.
I have some questions. What about the network hardware: does the cluster
use InfiniBand or a similar high-speed interconnect? What is the
processor, Intel Xeon or IBM Power?
Interestingly, your improvements are similar to my experiments with
DeepZen (a PC cluster version of Zen), though the implementations are
very different. DeepZen typically uses a 0.3-second interval; every
node computer has its own search tree, which is shared by multiple
search threads, and periodically broadcasts the information in that tree.
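To make the scheme concrete, here is a minimal sketch in C of such a
periodic-broadcast loop (only an illustration, not Zen's actual code;
tree_top_nodes() and broadcast_to_peers() are hypothetical helpers
standing in for the node's own tree and network code):

    #include <unistd.h>

    struct node_stats {
        unsigned long key;   /* identifies a tree node, e.g. a move-path hash */
        long playouts;       /* visits accumulated since the last broadcast */
        long wins;           /* wins accumulated since the last broadcast */
    };

    /* Hypothetical helpers: collect the deltas of the most-visited nodes
     * from the local tree, and send a buffer to every other node. */
    int tree_top_nodes(struct node_stats *buf, int max);
    void broadcast_to_peers(const struct node_stats *buf, int n);

    void broadcast_loop(void)
    {
        struct node_stats buf[1024];
        for (;;) {
            int n = tree_top_nodes(buf, 1024);
            broadcast_to_peers(buf, n);
            usleep(300 * 1000);   /* the 0.3-second interval mentioned above */
        }
    }
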
Hideki
Jean-loup Gailly: <AANLkTiki67GYBFaq-NqpsKu4K=F4wMRFvXGpzQg6Q=gs at mail.gmail.com>:
>Here is some preliminary information on the distributed version of pachi.
>Petr (pasky) and I will publish all the details later; this is just to give
>you an idea of what we are doing. Pasky is the main author of pachi and
>wrote most of the single machine code. I wrote the distributed code and
>some other improvements.
>
>All the code, including the distributed code, is GPL and available at
>http://repo.or.cz/w/pachi.git/
>
>The distributed pachi uses simple TCP/IP sockets, not MPI. This makes it
>portable to many environments. A master process regularly receives stats
>updates from all the slaves and distributes the aggregated updates back
>to all slaves. The master-slave protocol is specific to pachi but it is
>rather simple. It is fault tolerant: if a slave dies, the master resends
>the whole game to the new slave that replaces it. If the master dies
>during test runs, I abandon the current game and start a new one. If the
>master dies when running on KGS, I kill the kgsGtp program and start a
>new one; KGS then resends the partial game and we continue from there.
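
As an illustration of the kind of aggregation such a master could do
(only a sketch, not pachi's actual protocol code; see the repository
above for the real implementation), assume each slave periodically sends
(node key, playout delta, win delta) records and the master sums them
into a global table before sending the totals back to every slave:

    #define MAX_NODES 65536

    struct stats {
        unsigned long key;   /* identifies a tree node */
        long playouts;       /* playout-count delta or total */
        long wins;           /* win-count delta or total */
    };

    static struct stats totals[MAX_NODES];
    static int ntotals;

    /* Merge one slave's update into the aggregated totals (linear scan
     * kept for brevity; a hash table would be used in practice). */
    void merge_update(const struct stats *upd, int n)
    {
        for (int i = 0; i < n; i++) {
            int j;
            for (j = 0; j < ntotals; j++)
                if (totals[j].key == upd[i].key)
                    break;
            if (j == ntotals) {                   /* node not seen before */
                if (ntotals == MAX_NODES)
                    continue;                     /* table full, drop it */
                totals[ntotals++] = upd[i];
                continue;
            }
            totals[j].playouts += upd[i].playouts;
            totals[j].wins     += upd[i].wins;
        }
    }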
>
>I measured scalability both on a single machine and in distributed
>mode. All the details will be published, but here is a summary. In single
>machine mode, doubling the number of cores gains roughly 100 Elo or one
>stone. (I measured one stone to be worth approximately 100 Elo.) This is
>true up to the number of cores I can test (20 per machine; the other
>cores are reserved for the OS and other apps).
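
To put the 100-Elo figure in perspective: under the standard Elo model
the expected score is 1 / (1 + 10^(-d/400)), so a 100-Elo edge
corresponds to roughly a 64% win rate against the weaker version. A tiny
self-contained check in C (not part of pachi):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double d = 100.0;   /* Elo difference gained per doubling of cores */
        double expected = 1.0 / (1.0 + pow(10.0, -d / 400.0));
        printf("Expected win rate at +%.0f Elo: %.1f%%\n", d, expected * 100.0);
        return 0;   /* prints about 64.0% */
    }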
>
>In distributed mode, doubling the number of machines initially gains
>approximately 50 Elo (half a stone), up to 8 machines. Above this we
>quickly hit a scalability limit; the best result so far is with 64
>machines, which is the configuration used for the KGS tournament (starting
>at round 4) and on KGS right now. 128 machines currently perform much
>worse than 64.
>
>Preliminary analysis of the lost games shows that the current code
>has inherent scalability limits because the playouts are biased.
>When the playouts incorrectly judge the life status of a group,
>the results will be bad no matter how many cores and machines
>work on it. We are of course working to eliminate these
>scalability limits.
>
>Pachi has benefited enormously from ideas published on the computer-go
>mailing list and in many papers. By making its source completely open we
>hope to encourage further progress in this area.
>
>Petr and Jean-loup
--
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>