[Eeglablist] distributed processing
Robert Oostenveld
roberto at smi.auc.dk
Tue Mar 22 00:59:03 PST 2005
Dear Joseph (and other heavy EEGLAB users)
We have been performing specific (non-EEGLAB) Matlab computations in
parallel on our Linux "cluster", i.e. a loosely connected network of
about 30 similar Linux computers. This parallelization is part of our
FieldTrip toolbox, which in the near future will become the back-end
for dipole fitting and source analysis in EEGLAB. My objectives were
twofold:
* I wanted to evaluate Matlab code (not low-level C code), and the
Matlab code should be as "unaware" as possible of being evaluated in
parallel
* I wanted to work with relatively large chunks of data, i.e. each
chunk (subject/trial/whatever) that is evaluated in a separate job
should be computationally large enough to justify the overhead of
sending the data over the network
For this purpose I have evaluated various open source parallel
computing toolboxes, but found that none of them was suitable for my
needs. I have also seen the recently released commercial DCT/DCE Matlab
toolbox, but have no experience with that one yet. The most important
problem that I faced (which I think also applies to the DCT/DCE
toolbox) is that the parallel computations are performed in separate
Matlab sessions. That means that each node in the cluster has to be
running its own Matlab session, and that one master node is running
the Matlab session that is controlling all of them. Although we are
connected to the university wide license server with ~300 concurrent
Matlab licenses, that does not necessarily mean that I can use an
unlimited number of licenses. In particular, the licenses for the
specialized toolboxes (signal processing, image processing,
optimization, statistics) that I use limit the number of concurrent
jobs that I can evaluate on our cluster (our university has only ~10
licenses for each of those toolboxes).
Since I want my computations to simply scale with the number of nodes
that are available to me, without having to buy additional licenses,
the solution that I implemented is based on the Matlab compiler
toolbox. Let me give an example: on the master node (the only one that
has to be running Matlab) you can type something like
    a = rand(1000,1000,30);
    pfor(1:30, 'b(:,:,%d) = fft(a(:,:,%d))');
which is equivalent to executing
    for i=1:30
      b(:,:,i) = fft(a(:,:,i));
    end
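To make the equivalence concrete, here is a minimal serial sketch of how
such a command template could be expanded into one command per job. It
is not the actual pfor code: the use of sprintf and eval, and the
variable names template and cmd, are assumptions for illustration only,
and a small array is used to keep it quick.

```matlab
% Serial stand-in for the pfor call above (illustration only).
a = rand(4, 4, 3);
b = zeros(size(a));

% Same kind of template string as passed to pfor; each %d is replaced
% by the job index.
template = 'b(:,:,%d) = fft(a(:,:,%d));';

for i = 1:size(a, 3)
    % The template has two %d fields, so the index is passed twice.
    cmd = sprintf(template, i, i);
    % Evaluate the expanded command in this workspace. In the parallel
    % case this step would instead run as a separate job on a node.
    eval(cmd);
end
```

After the loop, b holds the same values as the explicit for-loop would
produce.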
What happens is that the fft function (or any other custom function in
its place) is wrapped into an m-function that is compiled into a
standalone executable. Subsequently, the data for each job is written
to an NFS-shared disk and all jobs are executed remotely on the
available nodes of the cluster. The only requirements are that the
compiler toolbox is present on the master node, that login (ssh/rsh)
connections are possible between the nodes, and that there is a common
filespace. I also experimented with sending the data over the network
(i.e. using TCP/IP network sockets), but found that this made it too
complex.
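The file-based exchange described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
toolbox itself: a temporary local directory stands in for the NFS
share, the "worker" loop runs in the same Matlab session instead of as
a compiled executable started over ssh/rsh, and the file names
(job001_in.mat etc.) are made up.

```matlab
% Assumption: a local temporary directory stands in for the shared disk.
shared = tempname;
mkdir(shared);

a = rand(8, 8, 3);
njobs = size(a, 3);

% Master: write the input data of each job to its own file.
for i = 1:njobs
    chunk = a(:,:,i);
    save(fullfile(shared, sprintf('job%03d_in.mat', i)), 'chunk');
end

% Worker: in the real setup each iteration would be a standalone
% executable running on another node; here it simply runs locally.
for i = 1:njobs
    in = load(fullfile(shared, sprintf('job%03d_in.mat', i)));
    result = fft(in.chunk);
    save(fullfile(shared, sprintf('job%03d_out.mat', i)), 'result');
end

% Master: collect the results of all jobs from the shared directory.
b = zeros(size(a));
for i = 1:njobs
    out = load(fullfile(shared, sprintf('job%03d_out.mat', i)));
    b(:,:,i) = out.result;
end

% Clean up the temporary files.
rmdir(shared, 's');
```

Because every job only reads its own input file and writes its own
output file, the jobs do not need to know about each other, which is
what allows them to be spread over the nodes.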
I have been planning to make my parallelization toolbox available on
the net, but so far have not had time for it. The functions themselves
are quite straightforward and include documentation. I still have to
write some background documentation for them, and more testing is
required in a different (clean) environment. There are some environment
variables and shared libraries that have to be set up correctly for the
standalone executable to work on the client nodes. Furthermore, I still
have to improve support for general-purpose cluster management software
(such as MPI, GridEngine, gexec), with which I expect to obtain
smoother load balancing of my jobs over the available nodes.
If you are interested in trying out my toolbox, please contact me and I
will send you the code.
best regards,
Robert
----------------------------------------------------------------------
Robert Oostenveld, PhD
F.C. Donders Centre for Cognitive Neuroimaging
Radboud University Nijmegen
phone: +31-24-3619695
http://www.ru.nl/fcdonders/
----------------------------------------------------------------------
N.B. Starting from 1 September 2004, the University of Nijmegen has
changed its name to Radboud University Nijmegen. All web- and
email-addresses ending in ".kun.nl" should therefore be changed into
".ru.nl". Please update your address book and links.