[Eeglablist] AMICA MPI ORTE amica12-ompi

Jason Palmer japalmer29 at gmail.com
Fri Aug 16 14:57:58 PDT 2013


Hi James,

 

I have put compiled versions of amica12 built with Open MPI (instead of MPICH2) on
the http://sccn.ucsd.edu/~jason/amica_web.html page. Hopefully one of them
will work properly on your cluster. I'm trying to prepare an SVN package for
custom compiling (hopefully ready by the EEGLAB workshop in November).
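
In case it helps, something like this should confirm that the build you
download matches the Open MPI stack on your cluster before launching it
(just a sketch -- the binary name and the parameter-file path below are
placeholders, and I'm assuming the binary is given the parameter file that
runamica12.m writes as its first argument):

module load openmpi/Open64/1.6.2        # the Open MPI module you mention below
which mpirun && mpirun --version        # should report "mpirun (Open MPI) 1.6.2"
ldd ./amica12-ompi | grep -i mpi        # if dynamically linked, libmpi should come from that Open MPI
mpirun -np 8 ./amica12-ompi ./amicaout/input.param   # placeholder parameter-file path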

 

Best,

Jason

 

From: eeglablist-bounces at sccn.ucsd.edu
[mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of James Desjardins
Sent: Friday, August 09, 2013 7:49 AM
To: eeglablist
Subject: [Eeglablist] AMICA MPI ORTE amica12-ompi

 

Dear EEGLab community,

I am trying to do some MPI scaling tests of AMICA on the Orca cluster at
SHARCNet.

When submitting amica12 to the MPI queue, the cluster returns a warning that
it does not appear to be an MPI binary (having loaded module
openmpi/Open64/1.6.2, as MPICH2 is not supported by default on this
cluster).

I have noticed that runamica12.m uses a file named "amica12-ompi" when
submitting to ORTE. I am assuming that this is an Open MPI-compiled
version, but I can't find it online.
Is this file available?

As it is now, I can get it to execute multiple processes in parallel, but
they all appear as "MPI process 1 of 1".

log=

 1 processor name = orc49
 1 host_num =  106003813
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc48
 1 host_num =  106003812
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc48
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc45
 1 host_num =  106003809
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc45
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc49
 1 host_num =  106003813
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc49
 1 host_num =  106003813
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc26
 1 host_num =  106003748
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc26
 1 host_num =  106003748
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
 1  : node root process 1 of 1
Processing arguments ...
 1 processor name = orc26
 1 host_num =  106003748
 This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
 1  : node root process 1 of 1

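A quick way to check whether the Open MPI stack itself forms a proper
communicator, independently of amica12, is to compile and run a trivial MPI
program with the same module loaded. The sketch below is just a standard MPI
hello-world, nothing AMICA-specific:

cat > mpi_hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("process %d of %d\n", rank + 1, size);   /* mirrors AMICA's "process i of N" line */
    MPI_Finalize();
    return 0;
}
EOF
mpicc mpi_hello.c -o mpi_hello
mpirun -np 4 ./mpi_hello    # a healthy stack prints "process 1 of 4" through "process 4 of 4"

If this prints a size of 4 but amica12 still reports 1 of 1, the mismatch is
presumably in the binary (e.g. one built against MPICH2 and launched by Open
MPI's mpirun), in which case each copy initializes its own one-process
communicator.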
 

Subsequently, the AMICA iterations run in parallel, but they do not
contribute to a common learning trajectory (...I was hoping that this common
trajectory would be the MPI contribution). It simply runs nproc independent
copies of the iterations, and takes just as long as it would have with a
single process.

log =

 1 : Allocating variables ...
 1 : Initializing variables ...
 1 : block size =  256
 1 : entering the main loop ...
 iter     1 lrate =  0.1000000000 LL =  -2.4645032022 nd =  0.0390799083, D
=   0.21370E-01  0.21370E-01  ( 21.84 s,  12.1 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4635590602 nd =  0.0387601721, D
=   0.21565E-01  0.21565E-01  ( 22.03 s,  12.2 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4647840084 nd =  0.0392816396, D
=   0.20707E-01  0.20707E-01  ( 22.73 s,  12.6 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4647153083 nd =  0.0393176505, D
=   0.20211E-01  0.20211E-01  ( 23.02 s,  12.8 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4636450390 nd =  0.0387922080, D
=   0.20026E-01  0.20026E-01  ( 22.00 s,  12.2 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4644030605 nd =  0.0390331509, D
=   0.20852E-01  0.20852E-01  ( 22.59 s,  12.5 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4651212047 nd =  0.0393551915, D
=   0.19452E-01  0.19452E-01  ( 21.85 s,  12.1 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4658528898 nd =  0.0395950764, D
=   0.20598E-01  0.20598E-01  ( 22.02 s,  12.2 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4081995992 nd =  0.0137304868, D
=   0.16004E-01  0.16004E-01  ( 21.84 s,  12.1 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4080314778 nd =  0.0137468022, D
=   0.16080E-01  0.16080E-01  ( 21.95 s,  12.2 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4082656572 nd =  0.0138478047, D
=   0.15557E-01  0.15557E-01  ( 22.64 s,  12.6 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4081294783 nd =  0.0137186916, D
=   0.15250E-01  0.15250E-01  ( 22.58 s,  12.5 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4082153209 nd =  0.0136066526, D
=   0.14954E-01  0.14954E-01  ( 21.94 s,  12.2 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4082182187 nd =  0.0136708269, D
=   0.15642E-01  0.15642E-01  ( 22.57 s,  12.5 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4082739918 nd =  0.0136954380, D
=   0.14698E-01  0.14698E-01  ( 21.80 s,  12.1 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4084795063 nd =  0.0137385262, D
=   0.15559E-01  0.15559E-01  ( 21.92 s,  12.2 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987006843 nd =  0.0126939792, D
=   0.13088E-01  0.13088E-01  ( 21.89 s,  12.1 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3986842833 nd =  0.0126889052, D
=   0.12796E-01  0.12796E-01  ( 21.91 s,  12.2 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987347517 nd =  0.0128010796, D
=   0.11894E-01  0.11894E-01  ( 22.63 s,  12.6 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987174365 nd =  0.0126242167, D
=   0.12434E-01  0.12434E-01  ( 22.72 s,  12.6 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3986878291 nd =  0.0126228855, D
=   0.12257E-01  0.12257E-01  ( 21.89 s,  12.1 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987516324 nd =  0.0125602675, D
=   0.12508E-01  0.12508E-01  ( 22.48 s,  12.5 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987746322 nd =  0.0126191249, D
=   0.11551E-01  0.11551E-01  ( 21.81 s,  12.1 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3988005299 nd =  0.0126222717, D
=   0.12565E-01  0.12565E-01  ( 21.92 s,  12.2 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3946698183 nd =  0.0131246509, D
=   0.21446E-01  0.21446E-01  ( 21.89 s,  12.1 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3947229270 nd =  0.0130983328, D
=   0.20770E-01  0.20770E-01  ( 21.99 s,  12.2 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3946753622 nd =  0.0132666516, D
=   0.19454E-01  0.19454E-01  ( 22.59 s,  12.5 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3947396165 nd =  0.0130850550, D
=   0.20767E-01  0.20767E-01  ( 22.76 s,  12.6 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3946433983 nd =  0.0131271712, D
=   0.20577E-01  0.20577E-01  ( 21.97 s,  12.2 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3947613917 nd =  0.0130450434, D
=   0.20422E-01  0.20422E-01  ( 22.48 s,  12.5 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3947600745 nd =  0.0131224812, D
=   0.19343E-01  0.19343E-01  ( 21.82 s,  12.1 h)
 iter     4 lrate =  0.1000000000 LL =  -2.3947262532 nd =  0.0131216207, D
=   0.20610E-01  0.20610E-01  ( 21.95 s,  12.2 h)
 iter     5 lrate =  0.1000000000 LL =  -2.3921729925 nd =  0.0128108594, D
=   0.41854E-01  0.41854E-01  ( 21.90 s,  12.1 h)
 iter     5 lrate =  0.1000000000 LL =  -2.3922663442 nd =  0.0127645387, D
=   0.40850E-01  0.40850E-01  ( 22.02 s,  12.2 h)
 iter     5 lrate =  0.1000000000 LL =  -2.3921763103 nd =  0.0129229410, D
=   0.39446E-01  0.39446E-01  ( 22.55 s,  12.5 h) ...


Is this just a matter of the binary that I am using, or do you think that my
parameters are off? Or do I need to install MPICH2 on the cluster?
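
If installing MPICH2 turns out to be the answer, I assume a user-level
install under my home directory would look roughly like this (all paths are
made up, and the parameter-file argument is again a placeholder):

# from an unpacked MPICH2 source tree; install prefix is hypothetical
./configure --prefix=$HOME/mpich2 && make && make install
export PATH=$HOME/mpich2/bin:$PATH
mpiexec -n 8 ./amica12 ./amicaout/input.param   # MPICH2's own launcher driving the MPICH2-built amica12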


Thanks for your support... past, present and future!

James Desjardins, MA
Electrophysiology Technologist
Cognitive and Affective Neuroscience Lab, Psychology Department 
Jack and Nora Walker Centre for Lifespan Development Research
Brock University
500 Glenridge Ave.
St. Catharines, ON, Canada L2S 3A1
905-688-5550 x4676
--
"'Cause you never can tell What goes on down below!
"This pool might be bigger Than you or I know!"

McElligot's Pool
Dr.Seuss 1947
