Dear EEGLab community,

I am trying to run some MPI scaling tests of AMICA on the Orca cluster at SHARCNET.

When I submit amica12 to the mpi queue, the cluster returns a warning that it does not appear to be an MPI binary (I have module openmpi/Open64/1.6.2 loaded, as MPICH2 is not supported by default on this cluster).

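To double-check which implementation the module's compiler wrapper actually targets, I plan to compile a tiny probe along these lines (my own sketch, nothing to do with the AMICA source; I believe Open MPI's and MPICH2's mpi.h define the OPEN_MPI and MPICH2 macros respectively, but please correct me if I have those wrong):

/* mpi_probe.c - compile-time check of which mpi.h the wrapper uses.
   Build with the module's wrapper:  mpicc mpi_probe.c -o mpi_probe  */
#include <stdio.h>
#include <mpi.h>

int main(void)
{
#if defined(OPEN_MPI)
    /* Open MPI's mpi.h defines OPEN_MPI and these version macros. */
    printf("mpi.h is Open MPI %d.%d.%d\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#elif defined(MPICH2)
    /* MPICH2's mpi.h defines MPICH2 (if I recall correctly). */
    printf("mpi.h is MPICH2\n");
#else
    printf("mpi.h is from an MPI implementation I don't recognize\n");
#endif
    return 0;
}
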
I have noticed that runamica12.m uses a file named "amica12-ompi" when submitting to orte. I am assuming that this is an Open MPI-compiled version, but I can't find it online. Is this file available?

As it is now, I can get it to execute multiple processes in parallel, but they all report themselves as "MPI process 1 of 1".

log =

1 processor name = orc49
1 host_num = 106003813
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc48
1 host_num = 106003812
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc48
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc45
1 host_num = 106003809
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc45
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc49
1 host_num = 106003813
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc49
1 host_num = 106003813
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc49
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc26
1 host_num = 106003748
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc26
1 host_num = 106003748
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
1 : node root process 1 of 1
Processing arguments ...
1 processor name = orc26
1 host_num = 106003748
This is MPI process 1 of 1 ; I am process 1 of 1 on node: orc26
1 : node root process 1 of 1

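To rule out the launcher itself, my next step is a minimal rank/size check along these lines (again my own test sketch, not AMICA code; compiled with the module's mpicc and launched with its mpirun):

/* mpi_hello.c - minimal rank/size check, my own test sketch (not AMICA).
   Build:  mpicc mpi_hello.c -o mpi_hello
   Run:    mpirun -np 8 ./mpi_hello                                      */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total processes in the job */
    MPI_Get_processor_name(name, &namelen);

    /* With a matched launcher and binary this prints "process 0 of 8",
       "process 1 of 8", and so on.  If every process prints
       "process 0 of 1", each one was started as an MPI singleton --
       the same symptom as in the log above. */
    printf("This is MPI process %d of %d on node: %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

If this also reports every process as 1 of 1, the mismatch would seem to be between mpirun and the MPI library that the binary was linked against, rather than anything in my AMICA parameters.
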
Subsequently, the AMICA iterations run in parallel, but they do not contribute to a common learning trajectory (I was hoping that this common trajectory would be the MPI contribution). Each process simply runs its own full set of iterations, so the job does nproc * iterations of work and takes just as long as it would have with a single process.

log =

1 Aloocating varibles ...
1 : Initializing variables ...
1 : block size = 256
1 : entering the main loop ...
iter 1 lrate = 0.1000000000 LL = -2.4645032022 nd = 0.0390799083, D = 0.21370E-01 0.21370E-01 ( 21.84 s, 12.1 h)
iter 1 lrate = 0.1000000000 LL = -2.4635590602 nd = 0.0387601721, D = 0.21565E-01 0.21565E-01 ( 22.03 s, 12.2 h)
iter 1 lrate = 0.1000000000 LL = -2.4647840084 nd = 0.0392816396, D = 0.20707E-01 0.20707E-01 ( 22.73 s, 12.6 h)
iter 1 lrate = 0.1000000000 LL = -2.4647153083 nd = 0.0393176505, D = 0.20211E-01 0.20211E-01 ( 23.02 s, 12.8 h)
iter 1 lrate = 0.1000000000 LL = -2.4636450390 nd = 0.0387922080, D = 0.20026E-01 0.20026E-01 ( 22.00 s, 12.2 h)
iter 1 lrate = 0.1000000000 LL = -2.4644030605 nd = 0.0390331509, D = 0.20852E-01 0.20852E-01 ( 22.59 s, 12.5 h)
iter 1 lrate = 0.1000000000 LL = -2.4651212047 nd = 0.0393551915, D = 0.19452E-01 0.19452E-01 ( 21.85 s, 12.1 h)
iter 1 lrate = 0.1000000000 LL = -2.4658528898 nd = 0.0395950764, D = 0.20598E-01 0.20598E-01 ( 22.02 s, 12.2 h)
iter 2 lrate = 0.1000000000 LL = -2.4081995992 nd = 0.0137304868, D = 0.16004E-01 0.16004E-01 ( 21.84 s, 12.1 h)
iter 2 lrate = 0.1000000000 LL = -2.4080314778 nd = 0.0137468022, D = 0.16080E-01 0.16080E-01 ( 21.95 s, 12.2 h)
iter 2 lrate = 0.1000000000 LL = -2.4082656572 nd = 0.0138478047, D = 0.15557E-01 0.15557E-01 ( 22.64 s, 12.6 h)
iter 2 lrate = 0.1000000000 LL = -2.4081294783 nd = 0.0137186916, D = 0.15250E-01 0.15250E-01 ( 22.58 s, 12.5 h)
iter 2 lrate = 0.1000000000 LL = -2.4082153209 nd = 0.0136066526, D = 0.14954E-01 0.14954E-01 ( 21.94 s, 12.2 h)
iter 2 lrate = 0.1000000000 LL = -2.4082182187 nd = 0.0136708269, D = 0.15642E-01 0.15642E-01 ( 22.57 s, 12.5 h)
iter 2 lrate = 0.1000000000 LL = -2.4082739918 nd = 0.0136954380, D = 0.14698E-01 0.14698E-01 ( 21.80 s, 12.1 h)
iter 2 lrate = 0.1000000000 LL = -2.4084795063 nd = 0.0137385262, D = 0.15559E-01 0.15559E-01 ( 21.92 s, 12.2 h)
iter 3 lrate = 0.1000000000 LL = -2.3987006843 nd = 0.0126939792, D = 0.13088E-01 0.13088E-01 ( 21.89 s, 12.1 h)
iter 3 lrate = 0.1000000000 LL = -2.3986842833 nd = 0.0126889052, D = 0.12796E-01 0.12796E-01 ( 21.91 s, 12.2 h)
iter 3 lrate = 0.1000000000 LL = -2.3987347517 nd = 0.0128010796, D = 0.11894E-01 0.11894E-01 ( 22.63 s, 12.6 h)
iter 3 lrate = 0.1000000000 LL = -2.3987174365 nd = 0.0126242167, D = 0.12434E-01 0.12434E-01 ( 22.72 s, 12.6 h)
iter 3 lrate = 0.1000000000 LL = -2.3986878291 nd = 0.0126228855, D = 0.12257E-01 0.12257E-01 ( 21.89 s, 12.1 h)
iter 3 lrate = 0.1000000000 LL = -2.3987516324 nd = 0.0125602675, D = 0.12508E-01 0.12508E-01 ( 22.48 s, 12.5 h)
iter 3 lrate = 0.1000000000 LL = -2.3987746322 nd = 0.0126191249, D = 0.11551E-01 0.11551E-01 ( 21.81 s, 12.1 h)
iter 3 lrate = 0.1000000000 LL = -2.3988005299 nd = 0.0126222717, D = 0.12565E-01 0.12565E-01 ( 21.92 s, 12.2 h)
iter 4 lrate = 0.1000000000 LL = -2.3946698183 nd = 0.0131246509, D = 0.21446E-01 0.21446E-01 ( 21.89 s, 12.1 h)
iter 4 lrate = 0.1000000000 LL = -2.3947229270 nd = 0.0130983328, D = 0.20770E-01 0.20770E-01 ( 21.99 s, 12.2 h)
iter 4 lrate = 0.1000000000 LL = -2.3946753622 nd = 0.0132666516, D = 0.19454E-01 0.19454E-01 ( 22.59 s, 12.5 h)
iter 4 lrate = 0.1000000000 LL = -2.3947396165 nd = 0.0130850550, D = 0.20767E-01 0.20767E-01 ( 22.76 s, 12.6 h)
iter 4 lrate = 0.1000000000 LL = -2.3946433983 nd = 0.0131271712, D = 0.20577E-01 0.20577E-01 ( 21.97 s, 12.2 h)
iter 4 lrate = 0.1000000000 LL = -2.3947613917 nd = 0.0130450434, D = 0.20422E-01 0.20422E-01 ( 22.48 s, 12.5 h)
iter 4 lrate = 0.1000000000 LL = -2.3947600745 nd = 0.0131224812, D = 0.19343E-01 0.19343E-01 ( 21.82 s, 12.1 h)
iter 4 lrate = 0.1000000000 LL = -2.3947262532 nd = 0.0131216207, D = 0.20610E-01 0.20610E-01 ( 21.95 s, 12.2 h)
iter 5 lrate = 0.1000000000 LL = -2.3921729925 nd = 0.0128108594, D = 0.41854E-01 0.41854E-01 ( 21.90 s, 12.1 h)
iter 5 lrate = 0.1000000000 LL = -2.3922663442 nd = 0.0127645387, D = 0.40850E-01 0.40850E-01 ( 22.02 s, 12.2 h)
iter 5 lrate = 0.1000000000 LL = -2.3921763103 nd = 0.0129229410, D = 0.39446E-01 0.39446E-01 ( 22.55 s, 12.5 h) ...

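For context, the sketch below is how I picture the MPI contribution working (an illustration only, not the actual AMICA source): each rank computes partial statistics on its own slice of the data, and an MPI_Allreduce combines them so that every rank follows one common trajectory. That combining step is what seems to be missing from my runs.

/* allreduce_sketch.c - my mental model of a shared learning trajectory
   (an illustration only, NOT the actual AMICA source).                  */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Stand-in for the partial log-likelihood each rank would compute
       on its own slice of the data. */
    double local_ll  = -2.46 / size;
    double global_ll = 0.0;

    /* In a working MPI job this sums the partial values across ranks,
       so every rank sees the same global LL at each iteration.  With
       eight singleton jobs (size == 1) there is nothing to combine, and
       each "iter 1" in the log above is an independent full pass. */
    MPI_Allreduce(&local_ll, &global_ll, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global LL = %f across %d ranks\n", global_ll, size);

    MPI_Finalize();
    return 0;
}
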
Is this just a matter of the binary that I am using, do you think my parameters are off, or do I need to install MPICH2 on the cluster?

Thanks for your support... past, present and future!

James Desjardins, MA
Electrophysiology Technologist
Cognitive and Affective Neuroscience Lab, Psychology Department
Jack and Nora Walker Centre for Lifespan Development Research
Brock University
500 Glenridge Ave.
St. Catharines, ON, Canada L2S 3A1
905-688-5550 x4676
--
"'Cause you never can tell What goes on down below!
"This pool might be bigger Than you or I know!"

McElligot's Pool
Dr. Seuss 1947