Computing Clusters

SCCN has two Rocks clusters for analyzing data, so you don't have to overload your local workstation with large datasets.

Both clusters accept ssh connections from a terminal, but only from within the UCSD network, whether you are connected physically or through the UCSD VPN.

After logging in for the first time, you may be asked to create a set of ssh keys. Press Enter at each of the three prompts. Do not enter a passphrase.

It doesn't appear that you have set up your ssh key.
 This process will make the files:
 /home/user/.ssh/id_rsa.pub
 /home/user/.ssh/id_rsa
 /home/user/.ssh/authorized_keys
Generating public/private rsa key pair.
         Enter file in which to save the key (/home/user/.ssh/id_rsa): [Enter]
         Enter passphrase (empty for no passphrase): [Enter]
         Enter same passphrase again: [Enter]

juggling

for interactive MATLAB with little or no parallel processing

"Juggling" is the name of our Rocks 6.0 computing cluster, which was originally built in 2006 as a 6-node/12-core cluster.

It has been upgraded since then: although it has only 8 nodes, the total processor count is 192 cores. Each compute node is equipped with:

  • Dual AMD Opteron 6238 2.6GHz 16MB Cache Twelve-core
  • 128GB DDR3 1600 ECC Registered Memory

computing

for parallel processing, including amica

"Computing" is the name of our Rocks 6.1.1 computing cluster, which is optimized for parallel computing. It is composed of a login node, three interactive nodes that allow running MATLAB in an interactive session, and 8 parallel nodes, ideal for running AMICA. Each compute node is equipped with:

  • Quad (4-way) AMD Opteron 6136, 8 cores x 2.40GHz, Socket G34
  • 256GB DDR3 Registered ECC/REG 1333 SDRAM

Queues are available for 32-, 64-, and 128-processor parallel computing.

Common Commands

qstat

Use qstat regularly to keep track of any queued jobs that you have running.

[user@juggling ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
2371 0.50500 QLOGIN     user         r     05/12/2010 16:47:26 all.q@compute-0-6.local            1 
2372 0.50500 QLOGIN     user         r     05/12/2010 16:47:35 all.q@compute-0-5.local            1 
2373 0.50500 QLOGIN     user         r     05/12/2010 16:47:41 all.q@compute-0-9.local            1 

In this example, if you are user "user", you have three running QLOGINs, which means there are probably three MATLAB interactive sessions running on compute nodes compute-0-5, compute-0-6 and compute-0-9.

It is your responsibility to run this command regularly to track your cluster usage so you do not restrict access to other users.
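
As a rough aid for this kind of self-monitoring, the count of your running QLOGIN sessions can be pulled out of the qstat listing with a small shell function. This is only a sketch: the function name count_qlogins is hypothetical, and it assumes the column order shown in the listing above (job name in the third field, user in the fourth, state in the fifth).

```shell
# Count running QLOGIN sessions for one user.
# Reads plain `qstat` output on standard input. Assumes the column layout
# shown above: name = field 3, user = field 4, state = field 5.
count_qlogins() {
    awk -v user="$1" '
        # Job lines start with a numeric job-ID; header lines do not.
        $1 ~ /^[0-9]+$/ && $3 == "QLOGIN" && $4 == user && $5 == "r" { n++ }
        END { print n + 0 }
    '
}
```

On the head node it could be run as: qstat | count_qlogins "$USER"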

To see the global picture of all users on all nodes and queues, use

[user@juggling ~]$ qstat -f -u '*'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/5/12         12.03    lx26-amd64
    15798 0.55500 QLOGIN     fred         r     09/09/2011 11:42:57     1
    15803 0.55500 QLOGIN     barney       r     09/09/2011 14:07:15     1
    15867 0.55500 QLOGIN     wilma        r     09/12/2011 15:18:34     1
    15874 0.55500 QLOGIN     betty        r     09/14/2011 09:46:18     1
    15875 0.55500 QLOGIN     pebbles      r     09/15/2011 06:27:41     1
...
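
The global listing gets long on a busy cluster, so it can help to reduce it to one line per user. The following is a minimal sketch, not a site-provided tool: the name jobs_per_user is hypothetical, and it assumes that job lines begin with a numeric job-ID and carry the user name in the fourth field, as in the listing above.

```shell
# Summarize how many jobs each user has in `qstat -f -u '*'` output.
# Reads the listing on standard input and prints "user count" pairs,
# sorted by user name. Queue header lines are skipped because their
# first field is not a number.
jobs_per_user() {
    awk '
        $1 ~ /^[0-9]+$/ { count[$4]++ }
        END { for (u in count) print u, count[u] }
    ' | sort
}
```

On the head node it could be run as: qstat -f -u '*' | jobs_per_user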

qdel

Use qdel to delete QLOGIN sessions that are hung or that you cannot locate. The fewer simultaneous interactive sessions you run, the easier it will be to track these down.

[user@juggling ~]$ qdel 2371 2373
user has registered the job 2371 for deletion
user has registered the job 2373 for deletion

In this case, using the results from the previous qstat command, if you determine that jobs 2371 and 2373 running on compute nodes compute-0-6 and compute-0-9 are no longer needed, or known to have crashed, use qdel with the job-ID number(s) to delete those QLOGINs.

Note: qdel can only be run on the head node (juggling or computing). This command will not work on a compute node.
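
Finding the job-ID that belongs to a particular compute node can be done by eye, or with a small filter over the qstat listing. The sketch below is illustrative only: jobs_on_node is a hypothetical name, and it assumes the queue column (for example all.q@compute-0-6.local) is the eighth field, as in the qstat example earlier on this page. It only prints job-IDs; it does not delete anything.

```shell
# Print the job-IDs of one user's jobs that are running on a given node.
# Reads plain `qstat` output on standard input.
#   $1 = user name, $2 = short node name (e.g. compute-0-6)
jobs_on_node() {
    awk -v user="$1" -v node="$2" '
        # Match "@<node>." so that compute-0-1 does not match compute-0-10.
        $1 ~ /^[0-9]+$/ && $4 == user && index($8, "@" node ".") > 0 { print $1 }
    '
}
```

For example, qstat | jobs_on_node "$USER" compute-0-6 would print the IDs to review before passing them to qdel.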

qlogin

Use qlogin to connect to an available compute node and run an interactive shell. Most commonly, this is used to run MATLAB from the command line. When you successfully log into a compute node, your prompt will show the name of the node you are connected to, e.g., compute-0-5. Please keep track of all of your interactive sessions so that you do not use up multiple queue slots. See qstat for more information.

[user@juggling ~]$ qlogin
Your job 2371 ("QLOGIN") has been submitted
             waiting for interactive job to be scheduled ...
             Your interactive job 2371 has been successfully scheduled.
             Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host compute-0-5.local ...
[user@compute-0-5 ~]$ matlab

When finished running MATLAB, be sure to "exit" out of the compute node to release that slot so others can use it. If you no longer need access to the cluster, "exit" from juggling as well.

Note: qlogin can only be run on the head node (juggling or computing). This command will not work on a compute node.
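
Because qlogin and qdel only work on the head node, a script that calls them may want to check where it is running first. The following is a sketch under the assumption that the head nodes report the short host names juggling and computing; the function name is_head_node is hypothetical.

```shell
# Succeed (exit status 0) only when the given host name is a head node.
# With no argument, the name of the current host is used.
is_head_node() {
    host="${1:-$(hostname)}"
    host="${host%%.*}"    # drop any domain part, e.g. juggling.ucsd.edu -> juggling
    case "$host" in
        juggling|computing) return 0 ;;
        *) return 1 ;;
    esac
}
```

A script could then guard its call with: is_head_node && qlogin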

Cluster Status

Our cluster status is available in a graphical view by accessing http://juggling.ucsd.edu/ganglia/ or http://computing.ucsd.edu/ganglia/. Access to this web site is available only from within the UCSD network.

juggling - Detailed Commands

Using MATLAB interactively with no parallelization

Log onto juggling.ucsd.edu:

[user@workstation ~]$ ssh juggling

X11 forwarding is enabled by default on all SCCN workstations. If you are logging in from a system where it is not, you may need to use a modified ssh command so you can view graphics:

[user@workstation ~]$ ssh -X juggling

Log onto a compute node:

[user@juggling ~]$ qlogin
Your job 2371 ("QLOGIN") has been submitted
             waiting for interactive job to be scheduled ...
             Your interactive job 2371 has been successfully scheduled.
             Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host compute-0-x.local ...

Run MATLAB:

[user@compute-0-x ~]$ matlab

For the benefit of others using the cluster, please exit from the compute node when you are finished.

[user@compute-0-x ~]$ exit

Then, log off of the cluster.

[user@juggling ~]$ exit

computing - Detailed Commands

Using computing for submitting a parallel job, such as amica

Log onto computing.ucsd.edu:

[user@workstation ~]$ ssh computing

Determine the number of processors you need for your job: 32, 64, or 128. This determines the queue that you will use.

The queues are defined as the following:

32-proc queues

queue name and the associated compute node(s):

  • q1 - computing-0-3
  • q2 - computing-0-4
  • q3 - computing-0-5
  • q4 - computing-0-6
  • q5 - computing-0-7
  • q6 - computing-0-8
  • q7 - computing-0-9
  • q8 - computing-0-10

64-proc queues

queue name and the associated compute node(s):

  • qa1 - computing-0-3, computing-0-4
  • qa2 - computing-0-5, computing-0-6
  • qa3 - computing-0-7, computing-0-8
  • qa4 - computing-0-9, computing-0-10

128-proc queues

queue name and the associated compute node(s):

  • qb1 - computing-0-3, computing-0-4, computing-0-5, computing-0-6
  • qb2 - computing-0-7, computing-0-8, computing-0-9, computing-0-10
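
The three lists above follow a regular pattern: each queue covers 1, 2, or 4 consecutive nodes starting at computing-0-3. As an illustration, that pattern can be written as a small lookup function. The name queue_nodes is hypothetical, and the arithmetic is simply transcribed from the lists above, so it would need updating if the queue layout changes.

```shell
# Print the compute nodes behind a queue name, one per line.
#   q1..q8   -> 1 node each   (32 processors)
#   qa1..qa4 -> 2 nodes each  (64 processors)
#   qb1..qb2 -> 4 nodes each  (128 processors)
queue_nodes() {
    case "$1" in
        q[1-8])  width=1; index=${1#q}  ;;
        qa[1-4]) width=2; index=${1#qa} ;;
        qb[1-2]) width=4; index=${1#qb} ;;
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
    # Queue number k in a family spanning `width` nodes starts at node 3 + (k-1)*width.
    node=$(( 3 + (index - 1) * width ))
    last=$(( node + width - 1 ))
    while [ "$node" -le "$last" ]; do
        echo "computing-0-$node"
        node=$(( node + 1 ))
    done
}
```

Comparing the output for two queues, for example q3 and qa2, shows which queues share a node.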

Choose a queue by checking which compute nodes are available:

[user@computing ~]$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.00     12      0     84     96      0      0
q1                                0.54      0      0     32     32      0      0
q2                                1.90     32      0      0     32     32      0
q3                                1.88     32      0      0     32     32      0
q4                                1.89     32      0      0     32     32      0
q5                                1.89     32      0      0     32     32      0
q6                                1.87     32      0      0     32     32      0
q7                                1.90     32      0      0     32     32      0
q8                                0.54      0      0     32     32      0      0
qa1                               1.89      0      0      0     64     64      0
qa2                               1.89      0      0      0     64     64      0
qa3                               1.89      0      0      0     64     64      0
qa4                               0.54      0      0     64     64      0      0
qb1                               1.89      0      0      0    128    128      0
qb2                               1.21      0      0     64    128     64      0

To interpret this table, look at the AVAIL column.
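
Reading the AVAIL column can also be sketched as a small filter that lists the queues of a given size whose slots are all free. This is an illustration rather than a site-provided tool: the name free_queues is hypothetical, and it assumes the column order shown in the table above (AVAIL in the fifth field, TOTAL in the sixth).

```shell
# List cluster queues of a given size whose slots are all available.
# Reads `qstat -g c` output on standard input.
#   $1 = number of processors needed (32, 64, or 128)
free_queues() {
    awk -v slots="$1" '
        # A queue qualifies when TOTAL matches the job size and AVAIL equals TOTAL.
        $6 == slots && $5 == slots { print $1 }
    '
}
```

On the head node it could be run as: qstat -g c | free_queues 32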

32-proc jobs

If you are going to run a 32-proc job, which means you will use one of the qx queues, you can see that there are currently two available: q1 and q8. You can be reasonably assured that a 32-proc job submitted to either of those queues will be processed immediately.

For the benefit of others using the cluster, use no more than four available qx queues at a time. You can submit as many jobs as you want; they will be processed serially.

64-proc jobs

If you are going to run a 64-proc job, which means you will use one of the qax queues, you can see that there is only one available: qa4. You can be reasonably assured that a 64-proc job submitted to that queue will be processed immediately.

For the benefit of others using the cluster, use no more than two available qax queues at a time. You can submit as many jobs as you want; they will be processed serially.

128-proc jobs

If you are going to run a 128-proc job, which means you will use one of the qbx queues, you can see that neither queue currently has all 128 slots available (qb1 has none and qb2 has only 64). You can submit a 128-proc job to either queue, but it will not be processed until other jobs complete.

For the benefit of others using the cluster, use no more than one available qbx queue at a time. Also, be aware that 128-proc jobs take up an enormous amount of resources, so please submit these only if you have impending deadlines and absolutely require 128 processors.

amica

In most cases, you will be using amica to analyze your data in our parallel queues. Log onto one of the interactive nodes and start up MATLAB.

[user@computing ~]$ qlogin
[user@computing-0-0 ~]$ matlab

Run EEGLAB in MATLAB.

Load your data.

Select runamica.

Enter your queue and the number of processors, and the script will handle the rest.