Computing Clusters

Members of the SCCN lab have access to a Rocks Clusters cluster for analyzing data, so you do not have to overload your local workstation with large datasets.

The cluster accepts ssh connections from a terminal, but only from within the UCSD network, which includes the UCSD-PROTECTED wireless network and the UCSD VPN.

computing

for both interactive and parallel processing, including amica

"Computing" is the name of our Rocks 7.0 computing cluster. It is composed of a login node, seven interactive nodes that allow running MATLAB in an interactive session, and four dedicated parallel nodes that accept qsub jobs, ideal for running AMICA. Each compute node is equipped with:

  • Quad/4-way AMD Opteron 6136, 8 Cores x 2.40GHz, Socket G34
  • 256GB DDR3 Registered ECC/REG 1333 SDRAM

Queues are available for 32- and 64-processor parallel computing.

Common Commands

qstat

Use qstat regularly to keep track of any queued jobs that you have running.

[user@computing ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
2371 0.50500 QLOGIN     user         r     05/12/2010 16:47:26 all.q@computing-0-3.local            1 
2372 0.50500 QLOGIN     user         r     05/12/2010 16:47:35 all.q@computing-0-2.local            1 
2373 0.50500 QLOGIN     user         r     05/12/2010 16:47:41 all.q@computing-0-4.local            1 

In this example, if you are user "user", you have three running QLOGINs, which means there are probably three MATLAB interactive sessions running on compute nodes computing-0-3, computing-0-2 and computing-0-4.

It is your responsibility to run this command regularly to track your cluster usage so that you do not restrict access for other users or for yourself.
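To make this check quicker, the qstat output can be filtered down to just the running QLOGIN sessions and the nodes they occupy. The following is a minimal sketch, assuming the column layout shown in the sample output above; the function name list_qlogins is hypothetical and not part of Grid Engine.

```shell
# list_qlogins (hypothetical helper): read `qstat` output on stdin and print
# "job-ID node" for each running QLOGIN session.
list_qlogins() {
    awk '$3 == "QLOGIN" && $5 == "r" {
        split($8, parts, "@")          # all.q@computing-0-3.local -> node part
        sub(/\.local$/, "", parts[2])  # drop the .local suffix
        print $1, parts[2]
    }'
}

# Demonstration on the sample output shown above.
list_qlogins <<'EOF'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2371 0.50500 QLOGIN     user         r     05/12/2010 16:47:26 all.q@computing-0-3.local            1
2372 0.50500 QLOGIN     user         r     05/12/2010 16:47:35 all.q@computing-0-2.local            1
2373 0.50500 QLOGIN     user         r     05/12/2010 16:47:41 all.q@computing-0-4.local            1
EOF
```

On the head node, the same function could be fed live data with "qstat | list_qlogins".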

To see the global picture of all users on all nodes and queues, use

[user@computing ~]$ qstat -f -u '*'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@computing-0-0.local      BIP   0/5/12         12.03    lx26-amd64
  15798 0.55500 QLOGIN     fred         r     09/09/2011 11:42:57     1
  15803 0.55500 QLOGIN     barney       r     09/09/2011 14:07:15     1
  15867 0.55500 QLOGIN     wilma        r     09/12/2011 15:18:34     1
  15874 0.55500 QLOGIN     betty        r     09/14/2011 09:46:18     1
  15875 0.55500 QLOGIN     pebbles      r     09/15/2011 06:27:41     1
...
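The global listing can get long, so it may help to summarize it as a count of QLOGIN sessions per user. This is a sketch only, assuming the job lines look like the ones in the sample above (job name in the third column, user in the fourth); count_qlogins_by_user is a hypothetical name.

```shell
# count_qlogins_by_user (hypothetical helper): read `qstat -f -u '*'` output
# on stdin and print "user count" for every user with QLOGIN sessions.
count_qlogins_by_user() {
    awk '$3 == "QLOGIN" { n[$4]++ }
         END { for (u in n) print u, n[u] }' | sort
}

# Demonstration on part of the sample output shown above.
count_qlogins_by_user <<'EOF'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@computing-0-0.local BIP 0/5/12 12.03 lx26-amd64
15798 0.55500 QLOGIN fred r 09/09/2011 11:42:57 1
15803 0.55500 QLOGIN barney r 09/09/2011 14:07:15 1
15867 0.55500 QLOGIN wilma r 09/12/2011 15:18:34 1
EOF
```

On the head node, the live equivalent would be "qstat -f -u '*' | count_qlogins_by_user".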

qdel

Use qdel to delete QLOGIN sessions that are hung or that you cannot locate. The fewer simultaneous interactive sessions you run, the easier it will be to track these down.

[user@computing ~]$ qdel 2371 2373
user has registered the job 2371 for deletion
user has registered the job 2373 for deletion

In this case, using the results from the previous qstat command, if you determine that jobs 2371 and 2373, running on compute nodes computing-0-3 and computing-0-4, are no longer needed or are known to have crashed, use qdel with the job-ID number(s) to delete those QLOGINs.

Note: qdel can only be run on the head node (computing). This command will not work on a compute node.

qlogin

Use qlogin if you want to connect to an available compute node to run an interactive shell. Most commonly, this is used to run MATLAB from the command line. When you successfully log into a compute node, your prompt will show the name of the node you are connected to, e.g., computing-0-5. Please keep track of all of your interactive sessions so that you do not use up multiple queue slots. See qstat for more information.

[user@computing ~]$ qlogin
Your job 2371 ("QLOGIN") has been submitted
             waiting for interactive job to be scheduled ...
             Your interactive job 2371 has been successfully scheduled.
             Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host computing-0-1.local ...
[user@computing-0-1 ~]$ matlab

When finished running MATLAB, be sure to "exit" out of the compute node to release that slot so others can use it. If you no longer need access to the cluster, "exit" from computing as well.

Note: qlogin can only be run on the head node (computing). This command will not work on a compute node.

Cluster Status

Our cluster status is available in a graphical view by accessing http://computing.ucsd.edu/ganglia/. Access to this web site is available only from within the UCSD network.

Using MATLAB interactively with no parallelization

Log onto computing.ucsd.edu:

[user@workstation ~]$ ssh computing

If you are logging in from a system that does not have X11 forwarding enabled by default (all the SCCN workstations do have it enabled), you may need to use a modified ssh command so that you can view graphics:

[user@workstation ~]$ ssh -X computing
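One way to confirm that graphics will work before starting MATLAB is to look at the DISPLAY environment variable after logging in, since ssh normally sets it on the remote side when X11 forwarding is active. The sketch below assumes that behavior; check_display is a hypothetical helper name.

```shell
# check_display (hypothetical helper): report whether DISPLAY is set in the
# current shell, which is usually the case when X11 forwarding is active.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set; reconnect with ssh -X"
        return 1
    fi
}

# Run the check; "|| true" keeps a failed check from aborting a script
# that runs with "set -e".
check_display || true
```

If the check fails, log out and reconnect with the ssh -X command shown above.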

Log onto a compute node:

[user@computing ~]$ qlogin
Your job 2371 ("QLOGIN") has been submitted
             waiting for interactive job to be scheduled ...
             Your interactive job 2371 has been successfully scheduled.
             Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host computing-0-x.local ...

Run MATLAB:

[user@computing-0-x ~]$ matlab

For the benefit of others using the cluster, please exit from the compute node when you are finished.

[user@computing-0-x ~]$ exit

Then, log off of the cluster.

[user@computing ~]$ exit

Using computing for submitting a parallel job, such as amica

Log onto computing.ucsd.edu

[user@workstation ~]$ ssh computing

Determine the number of processors you need for your job: 32 or 64. This determines the queue that you will use.

The queues are defined as follows:

32-proc queues

queue name and the associated compute node(s):

  • q1 - computing-0-7
  • q2 - computing-0-8
  • q3 - computing-0-9
  • q4 - computing-0-10

64-proc queues

queue name and the associated compute node(s):

  • qa1 - computing-0-7, computing-0-8
  • qa2 - computing-0-9, computing-0-10

Determine which queue to use by viewing which compute nodes are available:

[user@computing ~]$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.00     12      0     84     96      0      0
q1                                0.54      0      0     32     32      0      0
q2                                1.90     32      0      0     32     32      0
q3                                1.88     32      0      0     32     32      0
q4                                1.89     32      0      0     32     32      0
qa1                               1.89      0      0      0     64     64      0
qa2                               1.89      0      0      0     64     64      0

To interpret this table, look at the AVAIL column.
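Reading the AVAIL column can also be scripted. The sketch below lists the queues whose AVAIL count equals their TOTAL count, meaning every slot is free; it assumes the column order shown in the table above, and free_queues is a hypothetical helper name.

```shell
# free_queues (hypothetical helper): read `qstat -g c` output on stdin and
# print the names of queues in which every slot is available.
free_queues() {
    # Columns: 1=queue 2=CQLOAD 3=USED 4=RES 5=AVAIL 6=TOTAL
    # The numeric tests skip the header and separator lines.
    awk '$5 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && $5 > 0 && $5 == $6 { print $1 }'
}

# Demonstration on the sample table shown above.
free_queues <<'EOF'
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
all.q 0.00 12 0 84 96 0 0
q1 0.54 0 0 32 32 0 0
q2 1.90 32 0 0 32 32 0
q3 1.88 32 0 0 32 32 0
q4 1.89 32 0 0 32 32 0
qa1 1.89 0 0 0 64 64 0
qa2 1.89 0 0 0 64 64 0
EOF
```

On the head node, the live equivalent would be "qstat -g c | free_queues". Note that all.q is not listed for the sample data because only 84 of its 96 slots are free.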

32-proc jobs

If you are going to run a 32-proc job, which means you will use one of the qx queues (q1 through q4), you can see that there is currently one available: q1. You can be reasonably assured that if you submit your 32-proc job to that queue, it will be processed immediately.

For the benefit of others using the cluster, use no more than two available qx queues at a time. You can submit as many jobs as you want; they will be processed serially.

64-proc jobs

If you are going to run a 64-proc job, which means you will use one of the qax queues (qa1 or qa2), you can see that there are currently none available. You can submit the job, but it will only be processed when all of the previously scheduled jobs using those nodes have completed.

For the benefit of others using the cluster, use no more than one available qax queue at a time. You can submit as many jobs as you want; they will be processed serially.

amica

In most cases, you will be using amica to analyze your data in our parallel queues. Log onto one of the interactive nodes and start up MATLAB.

[user@computing ~]$ qlogin
[user@computing-0-0 ~]$ matlab

Run EEGLAB in MATLAB.

Load your data.

Select runamica.

Enter your queue and the number of processors, and the script will handle the rest.