EEGLAB and high performance computing

The Open EEGLAB Portal - Running EEGLAB on HPC Resources via the Neuroscience Gateway

The documentation for NSG has been moved to https://github.com/sccn/nsgportal/wiki.

Running EEGLAB on GPUs (Graphics Processing Units)

Introduction

GPU-based processing is promising in MATLAB. There are three options on the market: GPUmat, Jacket, and the Parallel Computing Toolbox (see the Conclusions below for more information).

As of mid-2010, we have only tested GPUmat. We also tried the MATLAB Parallel Computing Toolbox, but it did not support matrix indexing, so we could not really test it.

Testing GPUmat

The recent enthusiasm for using GPU (Graphics Processing Unit) computational capabilities led us to try the freely available GPUmat Matlab toolbox, which runs data processing on GPUs (a GPU must be present on the machine you are using to test it). Computing with the GPUmat toolbox usually only involves recasting variables and requires minor changes to Matlab scripts or functions.
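
As a minimal sketch of this recasting pattern (assuming GPUmat is installed and GPUstart has been run; variable names are illustrative):

A = rand(1000, 'single');     % ordinary CPU matrix
Ag = GPUsingle(A);            % recast the variable onto the GPU
Bg = Ag.^1.3 .* Ag;           % the same Matlab syntax, now executed on the GPU
GPUsync;                      % wait for the GPU to finish
B = single(Bg);               % copy the result back to a CPU matrix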

GPU processing can lead to speed-ups of more than 100x for optimized applications. We were interested in testing our Nvidia GPUs for running time-frequency decompositions and non-parametric surrogate statistics. Below we report first tests of GPUmat on these two EEGLAB signal processing tasks.

One of our servers has two Intel Xeon W5580 CPUs (16 cores in total), 72 GB of RAM, and one Nvidia Tesla C2060 GPU board. When running Matlab on one of the 16 cores, it is likely that Matlab automatically parallelizes some of its computations over the other cores. We attempted to use the function maxNumCompThreads to set the maximum number of threads to 1, but it did not alter computation time; it is unclear whether Matlab 7.10 honors maxNumCompThreads, as a warning message indicates it will be removed in a future release. Still, we expected to see a difference between using the main processor and the GPU board (with 112 cores).
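
For reference, this is the call we attempted; the documented signature returns the previous thread count so it can be restored afterwards:

nOld = maxNumCompThreads(1);  % request single-threaded computation
% ... run the timing tests here ...
maxNumCompThreads(nOld);      % restore the previous setting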

This is our system configuration as returned by the GPUmat toolbox:

  • Running on -> "glnxa64"
  • Matlab ver. -> "7.10.0.499 (R2010a)"
  • GPUmat version -> 0.251
  • GPUmat build -> 03-May-2010
  • GPUmat architecture -> "glnxa64"

GPUmat used a GPU board with 112 cores. Typing "GPUstart" on the Matlab command line prints the following message:

Starting GPU
There is 1 device supporting CUDA
CUDA Driver Version:                           3.1
CUDA Runtime Version:                          3.1
 
Device 0: "Tesla C1060"
 CUDA Capability Major revision number:         2
 CUDA Capability Minor revision number:         0
 Total amount of global memory:                 2817720320 bytes
 Number of multiprocessors:                     14
 Number of cores:                               112
 
 - CUDA compute capability 2.0
...done
- Loading module EXAMPLES_CODEOPT
- Loading module EXAMPLES_NUMERICS
 -> numerics13.cubin
- Loading module NUMERICS
 -> numerics13.cubin

Basic matrix computation using GPUs provided a major speed-up

GPUstart;
 
EEG = pop_loadset('sample_data/eeglab_data_epochs_ica.set');
data = GPUsingle(EEG.data(:,:));            % recast the 2-D data onto the GPU
data = [data data data data data data ];    % tile the data to make it larger
data = [data data data data data data ];
data2 = single(data);                       % CPU copy of the same matrix
 
tic; tmp = data.^1.3; GPUsync; toc          % on the GPU (GPUsync waits for completion)
Elapsed time is 0.024915 seconds.
 
tic; tmp = data2.^1.3; toc                  % on the CPU
Elapsed time is 0.506489 seconds.

Raising each value in the EEG data matrix to a fractional power (1.3) using the GPU rather than the central processor produced a 20x speed increase.

Running non-parametric statistics on GPUs sped up processing by 66%

We modified the repeated-measures ANOVA function to be GPUmat compatible. (All the Matlab GPU functions we used are made available at the bottom of this page.)

c = { rand(400,800,100) rand(400,800,100); ...
          rand(400,800,100) rand(400,800,100)};
tic; [FC FR FI dfc dfr dfi] = anova2_cell(c); toc
Elapsed time is 1.466853 seconds.
 
c = { GPUsingle(rand(400,800,100)) GPUsingle(rand(400,800,100)); ...
      GPUsingle(rand(400,800,100)) GPUsingle(rand(400,800,100))};
tic; [FC FR FI dfc dfr dfi] = anova2_cell_gpu(c); GPUsync; toc
Elapsed time is 0.908533 seconds.

The anova2_cell() function is highly optimized (no loops!), and the GPU computation appeared to be about 66% faster than the same computation on the main CPUs. This relatively minor speed-up seems to be because the GPU functions are slow at accessing sub-indices of very large matrices. For smaller matrices, the GPU speed-up reached about 100%.
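
As a minimal illustration of the pattern we suspect is costly (sizes are illustrative, and timings will vary with hardware and GPUmat version):

g = GPUsingle(rand(400, 800, 100));
tic; tmp = g(:, 1:400, :); GPUsync; toc   % sub-indexing on the GPU
c = single(rand(400, 800, 100));
tic; tmp = c(:, 1:400, :); toc            % the same sub-indexing on the CPU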

Using GPUs for wavelet decomposition gave a 2.6x speed-up

EEG = pop_loadset('sample_data/eeglab_data_epochs_ica.set');
data2 = EEG.data(:,:);                      % flatten to 2-D (channels x frames*trials)
tic; timefreq(reshape(data2, size(data2,1), EEG.pnts, size(data2,2)/EEG.pnts), EEG.srate, 'cycles', 3); toc
Elapsed time is 9.117511 seconds.
 
data = GPUsingle(EEG.data(:,:));            % the same data recast onto the GPU
tic; timefreq_gpu(reshape(data, size(data,1), EEG.pnts, size(data,2)/EEG.pnts), EEG.srate, 'cycles', 3); GPUsync; toc
Elapsed time is 3.417511 seconds.

Here we observed a 2.6x speed-up from performing the time-frequency wavelet decompositions on the GPU rather than the CPU.

Conclusions concerning GPU Computing in MATLAB

Arnaud Delorme - August 28, 2010

There are currently three options on the market: GPUmat (free), Jacket (commercial), and the MATLAB Parallel Computing Toolbox (commercial). Overall, we were relatively disappointed with current GPU solutions. Even if Jacket proves more efficient than the other options, we can only expect an additional speed-up of about 5-20% over GPUmat. This is far from the 100x speed-up we heard about when GPU cards came out, though a 3x speed-up is still welcome. It would also be nice to have a GPU profiler to see which commands are slow on the GPU, so as to avoid them.

GPUmat: There is still some way to go before the GPUmat toolbox can take full advantage of GPU processing capabilities. Based on feedback from other users, the way a function is programmed for GPUs dramatically influences its processing speed: different CUDA implementations of the same operation may differ in speed by up to 1000x. The GPUmat version we tried leaves some room for improvement, in particular when accessing sub-indices of large matrices. Another piece of bad news for us was that our GPU functions crashed on large data matrices (larger than 100 MB). The real advantage of GPU processing for EEGLAB would be the ability to handle very large matrices (up to several GB), to compute statistics and time-frequency decompositions across multiple component or channel signals, and to bring other compute-intensive processing within a user's 'compute horizon' (the time the user is willing to wait for results). For now, therefore, GPU-based EEGLAB processing via GPUmat remains of limited, though real, interest.

PCT: Matlab 2010b also supports GPU computation (if you have a Matlab Parallel Computing Toolbox license). It only offers primitive functionality; for instance, you cannot do any indexing into matrices with the PCT solution, so we were not able to test our functions with it. [January 2019: this question should be revisited!]

Jacket: The following is a quote from a Jacket representative: "Jacket has the fastest performance (see versus GPUmat here and versus R2010b here) and broadest function support. It has also been used by many neuroscientists and is currently being leveraged by the SPM crowd. Jacket may be downloaded as a free 15-day trial from the AccelerEyes website. Jacket costs $350 for academics. Jacket provides a GPU profiler, with GPROFVIEW." We have not tested Jacket yet but are planning to.

The EEGLAB-compatible, GPUmat-based functions we tested are available here. Note that these functions are not fully functional (they only work under the limited set of conditions tested above) and are thus made available for exploratory testing purposes only.

EEGLAB, Octave, Hadoop and supercomputer applications

We have a funded project to run EEGLAB jobs on the San Diego supercomputer via the Neuroscience Gateway. This is a free service that anybody in the world can use. See this page for more information.

In the short term, Octave is the quickest way to use EEGLAB functions and actually obtain useful results. This page describes how to use EEGLAB on Octave.

Deployment of EEGLAB on local supercomputers

When it comes to using supercomputers, Matlab, although quite efficient, may become incredibly expensive. A single Matlab license may cost $2,100 ($1,050 for academia), and with all its commercial toolboxes the total might come to $145,000 or more. If you have a supercomputer with about 100 processors (as of 2011, this amounts to about $30,000 or 20,000 euros), you might need to pay The MathWorks $30,000 to $500,000 to be able to run Matlab on it (the exact price depends on the number of users on the cluster, the number of nodes, and the extra toolboxes). This may be much more than the price of the supercomputer itself! Given that the Matlab core has not evolved dramatically over the past 10 years and still has flaws (inconsistency of the graphical interface between platforms; numerical inconsistencies in early versions of Matlab 7.0), free alternatives to Matlab are needed in the open-source community to run computations on supercomputers.

We have attempted to tackle this problem and, as of June 2018 (EEGLAB 15+), we support Octave (v4.4.0) for supercomputing applications (command line calls only, no graphics support). In our tests, Octave is about 50% slower than Matlab, but this can easily be compensated for by increasing the number of processors assigned to a given task. Note that EEGLAB functions have not been parallelized (with a few rare exceptions). You therefore need to open an Octave/Matlab session on each node and run custom scripts to take advantage of your parallel processing capability (a sketch is shown below). Again, this page describes how to use EEGLAB on Octave.
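
As a hypothetical illustration, a per-node worker script might take a dataset index as a command line argument, so that a job scheduler can launch one Octave instance per dataset (file names and the processing step below are illustrative):

#!/usr/bin/octave -qf
% process_one.m - hypothetical per-node worker processing a single dataset
arg_list = argv();                           % dataset index passed by the scheduler
idx = str2double(arg_list{1});
files = { 's01.set' 's02.set' 's03.set' };   % illustrative dataset list
EEG = pop_loadset(files{idx});
EEG = pop_resample(EEG, 128);                % example EEGLAB processing step
pop_saveset(EEG, 'filename', sprintf('out_%02d.set', idx));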

Using EEGLAB with Hadoop

Hadoop MapReduce is a framework for performing computations on large clusters of computers. There are two steps in a MapReduce job: a map step, in which a large number of workers (computers) process a large number of input lines, and a reduce step, in which (usually) a single worker pools all the mapping results.

Below we provide guidelines for using Elastic MapReduce on the Amazon cloud. Note that Elastic MapReduce is tailored to processing large quantities of text log files, not binary data. The gain in processing speed relative to the cost of running such a solution remains unclear if you have a local cluster of computers: you may spend more time programming the solution, and it may cost you more in bandwidth and storage than running it locally. The steps to follow are given below. These are new technologies, so expertise in computer science is highly recommended.

  • Installing the Hadoop command line interface. First install the Command Line Interface to Elastic MapReduce. This will allow you to configure and run jobs on the Amazon cloud. You will also need to create an AWS account. Hadoop will need to run in streaming mode, where the data is simply streamed to any executable. It might also be possible to run Hadoop in native Java mode and compile Matlab code using the Java Builder (though this is probably much more complex than using streaming mode).
  • Transfer your data to the Amazon storage cloud (named S3). A useful tool for doing this is s3cp. Note that your data should be formatted as strings of characters: to process raw EEG data, you will have to serialize it as text, with each channel, for example, representing one line (a sketch is shown below). There is no limit to the length of a text line, but remember the overhead, in terms of both signal processing and bandwidth, associated with processing text. If you have 128 channels and 100 data files, this corresponds to 12,800 Hadoop processing steps; if you can allocate 1,000 workers to the task, each worker will process about 13 channels, a potential speed-up of about 1,000x. To minimize bandwidth overhead, you might want to transfer compressed binary data to S3, then have an Amazon EC2 node uncompress it and put it back on S3 (bandwidth between EC2 nodes and S3 is free). If you are dealing with terabytes of data, this task can take a long time (S3 has very slow read latency and very high write latency). There are tools to copy data to S3 in parallel.
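
As a hypothetical sketch, each channel could be written as one text line of the form "<key><TAB><comma-separated samples>", so that Hadoop streaming can split the input on line boundaries (the key format and file names are illustrative):

EEG = pop_loadset('eeglab_data_epochs_ica.set');
fid = fopen('subject01_channels.txt', 'w');
for c = 1:size(EEG.data, 1)
    fprintf(fid, 'subject01_ch%03d\t', c);   % key: subject plus channel index
    fprintf(fid, '%g,', EEG.data(c, :));     % flattened samples for this channel
    fprintf(fid, '\n');
end
fclose(fid);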
  • Solution 1 (easiest to implement): using Octave. EEGLAB command line code is compatible with Octave. Octave may be installed relatively easily on each of the nodes using the bootstrapping method (a method for automatically installing software on each node). The command to automatically install Octave on Amazon EC2 nodes is:
sudo yum -y install octave --enablerepo=epel

Then add the following at the beginning of your main Octave script. This makes the script executable and allows it to process data from STDIN.

#!/usr/bin/octave -qf
Q = fread(stdin); %Standard Octave / MATLAB code from here on

Hadoop communicates with workers through STDIN and STDOUT pipes. You may write the output of your data processing using the printf or disp Matlab commands.
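
To make the streaming protocol concrete, here is a hypothetical Octave mapper that reads serialized channel lines (in the format of the serialization sketch above), computes a toy spectral-power measure, and emits one "<key><TAB><value>" line per channel:

#!/usr/bin/octave -qf
% Hypothetical streaming mapper: one serialized channel per STDIN line
line = fgetl(stdin);
while ischar(line)
    [key, rest] = strtok(line, sprintf('\t'));   % split off the channel key
    x = sscanf(rest, '%g,');                     % recover the serialized samples
    p = mean(abs(fft(x)).^2);                    % toy spectral-power measure
    printf('%s\t%g\n', key, p);                  % emit key/value on STDOUT
    line = fgetl(stdin);
end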

  • Solution 2: compiling Matlab code. Compiling Matlab code is the most efficient solution, as compiled Matlab code is often 2 to 4 times faster than Octave code, and compiled code does not require a Matlab license. If you compile Matlab code on your local Unix workstation, make sure to use an Amazon AMI (virtual machine image) with the same set of libraries so that your code can run on that machine; the AMI must also be compatible with Hadoop. Note that Matlab does not have a simple mechanism for reading from STDIN. The easiest solution is to use third-party compiled MEX files to do so (see for example popen). Another solution is to have a shell command write STDIN to disk, then call the Matlab executable (although this might impair performance).
  • Reduce step: once all the workers have computed what they had to compute (spectral power, for example), the reduce step may write the results back to S3 Amazon storage (and also do further processing if necessary, such as grouping channels belonging to the same subject back together).
  • Running Hadoop: using the AWS command line interface, type something like the following.
elastic-mapreduce --create --stream --input s3n://Arno/myEEGserializedtextfiles/ \
--mapper s3://Arno/process_octave \
--reducer s3://Arno/reducer.py \
--output s3n://Arno/output --debug --verbose \
--log-uri s3n://Arno/logs --enable-debugging \
--bootstrap-action s3n://Arno/install_octave

Note that the reduce step can be written in any programming language that reads from STDIN and writes to STDOUT. The reduce step will usually not require running EEGLAB commands; it simply pools the data from the workers and summarizes it. In this case we used a custom Python program (reducer.py), but it could also have been written in Octave/Matlab, since Octave is installed on each of the workers (a sketch is shown below). The exact content of your code will depend on the task you want to perform.
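
For completeness, here is a hypothetical Octave equivalent of such a reducer. Hadoop streaming sorts the mapper output by key, so lines sharing a key arrive consecutively and can be pooled on the fly (here, by averaging):

#!/usr/bin/octave -qf
% Hypothetical streaming reducer: average the values emitted for each key
cur = ''; vals = [];
line = fgetl(stdin);
while ischar(line)
    [key, rest] = strtok(line, sprintf('\t'));
    v = sscanf(rest, '%g');
    if ~strcmp(key, cur)                         % new key: flush the previous one
        if ~isempty(cur), printf('%s\t%g\n', cur, mean(vals)); end
        cur = key; vals = [];
    end
    vals(end+1) = v;
    line = fgetl(stdin);
end
if ~isempty(cur), printf('%s\t%g\n', cur, mean(vals)); end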

The solution outlined above should only be tried when dealing with gigantic amounts of data that no local processor or cluster can handle. It is costly, mostly in terms of Amazon storage: storing 10 terabytes of data costs about $800 per month as of 2013. It is therefore best suited to tasks such as bootstrap statistics (lots of computation on relatively little data). Send us your comments at eeglab@sccn.ucsd.edu.


Return to EEGLAB Wiki Home