EEGLAB and supercomputing applications using free alternatives to Matlab
In the next few years at least, Matlab will remain the platform onto which EEGLAB is developed and supported. Matlab has a breadth of useful tools that are not yet matched by open source environments (e.g., the capacity to use multiple cores, 3-D interactive graphics with transparency, memory mapping, timers for real time programming), plus a wealth of available Matlab toolboxes that are handy and well teswted (e.g., image processing toolbox, for correcting for multiple comparisons; signal processing toolbox, for spectral decompositions; optimization toolbox, for optimizing code; bioinformatics toolbox, for EEG classification; virtual reality toolbox, for real time 3-D rendering of EEG activity). Finally, the Matlab compiler allows compiling EEGLAB, thus freeing oneself from using Matlab itself; even Matlab scripts can be run by the EEGLAB compiled version). Given that Matlab is accessible to almost anyone working in science, our incentive to find a Matlab alternative remains relatively low (except for high performance and cloud computing applications).
In the short term, Octave is the shortest way to using EEGLAB functions and actually obtain useful results (see below). In the longer term, SAGE (based on Python) are other potential alternatives to Matlab.
Deployment of EEGLAB on local supercomputers
When it comes to using supercomputers, Matlab, although quite efficient, may become incredibly expensive. A single Matlab license may cost $2,100 ($1,050 for academia), and with all its commercial toolboxes might come to $145,000 or more. If you have a supercomputer with about 100 processors (as of 2011, this amounts to about $30,000 or 20,000 euros), you might need to pay the Mathworks about $30,000 to $500,000 to be able to run Matlab on it (the exact price depends on the number of users on the cluster, the number of nodes, and the extra toolboxes). This may be much more than the price of the supercomputer itself! Given that the Matlab core has not evolved dramatically over the past 10 years, and still has flaws (lack of consistency of the graphic interface between platforms; numerical inconsistencies in early version of Matlab 7.0), free alternatives to Matlab are needed in the Open Source community to run computation on supercomputers.
We have attempted to tackle this problem and as of June 2011 (EEGLAB 10.2+), we are currently supporting Octave (v3.4) for supercomputing applications (command line calls only, no graphic support). In our tests, Octave is about 50% slower than Matlab but this can easily be compensated by increasing the number of processors assigned to a specific processing task. Note that EEGLAB functions have not been parallelized (except a few rare exceptions). Therefore, you are required to open a Octave/Matlab session on each node and run custom scripts you write to take advantage of your parallel processing capability.
Using EEGLAB with Hadoop
Hadoop Mapreduce is a framework for performing computation on large clusters of computers. There are two steps in Mapreduce job: a mapping task where a large number of workers (computers) work on a large number of data lines, and a reduce step, where (usually) a single worker pools all the mapping results.
Below we provide guidelines for using Elastic Mapreduce on the Amazon cloud. Note that Elastic Mapreduce is tailored to processing large quantities of log text files and not binary data. The gain in terms of processing speed compared to the cost of running such solution remains unclear if you have a local cluster of computers. In short, you might spend more time programming the solution and it might cost you more in terms of bandwidth and storage that if you are running it locally. These are the steps you should follow. These are new technologies so expertise in computer science is highly recommended.
- Installing Hadoop command line interface. First install the Command Line Interface to Elastic Mapreduce. This will allow you to configure and run jobs on the Amazon cloud. You will also need to create an AWS account. Hadoop will need to run in streaming mode, where the data is simply streamed to any executable. It might also be possible to run Hadoop in native Java mode and compile Matlab code using the Java builder (this is probably much more complex than using the streaming mode though).
- Transfer your data to Amazon storage cloud (the Amazon storage cloud is named S3). A useful tool to do this is the s3cp tools. Note that your data should be formatted in strings of characters. If you want to process raw EEG data, you will have to serialize it in text, with each channel for example representing one line. There is no limit to the length of a line of text. However, one must remember the overhead in terms of both signal processing and bandwidth associated with processing text. If you have 128 channels and 100 data files, this corresponds to 12800 processing hadoop steps. If you can allocate 1000 workers to the task, this means that each worker will process about 13 channels, a potential speedup of about 1000 on your task. To minimize bandwidth overhead, you might want to transfer the compressed binary data to S3, then have a local amazon EC2 amazon node uncompress it and put it back to S3 (this is because EC2 nodes bandwidth with S3 is free). If you are dealing with Terabytes of data, this task can take a long time (as S3 is configured to have a very slow reading latency and very high writing latency). There are tools to copy data in parallel to S3.
- Solution 1 (easiest to implement) using Octave. EEGLAB command line code is compatible with Octave. Octave may be installed relatively easy on each of the nodes using the bootstraping method (a method to automatically install software on each of the nodes). The command to automatically install Octave on EC2 Amazon nodes is:
sudo yum –y install octave --enablerepo=epel
Then, for your main Matlab script, you might want to add the following at the beginning of the main script. This will make it executable and will allow it to process data on STDIN.
#!/usr/bin/octave -qf Q = fread(stdin); %Standard Octave / MATLAB code from here on
Hadoop communicate with workers through STDIN and STDOUT pipes. You may write the output of your data processing using the printf or disp Matlab commands.
- Solution 2, compiling Matlab code. Compiling Matlab code is the most efficient solution as Matlab compiled code is often 2 to 4 times faster than Octave code and compiled code does not require a Matlab licence. If you compile Matlab code on your local Unix workstation, you will need to make sure to use an Amazon AMI (virtual machine image) with the same set of librairies so that your code can run on that machine. You will need to pick an AMI that is compatible with Hadoop as well. Also, Matlab does not have a simple mechanism allowing it to read from STDIN. The easiest solution is to use third party compiled Mex files to do so (see for example popen). Another solution is to have a shell command write STDIN on disk, then call the Matlab executable (although this might impair performance).
- Reduce step: once all the worker have computed what they had to compute (spectral power for example), the reduce step may write it back on S3 Amazon storage (and also do futher processing if necessary such as grouping back channels belonging to the same subject).
- Running Hadoop: using the AWS command line interface, type something like the following.
elastic-mapreduce --create --stream --input s3n://Arno/myEEGserializedtextfiles/ \ --mapper s3://Arno/process_octave \ --reducer s3://Arno/reducer.py \ --output s3n://Arno/output --debug --verbose \ --log-uri s3n://Arno/logs --enable-debugging \ --bootstrap-action s3n://Arno/install_octave
Note the reduce step can be written in any programming language that takes data from STDIN and writes to STDOUT. The reduce step will usually not require to run EEGLAB commands. It is simply about pooling data from the workers and summarizing it. In this case, we used Python custom program (reducer.py) but it could have also been Octave/Matlab since Octave is installed on each of the workers. The exact content of your code will depend on what task you are interested in doing.
The solution outlined above should only be tried when dealing with gigantic amount of data that no local processor or cluster can handle. It is costly (mostly in terms of Amazon storage as storing 10 Terabytes of data will cost you about $800 per month as of 2013). It is therefore best suited when bootstraping data is required (lots of computation on little data). Send us your comments at firstname.lastname@example.org .
EEGLAB and Octave graphics
Although Octave is relatively compatible with Matlab for pure computation, the Octave graphic rendering engine (based on Gnuplot) is not powerful enough to render all subtleties of EEGLAB graphics. In particular the following cannot be rendered under Octave:
- Menus. Maybe QtOctave could help in this area; we have not yet tested it.
- Contour, surface, 3-D objects, complex graphics, interactive graphics...
However, Octave is actively evolving, and a more complete graphics environment might be supported in the future.
Below is a time-frequency decomposition plotted by Octave 3.0.3 for the tutorial dataset using the EEGLAB/Matlab code following it. The plot below is provided for illustrative purposes only; we had to implement some minor changes to make the EEGLAB time-frequency function newtimef() compatible with Octave (since the changes were compatible with Matlab, we registered them in the main repository). Other EEGLAB functions (even signal processing functions) also require minor changes to be run under Octave.
% cd xxxxx/eeglab % move to the proper directory/folder % octave --traditional % start Octave addpath('functions/timefreqfunc/') % within Octave, addpath('functions/adminfunc/') addpath('functions/sigprocfunc/') load('sample_data/eeglab_data_epochs_ica.set') [ersp itc imbase time freqs] = newtimef(EEG.data(1,:,:), EEG.pnts, [-1000 2000], EEG.srate, ... [3 0.5], 'plotitc', 'off', 'plotersp', 'off'); figure; imagesc(time, freqs, ersp); set(gca, 'ydir', 'normal'); xlim([time(1) time(end)]); xlabel('Time (ms)'); ylabel('Frequency (hz)'); colorbar;
For comparison, below is the graphic output of newtimef() under Matlab -- the same function used within Octave to create the figure above (note: here with ITC plotting disabled).
Scilab is another alternative to Matlab but it seems less compatible than Octave and does not support all the Matlab commands. It also requires the user to convert all Matlab scripts to the Scilab language and to import all new functions into the Scilab environment.
JMathLib is a Java clone implementation of Matlab. It does seem that Java is going to be the best language to implement a Matlab clone (since Matlab uses java itself for all its graphic output). JMathLib is a nice start but is not actively developed and does not have the same active community involvement as Octave.
MATLAB/EEGLAB vs Python for EEG processing
There is a trend in terms of tool development to migrate brain imaging tools to Python.
Of course, Python (and numpy/scipy which are Math packages build on top of Python) would be an interesting (and free) alternative to using Matlab. However, irrespective of what Python enthousiast would tell you, Python remains a programming language designed for programmers. For example, it is hard to understand for novices why a n-size vector should start be indexed at 0 and end at n-1 (in Matlab vectors start is 1 and at n). The indentation of the code is a nice a neat feature of Python. However, it does not come naturally to the novice programmer. It also makes copying and pasting code between file sources and command line interface problematic (since a snipet of code will most likely have the wrong indentation when copied to the Python command line). Also, Python is much more object oriented than Matlab, requiring users to understand that concept when calling functions. Finally, Python usually requires user to install a bunch of external libraries which does not come naturally to novices.