


GPU-based processing is promising in MATLAB. There are three options on the market: GPUmat, Jacket, and the Parallel Computing Toolbox (see the Conclusions below for more information).

As of mid-2010, we have only tested GPUmat. We tried to test the MATLAB Parallel Computing Toolbox, but it did not support matrix indexing, so we could not run our functions on it.

Testing GPUmat

The recent enthusiasm for using GPU (Graphics Processing Unit) computational capabilities led us to try the freely available GPUmat Matlab toolbox, which runs data processing on GPUs (a GPU must be present on the machine you use to test it). Computing with the GPUmat toolbox usually involves only recasting variables and making minor changes to Matlab scripts or functions.

GPU processing can lead to speed-ups of more than 100x for optimized applications. We were interested in testing our Nvidia GPUs for running time-frequency decompositions and non-parametric surrogate statistics. Below we report first tests of GPUmat for these two EEGLAB signal processing functionalities.

One of our servers has two Intel Xeon W5580 CPUs (16 cores in total), 72 GB of RAM, and one NVidia Tesla C2060 GPU board [1]. When Matlab runs on one of the 16 main cores, it is likely to parallelize its computations automatically across some of the others. We tried using the function maxNumCompThreads to set the maximum number of threads to 1, but this did not alter computation time. It is unclear whether Matlab 7.10 still supports maxNumCompThreads, as a warning message indicates that the function will be removed in future releases. Still, we expected to see a difference between using the main processor and the GPU board (with 112 cores).
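For readers who want to repeat the thread-limiting check, the following is a minimal sketch, assuming a Matlab release in which maxNumCompThreads is still available. The matrix size and variable names are arbitrary choices for illustration and are not taken from our tests.

```matlab
% Sketch: time the same element-wise operation with one thread and with
% the default number of computational threads.
x = rand(2000, 2000, 'single');     % illustrative test matrix (hypothetical size)

nDefault = maxNumCompThreads;       % current maximum number of computational threads
prev = maxNumCompThreads(1);        % restrict to one thread; returns the previous setting
tic; y1 = x.^1.3; tSingle = toc;

maxNumCompThreads(prev);            % restore the previous setting
tic; yN = x.^1.3; tMulti = toc;

fprintf('1 thread: %.4f s, %d threads: %.4f s\n', tSingle, nDefault, tMulti);
```

If the two timings are nearly identical, either the operation is not multi-threaded in that release or the setting is being ignored; the sketch alone cannot tell these two cases apart.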

This is our system configuration as returned by the GPUmat toolbox:

  • Running on -> "glnxa64"
  • Matlab ver. -> " (R2010a)"
  • GPUmat version -> 0.251
  • GPUmat build -> 03-May-2010
  • GPUmat architecture -> "glnxa64"

GPUmat used a GPU board with 112 cores. When typing "GPUstart" on the Matlab command line, the following message appears:

Starting GPU
There is 1 device supporting CUDA
CUDA Driver Version:                           3.1
CUDA Runtime Version:                          3.1
Device 0: "Tesla C1060"
 CUDA Capability Major revision number:         2
 CUDA Capability Minor revision number:         0
 Total amount of global memory:                 2817720320 bytes
 Number of multiprocessors:                     14
 Number of cores:                               112
 - CUDA compute capability 2.0
- Loading module EXAMPLES_CODEOPT
- Loading module EXAMPLES_NUMERICS
 -> numerics13.cubin
- Loading module NUMERICS
 -> numerics13.cubin

Basic matrix computation using GPUs provided a major speed-up

EEG = pop_loadset('sample_data\eeglab_data_epochs_ica.set');
data = GPUsingle(EEG.data(:,:));   % assumed input: epochs flattened to channels x (points*trials)
data = [data data data data data data ];
data = [data data data data data data ];
data2 = single(data);
tic; tmp = data.^1.3; GPUsync; toc
Elapsed time is 0.024915 seconds.
tic; tmp = data2.^1.3; toc
Elapsed time is 0.506489 seconds.

Raising each value in the EEG data matrix to a fractional power (1.3) using the GPU rather than the central processor produced a 20x speed increase.

Running non-parametric statistics on GPUs sped up processing by 66%

We modified the repeated-measures ANOVA function to be GPUmat compatible. (All the Matlab GPU functions we used are made available at the bottom of this page).

c = { rand(400,800,100) rand(400,800,100); ...
          rand(400,800,100) rand(400,800,100)};
tic; [FC FR FI dfc dfr dfi] = anova2_cell(c); toc
Elapsed time is 1.466853 seconds.
c = { GPUsingle(rand(400,800,100)) GPUsingle(rand(400,800,100)); ...
      GPUsingle(rand(400,800,100)) GPUsingle(rand(400,800,100))};
tic; [FC FR FI dfc dfr dfi] = anova2_cell_gpu(c); GPUsync; toc
Elapsed time is 0.908533 seconds.

The anova2_cell() function is highly optimized (no loops!), and the GPU computation appeared to be about 66% faster than on the main CPUs (0.91 s versus 1.47 s in the run above). This relatively minor speed-up seems to arise because the GPU functions are slow at accessing sub-indices in very large matrices. For smaller matrices, the GPU code reached a speed-up of about 100% over the CPU code.

Using GPUs for wavelet decomposition gave a 2.6x speed-up

EEG = pop_loadset('sample_data\eeglab_data_epochs_ica.set');
data2 = EEG.data(:,:);             % assumed input: epochs flattened to channels x (points*trials)
tic; timefreq(reshape(data2, size(data2,1), EEG.pnts, size(data2,2)/EEG.pnts), EEG.srate, 'cycles', 3); toc
Elapsed time is 9.117511 seconds.
data = GPUsingle(EEG.data(:,:));   % same data, cast to a GPU single-precision variable
tic; timefreq_gpu(reshape(data, size(data,1), EEG.pnts, size(data,2)/EEG.pnts), EEG.srate, 'cycles', 3); GPUsync; toc
Elapsed time is 3.417511 seconds.

Here we observed a 2.6x speed-up from performing the time-frequency wavelet decompositions on the GPU rather than on the CPU.
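The reshape call in the test above turns a flattened channels x (points*trials) matrix back into a channels x points x trials array. The following small sketch illustrates that round trip in plain Matlab, with made-up dimensions (nChan, nPnts and nTrials are hypothetical names, not EEGLAB fields):

```matlab
% Sketch: flatten a 3-D epoched array to 2-D and reshape it back.
nChan = 3; nPnts = 4; nTrials = 2;                % hypothetical small dimensions
data3d = reshape(1:nChan*nPnts*nTrials, nChan, nPnts, nTrials);

data2d = data3d(:,:);                             % channels x (points*trials)
back   = reshape(data2d, size(data2d,1), nPnts, size(data2d,2)/nPnts);

% Trial k occupies columns (k-1)*nPnts+1 : k*nPnts of the flattened matrix
trial2 = data2d(:, nPnts+1 : 2*nPnts);

disp(isequal(back, data3d));                      % round trip preserves the data
disp(isequal(trial2, data3d(:,:,2)));             % second trial is recovered intact
```

Because Matlab stores arrays in column-major order, both operations only change the array's dimensions and leave the element order untouched.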

Conclusions concerning GPU Computing in MATLAB

Arnaud Delorme - August 28, 2010

There are currently three options on the market: GPUmat (free), Jacket (commercial), and the MATLAB Parallel Computing Toolbox (commercial). Overall, we were relatively disappointed with current GPU solutions. Even if Jacket proves more efficient than the other options, we can only expect an additional speed-up of about 5-20% compared to GPUmat. This is far from the 100x speed-up that we were hearing about when GPU cards came out. However, a 3x speed-up is still welcome. It would also be useful to have a GPU profiler showing which commands are slow on the GPU, so that they can be avoided.
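As a quick check, the speed-ups can be recomputed from the elapsed times printed in the three tests on this page. The sketch below uses only those reported numbers; the variable names are illustrative, and "percent faster" is taken here to mean (CPU time / GPU time - 1) x 100, which is one convention among several, so the results may differ slightly from the rounded figures quoted in the text.

```matlab
% Elapsed times in seconds, copied from the tests reported on this page
names = {'element-wise power', 'repeated-measures ANOVA', 'time-frequency'};
tCPU  = [0.506489 1.466853 9.117511];
tGPU  = [0.024915 0.908533 3.417511];

speedup   = tCPU ./ tGPU;            % ratio of CPU time to GPU time
pctFaster = (speedup - 1) * 100;     % assumed "percent faster" convention

for k = 1:numel(names)
    fprintf('%-26s %5.1fx (%4.0f%% faster)\n', names{k}, speedup(k), pctFaster(k));
end
```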

GPUmat: There is still some way to go before the GPUmat toolbox can take full advantage of GPU processing capabilities. Based on feedback from other users, it seems that the way a function is programmed for GPUs dramatically influences its processing speed. Different CUDA language implementations may give speed-up differences of up to 1000x. The GPUmat version we tried leaves some room for improvement, in particular when accessing sub-indices of large matrices. Another piece of bad news for us was that our GPU functions crashed on large data matrices (larger than 100 MB). The real advantage of using GPU processing with EEGLAB would be the ability to process very large matrices (up to several GB), to compute statistics and time-frequency decompositions across multiple data component or channel signals, and to bring other compute-intensive processing within a user's 'compute horizon' (the time the user is willing to wait for results). For now, therefore, GPU-based EEGLAB processing via GPUmat remains of limited, though possible, interest.
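One possible way around the size limit, which we have not tested with GPUmat, is to process a large matrix in column blocks that each stay below a byte budget. The sketch below runs on the CPU in plain Matlab and only marks in comments where the GPUmat casts would go. The budget, the matrix size and all variable names are assumptions for illustration, and the approach only applies to operations that treat columns independently (such as the element-wise power used earlier).

```matlab
% Sketch: apply an element-wise operation block by block under a byte budget.
maxBytes = 100e3;                       % small budget so this demo yields several blocks;
                                        % 100e6 would correspond to the ~100 MB limit above
data = rand(64, 5000, 'single');        % hypothetical channels x samples matrix

bytesPerCol  = size(data,1) * 4;        % single precision uses 4 bytes per value
colsPerBlock = max(1, floor(maxBytes / bytesPerCol));

out = zeros(size(data), 'single');
nBlocks = 0;
for first = 1:colsPerBlock:size(data,2)
    last  = min(first + colsPerBlock - 1, size(data,2));
    block = data(:, first:last);        % with GPUmat, the block would be cast here: GPUsingle(block)
    out(:, first:last) = block.^1.3;    % and the result cast back with single(...) before storing
    nBlocks = nBlocks + 1;
end
fprintf('Processed %d blocks of up to %d columns\n', nBlocks, colsPerBlock);
```

Each block adds a host-to-GPU transfer, so this trades some speed for the ability to handle matrices that would not otherwise fit.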

PCT: Matlab 2010b also supports GPU computing (if you have a Parallel Computing Toolbox license). It offers only primitive functionality: for instance, you cannot index into matrices with the PCT solution, so we were not able to test our functions. [January 2019 - This question should be revisited!]

Jacket: The following is a quote from a Jacket representative. "Jacket has the fastest performance (see versus GPUmat here and see versus R2010B here) and broadest function support. It has also been used by many neuroscientists and is currently being leveraged by the SPM crowd. Jacket may be downloaded a free 15-day trial from the AccelerEyes website. Jacket costs $350 for academics. Jacket provides a GPU profiler, with GPROFVIEW". We have not tested Jacket yet, but we are planning to.

The EEGLAB-compatible GPUmat-based functions we tested are available here. Note that these functions are not fully functional (they only work under the limited set of conditions tested above) and are thus made available for exploratory testing purposes only.