[Eeglablist] ICA running very slowly

Hiebel, Hannah (hannah.hiebel@uni-graz.at) hannah.hiebel at uni-graz.at
Wed Feb 15 01:50:59 PST 2017


Dear Andreas,

thanks a lot for your explanation! It is good that you were able to replicate the problem and can confirm (based on your preliminary analysis) that the high-pass cutoff frequency is the decisive factor here.

If I understand correctly, your impression is that the high-pass cutoff influences the distribution of the data, which subsequently affects the computations in the extended ICA algorithm. Does this mean that one runs into this problem whenever the distribution is more Gaussian?

As I don't understand the role of the mentioned parameters well enough, I have to leave that discussion to the experts. If it were agreed in principle that the default value of any of those parameters may be changed, the next question would be under which conditions such a change is appropriate (is there a general rule, or does it require adaptation to the individual dataset?). I saw slightly increased runtimes in all subjects, but only in some of them did ICA become up to 50 times slower. The influence of the cutoff frequency thus apparently depends on the individual data; I don't know, however, which individual characteristics are responsible for that.

I hope others will provide additional input!
If there is anything else I could have a look at myself, let me know.

Best,
Hannah

____________________________________
From: Andreas Widmann <widmann at uni-leipzig.de>
Sent: Thursday, 9 February 2017 19:34
To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
Cc: eeglablist
Subject: [Eeglablist] ICA running very slowly

Dear Hannah and list,

I had a preliminary look at the problem and the data. I can replicate the problem--ICA takes longer with increasing high-pass cutoff frequency--with your data (and, at first glance, to some extent also with my own data, though I have not yet checked that systematically). I was able to identify the code parts responsible for the slowdown. However, I do not yet have a clear idea whether the issue should be resolved and, if so, how. I hope for input from the ICA experts on the list.

The phenomenon (which, by the way, has appeared on the list before, though the discussion there diverged in a different direction; https://sccn.ucsd.edu/pipermail/eeglablist/2013/006738.html) is due to the use of extended ICA. The additional time is spent almost exclusively on the computation of the activations (runica.m line 894) and the kurtosis (lines 898-899) for extended ICA. If I understand it correctly (please correct me!), extended ICA applies different learning rules to sources with sub-Gaussian vs. super-Gaussian distributions. That is, the sign of the kurtosis has to be computed for every component and data block.
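
For illustration, here is a minimal sketch of that per-block computation (my own simplification with made-up variable names, not the literal runica.m code; u stands for the activations of one data block):

    % u: component activations of one data block (ncomps x blocksize),
    % roughly u = weights * data(:, blockindices) in runica.m terms
    m2 = mean(u.^2, 2);        % second moment of each component
    m4 = mean(u.^4, 2);        % fourth moment of each component
    kurt = m4 ./ m2.^2 - 3;    % excess kurtosis estimate per component
    signs = sign(kurt);        % +1: super-Gaussian rule, -1: sub-Gaussian rule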

runica.m implements a speed-up scheme in which the kurtosis is only computed every nth block (extblocks); if the signs did not change across SIGNCOUNT_THRESHOLD (default 25) consecutive checks, extblocks is (by default) doubled. That is, the kurtosis actually has to be computed less often if the kurtosis of all components is reasonably high (as expected) and sign changes are therefore rare.
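
A simplified sketch of that schedule (again my own naming; estimate_kurtosis_signs is a hypothetical stand-in for the kurtosis-sign computation sketched above):

    if extblocks > 0 && rem(blockno, extblocks) == 0
        oldsigns = signs;
        signs = estimate_kurtosis_signs(u);   % hypothetical helper, see above
        if isequal(signs, oldsigns)
            signcount = signcount + 1;        % signs stable since last check
        else
            signcount = 0;                    % a sign flipped; reset counter
        end
        if signcount >= SIGNCOUNT_THRESHOLD   % default 25
            extblocks = 2 * extblocks;        % re-estimate half as often
            signcount = 0;
        end
    end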

With an increasing high-pass cutoff, more components apparently have a near-Gaussian distribution with kurtosis close to zero, and the sign of the kurtosis changes more frequently between subsequent blocks. Thus, the speed-up scheme cannot take effect, and activations and kurtosis have to be expensively computed for each and every data block. There is actually a signsbias parameter (default 0.02) which is added to the kurtosis (line 905) to mitigate exactly this problem, but for your data the default is too low. A signsbias of 0.05 already speeds up extended ICA to almost normal speed with your high-pass filtered dataset.
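
As far as I can see, signsbias is a constant defined inside runica.m rather than an input argument, so for this test I simply edited the default in my copy (variable name and location as in the version I looked at; please verify against yours):

    % in runica.m, among the other defaults:
    signsbias = 0.02;   % original default, added to the kurtosis (line 905)
    % changed for the test to:
    signsbias = 0.05;   % brings extended ICA back to almost normal speed here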

Several questions arise:
* Is the interpretation correct that (some?) sources have a more Gaussian distribution with increasing high-pass cutoff frequency? Is there a straightforward explanation why? I have an intuitive idea but cannot properly express it yet.
* Would it be safe to slightly raise the signsbias parameter? Does the distinction between learning rules matter for these very nearly Gaussian sources?
* Are there potential alternatives? In the thread linked above, dimensionality reduction by PCA was suggested. However, in some quick tests I had to apply quite drastic reductions (<~45 of 63 components) to achieve results comparable to those from the slight raise of the signsbias parameter (see the usage example after this list). Or would it possibly be OK to use a larger extblocks parameter? Intuitively, neither sounds like a very good idea to me.
* Given these effects, is it still recommended to apply rather high high-pass cutoff frequencies when computing (extended) ICA? How can we get better decompositions if the source distributions are more Gaussian?
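
For reference, the PCA variant I tested looks roughly like this (standard pop_runica call with extended infomax; the component count 45 corresponds to the drastic reduction mentioned above):

    % reduce dimensionality by PCA before extended infomax ICA
    EEG = pop_runica(EEG, 'icatype', 'runica', 'extended', 1, 'pca', 45);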

I would appreciate any input! Thank you! Best,
Andreas

> Dear all,
>
>
> I am using ICA to clean my EEG data of eye-movement-related artifacts. I've already done some testing in the past to see how certain pre-processing steps (e.g., filter settings) affect the quality of my decomposition. In most cases, it took approximately 1-2 hours to run ICA for a single subject (62 channels: 59 EEG, 3 EOG).
>
>
> Now that I am running ICA on my final datasets, it suddenly takes hours upon hours to complete only a few steps. It still works fine for some subjects, but for others runica takes up to 50 hours. I observed that in some cases the weights blow up (the learning rate is lowered many times); in others it starts right away without lowering the learning rate, but every step takes ages.
>
> I've done some troubleshooting to see whether a specific pre-processing step causes this behavior, but I cannot find a consistent pattern. It seems to me, though, that (at least in some cases) the high-pass filter played a role - can anyone explain how this is related? Could a high-pass filter potentially be too strict?
>
>
> On the eeglablist I could only find discussions about rank deficiency (mostly due to using an average reference) as a potential reason. I re-referenced to linked mastoids - does this also affect the rank? When I check with rank(EEG.data(:, :)), it returns 62, which is equal to the number of channels. For some of the "bad" subjects I nonetheless tried without re-referencing - no improvement. Also, reducing dimensionality with PCA ("pca, 61") didn't help.
>
>
> Any advice would be very much appreciated!
>
>
> Many thanks in advance,
>
> Hannah
>
>
> Hannah Hiebel, Mag.rer.nat.
> Cognitive Psychology & Neuroscience
> Department of Psychology, University of Graz
> Universitätsplatz 2, 8010 Graz, Austria


