[Eeglablist] ICA running very slowly

Wed May 10 08:14:17 PDT 2017

Dear Hannah, Andreas, et al.,

Just to add my (late) comments on this: it seems that Andreas has determined
the reason for the slowdown, namely the increased number of kurtosis
computations performed, which grows with higher high-pass cutoffs.

Re: the Gaussianity and high-pass cutoff issue, I believe it does make sense
for EEG data that higher high-pass cutoffs will produce more nearly Gaussian
signals in the decomposition.

EEG data, as we know, is largely dominated by low-frequency components of
relatively large amplitude. Most of the signals in EEG are super-Gaussian
(relatively high positive kurtosis), with power-line noise (a sinusoidal
signal) typically being the only detected sub-Gaussian signal (though an
important one). Most of the EEG brain component signals are also oscillatory,
but because they are active only intermittently, they end up with a more
super-Gaussian distribution, peaked around zero, rather than the bimodal
distribution of a sinusoid.
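To make the kurtosis terminology above concrete, here is a small numerical sketch in Python/NumPy. All parameters (250 Hz sampling rate, 50 Hz line noise, 10 Hz bursts active in 20% of the seconds, Hann-shaped burst envelopes) are arbitrary assumptions for illustration, not properties of real EEG. A continuous sinusoid has excess kurtosis of exactly -1.5 (sub-Gaussian), the same oscillation switched on only intermittently becomes strongly super-Gaussian, and Gaussian noise sits near zero.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: E[x^4] / E[x^2]^2 - 3 (0 for a Gaussian)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

fs = 250                      # assumed sampling rate (Hz)
t = np.arange(600 * fs) / fs  # 10 minutes of samples
rng = np.random.default_rng(0)

# Continuous sinusoid (e.g. 50 Hz line noise): sub-Gaussian, excess kurtosis -1.5.
line_noise = np.sin(2 * np.pi * 50 * t)

# Intermittent 10 Hz "alpha" bursts: Hann-windowed 1 s bursts in 120 of 600 seconds.
bursts = np.zeros_like(t)
window = np.hanning(fs)
for s in rng.choice(600, size=120, replace=False):
    seg = slice(s * fs, (s + 1) * fs)
    bursts[seg] = window * np.sin(2 * np.pi * 10 * t[seg])

gaussian = rng.standard_normal(t.size)

for name, x in [("line noise", line_noise), ("alpha bursts", bursts), ("Gaussian", gaussian)]:
    print(f"{name:12s} excess kurtosis = {excess_kurtosis(x):+.2f}")
```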

So, with the EEG dominated by a significant number of low-frequency signals
of relatively large amplitude and kurtosis magnitude, in addition to
artifacts with strong kurtosis, the kurtosis signs usually get fixed early
and are then recomputed only rarely.

If, however, you remove all the low-frequency EEG signals (which are
basically almost all of the actual signal), you are left with only
high-frequency "signals", which may actually be largely spurious,
non-physical or physiological "noise" with nearly Gaussian distributions.
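The argument above can be illustrated with a toy simulation (Python/NumPy). Assumptions: the "EEG" is just large intermittent 10 Hz bursts plus small white Gaussian noise, and the high-pass is an idealized FFT brick-wall filter at 30 Hz chosen for simplicity; real pipelines use proper FIR/IIR filters, and the cutoff is arbitrary. The raw mixture is strongly super-Gaussian; after the bursts are filtered out, what remains is essentially Gaussian noise with excess kurtosis near zero.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

def brickwall_highpass(x, fs, cutoff):
    """Idealized high-pass: zero all FFT bins below `cutoff` Hz (illustration only)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[freqs < cutoff] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

fs, n_sec = 250, 600
t = np.arange(n_sec * fs) / fs
rng = np.random.default_rng(1)

# Large, intermittent low-frequency activity: 10 Hz bursts in 20% of the seconds.
signal = np.zeros_like(t)
window = np.hanning(fs)
for s in rng.choice(n_sec, size=n_sec // 5, replace=False):
    seg = slice(s * fs, (s + 1) * fs)
    signal[seg] = window * np.sin(2 * np.pi * 10 * t[seg])

# Small broadband Gaussian "noise".
noise = 0.1 * rng.standard_normal(t.size)
raw = signal + noise

filtered = brickwall_highpass(raw, fs, cutoff=30.0)
print(f"raw:       excess kurtosis = {excess_kurtosis(raw):+.2f}")
print(f"high-pass: excess kurtosis = {excess_kurtosis(filtered):+.2f}")
```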

These Gaussian-like signals, with nearly zero kurtosis, as Andreas said,
apparently induce sign changes in the "sources" at almost every iteration,
triggering the time-consuming kurtosis computation at every iteration until
the end.
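A minimal sketch of why near-zero kurtosis causes sign flipping: kurtosis is estimated per block from a finite random subsample, so for a Gaussian source the estimate scatters around zero and sign(kurtosis + bias) flips frequently, while a clearly super-Gaussian (Laplacian) source keeps a stable sign. The subsample size of 6000 and the bias of 0.02 are assumptions chosen to mimic the setting discussed in this thread; this is not runica.m's code.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

def sign_change_rate(source, block_size, n_blocks, signsbias, rng):
    """Fraction of consecutive blocks whose sign(kurtosis + signsbias) differs."""
    signs = np.empty(n_blocks)
    for b in range(n_blocks):
        idx = rng.integers(0, source.size, size=block_size)  # random subsample per block
        signs[b] = np.sign(excess_kurtosis(source[idx]) + signsbias)
    return np.mean(signs[1:] != signs[:-1])

rng = np.random.default_rng(2)
sources = {
    "Laplacian": rng.laplace(size=200_000),    # super-Gaussian, excess kurtosis 3
    "Gaussian": rng.standard_normal(200_000),  # excess kurtosis 0
}
rates = {name: sign_change_rate(src, block_size=6000, n_blocks=500,
                                signsbias=0.02, rng=rng)
         for name, src in sources.items()}
for name, rate in rates.items():
    print(f"{name:9s} sign changes between consecutive blocks: {rate:.0%}")
```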

So, I believe this makes sense *for EEG data*. I would predict that if you
add artificial high-frequency signals with non-zero kurtosis to the EEG
data, the effect of ICA runtime increasing with the high-pass cutoff would
disappear.

So, besides what Andreas already determined, I would just add why I think it
makes sense that increasing the high-pass cutoff produces more and more
Gaussian-like signals in EEG data (by removing more and more of the actual
signals).

Best,
Jason

-----Original Message-----
From: eeglablist-bounces at sccn.ucsd.edu
[mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of Hiebel, Hannah
(hannah.hiebel at uni-graz.at)
Sent: Wednesday, February 15, 2017 6:51 PM
To: Andreas Widmann
Cc: eeglablist
Subject: Re: [Eeglablist] ICA running very slowly

Dear Andreas,

thanks a lot for your explanation! Good that you were able to replicate the
problem and can confirm (based on your preliminary analysis) that the
high-pass cutoff frequency is the decisive factor here.

If I understand correctly, your impression is that the high-pass cutoff
influences the distribution of the data, which subsequently affects the
computations in the extended ICA algorithm. Does this mean that one would run
into this problem whenever the distribution is more Gaussian?

As I don't understand the role of the mentioned parameters well enough, I
have to leave the discussion to the experts. If you agreed in principle that
the default value of any of those parameters could be changed, the next
question would be under which conditions such a change would be appropriate
(Is there a general rule? Does it require individual adaptation?). I could
see slightly increased runtimes in all subjects, but only in some of them did
ICA become up to 50 times slower. Thus, the influence of the cutoff frequency
apparently depends on the individual data; I don't know, however, which
individual characteristics are responsible for that.

I hope others will provide additional input!
If there is anything else I could have a look at myself, let me know.

Best,
Hannah

____________________________________
From: Andreas Widmann <widmann at uni-leipzig.de>
Sent: Thursday, February 09, 2017 19:34
To: Hiebel, Hannah (hannah.hiebel at uni-graz.at)
Cc: eeglablist
Subject: [Eeglablist] ICA running very slowly

Dear Hannah and list,

I had a preliminary look at the problem and the data. I can replicate the
problem (ICA takes longer with increasing high-pass cutoff frequency) with
your data, and at first glance to some extent also with my own data, though
I have not yet checked that systematically. I was able to identify the code
parts causing the slowdown. However, I do not yet have a clear idea whether
the issue should be resolved and, if so, how. I hope for input from the ICA
experts on the list.

The phenomenon (which, by the way, has already come up on the list before,
but the discussion diverged in a different direction;
https://sccn.ucsd.edu/pipermail/eeglablist/2013/006738.html) is due to the
use of extended ICA. The additional time is spent almost exclusively on
computing the activations (runica.m line 894) and kurtosis (lines 898-899)
for extended ICA. If I got it right (please correct me!), extended ICA applies
different learning rules to sources with sub-Gaussian vs. super-Gaussian
distributions. That is, the sign of the kurtosis has to be computed for every
component and data block.
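For reference, the extended infomax update is commonly written as below. The notation is introduced here for illustration: W is the unmixing matrix, u = Wx the component activations, angle brackets denote averages over a data block, and b stands for the signsbias parameter. runica.m's exact implementation (block sums, learning-rate handling, momentum) may differ in detail. Each k_i = +1 selects the super-Gaussian rule and k_i = -1 the sub-Gaussian rule, which is why the kurtosis sign is needed per component.

```latex
\Delta W \propto \left[ I - K \left\langle \tanh(\mathbf{u})\,\mathbf{u}^{\top} \right\rangle
  - \left\langle \mathbf{u}\,\mathbf{u}^{\top} \right\rangle \right] W,
\qquad \mathbf{u} = W\mathbf{x},
\qquad K = \operatorname{diag}(k_1, \dots, k_N),

k_i = \operatorname{sign}\!\left( \frac{\langle u_i^4 \rangle}{\langle u_i^2 \rangle^{2}} - 3 + b \right).
```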

runica.m implements a speed-up whereby kurtosis is computed only every nth
block (extblocks): if the signs did not change over SIGNCOUNT_THRESHOLD
(default 25) subsequent checks, extblocks is (by default) doubled, and this
is repeated every time that rule applies. That is, kurtosis actually has to
be computed less often if the kurtosis of all components is reasonably high
(as expected) and sign changes are thus rare.
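The schedule described above can be sketched as follows. This is a simplified Python re-creation based only on the description in this message, not runica.m's actual code; the function name, the 1-based block counter, and the `sign_at_block` callback are hypothetical. With stable signs the number of costly computations shrinks quickly; with a single component whose sign keeps flipping, every block pays the cost.

```python
def count_kurtosis_computations(sign_at_block, n_blocks, signcount_threshold=25,
                                signcount_step=2, extblocks=1):
    """Count blocks that trigger an activation + kurtosis computation.

    sign_at_block(block) returns the tuple of component signs that a kurtosis
    computation at that block would produce (hypothetical interface).
    """
    computations = 0
    signcount = 0
    old_signs = None
    for block in range(1, n_blocks + 1):
        if block % extblocks != 0:
            continue                                 # reuse previous signs, no cost
        computations += 1
        signs = sign_at_block(block)
        signcount = signcount + 1 if signs == old_signs else 0
        old_signs = signs
        if signcount >= signcount_threshold:
            extblocks = int(extblocks * signcount_step)  # check less often from now on
            signcount = 0
    return computations


def stable(block):
    return (1, 1, -1)                        # all signs fixed


def flipping(block):
    return (1, 1 if block % 2 else -1, -1)   # one near-Gaussian component keeps flipping


print("stable signs:  ", count_kurtosis_computations(stable, 10_000))
print("flipping signs:", count_kurtosis_computations(flipping, 10_000))
```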

With increasing high-pass cutoff, more components apparently have a Gaussian
distribution with kurtosis close to zero, and the sign of the kurtosis
changes more frequently between subsequent blocks. Thus, the speed-up cannot
take effect, and activations and kurtosis have to be expensively computed
for each and every data block. There is actually a signsbias parameter
(default 0.02), which is added to the kurtosis (line 905) to address that
problem, but for your data the default is too low. A signsbias of 0.05
already speeds up extended ICA considerably, to almost normal speed, with
your high-pass filtered dataset.
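A rough model of why a larger signsbias helps: adding the bias moves the sign decision threshold from 0 to -signsbias, so a kurtosis estimate fluctuating around zero crosses it less often. Assumptions, all labeled as such: the component is exactly Gaussian; estimation noise uses the large-sample standard error sqrt(24/n) of excess kurtosis under Gaussianity with n = 6000; estimates are smoothed by a momentum term whose value (0.5) is assumed here, not checked against runica.m. The function name is hypothetical.

```python
import numpy as np

def count_sign_changes(n_blocks, kurtsize, signsbias, true_kurtosis=0.0,
                       extmomentum=0.5, seed=0):
    """Count flips of sign(kk + signsbias) across successive kurtosis checks.

    kk is modeled as true_kurtosis plus Gaussian estimation noise with standard
    error sqrt(24 / kurtsize), smoothed as
    kk = extmomentum * old_kk + (1 - extmomentum) * kk.
    """
    rng = np.random.default_rng(seed)
    standard_error = np.sqrt(24.0 / kurtsize)
    old_kk = true_kurtosis
    old_sign = None
    changes = 0
    for _ in range(n_blocks):
        kk = true_kurtosis + standard_error * rng.standard_normal()
        kk = extmomentum * old_kk + (1.0 - extmomentum) * kk  # momentum smoothing
        old_kk = kk
        sign = np.sign(kk + signsbias)  # +1: super-Gaussian rule, -1: sub-Gaussian rule
        if old_sign is not None and sign != old_sign:
            changes += 1
        old_sign = sign
    return changes

for bias in (0.02, 0.05):
    flips = count_sign_changes(n_blocks=2000, kurtsize=6000, signsbias=bias)
    print(f"signsbias={bias:.2f}: {flips} sign changes in 2000 checks")
```

Because the same random seed is used for both settings, the two counts differ only through the bias.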

Several questions arise:
* Is the interpretation correct that (some?) sources have a more Gaussian
distribution with increasing high-pass cutoff frequency? Is there a
straightforward explanation why? I have an intuitive idea but cannot
properly express it yet.
* Would it be safe to slightly raise the signsbias parameter? Does the
distinction between learning rules matter for these very-close-to-Gaussian
sources?
* Are there potential alternatives? In the thread linked above, dimension
reduction by PCA was suggested. However, in some quick tests I had to apply
quite drastic reductions (to roughly 45 of 63 components or fewer) to
achieve results comparable to the slight raise of the signsbias parameter.
Or would it possibly be OK to use a higher extblocks parameter? Intuitively,
neither sounds like a very good idea to me.
* Given these effects, is it still recommended to apply these rather high
high-pass cutoff frequencies when computing (extended) ICA? How can we get
better decompositions if source distributions are more Gaussian?

I would appreciate any input! Thank you! Best, Andreas

> Dear all,
>
>
> I am using ICA to clean my EEG data of eye-movement-related artifacts.
> I've already done some testing in the past to see how certain pre-processing
> steps affect the quality of my decomposition (e.g., filter settings). In most
> cases, it took approximately 1-2 hours to run ICA for a single subject (62
> channels: 59 EEG, 3 EOG channels).
>
>
> Now that I run ICA on my final datasets, it suddenly takes hours upon hours
> to do only a few steps. It still works fine in some subjects, but in others
> runica takes up to 50 hours. I observed that in some cases the weights blow
> up (the learning rate is lowered many times); in others it starts right away
> without lowering the learning rate, but every step takes ages.
>
> I've done some troubleshooting to see if a specific pre-processing step
> causes this behavior, but I cannot find a consistent pattern. It seems to me,
> though, that (at least in some cases) the high-pass filter played a role.
> Can anyone explain how this is related? Could a high-pass filter potentially
> be too strict?
>
>
> On the eeglablist I could only find discussions about rank deficiency
> (mostly due to using average reference) as a potential reason. I
> re-referenced to linked mastoids; does this also affect the rank? When I
> check with rank(EEG.data(:, :)), though, it returns 62, which is equal to the
> number of channels. For some of the "bad" subjects I nonetheless tried
> without re-referencing: no improvement. Also, reducing dimensionality with
> pca ("pca, 61") didn't help.
>
>
> Any advice would be very much appreciated!
>
>
> Many thanks in advance,
>
> Hannah
>
>
> Hannah Hiebel, Mag.rer.nat.
> Cognitive Psychology & Neuroscience
> Department of Psychology, University of Graz
> Universitätsplatz 2, 8010 Graz, Austria
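Regarding the rank question in the quoted message (re-referencing to linked mastoids), a minimal NumPy sketch of when such a re-reference does or does not cost one rank. The channel count, channel indices, and random data are hypothetical. If both mastoid channels are in the data and kept, the two re-referenced mastoid rows sum to zero and the rank drops by one; if only one mastoid was recorded (the other being the online reference, i.e. implicitly zero), the re-reference is an invertible transform and full rank is preserved, which would be consistent with rank() returning the full channel count.

```python
import numpy as np

rng = np.random.default_rng(3)
n_channels, n_samples = 6, 5000
data = rng.standard_normal((n_channels, n_samples))  # hypothetical full-rank recording
m1, m2 = 4, 5                                          # hypothetical mastoid channel rows

# Case A: both mastoids are in the data and kept after re-referencing.
# Rows m1 and m2 become (M1 - M2)/2 and (M2 - M1)/2, which sum to zero.
linked = data - data[[m1, m2]].mean(axis=0)
print("both mastoids kept:  ", np.linalg.matrix_rank(linked))

# Case B: only M2 was recorded (M1 was the online reference, i.e. implicitly
# zero), so the linked-mastoid reference is M2 / 2; row m1 is then just another
# channel, and subtracting 0.5 * M2 from every row is an invertible transform.
linked_b = data - 0.5 * data[m2]
print("one mastoid recorded:", np.linalg.matrix_rank(linked_b))
```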
_______________________________________________
Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html