[Eeglablist] How to correctly break down AR runica() in case of huge sets.
Mahesh Casiraghi
mahesh.casiraghi at gmail.com
Mon Dec 13 14:43:38 PST 2010
Dear Jason,
thank you for all the useful hints provided along with your response.
I see your points, but I am still not sure how performing ICA on
arbitrarily sampled subgroups of trials can be methodologically sound. Let
me try a simple example: suppose we have to run an ICA for artifact
rejection in an experiment with 5 conditions of 100 trials each, and for
some reason the physical features of the stimuli in condition 3
dramatically increase the probability of eyeblinks in the trials where
those stimuli are present. Let's consider the extreme situation in which
all the blinks are concentrated in condition 3, occurring in 100% of its
trials. Then, following the approach described above, we run 5 independent
ICAs, each on 100 trials.
Now, if we do not control the proportion of trials per condition in each
of the 5 ICAs, we can end up with two prototypical situations:
- one where all 100 trials of the critical condition 3 end up in a single
ICA (and thus we will likely observe one big "blink component" accounting
for a lot of variance),
- and one where 20 trials of condition 3 (and hence with blinks) end up in
each of the 5 ICAs, and we will therefore likely observe 5 blink
components, each explaining approximately 1/5 of the variance of the single
big blink component above, but this time one in each of the 5 ICAs.
The point is: as long as the sum of the variances accounted for by the 5
separate components is not identical to the variance accounted for by the
single big component, we know that we introduce a bias by ending up in
situation 1 rather than situation 2. Perhaps this difference does not
directly reflect the amount of neural activity we erroneously removed from
one condition rather than another, but as long as our subgroups are not
balanced condition-wise, we will introduce some artifactual
condition-related variability into our data.
Accordingly, the question is: should we be concerned about the proportion
of trials per experimental condition included in each of the "subgroup
ICAs" whenever we need to break the ICA down because of processing
constraints?
And, if we should, to what extent will our final back-projected data
eventually be equal to the output of one big global ICA?
Third and last question:
I see you write:
"It would also be possible to modify the ICA algorithm to swap out data
from the disk, but as I said, I doubt using all the data would improve the
results over using as much data as you can load into memory."
Is there a function (or, alternatively, a relatively easy way) to run
runica like that, or to run it in parallel across multiple machines or
cores, in a cluster/GPU-like manner, perhaps using the Parallel Computing
Toolbox?
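
To make the question more concrete, this is the kind of thing I imagine (a
minimal sketch of mine, assuming the Parallel Computing Toolbox and that
each subgroup fits in memory; I am not aware of an existing EEGLAB function
doing this):

    % Independent runica() decompositions on trial subgroups, run in
    % parallel. EEG is an epoched dataset already in memory; nGroups
    % and the even split are arbitrary choices of mine.
    nGroups = 7;
    nChans  = size(EEG.data, 1);
    nTrials = size(EEG.data, 3);
    grpSize = floor(nTrials / nGroups);
    W = cell(nGroups, 1);
    S = cell(nGroups, 1);
    parfor g = 1:nGroups
        idx  = (g-1)*grpSize + (1:grpSize);          % trials of subgroup g
        data = reshape(EEG.data(:, :, idx), nChans, []);
        [W{g}, S{g}] = runica(data, 'extended', 1);  % weights, sphere
    end
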
I hope I have not abused your helpfulness with all these issues.
Cheers,
Mahesh
Mahesh M. Casiraghi
PhD candidate - Cognitive Sciences
Roberto Dell'Acqua Lab, University of Padova
Pierre Jolicoeur Lab, Université de Montréal
mahesh.casiraghi at umontreal.ca
I have the conviction that when Physiology will be far enough advanced, the
poet, the philosopher, and the physiologist will all understand each other.
Claude Bernard
On Mon, Dec 13, 2010 at 4:36 PM, Jason Palmer <japalmer29 at gmail.com> wrote:
> Hi Mahesh,
>
>
>
> Merging the results by simple averaging probably won’t work since the
> components are returned in random order (even after the variance sorting,
> components won’t necessarily have the same index.) Using matcorr() or a
> similar component matching algorithm before averaging is one possibility.
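>
> For instance, something along these lines (a rough sketch; EEG1 and EEG2
> stand for two datasets already decomposed with runica(), and the variable
> names are only illustrative):
>
>     % Match the components of two decompositions by scalp-map
>     % correlation before comparing or averaging anything.
>     % matcorr() pairs the best-correlated rows of its two inputs,
>     % so maps are passed as components x channels.
>     maps1 = EEG1.icawinv';
>     maps2 = EEG2.icawinv';
>     [corrs, ix, iy] = matcorr(maps1, maps2);  % paired row indices
>     perm = zeros(size(maps1, 1), 1);
>     perm(ix) = iy;            % comp k of EEG1 <-> comp perm(k) of EEG2
>     winv2matched    = EEG2.icawinv(:, perm);
>     weights2matched = EEG2.icaweights(perm, :);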
>
>
>
> But it seems to me that averaging will not improve anything in your
> situation. As long as you have enough data in each data block that ICA runs
> on, then the components you get should be well determined, allowing you to
> remove the artifacts separately, and use the separate unmixing matrices to
> decompose the different subsets.
>
>
>
> I’m not sure what kind of analysis you’re doing, but for many purposes, you
> want to identify brain components of interest and then analyze the
> activations and possibly localize them. In this case you only need to match
> up the components of interest in the separate decompositions, e.g. a frontal
> midline ERN component, and collect all the trials with the activations
> produced by the respective ICA unmixing matrices.
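>
> In code, collecting those trials might look roughly like this (a sketch;
> the cell arrays W, S, data and the index vector compIdx are illustrative
> names, not EEGLAB fields):
>
>     % Collect the single-trial activations of one component of
>     % interest from each subset, using that subset's own unmixing
>     % matrix. data{k} is channels x frames x trials of subset k;
>     % compIdx(k) is the matched component's index in decomposition k.
>     acts = [];
>     for k = 1:numel(W)
>         unmix = W{k} * S{k};                          % unmixing matrix
>         flat  = reshape(data{k}, size(data{k}, 1), []);
>         acts  = [acts, unmix(compIdx(k), :) * flat];  % append subset k
>     end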
>
>
>
> Again, as long as you use as much data as you can load (possibly
> overlapping data blocks), the decompositions should be good by themselves.
> Comparing the components of interest across decompositions will give you an
> idea of how stable the components you’re looking at really are in your
> dataset. You might also look into characterizing the variance of the
> component maps in a bootstrapping sense, using a large number of resampled
> blocks.
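>
> A rough sketch of what I mean (refMaps, blockSize and compOfInterest are
> illustrative; matcorr() aligns each resampled run to a reference
> decomposition before the maps are compared):
>
>     % Bootstrap the variability of a component map over resampled
>     % trial blocks, one runica() decomposition per block.
>     nBoot   = 20;
>     nChans  = size(EEG.data, 1);
>     nTrials = size(EEG.data, 3);
>     maps = zeros(nChans, nBoot);
>     for b = 1:nBoot
>         idx  = randi(nTrials, [1, blockSize]); % resample with replacement
>         data = reshape(EEG.data(:, :, idx), nChans, []);
>         [w, s] = runica(data);
>         winv = pinv(w * s);                     % component maps in columns
>         [c, ix, iy] = matcorr(refMaps', winv'); % align to reference run
>         perm = zeros(size(refMaps, 2), 1);
>         perm(ix) = iy;
>         maps(:, b) = winv(:, perm(compOfInterest));
>     end
>     mapStd = std(maps, 0, 2);   % per-channel variability of the map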
>
>
>
> It would also be possible to modify the ICA algorithm to swap out data from
> the disk, but as I said, I doubt using all the data would improve the
> results over using as much data as you can load into memory. To me it makes
> more sense to verify the stability of the components you’re interested in,
> and use the separate ICA unmixing/sphere matrices on their corresponding
> data blocks, and separately back-project the components of interest, and
> then collect all the trials for the final analysis.
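>
> Concretely, the per-block back-projection could look roughly like this (a
> sketch; W, S, data and goodComps are the same kind of illustrative cell
> arrays as above, with goodComps{k} listing the components retained in
> block k):
>
>     % Back-project only the retained components of each block with
>     % that block's own unmixing/sphere matrices, then pool the trials.
>     clean = cell(size(data));
>     for k = 1:numel(data)
>         unmix = W{k} * S{k};
>         winv  = pinv(unmix);
>         flat  = reshape(data{k}, size(data{k}, 1), []);
>         acts  = unmix * flat;                  % all component activations
>         proj  = winv(:, goodComps{k}) * acts(goodComps{k}, :);
>         clean{k} = reshape(proj, size(data{k}));
>     end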
>
>
>
> Hope this is useful.
>
>
>
> Best,
>
> Jason
>
>
>
>
>
> From: eeglablist-bounces at sccn.ucsd.edu [mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of Mahesh Casiraghi
> Sent: Saturday, December 11, 2010 6:34 PM
> To: eeglablist at sccn.ucsd.edu
> Subject: [Eeglablist] How to correctly break down AR runica() in case of
> huge sets.
>
>
>
> Dear more experienced EEGLabbers and ICA experts,
>
>
>
>
>
> supposing one has to work with quite large datasets (several channels,
> very high sample rate, long record lengths) and would therefore be unable
> to load several gigs of data into memory altogether:
>
>
>
> A) Is it methodologically problematic to run independent ICAs on
> subgroups of trials and then separately perform AR (rejection of blink and
> scalp-detected ECG components) on each of them?
>
>
>
> B) Assuming it would not be (as I indeed tend to think) a very advisable
> approach, is there a methodologically sound way to combine all the
> obtained - and presumably heterogeneous - sphere, weights, and weights(-1)
> matrices into 3 single Sph, W, and W(-1) matrices, and then use these new
> matrices to back-project after component rejection?
>
>
>
> C) More precisely, let's suppose we have 700 trials and we run 7
> independent ICAs, each on 100 of them.
>
>
>
> a) I would separately pick out the to-be-rejected components (by
> subjective criteria, ADJUST, FASTER, or whatever one may prefer),
> independently for each subgroup of trials.
>
> b) I would then remove, subgroup by subgroup, the respective w(-1)
> columns and EEG.icaact rows corresponding to the discarded components.
>
> c) I would merge the obtained 7 EEG.icasphere, 7 EEG.icaweights, and 7
> EEG.icawinv matrices into 3 single matrices of equal dimensions, averaging
> through nanmean (given that we are likely to pick a different number of
> components from each trial subgroup and we need consistent matrix
> dimensions).
>
> d) I would finally back-project, subgroup by subgroup, using the same
> averaged EEG.icawinv and EEG.icasphere and, each time, the EEG.icaact of
> the current subgroup of trials (a rough sketch of steps c and d follows
> below).
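>
> In MATLAB terms, what I have in mind for steps (c) and (d) would be
> roughly the following (a sketch of my own idea, not an established method;
> winvs, spheres and icaacts are illustrative stacked arrays of mine):
>
>     % winvs (channels x comps x 7) and spheres (chans x chans x 7)
>     % stack the subgroup matrices, with NaN in place of rejected
>     % components; icaacts{g} holds the activations of subgroup g,
>     % with the rows of its rejected components zeroed out so that
>     % the matrix dimensions stay consistent.
>     winvAvg   = nanmean(winvs, 3);    % averaged EEG.icawinv
>     sphereAvg = nanmean(spheres, 3);  % averaged EEG.icasphere (kept for
>                                       % completeness; back-projection only
>                                       % needs icawinv and icaact)
>     backproj = cell(7, 1);
>     for g = 1:7
>         backproj{g} = winvAvg * icaacts{g};  % back-project subgroup g
>     end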
>
>
>
> According to my initial speculations, following a -> b -> c -> d we
> should come up with something analogous to the output of one big global
> ICA.
>
>
>
> Am I wrong?
>
>
>
> D) Has anyone among you already tried something like this, and is perhaps
> willing to share some feedback or impressions?
>
>
>
>
>
> Cheers,
>
>
>
> Mahesh
>
>
>
>
>
>
>
>
>
> Mahesh M. Casiraghi
>
> PhD candidate - Cognitive Sciences
>
> Roberto Dell'Acqua Lab, University of Padova
>
> Pierre Jolicoeur Lab, Université de Montréal
>
> mahesh.casiraghi at umontreal.ca
>
>
>
> I have the conviction that when Physiology will be far enough advanced, the
> poet, the philosopher, and the physiologist will all understand each other.
>
> Claude Bernard
>
>
>