[Eeglablist] Reduction of Data Dimensionality before ICA

Joseph Dien jdien07 at mac.com
Fri Jan 18 11:02:47 PST 2013


Tarik,
   thanks for plugging my work!  Whether one should reduce the subspace prior to ICA (i.e., perform a PCA first) depends on one's analytic goals.  For example, if one is performing artifact correction then one should definitely not perform a PCA first, and that is the practice I follow myself.

The key issue is that ICA and PCA have opposite biases with respect to forming factors: PCA tends to lump similar things together, whereas ICA tends to split things as much as possible.  So, for example, if one conducts an ICA on a P300 dataset with multiple averaged subjects, one will tend to obtain subject-specific P300 factors, whereas PCA will tend to yield group-tendency ERP factors.  If the goal is to obtain a single group-tendency P300 factor, then applying an initial PCA is a useful way of constraining the dimensionality so that subject-specific splitting does not occur (and this is the approach instantiated in my EP Toolkit).

The EEGlab research group follows a different strategy for dealing with this issue, which is to perform single-subject ICAs and then use clustering techniques to generalize across subjects.  With this approach there is no need to perform an initial PCA, since there is only one subject in each ICA.  The reason I don't follow this approach myself is that I have reservations about using the clustering method to identify cross-subject ERP components: it complicates efforts to identify ERP components, and the times I tried it, I wasn't persuaded by how it grouped things.  To some extent this difference in approach reflects differing views of EEG phenomena, as the EEGlab group is more focused on dynamic single-trial phenomena while I'm more focused on isolating ERP componentry in order to link it to fMRI activations.

A final note: if there aren't enough observations in the dataset, then taking a subspace can also be helpful to avoid over-extraction effects.
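In case it helps make the pipeline concrete: EEGLAB's runica is MATLAB, but the PCA-then-ICA idea can be sketched in Python with scikit-learn. Everything below is a toy illustration with made-up data (the sources, mixing matrix, and dimensions are all hypothetical), not anyone's actual analysis pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Hypothetical toy data: 4 non-Gaussian sources mixed into 32 "channels".
S = rng.laplace(size=(5000, 4))                        # latent sources
A = rng.standard_normal((32, 4))                       # mixing matrix
X = S @ A.T + 0.01 * rng.standard_normal((5000, 32))   # observed data

# Step 1: PCA constrains the subspace (here 4 of 32 dimensions),
# analogous to passing a 'pca' argument to runica.
pca = PCA(n_components=4)
X_red = pca.fit_transform(X)

# Step 2: ICA then rotates within the retained subspace.
ica = FastICA(random_state=0, max_iter=1000)
S_est = ica.fit_transform(X_red)

print(S_est.shape)  # (5000, 4)
```

The point of the two-step structure is that ICA can only split within whatever subspace PCA hands it, which is exactly how the initial PCA constrains subject-specific splitting.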
If one wishes to use an initial PCA, there are many approaches to choosing the number of factors to retain.  The classic one is the scree test.  My EP Toolkit implements an enhanced scree procedure called the Parallel Test.
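For readers unfamiliar with it, the basic logic of a parallel test (Horn's parallel analysis) is to retain only those factors whose eigenvalues exceed what random data of the same size would produce. Here is a minimal sketch in NumPy; the function name and the toy dataset are my own for illustration, and this is not the EP Toolkit's actual implementation.

```python
import numpy as np

def parallel_test(X, n_draws=100, seed=0):
    """Horn's parallel analysis: count leading factors whose observed
    eigenvalues exceed the mean eigenvalue spectrum of random data."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = X.shape
    # Eigenvalues of the observed correlation matrix, descending.
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    # Average eigenvalue spectrum of same-sized random normal data.
    rand = np.zeros(n_vars)
    for _ in range(n_draws):
        R = rng.standard_normal((n_obs, n_vars))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    rand /= n_draws
    # Length of the leading run of eigenvalues beating the random baseline.
    beats = obs > rand
    return int(n_vars if beats.all() else np.argmin(beats))

# Toy example: 12 variables driven by 3 latent factors
# (each factor drives 4 nearly-duplicate variables).
rng = np.random.default_rng(1)
base = rng.standard_normal((500, 3))
X = np.repeat(base, 4, axis=1) + 0.05 * rng.standard_normal((500, 12))
print(parallel_test(X))  # -> 3
```

Compared to eyeballing a scree plot, this gives an objective cutoff: a factor is kept only if it explains more variance than pure noise would.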

Cheers!

Joe


On Jan 10, 2013, at 7:17 PM, Tarik S Bel-Bahar <tarikbelbahar at gmail.com> wrote:

> Greetings Martin, a few thoughts below, hope they help a little!
> 
> 1. Neither the eeglab documentation nor general ICA practice
> recommends reducing the dimensionality of EEG data via PCA before
> doing ICA. My understanding is that PCA reduces the dimensionality of
> the EEG data, and that this reduction reduces the "validity" of the
> data, whereas ICA maintains the true dimensionality of the data.
> Various groups have differing opinions and practices.
> Please see also the paper below:
> PLOS ONE: Independent EEG Sources Are Dipolar
> www.plosone.org/.../info%3Adoi%2F10.1371%2Fjournal.pone.0030...
> 
> 2. Published work with EEG and ICA does sometimes reduce data via PCA.
> Note that some published articles (all findable via Google Scholar)
> use PCA and then ICA, some just ICA, and some just PCA. Perhaps the
> most prominent approach is Dien et al.'s PCA-for-ERP approach, using
> Dien's freely available software. See some recent articles from
> Dien's or Donchin's labs, for example.
> 
> Applying Principal Components Analysis to Event-Related Potentials: A Tutorial
> J Dien - Developmental Neuropsychology, 2012 - Taylor & Francis
> 
> 3. Having worked with 128-channel and higher-density data with ICA
> previously, my recommendation is to avoid PCA unless you have a very
> good and principled reason to use it. Note that you cannot hurt
> yourself, and can only learn more, by running both types of ICA
> decompositions (with and without PCA-based reduction) and comparing
> the results. Then you can see for yourself the possible differences
> between the ICA results you get by doing things the two ways.
> 
> 
> 
> 
>> 
>> It calls ICA with the following line
>> 
>> [wts, sph] = runica( input_data, 'extended', 1, 'stop', 1e-7,
>> 'maxsteps', 600, 'pca', pc);
>> 
>> where pc is 96 minus the number of excluded and interpolated channels
>> (this is mentioned earlier in the script, but without an explanation).
>> As far as I understand, the 'pca' argument in that line initiates a
>> reduction of the dimensionality of the data using a PCA.
>> 
>> However, the script is intended for the analysis of 128-channel data,
>> like we use in our lab. Why does the script reduce the dimensionality of
>> the data to 96 or below? Is it justified to do that, and is there a rule
>> for data reduction before performing an ICA? The number seems kind of
>> arbitrary to me.
>> 
>> Thanks to anybody who can help!
>> 
>> Regards,
>> Martin
>> 
>> _______________________________________________
>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>> To unsubscribe, send an empty email to eeglablist-unsubscribe at sccn.ucsd.edu
>> For digest mode, send an email with the subject "set digest mime" to eeglablist-request at sccn.ucsd.edu


--------------------------------------------------------------------------------

Joseph Dien,
Senior Research Scientist
University of Maryland 

E-mail: jdien07 at mac.com
Phone: 301-226-8848
Fax: 301-226-8811
http://joedien.com/




