[Eeglablist] ICA (number of data points) for artifact rejection

Arnaud Delorme arno at ucsd.edu
Tue Jun 2 10:27:12 PDT 2009


Dear Eric,

Your decomposition might give you noise, but maybe not. This question
of the number of data points is actually empirical and depends on the
ICA algorithm you use and the quality of your data. What is sure is
that if you have 256 channels and do not perform dimension reduction,
you will be optimizing an ICA weight matrix of 256 x 256
values/parameters, so you need at least that many data points. We found
that you might actually need much more than that, but have not studied
exactly how much (as you mention below, our current rule of thumb is
20 times 256^2). In general, I have observed that the more data points
you have, the more stable the ICA components are. Interestingly, the
most dipolar components (the most biological ones) also become more
stable. My interpretation is that if you add more data, ICA will not
put much weight on transient artifacts but rather on what is stable
and reliable throughout the whole recording. If the data segment is
short, then transient artifacts might dominate the ICA decomposition.
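
As a quick sanity check, the arithmetic looks like this (a minimal
Python sketch; the factor of 20 is only our rule of thumb, not a hard
constant):

    # Back-of-the-envelope check of the k * n_channels^2 rule of thumb.
    # k = 20 is the empirical factor mentioned above, not a hard limit.
    def minutes_needed(n_channels, srate_hz, k=20):
        """Minutes of data that give k * n_channels**2 samples."""
        return k * n_channels ** 2 / srate_hz / 60.0

    # 256 channels at 500 Hz -> about 43.7 minutes suggested
    print(minutes_needed(256, 500))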

If you are just interested in artifact removal, you might want to
reduce the dimensionality of the data. I would do 50 in your case for
the sake of speed/efficiency. You do not want to use too few PCA
components, because some data will actually be missing after
dimensionality reduction (if you use PCA to reduce from 256 to 50
dimensions by keeping only the first 50 PCA components). If you were
going to look at the actual components and did not have enough data, I
would use PCA 150. If you have enough data (and patience), I would do
the full decomposition.
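
For illustration only, here is a minimal sketch of that PCA-then-ICA
pipeline in Python with scikit-learn (my choice of tools for the
example; in EEGLAB the same reduction is done through runica's 'pca'
option, and FastICA below merely stands in for Infomax/runica):

    # Sketch: PCA down to 50 dimensions, then ICA on the reduced data.
    # Random data is only a placeholder for a real 256-channel recording.
    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    rng = np.random.default_rng(0)
    X = rng.standard_normal((24000, 256))   # (samples, channels), toy data

    pca = PCA(n_components=50)
    X50 = pca.fit_transform(X)              # keep 50 dims; the rest is lost

    ica = FastICA(n_components=50, random_state=0)
    S = ica.fit_transform(X50)              # (samples, 50) source activations

    # Whatever variance falls outside the 50 PCA dimensions is the
    # "missing data" after reduction.
    print(pca.explained_variance_ratio_.sum())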

Of course, other people might come up with different values.
Hope this helps,

Arno

On May 21, 2009, at 18:55, Eric Landsness wrote:

> In previous eeglablist emails and in the literature, the number of
> data points needed for ICA decomposition has been discussed. See
>
> http://sccn.ucsd.edu/pipermail/eeglablist/2008/002384.html
> http://sccn.ucsd.edu/pipermail/eeglablist/2006/001568.html
>
> My dataset is 8 min of continuous data with 256 channels at 500 Hz
> per subject, which is well below the suggested sample size (256^2 * 20
> samples, about 44 minutes at 500 Hz).
>
> I am using ICA purely for artifact removal (eye and EMG removal) and  
> was wondering what is the harm in having too little data?  How much  
> risk do I run of removing "real" data?
>
> I understand that my data will be overlearned (Särelä and Vigário
> 2003), but I feel that the eye movements and EMG are being cleanly
> decomposed into a few components that, when removed, significantly
> improve my data set. Is this a problem?  I am not using the
> decomposition as a training set for other data sets; with each
> subject I run a new ICA decomposition.
>
> Thanks for the comments, and sorry to bring this issue up again on
> the list.
> Eric



