[Eeglablist] ICA decomposition of possibly 2 different conditions....

Thu Jan 21 13:12:25 PST 2010

Hello,
there is probably one more issue with the number of points to use, if we are
talking about EEG: the sampling rate.
If we assume that the "useful" range of EEG frequencies lies between ~0 and
~150 Hz, then having more points in a shorter period, merely because of a
higher sampling rate, may not help. One might need to take into account the
number of frames as well as the length of the recording.
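
To make this concrete, here is a small sketch of how the nominal "points per weight" value changes with the sampling rate alone. It assumes the k*n^2 rule of thumb from the original question quoted at the bottom of this thread, so that k = frames / n^2. The function name `points_per_weight` and the 64-channel, 10-minute recording are hypothetical and only serve as an illustration.

```python
def points_per_weight(duration_s, srate_hz, n_channels):
    """Nominal k = frames / n^2 for a full-rank ICA on n_channels channels."""
    frames = duration_s * srate_hz
    return frames / n_channels ** 2

# Same hypothetical 10-minute, 64-channel recording at three sampling rates.
for srate in (250, 500, 1000):
    k = points_per_weight(duration_s=600, srate_hz=srate, n_channels=64)
    print(f"{srate:5d} Hz -> k = {k:.1f}")
```

The nominal k rises from about 37 at 250 Hz to about 146 at 1000 Hz, although the recording covers the same 10 minutes in every case.
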
Does anyone know of a study that has investigated this?
All the best,
Ilya

________________________________
From: German Gomez Herrero <german.gomezherrero at tut.fi>
To: eeglablist at sccn.ucsd.edu
Sent: Tue, January 19, 2010 12:58:30 PM
Subject: Re: [Eeglablist] ICA decomposition of possibly 2 different conditions....

Hello,
I just wanted to share my opinion on the important issue of how many data points to use with ICA. My personal approach is to be as cautious as possible, and I would usually use even more data samples than Julie recommends (e.g. I would use k=100 points per weight, assuming a sampling frequency of 250 Hz or so). Of course this is a rather arbitrary value, and there is no way to know where the best trade-off lies between giving ICA enough time to "learn" the components (i.e. providing enough data samples) and not violating the assumptions that the components are stationary and that their number is smaller than the number of EEG sensors (i.e. not trying to analyze too long an EEG segment). Recently, my colleagues and I ran some (very simple) toy experiments to investigate how many data samples you would need to obtain an accurate source estimate when the sources are EEG time-series:

http://www.cs.tut.fi/~gomezher/enrica/ieeespl2009.pdf 

Basically, what we did was to mix sources that were just EEG time-series taken randomly from non-overlapping epochs of a long EEG recording. In this way our sources were almost independent (assuming that autocorrelations at very long lags in the EEG are negligible) while retaining spectral properties very similar to those one would expect from the true underlying sources. Because volume conduction is linear and instantaneous, it has a relatively minor impact on the spectral properties of the scalp EEG, which are mainly determined by the spectral properties of the underlying brain sources. What we found is that you may need up to k=300 points per weight to get "accurate enough" estimates. I have to admit that we were relatively demanding in our definition of "accurate enough" but, still, I think the results suggest that k=25 might be too little in some cases. Of course, our simulations are very simplistic and, for instance, do not consider the possibility that more sources may become active as the analysis window length increases.
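
The way the sources were built can be sketched roughly as follows. This is not the code used in the paper: a synthetic AR(1) noise signal stands in for the long EEG recording, the mixing matrix is random, and all function names and parameter values are made up for illustration. The sketch only shows the two ingredients described above, namely sources cut at random from non-overlapping epochs and an instantaneous linear mixture of them. The final check looks at correlation only, which is a weaker property than independence.

```python
import math
import random

def ar1_signal(n_samples, rho=0.9, seed=0):
    """Surrogate for a long EEG channel: AR(1) noise (assumption, not real EEG)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n_samples):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def epochs_as_sources(signal, n_sources, epoch_len, seed=1):
    """Pick n_sources non-overlapping epochs at random; each one is a source."""
    n_epochs = len(signal) // epoch_len
    if n_sources > n_epochs:
        raise ValueError("recording too short for the requested epochs")
    picks = random.Random(seed).sample(range(n_epochs), n_sources)
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in picks]

def mix(sources, mixing):
    """Instantaneous linear mixture x(t) = A s(t); one output row per sensor."""
    return [[sum(a * s for a, s in zip(row, frame)) for frame in zip(*sources)]
            for row in mixing]

def pearson(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

recording = ar1_signal(40_000)
sources = epochs_as_sources(recording, n_sources=4, epoch_len=5_000)
rng = random.Random(2)
A = [[rng.gauss(0.0, 1.0) for _ in sources] for _ in sources]  # random mixing
sensors = mix(sources, A)

# Sources from distant epochs should be nearly uncorrelated.
worst = max(abs(pearson(sources[i], sources[j]))
            for i in range(4) for j in range(i + 1, 4))
print(f"largest |correlation| between sources: {worst:.3f}")
```

An ICA algorithm would then be run on `sensors` using only the first k*n^2 frames, and the estimated sources compared with `sources` for increasing k.
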

Using k=100 or more might be completely unaffordable when you have many data channels. In that case I would first use PCA to reduce the number of components. Especially in slow-wave sleep EEG, I would expect to be able to explain most of the EEG variance with relatively few principal components. But of course, in many other cases you might lose too much data by rejecting components with PCA. In my experience, the most common mistake is to use too few data samples, and only rarely can one be accused of using too many. In the extreme case of using far too few data samples, overlearning may occur, which in some cases can lead to completely wrong but visually appealing results:

http://www.cs.tut.fi/~gomezher/projects/eeg/cimed05.pdf 
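
For the trade-off between channel count and data length mentioned above, a back-of-the-envelope sketch may help. It assumes the k*n^2 rule of thumb from the original question; the helper names and the 128-channel, 30-minute example are hypothetical. The result is only a starting point, since how much variance the retained principal components explain has to be checked on the actual data.

```python
import math

def minutes_needed(k, n_dims, srate_hz):
    """Recording length (minutes) that supplies k * n_dims^2 frames."""
    return k * n_dims ** 2 / srate_hz / 60.0

def max_dims(n_frames, k):
    """Largest n such that k * n^2 <= n_frames (e.g. PCA dimension before ICA)."""
    return math.isqrt(n_frames // k)

srate = 250                      # Hz, as in the k=100 example above
print(minutes_needed(k=100, n_dims=128, srate_hz=srate))  # full rank, 128 channels
frames = 30 * 60 * srate         # a hypothetical 30-minute recording
print(max_dims(frames, k=100))   # dimensions that 30 minutes can support at k=100
```

With these numbers, a full-rank decomposition of 128 channels at k=100 would need roughly 109 minutes of data, whereas 30 minutes supports about 67 dimensions.
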

Best wishes,
Germán

> With regard to the comment about decomposing dissimilar data separately,
> perhaps a distinction should be made. Arno is correct that by far the easiest
> approach is to look at *activity* differences within single sources. However,
> what I point out in the paper is that vastly different behavioral conditions
> (i.e., sleep and wake) may show fundamentally different active sources. Within
> most experimental paradigms, the difference between conditions is MUCH less
> than the difference between sleep and wake, therefore warranting a single
> decomposition for both conditions. For your own curiosity, try decomposing the
> 2 conditions separately and see how similar your components are (assuming you
> have enough data to get clean decompositions).
> 
> Now, the amount of *good* data that you will need is, of course, a slightly
> hazy subject. I have previously recommended a points per weight factor of 25
> or more, but this is based on 71 channels (dimensions) and a 256 Hz sampling
> rate. More channels require more data, of course, but I'm not sure if the
> points per weight factor remains the same when the channel number scales up.
> The more the better, usually... I got good decompositions for ~215 channel EEG
> with about an hour's worth of data. A half hour would likely not have been
> enough. For 128 channels, you can get away with less data... perhaps a half
> hour even, but not less, I would guess. Sorry for the highly anecdotal answer,
> but I have found that there is a lot of variability between subjects even
> with comparable amounts of data... therefore pointing to individual
> differences in the 'quality' of data that ICA is good at decomposing. But
> that's just a theory.
> 
> Good luck, Julie
> 
> --
> Julie Onton, PhD
> http://sccn.ucsd.edu/~julie
> 
> > Dear all,
> >
> > as it follows from J Onton’s paper: “… jointly decomposing data from awake
> > and sleeping conditions might not be optimal if the EEG source locations in
> > these portions of the data differed …” we do need to separate the 2
> > conditions and then perform ICA on each of them separately.
> > I guess it should not be a problem to cut out some of the data (artifacts)
> > from the EEG recording using the EEGLAB Reject function and then run ICA on
> > the remaining data, but then the question of enough data arises. Several
> > publications have suggested using k*n^2 data points (where k is a
> > coefficient and n is the number of channels) to get stable ICA results. Some
> > have suggested a k of 20 (128-channel EEG) (McMenamin 2009) or even bigger.
> > What experiences do you have?
> > How much data will one need to get stable ICA components in a 128-channel
> > recording?
> > And thanks a lot for the previous answers, you help a lot.
> >
> >
> >       _______________________________________________
> > Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> > To unsubscribe, send an empty email to
> > eeglablist-unsubscribe at sccn.ucsd.edu
> > For digest mode, send an email with the subject "set digest mime" to
> > eeglablist-request at sccn.ucsd.edu
> 

