[Eeglablist] ICA "adds" noise?

Wed Jan 23 20:16:06 PST 2013

Hi Jason,
    Thanks for pointing out that the tolerance parameter of the rank command needs to be taken into account.  However, the tolerance isn't the issue here, as the singular values below show:

unreferenced data (250 time points x 129 channels):
>> unsvd=svd(undata);

the last five singular values are: 
0.393656559503141
0.386648744208800
0.382247320277519
0.367742048503416
0

mean mastoid data:
>> mmsvd=svd(mmdata);

the last five singular values are: 
0.389856169308842
0.382248913003272
0.369775384079404
0.256844712944802
8.47133928749350e-14

average reference data:
>> arsvd=svd(ardata);

the last five singular values are: 
0.388271614071926
0.382248911271762
0.369496676930511
0.256588822168438
1.07911880666156e-12

here is a mean mastoid transformation of the average reference data (messy I know):
ardatamm=(ardata'-(ones(size(ardata'))*diag(mean(ardata(:,[57 101])'))))';
>> svdarmm=svd(ardatamm);

the last five singular values are: 
0.389856169308836
0.382248913003250
0.369775384079424
0.256844712944805
6.17491551452385e-14
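
For what it's worth, the messy mean mastoid expression above can probably be written more readably with bsxfun. The following is only a sketch on synthetic data: the matrix x is a random stand-in for ardata (not the real recording), and the variable names are made up for illustration. It checks that the two formulations agree to round-off.

```matlab
% Synthetic stand-in for the real data: 250 time points x 129 channels.
x = randn(250, 129);
mast = [57 101];                  % mastoid channel indices, as in the post

% Original formulation: build a full matrix of per-sample mastoid means
% via ones(...)*diag(...) and subtract it from the transposed data.
a = (x' - (ones(size(x')) * diag(mean(x(:, mast)'))))';

% Equivalent formulation: subtract the per-sample mastoid mean (a column
% vector, one value per time point) from every channel.
b = bsxfun(@minus, x, mean(x(:, mast), 2));

% Largest absolute difference between the two results; this should be
% zero or at round-off level.
maxDiff = max(abs(a(:) - b(:)));
```

The bsxfun form avoids building the 250 x 250 diagonal matrix, which matters for longer recordings.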

here is an average reference of the mean mastoid transformation of the average reference data:
armmarData=(ardatamm'-(ones(size(ardatamm'))*diag(mean(ardatamm(:,:)'))))';
>> armmarsvd=svd(armmarData);

the last five singular values are: 
0.388271614071925
0.382248911271741
0.369496676930526
0.256588822168444
2.27618061791724e-12

There is a bit of a drop-off in the second-to-last singular value after the initial rereferencing (from about 0.37 to about 0.26), perhaps due to the introduction of imprecision, but not to the point of losing any further rank, and it doesn't appear to be cumulative with further rereferencing operations.
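
To make the tolerance point concrete, here is a rough sketch of how the rank count relates to the singular values. It uses synthetic random data rather than the recordings above, and the variable names are made up. The tolerance formula is the documented default for MATLAB's rank(), max(size(A))*eps(norm(A)), where norm(A) equals the largest singular value.

```matlab
% Synthetic stand-in: 128 independent channels plus a flat reference channel.
x = [randn(250, 128), zeros(250, 1)];

% Average reference with the flat reference channel kept in the matrix.
ar = bsxfun(@minus, x, mean(x, 2));

s = svd(ar);                          % singular values, largest first

% Default tolerance documented for MATLAB's rank().
tol = max(size(ar)) * eps(max(s));

% Numerical rank: the number of singular values above the tolerance.
rDefault = sum(s > tol);

% Size of the drop-off between the last two singular values
% (realmin guards against division by an exact zero).
gap = s(end-1) / max(s(end), realmin);
```

If the drop-off (gap) is many orders of magnitude, as it is in the listings above, then any reasonable choice of tolerance gives the same rank.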
So what I'm seeing is that if the data matrix has a rank of n-1 and the rereferencing matrix has a rank of n-1, then the resulting rereferenced data still has a rank of n-1.  You only start losing rank if you drop the reference channel and rereference the rest of the data without it.  To take things to the logical extreme, consider what happens if you rereference a dataset to Cz over and over again.  You won't lose a rank every time, since nothing changes after the first rereference (Cz is now zero, and subtracting zero from the rest of the channels changes nothing).  So rereferencing data (whether to average reference, mean mastoid reference, or anything else) won't change the rank as long as one remembers to include the original reference channel in the matrix.  That said, it definitely makes sense to drop a reference channel and/or reduce the subspace by one if one is doing an ICA of this data.
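
The argument can be checked on made-up data. This is a minimal sketch, assuming random Gaussian data in place of real EEG, a flat reference in the last column, and the same mastoid indices as above. The variable names are hypothetical.

```matlab
nT = 250; nCh = 129; refIdx = 129;     % flat reference stored in last column
mast = [57 101];                       % mastoid channel indices

% 128 independent channels plus the flat (all-zero) reference channel.
x = [randn(nT, nCh - 1), zeros(nT, 1)];

% Rereference with the flat reference channel kept in the matrix.
ar = bsxfun(@minus, x, mean(x, 2));             % average reference
mm = bsxfun(@minus, x, mean(x(:, mast), 2));    % mean mastoid reference

% Drop the reference channel first, then average reference the rest.
x128  = x(:, 1:nCh - 1);
ar128 = bsxfun(@minus, x128, mean(x128, 2));

% Rereference to the single reference channel twice: after the first pass
% that channel is exactly zero, so the second pass changes nothing.
cz1 = bsxfun(@minus, mm,  mm(:, refIdx));
cz2 = bsxfun(@minus, cz1, cz1(:, refIdx));
unchanged = isequal(cz1, cz2);

% Expected under these assumptions: [128 128 128 127]
ranks = [rank(x), rank(ar), rank(mm), rank(ar128)];
```

Only the case where the reference channel was dropped before rereferencing should come out one rank lower.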

Joe

On Jan 23, 2013, at 8:45 PM, Jason Palmer <japalmer29 at gmail.com> wrote:

> Hi Joseph et al.,
>  
> I believe that any reference, average or channel, does reduce the rank by one. This is straightforward to show using linear algebra, e.g., here:
>  
> http://sccn.ucsd.edu/wiki/Linear_Representations_and_Basis_Vectors#EEG_Data_Reference_and_Re-referencing
>  
> The rank that matlab gives you depends on the tolerance used to declare small dimensions zero. E.g. a rank deficient matrix usually has smallest eigenvalue of around 1e-15 to 1e-8, due to numerical imprecision, particularly with large ill-conditioned matrices. You should see a sudden “drop off” though in the eigenvalue magnitudes after the theoretical rank.
>  
> Best,
> Jason
>  
> From: eeglablist-bounces at sccn.ucsd.edu [mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of Joseph Dien
> Sent: Wednesday, January 23, 2013 11:54 AM
> To: Matt Craddock
> Cc: eeglablist at sccn.ucsd.edu; Kristina Borgström
> Subject: Re: [Eeglablist] ICA "adds" noise?
>  
> Hmmm…. I should know better than to talk off the top of my head like that…
>  
> Well, part right, part wrong.
>  
> It's easy enough to just try it out and see what Matlab says the rank is of the data.  I took some Cz-referenced 129-channel data and then rereferenced it to mean mastoid and to average reference.
>  
> >> rank(undata)
>  
> ans =
>  
>    128
>  
> >> rank(mmdata)
>  
> ans =
>  
>    128
>  
> >> rank(ardata)
>  
> ans =
>  
>    128
>  
> so the bottom line is that average reference doesn't reduce the rank but neither does mean mastoid (my error).
>  
> As for the wiki page you linked to, it's worded in a confusing way.  It's not that "the average reference reduces the rank of the data" necessarily.  What it should say is that it doesn't increase the rank of the data.  So if you start off with 128 recording channels and a reference channel (n=129) and rank is 128 (because voltage data is relative by definition so two channels only give you one waveform), then after average reference, the rank is still 128 even though it now looks as though you've got 129 channels with independent waveforms.  If you dropped the 129th reference channel and then computed an average reference channel, you would indeed lose another rank (as seen below) but that would be because you were doing the procedure incorrectly and had deleted a channel of meaningful information (even though it is flat).  The flat reference channel should always be included in the average reference computation.  The same goes for computing the mean mastoid reference (or any other rereference), although unfortunately a lot of systems throw that information away.
>  
>  
> >> rank(ar128data)
>  
> ans =
>  
>    127
>  
> I definitely need to look into the effects of bridging more closely, especially for frequency-based applications.  This has been a very helpful discussion!
>  
> Joe
>  
>  
>  
> On Jan 23, 2013, at 6:44 AM, Matt Craddock <matt.craddock at uni-leipzig.de> wrote:
> 
> 
> On 18/01/2013 20:39, Joseph Dien wrote:
> 
> Another thought occurs to me.  I have indeed noticed a tendency for
> increased noise to show up in my own ICA-based artifact correction
> routine in the EP Toolkit (Tim Curran first reported it to me).  I've
> never worked out why.  I ended up implementing a trial-by-trial
> workaround wherein the eyeblink factors are removed from a given
> trial only when it reduces the overall variance of the trial.  In
> other words, when the benefit outweighs the cost.  The increased
> noise that I see is small enough that it gets averaged out for ERPs
> so has not been an issue.  Could be an issue for frequency-based
> measures though.  I need to look into this further.  Anyway, what
> you're reporting seems more severe than anything I've observed so
> perhaps something different.
> 
> Hi Joe, Kristina, and all,
> 
> I'm mostly dealing with frequency analysis; the noise does indeed pose
> some problems for frequency-based measures, since it translates into
> noise in the gamma band range (>40Hz, mostly). This issue has been
> reported previously on this list:
> 
> http://sccn.ucsd.edu/pipermail/eeglablist/2011/004316.html
> 
> and the conclusion then was that it was down to reduced rank:
> 
> http://sccn.ucsd.edu/pipermail/eeglablist/2011/004319.html
> 
> Hence why I jumped on that as an explanation when I saw Kristina's
> original post. My situation turns out to be a little different from
> hers, in that I use average reference rather than linked mastoids, and
> don't keep a reference channel in the data, so it didn't seem to be caused by the duplicate data issue Makoto identified (although sometimes it may have been - see later; but wouldn't that also be a rank reduction?). In my case I've found doing PCA first (reducing number of components to the rank, so usually only to numChannels-1) makes this problem go away, but given that everybody said avoid doing that first, I also had a closer look at the datasets where I'd had this problem and found in some cases that there were *very* high correlations between some channels (.99 in one case!). Removing one of those channels before running ICA (and *not* doing PCA) also fixed the problem. I didn't see any major differences in the components between PCAing first and removing the channels, though of course that's not to say there aren't any that would emerge if looking at them more systematically!
> 
> 
> Average reference doesn't reduce the rank.  Basically all it does is
> to virtually move the reference site.  In the original
> vertex-referenced data, there is informational ambiguity as to
> whether recorded voltage fluctuations are due to activity at the
> reference site or at the recording site (unavoidable since voltages
> are by their nature relative and so require a reference site).  When
> one algebraically rereferences the data to a different single
> reference site (including the virtual reference site of average
> reference) there is no increase in informational ambiguity.  Mean
> mastoid reference does increase informational ambiguity
> because it introduces a new ambiguity of whether reference site
> activity is occurring at the left or right mastoid.  In essence, this
> is because a subset of the total set of electrodes has been singled
> out and mixed together.  This increased ambiguity reduces the rank by
> one.
> 
> Hmm, but this page says that average reference does reduce rank, and that's been what people have said on this list for quite a while.
> http://sccn.ucsd.edu/wiki/Linear_Representations_and_Basis_Vectors
> 
> Happy to be corrected, but on the whole I'm left a little puzzled - the default behaviour of EEGlab's GUI is to suggest PCA reduction if your data is reduced rank. Given the consensus seems to be to avoid PCA, does that need to change, or at least to suggest people be very cautious about using it and try to find alternative ways of conditioning their data, be that removing channels or whatever?
> 
> Cheers,
> Matt
> 
> -- 
> Dr. Matt Craddock
> 
> Post-doctoral researcher,
> Institute of Psychology,
> University of Leipzig,
> Seeburgstr. 14-20,
> 04103 Leipzig, Germany
> Phone: +49 341 973 95 44
>  
> 
> --------------------------------------------------------------------------------
>  
> Joseph Dien,
> Senior Research Scientist
> University of Maryland 
>  
> E-mail: jdien07 at mac.com
> Phone: 301-226-8848
> Fax: 301-226-8811
> http://joedien.com//
>  

--------------------------------------------------------------------------------

Joseph Dien,
Senior Research Scientist
University of Maryland 

E-mail: jdien07 at mac.com
Phone: 301-226-8848
Fax: 301-226-8811
http://joedien.com//
