[Eeglablist] ICA "adds" noise?

Jason Palmer japalmer29 at gmail.com
Wed Jan 23 20:30:50 PST 2013


Hi Joseph,

Yes, sorry if I misunderstood what you were saying. The rank of the data
after re-referencing should generally be the minimum of the original data
rank and num_chan-1.
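
A minimal sketch of this rule on synthetic data. The sizes, the random seed, and the names X_full, X_low and H are illustrative assumptions, not anything from the recordings discussed in this thread. The average-reference operator H = I - ones(n)/n has rank n-1, so full-rank data loses one rank, while data whose rank is already lower should generally keep its rank.

```matlab
% Synthetic check of: rank after reference = min(old rank, num_chan-1).
% Sizes, seed and variable names are illustrative assumptions.
rng(0);                                           % reproducible random data
num_chan = 8;  num_pts = 250;
H = eye(num_chan) - ones(num_chan)/num_chan;      % average-reference operator

X_full = randn(num_pts, num_chan);                % rank 8 (time points x channels)
X_low  = randn(num_pts, 3) * randn(3, num_chan);  % rank 3 by construction

r_full = rank(X_full * H);                        % min(8, 7)
r_low  = rank(X_low  * H);                        % min(3, 7)
fprintf('full-rank data: %d, rank-3 data: %d\n', r_full, r_low);
```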

Best,

Jason

 

From: Joseph Dien [mailto:jdien07 at mac.com] 
Sent: Wednesday, January 23, 2013 8:16 PM
To: japalmer at ucsd.edu
Cc: 'Matt Craddock'; eeglablist at sccn.ucsd.edu; 'Kristina Borgström'
Subject: Re: [Eeglablist] ICA "adds" noise?

 

Hi Jason,

    Thanks for pointing out that the tolerance parameter in the rank command
needs acknowledgment.  However, the tolerance parameter isn't an issue here
(see following):

 

unreferenced data (250 time points x 129 channels):

>> unsvd=svd(undata);

the last five singular values are:

0.393656559503141
0.386648744208800
0.382247320277519
0.367742048503416
0

 

mean mastoid data:

>> mmsvd=svd(mmdata);

the last five singular values are:

0.389856169308842
0.382248913003272
0.369775384079404
0.256844712944802
8.47133928749350e-14

 

average reference data:

>> arsvd=svd(ardata);

the last five singular values are:

0.388271614071926
0.382248911271762
0.369496676930511
0.256588822168438
1.07911880666156e-12

 

 

here is a mean mastoid transformation of the average reference data (messy I
know):

>> ardatamm=(ardata'-(ones(size(ardata'))*diag(mean(ardata(:,[57 101])'))))';
>> svdarmm=svd(ardatamm);

the last five singular values are:

0.389856169308836
0.382248913003250
0.369775384079424
0.256844712944805
6.17491551452385e-14

 

here is an average reference of the mean mastoid transformation of the
average reference data:

>> armmarData=(ardatamm'-(ones(size(ardatamm'))*diag(mean(ardatamm(:,:)'))))';
>> armmarsvd=svd(armmarData);

the last five singular values are:

0.388271614071925
0.382248911271741
0.369496676930526
0.256588822168444
2.27618061791724e-12

 

There is a little bit of a drop-off in the second-to-last singular value
with the initial rereferencing, perhaps due to the introduction of
imprecision, but not to the point of losing any more ranks and it doesn't
appear to be cumulative with further rereferencing operations.

So what I'm seeing is that if the data matrix has a rank of n-1 and the
rereferencing matrix has a rank of n-1, then the resulting rereferenced data
still has a rank of n-1.  You only start losing ranks if you drop the
reference channel and rereference the rest of the data without it.  To take
things to the logical extreme, consider what happens if you repeatedly
rereference a dataset to Cz.  You won't be losing a rank every time, as
nothing will change after the first rereference (since Cz will now be zero
and subtracting zero from the rest of the channels will not change
anything).  So rereferencing data (whether to average reference or mean
mastoid reference or anything else) won't change the rank as long as one
remembers to include the original reference channel in the matrix.  That
said, if one is doing an ICA of this data, it definitely makes sense to drop
a reference channel and/or reduce the subspace by one.
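
The argument can be sketched with explicit rereferencing matrices. Writing a reference as R = I - 1*w' (w holds the reference weights, which sum to one), each such matrix has rank n-1, applying the same one twice changes nothing, and in a chain of rereferences only the last one matters. That is consistent with the singular values above, where the rereferenced-again data reproduce the mean mastoid and average reference values. The channel count and the channel indices used for the "mastoids" and "Cz" below are hypothetical.

```matlab
% Rereferencing as a matrix acting on a column vector of channel voltages.
% n and the channel indices are hypothetical, chosen only for illustration.
n   = 6;
one = ones(n, 1);
w_avg = one / n;                          % average reference weights
w_mm  = zeros(n, 1);  w_mm([2 5]) = 0.5;  % mean of two "mastoid" channels
w_cz  = zeros(n, 1);  w_cz(n) = 1;        % single-channel reference ("Cz")
R = @(w) eye(n) - one * w';               % rereferencing matrix for weights w

% average ref -> mean mastoid -> average ref equals a single average ref
err_chain = norm(R(w_avg) * R(w_mm) * R(w_avg) - R(w_avg), 'fro');
% rereferencing to Cz twice equals rereferencing to Cz once
err_idem  = norm(R(w_cz) * R(w_cz) - R(w_cz), 'fro');
% every rereferencing matrix has rank n-1
ranks = [rank(R(w_avg)), rank(R(w_mm)), rank(R(w_cz))];
```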

 

Joe

 

 

On Jan 23, 2013, at 8:45 PM, Jason Palmer <japalmer29 at gmail.com> wrote:





Hi Joseph et al.,

 

I believe that any reference, average or channel, does reduce the rank by
one. This is straightforward to show using linear algebra, e.g., here:

http://sccn.ucsd.edu/wiki/Linear_Representations_and_Basis_Vectors#EEG_Data_Reference_and_Re-referencing

 

The rank that MATLAB gives you depends on the tolerance used to declare
small dimensions zero. E.g., a rank-deficient matrix usually has a smallest
eigenvalue of around 1e-15 to 1e-8 rather than exactly zero, due to
numerical imprecision, particularly with large ill-conditioned matrices. You
should see a sudden “drop off” though in the eigenvalue magnitudes after the
theoretical rank.
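
As a rough illustration on synthetic data (sizes, seed and variable names are assumptions), the default tolerance documented for rank(A) is max(size(A))*eps(norm(A)), and the position of the largest relative drop between consecutive singular values can be compared with the rank obtained from that tolerance:

```matlab
% Compare the tolerance-based rank with the location of the "drop off".
% Synthetic data; sizes and names are illustrative assumptions.
rng(1);
A = randn(250, 16);
A = A - mean(A, 2) * ones(1, 16);       % average reference: theoretical rank 15
s = svd(A);                             % singular values, largest first

tol = max(size(A)) * eps(max(s));       % default rank tolerance (norm(A) = max(s))
r_default = sum(s > tol);               % singular values counted as non-zero

ratios = s(1:end-1) ./ s(2:end);        % relative drop between neighbours
[~, r_gap] = max(ratios);               % index just before the largest drop
fprintf('rank by tolerance: %d, rank by largest drop: %d\n', r_default, r_gap);
```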

 

Best,

Jason

 

From: eeglablist-bounces at sccn.ucsd.edu
[mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of Joseph Dien
Sent: Wednesday, January 23, 2013 11:54 AM
To: Matt Craddock
Cc: eeglablist at sccn.ucsd.edu; Kristina Borgström
Subject: Re: [Eeglablist] ICA "adds" noise?

 

Hmmm. I should know better than to talk off the top of my head like that.


 

Well, part right, part wrong.

 

It's easy enough to just try it out and see what Matlab says the rank is of
the data.  I took some Cz-referenced 129-channel data and then rereferenced
it to mean mastoid and to average reference.

>> rank(undata)

ans =

   128

>> rank(mmdata)

ans =

   128

>> rank(ardata)

ans =

   128

 

so the bottom line is that average reference doesn't reduce the rank but
neither does mean mastoid (my error).

 

As for the wiki page you linked to, it's worded in a confusing way.  It's
not that "the average reference reduces the rank of the data" necessarily.
What it should say is that it doesn't increase the rank of the data.  So if
you start off with 128 recording channels and a reference channel (n=129)
and rank is 128 (because voltage data is relative by definition so two
channels only give you one waveform), then after average reference, the rank
is still 128 even though it now looks as though you've got 129 channels with
independent waveforms.  If you dropped the 129th reference channel and then
computed an average reference channel, you would indeed lose another rank
(as seen below) but that would be because you were doing the procedure
incorrectly and had deleted a channel of meaningful information (even though
it is flat).  The flat reference channel should always be included in the
average reference computation.  The same goes for computing the mean mastoid
reference (or any other rereference), although unfortunately a lot of
systems throw that information away.

>> rank(ar128data)

ans =

   127
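
The same effect can be reproduced on synthetic data. This is only a sketch: the channel count, the seed and the helper avgref are made up for illustration, with a flat last column standing in for the reference channel.

```matlab
% Average reference with and without the flat reference channel.
% Channel count, seed and the helper name avgref are illustrative assumptions.
rng(3);
n = 9;                                    % 8 recording channels + 1 reference
X = [randn(250, n-1), zeros(250, 1)];     % last column: flat reference channel
avgref = @(D) D - mean(D, 2) * ones(1, size(D, 2));  % subtract channel mean per time point

r_keep = rank(avgref(X));                 % reference included in the average
r_drop = rank(avgref(X(:, 1:n-1)));       % reference dropped before averaging
fprintf('kept: %d, dropped: %d\n', r_keep, r_drop);
```

With the reference column kept, the rank stays at n-1; with it dropped first, one more rank is lost.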

 

I definitely need to look into the effects of bridging more closely,
especially for frequency-based applications.  This has been a very helpful
discussion!

 

Joe

 

 

 

On Jan 23, 2013, at 6:44 AM, Matt Craddock <matt.craddock at uni-leipzig.de> wrote:






On 18/01/2013 20:39, Joseph Dien wrote:




Another thought occurs to me.  I have indeed noticed a tendency for
increased noise to show up in my own ICA-based artifact correction
routine in the EP Toolkit (Tim Curran first reported it to me).  I've
never worked out why.  I ended up implementing a trial-by-trial
workaround wherein the eyeblink factors are removed from a given
trial only when it reduces the overall variance of the trial.  In
other words, when the benefit outweighs the cost.  The increased
noise that I see is small enough that it gets averaged out for ERPs
so has not been an issue.  Could be an issue for frequency-based
measures though.  I need to look into this further.  Anyway, what
you're reporting seems more severe than anything I've observed so
perhaps something different.


Hi Joe, Kristina, and all,

I'm mostly dealing with frequency analysis; the noise does indeed pose
some problems for frequency-based measures, since it translates into
noise in the gamma band range (>40Hz, mostly). This issue has been
reported previously on this list:

http://sccn.ucsd.edu/pipermail/eeglablist/2011/004316.html

and the conclusion then was that it was down to reduced rank:

http://sccn.ucsd.edu/pipermail/eeglablist/2011/004319.html

Hence why I jumped on that as an explanation when I saw Kristina's
original post. My situation turns out to be a little different from
hers, in that I use average reference rather than linked mastoids, and
don't keep a reference channel in the data, so it didn't seem to be caused
by the duplicate data issue Makoto identified (although sometimes it may
have been - see later; but wouldn't that also be a rank reduction?). In my
case I've found doing PCA first (reducing number of components to the rank,
so usually only to numChannels-1) makes this problem go away, but given that
everybody said avoid doing that first, I also had a closer look at the
datasets where I'd had this problem and found in some cases that there were
*very* high correlations between some channels (.99 in one case!). Removing
one of those channels before running ICA (and *not* doing PCA) also fixed
the problem. I didn't see any major differences in the components between
PCAing first and removing the channels, though of course that's not to say
there aren't any that would emerge if looking at them more systematically!
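
One way to look for such near-duplicate channels before running ICA is sketched below on synthetic data. The 0.99 threshold simply echoes the value mentioned above, and the data, seed and variable names are assumptions rather than anything from the datasets in question.

```matlab
% Flag channel pairs whose correlation exceeds a threshold.
% Synthetic data; the threshold and all names are illustrative assumptions.
rng(2);
X = randn(1000, 6);                         % time points x channels
X(:, 4) = X(:, 2) + 0.01 * randn(1000, 1);  % channel 4 nearly duplicates channel 2

C = corrcoef(X);                            % channel-by-channel correlations
[i, j] = find(triu(abs(C) > 0.99, 1));      % upper triangle: each pair listed once
pairs = [i, j];                             % one row per highly correlated pair
disp(pairs);
```

Each row of pairs names two channels, one of which could be removed (or the data reduced by PCA) before decomposition.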





Average reference doesn't reduce the rank.  Basically all it does is
to virtually move the reference site.  In the original
vertex-referenced data, there is informational ambiguity as to
whether recorded voltage fluctuations are due to activity at the
reference site or at the recording site (unavoidable since voltages
are by their nature relative and so require a reference site).  When
one algebraically rereferences the data to a different single
reference site (including the virtual reference site of average
reference) there is no increase in informational ambiguity.  Mean
mastoid reference does increase informational ambiguity
because it introduces a new ambiguity of whether reference site
activity is occurring at the left or right mastoid.  In essence, this
is because a subset of the total set of electrodes has been singled
out and mixed together.  This increased ambiguity reduces the rank by
one.


Hmm, but this page says that average reference does reduce rank, and that's
been what people have said on this list for quite a while.
http://sccn.ucsd.edu/wiki/Linear_Representations_and_Basis_Vectors

Happy to be corrected, but on the whole I'm left a little puzzled - the
default behaviour of EEGLAB's GUI is to suggest PCA reduction if your data
is reduced rank. Given the consensus seems to be to avoid PCA, does that
need to change, or at least to suggest people be very cautious about using
it and try to find alternative ways of conditioning their data, be that
removing channels or whatever?

Cheers,
Matt

-- 
Dr. Matt Craddock

Post-doctoral researcher,
Institute of Psychology,
University of Leipzig,
Seeburgstr. 14-20,
04103 Leipzig, Germany
Phone: +49 341 973 95 44

 


--------------------------------------------------------------------------------

Joseph Dien,
Senior Research Scientist
University of Maryland

E-mail: jdien07 at mac.com
Phone: 301-226-8848
Fax: 301-226-8811
http://joedien.com/
