[Eeglablist] An open experiment on the data rank issue with ICA

Mon Apr 10 14:48:12 PDT 2023

Dear Arno,

> The most important thing, I think, is to show that these problems do not
> occur if the matrix is full rank and one does not use an average reference
> (not including the original reference).

At the very least, I suggest we distinguish between 'real rank' (as computed
by MATLAB's rank() function) and 'effective rank' (the number of eigenvalues
greater than 10^-7). Otherwise, we cannot escape the confusion we saw before
this paper was published.
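A minimal Python sketch of the two definitions (not MATLAB or EEGLAB code), assuming the eigenvalues of the data covariance matrix are already available. For a symmetric positive semidefinite matrix the singular values equal the eigenvalues, so 'real rank' is approximated with a MATLAB-like default tolerance of max(size) * eps * largest value; MATLAB itself uses eps(norm(A)), so this is an approximation. The eigenvalues below are hypothetical, with the last one standing in for a numerical-error residue near 1e-10.

```python
import sys

# Machine epsilon for 64-bit floats (about 2.22e-16), like MATLAB's eps.
EPS = sys.float_info.epsilon


def rank_measures(eigvals, threshold=1e-7):
    """Return (real_rank, effective_rank) for a symmetric PSD covariance.

    real_rank: values above a size- and scale-dependent tolerance
    (assumption: n * eps * max, approximating MATLAB's rank() default).
    effective_rank: eigenvalues above a fixed absolute threshold.
    """
    n = len(eigvals)
    largest = max(abs(v) for v in eigvals)
    tol = n * EPS * largest
    real_rank = sum(1 for v in eigvals if abs(v) > tol)
    effective_rank = sum(1 for v in eigvals if v > threshold)
    return real_rank, effective_rank


# Hypothetical eigenvalues after average referencing: the last one should be
# exactly zero but carries a small numerical-error residue.
eigvals = [386.1, 120.0, 15.0, 1e-10]
print(rank_measures(eigvals))  # (4, 3)
```

The two counts disagree in exactly the situation described in the paper: a tiny non-zero residue survives a MATLAB-style rank() count but not the 10^-7 effective-rank cutoff.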

Publishing a paper is only one step beyond writing to this mailing list,
but it gives us credit, which makes the effort more justifiable.
All the coauthors and I are grateful to EEGLAB and its community for
giving us this opportunity to address the issue. In particular, I am happy
that Sven's original idea for fixing the issue is now officially credited.
Thanks, Arno!

Makoto

On Wed, Apr 5, 2023 at 5:04 PM Arnaud Delorme via eeglablist <
eeglablist at sccn.ucsd.edu> wrote:

> Hi Makoto,
>
> Your paper lists these main concerns:
>
> "1. EEGLAB does not account for the initial reference electrode.
> Therefore, EEGLAB reduces data rank by re-referencing. This violates the
> first and second properties described previously (Hu et al., 2019)."
>
> This is not correct. You can add back the common reference to the data
> when you re-reference the data, with no loss of data rank. There is an entire
> page in the tutorial on this process (and we pointed you to this page
> several times). But you are right that maybe this should be the default
> method when the reference is a scalp channel. The issue is that people need
> to provide the location of the reference, which might not be available to
> them.
>
>
> https://urldefense.com/v3/__https://eeglab.org/tutorials/05_Preprocess/rereferencing.html*retaining-the-reference-channel__;Iw!!Mih3wA!FjmU8KUeMMhf-Jh7ww7xfFT3jYcM12PSs9M0B27FWu9BRXjS0tddjZDWqxaNvDMXRtClBxhZ7a_xfT_SGXGZJM5G$
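The rank arithmetic behind "adding back the reference" can be illustrated with a short, hypothetical Python sketch on plain lists; it is not EEGLAB's reref()/pop_reref() code. The three "recorded" channels are made-up, linearly independent signals referenced to an unrecorded scalp site, and the reference channel is added back as a row of zeros (its potential relative to itself) before the average reference is computed.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new dimension
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def average_reference(channels):
    """Subtract the across-channel mean from every channel at each sample."""
    n = len(channels)
    means = [sum(column) / n for column in zip(*channels)]
    return [[x - mu for x, mu in zip(ch, means)] for ch in channels]


# Hypothetical data: 3 channels x 6 samples, referenced to an unrecorded
# scalp site and linearly independent (rank 3).
recorded = [
    [1, 0, 0, 2, 1, 0],
    [0, 1, 0, 1, 0, 3],
    [0, 0, 1, 0, 2, 1],
]
reference = [0.0] * 6  # the reference site relative to itself is flat zero

print(matrix_rank(recorded))                                   # 3
print(matrix_rank(average_reference(recorded)))                # 2: one dimension lost
print(matrix_rank(average_reference(recorded + [reference])))  # 3: rank preserved
```

Average referencing the three recorded channels alone forces them to sum to zero at every sample, so one dimension is lost; including the zero-valued reference channel first keeps the original rank, at the cost of one extra channel in the dataset.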
>
> "2. Although EEGLAB's implementation of the ICA (pop_runica) includes an
> effective rank deficiency checker,"
>
> This is also not entirely correct. EEGLAB implements two methods to check
> rank, not one, and we have been discussing adding a third because of
> numerical inaccuracies and the fact that computing the rank sometimes
> returns incorrect results. But this is linked to point 3 below, so I see
> your point.
>
> "3. Even if the data rank is ensured to be cleanly deficient by one [which
> can be detected by using the rank() function] through EEGLAB's
> re-referencing process, EEGLAB calculates λmin, which reintroduces a
> non-zero small number (typically <10^-10) via numerical error. This non-zero
> noise forces effectively rank-deficient decomposition."
>
> Interesting. I loaded the tutorial dataset and computed the average
> reference. Then I used PCA.
>
> [pc,eigvec,sv] = runpca(double(EEG.data));
>
> The last eigenvalue, corresponding to the dimension that would be
> discarded, is
>
> sv(end,end)
>
> >> 8.4791e-04
>
> Compare this with the second-smallest eigenvalue:
>
> sv(end-1,end-1)
>
> >> 386.1381
>
> Now, it may well be that a 10^6 difference in scale between eigenvalues is
> enough to disrupt the ICA decomposition, as claimed in the paper, but this
> will depend on the number of channels, etc.
>
> Also, credit should have been given to this tutorial page, which outlined
> the problem more than ten years ago.
>
>
> https://urldefense.com/v3/__https://eeglab.org/tutorials/06_RejectArtifacts/RunICA.html*how-to-deal-with-corrupted-ica-decompositions__;Iw!!Mih3wA!FjmU8KUeMMhf-Jh7ww7xfFT3jYcM12PSs9M0B27FWu9BRXjS0tddjZDWqxaNvDMXRtClBxhZ7a_xfT_SGa_pb_A0$
>
> The most important thing, I think, is to show that these problems do not
> occur if the matrix is full rank and one does not use an average reference
> (not including the original reference).
> Also puzzling is that using aggressive PCA dimension reduction, as outlined
> in the page above, seems to partially solve the problem, which appears to
> contradict the conclusion of the paper.
>
> I compared the code you contributed (the reref.m function), but it is
> identical to the one in EEGLAB, so I am confused.
>
> Note that this is also linked to a well-known problem: ICA requires high
> numerical precision. If you run ICA in single precision (32-bit floating
> point), the results will differ from those obtained in double precision
> (64-bit floating point). The PCA problem outlined in the paper is also a
> numerical-precision issue.
>
> Proposed actions in EEGLAB:
> 1. Update the documentation (Done)
> https://urldefense.com/v3/__https://eeglab.org/tutorials/06_RejectArtifacts/RunICA.html*how-to-deal-with-corrupted-ica-decompositions__;Iw!!Mih3wA!FjmU8KUeMMhf-Jh7ww7xfFT3jYcM12PSs9M0B27FWu9BRXjS0tddjZDWqxaNvDMXRtClBxhZ7a_xfT_SGa_pb_A0$
> 2. I think the best strategy is to systematically run PCA before ICA when
> the rank is reduced, and then check that the eigenvalues for the
> dimensions to be removed are below 1e-7 (or should we check the ratio of
> eigenvalues?). If this ratio is larger than 1e-7, issue a warning in red
> indicating that dimension reduction is not appropriate and that people
> can expect ghost ICs.
> 3. Add a warning when people re-reference the data, advising them to
> re-reference after ICA.
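Point 2 could look roughly like the following Python sketch. It is not EEGLAB's runica/pop_runica code: the function name, the message texts, and the choice to apply the 1e-7 threshold both to the absolute eigenvalue and to the ratio (largest discarded / smallest retained) are assumptions, since the proposal leaves that choice open. The usage line plugs in the two eigenvalues from the tutorial-dataset example above; only the last n_remove + 1 values matter for the check.

```python
def check_pca_reduction(eigvals, n_remove, abs_threshold=1e-7, ratio_threshold=1e-7):
    """Hypothetical pre-ICA sanity check for PCA dimension reduction.

    eigvals: covariance eigenvalues sorted in descending order.
    n_remove: number of trailing dimensions that PCA would discard.
    Returns a list of warning messages; an empty list means the check passed.
    """
    if n_remove <= 0:
        return []
    if n_remove >= len(eigvals):
        raise ValueError("must keep at least one dimension")
    largest_dropped = max(eigvals[-n_remove:])
    smallest_kept = eigvals[-n_remove - 1]
    if smallest_kept <= 0:
        raise ValueError("retained eigenvalues must be positive")
    warnings = []
    # Criterion 1: discarded eigenvalues should be essentially zero.
    if largest_dropped > abs_threshold:
        warnings.append(f"discarded eigenvalue {largest_dropped:.3g} > {abs_threshold:g}")
    # Criterion 2: discarded eigenvalues should be tiny relative to kept ones.
    ratio = largest_dropped / smallest_kept
    if ratio > ratio_threshold:
        warnings.append(f"eigenvalue ratio {ratio:.3g} > {ratio_threshold:g}")
    if warnings:
        warnings.append("dimension reduction may not be appropriate; expect ghost ICs")
    return warnings


# The two smallest eigenvalues from the tutorial-dataset example above.
for message in check_pca_reduction([386.1381, 8.4791e-04], n_remove=1):
    print("WARNING:", message)
```

With these values both criteria fire: 8.4791e-04 is far above 1e-7, and the ratio is about 2.2e-6. Checking the ratio as well as the absolute value makes the test less sensitive to the overall scaling of the data (e.g., microvolts versus volts).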
>
> Cheers,
>
> Arno
>
> PS: Maybe next time, submit a pull request instead of publishing a paper
> :-), although it is nice to have everything documented.
>
> > On Apr 4, 2023, at 10:09 AM, Makoto Miyakoshi via eeglablist <
> > eeglablist at sccn.ucsd.edu> wrote:
> >
> > Dear EEGLAB mailing list subscribers,
> >
> > On May 7, 2021, I announced on this mailing list that I had started an
> > open experiment on the data rank issue (forwarded below). Just last week,
> > the project was published. Please see the paper at the URL below.
> >
> >
> https://urldefense.com/v3/__https://www.frontiersin.org/articles/10.3389/frsip.2023.1064138/full?&utm_source=Email_to_authors_&utm_medium=Email&utm_content=T1_11.5e1_author&utm_campaign=Email_publication&field=&journalName=Frontiers_in_Signal_Processing&id=1064138__;!!Mih3wA!HoFCU2dXFWJ4lr9yY3siKM-dK7zb3Bvkp4JnLYaneFgdQpFoOtjXtvLa1JvpkSlLeejnqhPQfZmw5zHn_Hdi2rFXqXs$
> >
> > We tried several experimental things in this publication:
> >
> >   - We invited Dr. Sven Hoffmann, who kindly reported the issue along
> >   with a solution (which our simulation proved to be correct!). Everybody
> >   can see his name in pop_runica() line 668, but the current
> >   implementation disables his original idea.
> >   - We did an homage project on ICA voice unmixing, i.e., the original
> >   definition of the cocktail-party effect, and the effectiveness of ICA on
> >   this problem. One of the 'voice actors', Tzyy-Ping Jung, actually
> >   performed the same part in the original demo in the late '90s, so he
> >   played his own role again after about 25 years.
> >   - We quoted several private communications, with permission, regarding
> >   the correct way to apply the average reference. We corresponded with
> >   Paul Nunez, Ramesh Srinivasan, Joseph Dien, and Dezhong Yao, in other
> >   words, everyone (as far as I know) who has published a paper or book on
> >   this issue, as well as Andreas Widmann, who pointed me to this problem
> >   on the mailing list.
> >   - The second author is a high-school student who joined us as an
> >   intern. He turned out to be a fluent programmer and did all the
> >   analysis, going beyond my instructions by figuring out the purpose of
> >   the simulation by himself.
> >
> > It was such a fun project. The rank issue has been a recurring question
> > at past EEGLAB workshops. I hope our publication clarifies the background
> > of this long-standing problem and provides a solution once and for all.
> >
> > Makoto
> >
> >
> >
> > On Fri, May 7, 2021 at 2:25 PM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
> > wrote:
> >
> >> Dear subscribers,
> >>
> >> Recently, there have been multiple independent posts about the data rank
> >> issue with ICA. In response, I am thinking about running a simulation
> >> experiment on this issue with a visiting scholar at SCCN, as a small
> >> project for publication. I would appreciate it if you could give me any
> >> of the following as input.
> >>
> >>   - Questions (what is puzzling for you? No need to be shy for asking
> >>   'dumb questions')
> >>   - Requests (if you want to know particularly X and/or Y on this issue,
> >>   I may be able to give you the answer based on the simulation test)
> >>   - Suggestions (about methods, data types, applications, etc.)
> >>   - Reports (when ICA failed, what did you see?)
> >>
> >> If you are interested in working with me to contribute to this small
> >> project, please reply or contact me at mmiyakoshi at ucsd.edu. If your
> >> contribution is substantial, I will invite you to be a coauthor. We will
> >> probably need as many strange results as possible...?
> >>
> >> This is probably the first attempt to run an open experiment on the
> >> EEGLAB mailing list. Please join us and let's find out what happens!
> >>
> >> Makoto
> >>
> >>
> > _______________________________________________
> > Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> > To unsubscribe, send an empty email to
> eeglablist-unsubscribe at sccn.ucsd.edu
> > For digest mode, send an email with the subject "set digest mime" to
> eeglablist-request at sccn.ucsd.edu