[Eeglablist] ICA components distorted after rereferencing, saving and reloading

Wed Aug 6 08:12:36 PDT 2025

Hi Cedric,

> I’ve also found that clean_windows() can sometimes remove more data than
necessary, so I generally prefer using clean_asr() alone. But I see that
you use a higher threshold of 8 in your code. I'll try that.

My default setting is SD==20. SD==8 is a bit aggressive.
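
For a sense of scale, the following minimal sketch converts an SD cutoff into a one-sided tail probability. It assumes the cutoff can be read as a z-score of a standard normal distribution, which is an idealization and not how ASR calibration data are actually distributed. The variable names are illustrative only.

```matlab
% Assumption: treat the ASR cutoff as a z-score of a standard normal
% distribution and compute the one-sided (right) tail probability.
sdCutoff = [8 20];                         % cutoffs discussed in this thread
tailProb = 0.5 * erfc(sdCutoff / sqrt(2)); % P(Z > z) for a standard normal

for k = 1:numel(sdCutoff)
    fprintf('SD==%d -> one-sided tail probability %.3g\n', ...
        sdCutoff(k), tailProb(k));
end
```

For SD==20 this gives a value on the order of 10^-89, in line with the 2.75*10^-89 figure quoted further down in the thread.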

I use ASR only to remove pathological data points with values like > 10^3.
This definitely helps ICA. It also helps to improve general stationarity,
i.e., when you see the whole recording of an electrode on one screen, the
trace (envelope) should look like a rectangle. Use my trimOutlier and you can
see that. I think this is an intuitive and effective way to check data
stationarity.

> In the code example you provided, are you suggesting using reconstruction
mode before ICA? My understanding was that this is not recommended, as it
can confuse ICA; removing the affected periods is usually preferable.
Curious to hear your thoughts.

If there are many event markers for time-/time-freq domain ERP analysis, I
only use reconstruction mode and do not allow window rejection. This way,
the original data length is maintained, so I can avoid subsequent
'boundary' related troubles. Even if the data are continuous like
resting-state analysis, I would still use ASR, and I may use post-ASR auto
window rejection.

> In my experience, removing less early on and dealing with outliers in
this way is better than over-cleaning with ASR.

I agree with you. Data 'cleaning' with ASR is basically diluting your
signals with mock signals. So S and N decrease together (as with all other
methods), and we hope that N decreases faster than S.

> With thresholds below 15, I’ve even seen ASR remove not just ocular
activity but also alpha oscillations from otherwise clean data. This is
obviously also a bigger conversation that should include how many trials
are available, SNR, etc.

Yes, I've seen that myself. If you use only an amplitude criterion, you
cannot escape from this false-positive behavior.
Though we do not have an immediately available countermeasure for this, we
can (1) use a lax threshold (SD==10-20); (2) keep an eye on post-cleaning data
statistics: how many datapoints are altered, how much the variance
changes, etc. I usually see 80-85% of datapoints (across all channels)
remain intact, which guarantees the expected behavior of subsequent median
statistics.
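
The kind of post-cleaning bookkeeping described above can be sketched in base MATLAB as follows. This is a minimal illustration on synthetic data: `before` and `after` are hypothetical stand-ins for the data before and after ASR, no window rejection is assumed (so both matrices have the same size), and a sample is counted as altered if any channel changed, which is a simplifying choice made here.

```matlab
% Hypothetical stand-ins for the data before and after ASR
% (channels x samples). Sizes match because no windows are rejected.
rng(0);                                        % reproducible synthetic data
before = randn(4, 1000);
after  = before;
after(:, 101:250) = 0.5 * before(:, 101:250);  % pretend ASR altered 150 samples

% A sample (column) counts as altered if any channel differs.
changedMask = any(before ~= after, 1);
intactRate  = 1 - mean(changedMask);           % fraction of untouched samples

% Per-channel variance change over the altered samples, in dB.
varChangeDb = 10*log10(var(after(:, changedMask), 0, 2) ./ ...
                       var(before(:, changedMask), 0, 2));

fprintf('Intact samples: %.1f%%\n', 100*intactRate);
fprintf('Mean variance change on altered samples: %.2f dB\n', mean(varChangeDb));
```

With real data, the two matrices would come from the dataset saved before cleaning and the cleaned dataset; if windows were also rejected, the original would first have to be restricted to the surviving samples.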

> I’d be interested in joining a community project to tackle this. I’ve
experimented with ML classifiers, particularly for lower-density dry
wearable systems, but I haven’t yet tested how well they transfer to wet
systems.

Nice! Let me discuss it with Hyeonseok. Hyeonseok (Kim) and I have had
enough trouble with my primitive workaround.
Cyril told me his idea on how to improve the behavior of the algorithm
around RANSAC in 2022. I could not find my communication with him in a quick
email search, but I'm sure I have it somewhere in my mailbox. We can start
from there.

> BurstCriterionRefMaxBadChns is set to 0.075 by default, doesn't feel like
a big difference to set it to 0. And to be clear, this would be only if
some bad channels made it through clean_channels, right? I will use that
from now on, thanks.

This means that even if 7.5% of your channels show bad data of whatever kind
(like 10^200 microV plasma-generating EEG), ASR will still use it to build the
reference data because it will apply robust statistics later anyway. This
way, ASR tries to maximize the amount of usable chunks of data by
neglecting 'minor flaws'. Conventional window rejection is
like rejecting all bananas with any black spots. ASR allows black spots up
to 7.5% of a banana because these parts can be easily removed later.
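
The banana analogy can be illustrated with a toy window-selection rule. This is a simplified sketch and not the actual clean_rawdata implementation: the `flagged` matrix is a hypothetical input (real ASR derives such flags from robust amplitude statistics), and the exact rounding of the tolerance in the real code may differ.

```matlab
% Simplified illustration only; NOT the actual clean_rawdata code.
% flagged(c, w) is true when channel c looks abnormal in window w.
nChans  = 40;
flagged = false(nChans, 5);
flagged(1:2, 2) = true;     % window 2: 2 of 40 channels bad (5%)
flagged(1:4, 3) = true;     % window 3: 4 of 40 channels bad (10%)
flagged(:,   5) = true;     % window 5: every channel bad

badFraction = mean(flagged, 1);        % fraction of bad channels per window

tolerantKeep = badFraction <= 0.075;   % tolerance of 7.5% (the default value)
strictKeep   = badFraction <= 0;       % 'zero-tolerance' setting

disp(find(tolerantKeep));   % windows allowed into the calibration data
disp(find(strictKeep));     % windows allowed under zero tolerance
```

In this toy case the tolerant rule also admits window 2, whose two bad channels are the 'black spots', while the zero-tolerance rule keeps only the fully clean windows.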

Makoto

On Tue, Aug 5, 2025 at 8:30 PM Cedric Cannard via eeglablist <
eeglablist at sccn.ucsd.edu> wrote:

> Thanks Makoto,
>
> Reading the recent juggler paper has been on my to-do list, and I’m
> looking forward to learning about these new developments.
>
> Quick clarification: I wasn’t suggesting to apply ASR twice. In my
> example, I was just showing how to call the clean_rawdata plugin once to
> detect and remove bad channels, and again for ASR only to remove transient
> artifacts, so one could run either step independently. Personally, I call
> the subfunctions directly with more complexity.
>
> It’s good to know how ASR behaves with more relaxed SD thresholds.
> I always inspect visually and haven't seen weird things with very
> conservative thresholds (>60), but that's for removing only very large
> electrode, muscle, or motion artifacts. I’ve also found that
> clean_windows() can sometimes remove more data than necessary, so I
> generally prefer using clean_asr() alone. But I see that you use a higher
> threshold of 8 in your code. I'll try that.
>
> In the code example you provided, are you suggesting using reconstruction
> mode before ICA? My understanding was that this is not recommended, as it
> can confuse ICA; removing the affected periods is usually preferable.
> Curious to hear your thoughts.
>
> In general, I prefer to be conservative with data cleaning, as long as ICA
> can still operate effectively, and then handle any remaining outliers later
> with robust central tendency measures and statistics (e.g., median, trimmed
> mean, GLMs with WLS optimization in LIMO to downweight bad trials, etc.).
> In my experience, removing less early on and dealing with outliers in this
> way is better than over-cleaning with ASR. With thresholds below 15, I’ve
> even seen ASR remove not just ocular activity but also alpha oscillations
> from otherwise clean data. This is obviously also a bigger conversation
> that should include how many trials are available, SNR, etc.
>
> I also agree that clean_channels is not always perfect across montages and
> datasets, especially with lower-density montages (but no method seems to be
> so far). With high-density setups, I’ve had better results using a lower
> correlation threshold (~0.65), a maxtol around 33% of the file, and over
> 200 RANSAC samples for greater robustness (and probably more
> reproducibility?). I share your view that better methods for detecting bad
> channels are needed, and I’d be interested in joining a community project
> to tackle this. I’ve experimented with ML classifiers, particularly for
> lower-density dry wearable systems, but I haven’t yet tested how well they
> transfer to wet systems.
>
> BurstCriterionRefMaxBadChns is set to 0.075 by default, doesn't feel like
> a big difference to set it to 0. And to be clear, this would be only if
> some bad channels made it through clean_channels, right? I will use that
> from now on, thanks.
>
>
> Cedric
>
>
>
> On Tuesday, August 5th, 2025 at 3:25 PM, Makoto Miyakoshi via eeglablist <
> eeglablist at sccn.ucsd.edu> wrote:
>
> > Hi Tomko and Cedric,
> >
> > Thank you Cedric for helping Tomko. I generally agree with your advice!
> >
> > About applying clean_rawdata() twice, although it does not invalidate the
> > approach, I think the second application is better avoided.
> > If you want to apply aggressive high-amplitude artifact interpolation, you
> > may apply an aggressive (SD==5 to 8) threshold from the beginning.
> >
> > ASR is basically a solution for performing flexible interpolation for every
> > electrode and sliding window independently. Instead of performing
> > electrode-wide interpolation, it uses PCA-decomposed high-amplitude
> > electrode subset rejection and subsequent reference-data-informed 'artifact
> > subspace reconstruction' together with the remaining PCs. However, when it
> > comes to recovering the 'underlying physiological signals', ASR is not as
> > 'smart' as ICA. So you want to minimize your dependency on ASR for data
> > modification, while maximizing it on ICA, if you use it.
> >
> > A couple of things.
> >
> > 1. I do not recommend the bad electrode rejection function implemented in
> > the current clean_rawdata() suite. The result is not reproducible and it
> > tends to be aggressive. As a simpler solution, I have been developing an
> > amplitude-based bad channel rejection solution. I've spent great effort and
> > time without much success. I once discussed with Cyril an idea for
> > modifying the current implementation in clean_rawdata, but I have never
> > found a chance to try it out. Is anyone interested in developing
> > this solution with me to publish a technical paper?
> >
> > 2. Two-pass ASR is probably unnecessary. One pass with a lax threshold
> > (>10-20) is good. The purpose of this ASR is to save ICA from
> > stopping/exploding when processing high-amplitude outliers. Remember, the
> > more data you let ASR process, the more signal content you lose. See
> > how I keep my eyes on how much data are modified by ASR below.
> >
> > % Step 7: Apply clean_rawdata(). Disable 'BurstRejection' that
> > % rejects bad windows instead of interpolating. (12/02/2024).
> > EEG = pop_clean_rawdata(EEG, ...
> >     'FlatlineCriterion','off','ChannelCriterion','off', ...
> >     'LineNoiseCriterion','off','Highpass','off', ...
> >     'BurstCriterion', 25, 'WindowCriterion', 0.2, ...
> >     'BurstRejection','off','Distance','Euclidian', ...
> >     'BurstCriterionRefMaxBadChns', 0, ...
> >     'BurstCriterionRefTolerances', [-inf 8], ...
> >     'WindowCriterionTolerances',[-Inf 8], 'MaxMem', 4096);
> > survivedDataIdx = find(EEG.etc.clean_sample_mask);
> > rejectedDataIdx = find(~EEG.etc.clean_sample_mask);
> > asrBeforeAfterDiff = sum(originalEEG.data(:,survivedDataIdx)-EEG.data,1);
> > unchangedDataIdx = find(asrBeforeAfterDiff==0);
> > changedDataIdx = find(asrBeforeAfterDiff~=0);
> > windowRejRate = 1-EEG.pnts/originalEEG.pnts;
> > % Thanks Lisa De Stefano!
> > windowInterpolationRate = (EEG.pnts-length(unchangedDataIdx))/EEG.pnts;
> > asrPowerReductionDb = 10*log10(var(EEG.data(:,changedDataIdx),0,2) ...
> >     ./var(originalEEG.data(:,changedDataIdx),0,2));
> > windowRejPowReducDb = 10*log10(var(EEG.data(:,changedDataIdx),0,2) ...
> >     ./var(originalEEG.data(:,rejectedDataIdx),0,2));
> > EEG.etc.ASR.windowRejectionRate = windowRejRate;
> > EEG.etc.ASR.windowInterpolationRate = windowInterpolationRate;
> > EEG.etc.ASR.varianceReductionInDbByWinRej = windowRejPowReducDb;
> > EEG.etc.ASR.varianceReductionInDbByAsr = asrPowerReductionDb;
> >
> >
> > By the way, Hyeonseok Kim and I consider that part of the reason why a
> > strangely large standard deviation such as SD==20 (equivalent of
> > 2.75*10^-89; yes, it is 0.00...83 zeros here...0275) is needed is that ASR
> > intentionally lets in a small amount (by default 7.5% of the number of
> > channels) of outliers when building a reference dataset. It skews the
> > (PCA-decomposed) data distributions to the right, making it difficult to
> > set the right-tail cutoff with an SD. For more detail, see our recent paper
> > titled Juggler's ASR
> > (https://www.sciencedirect.com/science/article/pii/S0165027025001062).
> >
> > By completely shutting out outliers, we can theoretically make
> > (PCA-decomposed) distributions more normalized. This brings the meaning
> > of SD used in ASR closer to our conventional use of SD. If you want to try
> > this solution (I actually recommend it unless you are suffering from a
> > shortage of clean data for building reference data), use the following
> > option
> >
> > 'BurstCriterionRefMaxBadChns', 0
> >
> >
> > (from https://sccn.ucsd.edu/wiki/Makoto's_useful_EEGLAB_code)
> >
> > This option enables a 'zero-tolerance' condition in which no outliers can
> > enter the calibration data.
> >
> > Makoto
> >
> > On Mon, Aug 4, 2025 at 1:57 PM Cedric Cannard via eeglablist <
> > eeglablist at sccn.ucsd.edu> wrote:
> >
> > > Tomko,
> > >
> > > > you’re saying that when I account for the rank deficiency using this
> > > > method, I don’t necessarily need to add an all-zero reference channel,
> > > > correct?
> > >
> > > No, again, I'm saying they are two different things:
> > > - The modified CAR is to avoid a rank reduction when you re-reference to
> > > average (you should always use the modified version to avoid losing a rank).
> > > - The rank estimation and PCA reduction is to ensure ICA is never
> > > vulnerable to the rank deficiency issue (whatever the cause is).
> > >
> > > Just use both by default from now on, so you 1) never lose 1 rank when
> > > re-referencing to average, and 2) never accidentally run ICA on
> > > rank-deficient data. That's it.
> > >
> > > Here is the code again for the modified CAR, independent of what your
> > > initial reference was:
> > >
> > > % Calculate effective data rank before CAR
> > > dataRank = sum(eig(cov(double(EEG.data')))>1E-7) % for continuous data
> > >
> > > % Define label for the surrogate channel
> > > refLabel = 'initialReference';
> > >
> > > % Create dummy location struct from an existing channel to preserve
> > > % the field structure
> > > tmpLoc = EEG.chanlocs(1);
> > > tmpLoc.labels = refLabel;
> > > tmpLoc.X = []; tmpLoc.Y = []; tmpLoc.Z = [];
> > > tmpLoc.theta = []; tmpLoc.radius = []; tmpLoc.type = '';
> > >
> > > % Apply average re-reference including the dummy channel (pop_reref()
> > > % automatically appends a zero-filled channel when 'refloc' is provided)
> > > EEG = pop_reref(EEG, [], 'refloc', tmpLoc);
> > >
> > > % Remove the dummy channel to restore original channel count
> > > EEG = pop_select(EEG, 'nochannel', {refLabel});
> > >
> > > % Calculate effective data rank after CAR (should be same as before)
> > > dataRank = sum(eig(cov(double(EEG.data')))>1E-7) % for continuous data
> > >
> > > And your code comment says you are removing bad channels, but you are also
> > > running ASR in that command. To be very clear, you can separate the two
> > > like this for example:
> > >
> > > % Remove bad channels (data must be highpass filtered!)
> > > EEG = pop_clean_rawdata(EEG, ...
> > >     'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',10, ...
> > >     'Highpass','off','BurstCriterion','off','WindowCriterion','off', ...
> > >     'BurstRejection','off','Distance','Euclidian');
> > >
> > > % Remove very large artifacts with very conservative threshold to improve
> > > % ICA decomposition (preserving eye blinks)
> > > EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion','off', ...
> > >     'ChannelCriterion','off','LineNoiseCriterion','off','Highpass','off', ...
> > >     'BurstCriterion',100,'WindowCriterion','off','BurstRejection','on', ...
> > >     'Distance','Euclidian','WindowCriterionTolerances','off' );
> > >
> > > If you see too many large artifacts remain, lower the threshold to the
> > > 60-80 range. If you see eye blinks being removed and you don't have large
> > > artifacts in the data in the first place, don't do this step.
> > >
> > > Cedric
> > >
> > > On Monday, August 4th, 2025 at 7:02 AM, Tomko Settgast <
> > > tomko.settgast at uni-wuerzburg.de> wrote:
> > >
> > > > Hi Cédric,
> > > >
> > > > First of all, thanks again for your helpful and quick reply!
> > > > I really appreciate your thoughtful explanations and your patience with
> > > > my questions.
> > > > Also, apologies for the late response; I’m currently on vacation and
> > > > only checking emails sporadically.
> > > >
> > > > Regarding your last message: I think I may have expressed myself a bit
> > > > ambiguously. I did not mean to suggest adjusting the threshold itself
> > > > (i.e., the 1e-7), but rather that I use your method with that threshold
> > > > to estimate the effective rank and adapt the ICA accordingly.
> > > >
> > > > If I understood you correctly, you’re saying that when I account for the
> > > > rank deficiency using this method, I don’t necessarily need to add an
> > > > all-zero reference channel, correct?
> > > > Of course, it would be ideal to have a full-rank dataset, but
> > > > realistically my data is often so noisy that I usually have to delete and
> > > > interpolate several channels. So I'm not aiming for full rank in any case.
> > > >
> > > > Would you still recommend introducing the zero-filled reference channel,
> > > > perhaps especially in my case due to the substantial data loss from
> > > > interpolation?
> > > >
> > > > Just for context: the EEG data was recorded with an online earlobe
> > > > reference, and unfortunately, the original reference signal is not
> > > > available in the recordings.
> > > >
> > > > From our discussion so far, here’s what I would consider the most
> > > > appropriate pipeline:
> > > >
> > > > 1. Identify and remove bad channels using clean_rawdata
> > > > 2. Interpolate deleted channels
> > > > 3. Apply CAR
> > > > 4. Estimate rank using eigenvalue threshold
> > > > 5. Run ICA with PCA reduction (according to the effective rank) if
> > > > necessary
> > > > 6. Apply MARA to remove artifact components
> > > >
> > > > If you're interested, I’ve copied the relevant part of my code below.
> > > > I'd be very happy about any feedback:
> > > >
> > > > % [Initial steps: save structure, channel labels, etc.]
> > > >
> > > > % Run clean_rawdata to remove bad channels
> > > > EEG = clean_artifacts(EEG, ...
> > > >     'FlatlineCriterion',5,'ChannelCriterion',0.85,...
> > > >     'LineNoiseCriterion',4,'Highpass','off','BurstCriterion',5,...
> > > >     'WindowCriterion',0.25,'BurstRejection','off','Distance','Euclidian',...
> > > >     'WindowCriterionTolerances',[-Inf 7] );
> > > > EEG = eeg_checkset( EEG );
> > > >
> > > > % Interpolate deleted channels
> > > > EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical');
> > > > EEG = eeg_checkset( EEG );
> > > >
> > > > % Log interpolated channels to EEG.etc.ica_info
> > > > % [...]
> > > >
> > > > % Re-reference to common average
> > > > EEG = pop_reref( EEG, []);
> > > > EEG = eeg_checkset(EEG);
> > > >
> > > > % Estimate rank using eigenvalue threshold
> > > > data2d = reshape(EEG.data, EEG.nbchan, []);
> > > > covarianceMatrix = cov(double(data2d'));
> > > > eigenvalues = eig(covarianceMatrix);
> > > > rankThreshold = 1e-7;
> > > > dataRank = sum(eigenvalues > rankThreshold);
> > > >
> > > > % Save rank-related info
> > > > % [...]
> > > >
> > > > % Run ICA with PCA if needed
> > > > if dataRank < EEG.nbchan
> > > >     EEG = pop_runica(EEG, 'icatype', 'runica', 'extended', 1, 'pca', ...
> > > >         dataRank, 'interrupt', 'on');
> > > > else
> > > >     EEG = pop_runica(EEG, 'icatype', 'runica', 'extended', 1, ...
> > > >         'interrupt', 'on');
> > > > end
> > > > EEG = eeg_checkset(EEG);
> > > >
> > > > % Run MARA and reject components
> > > > [artcomps, MARAinfo] = MARA(EEG);
> > > > EEG.reject.MARAinfo = MARAinfo;
> > > > EEG.reject.gcompreject(artcomps) = 1;
> > > > EEG = pop_subcomp(EEG, find(EEG.reject.gcompreject));
> > > > EEG = eeg_checkset(EEG);
> > > >
> > > > Do you think that would work, and would this approach be robust enough
> > > > given the limitations of my data?
> > > >
> > > > If anything is unclear or if you'd like me to elaborate on any step,
> > > > just let me know!
> > > >
> > > > Best regards from my vacation,
> > > > Tomko 😊
> > > >
> > > > -----Original Message-----
> > > > From: eeglablist eeglablist-bounces at sccn.ucsd.edu On Behalf Of
> > > > Cedric Cannard via eeglablist
> > > > Sent: Friday, August 1, 2025 09:42
> > > > To: EEGLAB List eeglablist at sccn.ucsd.edu
> > > > Subject: [EXT] Re: [Eeglablist] ICA components distorted after
> > > > rereferencing, saving and reloading
> > > >
> > > > Hi Tomko,
> > > >
> > > > Ah, if you’re referring to the 1e-7, no, you should never need to change
> > > > that. That threshold is used to evaluate the data rank accurately and
> > > > ensure any deficiencies are properly accounted for when passed to
> > > > runica(). The modified common average reference (CAR) method is designed
> > > > to avoid losing a rank in the first place, but even if there were a
> > > > reduction, providing the correct data rank to ICA handles it. Ideally, of
> > > > course, you want to preserve as much information from your original data
> > > > as possible, and the modified CAR helps maintain full rank during average
> > > > referencing.
> > > >
> > > > More severe rank reductions can occur when, for example, multiple bad
> > > > channels are removed and interpolated (say, 5 out of 64, leaving you with
> > > > an effective rank of 59). Additional rank loss can result from other, less
> > > > visible factors like bridged electrodes or crosstalk due to poor shielding.
> > > >
> > > > In short, the modified referencing method helps preserve rank, and the
> > > > command-line ICA implementation needs the actual data rank passed in to
> > > > function properly.
> > > >
> > > > Note that this is handled automatically in the EEGLAB GUI, but it’s
> > > > important to be aware of when scripting ICA manually.
> > > >
> > > > Makes sense?
> > > >
> > > > Cédric
> > > >
> > > > On Thu, Jul 31, 2025 at 07:07, Tomko Settgast
> > > > <tomko.settgast at uni-wuerzburg.de> wrote:
> > > >
> > > > > Dear Cedric,
> > > > >
> > > > > I am sorry for having caused some confusion!
> > > > > What I meant by "threshold" was the right part of the function you
> > > > > have referred to in your paper: sum(eig(cov(double(EEG.data'))) > 1e-7).
> > > > >
> > > > > So, my question was: if I use this method to estimate and account for
> > > > > rank deficiencies when doing ICA, do I still need to introduce the
> > > > > zero-filled reference channel first, or does this approach already
> > > > > account for the rank deficiencies introduced by not having a reference
> > > > > channel and applying CAR before ICA?
> > > > >
> > > > > Regarding clean_rawdata: you mentioned removing channels using a lax
> > > > > threshold before running the cleaning step. I assumed this meant running
> > > > > clean_rawdata twice: once with a lax threshold to remove obviously bad
> > > > > channels, and then again with stricter/default settings. Is that correct?
> > > > > And if so, do you apply both passes of clean_rawdata in direct sequence,
> > > > > or do you insert ICA in between (i.e., clean_rawdata (lax), ICA,
> > > > > clean_rawdata ("normal"))?
> > > > > Or do you use another method to detect very noisy channels prior to
> > > > > cleaning?
> > > > >
> > > > > I hope this clarifies my questions a bit better.
> > > > >
> > > > > Best,
> > > > > Tomko
> > > > >
> > > > > -----Original Message-----
> > > > > From: eeglablist eeglablist-bounces at sccn.ucsd.edu On Behalf Of
> > > > > Cedric Cannard via eeglablist
> > > > > Sent: Thursday, July 31, 2025 8:28 AM
> > > > > To: EEGLAB List eeglablist at sccn.ucsd.edu
> > > > > Subject: [EXT] Re: [Eeglablist] ICA components distorted after
> > > > > rereferencing, saving and reloading
> > > > >
> > > > > Hi Tomko,
> > > > >
> > > > > > using your threshold and applying PCA reduction during ICA alone
> > > > > > cannot account for existing rank deficiencies, correct?
> > > > > > So in any case, I would still need to re-introduce this zero-filled
> > > > > > reference channel?
> > > > >
> > > > > "threshold" referring to ASR/clean_rawdata() here? That is just to
> > > > > remove large artifacts before ICA.
> > > > >
> > > > > Feeding the effective data rank to ICA ensures that it accounts for
> > > > > rank deficiencies in the data. For instance, if your dataset has 64
> > > > > electrodes but only 60 effective dimensions (e.g., due to applying a
> > > > > common average reference, which reduces rank by 1, and interpolating 3
> > > > > bad channels), then ICA will only attempt to extract 60 independent
> > > > > components instead of 64, which prevents overfitting and instability.
> > > > >
> > > > > Using the modified CAR approach (where a simulated channel filled with
> > > > > zeros is appended before re-referencing) allows you to maintain full
> > > > > rank during average referencing. This way, you preserve the original
> > > > > data rank and don’t need to reintroduce the reference channel later.
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Cedric Cannard
> > > > >
> > > > > On Wednesday, July 30th, 2025 at 4:44 AM, Tomko Settgast
> > > > > tomko.settgast at uni-wuerzburg.de wrote:
> > > > >
> > > > > > Dear Cedric,
> > > > > >
> > > > > > Thanks again for the super quick and helpful reply!
> > > > > > And yes, of course, it makes sense to bring this back to the
> > > > > > community.
> > > > > >
> > > > > > I will try to follow your recommendations.
> > > > > > Just to be sure: using your threshold and applying PCA reduction
> > > > > > during ICA alone cannot account for existing rank deficiencies,
> > > > > > correct?
> > > > > > So in any case, I would still need to re-introduce this zero-filled
> > > > > > reference channel?
> > > > > >
> > > > > > Regarding your second recommendation, to clarify: Do you apply
> > > > > > clean_rawdata twice directly in sequence (first with a lax threshold,
> > > > > > then with a stricter one), or do you run ICA in between, i.e.,
> > > > > > clean_rawdata (lax), ICA, clean_rawdata ("normal")?
> > > > > >
> > > > > > Thanks again!
> > > > > >
> > > > > > Best,
> > > > > > Tomko 😊
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: eeglablist eeglablist-bounces at sccn.ucsd.edu On Behalf Of
> > > > > > Cedric Cannard via eeglablist
> > > > > > Sent: Tuesday, July 29, 2025 23:01
> > > > > > To: EEGLAB List eeglablist at sccn.ucsd.edu
> > > > > > Subject: [EXT] Re: [Eeglablist] ICA components distorted after
> > > > > > rereferencing, saving and reloading
> > > > > >
> > > > > > Dear Tomko,
> > > > > >
> > > > > > Getting this thread back in the eeglablist loop in case it is
> > > > > > helpful to more people later.
> > > > > >
> > > > > > Yes, apply the modified CAR method (zero-filled channel) to preserve
> > > > > > effective data rank before ICA:
> > > > > >
> > > > > > % 1) Add a zero-filled surrogate for the initial reference
> > > > > > refLabel = 'initialReference';
> > > > > > EEG.data(end+1, :) = 0;
> > > > > > EEG.nbchan = EEG.nbchan + 1;
> > > > > > EEG.chanlocs(end+1).labels = refLabel; % minimal fields are fine for pop_reref
> > > > > >
> > > > > > % 2) Re-reference to average including the zero-filled reference.
> > > > > > % Use the 'refloc' input to explicitly tell EEGLAB that you are
> > > > > > % referencing to a virtual reference with the label 'initialReference'.
> > > > > > % EEGLAB stores this information in EEG.ref and internally handles
> > > > > > % bookkeeping better for projection matrices, reversing the reference
> > > > > > % or re-referencing later.
> > > > > > EEG = pop_reref(EEG, [], 'refloc', struct('labels', refLabel));
> > > > > >
> > > > > > % 3) Remove the zero-filled channel to return to the original channel count
> > > > > > EEG = pop_select(EEG, 'nochannel', {refLabel});
> > > > > >
> > > > > > Note: when I can, I generally run clean_rawdata/ASR right after
> > > > > > removing bad channels with a very lax threshold (e.g. 60-100 depending
> > > > > > on data) to remove large artifacts before ICA to improve performance.
> > > > > > More aggressive thresholds can remove eye blinks or alpha waves
> > > > > > depending on your data quality, which would then reduce ICA performance
> > > > > > at extracting ocular components (you want to preserve the eye activity
> > > > > > to separate it more successfully).
> > > > > >
> > > > > > Cedric
> > > > > >
> > > > > > On Saturday, July 26th, 2025 at 4:07 AM, Tomko Settgast
> > > > > > tomko.settgast at uni-wuerzburg.de wrote:
> > > > > >
> > > > > > > Dear Dr. Cannard,
> > > > > > >
> > > > > > > Thank you very much for the quick and helpful response!
> > > > > > >
> > > > > > > As far as I recall your paper, I also do not remember you
> > > > > > > recommending re-referencing after ICA.
> > > > > > > I was trying to follow both the guidelines you provided in your
> > > > > > > paper, and the recommendation from the EEGLAB tutorial.
> > > > > > > That is what led me to this hybrid approach.
> > > > > > > But knowing that the recommendation in EEGLAB might be incorrect
> > > > > > > actually simplifies things for me because I did not observe any
> > > > > > > problems when applying CAR before ICA.
> > > > > > > What may have added to my confusion, making the recommendation in
> > > > > > > the tutorial plausible, was that some pipelines may indeed appear to
> > > > > > > apply CAR after ICA (e.g., EPOS:
> > > > > > > https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2021.660449/full#h4
> > > > > > > , or HAPPE:
> > > > > > > https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2018.00097/full#h5
> > > > > > > ).
> > > > > > > However, I don't know their implementations in detail, and they
> > > > > > > might have already accounted for occurring issues.
> > > > > > >
> > > > > > > I especially want to thank you for the detailed explanation
> > > > > > > regarding why saving and reloading seemingly corrupts the ICA.
> > > > > > > That was the thing that puzzled me the most and the way you have
> > > > > > > explained it made immediate sense.
> > > > > > >
> > > > > > > Just to confirm: If I understood you correctly, you would
> > > > > > > recommend the following order: Delete heavily noisy or flat channels
> > > > > > > (I am using clean_artifacts), interpolate these channels, apply CAR,
> > > > > > > run ICA, correct?
> > > > > > >
> > > > > > > One additional question regarding the rank-deficiency issues in
> > > > > > > ICA:
> > > > > > > My datasets do not include a reference channel and were recorded
> > > > > > > with a unipolar reference (ear lobe/CZ).
> > > > > > > In Makoto's pipeline, if I understood it correctly, it is
> > > > > > > suggested to add a zero-filled channel to account for the effective
> > > > > > > rank-deficiency when no reference channel is present.
> > > > > > > Now, if I follow your recommendations, i.e., computing the
> > > > > > > approximate true rank with the formula you provided, do you think this
> > > > > > > is already sufficient, or would you still recommend adding a zero-only
> > > > > > > channel to my data?
> > > > > > >
> > > > > > > Thank you again for your time and the guidance!
> > > > > > >
> > > > > > > Best,
> > > > > > > Tomko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: eeglablist eeglablist-bounces at sccn.ucsd.edu On Behalf Of
> > > > > > > Cedric Cannard via eeglablist
> > > > > > > Sent: Friday, July 25, 2025 19:59
> > > > > > > To: EEGLAB List eeglablist at sccn.ucsd.edu
> > > > > > > Subject: [EXT] Re: [Eeglablist] ICA components distorted after
> > > > > > > rereferencing, saving and reloading
> > > > > > >
> > > > > > > Hi Tomko,
> > > > > > >
> > > > > > > There may be an error in the tutorial web page. Makoto, Scott,
> or
> > > > > > > Arno, please correct me if I'm wrong here.
> > > > > > >
> > > > > > > If I recall correctly, we do not recommend in the paper to
> > > > > > > re-reference after ICA. Instead, we emphasize correcting the
> average
> > > > > > > referencing before ICA and taking the data rank into account
> while
> > > > > > > computing ICA, so that ICA operates on a properly rank-full
> dataset and
> > > > > > > avoids generating "ghost ICs".
> > > > > > >
> > > > > > > See these quotes:
> > > > > > > "To avoid this issue, we propose two solutions: 1) apply the
> > > > > > > correct average referencing, and 2) calculate the effective data
> > > > > > > rank that is used for PCA dimension reduction in applying ICA."
> > > > > > > "The correct method, i.e., including the initial reference when
> > > > > > > re-referencing and then discarding the initial reference channel,
> > > > > > > resulted in a successful rank-full decomposition."
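
A minimal base-MATLAB sketch of the rank argument in the second quote, using random numbers as a stand-in for EEG. No EEGLAB functions are called, and all variable names are made up for illustration. Averaging over the recorded channels alone removes one dimension, whereas appending the initial reference as an all-zero channel before averaging, and dropping it afterwards, should keep the data full rank:

```matlab
rng(0);                                  % reproducible random numbers
nChan = 8;                               % hypothetical number of recorded channels
nSamp = 1000;                            % hypothetical number of samples
data  = randn(nChan, nSamp);             % toy data relative to the initial reference

% Average reference over the recorded channels only
% (implicit expansion requires R2016b or later)
naive     = data - mean(data, 1);
rankNaive = rank(naive);                 % one less than nChan

% Append the initial reference as a zero channel, average, then discard it
withRef       = [data; zeros(1, nSamp)];
withRef       = withRef - mean(withRef, 1);
corrected     = withRef(1:nChan, :);
rankCorrected = rank(corrected);         % equal to nChan

fprintf('naive: %d, with initial reference: %d (of %d channels)\n', ...
        rankNaive, rankCorrected, nChan);
```

The second variant divides the channel sum by nChan + 1 instead of nChan, which is why the re-referencing matrix stays invertible.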
> > > > > > >
> > > > > > > Now to address your questions more directly:
> > > > > > >
> > > > > > > EEGLAB stores the ICA decomposition using:
> > > > > > > - EEG.icaweights and EEG.icasphere: together they define the
> > > > > > > unmixing matrix,
> > > > > > > - EEG.icawinv: the mixing matrix (inverse of the unmixing matrix),
> > > > > > > - EEG.data: the referenced data ICA was trained on.
> > > > > > > When you apply pop_reref(EEG, []) after ICA, it modifies EEG.data,
> > > > > > > but does not update the ICA weights or sphere, nor does it
> > > > > > > recompute EEG.icaact if it exists. So now the ICA decomposition no
> > > > > > > longer corresponds to the modified data (that is, if operations
> > > > > > > were done between your last referencing and post-ICA referencing).
> > > > > > > If you save the set like this, ICA reconstruction is no longer
> > > > > > > valid when reloading.
> > > > > > > --> This is why the ICs appear corrupted after reloading: the
> > > > > > > weights are applied to differently referenced data than they were
> > > > > > > trained on.
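
To make the mechanism described above concrete, here is a toy base-MATLAB illustration with random numbers. It assumes, as the explanation above does, that the unmixing matrix stays fixed while the data are average-referenced. It does not call pop_reref or reproduce what EEGLAB does internally, and all variable names are hypothetical:

```matlab
rng(1);                                  % reproducible random numbers
nChan = 6;                               % hypothetical channel count
nSamp = 500;                             % hypothetical sample count
data  = randn(nChan, nSamp);             % stand-in for EEG.data at ICA time
W     = randn(nChan);                    % stand-in for EEG.icaweights * EEG.icasphere

% Activations computed when the decomposition was made
actCached = W * data;

% Average-reference the data but leave W untouched
% (implicit expansion requires R2016b or later)
dataReref = data - mean(data, 1);

% Activations rebuilt later from the unchanged weights and the changed data
actRecomputed = W * dataReref;

% Any nonzero value means the two sets of component time series disagree
mismatch = max(abs(actCached(:) - actRecomputed(:)));
fprintf('Largest difference between cached and recomputed activations: %g\n', mismatch);
```

In this toy setting the cached activations still look fine, and the discrepancy only shows up once the activations are rebuilt from the weights and the modified data.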
> > > > > > >
> > > > > > > Why does the issue only happen when MARA doesn’t reject any
> > > > > > > components? I haven't used MARA before, but my guess is that when
> > > > > > > MARA rejects components, EEGLAB calls pop_subcomp(), which updates
> > > > > > > EEG.data to reflect the projection of the ICA decomposition with
> > > > > > > removed components. This may result in reinitializing the ICA
> > > > > > > fields in a way that masks the corruption caused by
> > > > > > > re-referencing.
> > > > > > >
> > > > > > > But when no components are rejected (gcompreject = 0), EEG.data is
> > > > > > > left unchanged from ICA, and then your subsequent call to
> > > > > > > pop_reref corrupts the ICA-to-data correspondence.
> > > > > > >
> > > > > > > > Can you confirm whether re-referencing after MARA might break
> > > > > > > > the ICA structure?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > Would you recommend re-referencing before ICA, especially if PCA
> > > > > > > > is already used to adjust for rank deficiency?
> > > > > > >
> > > > > > > Yes (see above correction about the paper and recommendations).
> > > > > > > You should also interpolate bad channels before ICA (before
> > > > > > > calculating the data rank).
> > > > > > >
> > > > > > > > Do you have any suggestions on how to safely apply
> > > > > > > > re-referencing post-ICA?
> > > > > > >
> > > > > > > If for some reason you must re-reference after ICA (e.g., for
> > > > > > > visualization), do one of the following:
> > > > > > >
> > > > > > > % Backup ICA weights before re-referencing
> > > > > > > icaweights = EEG.icaweights;
> > > > > > > icasphere = EEG.icasphere;
> > > > > > >
> > > > > > > % Apply re-referencing (this alters EEG.data!)
> > > > > > > EEG = pop_reref(EEG, []);
> > > > > > >
> > > > > > > % Re-apply ICA on new data manually
> > > > > > > EEG.icaact = icaweights * icasphere * EEG.data;
> > > > > > >
> > > > > > > Cedric Cannard
> > > > > > >
> > > > > > > On Thursday, July 24th, 2025 at 12:47 PM, Tomko Settgast via
> > > > > > > eeglablist eeglablist at sccn.ucsd.edu wrote:
> > > > > > >
> > > > > > > > Dear EEGLAB team,
> > > > > > > >
> > > > > > > > I encountered a puzzling issue while applying common average
> > > > > > > > referencing (CAR) after ICA, as currently recommended in the
> > > > > > > > EEGLAB tutorial (
> > > > > > > > https://eeglab.org/tutorials/06_RejectArtifacts/RunICA.html#issues-with-data-rank-deficiencies
> > > > > > > > ) to address rank deficiencies.
> > > > > > > >
> > > > > > > > In one specific dataset, ICA components appear strongly
> > > > > > > > corrupted after saving and reloading the dataset, although they
> > > > > > > > looked perfectly normal before saving. This issue does not occur
> > > > > > > > in other datasets processed by the same automated pipeline.
> > > > > > > >
> > > > > > > > Steps to reproduce:
> > > > > > > >
> > > > > > > > 1. Run runica and perform MARA artifact rejection.
> > > > > > > > 2. Apply CAR using pop_reref(EEG, []) after ICA (as recommended).
> > > > > > > > * The components looked completely normal at this point (see
> > > > > > > > attached ICsBeforeSaving.gif)
> > > > > > > > 3. Save the dataset using pop_saveset.
> > > > > > > > 4. Close EEGLAB, reload the dataset, and inspect ICA components
> > > > > > > > with pop_eegplot.
> > > > > > > > * After reloading, the ICA time series look heavily distorted
> > > > > > > > (see attached ICsAfterReload.gif)
> > > > > > > >
> > > > > > > > To investigate further, I saved the EEG structure as .mat
> > > > > > > > immediately before saving with pop_saveset and compared it with
> > > > > > > > the EEG structure after reloading the saved dataset.
> > > > > > > > There are mismatches between the ICA matrices in these two
> > > > > > > > structures, which aligns with the visual differences in the
> > > > > > > > component time series.
> > > > > > > >
> > > > > > > > For a better understanding:
> > > > > > > >
> > > > > > > > * MARA did not reject components for this dataset (0 entries in
> > > > > > > > gcompreject).
> > > > > > > > * The ICA was computed with proper PCA reduction to the data
> > > > > > > > rank (rank = 10), following the recommendations in Kim et al.
> > > > > > > > (2023) ( https://doi.org/10.3389/frsip.2023.1064138 ), so ghost
> > > > > > > > components are unlikely.
> > > > > > > > * I am aware that common average re-referencing is generally not
> > > > > > > > recommended for datasets with a low number of channels but we
> > > > > > > > were trying to follow the guidelines expressed by Hu et al.
> > > > > > > > (2018; 10.1088/1741-2552/aaa13f)
> > > > > > > > * The issue seems to occur only in this particular dataset, as
> > > > > > > > far as I can trust my visual comparison.
> > > > > > > > In other datasets where MARA rejected components, I did not
> > > > > > > > encounter this behavior (again, only checked visually), even
> > > > > > > > though the same pipeline and re-referencing strategy was applied.
> > > > > > > >
> > > > > > > > After discussing the issue with ChatGPT, the suggestion came up
> > > > > > > > that applying re-referencing after ICA might silently disrupt
> > > > > > > > the ICA-to-data mapping, and this mismatch only becomes apparent
> > > > > > > > after saving and reloading the dataset.
> > > > > > > >
> > > > > > > > However, what I find particularly confusing is that this issue
> > > > > > > > only occurs when MARA does not reject any components, which
> > > > > > > > would normally indicate better signal quality.
> > > > > > > >
> > > > > > > > My questions:
> > > > > > > >
> > > > > > > > * Can you confirm whether re-referencing after MARA-based
> > > > > > > > component rejection might break the ICA structure in this way?
> > > > > > > > * Is this a known issue with pop_reref in combination with ICA
> > > > > > > > and MARA?
> > > > > > > > * Would you recommend re-referencing before ICA, especially if
> > > > > > > > PCA is already used to adjust for rank deficiency?
> > > > > > > > * Do you have any suggestions on how to safely apply
> > > > > > > > re-referencing post-ICA without compromising the ICA
> > > > > > > > decomposition?
> > > > > > > >
> > > > > > > > I'd be happy to share the script or provide further information
> > > > > > > > if helpful.
> > > > > > > >
> > > > > > > > Thank you very much in advance for your time and support!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Tomko
> > > > > > > >
> > > > > > > > Tomko Settgast, MSc
> > > > > > > > Section Intervention Psychology
> > > > > > > > University of Würzburg
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > To unsubscribe, send an empty email to
> > > > > > > > eeglablist-unsubscribe at sccn.ucsd.edu or visit
> > > > > > > > https://sccn.ucsd.edu/mailman/listinfo/eeglablist    .