[Eeglablist] Inconsistent results using clean_artifacts

Mon Mar 21 21:24:30 PDT 2022

Dear Cristina, Makoto, Hyeonseok and Arno,

Thank you for all the detailed reports and tests! It's interesting that we
seem to be finding different "solutions" to the same problem and that what
appears to work on one dataset is not replicable for another one. I guess
the preprocessing steps run before *clean_rawdata* might affect the
stability of the final result. I just noticed that Cristina's dataset was
1500000 samples long, which is the threshold at which instabilities
appeared for my dataset (I reported the issue here:
https://github.com/sccn/clean_rawdata/issues/37). Makoto, was your dataset
around the same length? It looks like in your case the default value for
the number of samples (50) was stable, if I understood the plot correctly.
I don't think this is the main problem here, but at the moment, it might be
a good guide to understand whether to apply *clean_rawdata* or not.
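
To make "stable" concrete, here is a minimal Python sketch of the kind of
tally used in this thread: how often each channel is flagged across
repeated runs. The channel labels and the ten runs are made up for
illustration, and the function names are hypothetical, not part of
clean_rawdata.

```python
from collections import Counter

def flag_frequency(runs, channels):
    """Return the fraction of runs in which each channel was flagged as bad."""
    counts = Counter(ch for flagged in runs for ch in set(flagged))
    return {ch: counts[ch] / len(runs) for ch in channels}

def unstable_channels(freq):
    """Channels flagged in some runs but not in all: the run-to-run disagreement."""
    return sorted(ch for ch, f in freq.items() if 0.0 < f < 1.0)

# Hypothetical output of 10 repeated runs on one recording (made-up labels).
runs = [["Fp1", "T7"]] * 6 + [["Fp1"]] * 3 + [["Fp1", "T7", "O2"]]
channels = ["Fp1", "T7", "O2", "Cz"]

freq = flag_frequency(runs, channels)
print(freq)
print(unstable_channels(freq))
```

A channel with a frequency of exactly 0 or 1 is handled consistently; anything
in between depends on the run.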

Another question about fixing *rng()*, as this seems to be the core issue.
Would this not just ensure replicability without actually ensuring correct
detection of noisy channels? If the results depend on the seed and
vary under certain conditions, then there is no guarantee that the
detected channels are actually the bad channels (and the same holds for the
good channels). I guess this is not highly problematic for dense array
recordings. Although not desirable, losing one good electrode might not be
the end of the world; unless multiple good electrodes within the same area
are flagged as noisy. However, it might be an issue for recordings with a
few electrodes, like in Cristina's case, in which every electrode carries
lots of information.
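
The concern can be illustrated with a toy Python sketch. It is not the
clean_rawdata algorithm: toy_detector, its parameters, and the borderline
"true" fraction of 0.5 are assumptions chosen only to show that a fixed
seed makes a randomized verdict repeatable without making it correct.

```python
import random

def toy_detector(seed, true_bad_fraction=0.5, num_samples=50, threshold=0.5):
    """Toy stand-in for a randomized bad-channel test (NOT clean_rawdata itself).

    Estimates the fraction of 'bad' draws for one borderline channel from
    num_samples random draws, then flags the channel if the estimate
    exceeds the threshold.
    """
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    bad_draws = sum(rng.random() < true_bad_fraction for _ in range(num_samples))
    return bad_draws / num_samples > threshold

# Same seed: the verdict is perfectly replicable.
same_seed = {toy_detector(seed=1) for _ in range(20)}

# Different seeds: the verdict for the very same channel varies.
across_seeds = [toy_detector(seed=s) for s in range(200)]

print("verdicts with one fixed seed:", same_seed)
print("fraction of seeds flagging the channel:", sum(across_seeds) / len(across_seeds))
```

With one seed the set of verdicts has a single element, yet which verdict
it is depends on the seed chosen, which is exactly the worry for a channel
near the decision boundary.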

Thank you again for this interesting thread,
Have a good day

Daniele

On Tue, 22 Mar 2022 at 15:01, Makoto Miyakoshi via eeglablist <
eeglablist at sccn.ucsd.edu> wrote:

> Dear Cristina, Daniele, and Arno (cc Hyeonseok),
>
> This is a follow-up study. Hyeonseok and I ran a test using empirical
> datasets. See the summary below.
>
> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Channel_rejection_using_RANSAC_in_clean_rawdata.28.29_.2803.2F21.2F2022_added.29
> Our results did NOT show that increasing 'NumSamples' produces more stable
> results when rng() is NOT fixed. We wish it did!
> This warrants further investigation.
>
> Makoto
>
> On Thu, Mar 17, 2022 at 10:57 AM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
> wrote:
>
> > Dear Cristina,
> >
> > Wow, this is such a perfect summary report. I deeply appreciate that you
> > took so much time and care to make this happen.
> > You are the best part of the EEGLAB mailing list. Thank you, thank you,
> > thank you!
> >
> > > Second, I would prefer not to discard the RANSAC method to detect bad
> > > channels if I find a stable solution. I believe that the RANSAC method is
> > > the core for detecting bad channels in the clean_rawdata function.
> >
> > I appreciate you mentioning that. I'll tell you why.
> > In the early 2010s, when Christian, the developer of ASR, was working on
> > the offline version of clean_rawdata() at my request, he gave me a
> > solution once, then told me that he wanted to add one more thing as an
> > update. Within a few days, this RANSAC part was implemented. So this
> > RANSAC part was one of the final touches he specifically wanted to
> > implement.
> >
> > So I agree, I'd love to use his bad-channel rejection. Your confirmation
> > is so valuable to me: increasing 'NumSamples' to 1000, for example,
> > can make the algorithm's behavior more stable. I'll make it my default
> > and use the channel rejection function again. I still would not use 0.8
> > for the correlation criterion though; I'd use 0.6-0.7. Christian did
> > recommend higher values. But the problem with channel rejection is that
> > short segments of high-amplitude data always bias the selection. I just
> > quickly checked the code of clean_channels(), and the current process is
> > not robust against short, high-amplitude bursts.
> >
> > https://github.com/sccn/clean_rawdata/blob/master/clean_channels.m
> > It seems possible to address this issue. I'll discuss it with colleagues.
> >
> > By the way, I have an update for you ASR enthusiasts which you may be
> > interested in. Let me forward my recent post to the list below.
> >
> > %%%%%%%%%%
> > Relatedly, Hyeonseok and I have been working on a mod for the calibration
> > stage of ASR to process our Juggling data collected by Hiroyuki.
> > We will present the idea at the MoBI 2022 meeting.
> >
> > https://sites.google.com/ucsd.edu/mobi2022/
> >
> > The idea is to use single-frame order statistics across electrodes rather
> > than the default sliding window for selecting the calibration data. This
> > way, we can obtain more 'clean' data points without letting
> > high-amplitude artifacts into the calibration data. (There is a default
> > tolerance value: the default setting allows a small number of outliers
> > to sneak into the calibration data, up to 7.5% of electrodes; the
> > proposed method uses 0%.) The proposed method makes subsequent PC
> > distributions more Gaussian, which fits the assumption of ASR. Also, the
> > proposed method seems to be able to explain, at least partially, why the
> > conventional, empirically recommended values for the cutoff SD are
> > unusually high, such as SD == 20. We will show both simulation and
> > empirical results.
> > Check out the MoBI 2022 conference!
> > %%%%%%%%%%
> >
> > Makoto
> >
> >
> >
> > On Mon, Mar 14, 2022 at 8:55 AM Gil Avila, Cristina <cristina.gil at tum.de>
> > wrote:
> >
> >> Thank you all for your input.
> >>
> >>
> >>
> >> First, I have noticed that the set of bad channels differs only when I
> >> restart EEGLAB (please see the code below; I run the eeglab command
> >> inside the loop over repetitions). Otherwise the results are stable
> >> (@Arno Could this explain why it passed all the tests?).
> >>
> >>
> >>
> >> Second, I would prefer not to discard the RANSAC method to detect bad
> >> channels if I find a stable solution. I believe that the RANSAC method
> >> is the core for detecting bad channels in the clean_rawdata function.
> >> The two other options (cleaning channels based on flat lines and on
> >> high-frequency activity) seem to me more of a preliminary step to the
> >> RANSAC. Therefore I have tested:
> >>
> >>    1. How the ‘ChannelCriterion’ parameter influences the selected bad
> >>    channels. I have tried the values 0.7, 0.8 (default) and 0.9. The
> >>    higher the value, the less reproducible the result. This is not
> >>    surprising given the definition of the ChannelCriterion parameter:
> >>    ‘if a channel is correlated at less than this value to an estimate
> >>    based on other channels it is considered abnormal in the given time
> >>    window’. Still, even when being lax with the correlation threshold
> >>    (0.7), I don’t get reproducible results.
> >>    2. How the high-pass transition band influences the selected bad
> >>    channels. I have tried a high-pass with transition band [1 1.5]
> >>    instead of the default [0.25 0.75], with the ‘ChannelCriterion’
> >>    parameter fixed at 0.8. This does not seem to increase the
> >>    reproducibility.
> >>    3. How the ‘NumSamples’ RANSAC parameter of clean_artifacts()
> >>    influences the selected bad channels. I have tried 50 (default),
> >>    100, 500 and 1000 samples with ‘ChannelCriterion’ fixed at 0.8.
> >>    Increasing this parameter to 1000 makes the output more reliable at
> >>    the cost of more computation time (~1.5 min per recording).
> >>
> >>
> >>
> >> Brief comment regarding my data: I am working with eyes-closed
> >> resting-state recordings with 29 channels, 5 min in duration, sampled
> >> at 500 Hz (~150000 samples).
> >>
> >> For each case I have run 10 repetitions. Along with the code, you can
> >> also find figures of all test cases. The figures show how often each
> >> channel was marked bad in each recording.
> >>
> >>
> >>
> >> For reproducibility I attach my code and the small dataset I am using. I
> >> am using the most recent versions of EEGLAB and clean_rawdata from GitHub.
> >>
> >> Code: https://github.com/crisglav/replication_clean_rawdata/
> >>
> >> Dataset:
> >> https://syncandshare.lrz.de/getlink/fiX7VwVdbGEsMTf46kqrcvx3/rawBIDS
> >>
> >> Note: to test 3) I had to change the clean_artifacts code and add at line
> >> 186
> >>
> >> {'num_samples','NumSamples'}, 50, ... % line 186
> >>
> >> and replace line 232 with
> >>
> >> [EEG,removed_channels] = ...
> >>     clean_channels(EEG,chancorr_crit,line_crit,[],channel_crit_maxbad_time,num_samples);
> >>
> >>
> >> --
> >>
> >> Cristina Gil Ávila – PhD candidate
> >>
> >> Department of Neurology
> >>
> >> Technische Universität München
> >>
> >> Munich, Germany
> >>
> >> cristina.gil at tum.de
> >>
> >> painlabmunich.de <https://www.painlabmunich.de/>
> >>
> >>
> >>
> >
> _______________________________________________
> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> To unsubscribe, send an empty email to
> eeglablist-unsubscribe at sccn.ucsd.edu
> For digest mode, send an email with the subject "set digest mime" to
> eeglablist-request at sccn.ucsd.edu