[Eeglablist] Inconsistent results using clean_artifacts
Makoto Miyakoshi
mmiyakoshi at ucsd.edu
Tue Mar 22 09:16:19 PDT 2022
Dear Daniele,
> Makoto, was your dataset around the same length?
We used 205-channel, 4200-s long data downsampled to 128 Hz for testing
purposes. I don't think this can cause any problem.
> Would this not just ensure replicability without actually ensuring
> correct detection of noisy channels?
Fixing the random seed is definitely a way to make the result
deterministic. Indeed, the original clean_rawdata() had this solution.
However, that is a computer-science solution. We want to evaluate
statistical stability, which is a separate question. It would be
embarrassing if a different choice of random seed generated completely
different results.
In other words, even if fixing a random seed superficially addresses the
issue, I believe it is still a good idea to take advantage of this
opportunity to check the result stability across different choices of
random seeds.
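
A minimal sketch of such a seed-stability check is below. It is written in
Python with a toy randomized detector and made-up per-channel scores; it is
NOT clean_channels() and uses no real EEG. The function names, scores, and
thresholds are all assumptions for illustration. The only point is the
procedure: run the same randomized detector under many seeds and count how
often each channel is flagged.

```python
import random

def toy_detector(scores, seed, num_samples=50, subset_size=3,
                 margin=0.5, vote_fraction=0.5):
    """Hypothetical stand-in for a randomized bad-channel detector.

    For each channel, draw `num_samples` random subsets of the other
    channels. A draw votes "bad" when the channel's score exceeds the
    subset mean by more than `margin`. This is NOT clean_channels().
    """
    rng = random.Random(seed)  # private generator: same seed, same result
    flagged = set()
    for ch, score in enumerate(scores):
        others = [s for i, s in enumerate(scores) if i != ch]
        votes = 0
        for _ in range(num_samples):
            subset = rng.sample(others, subset_size)
            if score - sum(subset) / subset_size > margin:
                votes += 1
        if votes / num_samples > vote_fraction:
            flagged.add(ch)
    return flagged

def flag_frequency(scores, seeds, **kwargs):
    """Fraction of seeds under which each channel is flagged."""
    counts = [0] * len(scores)
    for seed in seeds:
        for ch in toy_detector(scores, seed, **kwargs):
            counts[ch] += 1
    return [c / len(seeds) for c in counts]

# Made-up noise scores: index 7 is borderline, index 8 is clearly bad.
scores = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.6, 5.0]
freq = flag_frequency(scores, seeds=range(20))

# A channel flagged under some seeds but not others has a seed-dependent label.
unstable = [ch for ch, f in enumerate(freq) if 0 < f < 1]
print("flag frequency per channel:", freq)
print("channels whose label depends on the seed:", unstable)
```

A frequency of 0 or 1 means the label does not depend on the seed; anything
in between marks a channel whose classification is not stable. The same
bookkeeping could be applied to the removed-channel lists returned by
repeated runs on real data.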
Makoto
On Mon, Mar 21, 2022 at 9:24 PM Daniele Scanzi <dsca347 at aucklanduni.ac.nz>
wrote:
> Dear Cristina, Makoto, Hyeonseok and Arno,
>
> Thank you for all the detailed reports and tests! It's interesting that we
> seem to be finding different "solutions" to the same problem and that what
> appears to work on one dataset is not replicable for another one. I guess
> the preprocessing steps run before *clean_rawdata* might affect the
> stability of the final result. I just noticed that Cristina's dataset was
> 150000 samples long, which is the threshold at which instabilities
> appeared for my dataset (I reported the issue here:
> https://github.com/sccn/clean_rawdata/issues/37).
> Makoto, was your dataset around the same length? It looks like in your case
> the default value for the number of samples (50) was stable, if I
> understood the plot correctly. I don't think this is the main problem here,
> but at the moment, it might be a good guide to understand whether to apply
> *clean_rawdata* or not.
>
> Another question about fixing *rng()*, as this seems to be the core
> issue. Would this not just ensure replicability without actually ensuring
> correct detection of noisy channels? If the results are dependent on the
> seed and are variable under certain conditions, then it is not given that
> the detected channels are actually the bad channels (same for the good
> channels). I guess this is not highly problematic for dense array
> recordings. Although not desirable, losing one good electrode might not be
> the end of the world; unless multiple good electrodes within the same area
> are flagged as noisy. However, it might be an issue for recordings with a
> few electrodes, like in Cristina's case, in which every electrode carries
> lots of information.
>
> Thank you again for this interesting thread,
> Have a good day
>
> Daniele
>
> On Tue, 22 Mar 2022 at 15:01, Makoto Miyakoshi via eeglablist <
> eeglablist at sccn.ucsd.edu> wrote:
>
>> Dear Cristina, Daniele, and Arno (cc Hyeonseok),
>>
>> This is a follow-up study. Hyeonseok and I ran a test using empirical
>> datasets. See the summary below.
>>
>> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Channel_rejection_using_RANSAC_in_clean_rawdata.28.29_.2803.2F21.2F2022_added.29
>> Our results did NOT show that increasing 'NumSamples' produces more
>> stable results when rng() is NOT fixed. We wish it did!
>> This warrants further investigation.
>>
>> Makoto
>>
>> On Thu, Mar 17, 2022 at 10:57 AM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
>> wrote:
>>
>> > Dear Cristina,
>> >
>> > Wow, this is such a perfect summary report. I deeply appreciate that
>> > you took so much time and care to make this happen.
>> > You are the best part of the EEGLAB mailing list. Thank you, thank you,
>> > thank you!
>> >
>> > > Second, I would prefer not to discard the RANSAC method to detect bad
>> > > channels if I find a stable solution. I believe that the RANSAC method
>> > > is the core for detecting bad channels in the clean_rawdata function.
>> >
>> > I appreciate you mentioning that. I'll tell you why.
>> > In the early 2010s, when Christian, the developer of ASR, was working
>> > on the offline version of clean_rawdata() at my request, he gave me a
>> > solution once, then told me that he wanted to add one more thing as an
>> > update. Within a few days, this RANSAC part was implemented. So this
>> > RANSAC part was one of the final touch-ups he specifically wanted to
>> > implement.
>> >
>> > So I agree, I'd love to use his bad-channel rejection. Your confirmation
>> > is so valuable for me--increasing 'NumSamples' to 1000, for example,
>> > can make the algorithm's behavior more stable. I'll make it my default
>> > and use the channel rejection function again. I still would not use 0.8
>> > for the correlation criterion though; I'd use 0.6-0.7. Christian did
>> > recommend higher values. But the problem with channel rejection is that
>> > a short segment of high-amplitude data always biases the selection. I
>> > just quickly checked the code of clean_channels(), and the current
>> > process is not robust against short, high-amplitude bursts.
>> >
>> > https://github.com/sccn/clean_rawdata/blob/master/clean_channels.m
>> > It seems possible to address this issue. I'll discuss it with
>> > colleagues.
>> >
>> > By the way, I have an update for you ASR enthusiasts which you may be
>> > interested in. Let me forward my recent post to the list below.
>> >
>> > %%%%%%%%%%
>> > Relatedly, Hyeonseok and I have been working on a mod for the
>> > calibration stage of ASR to process our Juggling data collected by
>> > Hiroyuki. We will present the idea at the MoBI 2022 meeting.
>> >
>> > https://sites.google.com/ucsd.edu/mobi2022/
>> >
>> > The idea is to use single-frame order statistics across electrodes
>> > rather than the default sliding window for selecting the calibration
>> > data. This way, we can obtain more 'clean' data points without letting
>> > high-amplitude artifacts into the calibration data (there is a default
>> > tolerance value; that is, the default setting allows a small number of
>> > outliers to sneak into the calibration data, up to 7.5% of electrodes;
>> > the proposed method uses 0%). The proposed method makes subsequent PC
>> > distributions more Gaussian, which fits the assumption of ASR. Also,
>> > the proposed method seems to be able to explain, at least partially,
>> > why the conventional, empirically recommended values for the cutoff SD
>> > are unusually high, such as SD == 20. We will show both simulation and
>> > empirical results.
>> > Check out the MoBI 2022 conference!
>> > %%%%%%%%%%
>> >
>> > Makoto
>> >
>> >
>> >
>> > On Mon, Mar 14, 2022 at 8:55 AM Gil Avila, Cristina
>> > <cristina.gil at tum.de> wrote:
>> >
>> >> Thank you all for your input.
>> >>
>> >>
>> >>
>> >> First, I have noticed that the set of bad channels differs only when
>> >> I restart EEGLAB (please see the code below; I run the EEGLAB
>> >> command inside the loop over repetitions). Otherwise the results are
>> >> stable (@Arno, could this explain why it passed all the tests?).
>> >>
>> >> Second, I would prefer not to discard the RANSAC method to detect bad
>> >> channels if I find a stable solution. I believe that the RANSAC method
>> >> is the core for detecting bad channels in the clean_rawdata function.
>> >> The two other options (cleaning channels based on flat lines and on
>> >> high-frequency activity) seem to me more of a preliminary step to the
>> >> RANSAC. Therefore I have tested:
>> >>
>> >> 1. How the ‘ChannelCriterion’ parameter influences the selected bad
>> >>    channels. I have tried the values 0.7, 0.8 (default) and 0.9. The
>> >>    higher the value, the less reproducible the result. This was not a
>> >>    surprise given the definition of the ChannelCriterion parameter:
>> >>    ‘if a channel is correlated at less than this value to an estimate
>> >>    based on other channels it is considered abnormal in the given
>> >>    time window’. Still, even being lax with the correlation threshold
>> >>    (0.7), I don’t get reproducible results.
>> >> 2. How the high-pass bandwidth influences the selected bad channels.
>> >>    I have tried a high-pass with bandwidth [1 1.5] instead of the
>> >>    default [0.25 0.75], with the ‘ChannelCriterion’ parameter fixed
>> >>    at 0.8. This does not seem to increase the reproducibility.
>> >> 3. How the ‘NumSamples’ RANSAC parameter of clean_artifacts()
>> >>    influences the selected bad channels. I have tried 50 (default),
>> >>    100, 500 and 1000 samples with ‘ChannelCriterion’ fixed at 0.8.
>> >>    Increasing this parameter to 1000 makes the output more reliable
>> >>    at the cost of more computation time (~1.5 min per recording).
>> >>
>> >>
>> >>
>> >> Brief comment regarding my data: I am working with eyes-closed
>> >> resting-state, 29-channel recordings of 5 min duration sampled at
>> >> 500 Hz (~150000 samples).
>> >>
>> >> For each case I have run 10 repetitions. Along with the code you can
>> >> also find figures of all test cases. The figures show how often each
>> >> channel was marked bad in each recording.
>> >>
>> >>
>> >>
>> >> For reproducibility I attach my code and the small dataset I am using.
>> >> I am using the most recent versions of EEGLAB and clean_rawdata from
>> >> GitHub.
>> >>
>> >> Code: https://github.com/crisglav/replication_clean_rawdata/
>> >>
>> >> Dataset:
>> >> https://syncandshare.lrz.de/getlink/fiX7VwVdbGEsMTf46kqrcvx3/rawBIDS
>> >>
>> >> Note: to test 3) I had to change the clean_artifacts code and add at
>> >> line 186
>> >>
>> >> {'num_samples','NumSamples'}, 50, ... % line 186
>> >>
>> >> and replace line 232 with
>> >>
>> >> [EEG,removed_channels] = clean_channels(EEG,chancorr_crit,line_crit, ...
>> >>     [],channel_crit_maxbad_time,num_samples);
>> >>
>> >> --
>> >>
>> >> Cristina Gil Ávila – PhD candidate
>> >>
>> >> Department of Neurology
>> >>
>> >> Technische Universität München
>> >>
>> >> Munich, Germany
>> >>
>> >> cristina.gil at tum.de
>> >>
>> >> painlabmunich.de
>> >>
>> >>
>> >>
>> >
>> _______________________________________________
>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>> To unsubscribe, send an empty email to
>> eeglablist-unsubscribe at sccn.ucsd.edu
>> For digest mode, send an email with the subject "set digest mime" to
>> eeglablist-request at sccn.ucsd.edu
>
>