[Eeglablist] Inconsistent results using clean_artifacts

Dr Cyril Pernet wamcyril at gmail.com
Tue Mar 22 01:44:23 PDT 2022


Hi Makoto,

my understanding of the issue is different - I believe this comes down 
to the uncontrolled random sampling of channels.

in https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sccn_clean-5Frawdata_blob_master_clean-5Fchannels.m&d=DwIDaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=mFxogHRjVapLxIQsoj3iJIOjmW0MXyS12VlH3EKy7EfHV5u7x-Q55cjj5hNzPqoO&s=EaxFu8HROJu_NKj8bBXhcT7la9xQQPptipUyywiHXpo&e= , 
the size of the channel subsets used for robust reconstruction, as a 
fraction of the total number of channels, is set at 0.25 -- this means 
the function computes N samples with 25% of the channels interpolated 
at a time, and then computes correlations between them (to get a 
consensus across the N draws).
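To make the sampling step concrete, here is a minimal Python sketch of the subset-drawing part only (the actual implementation is MATLAB in clean_channels.m; the function name and seed handling here are illustrative, not from the plugin):

```python
import random

def draw_subsets(n_channels, n_samples, frac=0.25, seed=None):
    """Draw n_samples random channel subsets, each containing
    round(frac * n_channels) channels to be interpolated, as in the
    RANSAC step of clean_channels. With an unfixed seed, each run
    produces different subsets -- the source of the instability."""
    rng = random.Random(seed)
    k = round(frac * n_channels)
    return [sorted(rng.sample(range(n_channels), k)) for _ in range(n_samples)]

# With 29 channels and the default 0.25 fraction, each draw
# interpolates round(0.25 * 29) = 7 channels.
subsets = draw_subsets(29, 50, seed=0)
```

Nothing in this draw constrains where the 7 channels sit relative to each other, which is the point below.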

Cristina had 29 channels - because the random sampling is 
unconstrained, it could pick 7 channels next to each other, leading to 
wild interpolations, which I think explains why she needs such a high 
number of resamples --

I haven't found the time to test it, but I'm thinking that changes in 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sccn_clean-5Frawdata_blob_master_clean-5Fchannels.m-23L177&d=DwIDaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=mFxogHRjVapLxIQsoj3iJIOjmW0MXyS12VlH3EKy7EfHV5u7x-Q55cjj5hNzPqoO&s=IJnKetdRYw2tvMU0n1Szig70LMU3fF4FvL30Lu2J8cQ&e=  
could help. For instance, for each resample, meshgrid the channel 
locations and remove the 25% selected; if there is only one hole (i.e. 
all removed channels are neighbours), take another draw. With your 205 
channels, this issue 'never' occurs, and thus 50 or 1000 samples make 
little difference. This can be tested first by downsampling the number 
of channels - if you see a change, such as increasing N being more 
stable, then it reproduces Cristina's error, and the proposed constraint 
on the channel sampling might be the solution (and maybe there is a 
trade-off between channel number and number of resamples to find, even 
when constraining the channel sampling)
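One way the proposed constraint could look, sketched in Python with 1-D channel order as a toy stand-in for the real meshgrid of electrode locations (so "neighbours" here just means consecutive indices -- a real version would use the 2-D/3-D montage):

```python
import random

def is_single_hole(removed):
    """True if the removed channels form one contiguous block of
    neighbours (a single 'hole'), using 1-D channel order as a toy
    proxy for actual electrode geometry."""
    removed = sorted(removed)
    return removed[-1] - removed[0] == len(removed) - 1

def constrained_draw(n_channels, k, rng, max_tries=100):
    """Redraw until the k removed channels are not all mutually
    adjacent, so interpolation never has to fill one big hole."""
    for _ in range(max_tries):
        removed = rng.sample(range(n_channels), k)
        if not is_single_hole(removed):
            return sorted(removed)
    raise RuntimeError("no spatially spread draw found")

rng = random.Random(1)
draw = constrained_draw(29, 7, rng)  # 7 channels, never one contiguous run
```

With 29 channels, a fully contiguous 7-channel draw is rare, so the redraw loop almost never triggers -- which matches the observation that the problem only shows up occasionally and disappears as the channel count grows.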

cyril


> Dear Cristina, Daniele, and Arno (cc Hyeonseok),
>
> This is a follow up study. Hyeonseok and I ran a test using empirical
> datasets. See the summary below.
> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Channel_rejection_using_RANSAC_in_clean_rawdata.28.29_.2803.2F21.2F2022_added.29
> Our results did NOT show that increasing 'NumSamples' produces more stable
> results when rng() is NOT fixed. We wish it did!
> This warrants further investigation.
>
> Makoto
>
> On Thu, Mar 17, 2022 at 10:57 AM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
> wrote:
>
>> Dear Cristina,
>>
>> Wow, this is such a perfect summary report. I deeply appreciate that you
>> took so much time and care to make this happen.
>> You are the best part of the EEGLAB mailing list. Thank you, thank you,
>> thank you!
>>
>>> Second, I would prefer not to discard the RANSAC method to detect bad
>> channels if I find a stable solution. I believe that the RANSAC method is
>> the core for detecting bad channels in the clean_rawdata function.
>>
>> I appreciate you mentioning that. I'll tell you why.
>> In the early 2010s, when Christian, the developer of ASR, was working on
>> the offline version of clean_rawdata() upon my request, he gave me a
>> solution once, then told me that he wanted to add one more thing to the
>> update. Within a few days, this RANSAC part was implemented. So this RANSAC
>> part was one of the final touch-ups he specifically wanted to implement.
>>
>> So I agree, I'd love to use his bad-channel rejection. Your confirmation
>> is so valuable to me--increasing 'NumSamples' to 1000, for example,
>> can make the algorithm's behavior more stable. I'll make it my default and
>> use the channel rejection function again. I still would not use 0.8 for the
>> correlation criterion, though; I'd use 0.6-0.7. Christian did recommend
>> higher values. But the problem with channel rejection is that short
>> segments of high-amplitude data always bias the selection. I quickly
>> checked the code of clean_channels(), and the current process is not
>> robustified against short, high-amplitude bursts.
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sccn_clean-5Frawdata_blob_master_clean-5Fchannels.m&d=DwIFaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=iihm7vXbmXPM3roTZyq3HHfjCLd_EvrE7iP_zLcVlArZO35j4N9teP2ZcZOlFBVC&s=YK0XRmMVfmrFp10TRIoVsOlqx0JmAS-uKu53HSldgyQ&e=
>> It seems possible to address this issue. I'll discuss it with colleagues.
>>
>> By the way, I have an update for you ASR enthusiasts which you may be
>> interested in. Let me forward my recent post to the list below.
>>
>> %%%%%%%%%%
>> Relatedly, Hyeonseok and I have been working on a mod for the calibration
>> stage of ASR to process our Juggling data collected by Hiroyuki.
>> We will present the idea at the Mobi meeting 2022.
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__sites.google.com_ucsd.edu_mobi2022_&d=DwIFaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=iihm7vXbmXPM3roTZyq3HHfjCLd_EvrE7iP_zLcVlArZO35j4N9teP2ZcZOlFBVC&s=I9BcssF-6sbBtzzgkAMBAn12ZjN2cpCFbjGMymOzohg&e=
>>
>> The idea is to use single-frame order statistics across electrodes rather
>> than the default sliding window for selecting the calibration data. This
>> way, we can obtain more 'clean' data points without letting high-amplitude
>> artifacts into the calibration data (there is a default tolerance
>> value--that is, the default setting allows a small number of outliers to
>> sneak into the calibration data, up to 7.5% of electrodes; the proposed
>> method uses 0%). The proposed method makes the subsequent PC distributions
>> more Gaussian, which fits the assumption of ASR. Also, the proposed method
>> seems to explain, at least partially, why the conventional empirically
>> recommended values for the cutoff SD are unusually high, such as SD == 20.
>> We will show both simulation and empirical results.
>> Check out the MoBI 2022 conference!
>> %%%%%%%%%%
>>
>> Makoto
>>
>>
>>
>> On Mon, Mar 14, 2022 at 8:55 AM Gil Avila, Cristina <cristina.gil at tum.de>
>> wrote:
>>
>>> Thank you all for your input.
>>>
>>>
>>>
>>> First, I have noticed that the set of bad channels only differs
>>> every time I restart EEGLAB (please see the code below; I run the EEGLAB
>>> command inside the loop over repetitions). Otherwise, the results are
>>> stable (@Arno Could this explain why it passed all the tests?).
>>>
>>>
>>>
>>> Second, I would prefer not to discard the RANSAC method to detect bad
>>> channels if I find a stable solution. I believe that the RANSAC method is
>>> the core for detecting bad channels in the clean_rawdata function. The two
>>> other options (clean channels based on flat line and on the high frequency
>>> activity) seem to me more a preliminary step to the RANSAC. Therefore I
>>> have tested:
>>>
>>>     1. How the ‘ChannelCriterion’ parameter influences the selected bad
>>>     channels. I have tried the values 0.7, 0.8 (default) and 0.9. The higher
>>>     the value, the less reproducible the result. This was not a surprise
>>>     given the definition of the ChannelCriterion parameter: ‘if a channel
>>>     is correlated at less than this value to an estimate based on other
>>>     channels it is considered abnormal in the given time window’. Still, even
>>>     being lax with the correlation threshold (0.7), I don’t get reproducible
>>>     results.
>>>     2. How the high-pass bandwidth influences the selected bad channels.
>>>     I have tried a highpass with bandwidth [1 1.5] instead of the default [0.25
>>>     0.75] with the ‘ChannelCriterion’ parameter fixed at 0.8. This does not
>>>     seem to increase the reproducibility.
>>>     3. How the ‘NumSamples’ RANSAC parameter of clean_artifacts()
>>>     influences the selected bad channels. I have tried with 50 (default), 100,
>>>     500 and 1000 samples with ‘ChannelCriterion’ fixed at 0.8. Increasing this
>>>     parameter to 1000 makes the output more reliable at the cost of more
>>>     computation time (~1.5 min per recording).
>>>
>>>
>>>
>>> A brief comment regarding my data: I am working with eyes-closed
>>> resting-state recordings, 29 channels, 5 minutes in duration, sampled at 500
>>> Hz (~150000 samples).
>>>
>>> For each case I have run 10 repetitions. Along with the code you can also
>>> find figures for all test cases. The figures represent how often each
>>> channel was marked bad in each recording.
>>>
>>>
>>>
>>> For reproducibility I attach my code and the small dataset I am using. I
>>> am using the most recent versions of EEGLAB and clean_rawdata from GitHub.
>>>
>>> Code: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_crisglav_replication-5Fclean-5Frawdata_&d=DwIFaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=iihm7vXbmXPM3roTZyq3HHfjCLd_EvrE7iP_zLcVlArZO35j4N9teP2ZcZOlFBVC&s=oHz9McmYOk26K0uA8eNdyYXpegIPR-kaw_5f0hvlV-o&e=
>>>
>>> Dataset:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__syncandshare.lrz.de_getlink_fiX7VwVdbGEsMTf46kqrcvx3_rawBIDS&d=DwIFaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=iihm7vXbmXPM3roTZyq3HHfjCLd_EvrE7iP_zLcVlArZO35j4N9teP2ZcZOlFBVC&s=yviziOoR9Lt7TcuVhGqEbc5JTm_lN1dHti4wjW0P-Sg&e=
>>>
>>> Note: to test 3) I had to change the clean_artifacts code and add in line 186
>>>
>>> {'num_samples','NumSamples'}, 50, ... % line 186
>>>
>>> And substitute line 232 with
>>>
>>> [EEG,removed_channels] = clean_channels(EEG,chancorr_crit,line_crit,[],channel_crit_maxbad_time,num_samples);
>>>
>>>
>>> --
>>>
>>> Cristina Gil Ávila – PhD candidate
>>>
>>> Department of Neurology
>>>
>>> Technische Universität München
>>>
>>> Munich, Germany
>>>
>>> cristina.gil at tum.de
>>>
>>> painlabmunich.de
>>>
>>>
>>>
> _______________________________________________
> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> To unsubscribe, send an empty email to eeglablist-unsubscribe at sccn.ucsd.edu
> For digest mode, send an email with the subject "set digest mime" to eeglablist-request at sccn.ucsd.edu

-- 
Dr Cyril Pernet, PhD, OHBM fellow, SSI fellow
Neurobiology Research Unit
Copenhagen University Hospital, Rigshospitalet
Building 8057, Blegdamsvej 9
DK-2100 Copenhagen, Denmark

wamcyril at gmail.com
https://urldefense.proofpoint.com/v2/url?u=https-3A__cpernet.github.io_&d=DwIDaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=mFxogHRjVapLxIQsoj3iJIOjmW0MXyS12VlH3EKy7EfHV5u7x-Q55cjj5hNzPqoO&s=zAJwvVvVGVTVwJni0mtVM2gbDMNuuDGTkjEz3pIYQes&e= 
https://urldefense.proofpoint.com/v2/url?u=https-3A__orcid.org_0000-2D0003-2D4010-2D4632&d=DwIDaQ&c=-35OiAkTchMrZOngvJPOeA&r=kB5f6DjXkuOQpM1bq5OFA9kKiQyNm1p6x6e36h3EglE&m=mFxogHRjVapLxIQsoj3iJIOjmW0MXyS12VlH3EKy7EfHV5u7x-Q55cjj5hNzPqoO&s=3G0IoP6MmH5OoCEOUyDjcDmI3YAdIQh1S6F_pnhlh_I&e= 
