[Eeglablist] Automated EEG processing pipeline
Arnaud Delorme
adelorme at ucsd.edu
Fri Oct 4 17:12:04 PDT 2024
Dear all,
One of my colleagues asked me to comment on his automated EEG processing pipeline. I thought my comments might be of interest to some of you. Please feel free to react if you do not agree with some of my comments below.
> 1. Load raw data file and channel location.
>
> 2. Filter data with a band pass filter (0.1 to 50 Hz, 60 Hz notch).
Makes sense. I would use linear filters, which are more stable. Also, in theory, the notch at 60 Hz is not needed if you low-pass at 50 Hz (depending on the filter roll-off). Additionally, regarding the high-pass filter at 0.1 Hz, unless you want to examine low frequencies, you might consider using a high-pass filter at 0.5 Hz (see also points 4 and 6). Note that very low frequencies below 0.5 Hz are often contaminated by skin conductance changes due to sweating.
Low-pass filtering at 50 Hz is probably acceptable, but most people would want to retain the higher frequencies, since many processes (ICA, for example) can take advantage of the additional information. If your data is very noisy, though, it is probably a reasonable approach.
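To illustrate why the need for a 60 Hz notch depends on the filter roll-off, here is a minimal Python sketch (not EEGLAB code). It builds two generic Hamming-windowed sinc low-pass filters with a 50 Hz cutoff and compares their gain at 60 Hz. The 250 Hz sampling rate and the tap counts are assumptions chosen for illustration, and these filters are not claimed to match the ones EEGLAB designs by default.

```python
import cmath
import math

def lowpass_taps(cutoff_hz, fs_hz, num_taps):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unit gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (its limit at x == 0 is 2 * fc).
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))

FS = 250.0  # assumed sampling rate in Hz
short_filter = lowpass_taps(50.0, FS, 21)   # short filter, slow roll-off
long_filter = lowpass_taps(50.0, FS, 101)   # long filter, steep roll-off

for name, taps in (("21 taps", short_filter), ("101 taps", long_filter)):
    gain_db = 20 * math.log10(gain_at(taps, 60.0, FS))
    print(f"{name}: gain at 60 Hz = {gain_db:.1f} dB")
```

With a steep roll-off, 60 Hz already sits in the stopband and a separate notch adds little. With a slow roll-off, a noticeable fraction of the line noise passes through the low-pass filter.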
> 3. Use EEGLAB function clean_line with default settings.
I would skip this step since you remove line noise in step 2.
> 4. Use EEGLAB function clean_rawdata with default settings on 19 channels dry electrode system.
ASR (in clean_rawdata) works down to 4 channels, according to its author, Christian Kothe. So 19 channels is fine. Due to the large amount of noise and the significant distance between channels, you might want to change the threshold for rejecting bad channels (by default, 0.8 correlation with neighboring channels); otherwise, too many channels might be rejected.
Additionally, this function requires the data to be high-pass filtered at 0.5 Hz. If you do not select the option to filter in clean_rawdata and do not high-pass filter at 0.5 Hz in step 2, the results will be unpredictable. I have used clean_rawdata (and the filter) and reapplied the result (the rejected data portion) to the original unfiltered data (for example, the data high-pass filtered at 0.1 Hz), so that’s a possibility.
For practical purposes, you should high-pass filter at 0.5 Hz in step 2 and ignore the clean_rawdata filtering option.
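The alternative mentioned above (cleaning a 0.5 Hz filtered copy, then applying the same rejections to the less heavily filtered data) comes down to reusing a per-sample keep mask. A minimal Python sketch of that idea, with hypothetical toy data and a hypothetical mask rather than real clean_rawdata output:

```python
def apply_keep_mask(data, keep):
    """Keep only the samples where keep[i] is True, for every channel."""
    return [[v for v, k in zip(channel, keep) if k] for channel in data]

# Hypothetical toy data: 2 channels x 6 samples, high-pass filtered at 0.1 Hz.
data_01hz = [[1, 2, 3, 4, 5, 6],
             [6, 5, 4, 3, 2, 1]]

# Hypothetical mask obtained by cleaning a 0.5 Hz filtered copy of the
# same recording: True marks samples that survived cleaning.
keep = [True, True, False, False, True, True]

cleaned_01hz = apply_keep_mask(data_01hz, keep)
print(cleaned_01hz)  # [[1, 2, 5, 6], [6, 5, 2, 1]]
```

Both copies must come from the same recording with the same number of samples, so that the mask indices line up.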
> 5. Use EEGLAB interpolate function to interpolate missing channels that are removed by above process.
You need to interpolate channels after ICA, so I would move this step after 7. If you interpolate channels before ICA, the interpolated channels are linear combinations of the remaining ones, so the data becomes rank-deficient and ICA may fail to converge. Additionally, if you use EEGLAB STUDY after point 7, the interpolation will be performed at that time.
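One way to see why interpolating before ICA causes trouble is that an interpolated channel is a weighted combination of the other channels, so it adds a row to the data without adding rank. The Python sketch below uses hypothetical toy data and a plain average as a stand-in for real spherical interpolation:

```python
def matrix_rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Hypothetical recording: 3 channels x 5 samples.
channels = [[1, 0, 2, -1, 3],
            [0, 1, -1, 2, 1],
            [2, 1, 0, 1, -2]]

# Stand-in for interpolation: the "missing" channel is the mean of the others.
interpolated = [sum(col) / len(channels) for col in zip(*channels)]

print(matrix_rank(channels))                    # 3
print(matrix_rank(channels + [interpolated]))   # still 3, with 4 channels
```

Four channels with rank 3 means there are only three independent signals to decompose, which is why interpolation is safer after ICA.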
> 6. Use EEGLAB ICA and ICLabel to correct for blinks in data (90% threshold) and muscle (90% threshold) as in the default ICLabel menu.
I think it should be fine. For ICA, you could use the Picard method, which is Infomax ICA on steroids compared with the default runica function (it has a lower threshold for convergence, so in theory, it is better; it also uses a Newton method, as AMICA does, to optimize components). If you have the patience, you could also use AMICA.
Additionally, irrespective of the ICA algorithm, you need the data to be high-pass filtered at 0.5 Hz (sometimes higher), so it is a good argument for doing it at step 2. If you do not high-pass filter before ICA, the components are often meaningless. As for step 4, there is the option of filtering the data just for ICA, running ICA, and reusing the components on the unfiltered data. ICA components can be seen as spatial filters, so this approach can make sense in some cases.
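The "spatial filter" view can be sketched in a few lines of Python. The unmixing matrix W is learned on the filtered copy, then applied sample by sample to the unfiltered data; zeroing one component and back-projecting with the mixing matrix A removes that component. All matrices and data below are hypothetical toy values, not output from any ICA run.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Hypothetical unmixing matrix (components x channels), as if learned
# on a 0.5 Hz high-pass filtered copy of the data.
W = [[2.0, -1.0],
     [-1.0, 2.0]]
# Its inverse, the mixing matrix (channels x components).
A = [[2 / 3, 1 / 3],
     [1 / 3, 2 / 3]]

# Hypothetical unfiltered data: 2 channels x 3 samples.
unfiltered = [[1.0, 2.0, 3.0],
              [0.5, 1.0, 4.0]]

# Component activations of the unfiltered data.
activations = matmul(W, unfiltered)

# Remove component index 1 (say, an artifact) and back-project to channels.
kept = [activations[0], [0.0] * len(activations[1])]
cleaned = matmul(A, kept)
print(cleaned)
```

Because W only mixes channels at each time point, it does not care how the data was filtered in time, which is what makes reusing the components on the unfiltered data possible.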
The default rejection thresholds for ICA components in ICLabel are good from my perspective (since I set them myself in the plugin and also ran some tests in this paper: https://www.nature.com/articles/s41598-023-27528-0 ). Some people would disagree, though. I know Makoto is much more aggressive in rejecting ICA components.
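The 90% thresholds from step 6 amount to a simple rule on the ICLabel class probabilities: flag a component when its muscle or eye probability is at least 0.9. A minimal Python sketch of that rule follows. The class order matches the ICLabel categories, but the probability values are hypothetical.

```python
CLASSES = ["Brain", "Muscle", "Eye", "Heart",
           "Line Noise", "Channel Noise", "Other"]

# Only these classes are checked, each with a 0.9 minimum probability.
THRESHOLDS = {"Muscle": 0.9, "Eye": 0.9}

def flag_components(probabilities, thresholds=THRESHOLDS):
    """Return indices of components whose probability reaches a threshold."""
    flagged = []
    for index, row in enumerate(probabilities):
        by_class = dict(zip(CLASSES, row))
        if any(by_class[name] >= limit for name, limit in thresholds.items()):
            flagged.append(index)
    return flagged

# Hypothetical classification output: one row per component.
probabilities = [
    [0.95, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005],  # clear brain component
    [0.02, 0.93, 0.01, 0.01, 0.01, 0.01, 0.01],    # muscle above threshold
    [0.03, 0.01, 0.92, 0.01, 0.01, 0.01, 0.01],    # eye above threshold
    [0.10, 0.05, 0.70, 0.05, 0.03, 0.02, 0.05],    # eye-like, below threshold
]

print(flag_components(probabilities))  # [1, 2]
```

A more aggressive strategy would lower the limits in THRESHOLDS or add other classes to it.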
> 7. Reference to Average reference.
ICLabel will also average reference the data internally to find matching components. I would probably perform average referencing before ICA (so move to step 5). Anecdotal evidence seems to indicate that ICA works better on average-referenced data. Also, average reference is not required unless you want to do source localization (DIPFIT in EEGLAB will automatically average-reference data and components before performing source localization).
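For reference, average referencing subtracts, at each time point, the mean across all channels from every channel. A minimal Python sketch with hypothetical toy data:

```python
def average_reference(data):
    """Subtract the across-channel mean from every channel at each sample."""
    n_channels = len(data)
    means = [sum(column) / n_channels for column in zip(*data)]
    return [[value - mean for value, mean in zip(channel, means)]
            for channel in data]

# Hypothetical data: 3 channels x 3 samples.
data = [[1, 2, 3],
        [3, 2, 1],
        [2, 5, -1]]

referenced = average_reference(data)
print(referenced)  # [[-1.0, -1.0, 2.0], [1.0, -1.0, 0.0], [0.0, 2.0, -2.0]]
```

After this step the channels sum to zero at every sample.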
It is important that you evaluate some files to see if this approach works. Once the pipeline is fixed with the comments above, it looks good on paper, but real data is unpredictable, especially with a dry sensor headset. Across a number of datasets, you could run some statistics on the number of channels and ICA components rejected to make sure they are in a reasonable range (a good measure of quality is also to count the number of brain components, as seen in Figure 2 of this paper: https://ieeexplore.ieee.org/document/9441399 ). Also, nothing replaces looking at the raw data.
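One possible form for such statistics is a per-dataset count of rejected channels (or components), with a flag on datasets that sit far from the group median. The Python sketch below uses hypothetical counts. The median-plus-3-MAD rule and the 1-count floor are assumptions chosen for illustration, not a recommended standard.

```python
import statistics

def flag_outliers(counts, n_mads=3.0):
    """Return dataset names whose count exceeds median + n_mads * MAD."""
    values = list(counts.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    # Floor the MAD at 1 count so identical values do not flag everything.
    limit = median + n_mads * max(mad, 1)
    return [name for name, value in counts.items() if value > limit]

# Hypothetical number of rejected channels per dataset (19-channel headset).
rejected_channels = {"sub01": 2, "sub02": 1, "sub03": 3, "sub04": 11, "sub05": 2}

print(statistics.median(rejected_channels.values()))  # 2
print(flag_outliers(rejected_channels))               # ['sub04']
```

The same function can be run on the number of rejected ICA components or on the number of brain components per dataset. Flagged files are the first ones to inspect by eye.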
Note that there are similar automated pipelines published on the EEGLAB website: https://eeglab.org/tutorials/11_Scripting/automated_pipeline.html
Arno