Makoto's preprocessing pipeline

Jump to: navigation, search

Note from Arnaud Delorme (main developer of EEGLAB): See also this page for a quick pipeline to reject ICA components in single datasets. The opinions expressed below are those of Makoto Miyakoshi and should not be taken as official EEGLAB guidelines.

%%%%%%%%%%%%%%% (Preface updated on 11/27/2019)

A famous slogan in computational science by Buckheit and Donoho (1995).

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." Buckheit JB and Donoho DL. WaveLab and Reproducible Research. In: Antoniadis A and Oppenheim G. (Ed.) Lecture Notes in Statistics 103. Wavelets and Statistics. 1995. Springer-Verlag New York, Inc.

If you have said 'EEG data mining is like walking a mine field,' you are a friend of mine. It's a field of mine because it's not a field of mine. But never mind even if you have never mined EEG data. I'll show you my minecraft.


This is Makoto Miyakoshi's personal recommendation for preprocessing EEG data using EEGLAB. These are NOT Swartz Center's official opinions. This page is admittedly dogmatic, so follow the instruction at your own risk. This is a forever beta version and subject to change anytime. If you have any question, please post it to EEGLAB mailing list (you need to subscribe it from here.) In doing so, please include the keyword Makoto's preprocessing pipeline so that my email filter can pick it up.

I have another SCCN Wiki page Makoto's useful EEGLAB code which is more code oriented to address various frequently encountering practical problems in using EEGLAB.


Change the option to use double precision (05/14/2021 updated)

In EEGLAB2019, 'File' -> 'Preferences'. In EEGLAB 14, from main GUI top left tab, 'File' -> 'Memory and other options' -> 'If set, use single precision under...' uncheck it. Note that when computing ICA etc where double precision is necessary, EEGLAB functions internally convert input data to double precision. So mostly this is taken care of automatically as long as you stick to GUI operations.

This deletion was suggested by Malte Anders on May 13,2021 (link to his archived post) which I agreed. Thank you Malte!

Note that if you want to run calculation that involves matrix operation, you have you check whether the input data is always converted to double at your own risk. Otherwise you will encounter problems like this--This is the case when data rank is calculated with single vs. double precisions right after importing it from .raw file.


Check the path and import data (11/24/2020 updated)

Before importing the data, check these points.

  1. Your full path to eeglab folders and your data does NOT contain any non-alphabetic, non-numeric, non-underscore characters such as !@#$%^&*()-=\`[];', and [space]. For example, if your folder name is 'Experiment 1', that is bad. Change it to 'Experiment1' or 'Experiment_1' for example.
  2. Any folder and file name in your full path, as well as your variable names in Matlab workspace, does not start with numbers. For example, '2018_Oddball_Project' is better changed to 'Oddball_Project_2018'.
  3. Avoid using names that are (likely to be) used as Matlab function. For example, 'eeglab' is a name of a function. So you do not want to use it for a folder name or variable name. If you are not sure if a name is already used by Matlab, type
which -all [variableName]

For example,

>> which -all interp
/usr/local/MATLAB/R2017b/toolbox/shared/controllib/engine/@DynamicSystem/interp.m  % DynamicSystem method

This means that the word 'interp' is used as a function in two places already. You can still use the word 'interp' as a variable name, but you could get into a trouble later. Try to avoid at least known function names such as 'eeglab' when naming a folder, file, or variable.

I'm not telling you that if you have those instances you will immediately fail. On the contrary, using recent operation systems and Matlab, you probably won't encounter any problem--until you use, without knowing, some of EEGLAB functions and plugins the authors of which assumes users follow the classical rules. It is a good practice to eliminate this risk at this very beginning stage.

After importing, check if you have extra channels that are filled with zeros or complete garbage. Never include those problematic channels into ICA, it crashes the process (and for infomax, it slows the process significantly).

EEG.icaact is empty!? (06/15/2021 updated)

This is for Juan Molina--I have encountered this weird bug more than several times. It may be hard to guess the right cause of this issue, but it is related to this section. Actually, this problem is caused by missing eeg_option.m that was supposed to be created properly when you first launched EEGLAB. To diagnose the problem, try the same trick:

which -all eeg_options

If the answer shows only one entry, [eeglab_root]\functions\adminfunc\eeg_options.m, you got this problem. Usually, you have to find two of them, one under the adminfunc, which is like a read-only architype, and the other one under userpath, which is actively overwirtten by EEGLAB every time user change the option settings. If EEGLAB cannot access to this eeg_options.m, 'precompute ICA activations...' is unselected (I guess), hence EEG.icaact is NEVER calculated by eeg_checkcet(). To fix this problem, you should manually copy eeg_options.m file to the userpath. For example, for Windows the default user path is typically \user\Documents.

Here is the update by Jake Garetti, one of the EEGLAB workshop attendees who ran into the problem and solved it! He told me that "Instead of just copying the options file to the userpath, I had to go into functions/sigprocfunc/icadefs.m and manually set EEGOPTION_PATH = userpath;" This change makes a lot of sense.

Downsample if necessary

You may downsample the data to around 250Hz if the original sampling frequency is higher than that. Usually low-pass filter is necessary before downsampling for anti-aliasing, but EEGLAB function applies the anti-aliasing filter automatically. Downsampling is useful to compress the data size. It also help ICA to produce better decomposition by cutting off unnecessary high-frequency information, though we have not tested it quantitatively.

High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)(09/23/2019 updated)

The purpose of the application of high-pass filter is to remove the 'baseline drift'. It is generally recommended to use lower cutoff frequencies, for example 0.01 Hz (cooperative subjects) – 0.1 Hz (children and patients) (Luck, S. J. 2005. An introduction to the event-related potential technique. Cambridge: MIT press.) than our recommended 1-Hz pass-band edge (equivalent to their cutoff frequency of 0.5 Hz@-6dB; see this PDF file for filter terminology). The reason for the 1-Hz pass band edge is because

  1. ICA is biased toward high amplitude if data length is finite. EEG data have 1/f power spectral density (PSD) curve. The combination of these two facts means that the leftmost values on the PSD plot will have the most influence on your decomposition result, although the most useful EEG information you are probably looking for is in 3-13 Hz (theta and alpha). Therefore, cutting off the leftmost values help ICA to focus on the important 3-13 Hz.
  2. Low-frequency EEG < 1Hz could be contaminated by sweating etc, which adds spatiotemporal non-stationary and affects ICA. I am a big fan of infraslow EEG research though--after all, no one method is perfect.
  3. Also, the lowest frequency in ERSP (i.e., a blended wavelet/STFT transform) is typically 3 Hz if you want to have a decent frequency resolution (in EEGLAB parameters, 3 cycle at 3 Hz generates 1116 ms sliding window length; If you want to go lower, the sliding window will be longer!) 3 Hz is sufficiently high relative to the 0.5-Hz cutoff.
  4. These qualitative explanations are supported by published data. They (Stefan) concluded that high-pass filtered data at 1-2Hz works best for ICA.

Winkler I, Debener S, Muller KR, Tangermann M. 2015. On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Conf Proc IEEE Eng Med Biol Soc. 2015 Aug;2015:4101-5.

...As a pre-processing step for ICA, high-pass filtering between 1-2 Hz consistently produced good results in terms of signal-to-noise ratio (SNR), single-trial classification accuracy and the percentage of ’near-dipolar’ ICA components. Relative to no artifact reduction, ICA-based artifact removal significantly improved SNR and classification accuracy. This was not the case for a regression-based approach to remove EOG artifacts.

There are reports that 1-Hz high-pass filter attenuates (or even 'distorts') low-frequency, so-called 'late slow waves' of event-related potential family, such as N400 or P600 (these names are paradigm dependent, of course). To avoid this problem, one can calculate ICA weight matrix and sphereing matrix with 1-Hz high-passed data, then apply it to 0.1-Hz high-passed data. This ICA matrices transfer can be done through EEGLAB GUI 'Edit' -> 'Dataset info'.

Low-pass filter is optional. If CleanLine (see next) does not work well, you may want to simply apply a low-pass filter to cut out above 50/60 Hz. If you use the default eegfiltnew() filter, the transition bandwidth around 50/60 Hz must be 10, so set the 'passband edge' no larger than 45/55 Hz.

Note also that for connectivity analysis using multivariate Granger causality, data are better treated with special filter setup. I summarized it in this section.

(Added on 03/29/2017) When I worked with Marco Petilli on his MoBI data recorded in SCCN, he first found only 1 good IC after ICA. I thought this is the end of the research. But anyway, without much expectations, I tried 2 Hz high-pass filter with transition bandwidth being 1.25 to 1.75 Hz (Blackman-Harris window), a very aggressive parameters. After ICA, I found nearly 10 good ICs! This was the eyes-opening experience. I remember my former SCCN colleague, CK, strongly recommended that I should use transition bandwidth of 0.25 to 0.75 Hz when applying 1-Hz high-pass filter.

What happens to the < 1 Hz data if ICA is calculated on > 1 Hz data and applied to 0.1 Hz data? (08/17/2018 Updated)

As long as the < 1-Hz data are correlated with > 1-Hz data, ICs obtained from > 1-Hz data decomponses > 0.1-Hz data as well with no problem (which is more likely the case). However, if < 1-Hz data are independent of > 1-Hz data, what will happen?

Let's use some imagination to see the situation. You are in a dinner party. On a buffet table, there are big four plates of salad, steak, salmon, and dessert. You want to take some salad to your plate. The moment you reached your arm to the salad plate--power outage happens. All the lights are off, and it's pitch dark. But you decide to take the salad anyway. You know where the salad plate is, and you have a tong and a plate in your hands, so why not? You take the salad to your plate in the darkness, enjoying unexpected inconvenience. Now power is back, lights are on. Guess what you find in your plate! You thought you took the salad, but the salad is drenched with mushroom soup! Maybe someone spilled a bowl of mushroom soup over the big salad plate by accident? Now your salad has a linear combination with some amount of the mushroom soup.

This is the situation when > 1-Hz ICA is applied to > 0.1 Hz data. The > 1-Hz ICA can see only four plates, but > 0.1-Hz ICA may be able to see four plates plus mushroom-soup contaminated area as a new independent component. If you know the contaminated area, you can avoid it to take uncontaminated salad, steak, salmon, and most importantly, desserts.

The concept and calculation of ICA is complicated, the output and how to use it is damn simple. It's just a spatial filter, like locations of the plates. How much mushroom soup you get as a result of taking contaminated salad, steak, salmon, and desserts, depends on above which plate the mushroom soup was mainly spilled.

Import channel info

'Edit' - 'Channel locations'. If you have channel labels such as 'Fz' 'Cz' 'Pz' etc, that means that your channel montage follows International 10-20 system (there is extended version of it; see Oostenveld and Praamstra 2001 Clin Neurophysiol for 10-10 and 10-5 system). If that's the case, click 'Look up locs' button that is located in the center-bottom of the GUI. Choose 'use MNI coordinate file for BEM dipfit model' unless you have special reason not to do so. This will automatically import channel coordinates from internal library. Don't forget to click 'Opt. head center' located right top corner.

If you have digitized channel locations, just 'Opt. head center'. Some vendor use meter as a unit length instead of millimeter. If you find all dipoles localized at the center of the head, suspect this cause and x1000 the scale to see if it fixes the problem (as far as I know, old Polhemus did this once).

Do we want to have dedicated EOG channels? (12/03/2019 added)

(Thanks Maciej Kosiło for bringing up this question) If your EEG recording system has a default setting so that two electrodes are spared to record electrooculogram (EOG), what's the best way to make most out of them?

  1. Use them as EOG channels that are bipolar referenced to each other.
  2. Use them as EOG channels that are monopolar referenced to a scalp recording reference.
  3. Re-purpose them as scalp EEG channels.

If you use ICA in your preprocessing, having dedicated EOG electrode(s) is not that necessary. This is because occular artifacts, either blink or saccade, have quite broad scalp topography which is modeled by ICA generally very well. This is the strength of multivariate method--ICA evaluates whole-scalp distributions. So, I would say 3 is the best answer. 2 is just as good, and in doing so I recommend you use locations of AFp9 and AFp10 defined in international 10-5 system (Oostenveld and Praamstra, 2001) which is virtually as good as canthus (lateral end of eye) of HEOG-L and HEOG-R. In this way, EEGLAB (or any label-based location database) can find their template coordinates automatically. These AFp9 and AFp10 serve as both HEOG monitors and far-frontal EEG channels. In fact, Delorme et al. 2007 demonstrated that medial prefrontal source EEG is most sensitively detected by these upper-facial electrodes. The worst choice, I would say, is 1, because by bipolar referencing you lose data dimension by 1. If these EOG channels share the monopolar reference with EEG channels, you can re-reference them offline to obtain bipolar-referenced EOG data anytime without losing anything.

Remove bad channels

What do you mean bad here? In short, the channels that you can't save even if you reject 5-10% of your datapoints. Removing bad channels is a critical step particularly for average referencing. This is because average reference signal is an average of all channels. If you have channels that are noisy all the time, including such channels to average will introduce noise to all the channels. This issue is focused in detail in Nima's PREP pipeline paper.

If you are looking for controlled, objective rejection criteria, I recommend you check out these solutions

  • Nima's PREP pipeline: See his paper for detail.
  • Christian's clean_rawdata(): This EEGLAB plugin comes with sophisticated bad channel detection algorithm.
  • Makoto's trimOutlier(): This tool does simple jobs in an interactive manner (i.e. to visualize overall statistics, set threshold, confirm results, re-set the threshold if necessary.) Also available for a batch process.

Recently I only use clean_rawdata(). Note that this function saves nice results of cleaning under EEG.etc with which you can identify which channels and datapoints are rejected in the original length. Of course, it's Christian's work!

In writing batch, I recommend you keep the original (i.e. before channel rejection) EEG data in this way so that you can refer to it later for channel interpolation (see below)

% Keep original EEG.
originalEEG = EEG;

Interpolate all the removed channels (03/05/2021 updated)

There are two primary reasons why you want to interpolate the rejected electrodes before average referencing and ICA.

  1. To minimize potential bias in the next average referencing stage. For example, if there are 64 channels, and 16 channels are identified as bad and rejected but only from the right hemisphere. Then, the number of channels in the left vs. right hemispheres are 32 vs. 16, with which average will be biased toward the left hemisphere. To avoid this bias, scalp electrodes may be interpolated.
  2. The rejected electrodes cannot be recovered after ICA (i.e. no easy way to interpolate EEG.icaweights or EEG.icasphere), which may cause an issue later. For example, if you need to calculate group-level IC backprojection at Cz. But if some of your datasets do not have Cz, then you simply cannot do this.

EEGLAB's default interpolation uses spline interpolation, which typically introduces rank issue. If this interpolation were a simply linear interpolation, the problem would be simpler--just a rank deficiency. However, because the time-series data of the spline-interpolated electrode is not a linear sum of others, oftentimes the data rank is erroneously recovered as well. If you apply infomax ICA by calling pop_runica(), it is taken care of automatically. However, if you directly call runica() or runamica(), you have to manually take care of this problem. See this article for detail and a suggested solution, and also this section No. 13 for similar information.

By using spherical interpolation as shown below, you may be able to avoid this falsely rank recovery issue. But if using 'pca' option to adjust rank anyway, it is not particularly helpful.

% Interpolate channels.
EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical');

Re-reference the data to average (08/02/2020 Updated)

There seems multiple valid ways to reference the scalp EEG data. Here, I recommend average reference because the concept is simple and does not require anything other than the recorded data. Average reference is a approximation of scalp potentials that is independent of reference location in a location of the head.

...with any reference strategy we record the potential at one head location with respect to some other head location. By definition, this potential difference is the result of integration of the electric field over any path connecting the reference point and the point in question. We say any path because electric potentials are fundamentally path-independent. The average reference can provide a good estimate of reference-independent potentials because it integrates many possible paths to the point, depending on the number of electrodes. ("Electric Field of the Brain" by Nunez and Srinivasan, p.297)

If EEG is generated on the cortex with dipolar current distribution and with no external sources, there are same amount of positive and negative potential changes at every moment due to charge conservation. Hence the scalp topography should sum to zero. By re-reference to average channel values, we assume that there is no charge creation that is monopolar sources and sinks. In practice, average reference makes all the ICA scalp topography have zero total potential (i.e. red and blue always balances).

(Update 08/24/2016) Note also that average referencing is often very helpful to suppress the line noise. I experienced that line noise that cannot be rejected by CleanLine, due to a shorter time constant of fluctuation of amplitude envelope of the line noise frequency band (you can visually confirm this by band-pass filtering the data to pass only the line noise and see its time series), could be removed by average referencing. I guess this is related to common-mode rejection? Anyway, I suggest one performs average referencing first then perform CleanLine.

Why should we add zero-filled channel before average referencing? (08/09/2020 Updated; prayer for Nagasaki)

A simple proof that re-referencing to average of non-reference channels is wrong is as follows. (Uniopolar) re-referencing should be able to apply as many times as one wants to any existing and non-existing channel (like averaged channels) without changing data. See the following quote from Hu et al. (2019) Brain Topography.

...The most surprising property is ‘no memory’ indicating that URs [=unipolar references] works independently without the consequences of the URs already applied. It also means one can always safely re‑reference the EEG/ERP recordings with different URs but not worry about if re‑referencing multiple times will accumulate artifacts.

But if you re-reference to the average across non-reference channels (i.e., without including the initial reference channel), you cannot do this. This is the simple proof. A little bit more theoretical proof can be found below.

The following is the description taken from 'Electric Field of the Brain' (Nunez and Srinivasan, 2006) p.294-295

As has been emphasized throughout this chapter there is essentially no difference between recording electrode and reference electrode. Thus, when one records EEG data from N electrodes, the measured potentials Vn (n = 1, 2,… N) are related to the scalp potential Φ(r) with respect to "infinity"...

For the sake of argument, suppose there are only three scalp recordings, , , with reference to infinity to start with. If is chosen as a initial reference when recording, the initially referenced data , , are described as

Note that data rank == 2 (because is 0). Re-referencing these data from REF_Pz to REF_Fz, we obtain

Thus, re-referencing from to yields, or 'recovers', time series of the initial reference channel as . Note the rank of data remains unchanged (data rank == 2). In this way, the bottom line is that data can be freely rereferenced after collection algorithmically. (Dien, 1998. Behavior Research Methods, 30:34-43; Osselton, 1965. Electroenceph Clin Neurophysiol, 19:527-528)

When is chosen as an initial reference, average potentials across the channels is defined as

Because is 0 everywhere in the time domain, it may be omitted as

Thus, the number of terms in the numerator is 2 (i.e., the number of non-zero channels), while the denominator is 3 (i.e., the number of channels including initial reference). However, it is often misunderstood as

If we use without counting the initial reference into account, we obtain the remaining 2 channels as


Apparently, using causes rank deficiency (rank == 1). However, this should not happen as we saw above.

If we use , which takes initial reference into account, for all 3 channels we obtain

Note that the data rank remains to be unchanged (rank == 2).

Ok, then how can we add back the initial reference? Remember, it was

which is continuous zeros. This is why we should add zero-filled channel before average referencing IF your data does not already have it (which I have never seen before).

This operation can be simply achieved by adding +1 to the denominator when you calculate the average across the channels. For example, if you calculate average across 32 channels, sum up all the channel values and divide it by 33. Similarly, if you have 64 channels, divide the sum by 65. If you have 128 channels, divide the sum by 129, etc.

(Update 11/29/2017) I wrote a simple EEGLAB plugin fullRankAveRef() to perform the average reference after adding the original reference channel (i.e., zero-filled). This channel is removed after average referencing, so the total number of channels remains the same (so does data rank, hence full-rank average reference). This solution may be useful for those who have data with small number of channels, such as standard 19 channels in international 10-20 system.

What is rank? (Updated 06/06/2016)

NEW description (Thanks Andreas Widmann for discussion!) Usually, the number of data rank is the same as the number of channels. However, if a pair of channels are bridged, the rank of the data is the number of channels-1. For another example, a pair of equations, 2x+3y+4=0 and 3x+2y+5=0 have rank of 2, but another pair 2x+3y+4=0 and 4x+6y+8=0 have rank of 1 for the two equations are dependent (the latter is exact twice of the former).

EEG data consist of n+1 channels; n probes and 1 reference. The n+1 channel data have rank n because the reference channel data are all zeros and rank deficient. Usually, this reference channel is regarded as useless and excluded for plotting, which is why you usually find your data to have n channels with rank n (i.e. full ranked). It is important that when you perform ICA, you apply it to full-ranked data, otherwise strange things can happen (see below). Your data are usually full-ranked in this way and ready for ICA.

When you apply average reference, you need to put the reference channel back if it is missing (which is usually the case). You can include the reference channel in computing average reference by selecting the 'Add current reference back to the data' option in the re-reference GUI. If you do not have initial reference channel location, in theory you should generate a zero-filled channel and concatenate it to your EEG data matrix as an extra channel (which makes the data rank-deficient) and perform average reference. To make the data full-ranked again, one may (a) drop a channel--choosing the reference channel is reasonable (b) reduce the dimensionality by using a PCA option for ICA.

To perform the abovementioned process on the command line, one can manually compute average reference using adjusted denominator: = bsxfun( @minus,, sum(, 1 ) / ( EEG.nbchan + 1 ) );

The origin of this argument by Joseph Dien can be found here.

OLD description:

Note that average reference reduces the data rank by one (See this page: Imagine v1,v2,..., vk in the equation are non-zero channel data. After average referencing, the sum across all the channels is zero, which proofs linear dependency--Note that average referencing (i.e., v1+v2+...+vk=0) is used to define rank deficiency! Average referencing without including the original reference channel by definition causes rank deficiency; See also this page written by my colleague Jason Palmer). The easy way to make the data full-ranked is to discard (any) one channel after average referencing. If you don't do this, ICA may return invalid results.

Examples of artifact from ICA (Modified 09/08/2020)

Below I show an example of what happens when you run ICA on rank-deficient data. There were 54 channels. Data were average referenced end epoched to -1 to 2 sec. After setting EEGLAB option to use double precision (thanks Andreas for clarifying this!), Matlab rank(,:)) returned 53. AMICA log showed data dimension is 54. Notice the complementary scalp topographies (left) and time-series (right), and interesting spectra patterns (middle) in which IC 9 and 10 does not even respect the filter effect. Rejecting one channel and re-running ICA fixed this problem, and the pathological ghost ICs got disappeared.

Figure 1. A pair of ghost IC scalp maps generated by ICA on rank-reduced data. Figure 2. Spectra analysis results. Figure 3. Raw EEG time series.

(Data and figures by courtesy of Caterina Piazza)

Additional comments on 12/15/14: By mistake, I applied AMICA on 66 datasets (5-min resting, 60-70ch, ASR-cleaned with SD==6) that were average referenced but no channel discarded (i.e. rank-deficient). AMICA failed in 3/66 datasets (4.5%). Here is the out.txt log from the failed one. Note that 'num eigs kept =65' come from EEG.nbchan==65.

minimum eigenvalues =  -9.00329954361518855E-14 0.47905289012901225 0.5242944714252078
maximum eigenvalues =  998.61704681443121 262.85021915136872 219.07946516116806
num eigs kept =  65
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17987E-01 0.17987E-01 (1.19 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16862E-01 0.16862E-01 (1.35 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17511E-01 0.17511E-01 (1.28 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16970E-01 0.16970E-01 (1.38 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17459E-01 0.17459E-01 (1.22 s, 0.7 h)
... done.Execution time: 0.00 h

Here is a sample code of this part.

% Apply average reference after adding initial reference
EEG.nbchan = EEG.nbchan+1;,:) = zeros(1, EEG.pnts);
EEG.chanlocs(1,EEG.nbchan).labels = 'initialReference';
EEG = pop_reref(EEG, []);
EEG = pop_select( EEG,'nochannel',{'initialReference'});

Remove line noise using CleanLine (04/25/2020 updated)

CleanLine is an EEGLAB plugin developed by my colleague Tim Mullen (now Qusp CEO) based on one of the Chronux functions. It's not a standard frequency filter, but it uses sliding window to adaptively estimate sine wave amplitude to subtract. It does not make a hole in the background EEG spectrum.

For parameter setting, I usually set the step size == window size for the maximum rejection (from experience), and others set to the default. According to Nima's PREP paper, before running this function, data nonstationarity should be addressed by high-pass filter beforehand (Bigdely-Shamlo et al., 2015).

There are cases CleanLine does not suppress the line noise sufficiently. Most likely, the reason is that the line noise is not stationary. See the sample screenshot shown below. Channel F4 shows constant envelope width across time. This shows stationarity and CleanLine should work well on this channel. However, notice that most of other channels show fluctuating 60-Hz envelopes, and their time constants are shorter than the typical CleanLine sliding window length (default 4 s). This shows violation of stationarity, and CleanLine cannot work well on them (I tweaked the parameters but have never got luck). You can easily verify it on your data by applying the EEGLAB filter with the setting "lower edge = higher edge = 60 Hz" for example. If it turned out to be the case unfortunately, you should either apply conventional notch filtera or low-pass filter to cut off before the line frequency.


Some update--if you see harmonics such as 60 Hz and 120 Hz, in the time domain such a signal does NOT appear as the sum of constant 60 Hz and 120 Hz sine waves, but 60 Hz with spiky peaks and/or troughs. In other word, 120 Hz is only present occasionally in a phase-coupled way. In this case, I think SIFT's assumption is violated and does not work. Even if you see 120 Hz (or whatever harmonics peak on power spectral density plot) reduced, it does not mean the phase-coupled harmonics is specifically removed, SIFT does not work in that way. In this sense, probably it does not make sense to enter [60 120] but it only make sense to enter [60] i.e., the principal frequency of line noise (if you think I'm wrong, please write me, I'd be happy to learn.)

If you data are recorded with Biosemi, you MUST re-reference the data to *ANY* channel (including average channels; by the way, by including the original reference channel(s) you can re-reference to other channel(s) repeatedly without loosing information). This is because Biosemi lets users take care of common-mode rejection offline. What is common-mode noise? You can see non-stationary 60Hz amplitude wax and wane in a remarkably non-stationary way, which could not be modeled and subtracted by CleanLine at all. However, re-referencing erased it perfectly. So I guess that's an example of common-mode noise.

Validation (04/25/2020 added)

I heard that CleanLine implemented in Bigdely-Shamlo and colleague's PREP Pipeline, called cleanLineNoise(), received some bug fix when implemented. To verify this info, I ran the following simulation test: single-channel 1000-Hz sampled data with 60-s long data of (1) 60-Hz sine wave ('Without noise'), (2) 60-Hz sine wave plus pink noise with 1:1 variance ratio ('With noise') were processed with the original CleanLine and cleanLineNoise, the latter of which is the PREP-implemented version of the former, to remove this 60 Hz sine wave using the same parameters (which you can check in the shared code you can download below). The results below demonstrated that the PREP-implemented version indeed seemed to outperform performance of the original CleanLine. The presence of background pink noise does not seem to affect the performance difference. I thought I used the same parameters for CleanLine and cleanLineNoise, but for validation purpose I share my simulation code so that anyone can verify and replicate this results. Based on these results, I will update my recommended preprocessing pipeline soon.

Without noise, frequency domain performance comparison. Note the reduction of 60-Hz power peak.

Psd noNoise.jpg

Without noise, time domain performance comparison. Note the amplitude of the residual of the 60 Hz sine wave.

Residual noNoise.jpg

With noise (SNR = 0dB), frequency domain performance comparison.

Psd withNoise.jpg

Without noise, time domain performance comparison.

Residual withNoise.jpg

Download Matlab code to replicate this simulation. Please contact me if you find an error or different parameters used for two applications in my code. The downloaded file is .zip format which is because I cannot upload .m file but I can .zip file in this wiki page.

Download Matlab .m code

Epoch data to -1 to 2 sec (09/11/2019 updated)

  • If you have sufficient number of data points (i.e. [number of channels^2] x 30) after epoching, epoching before ICA is recommended.
    • A merit of epoching before ICA is that so that ICA can focus on peri-event brain dynamics while not to be 'disturbed' by datapoints from period of non-interest (i.e., non-event-related brain/non-brain activities). See also this section.
  • If you have insufficient number of data points, or your Stimulus Onset Asynchrony (SOA) is shorter than 3 seconds, do not epoch. Instead, use continuous data to perform ICA, then epoch. Follow this steps. If overlapped epochs are submitted to ICA, it will learn the duplicated part as positively weighted data.
  • The relatively long epoch is for later wavelet transform. For detail, see Slide 20 and 21 of this file. In short, epoch length of -1 to 2 sec is to have -0.5 to 1.5 sec window after wavelet transform (i.e. ERPS and ITC in EEGLAB terminology) with 3 cycle at 3 Hz wavelet which generates 1116 ms window length.
  • As you see above, epoching can be done anytime, before or after ICA/dipfit. In other words, epoching is NOT a necessary condition to perform ICA.
  • You do not need to 'correct baseline' i.e. subtract mean of the 'baseline period' which is typically -1000 to 0 ms. Importantly, David Groppe, our former colleague, reported that if short baseline such as -100 to 0 ms is subtracted before ICA, it makes ICA performance worse. This is because mean value of such short baseline can easily fluctuate by 'background' alpha rhythm, for example, that changes the DC level of all the channels independently... for ICA this is a very bad situation. Longer the baseline period, closer you are to the previously applied high-pass filter that guarantees mean zero hence safer. Actually, as long as the data are properly high-pass filtered, it is unnecessary to 'correct baseline' here, thus we can minimize the risk of confusing ICA. See Arno's comment below.

If you extract data epochs before running ICA (as advised by Makoto below), make sure that the baseline is long enough (at least 500 ms) or that you do not remove the baseline from your data epochs. See the following article for more information Groppe, D.M., Makeig, S., & Kutas, M. (2009) Identifying reliable independent components via split-half comparisons. NeuroImage, 45 pp.1199-1211.

Reject epochs for cleaning (11/13/2019 Updated)

Aim for around 5-10% rejection rate (Delorme et al., 2007). Apply amplitude threshold of -500 to 500 uV (don't capture eye blinks; you may want to exclude channels near AFp1 and AFp2 for this purpose), reject the selected epochs, and apply improbability test with 6SD for single channels and 2SD for all channels, and reject the selected again.

An idea for fast and efficient trial rejection (11/13/2019 Updated)

I moved this section to here with some update.

Adjust data rank for ICA (05/17/2019 updated)

When performing ICA, a critical parameter is how many ICs to generate. Usually and by default, the number of ICs to generate is the same as the number of the channels you have. However, this assumption holds ONLY IF your data are guaranteed to be full ranked (to see what it means, jump back to this section). Your data could become rank-deficient by

  1. re-referencing without including the initial reference (rank deficient by 1)
  2. interpolating k channel (rank deficient by k), or
  3. bridging n channels (rank deficient by n-1)

If ICA is forced to generate more number of ICs than the input data rank, 'ghost ICs' appears. pop_runica() has its own rank checker which uses Matlab's rank() function. However, though it can detect the case of 1) improper re-referencing and 2) channel bridging, it DOES NOT detect the rank reduction caused by channel interpolation using EEGLAB (using 'spherical' algorithm). This is because the interpolation is not linear, which adds very small eigenvalues that are counted as non-zero dimension. In fact, pop_runica() already has a heuristic threshold of 10^-7 for eigenvalues to reject such ill-conditioned dimensions, but it is currently used only to generate warning message (which I missed apparently). If you want to use this safer criterion to reject potentially ill-conditioned dimensions, add a minor change to pop_runica() from line 535 as follows (note 'max' is changed to 'min'):

function tmprank2 = getrank(tmpdata);
    tmprank = rank(tmpdata);
    %Here: alternate computation of the rank by Sven Hoffman
    %tmprank = rank(tmpdata(:,1:min(3000, size(tmpdata,2)))); old code
    covarianceMatrix = cov(tmpdata', 1);
    [E, D] = eig (covarianceMatrix);
    rankTolerance = 1e-7;
    tmprank2=sum (diag (D) > rankTolerance);
    if tmprank ~= tmprank2
        fprintf('Warning: fixing rank computation inconsistency (%d vs %d) most likely because running under Linux 64-bit Matlab\n', tmprank, tmprank2);
        %tmprank2 = max(tmprank, tmprank2);
        tmprank2 = min(tmprank, tmprank2);

Thus, it is strongly recommended if your preprocessing makes your data rank deficient by x in total (i.e., they add up!), you explicitly specify (number_of_channels - x) as your 'true' data rank in one of the following ways. Again, this becomes particularly important after channel interpolation, since pop_runica()'s rank checker does not detect the rank deficiency and allows to produce 'ghost ICs' (by the way, same happens to ASR--any PCA/ICA solution would be affected similarly!) To address the issue, you either

  1. reduce data dimension by using 'pca' or 'pcakeep' option for runica() and runamica() respectively, to the number of (number_of_channels - x), OR
  2. reject *ANY* channels by x. Interestingly, they do not have to be the exact same channels that are interpolated. You may want to determine the new channel location montage by using Nima's loc_subsets() function which tells you which channels to be rejected to achieve maximally uniform channel distribution.
% Discard channels to make the data full ranked.
channelSubset = loc_subsets(EEG.chanlocs, yourDataRankHere);
EEG = pop_select( EEG,'channel', channelSubset{1});
EEG = pop_chanedit(EEG, 'eval','chans = pop_chancenter( chans, [],[]);');

Run ICA (06/26/2018 updated)

What is assumption of stationarity in ICA? (09/11/2019 updated)

ICA assumes that ALL the data points submitted are from the same state. For example, if you perform a P300 task while recording EEG, you are making the assumption that all the data points are related to the P300 task. From this viewpoint, it becomes clear that it is more desireable to epoch your data to the event of interest first then perform ICA (data length being permitted) rather than performing ICA on continuous data. What about eye-blinks? It is definitely not related to P300 task (or may be temporally correlated with event structure, if blink timings are behaviorally controlled), and it's not even from the brain. However, if a blink occurs frequently enough during the recording, it becomes a part of stationarity. What about non-task-relevant (i.e., non-event-related) absent mindedness or random thoughts reflected in EEG between trials or blocks? These could be disturbing to the assumtion of stationarity you made for ICA because such EEG signal may not have any repeating patterns or similar statistical characteristics (such as probability density function) across time. Therefore, using ERP paradigms actually gives a very nice justification to use ICA as ERP paradigms reinforces EEG data stationarity by massive repetitions of the same/similar types of stimulus-response sequences. The most ideal for ICA would be quasi steady-state responses, such as mismatch negativity.

  • Is data stationarity more 'attention-drawing' for ICA than large amplitude in signal? I have seen many times that only a single event of EEG response but with a very large amplitude can attract ICA's attention and make it waste multiple ICs to decompose it. So apparently, high amplitude signal is a very strong attention drawer for ICA than a regular-amplitude, well-stationary signals. This is why data cleaning is critical before ICA. It also explains why ICA is so good at decomposing eye blinks--their source location is fixed, potential distribution over the scalp is fixed and broad, and the amplitude is large. By the way, the reason why ICA is biased to source magnitude in practice is because realistic data has finite length.

General tips for performing ICA (06/26/2018 updated)

  • ICA has a bias toward high amplitude, unless the data has infinite length. EEG signal usually has 1/f power spectral density, which means low-frequency signal tend to dominate ICA results. This is why you have a lot of alpha components and no isolated gamma components; gamma power is always very low relative to theta and alpha.
  • ICA may fail if you apply it on data that consists of concatenated blocks with different capping, even if the data are from a same subject. This is because every time an electrode cap is applied, their relative locations to the EEG sources in the brain are not the same. This violates the assumption of spatial stationarity between the electrodes and EEG sources throughout the recording. In other words, To apply a single ICA, your subject cannot take off the electrode cap! If you did this, then ICA may show ICs with very similar scalp topographies with differently block-separated activations (active only in some blocks, otherwise green) in ERPimage.
  • For the same reason, re-gelling electrodes during break times is not recommended. There were cases that ICA split data before and after the re-gelling. Even only one channel out of the original position and it will happen, so be careful.
  • Include EOG, ECG, and other channels as long as they share the initial reference electrode with scalp electrodes. If EOG channels are referenced to each other (i.e., bipolar), exclude them.
  • If there are channels with terrible EMG contamination, excluding them from ICA may improve results. This is because ICA cannot model EMG sources because EMG sources are not spatially stationary but spreads along with muscle fibers.
  • You need minimum of (number of channel)^2 x 20 to 30 data points to perform ICA. The number ’20 to 30’ should increase as the number of channels increases. This empirical formula if from past publications from SCCN. This value was however determined purely empirically and we have not performed any systematic investigation on it (should you be interested in carrying out this study, contact us!) Moreover, there is a good counter-example for this rule of thumb--See the next one.
  • You may want to downsample the data (with more than 128 channels) to near 100Hz. This is the case when you give up gamma band (40-80Hz) and higher due to presence of strong line noise that CleanLine() could not remove very well (CleanLine's assumption is that the noise envelope is stationary, which often does not hold). Although not officially studied or reported, in principle this aggressive downsampling not only helps to accelerate the process but also may improve quality of decomposition due to simplifying the data. Run ICA on the downsampled data and obtain the decomposition matrices (i.e. EEG.icaweight and EEG.icasphere), and apply them to the non-downsampled data. Interestingly, in this case the number of datapoints is severely lower than the number recommended above, but in most of cases it is fine (But I did encounter several cases where AMICA failed due to downsampling; in this case, AMICA worked fine by using the original 256Hz sampling rate, so when coding you might want to use try... catch...). Judging from this, after all it is NOT the absolute number of data points that determines the quality of decomposition, but probably the length of actual time. If you think about using very high sampling rate (10kHz for example) to gain just the number of datapoints to improve ICA results... which does not seem to work to me. For this reason, using higher than 256Hz is probably not very helpful to improve quality of ICA decomposition.
  • You may want to reject epochs on IC activations and run ICA again (Delorme et al., 2007). This is because it will free ICA from data points difficult to explain. A good principle is that feed ICA only with typical (i.e., stationary) data. If you run ICA again on this cleaned data, theoretically you can improve ICA quality, but in practice the difference is placebo level unless you reject 1/3 of data (but remember, due to ICA's bias toward ampltiude, rejecting high amplitude noise/artifact is always immediately helpful, even if it is 0.001% of total data length. Note that high amplitude I mean here cannot be eye blink artifact, they are still benign; What I mean here is those of non-biological origin with > 1000-2000uV.) In general, ICA results are more robust than we think. If your data are already very clean, it is usually impossible to further improve ICA results.

Rejecting ICs to improve the next ICA? Or what is rank and linear algebra (07/16/2018 updated)

Rejecting ICs to improve the next ICA is the idea that keeps alluring many of us but never works. Let us confirm the process and the reason why it does not work.

  1. You have 100ch data.
  2. You run ICA on it to obtain 100 ICs.
  3. You identify 1 eye-blink component.
  4. You reject it. Now you have 99 ICs.
  5. You back-project the 99 ICs to obtain 100ch reconstructed scalp channel signals.
  6. You run ICA again on the reconstructed 100ch data, hoping to obtain another 100 ICs. The new ICA result should be cleaner because this time ICA is free from eye-blink artifact.
  7. You find somehow 100ch ICA gives you only 99 ICs. Certainly the blink IC is gone, but the rest of 99 ICs remains identical as before. .

Where did you fail? It is when you thought you could obtain another set of 100 ICs. Below we see this problem in detail.

Usually, your 100ch recording is rank 100 (to see it, execute double(rank( in Matlab command line) unless some of your channels are bridged/shorted, or average reference was applied. If two channels are bridged, their signals become identical. Clearly, this is not the useful way to use multiple channels. The important principle here is that for channel data to be meaningful, it should not be explained as combination of others (being identical is an extreme case). To see this more clearly, let me give you a simpler example below.

  1. You have Fz, Cz, and Pz data.
  2. You reject Cz data.
  3. You interpolate Cz with 50% of Fz and 50% of Pz.

In this case, you probably see that signal at Cz is not that useful, since it is just made up. In this case, there are 3 channels, but only with rank of 2. By the way, 3 channels with rank of 3 is called full rank, and any state where rank < number of channels is called rank deficient.

Let's stick to this example to explore further. Imagine that you run ICA on Fz, Cz, and Pz to obtain 3 ICs. You know ICA's performance often goes beyond our intuition to almost a magical level (as Clarke said, Any sufficiently advanced technology is indistinguishable from magic). But do you know where the three ICA activations come from? They are not out of thin air, but they are just the result of re-blending the three scalp channel values! For example, IC 1 is 29% of Fz, 33% of Cz, and 38% of Pz, and so on.

  • Q1. Does such a simple re-blending can really reveal the allegedly effective EEG source activations?
  • A1. Yes. To make a claim to be an effective EEG sources, there needs several assumptions of spatial stationarity of the sources etc.
  • Q2. Who determines this blending percentages?
  • A2. That is ICA's primary output.
  • Q3. Where can I find this blending percentages?
  • A3. Each column of EEG.icawinv represents the one-IC-to-all-channels ratio (not exactly percentage). You re-blend the channel values following this ratio to calculate obtain IC activations.

This blending process is called linear transform. The same applies not only to ICA but also to any kind of linear method such as PCA, Non-negative matrix factorization, etc. Wavelet transform is also a type of linear transform, as it reconstructs a given 1-D signal with a blend of basic waves (wavelet daughters) with different frequencies.

Now, let us proceed more. We have 3-channel data Fz, Cz, and Pz, but Cz is an interpolation of 50% of Cz and 50% of Pz. If we run ICA on this three channel data, what happens? Do you think ICA will still give you three independent components, even though your Cz is mere blending of Fz and Pz? Now you think about the meaning of the word independent. If you have signals in Fz and Pz and their blending ratio, you can 100% predict Cz's signal behavior. You probably see that Cz does not carry unique information at all. Rank deficiency indicates this kind of situation. Namely, your number of channels is inflated due to redundancy.

By the way, in the case of average referencing, data redundancy is not concentrated on one channel but in a sense uniformly spread across all the channels, which is why it is hard to see why average referencing causes rank deficiency. In this case, think about what happens when you perform average referencing on two channel data, Fz and Pz. After subtracting average reference, they will be identical, just polarity reversed.)

If you want, I can give you more extreme example: Instead of blending Fz and Pz with 1:1 ratio, we can set the blending ratio to be 0.99:0.01, 0.98:0.02, 0.97:0.03,... to generate 100 channel data. As you see, you can generate as many channels as you want in this way. However, even if you have 100ch data genereated in this way, unique information is present only in Fz and Pz. You can't create information out of nowhere.

Ok, let's get back to the current question: if we run ICA on 3-channel rank-2 data, what happens? The answer is, you obtain only 2 ICs. The number of ICs you obtain is equal to the rank of the data, not the number of channels (as we see, the number of channels could be easily inflated by generating redundancy). This explains the very original question of why the second ICA returns only 99 ICs instead of 100 ICs. When you rejected the blink IC, the back-projected scalp channel data is no longer full-ranked, but it became rank deficient by 1. Because ICA cannot generate independent component out of nowhere (it is NOT a real magic after all!), it will give you the same number of ICs as the data rank. By the way, the rank check is not ICA's inherent property. In the realistic implementation, rank check is performed by other functions. If ICA is set to run to generate 100 ICs from rank 99 data, it either stops in failure or generate weird result (see 'ghost ICs' explained below).

Now, the last question: why do the 99 ICs remain the same? If you read all the explanations above, now you should know what ICA cares and what not. ICA does not care superficial representation on scalp channels in the sense that channel data are just one of the infinite linear combinations of the source activations (imagine your electrode cap, which has 128ch covering entire scalp area, is rotated slightly to arbitrary direction so that all the channel positions are shifted into nameless positions. Under this situation, ICA will still give you the same source locations with the same source activations. ICA sees the channels as projections of sources.) Instead, ICA cares who are independent across channels. Because the identified blink IC is (theoretically) independent of any other ICs, presence or absence of the blink IC does not affect the state of other ICs, hence decomposition is unaffected.

Tips for copying ICA results to other datasets (06/26/2018 updated)

  • You can copy a set of ICA-related matrices i.e., EEG.icaweight and EEG.icasphere to other datasets. EEGLAB GUI operation supports this process. Thus you can for example learn ICA matrix from aggressively cleaned data, while applying the results to the mildly cleaned data to save data points.
  • In doing so, however, if noise is present which ICA did not learn, the noise will be present in the ICA results.
  • However, make it ABSOLUTELY clear about the two datasets for the following points.
  1. They have the same number of channels in the same order.
  2. They have the same (re-)referencing status.

The problem here is that even if you make a serious mistake, EEGLAB does not let you know about it. Therefore, you have to check it yourself carefully. To confirm if you did not make any mistake, [Plot] -> [Component activations (scroll)] to see the IC activations. If the copying process is successful, you see variances (i.e., amplitude) of ICs are sorted as usual, the largest on the top and the smallest in the bottom. Note that scalp maps for ICs (which is EEG.icawinv itself) look normal even if you have this problem, so do not be misled. Always check IC activations. If the copying process ended up with failure, you will notice the following symptoms.

  1. IC activations are not sorted in the order of variance i.e., it does not follow the rule that largest on top and smallest at bottom.
  2. Multiple IC activations are temporally correlated (often a pair with polarity inversion)
  3. Some of IC activations, which is not necessarily top-ranked, shows abnormally large amplitudes.

See below for the example.

Figure 4. Corrupted data due to failure in copying ICA weight matrix.

  • There is a paper claiming that ICA (actually, any linear decomposition) distorts phase (Montefusco-Siegmund et al. (2013). Effects of ocular artifact removal through ICA decomposition on EEG phase.) Some people hence claim that one should avoid using ICA at all. This claim is illegitimate (as Arno says 'resounding no' in his page linked below). In this wiki page, my colleagues and I illustrated the mathematical process of how phases are synthesized before and after component rejection to show that 'this is a due change in phase and nothing surprising'. See also this page written by Arno showing empirical demonstration that rejects this claim.

To learn how to evaluate EEG and artifact ICs (01/24/2017 updated)

Please visit this site to start the training. This nice project was designed by my colleague Luca Pion-Tonachini.

Unofficial instruction to install AMICA (07/17/2018 updated)

Download AMICA from EEGLAB extension manager [File]-> [Manage EEGLAB extensions] -> [Data processing extensions] -> Check AMICA to Install. Below I show old information.

  • Use AMICA developed by former SCCN colleague Jason Palmer. AMICA does better job than Infomax (see Delorme et al., 2012).
  • Below I show an example of successful installation for Windows 7 64bit. I also heard a report of successful installation on Mac OS Sierra on Core i7, using Matlab 2014b and EEGLAB 13_6_5b. This does not mean EEGAB 13 is more stable for this purpose; Use the current version of EEGLAB.
  • DO NOT use characters other than ALPHABET and NUMBERS in the file path. For example, if you have a space in the file path, IT WILL FAIL! (Thank you Xuan Tran for helping me to find this out)
Unofficial instruction to install AMICA for 64bit Windows.
1. Go to Jason Palmer's website to find the links to the files.
2. Create a folder named 'amica' under /eeglab/plugins
3. Download and copy the following 5 items to /eeglab/plugins/amica
   -amica15mkl.exe (Windows 7/10 64-bit binary)
4. Download and copy the following 2 items to /Windows/System32
   -fmpich2.dll (Windows DLL, save in directory with binary)
   -libiomp5md.dll (Windows DLL, save in directory with binary)
5. Download and execute mpich2-1.4-win-x86-64.msi (for 64bit Windows) from
   Double-click the downloaded .msi file to install it.
6. Install postAmicaUtility() plugin from EEGLAB plugin manager. Otherwise, there is no GUI button to load AMICA results.
   Below I show an exmaple code to load AMICA results from command line. Note that the variable 'dataName' is the name used for AMICA result output folder.
   EEG.etc.amicaResultStructure = loadmodout15(['/data/projects/example/amicaResults/' dataName]);
   EEG.icaweights = EEG.etc.amicaResultStructure.W;
   EEG.icasphere  = EEG.etc.amicaResultStructure.S;
   EEG = eeg_checkset(EEG. 'ica');

RMS option issue (07/02/2020 added)

By default, 'If set, scale ICA component activities RMS (room mean square) in microvolt (recommended)' under 'Memory and other options' is on. However, this does NOT take effect UNLESS a check function eeg_checkset() is ran. This means that loading .set files is sufficient (but this overwrites 'EEG.etc.icaweights_beforerms' as well, which is inconvenient and confusing). Hence, if you are to save the data before moving to the next step, there is no problem because in loading the data eeg_checkset() is ran. However, if you continue to process your data after ICA, the variables including EEG.icaweights, EEG.icaspheres, EEG.icawinv, EEG.icaact are NOT scaled to RMS. If you do not know about this fact, you may encounter a problem like this (Thank you Seyed!) so beware of this behavior.

Does ICA distort phase? (07/27/2020 updated)

This question is sometimes asked in this form, but this is rather a strange one. An example of a linear decomposition is 7+5 = 4+8. As such, that itself should not do anything at all (other than suggesting a different view of the data, and some of them may be useful; for example, 10+2 may be more understandable than 7+5 for a good reason, and another linear transform of 2+2+2+2+2+2 seems to have different usefulness as well).

If you 'remove component' after ICA, it DOES change the data. For example, if you remove 4 from 4+8, does this change the result? Of course yes. 4+8 and 8 are two different things. The real question is: Do you call it distortion? No. (Return to Wittgenstein--the meaning of the word is the use of the word)

For those who want to see more formal proof of the fact that post-ICA component rejection does NOT distort the phase, I prepared a separate page. This proof simply shows that IC rejection causes linear phase change that is the essentially the same as removing 5 out of 7+5, which you do not call 'distortion'.

There is also an empirical demonstration available in this page.

Is using PCA absolutely bad for ICA? (09/08/2020 added)

(This is for Nir Ofir) Since Fiorenzo published this paper, I started to hear this new superstition--'Using PCA is bad for ICA.' That's not what he meant. Fiorenzo is my good friend and a colleague. When he was writing the paper, I had a chance to discuss the issue with him in Berlin. He told me that sometimes people use PCA dimension reduction to speed up the process of ICA, but it deteriorates quality of ICA decomposition. Yes, there is no doubt about it. However, people who do not have background knowledge about these things seemed to misunderstand it (perhaps by reducing dimensions of his original explanation), that 'using PCA is absolutely bad for ICA.'

The reality is, there are cases in which using PCA is TOTALLY fine and even recommended as the best solution. One such case that causes NO LOSS of decomposition quality is to use PCA to specify the true rank of data. For example, you have 100 channel data but for some reason it has rank of 99 (perhaps average reference is applied without including the initial reference? This is however a suboptimal approach though; See this section). Typically, ICA algorithm comes with a rank checker (with heuristically determined minimum eigenvalue threshold of 10^-7, see pop_runica line 531) so feeding a rank-deficient data usually does not cause an issue. However, rarely it does; See this section to see what can happen. In order to avoid this rare problem, using 'pca', 99 (or 'pcakeep', 99 for the case of AMICA) in this case is PERFECTLY fine. It only removes risk of running into this rare 'ghost-IC' problem, and most likely it just does the same job as the implemented automatic full-rank adjuster in code to run ICA.

Another case in which use of PCA dimension reduction is justified is to process data that is too short to produce the number of ICs as many as the number of electrodes. For example, suppose you have 128-ch EEG data. Great. But what if your recording time is only 5-min? Producing 128 ICs from 5-min recording would cause serious shortage of data to learn (remember SCCN's rule-of-thumb formula to estimate minimum data length for ICA, which is the number of electrode ^2 x k, where k is minimum 20 to 30...). In such a case, using PCA is practically a requirement. By the way, you can also achieve the same effect by discarding electrodes, which however seems a self-destructive solution and reminds me of a story of Procrustes' bed. If you think discarding scalp electrode is more natural, organic, and non-GMO solution compared with using PCA, that is wrong. Rather, using PCA saves you electrodes in this case while reducing the number of ICs properly to avoid shortage of data to learn.

Head model coregistration and estimate single equivalent current dipoles (07/02/2020 Update)

Head model coregistration

Now we want to fit a source model, an equivalent current dipole, to the scalp topography generated by ICA. Before doing this, there is an important process of head model coregistration. Your data has electrode locations that are defined in its xyz coordinates. EEGLAB has an electric forward model of the head defined in its xyz coordinates. The problem here is the two space coordinates may or may not correspond to each other. For example, in some coordinate system x-axis is left-right and y-axis is anterior-posterior, while for other coordinate system this xy relation is opposite. Actually, it is often the case that the left-right axis and the anterior-posterior axis are swapped between your electrode locations and EEGLAB's head mode so that front in electrode space means right in the model head space! To address this inconsistency, you need to coregister the two coordinate systems such that front means front and right means right. 'Tools' -> 'Locate dipoles using DIPFIT 2.x' -> 'Head model and settings'. For detail of the procedure, see EEGLAB online manual.

A possible issue you may encounter here (11/07/2020 added)

This is for Syanah Wynn. EEGLAB's head model coregistration uses millimeter as a default unit. However, sometimes your electrode locations are registered in meter or centimeter, depending on how the electrode location data are generated. In the figure below, I show an example of what you would see if centimeter is the unit for the electrode locations instead of millimeter.


You can see your current electrode locations (green) clustered at the center of the head (do not forget to push 'Mesh Off' button on the left-hand side). If this is the case, the following code is the solution.

for m = 1:length(EEG.chanlocs)
    EEG.chanlocs(m).X = EEG.chanlocs(m).X*10;
    EEG.chanlocs(m).Y = EEG.chanlocs(m).Y*10;
    EEG.chanlocs(m).Z = EEG.chanlocs(m).Z*10;

This is equvalent to converting centimeter to millimeter. Similarly, if meter is used for your electrode location coordinates, the following is the solution.

for m = 1:length(EEG.chanlocs)
    EEG.chanlocs(m).X = EEG.chanlocs(m).X*1000;
    EEG.chanlocs(m).Y = EEG.chanlocs(m).Y*1000;
    EEG.chanlocs(m).Z = EEG.chanlocs(m).Z*1000;

Single equivalent current dipole fitting

Once the coregistration is done, your data is ready to perform equivalent current dipole model fitting. 'Tools' -> 'Locate dipoles using DIPFIT 2.x' -> 'Autofit (coarse fit, fine fit & plot)'. You need to run coarse fit first then fine fit next, but this 'Autofit' does both together anyway. Note that 'Rejection threshold RV (%)' here should be 100, otherwise you'll have trouble in making STUDY later. Do not change anything on the GUI and just press 'OK'. Because we are fitting a single dipole to a scalp projection of each independent component which is readily (most of the time single- and sometimes symmetrically bilaterally-) dipolar, this estimation process is greatly eased and also eliminates complication in determining the optimum number of dipoles. For more detail, see below.

Deep dipole results

This ICA + single dipole fitting approach has abovementioned beauty but also has a known issue of producing physiologically invalid deep dipoles. I discussed the issue below.

Search for and estimate symmetrically constrained bilateral dipoles

Download fitTwoDipole() plugin for EEGLAB and perform it after fitting single dipoles. It is compatible with eegh, and you can make a batch. The code is like this: EEG = fitTwoDipoles(EEG, 'LRR', 35); Do not forget to site Caterina's paper.

Create STUDY

When creating STUDY with epoched data (assuming that you have within-subject conditions), I recommend that you use one set file per subject and never save the data separately for each condition (which is an old approach used before was implemented). In theory both works, but there were reports of problems caused by separate .set files. I recommend you use to let STUDY know the factorial design of the analysis.

When creating STUDY, I usually load all the .set files into EEGLAB using pop_loadset() with 'loadmode', 'info' (it may be possible to do it from GUI too, but it is convenient to input EEG.subject and altogether at this point). Importantly, creating a STUDY cleans your data because it performs IC rejection based on the two criteria: 1) dipoles with > 15% residual variance--Artoni et al. (2014) showed (indirect) evidence of why this value is about optimum; 2) dipoles outside brain--no argument. This is why you don't need to bother to manually reject ICs at the individual level. Don't forget to set before you start precompute. After precompute, you precluster the ICs by choosing clustering criteria.

Avoid double dipping (03/07/2019 updated)

If you are going to test ERP/ERSP/ITC/PSD difference (for example) across conditions at the end of the day, DO NOT choose any of them to cluster ICs. This problem is called 'double dipping', a type of circular inference that is known to inflate false positive rate. If you have not heard of the word 'circular inference', 'double dipping', or 'voodoo correlation', see Kriegeskorte et al., 2008, Kriegeskorte et al., 2009, and Ed Vul's voodoo correlation site. Note that these were published more than 10 years ago. Note also that much more fMRI researchers are aware of the issue than EEG researchers. So 'I have never heard of it from any of my colleagues' can not be an excuse. Ok, so why can you not use ERP to cluster ICs, then to compare ERP amplitude across conditions? If I understand the problem correctly, there are at least two issues:

  1. The resultant clusters have, by definition, smallest variance in the given measure (i.e., ERP) within a group. Given that mean values remains the same, variance becomes smaller means increase in probability for getting significant results i.e., inflation of Type I error (false positive).
  2. I heard some say the above problem can be avoided by using non-parametric test. Fine. However, before performing actual inferential statistics, you form IC clusters, in which ICs with dissimilar ERP waveforms are actually kicked out (because that's what you are doing.) But it is a good scientific practice to kick out any data points just because they are 'against your hypothesis/assumption'?

Then, how can we avoid this issue? The answer is simple: USE ONLY DIPOLE LOCATIONS to cluster ICs, and never use any measure that is derived from time/frequency domains. EEGLAB does allow users to choose multiple clustering criteria, which is misleading. If you do not want to experience hard time to explain to your reviewers why you chose ERP/ERSP/ITC/PSD to cluster ICs, just use dipole locations.

Can we determine the optimum number of IC clusters? (01/14/2020 added)

For Fran and Seyed, I would like to share my solution. But this is usable only for the case where you have only dipole locations as a clustering criterion. Sorry it is not nicely wrapped up into an EEGLAB plugin, but you have to run some code, which is a bit too bulky to paste here so please be referred to this page.

Why using dipole locations does not cause double dipping? (03/07/2019 updated)

Dipole location is estimated from an IC scalp topography, which is a direct property of ICA (it is a column of EEG.icawinv). Why do I say that using dipole locations is safe in terms of double dipping? You can empirically prove it with an simple experiment: Randomize the order of channel assignments before you perform ICA. What do you get after ICA? You will get the identical results in time-series and power spectral density. However, you will get randomized scalp topography. What does this mean? This is a proof that ICA is agnostic on spatial information. Therefore, using dipole locations estimated from scalp maps does not cause double dipping in testing time-domain and frequency-domain data.

A side note--here, an interesting question is that 'Then, why are ICA results always associated with nice dipolar scalp topographies?' Indeed, in Delorme et al. (2012), they even made a claim that more mutual information reduced, more dipolar their scalp topographies become. Let us tentatively call it 'independence-dipolarity identitiy (IDID)'. As I mentioned, because ICA is agnostic on spatial information, dipolarity of the scalp topographies is not a direct property of ICA itself. Thus, an explanation for IDID is non-trivial. Unfortunately, SCCN has not published any 'official' explanation for this, but instead has been using it as a proof that ICA meets physiology. It seems convincing, but to really convince researchers from other fields, we will need good supporting evidence in electrophysiology.

Why is my ERSP/ITC data shorter than ERP epoch? (08/20/2021 added)

This is for Sean Gilmore, but it is also one of the most often asked questions. To explain this, I copied two slides from my previous EEGLAB workshop slides below. To make it clear, this phenomenon of shortening the epoch after time-frequency transformation is NOT specific to EEGLAB or newtimef(), but this is a general phenomenon related to a concept cone of influence (COI) (see this webpage in the Matlab help center for definition of COI.) In the left panel below, one of the lines from the command line log of newtimef() says 'The window size used is 279 samples (1116 ms) wide'. This describes the width of the longest wavelet. Actually, if the target frequency goes higher, this window should become shorter (unless you specify short-term Fourier Transform by specifying the cycle increase parameter 0). This change of the window length shapes the 'cone'. But in EEGLAB, the algorithm does not care to shape the cone. Instead, it simply cuts everything outside the narrowest part of the 'cone' that is necessarily at the lowest frequency. See the right panel below and notice the difference between the onsets of the EEG start and the ERSP start. Why is the ERSP start delayed? Actually this delay is only apparent because the sliding window does take the initial data points for calculation. But because one sliding window only generates one time point, the total number of points between before and after the wavelet transform cannot be the same. In the current case, you lose one sliding-window long datapoints minus one. This datapoint loss is divided into two parts: the half of the window goes to the beginning of the data, and the other half of the window goes to the ending of the epoch. Thus, if the newtimef() log line says 'The window size used is 279 samples (1116 ms) wide', you lose 278/2=139 points (1112/2=556 ms) from the beginning and ending of the epoch after wavelet transform. When you epoch the data, if you know you will perform time-frequency analysis later, you have to take this datapoint loss after wavelet transform. Otherwise, you have to redo your process from epoching again.

Slide2.jpg Slide3.jpg

A tip to compute time-frequency transform (i.e. ERSP & ITC) (08/20/2021 updated)

EEGLAB uses its own way to define wavelet basis. Technically it is a mixture of short-term Fourier Transform and Wavelet Transform since the cycle number varies by default (for detail, see the archived communication starting with this. This was the best discussion ever in the history of the mailing list.) Now, the problem is how to set the parameters to perform it properly. Of course the best parameters depends on your application. But such logically self-evident statement carries no information. I recommend 3 cycles at 3 Hz for the lowest frequency, and about 7-10 cycles at 50 Hz (or 15-20 cycles at 100 Hz, if 100 Hz is the maximum frequency you want to analyze.) I say this because I remember people used 7 cycles to analyze 40Hz gamma in early 2000's. This 'number of cycle' parameter is not very sensitive at high frequencies, and the difference of 2 or 3 would not make noticeable differences. I copied a slide from my previous EEGLAB workshop slides below. Comparison between 5 cycles vs. 25 cycles at 50 Hz. Overall, my advice is that do not be too nervous about it.


What is the optimum and maximum number of clusters? (11/30/2020 edited)

This article is based on a review work by Nattapong Thammasan. For the optimum number of IC clusters for k-means algorithm, see my another Wiki article. For the maximum number of clusters, there is no hard limit of course, but from several literatures the suggested number is 30. First, in Nima's Measure Projection paper, there is this description: 'a desired number of IC clusters (10–30).' However, this paper does not show the reason for this number, so not a very strong evidence for the current purpose. In this David Groppe's paper, there is the following description: 'Based on ICASSO's cluster quality index for each cluster, 30 clusters were chosen as corresponding to reliable ICs.' This ICASSO is the algorithm used here by Himberg et al (they published other important ICA papers too). This paper by Groppe and colleagues seems so far the best justification to set the upper bound of the number of clusters for k-means to 30. Finally, our collaborator Caterina used 60 clusters and 30 clusters for infant and adult, separately, using PSD and scalp maps as criteria to separate artifacts from brain sources in this paper. Her two-stage IC-clustering approach resembles what I used to recommend (until Luca's ICLabel became available--then I don't do the two-stage IC clustering. For the first clustering, just PSD is used as a clustering criterion to reject artifact clusters which mainly focuses on muscle ICs). If you want to use a large number of IC clusters in EEGLAB STUDY, I recommend you set it to 30 and cite Groppe et al. (2009) for justification, and other two papers as optional.

Notes (Re-organized 04/23/2020)

How to cite this page (04/23/2020 added)

Before showing the example format, let me discuss validity of citing this page. The are two issues.

  1. Validity of science--Nothing is new in this page. All methods described are either published or logically derived in each situation. I am not proposing a new idea that needs to be validated. As such, probably you don't need to bother to cite this page. This page is like an extension of our EEGLAB mailing list. If you learn something new in this page, it is like you learn something new from a conversation with your colleagues. However, sometimes you may need evidence for a specific process. In case you do not see a reference associated for what you need to cite, please write me directly so that I can point you to proper references. I would be also happy to write a paragraph of rebuttal to your reviewers if they give you unreasonable criticism.
  2. Validity of citing an URL to papers--I don't think it is authentic because websites are subject to changes or even deletion. On the other hand, it is becoming more common to cite Github pages in papers. I guess citing a website is a transitional solution to an emerging situation in science, and as such it seems ok to me as long as it is more useful than not.

That being said, let's talk about the practical solution. As my background is psychology, I would stick to my home ground's standard protocol, which is American Psychological Association (APA) style. Here is an example and explanations of the case of citing a Wikipedia page using APA style. However, this is Swartz Center's website using Wiki style and not a part of Wikipedia. Given those differences, if you would ever want to cite this page, the format you should be using is the following.

Makoto's preprocessing pipeline. (n.d.). Retrieved April 23, 2020, from's_preprocessing_pipeline

Make sure you change the date from 'April 23, 2020' to the date you read this page. It is indeed strange to cite an URL starting with someone's name like this. Historically, Makoto's was added to this wikipage's title to show hereticalness (i.e., being unofficial) in contrast with SCCN's official teachings, as the note of caution in red on the top of this page indicates. It was certainly not to advertise my name.

Alternative, more automated pipeline using ASR (11/13/2019 updated)

One of the drawbacks in the abovementioned cleaning approach is that if a short burst of artifact appears in one of two hundred channel data, for example, you still need to give up the entire epoch. The more channels you have, more probable it is for you to encounter this issue, like a false positive rate that inflates as you increase the number of tests, so you need to reject more epochs... do you think this is a good deal? Did you spend money to buy a recording system with more channels to reject more data epochs? Something is wrong... if you agree to this argument, using Artifact Subspace Reconstruction (ASR) is a good way to address this issue. I put a summary of ASR on the different page for detail.

Suggested preprocessing pipeline (11/27/2019 updated)

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() to reject bad channels and correct continuous data using Artifact Subspace Reconstruction (ASR).
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  11. Estimate single equivalent current dipoles
  12. Search for and estimate symmetrically constrained bilateral dipoles
  13. Generate probabilistic IC labels using IClabel() for IC rejection
  14. Epoch IC-rejected data to -1 to 2 sec to event onset (So far is the subject-level process)
  15. Create final STUDY specifying full (This is the group-level process using STUDY)

The control over clean_rawdata() was transferred to the main developer of EEGLAB. The main GUI no longer looks like this, but it should be more or less the same. I'll update it someday.

Figure 5. Sample parameter settings for pop_clean_rawdata.

  • The first box is disabled (-1) because I assume there is no flat channels. This can be turned on if you want.
  • The second box is disabled (-1) because FIR high-pass filter is already applied. I think this plugins applies IIR since this cleaning solution is originally for realtime online data processing.
  • The third box means that if a channel is correlated to the surrounding channels less than 0.8, the channels is rejected. If it rejects too many channels, use less value.
  • The fourth box is disabled (-1) because line noise is supposed to be already taken care of.
  • The fifth box is the ASR. This is what it does.
  1. Find calibration data (i.e. the cleanest part of data)
  2. Apply 0.5-s sliding window principal component analysis (PCA) on the continuous data.
  3. Classify PCs into high variance (in this case, 8 standard deviation from the calibration data) or normal variance.
  4. Reconstruct high variance subspace (i.e. group of channels associated by a PC) with normal variance subspace. An important assumption is that neighboring channels are heavily correlated due to volume conductance and scalp mixing (by the way... because of this, the central limit theorem works and the scalp channel signals become closer to Gaussian than source signals; this guarantees ICA. See also Arnaud Delorme's explanation of ICA as non-Gaussianity maximization). This is how one can estimate a signal of one channel from signals of adjacent channels. From my experience, I would advice that one should start with very lax value, such as 10-20, to use it sparingly to correct only absolutely bad glitches that critically influences ICA's performance. Nima's result showed that ASR with SD=20 produced the best result. My interpretation for this is that we should let ICA explain the data as much as possible so the correction using PCA-based method should be minimized, and the empirical balancing point is somewhere around 10-20.
  • The sixth box is the window rejection criterion. If more than 0.25 of channels are judged to be bad even after ASR, the window will be rejected.

When using ASR, this is a good guidance to control the total amount of rejection.

  • You need more channels -> Use lower values in the third box
  • You need more data points -> Use lower values in the fifth box (i.e. correct more using ASR) and use higher values in the sixth box (i.e. reject less windows).

Important note: clean_rawdata() generates channel and datapoint masks that indicate rejections. They are stored under EEG.etc.

How to perform automated IC selection using IClabel() (11/13/2019 updated)

IClabel() is a relatively new EEGLAB plugin written by my former colleague Luca Pion-Tonachini (you pronounce tonakini, not tonacheeni). I found this plugin wonderful. I thought I can finally replace my manual component selection process with this completely automated process. Here, I show an example code of how to perform brain component selection with label probability > 0.7 and residual variance < 0.15. In case you do not know what 'residual variance' mean, see the next subsection.

% Perform IC rejection using ICLabel scores and r.v. from dipole fitting.
EEG       = IClabel(EEG, 'default');
brainIdx  = find(EEG.etc.ic_classification.ICLabel.classifications(:,1) >= 0.7);
rvList    = [EEG.dipfit.model.rv];
goodRvIdx = find(rvList < 0.15); 
goodIcIdx = intersect(brainIdx, goodRvIdx);
EEG = pop_subcomp(EEG, goodIcIdx, 0, 1);
EEG.etc.ic_classification.ICLabel.classifications = EEG.etc.ic_classification.ICLabel.classifications(goodIcIdx,:);

What does 'residual variance' mean (11/13/2019 updated)

'Residual variance' is a typical Swartz Center jargon (other examples includes 'PVAF', 'dipolarity', and 'temporally maximally independent'.) But the problem of 'residual variance' is, I believe, that it is underdescribed. For example, residual, from what operation? Variance, across what? In fact, until several years ago I probably could not describe how it is calculated (another such example of IC ERP... you'll be probably surprised to know how complicated it is.) I created a slide to explain the concept of 'Residual variance'.


To answer my previous question, 'residual' means the difference in 'IC scalp projection' minus 'theoretical dipole projection', and 'variance' means 'variance of the differences across all the scalp channels'.

However, another layer of difficulty is to see the relations across IC, equivalent current dipole, and its scalp projection. ICA generates mixing matrix (A in X=AS) and unmixing matrix (A^-1 in S= A^-1)X). Scalp projection of IC is just a visualization of the columns of A^-1; you tile the values in this column in the order of your register channels and you obtain the IC scalp maps. Equivalent current dipole is fitted to this IC scalp map. After fitting the dipole, you can calculate noiseless scalp projection from the estimated dipole. Finally, the noiseless scalp projection is compared with the IC scalp map to calculate residual variance in the following way.

residual_variance = var(IC_scalp_map - noiseless_scalp_projection)/var(IC_scalp_map)

Does the number of channels affect 'residual variance'? (04/10/2020 added)

Yes, the more number of scalp electrodes you have, more likely you see higher 'residual variance'. This is simply because you have more probability to get noise if you have more electrodes. This seems related to the reason why multiple comparison correction is necessary in statistics--as you increase the number of electrodes, probability of having at least one bad electrode inflates as well. With 19-electrode system I saw really low residual variances from time to time which I have never seen with a dense electrode array system, but that does not necessarily mean 19-electrode system's data is better.

Comments on ASR (updated 04/23/2020)

Despite increasing popularity, ASR's mechanism is poorly understood. ASR's mechanism is complicated indeed. It is partly because it was originally developed for real-time BCI application purpose. To provide insight and basic outline of the algorithm, I prepared some documents. Please be referred to this page to see them. The mantenance of ASR was officially transferred to the main EEG developers.

ASR FAQ (01/07/2020 updated)

  • Q1. In the screenshot below, why the quiet portion between 0.4-0.5 was needed to be removed?


(The screenshot by courtesy of Małgorzata Plechawska-Wójcik)

  • A1. Three possible problems:
  1. The data are noisy all the time, so 'calibration data' (i.e., the cleanest part of the data) also contains noisiness.
  2. The left-tail cutoff is too high (default -3.5) in RMS Z-score distribution.
  3. The low-frequency drift is present.

I will explain each of them below.

1) The 'calibration data' (i.e., the cleanest part of the data) ASR chose in the very initial stage of the process contains high amplitudes on average than these 'quiet' chunks. So it is relative to within-data variance; if your data are generally noisy, you will see more rejections of these 'quiet' chunks.

2) Let me explain the mechanism to begin with. The final window rejection is performed in the following way:

  1. Applying the custom heuristic IIR spectral weighting (8th order Yule-Walker). See the screenshot below (taken from Christian's PPT).8thYuleWalker.jpg
  2. Take root-mean-square (RMS) amplitude of the filtered data.
  3. Perform Z-score transform.
  4. Find datapoints outside -3.5 and 7.0 as still noisy.

Notice that Z-score below -3.5 is determined to be 'still noisy'. If you don't want to discard any data chunk for being 'too quiet' (RMS variance too small) at all (which is understandable as long as I see the above example), please change the value [-3.5] into [-7.0], [-10.0], or even [-inf] (i.e., negative infinity) at clean_artifacts() line 184. You may also want to make the same change in line 183 for consistency.

 line 183
 Before: {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-3.5 5.5], ...
 After : {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-inf 5.5], ...
 line 184
 Before: {'window_crit_tolerances','WindowCriterionTolerances'}, [-3.5 7], ...
 After : {'window_crit_tolerances','WindowCriterionTolerances'}, [-inf 7], ...

This will disable ASR to reject chunks of data for being too 'quiet' (RMS variance too small).

3) The custom heuristic IIR spectral weighting (8th order Yule-Walker) has a quite a (inverse) drop near DC, which does not exactly follows 1/f patterns. This means that more aggressive, high cut-off frequency in designing high-pass filter for DC suppression is justified for this filter. Maybe the remaining low-frequency drift emphasized by this heuristic IIR filter went over the rejection criterion.

  • Q2. I do not want to reject this much data. What parameter should I change?
  • A2. If you use lax threshold for ASR SD, e.g., 8 -> 20, then ASR will clean less data, therefore there will be more chance for the final window rejection. Another factor to control the window rejection rate is the final 'window rejection'. 0.3 means 'If more than 30% of channels still show above-threshold RMS amplitudes, reject this window' Therefore, if you want to reject less data, you should set this value to be high. If you set it to 1, no window will be rejected so that your data length will be equal between before and after the cleaning.
  • Q3. Does ASR reject signal and noise altogether?
  • A3. See this page for the answer.

How to determine the parameter for ASR (updated 09/28/2018)

There is a dedicated study by my colleague Chi-Yuan Chang at SCCN. Let me cite Chi-Yuan's conclusion here. 'Empirical results show that the optimal ASR parameter is between 10 and 100, which is small enough to remove activities from artifacts and eye-related components and large enough to retain signals from brain-related components.' If you hear SD == 10 it sounds crazily far from the mean, but for the particular distribution of RMS channel EEG this is the value (hence 'Empirical results show...'). Also, in our unpublished study done by Nima, ASR with SD==20 showed the best cleaning performance in terms of his robust mutual information reduction measure compared with other parameters (ASR with SD==40) or other non-ASR methods. So I would recommend something between 10-20.

Further optimizing the preprocessing order? (02/28/2020 updated)

You may find a lot of 'alpha flooding' (Makeig and Onton, 2011) all over the channels in clinical data, which ASR rejects a lot of windows. In this case, performing average referencing before ASR may help. The suggested preprocessing order should be:

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() only to reject bad channels.
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Apply clean_rawdata() only to perform Artifact Subspace Reconstruction (ASR).
  11. Re-reference the data to average again--this is to reset the data to be zero-sum across channels again.
  12. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  13. Estimate single equivalent current dipoles
  14. Search for and estimate symmetrically constrained bilateral dipoles

To determine which preprocessing is better, the standard one or the modified one, requires an empirical study with good amount of data and computation.

For purists who do not want to use any form of interpolation (03/16/2018 updated)

If you cannot bear with interpolation, which I think is common among us psychologists and non-engineer EEG researchers, I recommend trimOutlier(). It simply rejects above-threshold channels and datapoints in amplitude. The purpose of this plugin is to perform simple and conventional data rejection conveniently. The cleaning performance may not be as good as ASR, but it never performs interpolation, which could be a relief for skeptics (I am). One can perform quick but thorough cleaning for ICA (ICA does not care how data are chopped up), and the obtained ICA weight matrix can be applied to the less-aggressively cleaned data. It is also a very useful tool to visualize all channels and datapoints in one window (somehow I don't know any other tool that does this).

Also, check out 'The PREP Pipeline: Standardized preprocessing for large-scale EEG analysis' by Bigdely-Shamlo, Mullen, Kothe, Su, and Robbins (2015) for the theory of upstream EEG preprocessing, as well as their tools freely available for Matlab.

Example of batch code to preprocess multiple subjects (01/12/2017 updated)

See this page.

Dependency across the preprocessing stages (07/05/2019 updated)

During the 28th EEGLAB workshop at UCSD, I heard from a group of people that they want to see the reasons of the order of the preprocessing stages.To answer this question means to show dependency of the stages in preprocessing. To respond to this unexpectedly popular request, I prepared the following document (those who joined my dinner table, sorry for taking so long time!) The numbers on top of the connecting curve lines indicate reference numbers for the explanations shown below. In the boxes, parenthesis means an optional process.


  1. [Remove non-filterable artifacts -> High-pass filter the data]: This preprocessing is necessary for the cases such as simultaneous fMRI-EEG or simultaneous TMS-EEG (see this page). The high-amplitude artifact of the external origin reaches to the order of millivolt. If you apply a standard frequency filter, it will leave a massive residue on the data (if you wonder how it happens, see this page). Therefore, you should NOT apply a standard filter to these (rather rare) types of data. If you are curious to know what to do with the artifact, see this page for the case of simultaneous EEG-TMS.
  2. [(Downsample the data) -> (Run Source Information Flow Toolbox)]: Downsampling to something like 100 Hz is generally recommended for connectivity analyses. The datapoint-to-parameter ratio is calculated by ((num_IC)^2*model_order)/window_length*num_trials. If you downsample the data, it will (probably) reduce the model order and (by definition) reduce the window_length. Because you want to have datapoint-to-parameter ratio < 0.1, meaning that 10 data points are used to estimate one parameter point, and smaller is better (i.e., more data points are available to estimate the model). Generally speaking, donwsampling to around 100 Hz will worsen the datapoint-to-parameter ratio. However, sometimes it is still necessary to do it. For example if your data set has a bad line noise for which you needed to apply a low-pass filter to push the high 50/60 Hz peak into the maximum suppression frequency range. However, because you don't want to have stopband within the Nyquist frequency range (which is half of your sampling rate in Hz), if you have a big 50/60Hz line noise which cannot be reduced by CleanLine (which does happen quite often, unfortunately), the only possibility is to downsample it so that the 50/60Hz is outside the Nyquist frequency. In doing so, do not forget to apply the additional two inputs, like EEG = pop_resample(EEG, 100, 0.8, 0.4); 0.8 means 100/2*0.8 = 40Hz is the -6dB point, and 0.4 means 100/2*0.4 = 20Hz as a transition band width. Thus, EEG = pop_resample(EEG, 100, 0.8, 0.4); means 'Downsample the data to 100Hz using antialiasing filter with 40Hz cutoff point (-6dB) and transition bandwidth of 20 Hz so that the pass-band edge is 40-20/2 = 30 Hz, and the stop band is 40+20/2 = 50Hz.' See more info here.
  3. [High-pass filter the data -> CleanLine to remove line noise]: If I remember correctly, if you have a large DC drift in your data, CleanLine does not work well. It's been while since I had tested it, so this needs to be revisited again to double check.
  4. [High-pass filter the data -> (Epoch the data to an event onset)]: Filtering after epoching (i.e., crop out the peri-event data points and maybe stack in the third dimension;,:,:)) is not a bad idea. The FIR filter, which could be considered as a sliding window process, cannot process near the edge of the epoch (half of the width of the FIR window). The same happens around the 'boundary' event in EEGLAB. When filtering process finds the 'boundary', it will stop when the right edge of the sliding window hit the 'boundary', and resume from the point where the left edge of the sliding window is touching the 'boundary', so those data points that has the width of the sliding window and centered at the 'boundary' data are skipped (which is why you see the unfiltered original noise around 'boundary'). Particularly, FIR high-pass filter requires a quite long filter order (>2-3 sec), so it does not work after epoching. By the way, filter order means the width of the sliding window in data points) which could be even longer than your epoch.
  5. [High-pass filter the data -> Run ICA]: There is a published study showing that high-pass filter up to 2Hz improves ICA's performance. See this paper by Winkler et al.(2015).
  6. [High-pass filter the data -> (Run Source Information Flow Toolbox)]: High-pass filter improves the MVAR estimation. This makes sense intuitively that if you have a low-frequency activity, you will need a long time window to explain it, hence it will require the longer model order. SIFT has a piecewise sliding-window linear regression, but it does not mean it does better job than the standard FIR filter; Barnett and Seth (2011) showed that multivariate Granger causality is in theory invariant under zero-phase (a.k.a. phase-invariant) filter. They do recommend filtering to achieve stationarity (e.g., drift, line noise) See Seth, Barrett, Bernett (2015). For more detail, see this page.
  7. [Import channel info -> Remove bad channels with clean_rawdata()]: When the channel rejection function is called from clean_rawdata(), depending on the channel location availability, it changes the algorithm. It is of course better to perform it with the presence of channel location information so that it can use geometrical information of the scalp electrodes.
  8. [Import channel info -> Interpolate removed channels]: For channel interpolation, using the default channel location is necessary because the algorithm uses spherical spline.
  9. [Import channel info -> Coregister sensor locations to the head model]: The imported channel location is coregistered with the template head model, such as MNI (i.e., Montreal Neurological Institute) standard head model or 4-shell spherical head model. If the imported channel information contains an error, naturally the subsequent coregistration could be affected.
  10. [Remove bad channels with clean_rawdata() -> Interpolate removed channels]: Depending on which scalp electrodes are excluded, the subsequent channel interpolation is performed.
  11. [Remove bad channels with clean_rawdata() -> Run ICA]: Removing n number of scalp electrodes and the maximum rank (i.e., the number of independent time series in linear algebra) of your sensor-level data is reduced by n. Note that by interpolating the scalp electrode recovers the number of (apparent) channels, but the data rank will never be full-rank again (because, again, the rank is the number of independent time series. Anything that are interpolated out of existing data is NOT independent of the those that are mixed.) ICA, either Infomax or AMICA, would know the true rank of data by running its rank checker before starting ICA, so usually you don't need to do anything. Let me use an example. There is 64-ch data, from which you rejected 4 channels. Now you have 60 channels. You interpolated the rejected channels so that now 64 channels are back. However, your interpolated 64 channel data has rank of 60, and ICA will return 60 ICs, not 64.
  12. [Interpolate removed channels -> Re-reference to the average]: If you perform average reference, it literally computes the average across all the channels. Imagine you have reject much more channels on the right hemisphere than those on the left hemisphere. Do you think the calculated 'average' is unbiased? Definitely not. This is why channel interpolation is recommended before average referencing.
  13. [Interpolate removed channels -> Run ICA]: Why do I recommend channel interpolation before ICA? For example, after building STUDY structure, you may want to back-project the clustered ICs to a specific scalp electrode(s), say Cz, with different choices of IC clusters (see here). However, back-projecting IC(s) to the rejected electrode(s) is probably impossible, because ICA never had a chance to learn how the data observed at the rejected channel(s) correlate with other channels. This is why it is recommended to perform spatial channel interpolation BEFORE ICA, so that ICA generates sphering matrix that takes the recovered channels into account. ICA itself does not need to care about the number of the scalp channels, it only cares the data rank. It is PCA's job that performs dimension reduction to make the input to ICA full-ranked. With or without the 'recovered' (i.e., interpolated) channels, as long as they are just linearly interpolated using other channels (on this point, spline interpolation is not necessarily perfectly linear, so weird things can happen if data rank is incorrectly estimated! It will create a ghost, as discussed on this page), the size of the ICA's weight matrix is unchanged. Just sphering matrix changes it size. Hence this interpolation is not a requirement, but if you want to perform group-level back-projection, then you NEED to do it. It does not hurt at all to do it anyways, so my recommendation is to make it a part of standard routine all the time.
  14. [Re-reference to the average -> Run ICA]: ICA result is basically independent of the choice of reference, except this minor thing: If you don't perform average reference before ICA, you might obtain IC scalp topography (which are projections from ICs to the scalp) that are all read or all blue. This is because of DC across the scalp channels on the topography (i.e., the channels of the topo are not zero-sum; but they are not exactly zero-sum if you follow this method; this is the right way to apply average reference).
  15. [CleanLine to remove line noise -> Run ICA]: If the line noise fortunately very stationary (i.e., you see stationary envelope after band-pass filtering at the 50/60Hz) and CleanLine works well, then ICA will not need to decompose line noise, which should help ICA's performance. Theoretically, extended infomax and AMICA can model sub-Gaussian distribution, and the probabilistic density function of the line noise is the typical example of sub-Gaussian distribution. However, how stationary the line noise in the data is another story, it may be sensitive to subject's body movement etc. Generally speaking, ICA is not that good at modeling the line noise, so it is better to let CleanLine or other filters to take care of the line noise.
  16. [CleanLine to remove line noise -> (Run Source Information Flow Toolbox)]: If and only if your power spectral density has a smooth tail up to the Nyquist frequency that is above the line frequency, and if CleanLine can remove the line noise efficiently because it fortunately shows stationary temporal characteristics, performing CleanLine to remove the line noise helps to make the power spectral density of the data (either sensor-level or ICA-decomposed) further smooth, which helps in modeling the data using MVAR i.e., the model order could be shorter.
  17. [(Epoch the data to an event onset) -> Run ICA]: One critical assumption about ICA is stationarity. This means that any part of the data you sample, they show the same statistical property in distribution i.e., probabilistic density function. The best way to ensure data stationarity is to perform event-related recording paradigm so that your data are filled with repeated brain responses as many as possible. If your recording contains a lot of non-event-related part of data, such as between-block resting, you may consider to epoch the data so that ICA is fed only with heavily repeated brain responses (after all, ICA goes very well with ERP paradigm!) However, if your paradigm is typical mismatch negativity where there is no 'outside epoch' data points, ICA does not benefit from epoching at all. By the way, being epoched or not does not matter to ICA; epoched data are reshaped so that all the epochs are concatenated into a 2-D array of channels x timepoints.
  18. [(Epoch the data to an event onset) -> (Run Source Information Flow Toolbox)]: Again, in modeling MVAR, the datapoint-to-parameter ratio is calculated by ((num_IC)^2*model_order)/window_length*num_trials. If the data is continuous and there is no trial, the denominator 'num_trials' is 1, so the only way to gain the datapoint-to-parameter ratio is to use a very long sliding window like 30 seconds. The more trials, smaller the datapoint-to-parameter ratio becomes. If there are conditions you can combine, you may want to consider doing so for the connectivity analysis.
  19. [Run ICA -> Coregister the sensor locations]: The IC scalp maps you see is rendered on your definition of sensor locations. There is a pre-calculated electric forward model of the head separately, which does not know your definition of your coordinate system in sensor locations. To make the IC scalp topography physiologically meaningful, you need to coregister your channel space to that of the pre-calculated head model.
  20. [Run ICA -> (Run Source Information Flow Toolbox)]: There is a known issue in calculating connectivity directly at the sensor level. For detail, see Brunner et al. (2016). Thus, some process that can address volume conduction is necessary. ICA's assumption is that it returns the stationary and temporally independent sources, which addresses the issue of volume conduction.
  21. [Coregister the sensor locations -> Fit single and symmetrically-bilateral dipoles]: If you coregister the sensor locations to the electric head model and you don't perform equivalent current dipole model fitting, it's pointless! The dipole fitting process is like converting the 2-D information of IC scalp topography, which can be interpreted as scalp projections from the stationary and temporally independent sources, into 3-D space with dipole model. The important fact is that the original 2-D information, upon which 3-D dipole location is estimated, is directly derived from ICA. So it's not too much to say that the estimated dipole locations are indicated by ICA itself. To fit symmetrical bilateral dipoles, you need a plugin developed by Caterina Piazza.
  22. [Fit single and symmetrically-bilateral dipoles -> (Run Source Information Flow Toolbox)]: SIFT has an option to show either scalp topographies or dipole locations in showing the time-frequency grid of the connectivity. If you don't calculate the dipole fitting, you can't show the dipole locations to indicate the graph nodes.

SIFT tips (08/06/2019 updated)

I found that the equation used in calculating datapoint-to-parameter ratio is incorrect. According to Schlögl and Supp (2006) as well as Korzeniewska et al. (2008), the equation must be (num_IC*model_order)/window_length*num_trials. This change affects how you determine parameters, in a 'better' way I would say as shown below: 1) more ICs can be included; 2) less number of trials can be tolerated; 3) shorter sliding window can be used. This change will particularly impacts continuous data analysis, as the current equation would probably allow sliding window length of a few minutes! In fact, this datapoint-to-parameter ratio has been a limiting factor for me to apply SIFT with confidence.

To obtain the corrected datapoint-to-parameter ratio based on the above-suggested reason, make the following change on est_checkMVARParams() line 85

%winrat = ((M^2)*p)/(wlen*T);
winrat = ((M)*p)/(wlen*T);

That being said, when I discussed this issue with the author, he also told me that if we make a correction, the estimate would be overly lax and would even become useless. I kind of see the point from my experience. Somehow, most of SIFT validation measures are always either too lax (the stability test) or too stringent (the whiteness tests), which are generally hard to follow. In conclusion, I would recommend the above fix to enjoy more degrees of freedom in the analysis design, while trying to stay as conservative (i.e., lower the number, more conservative!) as possible.

Special contents since 100,000 page views

The contents in this section may or may not be directly related to methodology of preprocessing EEG data, hence 'special contents'. I uploaded my past presentations that summarizes my understanding and view at that time, which contains unpublished observations and opinions.

Electroencephalosophy (For 100,000 page views, 09/06/2019 updated)

See my talk at Santa Fe Institute on 07/11/2019. The presented PowerPoint file can be downloaded from here.

Electroencephalosophy and critical neurophysiology Page 06.jpg

The concepts Makoto's pessimism, scalp EEG as adolesc-i-ence, and one-bit information generator were first presented at a keynote lecture for Neuroadaptive Technology 2017 (program (6.3MB)). The presented PowerPoint file can be downloaded from here (8.2MB). I call an investigation to determine the limit of scalp-recorded EEG electroencephalosophy.

NAT17 Keynote Page 33.jpg

Wavelet analysis tips (For 110,000 page views, 11/27/2019)

Did you know that 'Wavelet analysis' implemented in EEGLAB is NOT a pure wavelet analysis? Actually, it is a hybrid between pure wavelet (i.e., changes window size while the number of cycles fixed) and short-term Fourier Transform (i.e., window size fixed, while the number of cycles changes). If you are interested in knowing more about this topic, I recommend you search for the keywords Units in output of timefreq - wavelet normalization in the following archive page of our EEGLAB mailing list. Very nice clarification was done there, and possible solutions were discussed. You will also notice gorgeous members there. The best moment of EEGLAB mailing list.

Physiologically invalid deep dipoles? (For 130,000 page views, 07/26/2021 Update)

In dipole fitting results, sometimes you find deep dipoles outside the gray matter in basal ganglia, ventricle, and cerebellum. However, scalp-measureable EEG should be generated by a massive parallel arrays of pyramidal neurons in the gray matter (Nunez and Srinivasan 2005 'Electric Field of the Brain' Chapter 1 Section 2). How do you reconcile with these apparently wrong results? There is no study I've seen that explains this known bias (if you know such studies, let me know). So here is my explanation after studying the book 'Electric Field of the Brain'. Most of scalp-measurable EEG activity is generated by relatively large cortical patch (10-20 cm^2) which is more appropriate to be described as a dipole sheet rather than a single dipole (a theoretical single dipole has no area, only pole distance exists). An important difference between a single dipole and a dipole sheet is distance-dependent attenuation ratio: the attenuation ratio of a single dipole follows inverse square of distance, but a dipole sheet does not! See the figure below.


Notice that the color bars are both -1 to 1, which means both plots are normalized. Electric field of a sheet dipole that consists of 2580 dipoles are NOT 2580 times stronger single dipole, but its distance-dependent attenuation ratio is critically altered (for more detail, see my lab presentation slides in 2019 (2.6MB) and my 2020 lab presentation slides (1.4MB)). To put it simple, potential produced by a dipole sheet can travel much further than a single dipole. From this fact, we can easily predict what would happen if we approximate a dipole sheet with a single dipole--the estimation of distance must generate serious error. I think this is the reason why we sometimes find deep dipoles. This is a systematic error produced by using a single dipole model to approximate a dipole sheet source, and the error should become larger when the dipole sheet becomes larger (i.e., violating the assumption of single-dipole model that the source model approximated should have zero area).

Transducer array effect (For 140,000 page views, 10/09/2020 added)

I suggest we see a cortical dipole sheet as a transducer array that consists of macrocolumns. Transducer array is a technology to control beam density and direction. Compared with a single transducer (i.e. a single dipole), a transducer array (i.e. a dipole sheet) shows various interesting properties which has been exploited in various engineering field from radar technologies to home audio. As I have been repeatedly saying, even without phased array technology, a simple transducer array can still project energy further while suppressing sidelobes. Moreover, it may shed light to describing emerging (old-new) field of EEG phenomenon, traveling waves. Traveling waves have been mainly reported in ECoG (see Youtube video of the 'Princess Leia brainwaves'--by the way, such spatially non-stationary sources cannot be decomposed by ICA). Such a traveling wave may be described as a phased-array behavior of a dipole sheet. For example, if an ECoG data shows a pattern of traveling waves captured by a dense electrode array, we can ask a question what technology is necessary for a controller of phased-array transducers to replicate it, assuming that the brain should have the same (or alternative) mechanism. In this way, I believe understanding the relation to this transducer array is critical when we consider the behavior of a dipole sheet as a major source of EEG. However, this effect does not seem to have a dedicated name, which is inconvenient for us to discuss this issue. So let's call it transducer array effect tentatively. For example, the term can be used in this way ('A meaning of the word is the use of the word'): 'A broader dipole sheet can project current further because it receives less distant-dependent attenuation due to the transducer array effect.

(09/21/2020 added) In the 'Electric Field of the Brain' (Nunez and Srinivasan, 2006) p.313, there is the following description that supports this view:

It is also possible that best-fit dipoles have no relation at all to genuine brain sources. They might, for example, occur in deep tissue when the actual sources occupy cortical surface areas ranging from a few to hundreds of squared centimeters as discussed in chapters 2 and 6. Clearly, any interpretation of equivalent dipoles depends critically on the spatial extent of these sources.

After all, an equivalent current dipole is a (arealess) model, which is fit to the (stationary) ICA model. Keeping the distinction between these (computational) models and the real electrophysiology seems critical to save us from getting into the confusion. In this sense, I take revisionist's approach to the past SCCN's claims rather than taking Onton and Makeieg (2006) or Delorme et al. (2012) literally to identify ICA sources with the physiological sources. There are a few pieces of critical counterevidence to these claims that are not explained yet, and this deep dipole seems the outstanding case to me.

Ground truth of EMG removal vs. data cleaning with ASR+ICA (For 150,000 page views, 12/20/2020 added)

One of the reviewers wrote me 'EEG gamma-band range is known to be contaminated by EMG. Probably many EEG researchers will not be convinced even if you claim that ASR and ICA can deal with it.' This is a good question indeed. If we make a claim, it must be supported by data. The best evidence I have ever seen about EMG's contribution to scalp-recorded EEG is Whitham et al. (2007). They used anesthesia to remove EMG! Let me cite their Figure 1 below for detailed discussion.


The top and bottom panels show PSD near CPz (linked-ear reference) different subjects. A is without paralysis, B is with paralysis. There is apparent difference above 30Hz. Isn't it shocking? The difference is about 5 (B) to 10 (A) dB at around 45 Hz (right before the 50 Hz line noise comes in). We can use this range as a ground truth of EMG contribution to CPz. Now, let's compare the effect of ASR + ICA with the paralysis. In our data (n = 141), there are no earlobe electrodes, so I re-reference all 38 electrodes to T7/T8. There is no CPz either, so I averaged CP1 and CP2.


Note that in order to align the two curves, I normalized the alpha peak by adding +1.3 dB to the PSD of the cleaned-data (i.e. this could be counted as loss of signal). At 45 Hz, the difference with and without the cleaning process is > 5 dB. Well, this is not too bad, pretty comparable to the extreme method like paralysis!

Why does IC rejection increase gamma power, or why is an IC not broadband-independent? (For 160,000 page views, 05/10/2021 added)

SHOCKED? If yes, check out my recent lab presentation (6.3MB). I think this observation made SCCN's conventional dogma, namely the physiological interpretation of ICs (i.e. 'ICs are effective EEG sources') require to be revised. A simple way to address the practical problem is to apply the band-pass filter first then apply the spatial filter (ICA). However, the cost in taking this approach is that the ICs will no longer be consistent across frequencies. However, this is actually such a natural thing if you think about it: who would expect that the broad low-frequency EEG sources and focal high-frequency EEG sources must be completely overlapped? By the way, please remember that this Wiki page only shows my personal opinion that does not represent SCCN's official opinion.

IcaRejectionLabPresentation updated Page 32.jpg

A big scalp topography? (For 170,000 page views, 07/26/2021 updated)

If you have visited the SCCN website several years ago, you may have seen this movie on the top page. There is a variation of the same move using different source locations with random phases here. Recently, I found that these movies are misleading and should not be taken granted without caution. It is because I believe the color scale must be WAY off from typical EEG studies. An EEG recorder has a noise floor around 1-2 V, which determines the smallest signal it can record meaningfully. According to Huigen (2000) 'Noise in biopotential recording using surface electrodes', Fig 2.5 shows the upper acceptable noise level in EEG recording (1 per cent) is 1 V. Therefore, even if a signal is present at the surface of the scalp, if the amplitude is smaller than 1-2 V, we cannot record it. Now the question is how large the signal is in the scalp topographies in question. Here is a quiz for you: What is the most critical factor among various EEG source properties that determines the scalp-recorded signal amplitude? Yes, the area of the cortical source, given the neurons included are more or less in sync (Later I'll come back to this question about synchronization.) If this answer does not sound familiar to you, I recommend you see my presentation slides above. The distance-dependent attenuation rate dramatically decreases as the cortical source area size increases due to a simple transducer array effect (it needs a good name to be remembered by the way!) In order for an unaveraged single-trial EEG to be scalp-measured within a reasonable amplitude range, say 10-30 microV, the source of the signal should have typically 6.5 cm^2 or larger area (Nunez and Srinivasan, 2006). However, in these movies, the cortical sources appear smaller than that. Maybe these are the source models of averaged ERPs (but it does not seem to apply to the second movie). But more critically, the generated electric field spreads across the entire scalp surface. Such a broad distribution necessarily requires a broad cortical source regions, but that is not affordable since the movies show averaged-ERP-sized small source regions. Therefore, the only reasonable explanation is that the color scale used has a unit of nanoV/picoV/femtoV (now EEG meets MEG!) instead of V. Alternatively, the amplitude of cortical source activity may be amplified by 10^3-10^9 times so that we can use V unit, but this is physiologically impossible. By the way, in the Electric Field of the Brain (Nunez and Srinivasan, 2006) p.83, the authors introduce an example of wrong simulation that shows a small central dipole with 5V amplitude, which is 10^4 times larger than normal values. To avoid these mistakes, it is important to see the relation between the source area size and distance-dependent attenuation ratio.

So, what does it mean when we find a big radial scalp distribution in ICA results (which we always do)? I believe it means ICA modeled a broadly distributed cortical source that oscillates temporally independently. In other words, I support the large patch hypothesis.

By the way, in the 'small patch hypothesis' (see my lab presentation slides (1.4MB) again) which I criticize, one of the justifications was that if the 'active patch' area size is large, the group of neurons included in it cannot maintain the synchronous states. However, Hari et al. (1997) reported that 'synchronous activity of 1% of cells in a cortical area of 1 cm^2 would determine 96.5% of the total signal.' (Note this is an MEG study, not EEG--apparently, MEG can target smaller area in the cortex). By the way, the ratio of the thalamic relay cells is 0.63% in macaque BA17 (O'Kusky and Colonnier, 1982). In the experiment reported by Cooper et al. (1965), punching 2 pin holes recorded attenuation ratio of 20000:1 but increasing the number of pin holes to 12 decreased the attenuation ratio to 2:1 over 6.45 cm^2 area. This transducer array effect seems to appear counter-intuitively effective for most of us non-transducer-array engineers, so we EEG researchers should remember it well.