Makoto's preprocessing pipeline


Note from Arnaud Delorme (main developer of EEGLAB): See also this page for a quick pipeline to reject ICA components in single datasets. The opinions expressed below are those of Makoto Miyakoshi and should not be taken as official EEGLAB guidelines.

%%%%%%%%%%%%%%% (Preface updated on 04/05/2023)

A famous concept about reproducibility in the field of computational science was originally proposed by Jon Claerbout and summarized into a slogan by Buckheit and Donoho (1995) (all of them from Stanford University):

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." Buckheit JB and Donoho DL. WaveLab and Reproducible Research. In: Antoniadis A and Oppenheim G. (Ed.) Lecture Notes in Statistics 103. Wavelets and Statistics. 1995. Springer-Verlag New York, Inc. (source)

Reading back the original document, I found the following statement also suggestive.

"The scientific findings may turn out to be a knowledge of parameter settings for this complex software environment that seem to lead to good results on real datasets."

Ok here we go. If you say 'EEG data mining is like walking a mine field,' you are a friend of mine. It's a field of mine because it's not a field of mine. But never mind even if you have never mined EEG data. I'll show you my minecraft.


This is Makoto Miyakoshi's personal recommendation for preprocessing EEG data using EEGLAB. These are NOT Swartz Center's official opinions. This page is admittedly dogmatic, so follow the instructions at your own risk. This Wiki page is a forever-beta version and subject to change anytime. If you have any question, please post it to the EEGLAB mailing list (you need to subscribe to it from here.) In doing so, please include the keyword Makoto's preprocessing pipeline so that my email filter can pick it up.

I have another SCCN Wiki page, Makoto's useful EEGLAB code, which is more coding-oriented and addresses various frequently encountered practical problems in using EEGLAB.


Change the option to use double precision (05/14/2021 updated)

In EEGLAB2019, 'File' -> 'Preferences'. In EEGLAB 14, from the main GUI's top-left tab, 'File' -> 'Memory and other options' -> uncheck 'If set, use single precision under...'. Note that when computing ICA etc., where double precision is necessary, EEGLAB functions internally convert the input data to double precision. So this is mostly taken care of automatically as long as you stick to GUI operations.

This deletion was suggested by Malte Anders on May 13, 2021 (link to his archived post), with which I agreed. Thank you Malte!

Note that if you want to run a calculation that involves matrix operations, you have to check at your own risk whether the input data are always converted to double. Otherwise you will encounter problems like this--for example, the data rank calculated with single vs. double precision can differ right after importing data from a .raw file.
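As a minimal illustration of the point above (a sketch; the variable names are mine), converting to double before any rank-sensitive operation looks like this:

```matlab
% Convert EEG data to double precision before any rank-sensitive
% matrix operation (e.g., rank estimation before ICA).
EEG.data = double(EEG.data);

% The rank estimated on single-precision data can differ from the
% double-precision result due to numerical round-off.
dataRank = rank(EEG.data(:,:));
```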


Check the path and import data (11/24/2020 updated)

Before importing the data, check these points.

  1. Your full path to eeglab folders and your data does NOT contain any non-alphabetic, non-numeric, non-underscore characters such as !@#$%^&*()-=\`[];', and [space]. For example, if your folder name is 'Experiment 1', that is bad. Change it to 'Experiment1' or 'Experiment_1' for example.
  2. Any folder and file name in your full path, as well as your variable names in the Matlab workspace, must not start with a number. For example, '2018_Oddball_Project' is better changed to 'Oddball_Project_2018'.
  3. Avoid using names that are (or are likely to be) used as Matlab functions. For example, 'eeglab' is the name of a function, so you do not want to use it as a folder or variable name. If you are not sure whether a name is already used by Matlab, type
which -all [variableName]

For example,

>> which -all interp
/usr/local/MATLAB/R2017b/toolbox/shared/controllib/engine/@DynamicSystem/interp.m  % DynamicSystem method

This means that the word 'interp' is already used as a function. You can still use 'interp' as a variable name, but you could get into trouble later. Try to avoid at least known function names such as 'eeglab' when naming a folder, file, or variable.

I'm not telling you that if you have those instances you will immediately fail. On the contrary, with recent operating systems and Matlab, you probably won't encounter any problem--until you unknowingly use some EEGLAB functions and plugins whose authors assume users follow the classical rules. It is good practice to eliminate this risk at this very beginning stage.

After importing, check whether you have extra channels that are filled with zeros or complete garbage. Never include those problematic channels in ICA; they crash the process (and for infomax, they slow the process significantly).

EEG.icaact is empty!? (06/15/2021 updated)

Failure to generate eeg_options.m (06/15/2021 updated)

This is for Juan Molina--I have encountered this weird bug more than several times. It may be hard to guess the right cause of this issue, but it is related to this section. Actually, this problem is caused by a missing eeg_options.m that was supposed to be created when you first launched EEGLAB. To diagnose the problem, try the same trick:

which -all eeg_options

If the answer shows only one entry, [eeglab_root]\functions\adminfunc\eeg_options.m, you have this problem. Usually, you should find two of them: one under adminfunc, which is like a read-only archetype, and the other under userpath, which is actively overwritten by EEGLAB every time the user changes the option settings. If EEGLAB cannot access this eeg_options.m, 'precompute ICA activations...' is unselected (I guess), hence EEG.icaact is NEVER calculated by eeg_checkset(). To fix this problem, you should manually copy the eeg_options.m file to the userpath. For example, for Windows the default user path is typically \user\Documents.
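The manual copy described above can be scripted; here is a sketch (it assumes eeglab.m is on your path and that your Matlab userpath is set):

```matlab
% Copy the read-only archetype eeg_options.m from EEGLAB's adminfunc
% folder into the Matlab userpath, where EEGLAB expects the writable copy.
eeglabRoot = fileparts(which('eeglab.m'));
srcFile    = fullfile(eeglabRoot, 'functions', 'adminfunc', 'eeg_options.m');
copyfile(srcFile, userpath);

% Confirm that two copies are now visible.
which -all eeg_options
```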

Here is the update by Jake Garetti, one of the EEGLAB workshop attendees who ran into the problem and solved it! He told me that "Instead of just copying the options file to the userpath, I had to go into functions/sigprocfunc/icadefs.m and manually set EEGOPTION_PATH = userpath;" This change makes a lot of sense.

Wrong EEGLAB option selected (05/24/2022 added)

This is reported by Siwen Wang (Thanks!)

I also encountered the problem in which EEG.icaact is emptied every time I try to save the data. The suggestion by Juan Molina is great, but for me the problem wasn't that. First, I think EEGLAB automatically checks whether EEG.icaact is equal to (EEG.icaweights*EEG.icasphere)*EEG.data.

If it's not equal, then eeg_checkset() will erase it. What I also found is that in the EEGLAB GUI, under File -> Preferences, you need to check the option to precompute ICA activations. Otherwise, icaact will get erased when you try to save the data. It was a simple solution but still took me 2 hours to figure out, and in the end it worked for me! I was able to keep my icaact after saving the file.
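Both fixes above can be done on the command line; this is my sketch (the option name follows eeg_options.m, and the recomputation mirrors what eeg_checkset() expects):

```matlab
% Enable precomputation of ICA activations so they are kept with the dataset.
pop_editoptions('option_computeica', 1);

% Recompute ICA activations explicitly from the unmixing matrices.
EEG.icaact = (EEG.icaweights * EEG.icasphere) * EEG.data(EEG.icachansind, :);
EEG = eeg_checkset(EEG);
```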


Downsample if necessary

You may downsample the data to around 250 Hz if the original sampling frequency is higher than that. Usually a low-pass filter is necessary before downsampling for anti-aliasing, but the EEGLAB function applies the anti-aliasing filter automatically. Downsampling is useful to compress the data size. It also helps ICA to produce a better decomposition by cutting off unnecessary high-frequency information, though we have not tested this quantitatively.
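In a script, this is a single call (pop_resample() applies the anti-aliasing low-pass filter internally):

```matlab
% Downsample to 250 Hz if the recording rate is higher.
if EEG.srate > 250
    EEG = pop_resample(EEG, 250);
end
```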

High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)(05/18/2022 updated)

The purpose of applying a high-pass filter is to remove 'baseline drift'. It is generally recommended to use lower cutoff frequencies, for example 0.01 Hz (cooperative subjects) – 0.1 Hz (children and patients) (Luck, S. J. 2005. An introduction to the event-related potential technique. Cambridge: MIT press.), than our recommended 1-Hz passband edge (equivalent to their cutoff frequency of 0.5 Hz @ -6 dB; see this PDF file for filter terminology). The reasons for the 1-Hz passband edge are as follows:

  1. ICA is biased toward high amplitude if the data length is finite. EEG data have a 1/f power spectral density (PSD) curve. The combination of these two facts means that the leftmost values on the PSD plot will have the most influence on your decomposition result, although the most useful EEG information you are probably looking for is in 3-13 Hz (theta and alpha). Therefore, cutting off the leftmost values helps ICA focus on the important 3-13 Hz.
  2. Low-frequency EEG < 1 Hz can be contaminated by sweating etc., which adds spatiotemporal non-stationarity and affects ICA. I am a big fan of infraslow EEG research though--after all, no one method is perfect.
  3. Also, the lowest frequency in ERSP (i.e., a blended wavelet/STFT transform) is typically 3 Hz if you want a decent frequency resolution (in EEGLAB parameters, 3 cycles at 3 Hz generates a 1116-ms sliding window; if you want to go lower, the sliding window gets longer!). 3 Hz is sufficiently high relative to the 0.5-Hz cutoff.
  4. These qualitative explanations are supported by published data: Winkler et al. (2015) concluded that data high-pass filtered at 1-2 Hz works best for ICA.

Here is a quote from Winkler et al. (2015)

...As a pre-processing step for ICA, high-pass filtering between 1-2 Hz consistently produced good results in terms of signal-to-noise ratio (SNR), single-trial classification accuracy and the percentage of ’near-dipolar’ ICA components. Relative to no artifact reduction, ICA-based artifact removal significantly improved SNR and classification accuracy. This was not the case for a regression-based approach to remove EOG artifacts.

On the other hand, there are reports that a 1-Hz high-pass filter attenuates (or even 'distorts') low-frequency, so-called 'late slow waves' of the event-related potential family, such as N400 or P600 (these names are paradigm dependent, of course). It is nicely demonstrated in Rousselet (2012), Figure A3. To avoid this problem, one may calculate an ICA weight matrix and a sphering matrix on 1-Hz high-pass filtered data, then apply them to 0.1-Hz high-pass filtered data. This transfer of ICA matrices can be done through the EEGLAB GUI: 'Edit' -> 'Dataset info'. Note that if you do this, you can't claim the validity of the decomposition between 0.1-1.0 Hz, technically speaking. For more detail, see below.
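A scripted version of this trick might look like the following sketch (dataset names are mine; 'runica' stands in for whichever ICA algorithm you use):

```matlab
% Train ICA on a copy of the data high-pass filtered at 1 Hz (passband edge).
EEGforICA = pop_eegfiltnew(EEG, 'locutoff', 1);
EEGforICA = pop_runica(EEGforICA, 'icatype', 'runica', 'extended', 1);

% Filter the data to analyze (e.g., late slow ERPs) at 0.1 Hz only.
EEGforERP = pop_eegfiltnew(EEG, 'locutoff', 0.1);

% Transfer the decomposition to the gently filtered data.
EEGforERP.icaweights  = EEGforICA.icaweights;
EEGforERP.icasphere   = EEGforICA.icasphere;
EEGforERP.icachansind = EEGforICA.icachansind;
EEGforERP = eeg_checkset(EEGforERP, 'ica');
```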

A low-pass filter is optional. If CleanLine (see next) does not work well, you may want to simply apply a low-pass filter to cut out everything above 50/60 Hz. If you use the default eegfiltnew() filter, the transition bandwidth around 50/60 Hz is 10 Hz, so set the 'passband edge' no larger than 45/55 Hz.

Note also that for connectivity analysis using multivariate Granger causality, data are better treated with special filter setup. I summarized it in this section.

(Added on 03/29/2017) When I worked with Marco Petilli on his MoBI data recorded at SCCN, he at first found only 1 good IC after ICA. I thought this was the end of the research. But anyway, without much expectation, I tried a 2-Hz high-pass filter with a transition band from 1.25 to 1.75 Hz (Blackman-Harris window), very aggressive parameters. After ICA, I found nearly 10 good ICs! This was an eye-opening experience. I remember my former SCCN colleague, CK, strongly recommended that I use a transition band of 0.25 to 0.75 Hz when applying a 1-Hz high-pass filter.

What happens to the < 1 Hz data if ICA is calculated on > 1 Hz data and applied to 0.1 Hz data? (05/18/2022 Updated)

As long as the < 1-Hz data are well correlated with the > 1-Hz data, ICs obtained from > 1-Hz data will decompose the > 0.1-Hz data as well, which is the more likely case. However, if the < 1-Hz data are independent of the > 1-Hz data, what happens?

Let's use our imagination to see this situation from a different angle. You are at a buffet party. On a table, there are four big plates of salad, steak, salmon, and a pudding for dessert. You want to take some salad onto your plate. The moment you reach your arm toward the salad plate, a blackout suddenly happens. All the lights are off, and you are in pitch darkness. But you decide to continue taking the salad anyway. You remember where the salad plate is, and you have tongs and a plate in your hands, so why not? You take the salad onto your plate in the darkness, enjoying the unexpected inconvenience. Now the power is back, and the lights are on. Guess what you find on your plate? You thought you took the salad, but the salad is drenched with mushroom soup. Maybe someone spilled a bowl of mushroom soup over the salad plate in the darkness. Now your salad is a linear combination of salad and mushroom soup (which is an acceptable combination, compared with a pudding over salmon).

This is the situation when > 1-Hz ICA is applied to > 0.1-Hz data. The > 1-Hz ICA can only see the (stationary) four plates, but the > 0.1-Hz data contain the four plates plus a (dynamically introduced) mushroom-soup-contaminated area as a new independent component. If you could have seen the contaminated area, you could have avoided it. But in the darkness, you could not see it.

ICA is just a spatial filter, like the locations of the four plates on the table that distinguish the independent contents, like the salmon separated from the pudding. How much mushroom soup you get (i.e., content not present in the learning data, hence not decomposed; in this context, the 0.1-1.0-Hz EEG info) when you blindly (i.e., ICA could not see the 0.1-1.0-Hz info) take the mushroom-soup-contaminated salad, steak, salmon, and pudding depends on the spatial distribution of the mushroom soup spilled over these plates. If you apply a band-pass filter between 0.1-1.0 Hz and evaluate its spatiotemporal distribution, you can roughly estimate its impact on the ICs. But again, practically speaking, the 0.1-1.0-Hz data are unlikely to be that independent of the low-frequency range of the > 1.0-Hz data, so most likely it does not hurt very much. If you ask me whether this high-pass filter trick is useful or not, I'd say there is no general answer and, unfortunately, I have to use an engineer's cliche which I dislike: 'It depends on your data and your goal'. That said, my best advice is this: if you want to stand on the methodologically safer side and do not want to void our warranty, do not use this trick. Only if you know what you are doing and are ready to prove why it is better, do it. It's not the type of trick you want to use as a default choice.

To provide a good answer to this question, we need a simulation study. If anyone is interested in writing a technical paper on this topic with me, contact me!

Import channel info

'Edit' -> 'Channel locations'. If you have channel labels such as 'Fz', 'Cz', 'Pz', etc., that means your channel montage follows the International 10-20 system (there is an extended version of it; see Oostenveld and Praamstra 2001 Clin Neurophysiol for the 10-10 and 10-5 systems). If that's the case, click the 'Look up locs' button located at the center-bottom of the GUI. Choose 'use MNI coordinate file for BEM dipfit model' unless you have a special reason not to. This will automatically import channel coordinates from an internal library. Don't forget to click 'Opt. head center', located in the top-right corner.

If you have digitized channel locations, just do 'Opt. head center'. Some vendors use meters as the unit of length instead of millimeters. If you find all dipoles localized at the center of the head, suspect this cause and multiply the scale by 1000 to see if it fixes the problem (as far as I know, old Polhemus did this once).
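For scripting, the label-based look-up corresponds roughly to this sketch (the template file name is an assumption; it ships with the dipfit plugin, and the exact file depends on your installation):

```matlab
% Look up template channel coordinates by 10-20 labels (e.g., Fz, Cz, Pz)
% from the dipfit plugin's electrode template.
EEG = pop_chanedit(EEG, 'lookup', 'standard-10-5-cap385.elp');
```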

Do we want to have dedicated EOG channels? (12/03/2019 added)

(Thanks Maciej Kosiło for bringing up this question.) If your EEG recording system has a default setting in which two electrodes are spared to record the electrooculogram (EOG), what's the best way to make the most of them?

  1. Use them as EOG channels that are bipolar referenced to each other.
  2. Use them as EOG channels that are monopolar referenced to a scalp recording reference.
  3. Re-purpose them as scalp EEG channels.

If you use ICA in your preprocessing, having dedicated EOG electrode(s) is not that necessary. This is because ocular artifacts, either blinks or saccades, have quite broad scalp topographies which are generally modeled very well by ICA. This is the strength of a multivariate method--ICA evaluates whole-scalp distributions. So, I would say 3 is the best answer. 2 is just as good, and in doing so I recommend you use the locations of AFp9 and AFp10 defined in the international 10-5 system (Oostenveld and Praamstra, 2001), which are virtually as good as the canthus (lateral end of the eye) positions of HEOG-L and HEOG-R. In this way, EEGLAB (or any label-based location database) can find their template coordinates automatically. These AFp9 and AFp10 serve as both HEOG monitors and far-frontal EEG channels. In fact, Delorme et al. (2007) demonstrated that medial prefrontal source EEG is most sensitively detected by these upper-facial electrodes. The worst choice, I would say, is 1, because bipolar referencing loses one data dimension. If these EOG channels share the monopolar reference with the EEG channels, you can re-reference them offline to obtain bipolar-referenced EOG data anytime without losing anything.

Remove bad channels

What do I mean by bad here? In short, channels that you can't save even if you reject 5-10% of your datapoints. Removing bad channels is a critical step, particularly for average referencing, because the average reference signal is the average of all channels. If you have channels that are noisy all the time, including them in the average will introduce noise into all the channels. This issue is discussed in detail in Nima's PREP pipeline paper.

If you are looking for controlled, objective rejection criteria, I recommend you check out these solutions:

  • Nima's PREP pipeline: See his paper for detail.
  • Christian's clean_rawdata(): This EEGLAB plugin comes with sophisticated bad channel detection algorithm.
  • Makoto's trimOutlier(): This tool does simple jobs in an interactive manner (i.e., visualize overall statistics, set a threshold, confirm results, re-set the threshold if necessary). Also available for batch processing.

Recently I use only clean_rawdata(). Note that this function saves nice cleaning results under EEG.etc, with which you can identify which channels and datapoints were rejected relative to the original data length. Of course, it's Christian's work!

In writing a batch, I recommend you keep the original (i.e., before channel rejection) EEG data in this way so that you can refer to it later for channel interpolation (see below):

% Keep original EEG.
originalEEG = EEG;
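For reference, a typical batch call to clean_rawdata() might look like this sketch (the thresholds are my illustrative starting points, not official defaults; tune them for your data):

```matlab
% Reject bad channels and repair bad datapoints with clean_rawdata (ASR).
EEG = pop_clean_rawdata(EEG, ...
    'FlatlineCriterion', 5, ...   % reject channels flat for > 5 s
    'ChannelCriterion', 0.8, ...  % reject channels poorly correlated with neighbors
    'LineNoiseCriterion', 4, ...  % reject channels dominated by line noise
    'Highpass', 'off', ...        % assumes data are already high-pass filtered
    'BurstCriterion', 20, ...     % ASR burst repair threshold (SD)
    'WindowCriterion', 0.25, ...  % reject windows with > 25% bad channels
    'BurstRejection', 'on');

% The rejection masks are stored under EEG.etc (e.g.,
% EEG.etc.clean_channel_mask and EEG.etc.clean_sample_mask).
```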

Interpolate all the removed channels (06/14/2021 updated)

There are three primary reasons why you want to interpolate the rejected electrodes before average referencing and ICA.

  1. To minimize a bias in the next average referencing. For example, if there are 64 channels and 16 channels are rejected only from the right hemisphere, the average potential across the electrodes is biased toward the left hemisphere. To avoid this bias, scalp electrodes should be interpolated.
  2. IC topographies for average-referenced data are zero-meaned, which is easier to evaluate. Otherwise, IC scalp topos could be all red or blue (you can still re-define the color scale, though.)
  3. The rejected electrodes cannot be recovered after ICA because there is no way to interpolate EEG.icaweights or EEG.icasphere. This can cause issues downstream. For example, you may need to calculate a group-level IC backprojection at Cz. However, if some of your subjects have the Cz electrode rejected, you cannot calculate it.

EEGLAB's default interpolation uses spherical spline interpolation. It introduces a rank issue because non-linearly interpolated data do not cause a clean rank deficiency; instead, they add an additional dimension whose eigenvalue is near zero, such as < 10^-8. In other words, the condition number is high, which makes the problem ill-conditioned. Such a small eigenvalue is numerically non-zero, but for our ICA algorithms it is practically zero, hence it causes ghost IC problems etc. For more info, see this article and this section No. 13 for similar information.
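You can see this near-zero eigenvalue yourself with a quick check (a sketch; originalEEG is the pre-rejection copy kept above, and the 10^-6 threshold anticipates the 'effective rank' criterion discussed at the end of this page):

```matlab
% Interpolate the rejected channels back in (spherical spline).
EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical');

% Compare the naive rank with the eigenvalue spectrum of the channel covariance.
covMat    = cov(double(EEG.data(:,:))');
eigenVals = sort(eig(covMat), 'descend');
naiveRank = rank(double(EEG.data(:,:)));

% Count dimensions that are practically non-zero for ICA.
effectiveRank = sum(eigenVals > 1e-6);
```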

Deleted section (06/14/2022 updated)

EEGLAB's default interpolation method is 'spherical'. If we could specify a completely linear interpolation method, we could avoid the rank deficiency problem (but the interpolation accuracy would be worse). Anyway, the following code does not do the job.

By using spherical interpolation as shown below, you may be able to avoid this false rank-recovery issue. But if you use the 'pca' option to adjust the rank anyway, it is not particularly helpful.

% Interpolate channels.
EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical');

What does EEGLAB do for the interpolation? (06/14/2022 updated)

This is for Maruti Mishra. The EEGLAB function eeg_interp() seems to be an implementation of Perrin et al. (1989). Kang et al. (2015) summarized Perrin et al. (1989) and concluded that m=4 and n=7 are the recommended parameters, as quoted below; these are also used in eeg_interp().

In the spherical spline interpolation method, the two parameters m and n have critical effects in the accuracy of the interpolation. The constant m is the “order” of the interpolation. By increasing m, the sum in g_m(x) of (2) converges more rapidly, leading to a smoother interpolation curve (Ferree, 2006). The summation in (2) should be truncated with a certain number of n in practice, and Perrin et al. (1989) stated that only the first seven terms in the summation were adequate to obtain an accuracy of 10^−6 on g(x) with m = 4. Since m = 4 and n = 7 were suggested the original paper of spherical spline interpolation for EEG, these values have been used frequently in EEG studies although some papers suggested to use other values.

However, Kang et al. (2015) recommend m=3 and n=50 because these choices minimize artifacts that affect inter-electrode phase analysis. These parameters can be used by modifying eeg_interp() (I believe; I have never tried it myself). Kang and colleagues also concluded that the optimum regularization parameter is 10^-8, but I don't think eeg_interp() uses this parameter. I mean, Kang and colleagues say Perrin et al. (1989) introduced the regularization parameter, but did Perrin et al. (1989) actually discuss it?

Re-reference the data to average (08/02/2020 Updated)

There seem to be multiple valid ways to reference scalp EEG data. Here, I recommend average reference because the concept is simple and it does not require anything other than the recorded data. Average reference is an approximation of scalp potentials that is independent of the reference location on the head.

...with any reference strategy we record the potential at one head location with respect to some other head location. By definition, this potential difference is the result of integration of the electric field over any path connecting the reference point and the point in question. We say any path because electric potentials are fundamentally path-independent. The average reference can provide a good estimate of reference-independent potentials because it integrates many possible paths to the point, depending on the number of electrodes. ("Electric Field of the Brain" by Nunez and Srinivasan, p.297)

If EEG is generated on the cortex by dipolar current distributions with no external sources, there are equal amounts of positive and negative potential change at every moment due to charge conservation. Hence the scalp topography should sum to zero. By re-referencing to the average of channel values, we assume that there is no charge creation, that is, no monopolar sources or sinks. In practice, average reference makes all the ICA scalp topographies have zero total potential (i.e., red and blue always balance).

(Update 08/24/2016) Note also that average referencing is often very helpful for suppressing line noise. I have experienced line noise that could not be rejected by CleanLine, due to a short time constant of fluctuation of the amplitude envelope in the line-noise frequency band (you can visually confirm this by band-pass filtering the data to pass only the line noise and inspecting its time series), being removed by average referencing. I guess this is related to common-mode rejection? Anyway, I suggest performing average referencing first, then CleanLine.

Why should we add zero-filled channel before average referencing? (08/09/2020 Updated; prayer for Nagasaki)

A simple proof that re-referencing to the average of non-reference channels is wrong is as follows. (Unipolar) re-referencing should be applicable as many times as one wants, to any existing or non-existing channel (like the averaged channel), without changing the data. See the following quote from Hu et al. (2019) Brain Topography.

...The most surprising property is ‘no memory’ indicating that URs [=unipolar references] works independently without the consequences of the URs already applied. It also means one can always safely re‑reference the EEG/ERP recordings with different URs but not worry about if re‑referencing multiple times will accumulate artifacts.

But if you re-reference to the average across non-reference channels (i.e., without including the initial reference channel), you cannot do this. This is the simple proof. A slightly more theoretical proof can be found below.

The following is the description taken from 'Electric Field of the Brain' (Nunez and Srinivasan, 2006) p.294-295

As has been emphasized throughout this chapter there is essentially no difference between recording electrode and reference electrode. Thus, when one records EEG data from N electrodes, the measured potentials Vn (n = 1, 2,… N) are related to the scalp potential Φ(r) with respect to "infinity"...

For the sake of argument, suppose there are only three scalp recordings, Phi_Fz, Phi_Cz, and Phi_Pz, with reference to infinity to start with. If Pz is chosen as the initial reference when recording, the initially referenced data V_Fz, V_Cz, V_Pz are described as

  V_Fz = Phi_Fz - Phi_Pz
  V_Cz = Phi_Cz - Phi_Pz
  V_Pz = Phi_Pz - Phi_Pz = 0

Note that data rank == 2 (because V_Pz is 0). Re-referencing these data from REF_Pz to REF_Fz, we obtain

  V'_Fz = V_Fz - V_Fz = 0
  V'_Cz = V_Cz - V_Fz = Phi_Cz - Phi_Fz
  V'_Pz = V_Pz - V_Fz = Phi_Pz - Phi_Fz

Thus, re-referencing from REF_Pz to REF_Fz yields, or 'recovers', the time series of the initial reference channel as V'_Pz = Phi_Pz - Phi_Fz. Note that the rank of the data remains unchanged (data rank == 2). In this way, the bottom line is that data can be freely re-referenced algorithmically after collection. (Dien, 1998. Behavior Research Methods, 30:34-43; Osselton, 1965. Electroenceph Clin Neurophysiol, 19:527-528)

When REF_Pz is chosen as the initial reference, the average potential across the channels is defined as

  Avg = (V_Fz + V_Cz + V_Pz) / 3

Because V_Pz is 0 everywhere in the time domain, it may be omitted as

  Avg = (V_Fz + V_Cz) / 3

Thus, the number of terms in the numerator is 2 (i.e., the number of non-zero channels), while the denominator is 3 (i.e., the number of channels including the initial reference). However, it is often misunderstood as

  Avg_wrong = (V_Fz + V_Cz) / 2

If we use Avg_wrong, which does not take the initial reference into account, we obtain the remaining 2 channels as

  V_Fz - Avg_wrong = (V_Fz - V_Cz) / 2
  V_Cz - Avg_wrong = (V_Cz - V_Fz) / 2

Apparently, using Avg_wrong causes rank deficiency (rank == 1), because the two channels are exact sign-flips of each other. However, this should not happen, as we saw above.

If we use Avg, which takes the initial reference into account, for all 3 channels we obtain

  V_Fz - Avg
  V_Cz - Avg
  V_Pz - Avg = -(V_Fz + V_Cz) / 3

Note that the data rank remains unchanged (rank == 2).

Ok, then how can we add back the initial reference? Remember, it was

  V_Pz = Phi_Pz - Phi_Pz = 0

which is continuous zeros. This is why we should add a zero-filled channel before average referencing IF your data do not already have it (and I have never seen data that do).

This operation can be simply achieved by adding +1 to the denominator when you calculate the average across the channels. For example, if you calculate average across 32 channels, sum up all the channel values and divide it by 33. Similarly, if you have 64 channels, divide the sum by 65. If you have 128 channels, divide the sum by 129, etc.

(Update 11/29/2017) I wrote a simple EEGLAB plugin, fullRankAveRef(), to perform average reference after adding back the original reference channel (i.e., zero-filled). This channel is removed after average referencing, so the total number of channels remains the same (and so does the data rank, hence full-rank average reference). This solution may be useful for those whose data have a small number of channels, such as the standard 19 channels of the international 10-20 system.

What is rank? (Updated 06/06/2016)

NEW description (Thanks Andreas Widmann for the discussion!): Usually, the data rank is the same as the number of channels. However, if a pair of channels is bridged, the rank is the number of channels minus 1. For another example, the pair of equations 2x+3y+4=0 and 3x+2y+5=0 has rank 2, but the pair 2x+3y+4=0 and 4x+6y+8=0 has rank 1, because the two equations are dependent (the latter is exactly twice the former).

EEG data consist of n+1 channels: n probes and 1 reference. The n+1-channel data have rank n because the reference channel data are all zeros, hence rank deficient. Usually, this reference channel is regarded as useless and excluded from plotting, which is why you usually find your data to have n channels with rank n (i.e., full-ranked). It is important that when you perform ICA, you apply it to full-ranked data; otherwise strange things can happen (see below). Your data are usually full-ranked in this way and ready for ICA.

When you apply average reference, you need to put the reference channel back if it is missing (which is usually the case). You can include the reference channel in computing the average reference by selecting the 'Add current reference back to the data' option in the re-reference GUI. If you do not have the initial reference channel location, in theory you should generate a zero-filled channel, concatenate it to your EEG data matrix as an extra channel (which makes the data rank-deficient), and perform average reference. To make the data full-ranked again, one may (a) drop a channel--choosing the reference channel is reasonable--or (b) reduce the dimensionality by using the PCA option for ICA.

To perform the abovementioned process on the command line, one can manually compute the average reference using the adjusted denominator:

EEG.data = bsxfun( @minus, EEG.data, sum( EEG.data, 1 ) / ( EEG.nbchan + 1 ) );

The origin of this argument by Joseph Dien can be found here.

OLD description:

Note that average reference reduces the data rank by one (see this page: imagine v1, v2, ..., vk in the equation are non-zero channel data. After average referencing, the sum across all the channels is zero, which proves linear dependency--note that the average-referencing property (i.e., v1+v2+...+vk=0) is exactly what is used to define rank deficiency! Average referencing without including the original reference channel by definition causes rank deficiency; see also this page written by my colleague Jason Palmer). The easy way to make the data full-ranked is to discard (any) one channel after average referencing. If you don't do this, ICA may return invalid results.

Examples of artifact from ICA (Modified 09/08/2020)

Below I show an example of what happens when you run ICA on rank-deficient data. There were 54 channels. Data were average referenced and epoched to -1 to 2 sec. After setting the EEGLAB option to use double precision (thanks Andreas for clarifying this!), Matlab's rank(EEG.data(:,:)) returned 53. The AMICA log showed the data dimension as 54. Notice the complementary scalp topographies (left) and time series (right), and the interesting spectral patterns (middle) in which ICs 9 and 10 do not even respect the filter effect. Rejecting one channel and re-running ICA fixed this problem, and the pathological ghost ICs disappeared.

Figure 1. A pair of ghost IC scalp maps generated by ICA on rank-reduced data. Figure 2. Spectra analysis results. Figure 3. Raw EEG time series.

(Data and figures by courtesy of Caterina Piazza)

Additional comments on 12/15/14: By mistake, I applied AMICA to 66 datasets (5-min resting, 60-70 ch, ASR-cleaned with SD == 6) that were average referenced with no channel discarded (i.e., rank-deficient). AMICA failed on 3/66 datasets (4.5%). Here is the out.txt log from one of the failed runs. Note that 'num eigs kept = 65' comes from EEG.nbchan == 65.

minimum eigenvalues =  -9.00329954361518855E-14 0.47905289012901225 0.5242944714252078
maximum eigenvalues =  998.61704681443121 262.85021915136872 219.07946516116806
num eigs kept =  65
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17987E-01 0.17987E-01 (1.19 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16862E-01 0.16862E-01 (1.35 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17511E-01 0.17511E-01 (1.28 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16970E-01 0.16970E-01 (1.38 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17459E-01 0.17459E-01 (1.22 s, 0.7 h)
... done.Execution time: 0.00 h

Here is a sample code of this part.

% Apply average reference after adding initial reference
EEG.nbchan = EEG.nbchan+1;
EEG.data(end+1,:) = zeros(1, EEG.pnts);
EEG.chanlocs(1,EEG.nbchan).labels = 'initialReference';
EEG = pop_reref(EEG, []);
EEG = pop_select( EEG,'nochannel',{'initialReference'});

A study on the ghost IC was published (added on 04/04/2023)

The name 'ghost IC' is now more official because I published this paper. The summary of the paper is as follows:

  1. We ran a simulation test to determine the 'effective rank' for ICA: the number of eigenvalues > 10^-6.
  2. If any eigenvalue is smaller than 10^-6, you will have a ghost IC that invalidates your ICA result.
  3. The effective rank deficiency can arise from (1) incorrect re-referencing to the average potential; (2) (spherical spline) electrode interpolation. If you do these, make sure you explicitly check the effective rank of the input data and use the pca option to reduce the data dimension so that ICA can work on full-ranked data.

Additionally, the paper also focuses on the correct way of re-referencing to the average. The conclusion is the same as what this page recommends: add a zero-filled channel as the initial reference, calculate the average potential, then subtract it from all the channels. This way, the smallest eigenvalue of the data (after removing the 'initial reference') is about 10^0, which is very large compared with the limit of the effective rank.
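The effective-rank criterion can be sketched as a simple eigenvalue count. In the plain-Python toy below, the eigenvalues are hypothetical (they loosely echo the AMICA log shown earlier); in practice they would come from eig(cov(data)), as in pop_runica()'s getrank():

```python
# Sketch of the 'effective rank' check: count eigenvalues meaningfully
# above zero. The threshold follows the 10^-6 criterion described in
# the text; the eigenvalue list is a made-up example.
TOLERANCE = 1e-6

def effective_rank(eigenvalues, tol=TOLERANCE):
    """Number of eigenvalues above the rank-deficiency tolerance."""
    return sum(1 for ev in eigenvalues if ev > tol)

# e.g. a 5-'channel' covariance after improper average referencing:
# one eigenvalue collapses to numerical noise (compare the AMICA log
# above, where the minimum eigenvalue was about -9e-14).
eigs = [998.6, 262.8, 219.1, 0.48, -9.0e-14]
assert effective_rank(eigs) == 4   # one dimension is gone

# ICA should then be run with a pca/pcakeep dimension of 4, not 5.
```

If the count is smaller than the channel count, pass it to the 'pca' (runica) or 'pcakeep' (AMICA) option so ICA decomposes only the full-ranked subspace.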

Remove line noise using CleanLine (04/25/2020 updated)

CleanLine is an EEGLAB plugin developed by my colleague Tim Mullen (now Qusp CEO) based on one of the Chronux functions. It is not a standard frequency filter; instead, it uses a sliding window to adaptively estimate the amplitude of a sine wave, which is then subtracted. It does not make a hole in the background EEG spectrum.

For parameter setting, I usually set the step size equal to the window size for maximum rejection (from experience) and leave the rest at the defaults. According to Nima's PREP paper, data nonstationarity should be addressed with a high-pass filter before running this function (Bigdely-Shamlo et al., 2015).

There are cases where CleanLine does not suppress the line noise sufficiently. Most likely, the reason is that the line noise is not stationary. See the sample screenshot below. Channel F4 shows a constant envelope width across time. This indicates stationarity, so CleanLine should work well on this channel. However, notice that most other channels show fluctuating 60-Hz envelopes whose time constants are shorter than the typical CleanLine sliding-window length (default 4 s). This indicates a violation of stationarity, and CleanLine cannot work well on them (I tweaked the parameters but never got lucky). You can easily verify this on your data by applying the EEGLAB filter with the setting "lower edge = higher edge = 60 Hz", for example. If this unfortunately turns out to be the case, you should either apply a conventional notch filter or a low-pass filter that cuts off below the line frequency.


Some update--if you see harmonics such as 60 Hz and 120 Hz, in the time domain such a signal does NOT appear as the sum of constant 60-Hz and 120-Hz sine waves, but as a 60-Hz wave with spiky peaks and/or troughs. In other words, the 120 Hz is only present occasionally, in a phase-coupled way. In this case, I think CleanLine's assumption is violated and it does not work. Even if you see the 120 Hz (or whatever harmonic peak on the power spectral density plot) reduced, it does not mean the phase-coupled harmonic was specifically removed; CleanLine does not work that way. In this sense, it probably does not make sense to enter [60 120]; it only makes sense to enter [60], i.e., the principal frequency of the line noise (if you think I'm wrong, please write me, I'd be happy to learn.)
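A toy calculation in plain Python illustrates the point: a distorted (here, mildly squared) 60-Hz wave carries a 120-Hz Fourier component that exists only as a byproduct of the waveform shape, phase-locked to the fundamental, not as an independent 120-Hz sine:

```python
import math

# One second of a 60-Hz wave at 1000 Hz sampling (toy parameters).
srate, f0 = 1000, 60.0
n = srate
t = [i / srate for i in range(n)]
clean = [math.sin(2 * math.pi * f0 * ti) for ti in t]
# A mild nonlinearity sharpens the peaks: s + 0.3*s^2 contains
# 0.15 - 0.15*cos(2*2pi*60*t), i.e. a phase-locked 120-Hz harmonic.
spiky = [s + 0.3 * s * s for s in clean]

def amp_at(x, freq):
    """Fourier amplitude of x at freq (integer number of cycles in 1 s)."""
    c = sum(xi * math.cos(2 * math.pi * freq * ti) for xi, ti in zip(x, t))
    s = sum(xi * math.sin(2 * math.pi * freq * ti) for xi, ti in zip(x, t))
    return 2 * math.hypot(c, s) / len(x)

assert amp_at(clean, 120.0) < 1e-9               # pure sine: no harmonic
assert abs(amp_at(spiky, 120.0) - 0.15) < 1e-6   # harmonic from the s^2 term
```

The 120-Hz energy here is entirely determined by (and locked to) the 60-Hz fundamental, which is why modeling it as a separate stationary sinusoid is questionable.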

If your data are recorded with Biosemi, you MUST re-reference the data to *ANY* channel (including the average of all channels; by the way, by including the original reference channel(s) you can re-reference to other channel(s) repeatedly without losing information). This is because Biosemi lets users take care of common-mode rejection offline. What is common-mode noise? You can see 60-Hz amplitude wax and wane in a remarkably non-stationary way, which could not be modeled and subtracted by CleanLine at all. However, re-referencing erased it perfectly. So I guess that's an example of common-mode noise.
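The cancellation logic is easy to demonstrate. In the plain-Python toy below (made-up signals), the common-mode term is identical on both channels, so re-referencing one channel to the other removes it exactly, no matter how non-stationary it is:

```python
import math, random

# Toy signals: two 'brain' channels plus a shared, wildly amplitude-
# modulated 60-Hz common-mode term (all values hypothetical).
random.seed(0)
n, srate = 500, 1000
common = [random.gauss(0, 50) * math.sin(2 * math.pi * 60 * i / srate)
          for i in range(n)]
brain1 = [math.sin(2 * math.pi * 10 * i / srate) for i in range(n)]
brain2 = [math.cos(2 * math.pi * 10 * i / srate) for i in range(n)]
ch1 = [b + c for b, c in zip(brain1, common)]
ch2 = [b + c for b, c in zip(brain2, common)]

# Re-reference ch1 to ch2: the common-mode term cancels sample by sample.
reref = [a - b for a, b in zip(ch1, ch2)]
expected = [a - b for a, b in zip(brain1, brain2)]
assert all(math.isclose(r, e, abs_tol=1e-9) for r, e in zip(reref, expected))
```

No model of the noise is needed: subtraction of a shared term works regardless of its envelope, which is why re-referencing succeeds where sinusoid fitting fails.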

Validation (04/25/2020 added)

I heard that the version of CleanLine implemented in Bigdely-Shamlo and colleagues' PREP Pipeline, called cleanLineNoise(), received some bug fixes when implemented. To verify this, I ran the following simulation test: 60 s of single-channel, 1000-Hz-sampled data consisting of (1) a 60-Hz sine wave ('Without noise') and (2) a 60-Hz sine wave plus pink noise at a 1:1 variance ratio ('With noise') were processed with the original CleanLine and with cleanLineNoise, the PREP-implemented version of the former, to remove the 60-Hz sine wave using the same parameters (which you can check in the shared code you can download below). The results below demonstrate that the PREP-implemented version indeed seems to outperform the original CleanLine. The presence of background pink noise does not seem to affect the performance difference. I believe I used the same parameters for CleanLine and cleanLineNoise, but for validation purposes I share my simulation code so that anyone can verify and replicate these results. Based on these results, I will update my recommended preprocessing pipeline soon.

Without noise, frequency domain performance comparison. Note the reduction of 60-Hz power peak.

Psd noNoise.jpg

Without noise, time domain performance comparison. Note the amplitude of the residual of the 60 Hz sine wave.

Residual noNoise.jpg

With noise (SNR = 0dB), frequency domain performance comparison.

Psd withNoise.jpg

With noise (SNR = 0dB), time domain performance comparison.

Residual withNoise.jpg

Download the Matlab code to replicate this simulation. Please contact me if you find an error or different parameters used for the two applications in my code. The download is in .zip format because this wiki allows me to upload .zip files but not .m files.

Download Matlab .m code

Recommended parameters for CleanLine (06/13/2024 added)

CleanLine has two implementations. According to the developers, the first version had a bug related to the incorrect use of a variable, which did not generate an error message. As a result, the first version did not function as expected. I reported this problem in my article here. The developer of the newer version identified and resolved the problem by rewriting the code. To use the newer implementation, users are required to specify it explicitly, as it is not the default option. Please refer to the screenshot below, specifically the bottom box labeled "Use new implementation (send us feedback)". By default, it is set to logical(false); change it to logical(true). This is equivalent to using the cleanLineNoise algorithm that I mentioned in my paper. Frankly, using logical(false) in this context is nonsensical, and I do not understand why this trap has been there for years. This default option completely destroys the effect of CleanLine.


The third item from the top is 'p-value for detection of significant sinusoid'. The default is 0.01. I tried 0.05 as well. Note that p = 0.05 removes the modeled line noise more aggressively than p = 0.01; p = 0.01 applies a more stringent criterion for modeling and fitting the line noise. Below, I show a comparison across three data sets. I think the default choice of 0.01 is better than 0.05. The original developer did a good job choosing the default criterion here.


Epoch data to -1 to 2 sec (09/11/2019 updated)

  • If you have a sufficient number of data points (i.e., [number of channels^2] x 30) after epoching, epoching before ICA is recommended.
    • A merit of epoching before ICA is that ICA can focus on peri-event brain dynamics without being 'disturbed' by data points from periods of non-interest (i.e., non-event-related brain/non-brain activities). See also this section.
  • If you have an insufficient number of data points, or your Stimulus Onset Asynchrony (SOA) is shorter than 3 seconds, do not epoch. Instead, perform ICA on the continuous data, then epoch. Follow these steps. If overlapping epochs are submitted to ICA, it will learn the duplicated parts as positively weighted data.
  • The relatively long epoch is for a later wavelet transform. For details, see Slides 20 and 21 of this file. In short, the -1 to 2 sec epoch length yields a -0.5 to 1.5 sec window after the wavelet transform (i.e., ERSP and ITC in EEGLAB terminology) with a 3-cycle wavelet at 3 Hz, which generates a 1116-ms window length.
  • As you see above, epoching can be done anytime, before or after ICA/dipfit. In other words, epoching is NOT a necessary condition to perform ICA.
  • You do not need to 'correct baseline', i.e., subtract the mean of the 'baseline period', which is typically -1000 to 0 ms. Importantly, David Groppe, our former colleague, reported that if a short baseline such as -100 to 0 ms is subtracted before ICA, it makes ICA performance worse. This is because the mean value of such a short baseline can easily fluctuate with, for example, 'background' alpha rhythm, which changes the DC level of all the channels independently... for ICA this is a very bad situation. The longer the baseline period, the closer you are to the previously applied high-pass filter that guarantees zero mean, hence the safer. Actually, as long as the data are properly high-pass filtered, it is unnecessary to 'correct baseline' here, and thus we can minimize the risk of confusing ICA. See Arno's comment below.

If you extract data epochs before running ICA (as advised by Makoto below), make sure that the baseline is long enough (at least 500 ms) or that you do not remove the baseline from your data epochs. See the following article for more information Groppe, D.M., Makeig, S., & Kutas, M. (2009) Identifying reliable independent components via split-half comparisons. NeuroImage, 45 pp.1199-1211.
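The epoch-length arithmetic in the bullets above can be checked with a small helper (plain Python; the function and trimming rule are illustrative, and exact window lengths depend on the time-frequency implementation):

```python
# Sketch: the longest (lowest-frequency) wavelet dictates how much of
# each epoch edge is lost. With 3 cycles at 3 Hz, the wavelet is 1 s
# long, so half a second is trimmed from each side of the epoch.
def usable_window(epoch_start, epoch_end, lowest_freq=3.0, cycles=3.0):
    """Trim half a wavelet length from each epoch edge (seconds)."""
    wavelet_sec = cycles / lowest_freq      # 3 cycles at 3 Hz -> 1.0 s
    half = wavelet_sec / 2.0
    return epoch_start + half, epoch_end - half

# A -1 to 2 s epoch leaves a -0.5 to 1.5 s time-frequency window.
assert usable_window(-1.0, 2.0) == (-0.5, 1.5)
```

Longer wavelets (lower frequencies or more cycles) eat more of the epoch edges, which is why the epoch is cut generously around the event.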

Reject epochs for cleaning (11/13/2019 Updated)

Aim for around a 5-10% rejection rate (Delorme et al., 2007). Apply an amplitude threshold of -500 to 500 uV (don't capture eye blinks; you may want to exclude channels near AFp1 and AFp2 for this purpose), reject the selected epochs, then apply the improbability test with 6 SD for single channels and 2 SD for all channels, and reject the selected epochs again.
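The amplitude-threshold step can be sketched as follows (plain Python with made-up toy epochs; in EEGLAB this is done through the epoch-rejection GUI rather than by hand):

```python
# Sketch: flag an epoch if any sample on any included channel exceeds
# the -500..500 uV window. Threshold follows the text; data are toy.
THRESH_UV = 500.0

def flag_epochs(epochs, thresh=THRESH_UV):
    """epochs: list of epochs, each a list of channels, each a list of uV samples.
    Returns one boolean per epoch: True means 'reject'."""
    return [any(abs(v) > thresh for ch in ep for v in ch) for ep in epochs]

epochs = [
    [[10.0, -30.0], [5.0, 12.0]],      # clean epoch -> keep
    [[10.0, 650.0], [5.0, 12.0]],      # amplitude artifact -> reject
    [[-480.0, 20.0], [5.0, -499.0]],   # large but within threshold -> keep
]
assert flag_epochs(epochs) == [False, True, False]
```

Channels prone to blink artifacts (near AFp1/AFp2) would simply be left out of the channel lists passed in, mirroring the exclusion advised above.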

An idea for fast and efficient trial rejection (11/13/2019 Updated)

I moved this section to here with some update.

Adjust data rank for ICA (05/17/2019 updated)

When performing ICA, a critical parameter is how many ICs to generate. Usually, and by default, the number of ICs to generate equals the number of channels you have. However, this holds ONLY IF your data are guaranteed to be full-ranked (to see what this means, jump back to this section). Your data could become rank-deficient by

  1. re-referencing without including the initial reference (rank deficient by 1),
  2. interpolating k channels (rank deficient by k), or
  3. bridging n channels (rank deficient by n-1).

If ICA is forced to generate more ICs than the input data rank, 'ghost ICs' appear. pop_runica() has its own rank checker, which uses Matlab's rank() function. However, although it can detect the cases of 1) improper re-referencing and 2) channel bridging, it DOES NOT detect the rank reduction caused by channel interpolation in EEGLAB (using the 'spherical' algorithm). This is because the interpolation is not exactly linear, which adds very small eigenvalues that are counted as non-zero dimensions. In fact, pop_runica() already has a heuristic threshold of 10^-7 on the eigenvalues to reject such ill-conditioned dimensions, but it is currently used only to generate a warning message (which I apparently missed). If you want to use this safer criterion to reject potentially ill-conditioned dimensions, make a minor change to pop_runica() from line 535 as follows (note that 'max' is changed to 'min'):

function tmprank2 = getrank(tmpdata);
    tmprank = rank(tmpdata);
    %Here: alternate computation of the rank by Sven Hoffman
    %tmprank = rank(tmpdata(:,1:min(3000, size(tmpdata,2)))); old code
    covarianceMatrix = cov(tmpdata', 1);
    [E, D] = eig (covarianceMatrix);
    rankTolerance = 1e-7;
    tmprank2 = sum (diag (D) > rankTolerance);
    if tmprank ~= tmprank2
        fprintf('Warning: fixing rank computation inconsistency (%d vs %d) most likely because running under Linux 64-bit Matlab\n', tmprank, tmprank2);
        %tmprank2 = max(tmprank, tmprank2);
        tmprank2 = min(tmprank, tmprank2);
    end

Thus, it is strongly recommended that if your preprocessing makes your data rank deficient by x in total (i.e., the deficiencies add up!), you explicitly specify (number_of_channels - x) as your 'true' data rank in one of the following ways. Again, this becomes particularly important after channel interpolation, since pop_runica()'s rank checker does not detect the rank deficiency and allows it to produce 'ghost ICs' (by the way, the same happens to ASR--any PCA/ICA solution would be affected similarly!) To address the issue, you either

  1. reduce the data dimension to (number_of_channels - x) by using the 'pca' or 'pcakeep' option for runica() and runamica(), respectively, OR
  2. reject any x channels. Interestingly, they do not have to be the exact channels that were interpolated. You may want to determine the new channel montage using Nima's loc_subsets() function, which tells you which channels to reject to achieve a maximally uniform channel distribution.
% Discard channels to make the data full ranked.
channelSubset = loc_subsets(EEG.chanlocs, yourDataRankHere);
EEG = pop_select( EEG,'channel', channelSubset{1});
EEG = pop_chanedit(EEG, 'eval','chans = pop_chancenter( chans, [],[]);');


What is the minimum length of data to perform ICA? (07/04/2022 added)

SCCN recommends (number_of_channels)^2 x 20. The written evidence can be found in Makeig and Onton (2011), ERP Features and EEG Dynamics: An ICA Perspective, a chapter of the Oxford Handbook of Event-Related Potential Components. The following description is quoted from the section Data Requirements:

The most frequent mistake researchers make in attempting to apply ICA to their data is to attempt to apply ICA decomposition to too few data points. For data with large numbers of channels (64 or more), we suggest it is optimal to decompose a number of time points at least 20 or more times the number of channels squared. This is only a heuristic standard, not a strict minimum, and using this much data does not in itself guarantee an optimal decomposition. For very dense scalp arrays, this standard could require an unreasonable amount of data. For example, to decompose 256-channel data 20 × 256^2 time points at 256 points/sec would require over 80 minutes of recording time and occupy nearly 1.5 GB, though by this same standard for 64-channel data a 22-minute recording occupying about 0.35 GB would suffice.
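The rule of thumb in the quote can be turned into a quick calculator (plain Python; the factor k = 20 follows the quoted heuristic):

```python
# Minimum recording length for ICA under the heuristic
# 'at least k * (number of channels)^2 data points', k = 20.
def min_recording_minutes(n_channels, srate_hz, k=20):
    """Minutes of recording needed to reach k * n_channels^2 points."""
    min_points = k * n_channels ** 2
    return min_points / srate_hz / 60.0

# 256 channels at 256 Hz: 'over 80 minutes', matching the quote
# (20 * 256^2 points / 256 Hz ~ 85 minutes).
assert min_recording_minutes(256, 256) > 80
# Points scale with channels squared: 64 channels needs 16x fewer points.
assert abs(min_recording_minutes(256, 256) / min_recording_minutes(64, 256) - 16) < 1e-9
```

As the quote stresses, this is a heuristic, not a strict minimum, so treat the output as an order-of-magnitude guide.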

Note: This is only a heuristic standard, not a strict minimum. If you are interested in designing a study on this issue for further clarification, please contact me. By the way, the subsequent description is also interesting.

We are not as sure about the influence of sampling rate on ICA decomposition. Doubling the sampling rate during a recording period shortened by half might not produce as effective a decomposition, since the higher frequencies captured in the data acquired with a higher sampling rate would be small, relative to lower frequency activity, and might have lower source-signal-to-noise ratio. See Onton & Makeig (Onton and Makeig, 2006) for further discussion.

I agree with the authors. See my evidence for ICA's frequency dependency here. Given EEG's 1/f power distribution combined with ICA's bias toward amplitude/power, adding high-frequency information would be pretty meaningless. In fact, one of my colleagues, who developed an online ICA algorithm, once told me that he thought the recording length in real time would be more important than the number of data points. I agree with him. If we use 192-kHz sampling, 10 seconds of EEG recording generates 1.9M data points. Can we perform ICA on these 10 s of data to obtain a satisfactory 128-channel decomposition, when this ICA can 'see' only 10 s of brain behavior and the dominant EEG power is at 10 Hz (i.e., only 100 oscillations in the data)? I doubt it. For the same reason, I intentionally downsample the data before ICA (and only for ICA) to 100 Hz. Not only does it speed up the process, it also cuts unnecessary high-frequency content, like denoising.

What is assumption of stationarity in ICA? (09/11/2019 updated)

ICA assumes that ALL the data points submitted are from the same state. For example, if you perform a P300 task while recording EEG, you are making the assumption that all the data points are related to the P300 task. From this viewpoint, it becomes clear that it is more desirable to epoch your data to the event of interest first and then perform ICA (data length permitting) rather than performing ICA on continuous data. What about eye blinks? They are definitely not related to the P300 task (or may be temporally correlated with the event structure, if blink timings are behaviorally controlled), and they are not even from the brain. However, if a blink occurs frequently enough during the recording, it becomes a part of the stationarity. What about non-task-relevant (i.e., non-event-related) absent-mindedness or random thoughts reflected in the EEG between trials or blocks? These could disturb the assumption of stationarity you made for ICA, because such EEG signals may not have any repeating patterns or similar statistical characteristics (such as a probability density function) across time. Therefore, using ERP paradigms actually gives a very nice justification for using ICA, as ERP paradigms reinforce EEG data stationarity through massive repetition of the same/similar types of stimulus-response sequences. The most ideal case for ICA would be quasi-steady-state responses, such as mismatch negativity.

  • Is data stationarity more 'attention-drawing' for ICA than large amplitude in the signal? I have seen many times that a single EEG event with a very large amplitude can attract ICA's attention and make it waste multiple ICs to decompose it. So apparently, a high-amplitude signal is a stronger attention drawer for ICA than regular-amplitude, well-stationary signals. This is why data cleaning is critical before ICA. It also explains why ICA is so good at decomposing eye blinks--their source location is fixed, their potential distribution over the scalp is fixed and broad, and their amplitude is large. By the way, the reason why ICA is biased toward source magnitude in practice is that realistic data have finite length.

General tips for performing ICA (06/26/2018 updated)

  • ICA has a bias toward high amplitude, unless the data have infinite length. EEG usually has a 1/f power spectral density, which means low-frequency signals tend to dominate ICA results. This is why you get a lot of alpha components and no isolated gamma components; gamma power is always very low relative to theta and alpha.
  • ICA may fail if you apply it to data consisting of concatenated blocks recorded with different cappings, even if the data are from the same subject. This is because every time an electrode cap is applied, the electrodes' relative locations to the EEG sources in the brain differ. This violates the assumption of spatial stationarity between the electrodes and EEG sources throughout the recording. In other words, to apply a single ICA, your subject cannot take off the electrode cap! If you did this, ICA may show ICs with very similar scalp topographies whose activations are separated by block (active only in some blocks, otherwise green) in ERPimage.
  • For the same reason, re-gelling electrodes during breaks is not recommended. There have been cases where ICA split the data before and after the re-gelling. Even one channel out of its original position can cause this, so be careful.
  • Include EOG, ECG, and other channels as long as they share the initial reference electrode with scalp electrodes. If EOG channels are referenced to each other (i.e., bipolar), exclude them.
  • If some channels have terrible EMG contamination, excluding them from ICA may improve the results. This is because ICA cannot model EMG sources: they are not spatially stationary but spread along the muscle fibers.
  • You need a minimum of (number of channels)^2 x 20 to 30 data points to perform ICA. The number '20 to 30' should increase as the number of channels increases. This empirical formula is from past publications from SCCN. This value was, however, determined purely empirically, and we have not performed any systematic investigation of it (should you be interested in carrying out this study, contact us!) Moreover, there is a good counter-example to this rule of thumb--see the next item.
  • You may want to downsample the data (with more than 128 channels) to near 100 Hz. This is the case when you give up the gamma band (40-80 Hz) and above due to the presence of strong line noise that CleanLine() could not remove very well (CleanLine's assumption is that the noise envelope is stationary, which often does not hold). Although not officially studied or reported, in principle this aggressive downsampling not only accelerates the process but may also improve the quality of the decomposition by simplifying the data. Run ICA on the downsampled data, obtain the decomposition matrices (i.e., EEG.icaweights and EEG.icasphere), and apply them to the non-downsampled data. Interestingly, in this case the number of data points is severely lower than the number recommended above, but in most cases it is fine (I did encounter several cases where AMICA failed due to downsampling; there, AMICA worked fine using the original 256-Hz sampling rate, so when coding you might want to use try... catch...). Judging from this, after all it is NOT the absolute number of data points that determines the quality of decomposition, but probably the length of actual recording time. Using a very high sampling rate (10 kHz, for example) just to gain data points to improve ICA results does not seem to work, in my view. For this reason, using more than 256 Hz is probably not very helpful for improving the quality of ICA decomposition.
  • You may want to reject epochs based on IC activations and run ICA again (Delorme et al., 2007). This frees ICA from data points that are difficult to explain. A good principle is to feed ICA only typical (i.e., stationary) data. If you run ICA again on these cleaned data, theoretically you can improve ICA quality, but in practice the difference is placebo level unless you reject about 1/3 of the data (but remember, due to ICA's bias toward amplitude, rejecting high-amplitude noise/artifact is always immediately helpful, even if it is 0.001% of the total data length. Note that by high amplitude I do not mean eye-blink artifacts, which are still benign, but signals of non-biological origin with > 1000-2000 uV.) In general, ICA results are more robust than we think. If your data are already very clean, it is usually impossible to further improve the ICA results.
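Regarding the downsampling trick in the bullets above: the learned decomposition transfers between sampling rates because the unmixing matrix is a purely spatial filter, applied one sample at a time. A plain-Python toy (hypothetical 2x2 matrix and made-up data):

```python
# Sketch: applying an unmixing matrix W (ICs x channels) to data
# (channels x samples). W is per-sample and spatial, so the sample
# count (i.e. the sampling rate) of the data is irrelevant.
def unmix(W, data):
    """Return IC activations = W * data (toy matrix multiply)."""
    nsamp = len(data[0])
    return [[sum(W[i][c] * data[c][t] for c in range(len(data)))
             for t in range(nsamp)] for i in range(len(W))]

W = [[0.6, -0.8], [0.8, 0.6]]                  # hypothetical unmixing matrix
low_rate  = [[1.0, 3.0], [2.0, -1.0]]          # 2 channels, few samples
high_rate = [[1.0, 2.0, 3.0, 0.0], [2.0, 0.5, -1.0, 1.0]]  # same channels, more samples

# The very same W works on either dataset.
assert len(unmix(W, low_rate)[0]) == 2
acts = unmix(W, high_rate)
assert len(acts) == 2 and len(acts[0]) == 4
assert abs(acts[0][0] - (0.6 * 1.0 - 0.8 * 2.0)) < 1e-9
```

In EEGLAB terms, W corresponds to EEG.icaweights * EEG.icasphere, learned on the downsampled copy and then copied onto the original-rate dataset.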

Rejecting ICs to improve the next ICA? Or what is rank and linear algebra (07/16/2018 updated)

Rejecting ICs to improve the next ICA is an idea that keeps tempting many of us but never works. Let us walk through the process and the reason why it does not work.

  1. You have 100ch data.
  2. You run ICA on it to obtain 100 ICs.
  3. You identify 1 eye-blink component.
  4. You reject it. Now you have 99 ICs.
  5. You back-project the 99 ICs to obtain 100ch reconstructed scalp channel signals.
  6. You run ICA again on the reconstructed 100-channel data, hoping to obtain another 100 ICs. The new ICA result should be cleaner because this time ICA is free from the eye-blink artifact.
  7. You find that ICA somehow gives you only 99 ICs. Certainly the blink IC is gone, but the remaining 99 ICs are identical to before.

Where did you fail? It was when you assumed you could obtain another set of 100 ICs. Below, we examine this problem in detail.

Usually, your 100-channel recording has rank 100 (to see it, execute rank(double(EEG.data(:,:))) in the Matlab command line) unless some of your channels are bridged/shorted or average reference was applied. If two channels are bridged, their signals become identical. Clearly, this is not a useful way to use multiple channels. The important principle here is that for channel data to be meaningful, it should not be explainable as a combination of the others (being identical is an extreme case). To see this more clearly, let me give you a simpler example below.

  1. You have Fz, Cz, and Pz data.
  2. You reject Cz data.
  3. You interpolate Cz with 50% of Fz and 50% of Pz.

In this case, you probably see that the signal at Cz is not that useful, since it is just made up. Here there are 3 channels but only rank 2. By the way, 3 channels with rank 3 is called full rank, and any state where rank < number of channels is called rank deficient.
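The Fz/Cz/Pz example can be written out in plain Python (made-up samples): the interpolated Cz is reproduced exactly from Fz and Pz, and any mix that 'uses' all three channels collapses to a two-channel mix, which is what rank 2 means:

```python
# Toy samples for the three channels (hypothetical values).
fz = [1.0, -2.0, 0.5, 3.0]
pz = [0.0, 1.0, 2.0, -1.0]
cz = [0.5 * f + 0.5 * p for f, p in zip(fz, pz)]   # step 3: interpolation

# Knowing fz, pz and the blending ratio predicts cz perfectly:
# cz carries no unique information.
predicted = [0.5 * f + 0.5 * p for f, p in zip(fz, pz)]
assert predicted == cz

# A mix that 'uses' all three channels, e.g. 0.2*Fz + 0.5*Cz + 0.3*Pz ...
mix = [0.2 * f + 0.5 * c + 0.3 * p for f, c, p in zip(fz, cz, pz)]
# ... is really just a two-channel mix: (0.2 + 0.25)*Fz + (0.3 + 0.25)*Pz.
two_ch = [0.45 * f + 0.55 * p for f, p in zip(fz, pz)]
assert all(abs(m - t) < 1e-12 for m, t in zip(mix, two_ch))
```

Everything expressible with the three channels is already expressible with Fz and Pz alone, so no decomposition can extract a third independent signal.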

Let's stick to this example to explore further. Imagine that you run ICA on Fz, Cz, and Pz to obtain 3 ICs. You know ICA's performance often goes beyond our intuition to an almost magical level (as Clarke said, any sufficiently advanced technology is indistinguishable from magic). But do you know where the three ICA activations come from? They are not out of thin air; they are just the result of re-blending the three scalp channel values! For example, IC 1 is 29% of Fz, 33% of Cz, and 38% of Pz, and so on.

  • Q1. Can such a simple re-blending really reveal the allegedly effective EEG source activations?
  • A1. Yes. For the claim that they are effective EEG sources, several assumptions are needed, such as spatial stationarity of the sources.
  • Q2. Who determines these blending percentages?
  • A2. That is ICA's primary output.
  • Q3. Where can I find these blending percentages?
  • A3. Each row of the unmixing matrix EEG.icaweights*EEG.icasphere gives the blending ratios (not exactly percentages): you re-blend the channel values following these ratios to obtain the IC activations. (Conversely, each column of EEG.icawinv gives the ratios for blending the IC activations back into channel data.)

This blending process is called a linear transform. The same applies not only to ICA but also to any linear method such as PCA, non-negative matrix factorization, etc. The wavelet transform is also a type of linear transform, as it reconstructs a given 1-D signal as a blend of basis waves (wavelet daughters) of different frequencies.

Now, let us proceed further. We have 3-channel data Fz, Cz, and Pz, but Cz is an interpolation of 50% of Fz and 50% of Pz. If we run ICA on these three channels, what happens? Do you think ICA will still give you three independent components, even though your Cz is a mere blend of Fz and Pz? Think about the meaning of the word independent. If you have the signals at Fz and Pz and their blending ratio, you can 100% predict Cz's behavior. You probably see that Cz carries no unique information at all. Rank deficiency indicates this kind of situation: namely, your number of channels is inflated due to redundancy.

By the way, in the case of average referencing, the data redundancy is not concentrated on one channel but, in a sense, uniformly spread across all the channels, which is why it is hard to see why average referencing causes rank deficiency. In this case, think about what happens when you perform average referencing on two-channel data, Fz and Pz. After subtracting the average reference, they will be identical, just polarity-reversed.
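The two-channel case is small enough to check directly (plain Python, made-up samples): after average referencing Fz and Pz, the two channels are identical up to a sign flip:

```python
# Toy two-channel data (hypothetical values).
fz = [1.0, -2.0, 0.5]
pz = [0.25, 3.0, -1.0]

# Average reference: subtract the two-channel mean at each sample.
mean = [(f + p) / 2.0 for f, p in zip(fz, pz)]
fz_ar = [f - m for f, m in zip(fz, mean)]
pz_ar = [p - m for p, m in zip(pz, mean)]

# The re-referenced channels are exact mirror images: fz_ar == -pz_ar,
# i.e. two channels now carry only one channel's worth of information.
assert all(abs(f + p) < 1e-12 for f, p in zip(fz_ar, pz_ar))
```

Two channels carrying one signal (up to sign) is rank 1 out of 2, the smallest instance of the rank deficiency spread evenly across channels.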

If you want, I can give you a more extreme example: instead of blending Fz and Pz in a 1:1 ratio, we can set the blending ratios to 0.99:0.01, 0.98:0.02, 0.97:0.03, ... to generate 100-channel data. As you see, you can generate as many channels as you want in this way. However, even with 100-channel data generated in this way, unique information is present only in Fz and Pz. You can't create information out of nowhere.

Ok, let's get back to the original question: if we run ICA on 3-channel, rank-2 data, what happens? The answer is that you obtain only 2 ICs. The number of ICs you obtain equals the rank of the data, not the number of channels (as we saw, the number of channels can easily be inflated by generating redundancy). This explains the very original question of why the second ICA returns only 99 ICs instead of 100. When you rejected the blink IC, the back-projected scalp channel data were no longer full-ranked; they became rank deficient by 1. Because ICA cannot generate independent components out of nowhere (it is NOT real magic after all!), it will give you the same number of ICs as the data rank. By the way, the rank check is not ICA's inherent property; in practice, the rank check is performed by other functions. If ICA is set to generate 100 ICs from rank-99 data, it either stops in failure or generates weird results (see the 'ghost ICs' explained above).

Now, the last question: why do the 99 ICs remain the same? If you have read all the explanations above, you should now know what ICA cares about and what it does not. ICA does not care about the superficial representation on the scalp channels, in the sense that channel data are just one of infinitely many linear combinations of the source activations (imagine your electrode cap, with 128 channels covering the entire scalp, is rotated slightly in an arbitrary direction so that all the channel positions are shifted to nameless positions. Under this situation, ICA will still give you the same source locations with the same source activations. ICA sees the channels as projections of sources.) Instead, ICA cares about what is independent across channels. Because the identified blink IC is (theoretically) independent of all other ICs, its presence or absence does not affect the state of the other ICs, hence the decomposition is unaffected.

Tips for copying ICA results to other datasets (06/26/2018 updated)

  • You can copy the set of ICA-related matrices, i.e., EEG.icaweights and EEG.icasphere, to other datasets. The EEGLAB GUI supports this process. Thus you can, for example, learn the ICA matrix from aggressively cleaned data while applying the results to mildly cleaned data to save data points.
  • In doing so, however, if noise is present that ICA did not learn, the noise will remain in the ICA results.
  • However, make ABSOLUTELY sure that the two datasets match on the following points.
  1. They have the same number of channels in the same order.
  2. They have the same (re-)referencing status.

The problem here is that even if you make a serious mistake, EEGLAB does not warn you about it. Therefore, you have to check it yourself carefully. To confirm that you did not make a mistake, use [Plot] -> [Component activations (scroll)] to see the IC activations. If the copying process was successful, you will see the variances (i.e., amplitudes) of the ICs sorted as usual, the largest on top and the smallest at the bottom. Note that the IC scalp maps (which are EEG.icawinv itself) look normal even if you have this problem, so do not be misled; always check the IC activations. If the copying process failed, you will notice the following symptoms.

  1. IC activations are not sorted in order of variance, i.e., they do not follow the rule of largest on top and smallest at the bottom.
  2. Multiple IC activations are temporally correlated (often a pair with polarity inversion).
  3. Some IC activations, not necessarily top-ranked, show abnormally large amplitudes.
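Symptom 1 can also be checked numerically (a sketch; assumes the dataset is loaded and the ICA fields are populated):

```matlab
EEG = eeg_checkset(EEG, 'ica');            % make sure EEG.icaact is computed
icVariances = var(EEG.icaact(:,:), 0, 2);  % variance of each IC time course
issorted(icVariances, 'descend')           % expect true if the copy is valid
```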

See below for an example.

Figure 4. Corrupted data due to failure in copying ICA weight matrix.

  • There is a paper claiming that ICA (actually, any linear decomposition) distorts phase (Montefusco-Siegmund et al. (2013). Effects of ocular artifact removal through ICA decomposition on EEG phase.) Some people hence claim that one should avoid using ICA altogether. This claim is illegitimate (as Arno says, a 'resounding no' in his page linked below). In this wiki page, my colleagues and I illustrate the mathematical process of how phases are synthesized before and after component rejection, to show that this is a due change in phase and nothing surprising. See also this page written by Arno, which gives an empirical demonstration that rejects this claim.

To learn how to evaluate EEG and artifact ICs (01/24/2017 updated)

Please visit this site to start the training. This nice project was designed by my colleague Luca Pion-Tonachini.

Unofficial instruction to install and use AMICA (11/08/2023 updated)

Download AMICA from the EEGLAB extension manager: [File] -> [Manage EEGLAB extensions] -> [Data processing extensions] -> check AMICA and Install. Below I show the old information.

  • Use AMICA, developed by former SCCN colleague Jason Palmer. AMICA does a better job than Infomax (see Delorme et al., 2012).
  • Below I show an example of a successful installation on Windows 7 64-bit. I have also heard a report of a successful installation on Mac OS Sierra on a Core i7, using Matlab 2014b and EEGLAB 13_6_5b. This does not mean EEGLAB 13 is more stable for this purpose; use the current version of EEGLAB.
  • DO NOT use characters other than ALPHABET and NUMBERS in the file path (including file names). For example, if you have a space in the file path, IT WILL FAIL! (Thank you Xuan Tran and Mahdi Mollaei for helping me find this out.)
Unofficial instruction to install AMICA for 64bit Windows.
1. Go to Jason Palmer's website to find the links to the files.
2. Create a folder named 'amica' under /eeglab/plugins
3. Download and copy the following 5 items to /eeglab/plugins/amica
   -amica15mkl.exe (Windows 7/10 64-bit binary)
4. Download and copy the following 2 items to /Windows/System32
   -fmpich2.dll (Windows DLL, save in directory with binary)
   -libiomp5md.dll (Windows DLL, save in directory with binary)
5. Download mpich2-1.4-win-x86-64.msi (for 64-bit Windows).
   Double-click the downloaded .msi file to install it.
6. Install the postAmicaUtility() plugin from the EEGLAB plugin manager. Otherwise, there is no GUI button to load AMICA results.
   Below I show an example code to load AMICA results from the command line. Note that the variable 'dataName' is the name used for the AMICA result output folder.
   EEG.etc.amicaResultStructure = loadmodout15(['/data/projects/example/amicaResults/' dataName]);
   EEG.icaweights = EEG.etc.amicaResultStructure.W;
   EEG.icasphere  = EEG.etc.amicaResultStructure.S;
   EEG = eeg_checkset(EEG, 'ica');

RMS option issue (07/02/2020 added)

By default, 'If set, scale ICA component activities to RMS (root mean square) in microvolt (recommended)' under 'Memory and other options' is on. However, this does NOT take effect UNLESS the check function eeg_checkset() is run. Loading a .set file runs it, so loading is sufficient (but this overwrites 'EEG.etc.icaweights_beforerms' as well, which is inconvenient and confusing). Hence, if you save the data before moving to the next step, there is no problem, because eeg_checkset() is run when the data are loaded. However, if you continue to process your data after ICA without saving, the variables including EEG.icaweights, EEG.icasphere, EEG.icawinv, and EEG.icaact are NOT scaled to RMS. If you do not know about this fact, you may encounter a problem like this (Thank you Seyed!), so beware of this behavior.

Does ICA distort phase? (07/27/2020 updated)

This question is sometimes asked in this form, but this is rather a strange one. An example of a linear decomposition is 7+5 = 4+8. As such, that itself should not do anything at all (other than suggesting a different view of the data, and some of them may be useful; for example, 10+2 may be more understandable than 7+5 for a good reason, and another linear transform of 2+2+2+2+2+2 seems to have different usefulness as well).

If you 'remove component' after ICA, it DOES change the data. For example, if you remove 4 from 4+8, does this change the result? Of course yes. 4+8 and 8 are two different things. The real question is: Do you call it distortion? No. (Return to Wittgenstein--the meaning of the word is the use of the word)

For those who want a more formal proof of the fact that post-ICA component rejection does NOT distort the phase, I prepared a separate page. This proof simply shows that IC rejection causes a linear phase change that is essentially the same as removing 5 out of 7+5, which you would not call 'distortion'.

There is also an empirical demonstration available on this page.

Is using PCA absolutely bad for ICA? (12/03/2021 updated.)

(This is for Nir Ofir) The short answer is no. Since Fiorenzo published this paper, I have started to hear this wrong overgeneralization: using principal component analysis (PCA) is bad for ICA. The truth is,

  1. CASE 1: Using PCA to match the true rank of rank-deficient data is fine.
  2. CASE 2: Using PCA for dimension reduction of too-short data is justifiable.
  3. CASE 3: Using PCA for dimension reduction just to speed up ICA is wrong and cannot be recommended.

To start with, let's confirm what it means to reduce data dimension (or rank) using PCA. PCA finds a low-dimensional subspace that captures most of the variance in the data. After you decompose the input data using PCA, you reject n PCs starting from the smallest, then back-project the remaining PCs to the scalp electrodes. In this way, you can reduce the data rank without discarding actual scalp electrodes.

Case 1 It is fine to use PCA to specify the true rank of rank-deficient data. For example, you have 100-channel data but for some reason it has a rank of 99 (perhaps average reference was applied without including the initial reference? This is a suboptimal approach though; see this section). Typically, EEGLAB-implemented ICA solvers come with a rank checker (with a heuristically determined minimum eigenvalue threshold of 10^-7; see pop_runica line 531), so feeding them rank-deficient data usually does not cause an issue. However, rarely it does; see this section for what can happen. To avoid this rare problem, using 'pca', 99 (or 'pcakeep', 99 in the case of AMICA) is PERFECTLY fine and even recommended: it eliminates the risk of running into the rare 'ghost IC' problem, and ICA performs the same reduction automatically whenever it detects rank deficiency in the data.
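A hedged sketch of Case 1 (the rank value comes from your own data, not from the example number 99 above):

```matlab
% Determine the rank yourself, then pass it to the ICA solver explicitly.
dataRank = rank(double(EEG.data(:,:)));
EEG = pop_runica(EEG, 'icatype', 'runica', 'extended', 1, 'pca', dataRank);
% For AMICA, the corresponding option is 'pcakeep', dataRank.
```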

Case 2 PCA dimension reduction is justifiable for data that are too short to produce as many ICs as there are electrodes. For example, suppose you have high-density 128-channel EEG data. Great. But what if your recording is only 5 minutes long? Producing 128 ICs from a 5-minute recording would cause a serious shortage of data for learning (remember SCCN's rule-of-thumb formula for the minimum data length for ICA: (number of channels)^2 x k, where k is at least 20 to 30...). In such a case, there are two choices: either reduce the data rank using PCA or remove electrodes. Either way, a good portion of the original information is lost. It is unfortunately not easy to determine which is better, but using PCA may be smarter in most scenarios where you remove only a small number of data dimensions (up to 10% of the number of electrodes). It may also be noted that, due to the 1/f power distribution, I predict that signals in the lower frequency range are more robust against the dimension-reduction effect than those in the higher frequencies. See this section to confirm the related fact.
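The rule-of-thumb arithmetic in the paragraph above, spelled out (the sampling rate and the value of k are assumed for illustration):

```matlab
nbchan     = 128;
k          = 25;                          % SCCN heuristic: k = 20 to 30
srate      = 256;                         % assumed sampling rate in Hz
minSamples = nbchan^2 * k;                % 409600 samples
minMinutes = minSamples / (srate * 60);   % about 26.7 min, far over 5 min
```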

Case 3 PCA dimension reduction should NOT be used just to speed up the process of ICA!

Head model coregistration and estimation of single equivalent current dipoles (07/02/2020 Update)

Matlab dependencies (11/15/2023 added)

You need the Optimization Toolbox. Otherwise, you will see NaN% results in r.v., the residual variance from fitting a theoretical projection of an estimated dipole. (Thanks Andre Coetzee!)
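You can check for the toolbox programmatically before starting a long fitting run:

```matlab
% Returns 1 if an Optimization Toolbox license is available, 0 otherwise.
hasOptim = license('test', 'Optimization_Toolbox');
if ~hasOptim
    warning('Optimization Toolbox not found: r.v. will come out as NaN%%.');
end
```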

Head model coregistration

Now we want to fit a source model, an equivalent current dipole, to each scalp topography generated by ICA. Before doing this, there is an important step: head model coregistration. Your data has electrode locations defined in its own xyz coordinates. EEGLAB has an electric forward model of the head defined in its own xyz coordinates. The problem is that the two coordinate systems may or may not correspond to each other. For example, in some coordinate systems the x-axis is left-right and the y-axis is anterior-posterior, while in others this relation is the opposite. In fact, it is often the case that the left-right and anterior-posterior axes are swapped between your electrode locations and EEGLAB's head model, so that front in electrode space means right in model head space! To address this inconsistency, you need to coregister the two coordinate systems so that front means front and right means right. 'Tools' -> 'Locate dipoles using DIPFIT 2.x' -> 'Head model and settings'. For details of the procedure, see the EEGLAB online manual.

A possible issue you may encounter here (11/07/2020 added)

This is for Syanah Wynn. EEGLAB's head model coregistration uses millimeters as the default unit. However, sometimes your electrode locations are registered in meters or centimeters, depending on how the electrode location data were generated. In the figure below, I show an example of what you would see if centimeters, instead of millimeters, were the unit of the electrode locations.


You can see the current electrode locations (green) clustered at the center of the head (do not forget to push the 'Mesh Off' button on the left-hand side). If this is the case, the following code is the solution.

for m = 1:length(EEG.chanlocs)
    EEG.chanlocs(m).X = EEG.chanlocs(m).X*10;
    EEG.chanlocs(m).Y = EEG.chanlocs(m).Y*10;
    EEG.chanlocs(m).Z = EEG.chanlocs(m).Z*10;
end

This is equivalent to converting centimeters to millimeters. Similarly, if meters are used for your electrode location coordinates, the following is the solution.

for m = 1:length(EEG.chanlocs)
    EEG.chanlocs(m).X = EEG.chanlocs(m).X*1000;
    EEG.chanlocs(m).Y = EEG.chanlocs(m).Y*1000;
    EEG.chanlocs(m).Z = EEG.chanlocs(m).Z*1000;
end

Single equivalent current dipole fitting

Once the coregistration is done, your data are ready for equivalent current dipole model fitting. 'Tools' -> 'Locate dipoles using DIPFIT 2.x' -> 'Autofit (coarse fit, fine fit & plot)'. You need to run the coarse fit first and the fine fit next, but 'Autofit' does both together anyway. Note that 'Rejection threshold RV (%)' here should be 100, otherwise you will have trouble making a STUDY later. Do not change anything else on the GUI; just press 'OK'. Because we are fitting a single dipole to the scalp projection of each independent component, which is readily (most of the time single- and sometimes symmetrically bilaterally-) dipolar, this estimation process is greatly eased, and the complication of determining the optimum number of dipoles is eliminated. For more detail, see below.
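For batch processing, the Autofit GUI has a command-line counterpart, pop_multifit() (a sketch; check eegh after a GUI run for the exact parameters your DIPFIT version uses):

```matlab
% Fit one dipole per IC; threshold 100 keeps all ICs for later STUDY use.
EEG = pop_multifit(EEG, 1:size(EEG.icaweights, 1), ...
    'threshold', 100, 'dipplot', 'off', 'plotopt', {'normlen', 'on'});
```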

Deep dipole results

This ICA + single-dipole-fitting approach has the abovementioned beauty, but it also has a known issue of producing physiologically invalid deep dipoles. I discuss the issue below.

Search for and estimate symmetrically constrained bilateral dipoles

Download the fitTwoDipole() plugin for EEGLAB and run it after fitting single dipoles. It is compatible with eegh, and you can make a batch. The code is like this: EEG = fitTwoDipoles(EEG, 'LRR', 35); Do not forget to cite Caterina's paper.

Create STUDY

When creating a STUDY with epoched data (assuming you have within-subject conditions), I recommend using one .set file per subject and never saving the data separately for each condition (an old approach used before the STUDY design functionality was implemented). In theory both work, but there have been reports of problems caused by separate .set files. I recommend using the STUDY design specification to let STUDY know the factorial design of the analysis.

When creating a STUDY, I usually load all the .set files into EEGLAB using pop_loadset() with 'loadmode', 'info' (it may be possible to do this from the GUI too, but it is convenient to fill in fields such as EEG.subject altogether at this point). Importantly, creating a STUDY cleans your data, because it performs IC rejection based on two criteria: 1) dipoles with > 15% residual variance (Artoni et al. (2014) showed (indirect) evidence of why this value is about optimum); 2) dipoles outside the brain (no argument). This is why you do not need to bother rejecting ICs manually at the individual level. Don't forget to set up the design before you start the precompute step. After precompute, you precluster the ICs by choosing clustering criteria.

Avoid double dipping (03/07/2019 updated)

If you are going to test ERP/ERSP/ITC/PSD differences (for example) across conditions at the end of the day, DO NOT choose any of them to cluster ICs. This problem is called 'double dipping', a type of circular inference that is known to inflate the false positive rate. If you have not heard of the terms 'circular inference', 'double dipping', or 'voodoo correlation', see Kriegeskorte et al., 2008, Kriegeskorte et al., 2009, and Ed Vul's voodoo correlation site. Note that these were published more than 10 years ago. Note also that far more fMRI researchers are aware of the issue than EEG researchers, so 'I have never heard of it from any of my colleagues' cannot be an excuse. Ok, so why can you not use ERP to cluster ICs and then compare ERP amplitude across conditions? If I understand the problem correctly, there are at least two issues:

  1. The resultant clusters have, by definition, the smallest variance in the given measure (i.e., ERP) within a group. Given that the mean values remain the same, smaller variance means a higher probability of obtaining significant results, i.e., inflation of the Type I error rate (false positives).
  2. I have heard some say the above problem can be avoided by using a non-parametric test. Fine. However, before performing the actual inferential statistics, you form IC clusters, from which ICs with dissimilar ERP waveforms are kicked out (because that is what clustering on ERP does). But is it good scientific practice to kick out data points just because they are 'against your hypothesis/assumption'?

Then, how can we avoid this issue? The answer is simple: use ONLY DIPOLE LOCATIONS to cluster ICs, and never use any measure derived from the time/frequency domains. EEGLAB does allow users to choose multiple clustering criteria, which is misleading. If you do not want to have a hard time explaining to your reviewers why you chose ERP/ERSP/ITC/PSD to cluster ICs, just use dipole locations.

Can we determine the optimum number of IC clusters? (01/14/2020 added)

For Fran and Seyed, I would like to share my solution. But it is usable only when dipole locations are the sole clustering criterion. Sorry it is not nicely wrapped up into an EEGLAB plugin; you have to run some code, which is a bit too bulky to paste here, so please refer to this page.

How can we cite these reasonings? (03/07/2022 added)

This was requested by Daghan Piskin. Off the top of my head, I remember these two papers. Unfortunately, nowadays it is unwelcome to spend precious main-text space on this kind of technical detail, and I often need to move it to the Supplement. But you can still cite them. It is probably difficult to find these descriptions in other EEG-ICA papers. By the way, the reason I use dipole locations and not IC scalp topographies, although the former is computed from the latter, is that IC scalp topographies change a lot depending on dipole angles. However, dipole angles generally reflect individual differences in gyrification patterns, which are often data of non-interest. If an estimated dipolar source is in the wall of a sulcus, its location tends to be deeper, but the difference is not as dramatic as in the scalp topographies. So I'd say dipole locations are more robust against individual differences, in the sense that dipole depth differences are less sensitive to individual differences in cortical anatomy than scalp topographies are.

Miyakoshi et al. (2020)

...K-means clustering was performed using IC dipole locations only without any time-frequency data to avoid circular inference in the subsequent statistical tests (Kriegeskorte et al. 2009). The number of clusters was determined by using optimization algorithms including Silhouette (Rousseeuw 1987), Davies-Bouldin (Davies and Bouldin 1979), and Calinski-Harabasz (Calinski and Harabasz 1974) to see if their suggestion would converge. We adopted the minimum number suggested to maximize the unique subject ratio per IC cluster.

Miyakoshi et al. (2021)

...and used EEGLAB STUDY structures to group the brain ICs spatially using the k- means algorithm applied to the equivalent dipole locations. The silhouette algorithm (Rousseeuw, 1987) was applied to obtain an optimum number of clusters, which is found for these data to be 9.
...The number of IC clusters (= k) was determined by using the Silhouette algorithm (Rousseeuw, 1987) on all the xyz coordinates of the equivalent current dipoles fitted for IC scalp topographies (i.e., weights in the columns of the mixing matrix). Care was taken not to include any time/frequency domain metric on this clustering to avoid circularity in the later inferential statistics (Kriegeskorte et al., 2009).

Why does using dipole locations not cause double dipping? (03/07/2019 updated)

A dipole location is estimated from an IC scalp topography, which is a direct property of ICA (it is a column of EEG.icawinv). Why do I say that using dipole locations is safe in terms of double dipping? You can empirically prove it with a simple experiment: randomize the order of channel assignments before you perform ICA. What do you get after ICA? You will get identical results in the time series and power spectral density, but randomized scalp topographies. What does this mean? It is proof that ICA is agnostic about spatial information. Therefore, using dipole locations estimated from scalp maps does not cause double dipping when testing time-domain and frequency-domain data.
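The channel-shuffling thought experiment can be sketched numerically (a toy example, not an EEGLAB recipe):

```matlab
% Shuffling channel order permutes only the rows of the mixing matrix
% (the scalp maps); the underlying source time courses are untouched.
s = randn(3, 5000);          % toy source activations
A = randn(3);                % toy mixing matrix
x = A * s;                   % 'recorded' channels
p = randperm(3);
xShuffled = x(p, :);         % identical to A(p,:) * s
% xShuffled still contains exactly the sources s, so ICA recovers the same
% activations; only the spatial patterns (rows of A) are permuted.
```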

A side note--an interesting question here is, 'Then why are ICA results always associated with nice dipolar scalp topographies?' Indeed, in Delorme et al. (2012), they even made the claim that the more mutual information is reduced, the more dipolar the scalp topographies become. Let us tentatively call it the 'independence-dipolarity identity (IDID)'. As I mentioned, because ICA is agnostic about spatial information, the dipolarity of the scalp topographies is not a direct property of ICA itself. Thus, an explanation for IDID is non-trivial. Unfortunately, SCCN has not published any 'official' explanation for this, but has instead been using it as evidence that ICA meets physiology. It seems convincing, but to really convince researchers from other fields, we will need good supporting evidence from electrophysiology.

Why is my ERSP/ITC data shorter than ERP epoch? (08/20/2021 added)

This is for Sean Gilmore, but it is also one of the most often asked questions. To explain it, I copied two slides from my previous EEGLAB workshop slides below. To be clear, this phenomenon of the epoch shortening after time-frequency transformation is NOT specific to EEGLAB or newtimef(); it is a general phenomenon related to the concept of the cone of influence (COI) (see this webpage in the Matlab help center for the definition of the COI).

In the left panel below, one of the lines from the command-line log of newtimef() says 'The window size used is 279 samples (1116 ms) wide'. This describes the width of the longest wavelet. As the target frequency goes higher, the window becomes shorter (unless you specify a short-term Fourier transform by setting the cycle-increase parameter to 0). This change of window length shapes the 'cone'. EEGLAB, however, does not bother to shape the cone; it simply cuts everything outside the narrowest part of the 'cone', which is necessarily at the lowest frequency.

See the right panel below and notice the difference between the onset of the EEG and the onset of the ERSP. Why is the ERSP start delayed? This delay is only apparent, because the sliding window does take the initial data points into its calculation. But because one sliding window generates only one output time point, the numbers of points before and after the wavelet transform cannot be the same. In the current case, you lose one sliding window's worth of data points minus one. This loss is divided into two parts: half of the window's length is trimmed from the beginning of the epoch, and the other half from the end. Thus, if the newtimef() log says 'The window size used is 279 samples (1116 ms) wide', you lose 278/2 = 139 points (1112/2 = 556 ms) from each of the beginning and the end of the epoch after the wavelet transform.

When you epoch the data, if you know you will perform time-frequency analysis later, you have to take this datapoint loss into account. Otherwise, you will have to redo your processing from the epoching stage.
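The arithmetic above, spelled out (the 279-sample window and the implied 250-Hz sampling rate come from the log line quoted in the text):

```matlab
winSamples     = 279;                         % longest wavelet window
srate          = 250;                         % 279 samples = 1116 ms -> 250 Hz
lostTotal      = winSamples - 1;              % 278 samples lost in total
lostEachSide   = lostTotal / 2;               % 139 samples per epoch edge
lostEachSideMs = lostEachSide * 1000/srate;   % 556 ms trimmed from each end
```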

Slide2.jpg Slide3.jpg

A tip to compute time-frequency transform (i.e. ERSP & ITC) (08/20/2021 updated)

EEGLAB uses its own way of defining the wavelet basis. Technically it is a mixture of the short-term Fourier transform and the wavelet transform, since the cycle number varies by default (for detail, see the archived communication starting with this; it was the best discussion ever in the history of the mailing list). Now, the problem is how to set the parameters properly. Of course the best parameters depend on your application, but such a logically self-evident statement carries no information. I recommend 3 cycles at 3 Hz for the lowest frequency, and about 7-10 cycles at 50 Hz (or 15-20 cycles at 100 Hz, if 100 Hz is the maximum frequency you want to analyze). I say this because I remember people used 7 cycles to analyze 40-Hz gamma in the early 2000s. This number-of-cycles parameter is not very sensitive at high frequencies, and a difference of 2 or 3 would not make a noticeable difference. I copied a slide from my previous EEGLAB workshop slides below: a comparison between 5 cycles and 25 cycles at 50 Hz. Overall, my advice is not to be too nervous about it.
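In newtimef() terms, the recommendation above corresponds to something like the following (a sketch; all parameter values other than the 3-cycle starting point are assumptions you should adapt):

```matlab
% 'cycles' [3 0.5]: start with 3 cycles at the lowest frequency and let the
% number of cycles grow with frequency (a second value of 0 would instead
% give a fixed-window short-term Fourier transform).
figure;
newtimef(EEG.icaact(1,:,:), EEG.pnts, [EEG.xmin EEG.xmax]*1000, ...
    EEG.srate, [3 0.5], 'freqs', [3 50]);
```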


What is the optimum and maximum number of clusters? (11/30/2020 edited)

This article is based on review work by Nattapong Thammasan. For the optimum number of IC clusters for the k-means algorithm, see another of my Wiki articles. For the maximum number of clusters there is of course no hard limit, but the number suggested by several papers is 30. First, in Nima's Measure Projection paper, there is this description: 'a desired number of IC clusters (10–30).' However, this paper does not give a reason for this number, so it is not very strong evidence for the current purpose. In David Groppe's paper, there is the following description: 'Based on ICASSO's cluster quality index for each cluster, 30 clusters were chosen as corresponding to reliable ICs.' ICASSO is the algorithm published here by Himberg et al. (who published other important ICA papers too). This paper by Groppe and colleagues seems so far the best justification for setting the upper bound of the number of k-means clusters to 30. Finally, our collaborator Caterina used 60 clusters and 30 clusters for infants and adults, respectively, using PSD and scalp maps as criteria to separate artifacts from brain sources in this paper. Her two-stage IC-clustering approach resembles what I used to recommend until Luca's ICLabel became available (since then I no longer do two-stage IC clustering; in the first stage, just PSD was used as a clustering criterion to reject artifact clusters, mainly muscle ICs). If you want to use a large number of IC clusters in an EEGLAB STUDY, I recommend setting it to 30 and citing Groppe et al. (2009) for justification, with the other two papers as optional.

Notes (Re-organized 04/23/2020)

How to cite this page (04/23/2020 added)

Before showing the example format, let me discuss the validity of citing this page. There are two issues.

  1. Validity of the science--Nothing on this page is new. All the methods described are either published or logically derived for each situation. I am not proposing a new idea that needs to be validated. As such, you probably do not need to bother citing this page. This page is like an extension of our EEGLAB mailing list: if you learn something new here, it is like learning something new from a conversation with your colleagues. However, sometimes you may need evidence for a specific process. In case you do not see a reference associated with what you need to cite, please write to me directly so that I can point you to proper references. I would also be happy to write a paragraph of rebuttal to your reviewers if they give you unreasonable criticism.
  2. Validity of citing a URL in papers--I do not think it is authentic, because websites are subject to change or even deletion. On the other hand, it is becoming more common to cite GitHub pages in papers. I guess citing a website is a transitional solution to an emerging situation in science, and as such it seems ok to me as long as it is more useful than not.

That being said, let's talk about the practical solution. As my background is in psychology, I will stick to my home ground's standard protocol, which is the American Psychological Association (APA) style. Here is an example, with explanations, of citing a Wikipedia page in APA style. However, this is the Swartz Center's website using Wiki style, and not part of Wikipedia. Given those differences, if you ever want to cite this page, the format you should use is the following.

Makoto's preprocessing pipeline. (n.d.). Retrieved April 23, 2020, from's_preprocessing_pipeline

Make sure you change the date from 'April 23, 2020' to the date you read this page. It is indeed strange to cite a URL starting with someone's name like this. Historically, 'Makoto's' was added to this wiki page's title to show its hereticalness (i.e., being unofficial) in contrast with SCCN's official teachings, as the note of caution in red at the top of this page indicates. It was certainly not to advertise my name.

Alternative, more automated pipeline using ASR (10/18/2023 updated)

One of the drawbacks of the abovementioned cleaning approach is that if a short burst of artifact appears in just one of, say, two hundred channels, you need to give up the entire epoch. Thus, the more channels you have, the more probable it is that you encounter this issue, like a false positive rate that inflates as you increase the number of tests. In other words, you are doomed to spend more budget on a fancier recording system with more channels in order to reject more data.

I suggest Artifact Subspace Reconstruction (ASR) as the way to go. For a brief summary of ASR, see my commentary on ASR. If you want to see my lecture on ASR, see this YouTube video, a recording of one of my talks at a summer lecture of the HEALthy Brain and Child Development Study (HBCD) that took place on July 23, 2023. I also put a detailed summary of ASR on a different page.

Suggested preprocessing pipeline (11/27/2019 updated)

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() to reject bad channels and correct continuous data using Artifact Subspace Reconstruction (ASR).
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  11. Estimate single equivalent current dipoles
  12. Search for and estimate symmetrically constrained bilateral dipoles
  13. Generate probabilistic IC labels using IClabel() for IC rejection
  14. Epoch IC-rejected data from -1 to 2 s relative to event onset (up to here is the subject-level process)
  15. Create the final STUDY, specifying the full factorial design (this is the group-level process using STUDY)
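A hedged batch sketch of the subject-level steps above. The function names are real EEGLAB/plugin entry points, but every parameter value is only an example, and several steps (channel import, dipole fitting, IC labeling, epoching) are abbreviated:

```matlab
EEG = pop_loadset('filename', 'subject01.set');        % 2. import data
EEG = pop_resample(EEG, 250);                          % 3. downsample
EEG = pop_eegfiltnew(EEG, 'locutoff', 1);              % 4. 1-Hz high-pass
originalChanlocs = EEG.chanlocs;                       % keep for step 8
EEG = pop_cleanline(EEG, 'linefreqs', [60 120]);       % 6. line noise (60 Hz assumed)
EEG = pop_clean_rawdata(EEG, 'ChannelCriterion', 0.8, ...
    'BurstCriterion', 20, 'WindowCriterion', 0.25);    % 7. bad channels + ASR
EEG = pop_interp(EEG, originalChanlocs, 'spherical');  % 8. interpolate
EEG = pop_reref(EEG, []);                              % 9. average reference
dataRank = rank(double(EEG.data(:,:)));                % rank for step 10
EEG = pop_runica(EEG, 'icatype', 'runica', 'pca', dataRank);
```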

Control over clean_rawdata() was transferred to the main developer of EEGLAB. The main GUI no longer looks like this, but it should be more or less the same. I'll update it someday.

Figure 5. Sample parameter settings for pop_clean_rawdata.

  • The first box is disabled (-1) because I assume there are no flat channels. This can be turned on if you want.
  • The second box is disabled (-1) because an FIR high-pass filter has already been applied. I think this plugin applies an IIR filter, since this cleaning solution was originally designed for real-time online data processing.
  • The third box means that if a channel correlates with the surrounding channels at less than 0.8, the channel is rejected. If it rejects too many channels, use a smaller value.
  • The fourth box is disabled (-1) because line noise is supposed to have been taken care of already.
  • The fifth box is the ASR. This is what it does.
  1. Find calibration data (i.e., the cleanest part of the data).
  2. Apply a 0.5-s sliding-window principal component analysis (PCA) on the continuous data.
  3. Classify PCs into high variance (in this case, 8 standard deviations from the calibration data) or normal variance.
  4. Reconstruct the high-variance subspace (i.e., a group of channels associated by a PC) from the normal-variance subspace. An important assumption is that neighboring channels are heavily correlated due to volume conduction and scalp mixing (by the way, because of this, the central limit theorem works and the scalp channel signals become closer to Gaussian than the source signals; this guarantees ICA; see also Arnaud Delorme's explanation of ICA as non-Gaussianity maximization). This is how one can estimate the signal of one channel from the signals of adjacent channels. From my experience, I would advise starting with a very lax value, such as 10-20, to use ASR sparingly to correct only the absolutely bad glitches that critically influence ICA's performance. Nima's result showed that ASR with SD=20 produced the best result. My interpretation is that we should let ICA explain the data as much as possible, so correction using a PCA-based method should be minimized; the empirical balancing point is somewhere around 10-20.
  • The sixth box is the window rejection criterion. If more than 0.25 of the channels are judged to be bad even after ASR, the window is rejected.

When using ASR, this is good guidance for controlling the total amount of rejection.

  • You need more channels -> use a lower value in the third box.
  • You need more data points -> use a lower value in the fifth box (i.e., correct more using ASR) and a higher value in the sixth box (i.e., reject fewer windows).

Important note: clean_rawdata() generates channel and datapoint masks that indicate rejections. They are stored under EEG.etc.

How to perform automated IC selection using IClabel() (11/13/2019 updated)

IClabel() is a relatively new EEGLAB plugin written by my former colleague Luca Pion-Tonachini (you pronounce it tonakini, not tonacheeni). I found this plugin wonderful, and thought I could finally replace my manual component selection with a completely automated process. Here I show example code for selecting brain components with label probability >= 0.7 and residual variance < 0.15. In case you do not know what 'residual variance' means, see the next subsection.

% Perform IC selection using ICLabel scores and r.v. from dipole fitting.
% (Assumes dipole fitting has already been run, so EEG.dipfit is populated.)
EEG       = IClabel(EEG, 'default');
brainIdx  = find(EEG.etc.ic_classification.ICLabel.classifications(:,1) >= 0.7);

% Residual variances of the fitted dipoles.
rvList    = [EEG.dipfit.model.rv];
goodRvIdx = find(rvList < 0.15);

% Keep only the ICs that satisfy both criteria (the final argument 1 tells
% pop_subcomp to KEEP the listed components rather than remove them).
goodIcIdx = intersect(brainIdx, goodRvIdx);
EEG = pop_subcomp(EEG, goodIcIdx, 0, 1);
EEG.etc.ic_classification.ICLabel.classifications = EEG.etc.ic_classification.ICLabel.classifications(goodIcIdx,:);

What does 'residual variance' mean? (11/13/2019 updated)

'Residual variance' is typical Swartz Center jargon (other examples include 'PVAF', 'dipolarity', and 'temporally maximally independent'). But the problem with 'residual variance' is, I believe, that it is underdescribed. For example: residual from what operation? Variance across what? In fact, until several years ago I probably could not have described how it is calculated (another such example is the IC ERP... you will probably be surprised to learn how complicated it is). I created a slide to explain the concept of 'residual variance'.


To answer my previous question, 'residual' means the difference 'IC scalp projection' minus 'theoretical dipole projection', and 'variance' means the variance of these differences across all the scalp channels.

However, another layer of difficulty is seeing the relations among the IC, the equivalent current dipole, and its scalp projection. ICA generates a mixing matrix (A in X=AS) and an unmixing matrix (A^-1 in S=A^-1 X). The scalp projection of an IC is just a visualization of a column of the mixing matrix A (EEG.icawinv in EEGLAB): you tile the values of this column in the order of your registered channels and you obtain the IC scalp map. The equivalent current dipole is fitted to this IC scalp map. After fitting the dipole, you can calculate the noiseless scalp projection from the estimated dipole. Finally, the noiseless scalp projection is compared with the IC scalp map to calculate residual variance in the following way.

residual_variance = var(IC_scalp_map - noiseless_scalp_projection)/var(IC_scalp_map)
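In code, the formula above is a one-liner; here is a sketch in Python/NumPy (variable names are mine):

```python
import numpy as np

def residual_variance(ic_scalp_map, dipole_projection):
    """EEGLAB's 'r.v.': the fraction of the IC scalp map's variance
    (across channels) left unexplained by the dipole's forward projection."""
    return np.var(ic_scalp_map - dipole_projection) / np.var(ic_scalp_map)
```

An r.v. of 0 means a perfectly dipolar scalp map; an r.v. of 1 means the dipole projection explains nothing.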

Does the number of channels affect 'residual variance'? (04/10/2020 added)

Yes, the more scalp electrodes you have, the more likely you are to see higher 'residual variance'. This is simply because more electrodes give you more chances to pick up noise. It seems related to the reason multiple comparison correction is necessary in statistics: as you increase the number of electrodes, the probability of having at least one bad electrode inflates as well. With a 19-electrode system I saw really low residual variances from time to time, which I have never seen with a dense electrode array, but that does not necessarily mean the 19-electrode system's data are better.

ASR FAQ (01/07/2020 updated)

  • Q1. In the screenshot below, why did the quiet portion between 0.4-0.5 need to be removed?


(Screenshot courtesy of Małgorzata Plechawska-Wójcik)

  • A1. There are three possible problems:
  1. The data are noisy all the time, so the 'calibration data' (i.e., the cleanest part of the data) also contain noise.
  2. The left-tail cutoff (default -3.5) of the RMS Z-score distribution is too high.
  3. Low-frequency drift is present.

I will explain each of them below.

1) The 'calibration data' (i.e., the cleanest part of the data) that ASR chose at the very beginning of the process contain higher amplitudes on average than these 'quiet' chunks. So it is relative to within-data variance: if your data are generally noisy, you will see more rejections of these 'quiet' chunks.

2) Let me explain the mechanism to begin with. The final window rejection is performed in the following way:

  1. Apply the custom heuristic IIR spectral weighting (8th-order Yule-Walker). See the screenshot below (taken from Christian's PPT).
  2. Take the root-mean-square (RMS) amplitude of the filtered data.
  3. Perform a Z-score transform.
  4. Mark datapoints with Z-scores outside [-3.5, 7.0] as still noisy.

Notice that a Z-score below -3.5 is judged 'still noisy'. If you don't want to discard any data chunk for being 'too quiet' (RMS variance too small) at all (which is understandable, given the above example), change the value [-3.5] to [-7.0], [-10.0], or even [-inf] (i.e., negative infinity) at clean_artifacts() line 184. You may also want to make the same change at line 183 for consistency.

 line 183
 Before: {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-3.5 5.5], ...
 After : {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-inf 5.5], ...
 line 184
 Before: {'window_crit_tolerances','WindowCriterionTolerances'}, [-3.5 7], ...
 After : {'window_crit_tolerances','WindowCriterionTolerances'}, [-inf 7], ...

This will stop ASR from rejecting chunks of data for being too 'quiet' (RMS variance too small).
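Steps 2-4 of the rejection procedure can be sketched as follows. This is an illustrative Python/NumPy simplification (the heuristic IIR spectral weighting of step 1 is omitted; the names are my own, and the [-3.5, 7.0] defaults mirror clean_artifacts()):

```python
import numpy as np

def flag_noisy_windows(data, win_len, lo=-3.5, hi=7.0):
    """Return a (channels, windows) boolean mask of 'still noisy' windows.

    A window is flagged when the z-score of its RMS amplitude falls
    outside [lo, hi]; lo = -inf disables the 'too quiet' rejection.
    """
    n_chan, n_samp = data.shape
    n_win = n_samp // win_len
    chunks = data[:, :n_win * win_len].reshape(n_chan, n_win, win_len)
    rms = np.sqrt((chunks ** 2).mean(axis=2))   # step 2: RMS amplitude
    z = (rms - rms.mean(axis=1, keepdims=True)) / rms.std(axis=1, keepdims=True)  # step 3
    return (z < lo) | (z > hi)                  # step 4: z-score outliers
```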

3) The custom heuristic IIR spectral weighting (8th-order Yule-Walker) has quite an (inverse) drop near DC, which does not exactly follow a 1/f pattern. This means that a more aggressive high-pass filter for DC suppression, with a higher cutoff frequency, is justified before this filter. Perhaps the remaining low-frequency drift, emphasized by this heuristic IIR filter, exceeded the rejection criterion.

  • Q2. I do not want to reject this much data. What parameter should I change?
  • A2. If you use a laxer threshold for the ASR SD, e.g., 8 -> 20, ASR will clean less data, so there will be more chances for the final window rejection to fire. The other factor controlling the rejection rate is the final 'window rejection' criterion: 0.3 means 'if more than 30% of channels still show above-threshold RMS amplitudes, reject this window.' Therefore, if you want to reject less data, set this value high. If you set it to 1, no window will be rejected, so your data length will be unchanged before and after the cleaning.
  • Q3. Does ASR reject signal and noise altogether?
  • A3. See this page for the answer.

How to determine the parameter for ASR (updated 09/28/2018)

There is a dedicated study by my colleague Chi-Yuan Chang at SCCN. Let me cite Chi-Yuan's conclusion here: 'Empirical results show that the optimal ASR parameter is between 10 and 100, which is small enough to remove activities from artifacts and eye-related components and large enough to retain signals from brain-related components.' An SD of 10 sounds crazily far from the mean, but for the particular distribution of RMS channel EEG this is the right scale (hence 'Empirical results show...'). Also, in our unpublished study by Nima, ASR with SD=20 showed the best cleaning performance in terms of his robust mutual information reduction measure, compared with another parameter (ASR with SD=40) and with non-ASR methods. So I would recommend something between 10 and 20.

Download my final ASR version 1.10 (11/17/2021 added)

This is the last version I managed (ver. 1.10): zipped plugin folder (152KB). Up to this version, the plugin was maintained by only two developers, Christian and me. There is also a Wiki I wrote here.

Channel rejection using RANSAC in clean_rawdata() (03/21/2022 added)

This is for Cristina Gil Avila. There is a channel rejection stage using spatial correlations (IF electrode locations are provided) upstream of the abovementioned ASR. It uses random sample consensus, or RANSAC. Unless the random seed is fixed, the output varies slightly due to the nature of the algorithm. Hyeonseok and I tested whether increasing 'nsample' helps to stabilize the outcome. We used empirical datasets of 205-channel EEG recorded during three-ball juggling. The factorial design of the study is 4 (subjects) x 4 (nsample: 50, 100, 500, 1000). The same bad-channel selection process was repeated 8 times for each condition, during which the random seed was NOT reset (the line was commented out; the seed was reset only once, before the 8-iteration loop started). If the result was 'retain', +1; if the result was 'reject', +0. Hence a final score of 8 means 8 times +1 (always retained), while 0 means 8 times +0 (always rejected). Values between 1-7 indicate inconsistent results across the 8 iterations. We made two observations:

  1. 'nsample'==1000 does not necessarily produce more robust results than 'nsample'==50.
  2. The results from 'nsample'==1000 could be different from those from 'nsample'==50.

Our conclusion is that controlling the behavior of this algorithm is trickier than just increasing the number of 'nsample'. More investigation is necessary.


Further optimizing the preprocessing order? (02/28/2020 updated)

You may find a lot of 'alpha flooding' (Makeig and Onton, 2011) across the channels in clinical data, which makes ASR reject a lot of windows. In this case, performing average referencing before ASR may help. The suggested preprocessing order is:

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() only to reject bad channels.
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Apply clean_rawdata() only to perform Artifact Subspace Reconstruction (ASR).
  11. Re-reference the data to average again--this is to reset the data to be zero-sum across channels again.
  12. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  13. Estimate single equivalent current dipoles
  14. Search for and estimate symmetrically constrained bilateral dipoles

Determining which preprocessing order is better, the standard one or the modified one, requires an empirical study with a good amount of data and computation.

For purists who do not want to use any form of interpolation (03/16/2018 updated)

If you cannot bear interpolation, which I think is common among us psychologists and non-engineer EEG researchers, I recommend trimOutlier(). It simply rejects channels and datapoints whose amplitudes exceed a threshold. The purpose of this plugin is to perform simple, conventional data rejection conveniently. The cleaning performance may not be as good as ASR's, but it never performs interpolation, which could be a relief for skeptics (I am one). One can perform quick but thorough cleaning for ICA (ICA does not care how the data are chopped up), and the obtained ICA weight matrix can then be applied to the less aggressively cleaned data. It is also a very useful tool for visualizing all channels and datapoints in one window (somehow I don't know of any other tool that does this).

Also, check out 'The PREP Pipeline: Standardized preprocessing for large-scale EEG analysis' by Bigdely-Shamlo, Mullen, Kothe, Su, and Robbins (2015) for the theory of upstream EEG preprocessing, as well as their tools freely available for Matlab.

Example of batch code to preprocess multiple subjects (01/12/2017 updated)

See this page.

Dependency across the preprocessing stages (07/05/2019 updated)

During the 28th EEGLAB workshop at UCSD, I heard from a group of people that they wanted to see the reasons for the order of the preprocessing stages. To answer this question is to show the dependencies among the preprocessing stages. To respond to this unexpectedly popular request, I prepared the following document (those who joined my dinner table, sorry for taking such a long time!). The numbers on top of the connecting curved lines are reference numbers for the explanations below. In the boxes, parentheses indicate an optional process.


  1. [Remove non-filterable artifacts -> High-pass filter the data]: This preprocessing is necessary for the cases such as simultaneous fMRI-EEG or simultaneous TMS-EEG (see this page). The high-amplitude artifact of the external origin reaches to the order of millivolt. If you apply a standard frequency filter, it will leave a massive residue on the data (if you wonder how it happens, see this page). Therefore, you should NOT apply a standard filter to these (rather rare) types of data. If you are curious to know what to do with the artifact, see this page for the case of simultaneous EEG-TMS.
  2. [(Downsample the data) -> (Run Source Information Flow Toolbox)]: Downsampling to something like 100 Hz is generally recommended for connectivity analyses. The datapoint-to-parameter ratio is calculated as ((num_IC)^2*model_order)/(window_length*num_trials). You want this ratio to be < 0.1, meaning at least 10 data points are used to estimate each parameter, and smaller is better (i.e., more data points are available to estimate the model). Downsampling will (probably) reduce the model order and (by definition) reduce the window_length, so generally speaking downsampling to around 100 Hz will worsen the datapoint-to-parameter ratio. However, sometimes it is still necessary. You don't want a stopband within the Nyquist frequency range (half your sampling rate in Hz), so if your dataset has a big 50/60 Hz line noise that CleanLine cannot reduce (which does happen quite often, unfortunately), the only option is to low-pass filter and downsample so that 50/60 Hz falls outside the Nyquist frequency. In doing so, do not forget to supply the two additional inputs, like EEG = pop_resample(EEG, 100, 0.8, 0.4); here 0.8 means 100/2*0.8 = 40 Hz is the -6 dB point, and 0.4 means 100/2*0.4 = 20 Hz is the transition band width. Thus EEG = pop_resample(EEG, 100, 0.8, 0.4); means 'downsample the data to 100 Hz using an antialiasing filter with a 40 Hz cutoff point (-6 dB) and a transition bandwidth of 20 Hz, so that the pass-band edge is 40-20/2 = 30 Hz and the stop-band edge is 40+20/2 = 50 Hz.' See more info here.
  3. [High-pass filter the data -> CleanLine to remove line noise]: If I remember correctly, CleanLine does not work well if you have a large DC drift in your data. It has been a while since I tested this, so it needs to be revisited and double-checked.
  4. [High-pass filter the data -> (Epoch the data to an event onset)]: Filtering after epoching (i.e., cropping out the peri-event data points and stacking them along the third dimension, as in (:,:,:)) is not a good idea. The FIR filter, which can be considered a sliding-window process, cannot process data near the edges of an epoch (within half the width of the FIR window). The same happens around 'boundary' events in EEGLAB: when the filtering process finds a 'boundary', it stops when the right edge of the sliding window hits the 'boundary' and resumes from the point where the left edge of the sliding window touches the 'boundary', so data points within one window width centered on the 'boundary' are skipped (which is why you see the unfiltered original noise around a 'boundary'). In particular, an FIR high-pass filter requires quite a long filter order (>2-3 s; filter order means the width of the sliding window in data points), which could be even longer than your epoch, so it does not work after epoching.
  5. [High-pass filter the data -> Run ICA]: There is a published study showing that high-pass filter up to 2Hz improves ICA's performance. See this paper by Winkler et al.(2015).
  6. [High-pass filter the data -> (Run Source Information Flow Toolbox)]: High-pass filtering improves the MVAR estimation. Intuitively this makes sense: if you have low-frequency activity, you need a long time window to explain it, hence a longer model order. SIFT uses piecewise sliding-window linear regression, but that does not mean it does a better job than a standard FIR filter; Barnett and Seth (2011) showed that multivariate Granger causality is, in theory, invariant under zero-phase (a.k.a. phase-invariant) filtering. They do recommend filtering to achieve stationarity (e.g., against drift and line noise); see Seth, Barrett, and Barnett (2015). For more detail, see this page.
  7. [Import channel info -> Remove bad channels with clean_rawdata()]: When the channel rejection function is called from clean_rawdata(), depending on the channel location availability, it changes the algorithm. It is of course better to perform it with the presence of channel location information so that it can use geometrical information of the scalp electrodes.
  8. [Import channel info -> Interpolate removed channels]: For channel interpolation, using the default channel location is necessary because the algorithm uses spherical spline.
  9. [Import channel info -> Coregister sensor locations to the head model]: The imported channel location is coregistered with the template head model, such as MNI (i.e., Montreal Neurological Institute) standard head model or 4-shell spherical head model. If the imported channel information contains an error, naturally the subsequent coregistration could be affected.
  10. [Remove bad channels with clean_rawdata() -> Interpolate removed channels]: The subsequent channel interpolation depends on which scalp electrodes were excluded.
  11. [Remove bad channels with clean_rawdata() -> Run ICA]: Removing n scalp electrodes reduces the maximum rank (i.e., the number of independent time series, in the linear-algebra sense) of your sensor-level data by n. Note that interpolating the scalp electrodes recovers the number of (apparent) channels, but the data will never be full-rank again (because, again, rank is the number of independent time series, and anything interpolated from existing data is NOT independent of the data it was mixed from). ICA, whether Infomax or AMICA, learns the true rank of the data by running its rank checker before starting, so usually you don't need to do anything. Let me use an example: there are 64-channel data from which you rejected 4 channels, leaving 60 channels. You interpolate the rejected channels so that 64 channels are back. However, your interpolated 64-channel data have a rank of 60, and ICA will return 60 ICs, not 64.
  12. [Interpolate removed channels -> Re-reference to the average]: If you perform average referencing, it literally computes the average across all the channels. Imagine you rejected many more channels over the right hemisphere than over the left. Do you think the calculated 'average' is unbiased? Definitely not. This is why channel interpolation is recommended before average referencing.
  13. [Interpolate removed channels -> Run ICA]: Why do I recommend channel interpolation before ICA? For example, after building the STUDY structure, you may want to back-project the clustered ICs to a specific scalp electrode, say Cz, with different choices of IC clusters (see here). However, back-projecting ICs to rejected electrodes is impossible, because ICA never had a chance to learn how the data observed at the rejected channels correlate with the other channels. This is why it is recommended to perform spatial channel interpolation BEFORE ICA, so that ICA generates a sphering matrix that takes the recovered channels into account. ICA itself does not care about the number of scalp channels, only about the data rank; it is PCA's job to perform the dimension reduction that makes the input to ICA full-ranked. With or without the 'recovered' (i.e., interpolated) channels, as long as they are just linearly interpolated from other channels (on this point, spline interpolation is not perfectly linear, so weird things can happen if the data rank is incorrectly estimated! It will create a ghost, as discussed on this page), the size of ICA's weight matrix is unchanged; only the sphering matrix changes its size. Hence this interpolation is not a requirement, but if you want to perform group-level back-projection, you NEED it. It does not hurt to do it anyway, so my recommendation is to make it part of the standard routine all the time.
  14. [Re-reference to the average -> Run ICA]: The ICA result is basically independent of the choice of reference, except for this minor point: if you don't perform average referencing before ICA, you might obtain IC scalp topographies (which are projections from the ICs to the scalp) that are all red or all blue. This is because of a DC offset across the scalp channels on the topography (i.e., the channels of the topo are not zero-sum; strictly speaking they are not exactly zero-sum unless you follow this method, which is the right way to apply average reference).
  15. [CleanLine to remove line noise -> Run ICA]: If the line noise is fortunately very stationary (i.e., you see a stationary envelope after band-pass filtering at 50/60 Hz) and CleanLine works well, then ICA will not need to decompose the line noise, which should help ICA's performance. Theoretically, extended Infomax and AMICA can model sub-Gaussian distributions, and the probability density function of line noise is the typical example of a sub-Gaussian distribution. However, how stationary the line noise is in your data is another story; it may be sensitive to the subject's body movement, etc. Generally speaking, ICA is not good at modeling line noise, so it is better to let CleanLine or other filters take care of it.
  16. [CleanLine to remove line noise -> (Run Source Information Flow Toolbox)]: If (and only if) your power spectral density has a smooth tail up to a Nyquist frequency above the line frequency, and CleanLine can remove the line noise efficiently because it fortunately shows stationary temporal characteristics, then running CleanLine makes the power spectral density of the data (sensor-level or ICA-decomposed) smoother, which helps in modeling the data with MVAR, i.e., the model order can be shorter.
  17. [(Epoch the data to an event onset) -> Run ICA]: One critical assumption of ICA is stationarity. This means that whichever part of the data you sample, it shows the same statistical distribution, i.e., probability density function. The best way to ensure data stationarity is to use an event-related recording paradigm so that your data are filled with as many repeated brain responses as possible. If your recording contains a lot of non-event-related data, such as between-block rest, you may consider epoching the data so that ICA is fed only heavily repeated brain responses (after all, ICA goes very well with ERP paradigms!). However, if your paradigm is a typical mismatch negativity design with no 'outside epoch' data points, ICA does not benefit from epoching at all. By the way, being epoched or not does not matter to ICA: epoched data are reshaped so that all epochs are concatenated into a 2-D array of channels x timepoints.
  18. [(Epoch the data to an event onset) -> (Run Source Information Flow Toolbox)]: Again, in MVAR modeling, the datapoint-to-parameter ratio is calculated as ((num_IC)^2*model_order)/(window_length*num_trials). If the data are continuous and there are no trials, the denominator's num_trials is 1, so the only way to improve the datapoint-to-parameter ratio is to use a very long sliding window, like 30 seconds. The more trials, the smaller the datapoint-to-parameter ratio becomes. If there are conditions you can combine, you may want to consider doing so for the connectivity analysis.
  19. [Run ICA -> Coregister the sensor locations]: The IC scalp maps you see are rendered on your definition of sensor locations. There is a separate, pre-calculated electric forward model of the head, which knows nothing about the coordinate system of your sensor locations. To make the IC scalp topography physiologically meaningful, you need to coregister your channel space to that of the pre-calculated head model.
  20. [Run ICA -> (Run Source Information Flow Toolbox)]: There is a known issue with calculating connectivity directly at the sensor level; for details, see Brunner et al. (2016). Thus, some process that addresses volume conduction is necessary. ICA's assumption is that it returns stationary and temporally independent sources, which addresses the volume conduction issue.
  21. [Coregister the sensor locations -> Fit single and symmetrically-bilateral dipoles]: If you coregister the sensor locations to the electric head model but don't perform equivalent current dipole fitting, it's pointless! The dipole-fitting process converts the 2-D information of the IC scalp topography, which can be interpreted as the scalp projection of a stationary and temporally independent source, into 3-D space as a dipole model. The important fact is that the original 2-D information, from which the 3-D dipole location is estimated, is directly derived from ICA, so it is not too much to say that the estimated dipole locations are indicated by ICA itself. To fit symmetrical bilateral dipoles, you need a plugin developed by Caterina Piazza.
  22. [Fit single and symmetrically-bilateral dipoles -> (Run Source Information Flow Toolbox)]: SIFT has an option to show either scalp topographies or dipole locations in the time-frequency grid of the connectivity. If you did not calculate the dipole fitting, you cannot show dipole locations at the graph nodes.
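The rank argument in (11) is easy to check numerically. In this sketch (Python/NumPy, illustrative only), four 'interpolated' channels are created as fixed linear combinations of the surviving sixty, standing in for spherical-spline interpolation:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((60, 5000))     # 60 surviving channels, rank 60

# 'Interpolate' 4 channels as linear combinations of the survivors
weights = rng.standard_normal((4, 60))
interpolated = weights @ data
full = np.vstack([data, interpolated])     # 64 apparent channels again

print(np.linalg.matrix_rank(full))         # still 60: the rank did not recover
```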

SIFT tips

Learning SIFT using its own simulator (08/30/2022 added)

See this tutorial. It uses Schelter et al. (2009) Eq. 3.1 and compares their results with the SIFT-calculated ones using RPDC and dDTF08.

Caution for evaluating the datapoint-to-parameter ratio (08/06/2019 updated)

I found that the equation used for calculating the datapoint-to-parameter ratio is incorrect. According to Schlögl and Supp (2006) as well as Korzeniewska et al. (2008), the equation must be (num_IC*model_order)/(window_length*num_trials). This change affects how you determine the parameters, in a 'better' way I would say: 1) more ICs can be included; 2) fewer trials can be tolerated; 3) a shorter sliding window can be used. This change particularly impacts continuous data analysis, as the current equation would probably demand a sliding window length of a few minutes! In fact, this datapoint-to-parameter ratio has been a limiting factor for me in applying SIFT with confidence.

To obtain the corrected datapoint-to-parameter ratio based on the above-suggested reason, make the following change on est_checkMVARParams() line 85

%winrat = ((M^2)*p)/(wlen*T);
winrat = ((M)*p)/(wlen*T);

That being said, when I discussed this issue with the author, he also told me that if we make the correction, the estimate becomes overly lax and may even become useless. I kind of see his point from my experience. Somehow, most of the SIFT validation measures are either too lax (the stability test) or too stringent (the whiteness tests), which makes them generally hard to follow. In conclusion, I recommend the above fix to enjoy more degrees of freedom in the analysis design, while trying to stay as conservative as possible (the lower the number, the more conservative!).
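To see the practical difference, here is the ratio computed both ways for a hypothetical analysis (all numbers are made up: 20 ICs, model order 10, a 200-sample window, 50 trials):

```python
num_ic, model_order = 20, 10
window_length, num_trials = 200, 50    # samples per window, number of trials

# Original SIFT formula: the parameter count grows with num_IC squared
old_ratio = (num_ic ** 2 * model_order) / (window_length * num_trials)

# Corrected formula (Schlögl and Supp, 2006): linear in num_IC
new_ratio = (num_ic * model_order) / (window_length * num_trials)

print(old_ratio, new_ratio)    # 0.4 vs 0.02: only the corrected ratio is < 0.1
```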

Downsample data to <100 Hz to keep the PSD simple (12/30/2023 added)

It helps to keep the model order low. See this page for more detail.

How does the STUDY determine polarities of ERPs and corresponding IC scalp topos? (03/22/2022 added)

Isn't this one of the least documented processes? It took me half an hour to discover it. This is partly because using the debug mode with STUDY is particularly inconvenient: while STUDY's GUI is open, the Matlab process is stuck there. I don't want to repeat this process again, and in case my long-term memory of it expires, let me write down a brief report here. std_comppol() is the dedicated function that takes care of the polarity issue; it is called by std_readtopoclust(), called by std_readdata(), called by std_erpplot(), called by pop_clustedit(), called by inputgui(), called by pop_clustedit(). Welcome to the world of EEGLAB!

Now, let's take a look at std_comppol() on EEGLAB's Github page. From the code, it is clear that it performs the following:

  1. Calculate the cluster-mean IC topography.
  2. Calculate the correlation between the cluster-mean IC topography and each IC topography in the cluster, one by one. When the correlation is < 0, flip the sign of that IC topography (multiply it by -1).
  3. Repeat the above steps three times to make sure that all the scalp topographies correlate positively with the updated cluster mean.
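A minimal sketch of this correlation-maximization loop (my own simplification in Python/NumPy, not the actual std_comppol() code):

```python
import numpy as np

def align_polarities(topos, n_iter=3):
    """topos: (n_ICs, n_channels) scalp maps of one IC cluster.
    Returns +/-1 signs so that each map correlates positively
    with the (iteratively updated) cluster-mean topography."""
    signs = np.ones(topos.shape[0])
    for _ in range(n_iter):
        mean_topo = (signs[:, None] * topos).mean(axis=0)          # step 1
        for i in range(len(topos)):
            r = np.corrcoef(signs[i] * topos[i], mean_topo)[0, 1]  # step 2
            if r < 0:
                signs[i] *= -1                                     # flip polarity
    return signs
```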

There is a known issue of polarity uncertainty in IC ERPs/topos: a negative 'source' ERP projected through a negative-peak topography generates a positive peak at the scalp sensor, which is indistinguishable from a positive 'source' ERP projected through a positive-peak topography. The above solution has been used in EEGLAB since the appearance of STUDY in the early-to-mid 2000s.

Correlation maximization vs. covariance maximization (03/30/2023)

When writing the above article, I remembered that some time ago I had asked my colleague Masaki to come up with a better, deterministic, and reasonable solution for the issue of IC polarity alignment. He said he needed time to think about it. The very next day, he showed up in my office and told me an idea to align the IC polarities in the context of the generalized eigenvalue problem. Just recently, he and I published this idea (see this article). In this paper, we call EEGLAB's default method the correlation maximization method, and ours the covariance maximization method. If you apply the proposed method to ERPs (EEGLAB's default aligns the polarity of scalp topos), the algorithm determines the IC polarities for each IC cluster so that ERP cancellations within the cluster are minimized within the given window. Note that it is not a solution to the fundamental problem of polarity indeterminacy: there is no solution for that. For example, when you 'optimize' the IC polarities of the source ICs for N170 using our method, you have to make sure that the resultant IC ERP peak at 170 ms is facing down, so that it resembles N170 and not P170 (do not forget to multiply by -1 not only the ERP but also the scalp topo, so that -1 x -1 = 1). We have to use our prior knowledge there to make the results resemble the data polarity you are familiar with. Note also that if you use a very wide window, only low-frequency temporal components will dominate because of the 1/f principle. In other words, only the polarities of the slow bumps are taken care of, while the polarities of smaller high-frequency bumps are neglected--that is what the 1/f power distribution generally suggests.

How to optimize the performance of ICLabel (11/29/2022 added)

I always use ICLabel after applying ICA. ICLabel is a solution for labeling ICs with one of seven class labels: Brain, Muscle, Eye, Heart, Line Noise, Channel Noise, and Other. These labels can be found under EEG.etc.ic_classification.ICLabel.classes. ICLabel does NOT apply hard-coded thresholds to determine these labels. Instead, ICLabel 'extrapolates' the classification results of human contributors using a machine-learning solution. In doing so, ICLabel evaluates the input data on three criteria: scalp topos, power spectral densities, and autocorrelation functions. These three features correspond to the spatial, frequency, and time domains, respectively. If you want to use ICLabel smartly, you can help it by applying some preprocessing. For example, see the following comparison: the data were sampled at 1024 Hz and contain a high level of line noise at 60 Hz. If you feed these data directly to ICLabel, you get the 'Line Noise' class label. However, if you remove the line-noise peak before entering ICLabel, in this case by applying a simple low-pass filter at 55 Hz (FIR Hamming, transition bandwidth 5 Hz), you get the 'Brain' class label. Meanwhile, downsampling the data to 256 Hz made only a trivial difference: 'Brain' probability 96.1 vs. 96.0 with and without downsampling, respectively. By imagining how the human contributors evaluated the randomly selected datasets with various numbers of channels and sampling rates, you can steer ICLabel's behavior in the desired direction.


Special contents since 100,000 page views

The contents in this section may or may not be directly related to the methodology of preprocessing EEG data, hence 'special contents'. I uploaded my past presentations that summarize my understanding and views at the time, which contain unpublished observations and opinions.

Electroencephalosophy (For 100,000 page views, 09/06/2019 created, 10/05/2023 updated)

Electroencephalosophy is a portmanteau I coined, blending electroencephalography and philosophy. Kant wrote:

'…and it was the duty of philosophy to destroy the illusions which had their origin in misconceptions, whatever darling hopes and valued expectations may be ruined by its explanations' (Kant, Critique of pure reason)

As Kant tried to set the limit to the use of pure reason, I wanted to determine the limit to the use and interpretation of EEG. Many of our wrong EEG practices derive from our misconceptions. Ignorance is not just a void of validated knowledge (and you have to be a Socrates to discern your own ignorance); it often takes the form of substituted explanations based on unvalidated guesses about scientific facts. I found that learning EEG basics with 'Electric Field of the Brain' is more like unlearning old illusions than acquiring a new skill set. This is why I identify the process of learning EEG basics as (somewhat of) a process of philosophy. See my talk at the Santa Fe Institute on 07/11/2019 to see what my electroencephalosophy is like. Notice also the distinction I propose between the easy problem and the hard problem of EEG science. The presented PowerPoint file can be downloaded from here.

Electroencephalosophy and critical neurophysiology Page 06.jpg

In another talk, I coined the word adolesc-i-ence, meaning the adolescence of science. My point is that EEG science is still young and immature. In the same presentation, I also proposed concepts such as Makoto's pessimism and the one-bit information generator. These concepts were presented in a keynote lecture at NAT17 (Neuroadaptive Technology 2017) (program (6.3MB)). The presented PowerPoint file can be downloaded from here (8.2MB). By the way, Pedro Lopes from U Chicago posted a photo to Twitter in which I am citing a scene from Ghost in the Shell. Maybe this is the first documented case of an EEG researcher citing Masamune Shiro in a scientific meeting?

NAT17 Keynote Page 33.jpg

Speaking of philosophy, this is my recent favorite.

Philosophy of science is as useful to scientists as ornithology is to birds. (Feynman)

Let me also quote this one.

6.53 The correct method in philosophy would really be the following: to say nothing except what can be said, i.e. propositions of natural science—i.e. something that has nothing to do with philosophy—and then, whenever someone else wanted to say something metaphysical, to demonstrate to him that he had failed to give a meaning to certain signs in his propositions. Although it would not be satisfying to the other person—he would not have the feeling that we were teaching him philosophy—this method would be the only strictly correct one. (Wittgenstein, Tractatus)

To sum up, the wrong use of pure reason and/or the wrong use of words--these are the sources of misconceptions and illusions. There are so many of them in the field of EEG research--that is why there is work to do!

Wavelet analysis tips (For 110,000 page views, 11/27/2019)

Did you know that the 'wavelet analysis' implemented in EEGLAB is NOT a pure wavelet analysis? It is actually a hybrid between a pure wavelet transform (the window size changes while the number of cycles is fixed) and the short-time Fourier transform (the window size is fixed while the number of cycles changes). If you are interested in this topic, I recommend searching for the keywords Units in output of timefreq - wavelet normalization in the following archive page of our EEGLAB mailing list. A very nice clarification was made there, and possible solutions were discussed. You will also notice gorgeous members there. The best moment of the EEGLAB mailing list.
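The difference between the two schemes (and the hybrid) can be made concrete with a toy parameterization. This is an illustration of the idea only; n_cycles, c0, and q are my own names, and the exponent form is not necessarily EEGLAB's exact formula for its 'cycles' option:

```python
import numpy as np

def n_cycles(freqs, c0=3.0, q=0.5):
    """Number of wavelet cycles as a function of frequency.

    q = 0     -> pure wavelet: cycles fixed, window shrinks as 1/f
    q = 1     -> STFT-like:    window fixed, cycles grow linearly with f
    0 < q < 1 -> a hybrid in between (the scheme EEGLAB's 'cycles'
                 option approximates; this formula is illustrative only)
    """
    freqs = np.asarray(freqs, dtype=float)
    return c0 * (freqs / freqs[0]) ** q

freqs = np.array([4.0, 8.0, 16.0, 32.0])
win_len_wavelet = n_cycles(freqs, q=0) / freqs   # shrinks with frequency
win_len_stft = n_cycles(freqs, q=1) / freqs      # constant window length
```

With q between 0 and 1 the number of cycles grows with frequency, but slower than an STFT would require, which is exactly the hybrid behavior described above.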

Physiologically invalid deep dipoles? (For 130,000 page views, 07/26/2021 Update)

In dipole fitting results, you sometimes find deep dipoles outside the gray matter, in the basal ganglia, ventricles, or cerebellum. However, scalp-measurable EEG should be generated by massive parallel arrays of pyramidal neurons in the gray matter (Nunez and Srinivasan 2006, 'Electric Field of the Brain', Chapter 1, Section 2). How do you reconcile these apparently wrong results? I have seen no study that explains this known bias (if you know of such studies, let me know). So here is my explanation after studying the book 'Electric Field of the Brain'. Most scalp-measurable EEG activity is generated by a relatively large cortical patch (10-20 cm^2), which is more appropriately described as a dipole sheet than as a single dipole (a theoretical single dipole has no area; only the pole distance exists). An important difference between a single dipole and a dipole sheet is the distance-dependent attenuation ratio: the attenuation of a single dipole follows the inverse square of distance, but that of a dipole sheet does not! See the figure below.


Notice that both color bars run from -1 to 1, which means both plots are normalized. The electric field of a dipole sheet consisting of 2580 dipoles is NOT that of a 2580-times-stronger single dipole; rather, its distance-dependent attenuation ratio is critically altered (for more detail, see my lab presentation slides from 2019 (2.6MB) and 2020 (1.4MB)). To put it simply, the potential produced by a dipole sheet can travel much further than that produced by a single dipole. From this fact, we can easily predict what happens if we approximate a dipole sheet with a single dipole--the distance estimate must suffer serious error. I think this is why we sometimes find deep dipoles. It is a systematic error produced by using a single-dipole model to approximate a dipole-sheet source, and the error should grow as the dipole sheet becomes larger (i.e., the more the source violates the single-dipole model's assumption that the approximated source has zero area).
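A small Monte Carlo sketch can reproduce the qualitative point. Here a disc of 2580 z-oriented unit dipoles (the number echoes the figure above; the disc radius and distances are arbitrary units of my own choosing) is compared against a single dipole; doubling the observation distance attenuates the single dipole to 1/4, while the sheet retains most of its potential:

```python
import numpy as np

def on_axis_potential(z, radius=0.0, n=2580):
    """On-axis potential (arbitrary units) of z-oriented unit dipoles.

    radius == 0 : a single point dipole at the origin, V ~ 1/z^2
    radius > 0  : n dipoles spread uniformly over a disc in the xy plane;
                  each contributes cos(theta)/d^2 = z/d^3, and we average.
    """
    if radius == 0:
        return 1.0 / z**2
    rng = np.random.default_rng(0)           # fixed seed: same disc each call
    r = radius * np.sqrt(rng.random(n))      # uniform sampling over the disc
    theta = 2 * np.pi * rng.random(n)
    x, y = r * np.cos(theta), r * np.sin(theta)
    d = np.sqrt(x**2 + y**2 + z**2)
    return np.sum(z / d**3) / n              # mean dipole potential

# attenuation when the observation distance doubles from 1 to 2
single = on_axis_potential(2.0) / on_axis_potential(1.0)                 # 1/4
sheet = on_axis_potential(2.0, radius=5.0) / on_axis_potential(1.0, radius=5.0)
```

For a disc radius of 5 the sheet keeps roughly 3/4 of its potential when the distance doubles, whereas the single dipole keeps exactly 1/4; this is the altered attenuation ratio the figure illustrates.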

Transducer array effect (For 140,000 page views, 10/09/2020 added)

I suggest we see a cortical dipole sheet as a transducer array that consists of macrocolumns. A transducer array is a technology for controlling beam density and direction. Compared with a single transducer (i.e., a single dipole), a transducer array (i.e., a dipole sheet) shows various interesting properties that have been exploited in engineering fields from radar technology to home audio. As I have been repeatedly saying, even without phased-array technology, a simple transducer array can still project energy further while suppressing sidelobes. Moreover, it may shed light on the (re)emerging field of EEG traveling waves. Traveling waves have mainly been reported in ECoG (see the YouTube video of the 'Princess Leia brainwaves'--by the way, such spatially non-stationary sources cannot be decomposed by ICA). Such a traveling wave may be described as phased-array behavior of a dipole sheet. For example, if ECoG data show a pattern of traveling waves captured by a dense electrode array, we can ask what technology a controller of phased-array transducers would need to replicate it, assuming that the brain has the same (or an alternative) mechanism. In this way, I believe understanding the relation to transducer arrays is critical when we consider the behavior of a dipole sheet as a major source of EEG. However, this effect does not seem to have a dedicated name, which makes it inconvenient to discuss. So let's tentatively call it the transducer array effect. For example, the term can be used in this way ('the meaning of a word is its use'): 'A broader dipole sheet can project current further because it receives less distance-dependent attenuation due to the transducer array effect.'

(09/21/2020 added) In the 'Electric Field of the Brain' (Nunez and Srinivasan, 2006) p.313, there is the following description that supports this view:

It is also possible that best-fit dipoles have no relation at all to genuine brain sources. They might, for example, occur in deep tissue when the actual sources occupy cortical surface areas ranging from a few to hundreds of squared centimeters as discussed in chapters 2 and 6. Clearly, any interpretation of equivalent dipoles depends critically on the spatial extent of these sources.

After all, an equivalent current dipole is an (arealess) model, which is fit to the (stationary) ICA model. Keeping the distinction between these (computational) models and the real electrophysiology seems critical to save us from confusion. In this sense, I take a revisionist's approach to SCCN's past claims rather than taking Onton and Makeig (2006) or Delorme et al. (2012) literally and identifying ICA sources with physiological sources. There are a few pieces of critical counterevidence to these claims that are not yet explained, and these deep dipoles seem the outstanding case to me.

Ground truth of EMG removal vs. data cleaning with ASR+ICA (For 150,000 page views, 12/20/2020 added)

One of my reviewers wrote, 'The EEG gamma-band range is known to be contaminated by EMG. Probably many EEG researchers will not be convinced even if you claim that ASR and ICA can deal with it.' This is a good question indeed. If we make a claim, it must be supported by data. The best evidence I have ever seen of EMG's contribution to scalp-recorded EEG is Whitham et al. (2007). They used neuromuscular paralysis to remove EMG! Let me cite their Figure 1 below for detailed discussion.


The top and bottom panels show the PSD near CPz (linked-ears reference) for different subjects: A is without paralysis, B is with paralysis. There is an apparent difference above 30 Hz. Isn't it shocking? The difference is about 5 (B) to 10 (A) dB at around 45 Hz (right before the 50 Hz line noise comes in). We can use this range as the ground truth of EMG contribution at CPz. Now, let's compare the effect of ASR + ICA with that of paralysis. In our data (n = 141), there are no earlobe electrodes, so I re-referenced all 38 electrodes to T7/T8. There is no CPz either, so I averaged CP1 and CP2.


Note that in order to align the two curves, I normalized the alpha peak by adding +1.3 dB to the PSD of the cleaned data (i.e., this could be counted as a loss of signal). At 45 Hz, the difference with and without the cleaning process is > 5 dB. Well, this is not too bad--pretty comparable to an extreme method like paralysis!

Why does IC rejection increase gamma power, or why is an IC not broadband-independent? (For 160,000 page views, 05/10/2021 added, 06/27/2022 updated)

SHOCKED? Part of the evidence was published as Figure 8 in Miyakoshi et al. (2021) Epilepsy Research, 178:106809. In the replicated figure below, please see the rightmost column for the gamma band. Yellow regions represent scalp power increase after IC rejection. This interesting frequency dependency derives from the fact that the temporal independence in ICA results is frequency-dependent (lower frequencies are better ICAed, higher frequencies are poorly ICAed), which in turn derives from the fact that the EEG signal has a 1/f power distribution by which the ICA results are biased.


For details, check out my recent lab presentation (6.3MB). I think this observation calls into question SCCN's conventional dogma, namely the physiological interpretation of ICA (i.e., 'ICs are effective EEG sources').

In fact, Scott once asked the same question in Makeig et al. (2004) Trends in Cognitive Sciences: in 'Box 2. Questions for future research', the second point reads 'Can ICA model near-DC and high-frequency gamma band dynamics?' My results indicate the answer is no, but at least at that point he was aware of the potential issue of frequency dependency. Nonetheless, when he jumped to the conclusion that ICA results are effective EEG sources, he made the implicit assumption that this claim is valid across frequencies. But if you think about it, it becomes clear without too much difficulty that spatially broad networks in the delta and theta ranges and spatially focal networks in the beta and gamma ranges cannot coincide in the same spatial filter.

A simple solution to this problem seems to be to apply a band-pass filter first and then the spatial filter (ICA). The cost of this approach is that the ICs will no longer be consistent across frequencies. This idea still needs to be validated, though.
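A minimal sketch of this 'filter first, then ICA' idea, assuming scikit-learn's FastICA as a stand-in for EEGLAB's runica/AMICA; the band edges and the 4th-order Butterworth filter are illustrative choices of mine, not a validated recipe:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.decomposition import FastICA

def bandpass_then_ica(data, fs, band, n_components=None, seed=0):
    """Band-pass filter first, then run ICA on the narrow-band data.

    data : (n_channels, n_samples) continuous EEG
    band : (low, high) passband edges in Hz
    Returns (sources, mixing): sources are (n_components, n_samples),
    mixing columns play the role of per-band scalp topographies.
    """
    sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
    narrow = sosfiltfilt(sos, data, axis=-1)       # zero-phase filtering
    ica = FastICA(n_components=n_components, random_state=seed,
                  max_iter=1000)
    sources = ica.fit_transform(narrow.T).T        # ICA over channels
    return sources, ica.mixing_
```

For example, gamma_src, gamma_mix = bandpass_then_ica(eeg, 256, (30, 45)) would give a gamma-specific decomposition; the resulting ICs need not match those from a delta/theta band, which is exactly the consistency cost mentioned above.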

IcaRejectionLabPresentation updated Page 32.jpg

A big scalp topography? (For 170,000 page views, 12/18/2023 updated)

If you visited the SCCN website several years ago, you may have seen this movie on the top page. There is a variation of the same movie using different source locations with random phases here. Recently, I realized that these movies are misleading and should not be taken for granted without caution, because I believe the color scale must be WAY off from typical EEG studies. An EEG recorder has a noise floor around 1-2 µV, which determines the smallest signal it can record meaningfully. According to Huigen (2000), 'Noise in biopotential recording using surface electrodes', Fig 2.5 shows the upper acceptable noise level in EEG recording (1 per cent) is 1 µV. Therefore, even if a signal is present at the surface of the scalp, if its amplitude is smaller than 1-2 µV we cannot record it. Now the question is how large the signal is in the scalp topographies in question. Here is a quiz for you: What is the most critical factor among the various EEG source properties that determines the scalp-recorded signal amplitude? Yes, the area of the cortical source, given that the neurons included are more or less in sync (later I will come back to this question about synchronization). If this answer does not sound familiar, I recommend you see my presentation slides above. The distance-dependent attenuation rate decreases dramatically as the cortical source area increases, due to a simple transducer array effect (which needs a good name to be remembered by, by the way!). In order for unaveraged single-trial EEG to be measurable at the scalp within a reasonable amplitude range, say 10-30 µV, the source of the signal should typically have an area of 6.5 cm^2 or larger (Nunez and Srinivasan, 2006). However, in these movies, the cortical sources appear smaller than that. Maybe these are source models of averaged ERPs (but that does not seem to apply to the second movie). But more critically, the generated electric field spreads across the entire scalp surface. Such a broad distribution necessarily requires a broad cortical source region, but that is not available here, since the movies show averaged-ERP-sized small source regions. Therefore, the only reasonable explanation is that the color scale has a unit of nanoV/picoV/femtoV (now EEG meets MEG!) instead of µV. Alternatively, the amplitude of the cortical source activity may have been amplified by 10^3-10^9 times so that the µV unit can be used, but this is physiologically impossible. By the way, in Electric Field of the Brain (Nunez and Srinivasan, 2006) p.83, the authors introduce an example of a wrong simulation that shows a small central dipole with 5 V amplitude, which is 10^4 times larger than normal values. To avoid these mistakes, it is important to understand the relation between source area size and the distance-dependent attenuation ratio.

So, what does it mean when we find a big radial scalp distribution in ICA results (which we always do)? I believe it means ICA modeled a broadly distributed cortical source that oscillates in a temporally independent way. In other words, I support the large patch hypothesis.

By the way, in the 'small patch hypothesis' (see my lab presentation slides (1.4MB) again), which I criticize, one of the justifications was that if the 'active patch' area is large, the group of neurons in it cannot maintain a synchronous state. However, Hari et al. (1997) reported that 'synchronous activity of 1% of cells in a cortical area of 1 cm^2 would determine 96.5% of the total signal.' (Note this is an MEG study, not EEG--apparently, MEG can target a smaller cortical area.) By the way, the ratio of thalamic relay cells is 0.63% in macaque BA17 (O'Kusky and Colonnier, 1982: "There are thus about 160 neurons in the striate cortex for each geniculocortical relay cell: of these, 55 are in layers IVA and IVC. These figures do not represent the true degree of divergence of geniculocortical axons. Each individual axon is not restricted to a vertical column of 160 neurons: rather, each spreads out tangentially and overlaps with its neighbors. For each mm^2 of tangential spread there are potentially thousands of neurons on which it might form synaptic contacts in layer IVC alone."). In the experiment reported by Cooper et al. (1965), punching 2 pin holes in the skull yielded an attenuation ratio of 20000:1, but increasing the number of pin holes to 12 over a 6.45 cm^2 area decreased the attenuation ratio to 2:1. This transducer array effect seems counter-intuitively effective to most of us non-transducer-array engineers, so we EEG researchers should remember it well.

What does AMICA do? (For 180,000 page views, 11/05/2021 added)


This is the world's first visualization of the iteration-by-iteration changes of AMICA scalp topographies, which you can now watch on YouTube. I dug this video out of my old email log (dated August 23, 2013) to show to Timo Berg in answer to his question. Note that after the 750th iteration there is no big jump in the log-likelihood trace (bottom right), and after the 1250th iteration there is no change in the scalp topo order. I forgot what 'Mean corr. coef.' was for. Note also that the scalp maps at step0010 (i.e., the 10th iteration) are all very similar--do you know why? It is because immediately after starting the process the scalp topos reflect the result of sphering. Below, I cite the famous Delorme et al. (2012) Figure 4. When we plot the sphering result, it appears as a huge outlier in the top-right corner, covered by the inset plot.


Does a broad dipole layer produce a depth bias when fitted with a single dipole? (For 190,000 page views, 02/22/2022 added, 07/18/2022 updated)

The short answer is YES. The long answer is as follows:

Sfn2019 crop.jpg

The figure above is taken from my poster presentation at Neuroscience 2019, i.e., the annual meeting of SfN. The full poster is available here: Miyakoshi et al. Poster at Neuroscience 2019 (8.4MB). Note that the approach I took in this poster to estimate the size of a cortical dipole layer is not validated, so let's keep it aside for now and focus only on the summary of the empirical data. Using 1222 datasets with 38 electrodes (this magnificent database is courtesy of Greg Light at UCSD Psychiatry), I showed that 22978/44247 ICs (52%) were classified as 'Brain' by ICLabel (Pion-Tonachini et al., 2019). By the way, my 2022 paper with Hiroki reported that using 19-channel EEG data, brain IC rates were 53% for wakefulness and 74% for sleep. Onton and Makeig (2006) reported:

In our experience, applying ICA decomposition to 31-channel data typically yields 5–15 nonartifact EEG components comparable with those obtained from high-density recordings.

Note the last part--they say the absolute number of brain ICs is probably comparable regardless of the number of electrodes. This led me to the idea of 'effective degrees of freedom in scalp EEG data' discussed in my presentation slides. Anyway, out of all the 'Brain' ICs, 10679/22978 (47%) had fitted dipoles localized in cortical regions, while 4066/22978 (18%) were localized in subcortical regions. The ratio between the cortical and subcortical ICs was 2.6:1. In other words, about 27% of the localized 'Brain' ICs were in subcortical regions. If SCCN claims that the dipolarity of ICA-derived scalp topographies is evidence of physiological validity, what is the explanation for these physiologically invalid deep dipoles?

I made a hypothesis: ICA should return a scalp projection from a broad dipole layer (or sheet). I call it the large patch hypothesis, in contrast with Scott's small patch hypothesis, which can be found in Delorme et al. (2012) and Akalin Acar et al. (2016). My reasoning is simple: if a single dipole is used to approximate a broad dipole layer, the depth bias should arise as an error.

To test this hypothesis, I designed a simple closed-loop simulation combining forward modeling and an inverse solution (thanks to Robert Oostenveld's help--I read his email and attached code 20 times--and also to Noriaki Kanayama's encouragement to investigate in this direction). I used a 4-shell spherical model: tissue radii [0.080 0.081 0.086 0.092] (m) for the brain, CSF, skull, and scalp, respectively (the same applies below); tissue conductivities [0.33 1.650 0.0132 0.033] (S/m) for a brain-to-skull conductivity ratio (BSCR) of 25, and [0.33 1.650 0.004125 0.033] for BSCR=80. BSCR=25 is the updated value used more widely today, while BSCR=80 is the EEGLAB default and was used a while ago. The factorial design of the study was 2 (BSCR: 25 vs. 80) x 10 (dipole layer radius: 1 to 50 mm) x 5 (dipole layer depth from the surface of the gray matter: 2 to 52 mm). The results are shown below.

BSCR25 (8.0MB)

BSCR80 (7.6MB)

If you want to replicate the result, here is the EEGLAB plugin simUDL, which was developed at Noriaki's request. A screenshot is shown below. Let me provide it as a limited distribution only from here for now. Download EEGLAB plugin simUDL


Here are my impressions.

  1. The depth bias arose as predicted.
  2. However, the effect of the depth bias does not seem strong enough: it is at most 20-30 mm. This seems insufficient to push more than 1/4 of the dipoles down into subcortical regions. Compared with these results, ICA-generated scalp topographies appear to be projected from genuinely deep dipoles rather than from a shallow but broad dipole sheet.

When I compare the upper-bound size of the small-patch hypothesis (1 cm^2) with the lower-bound size of the large-patch hypothesis (6.5 cm^2), there is a difference of about 6-7 times in the maximum potential at the scalp.


To conclude, although the result favors the large-patch hypothesis over the small-patch hypothesis, it does not provide a definitive explanation. The other possibility I may try is to use a so-called 'realistic' head model (Paul Nunez was against this name because it is misleading given that conductivity parameters are still uncertain; if I remember correctly, he suggested 'more realistic head model' instead). Torbjorn Ness kindly gave me advice and code to try the New York head model (by the way, I found Stefan Haufe was partially involved in this project. What a guy.) I may try it out to see whether it favors deep dipoles more than the spherical model does. Honestly, however, I feel it is rather unlikely. The ICA+dipfit-yielded dipoles are ridiculously deep; we would need a 40-50 mm depth bias to replicate them!

Now I wonder if the depth bias is related to something other than the forward/inverse solutions, such as temporal averaging. ICA is by no means a frame-by-frame decomposition (spline Laplacian is); it is quite the opposite. ICA results are more like a temporal average. Just as averaging thousands of trials makes it possible to detect ABR signals from the brainstem, running ICA on hour-long data may enable the detection of temporally averaged sources, which are merely a statistical concept. Who knows how such temporal averaging results in deeper source localization, but according to my methodical doubt this should be the next target.

dB in power ratio or amplitude ratio? (For 200,000 page views, 04/11/2022 added, 12/19/2023 updated)

Well, first things first--when you see dB in EEGLAB, it actually means 10*log10(power in µV^2 / 1 µV^2), which means the denominator is 1 µV^2. So this is not much of a ratio; it is just log-converted values originally in µV^2. Now, when you read the Wikipedia article on the decibel, you realize there are two ways to use this decibel thing. If we use the amplitude ratio, x2 is about +6 dB and x10 is +20 dB. However, if we use the power ratio, x2 is about +3 dB and x10 is +10 dB.

In the power ratio, the leading coefficient 10 converts bels to decibels (a decibel is one tenth of a bel). In the amplitude ratio, you see 20 instead of 10 because converting amplitude to power squares the value, i.e., from µV to µV^2, and this squaring becomes a factor of 2 after the log transform: 10*log10(A^2) = 20*log10(A).
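The two conventions are easy to verify numerically; the function names below are my own:

```python
import numpy as np

def db_power(ratio):
    """Power ratio -> dB: 10*log10(P/P0)."""
    return 10 * np.log10(ratio)

def db_amplitude(ratio):
    """Amplitude ratio -> dB: squaring inside the log becomes a factor
    of 2 outside, hence 20*log10(A/A0)."""
    return 20 * np.log10(ratio)
```

So a doubling is about +3 dB in the power convention and about +6 dB in the amplitude convention, and the two conversions always differ by exactly a factor of 2.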

Which one does EEGLAB use, the power ratio or the amplitude ratio? Is there a difference between its use for PSD and for ERSP? I often get confused myself, so I decided to validate it once and for all. For the Matlab code to replicate the simulation that generates the figure below, see this page. Here is the conclusion.

  • For power spectral density (PSD) calculated by spectopo(), the unit is dB in the amplitude ratio.
  • For event-related spectral perturbation (ERSP) calculated by newtimef(), the unit is dB in the amplitude ratio.

As the plots below show, in both PSD and ERSP, x2 in amplitude is associated with a +6 dB change. Note that the EEGLAB-calculated PSD is smaller than the Matlab-calculated PSD by a constant amount. This is because EEGLAB uses Welch's method with a Hamming window (optionally, Blackman can be selected; a rectangular window somehow cannot be selected). So this difference does not indicate a calculation error; it comes from the difference in algorithms.

PsdValidation.jpeg ErspValidation.jpeg

By the way, here is a potentially interesting story for you about the (mis)use of dB in a published document: Delorme, Sejnowski, and Makeig (2007) NeuroImage 34:1443-1449. The mystery is whether -50 dB means 1/30 or 0.00001. It has puzzled me since the paper was published, but today I finally gave it some detective work, and I believe I can explain how it happened. To begin with, the paper makes the following statement on p.1443 (abstract):

...ranging in size from thirty times smaller (-50dB) to the size of EEG data themselves (0dB)

Here, -50 dB == 1/30 is stated. This is already the biggest mystery, because -50 dB in the power ratio is 0.00001 while in the amplitude ratio it is 0.003162. But 1/30 is 0.0333... Thus, this statement makes sense in neither the power ratio nor the amplitude ratio. How can we interpret it? My guess is that the authors misread the order of magnitude by one, taking 0.0032 (-50 dB in the amplitude ratio) as 0.033. In other words, the authors probably wanted to say 1/300 instead of 1/30. If my guess is correct, it indicates the authors used the amplitude ratio in the abstract.

On the other hand, in p.1445 (Figure 1 legend) there is another statement that contradicts the above statement.

Artifacts shown here were the largest simulated (0dB); the smallest were more than 100,000 times smaller (-50dB)

Here, -50 dB == 0.00001 is stated. This statement is correct if the authors used the power ratio. When a signal POWER is 100,000 times smaller, its amplitude is 316.2 times smaller because sqrt(100000) = 316.2. The inverse of 316.2 is close to 1/300, which is the number I guessed above under the assumption that the authors used the amplitude ratio in the abstract.

Thus, my conclusion is

  1. The authors tried to use the amplitude ratio in the abstract, and they meant -50 dB == 0.003162.
  2. But they made a mistake in counting the number of zeros and concluded that -50 dB (0.003162) is close to 1/30, although it is actually close to 1/300.
  3. In contrast, in the Figure 1 legend, they used the power ratio and correctly stated -50 dB == 0.00001 (i.e., 100,000 times smaller). However, using the power ratio in this paper seems inappropriate--see my discussion below.
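The arithmetic behind this detective work is easy to check:

```python
# verify the arithmetic behind the -50 dB puzzle
db = -50.0
amp_ratio = 10 ** (db / 20)   # amplitude convention: ~0.003162, i.e. ~1/316
pow_ratio = 10 ** (db / 10)   # power convention: 0.00001, i.e. 1/100,000
```

So -50 dB is roughly 1/300 in the amplitude convention (not 1/30) and exactly 1/100,000 in the power convention, which is what reconciles the abstract with the figure legend.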

The design philosophy of EEGLAB is to use the 10*log10 conversion of power throughout, which corresponds to dB in the amplitude ratio. This is why you almost never see the µV^2 unit when using EEGLAB (the only exception I may have seen is when I calculated ERSP with no baseline). Thus, for most EEGLAB users, dB means the amplitude ratio (i.e., x2 is +6 dB and x10 is +20 dB). Given this self-initiated practice, why did the authors suddenly switch from the amplitude ratio to the power ratio in this technical paper? Apparently, the authors confused themselves, and I have also been confused for the past 15 years. Instead of using the power ratio without explanation, the authors could have been consistent with their use of the amplitude ratio and stated clearly that 'we used dB in the amplitude ratio'. Because this clarification is missing, it has been impossible for readers to determine whether the authors actually wanted to use the power ratio or the amplitude ratio.

(Added on 12/19/2023) I found an interesting story about the difference between amplitude dB and power dB in Richard Hamming's lecture (a link to Youtube).

"I was having lunch with the West Street people, the guys who built the machines and such other things. I was just one of the users. We ate at the cafeteria there. All during the lunch, they talked about salary raises. We got a two dB salary raise this time, we got a 3 dB salary raise that time, and so on. Coming from Los Alamos, of course I know it's half-life. Knowing some astronomy, I know it's magnitude. But they talk that way. As we are getting ready to pick up our stuff and put it on trays, I say, 'Pardon me gentlemen, but will you tell me whether this salary is amplitude or power?' Difference, you know, 10 log or 20 log, right? They did not know; there was no agreement. They both start opening their mouths. They contradict themselves. It turned out that they had not known what they were talking about the whole damn lunch. Jargon can confuse yourself. Beware of jargon!"

Is EEG a wave? (For 210,000 page views, 08/31/2022 added, 01/04/2024 updated)

Back in the '70s, a young male EEG researcher met a girl at a party. She asked him what he did in research. He said he was working on the question of whether EEG is a wave or not. She said, 'Yes, I think so.' An old man told me this story, stared at me for a moment, and laughed. This then-young researcher later published this study in 1974, then authored the first edition of this book in 1981. His idea on this, called the global theory, is best summarized in the second edition of this book from 2006 or this paper from 2014.

In my opinion, the global theory is the most impactful yet most ignored view (the senior author Paul requested that I delete these words, but I still want to leave them). If there are local-global interactions, it would change the rules of the game in neuroscience and give EEG a new position. One thing I struggled with when following this view was that it took me some time (and direct discussions with the author) to figure out what the medium of the wave is. I tended to think it was an electric wave, but it is not! The medium of the wave is actually a synaptic action density field, which is usually non-zero in a given cortical area; hence it can be positive or negative relative to the baseline state due to the local (and traveling) excitatory and inhibitory states.

Q. Why should higher temporal cortical frequencies occur with higher spatial frequencies?

A. Because this is what waves do: It is called the dispersion relation. (Nunez, 2011, p.1298)

Update--I added several short video clips taken from the 3-hour conversation between Michael Levin and Lex Fridman. The senior author suggested its relation to the global theory. I thought it provided an excellent context for discussing the global theory from an advanced biological viewpoint.

PDF version, no movies (1.7MB) PPT version, with movies (330MB)

Critical thoughts on the physiological interpretation of ICA (For 220,000 page views, 11/30/2022 added)

This is a summary of my 20 years of end-user experience with ICA. The presentation material contains slides from my previous lab presentations, which are available from this page. I gave this talk at INC Chalk Talk on 11/03/2022. Coincidentally, I left SCCN on 12/04/2022 to start a new position at Cincinnati Children's Hospital. This is the end of my ICA journey at SCCN, but I will continue to work on ICA-based EEG methodologies in my next career stage.

Abstract: Swartz Center for Computational Neuroscience (SCCN) has been pioneering the use of independent component analysis (ICA) on human EEG data and played a central role in establishing the physiological interpretation of ICA. I will first review the premises of the physiological interpretation of ICA and supporting evidence based on published works between 2007-2012. Then I will challenge it on the following points: (1) Dependency on power distribution across frequencies—or how component rejection introduces high-frequency artifacts; (2) The accompanying small patch hypothesis and the relevant issues such as patch-size dependency on data SNR and depth bias; (3) The strong dimension reduction effect—the low dimensionality as a result of ICA may not necessarily mean revealing genuine property of the data. I conclude that the physiological interpretation of ICA on EEG contains several inappropriate claims due to lack of evidence and misunderstandings, which need to be corrected by adding appropriate limitations to its use and interpretation for ICA to become a truly general EEG analysis tool.

Recorded video.

PowerPoint slides (28MB)

Do insights from modern biology help us understand the global theory? (For 230,000 page views, 03/30/2023 added)

Around Oct 2022, Paul Nunez told me to watch this Youtube podcast because it might be helpful for gaining insight into his global theory. I watched it and I was blown away. I wanted to learn what was discussed there as carefully as possible. So I transcribed it (File:Michael Levin Lex Fridman.docx) and even translated it into Japanese (File:Michael Levin Lex Fridman jp.docx), which took me 4-5 months. Probably, Paul wanted to show me that the concept of "multi-scale competency architecture" discussed in the dialogue is a good analogue of the global theory. But this dialogue is way more than that for me. It fundamentally changed my views of biology and biological systems. Agential materials and multi-scale competency architecture, DNA vs. biology from which novel regenerative medicine arises, the Hayflick limit and the planarian as a proof of immortality, cancer and gap junctions, etc. etc... By the way, Fridman mentions the Game of Life in the dialogue. If you haven't heard of it, I recommend you watch this series of videos. There are 9 videos in the series, about 1 hour in total. Turn on the English captions.

As an EEG researcher, I became more and more aware of the fact that we are rather blind to larger things, such as stationarity, the influence of the 1/f power distribution, and volume conduction. There are common misconceptions about them in the field. This reminds me of the phrase someone famously said: "if you tell a lie big enough and keep repeating it, people will eventually come to believe it." Our research community is no exception to such social dynamics. Another German once wrote that "it was the duty of philosophy to destroy the illusions which had their origin in misconceptions." Isn't this a philosophical problem we have to address in parallel with cognitive questions? This is the beginning of electroencephalosophy...

Two presentations at a seminar: EEG preprocessing and generative mechanism (For 240,000 page views, 09/21/2023 added, 10/18/2023 updated)

Here are my recent presentations at the NIH HBCD summer seminars. One is about preprocessing (Youtube link, File:On preprocessing EEG data.pdf) and the other is about the generative mechanism, mainly focusing on the thalamo-cortical loop (Youtube link, File:On EEG's generative mechanism.pdf). If you want to hear these presentations in person (or any presentations on this page), please write to me. If you are a group of three or more, I'd be happy to do it for you.

I'm currently visiting Mike Cohen in a small village called Briançon. He is enjoying a digital nomad life here. I'm wondering what I should discuss with him tomorrow!

Update: Over the empty plates from breakfast, I had a last discussion with Mike before moving to Padova. As a summary of our days of discussions, I proposed to him the concept of the Mike Cohen Test, which is an attempt to replicate someone's claim by using (white) noise. He said he was honored.

Why are 'brain' ICs dipolar, and why do they show positive-dominant scalp topos? (For 250,000 page views, 12/11/2023 added)

Here are two questions that bothered me:

(1) Why do good 'brain' ICs show dipolar scalp topos although 2/3 of the cortex is in sulci?

(2) Why do these dipolar 'brain' IC scalp topos show red (positive) centers?

I published the answer here. Below is a summary of my observations and interpretations.

The answer to (1): It is because scalp-recorded EEG is insensitive to sulcal sources compared with gyral sources. This finding justifies the use of the lissencephalic (i.e., no sulci) brain model proposed in Electric Fields of the Brain (Nunez and Srinivasan, 2006), together with the Spline Laplacian. It also supports the view that the major source of scalp-recordable EEG is quite broad (minimum 6.45 cm^2), which requires a continuum of multiple gyral crowns.

I did not write it in the paper, but the result refutes the claim that ICA is a high-resolution EEG spatial filter because ICA is basically blind to 2/3 of the cortex. In fact, it seems ICA results are always dominated by high-power, low-frequency, and very broad sources. I presented this view in my past lecture, and I will publish it in the near future.

The answer to (2): It is because EEGLAB's ICA initializes all ICs' scalp topos as red-centered (i.e., positive dominant). Thus, unless necessary, the algorithm does not flip the polarities.

Now you wonder--when does the ICA algorithm flip the polarity to produce 'blue'-centered (i.e., negative dominant) ICs? I found that those blue-centered ICs tend to show poor physiological validity and large index numbers. A known clear exception to this rule is ICs localized to the motor cortex.

People use ICA to clean EEG. I use EEG to glean ICA: I invented ICA gleaning!

Downsampling reduces data variance? (For 260,000 page views, 02/02/2024 added)

One of my colleagues, Ernie, told me that the PVAF used in pvaftopo() (this calculation is different from the PVAF calculated in envtopo()--the former calculates variance across time, while the latter calculates variance across channels) changes its values counterintuitively, and by a lot, in some cases. So I started an investigation. I found that at least one cause of this change, which is always a reduction, is the anti-aliasing filter used in the downsampling function, pop_resample() in the current case. My demonstration uses white noise (the left bar): when I change the sampling rate from 200Hz to 100Hz with pop_resample(), the data variance is reduced (the middle bar), but if I instead sample alternating time points, the variance remains the same (the right bar). Note that an anti-aliasing filter is generally necessary to avoid aliasing (duh).


Note that more than half of the variance is gone from the original data because white noise has a uniform power distribution across the frequency range and the sampling rate was halved. The anti-aliasing filter removed the power in the upper half of the frequency range (the halfway point is the Nyquist frequency of the new sampling rate). The fact that more than half of the variance is gone tells you that the anti-aliasing filter probably cuts in below the Nyquist frequency.
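The white-noise demonstration can be sketched in Python, using scipy.signal.decimate() as a stand-in for EEGLAB's pop_resample() (both lowpass-filter before subsampling; the sampling rates, duration, and seed here are arbitrary choices for the illustration):

```python
import numpy as np
from scipy.signal import decimate

rng = np.random.default_rng(0)
x = rng.standard_normal(200 * 60)       # 60 s of white noise at 200 Hz

var_orig = x.var()                      # 'left bar': original variance
var_filtered = decimate(x, 2).var()     # 'middle bar': 200 Hz -> 100 Hz with anti-aliasing
var_naive = x[::2].var()                # 'right bar': alternating time points, no filter

print(var_orig, var_filtered, var_naive)
# The anti-aliasing lowpass removes the upper part of the flat white-noise
# spectrum (scipy's default cutoff is even below the new Nyquist frequency),
# so var_filtered drops to roughly half of var_orig or less, while naive
# subsampling leaves the variance essentially unchanged.
```

Note that scipy's decimate() defaults to a zero-phase Chebyshev lowpass with its cutoff at 80% of the new Nyquist frequency, which is one concrete example of a filter cutting in below the Nyquist frequency.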

Based on this observation, we can next predict that if we use pink noise, which has a 1/f power distribution across frequencies, the amount of variance reduction should be smaller. The plot below verifies this prediction.
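The pink-noise prediction can also be sketched in Python (scipy.signal.decimate() again stands in for pop_resample(); the FFT-based 1/f noise generator is my own illustration, not from the original demonstration):

```python
import numpy as np
from scipy.signal import decimate

def pink_noise(n, rng):
    """Approximate 1/f ('pink') noise by shaping a white spectrum."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                  # avoid dividing by zero at DC
    spec = spec / np.sqrt(f)     # 1/f power <=> 1/sqrt(f) amplitude
    return np.fft.irfft(spec, n)

rng = np.random.default_rng(0)
white = rng.standard_normal(12000)
pink = pink_noise(12000, rng)

# Fraction of variance surviving the anti-aliased 2x downsampling:
ret_white = decimate(white, 2).var() / white.var()
ret_pink = decimate(pink, 2).var() / pink.var()
print(ret_white, ret_pink)
# Because pink noise concentrates its power at low frequencies, the
# anti-aliasing filter removes much less of its variance, so ret_pink
# is much closer to 1 than ret_white.
```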


Ok, so far so good. My intuition works well because I'm using simulated data. Now it's time to test whether my intuition generalizes to empirical data. I grabbed my favorite open database, which contains n=208 healthy adults performing a resting task with eyes closed and open (Babayan et al., 2019), to compare the variance of IC activations (EEG.icaact) across time between sampling rates of 250Hz vs. 100Hz. The results are shown below.


Hmm...this is only marginally satisfying. Do these results mean that ICs with lower index numbers (i.e., more dominant in variance) are closer to pink noise? In both EC and EO conditions, beyond IC index 20, all the ICs lost 40% of their original variance just from the anti-aliasing filter used for downsampling--are they really that close to white noise? Is this something we should readily expect? I could verify this prediction by calculating their PSD 'exponents' using FOOOF. Also, if I split the data into brain ICs and artifact ICs and calculated this bar graph separately for each, it should verify my understanding. But I'll stop here for now and save the fun for later.

Though I do not fully understand the cause of the issue yet, I can now see that downsampling can impact the calculation of PVAF depending on the nature of the ICs. To avoid the issue, you could skip the anti-aliasing filter, but at your own risk of aliasing.

If you have any idea, please contact me.

Contribution of lemniscal vs. extralemniscal thalamic nuclei to scalp-recorded EEG (For 270,000 page views, 05/17/2024 added)

When observing the auditory steady-state response (ASSR), or similar steady-state responses evoked by stimuli in other modalities, you will notice an initial onset response followed by a subsequent response that more faithfully reflects the steady-state properties of the stimulus. I assumed that was just the way they were and never questioned why. Recently, and unexpectedly (namely, while writing a commentary on ASR), I came across a group of researchers who had an eloquent explanation for it. According to their explanation, the 'initial onset response' mentioned above is essentially a type of Vertex Potential dominated by the extralemniscal (i.e., supramodal) thalamic response, while the subsequent response faithful to the stimulus properties is due to the lemniscal (i.e., modality-specific) thalamic response. THIS BLEW MY MIND. They are talking about a VERY important aspect of ERP phenomena, and their proposed view definitely changes how you interpret them. Let me quote their explanations below. It's a bit long, but that does not matter.

...Information about the sensory environment is relayed by two main sensory pathways. Lemniscal pathways convey high-fidelity information in a given sensory modality to its primary sensory cortex, while extralemniscal pathways convey low-fidelity information to diffuse thalamic and cortical targets [ref]. This anatomo-functional dichotomy is relevant to interpret our results: while the lemniscal system is sensitive to fine-grained stimulus features of one sensory modality, the extralemniscal system is sensitive to supramodal environmental changes, and rapidly habituates to repetitive stimulation [ref]. There is clear evidence suggesting that the Vertex Potential, which is a supramodal response that rapidly habituates to repetitive stimulation [ref], results from the activation of the extralemniscal system. (Somervail et al., 2021)
...Recently, we argued that Vertex Potentials reflect the activity of the extralemniscal system [ref; the quote above], an interpretation which was once popular but has since been largely forgotten [ref]. Extralemniscal sensory pathways run in parallel to canonical modality-specific lemniscal pathways, and transmit low-fidelity information to supramodal thalamic nuclei that project widely to the cortex and striatum [ref]. Several lines of evidence suggest that the Vertex Potential is the cortical consequence of the activation of the extralemniscal system. For example, unlike neurons in lemniscal relay nuclei, extralemniscal thalamic neurons respond to stimuli of several modalities [ref], and the Vertex Potential largely reflects supramodal cortical activity [ref]. Both the Vertex Potential and extralemniscal thalamic responses rapidly habituate to stimuli repeated at short and predictable intervals [ref]. More direct evidence is that general anesthetics abolish extralemniscal responses while leaving lemniscal responses intact, and abolish the Vertex Potential elicited by auditory stimulation without affecting the modality-specific lateralised EEG responses [ref]. In contrast, the Vertex Potential elicited by sudden auditory stimuli is largely unaffected by bilateral ablation of the primary auditory cortex, while the early lateralised responses were totally abolished [ref]. (Somervail et al., 2022)