Makoto's preprocessing pipeline


Note from Arnaud Delorme (main developer of EEGLAB): See also this page for a quick pipeline to reject ICA components in single datasets. The opinions expressed below are those of Makoto Miyakoshi and should not be taken as official EEGLAB guidelines.

%%%%%%%%%%%%%%% (Preface updated on 11/27/2019)

A famous slogan in computational science by Buckheit and Donoho (1995).

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." Buckheit JB and Donoho DL. WaveLab and Reproducible Research. In: Antoniadis A and Oppenheim G. (Ed.) Lecture Notes in Statistics 103. Wavelets and Statistics. 1995. Springer-Verlag New York, Inc.

If you have said 'EEG data mining is like walking a mine field,' you are a friend of mine. It's a field of mine because it's not a field of mine. But never mind even if you have never mined EEG data. I'll show you my minecraft. This is Makoto Miyakoshi's personal recommendation for preprocessing EEG data using EEGLAB. These are NOT Swartz Center's official opinions. This page is admittedly dogmatic, so follow the instructions at your own risk. This is a forever beta version and subject to change anytime. If you have any questions, please post them to the EEGLAB mailing list (you need to subscribe to it from here). In doing so, please include the keyword Makoto's preprocessing pipeline so that my email filter can pick it up.

I have another SCCN Wiki page, Makoto's useful EEGLAB code, which is more code-oriented and addresses various frequently encountered practical problems in using EEGLAB.


Change the option to use double precision

In the EEGLAB main GUI, 'File' -> 'Memory and other options' -> uncheck 'If set, use single precision under...'.
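
If you prefer to set this from a script, the same option can probably be toggled with pop_editoptions(); the field name 'option_single' is my assumption based on eeg_options.m, so check it against your EEGLAB version.

% A hedged sketch: disable single precision from the command line
% ('option_single' is assumed to be the relevant field in eeg_options.m).
pop_editoptions('option_single', 0);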

Check the path and import data (01/10/2019 updated)

Before importing the data, check these points.

  1. Your full path to the eeglab folders and your data does NOT contain any non-alphabetic, non-numeric, non-underscore characters such as !@#$%^&*()-=`[];', and [space]. For example, if your folder name is 'Experiment 1', that is bad. Change it to 'Experiment1' or 'Experiment_1', for example.
  2. Any folder and file name in your full path, as well as your variable names in the Matlab workspace, should not start with a number. For example, '2018_Oddball_Project' is better changed to 'Oddball_Project_2018'.
  3. Avoid using names that are (or are likely to be) used as Matlab functions. For example, 'eeglab' is the name of a function, so you do not want to use it as a folder name or variable name. If you are not sure whether a name is already used by Matlab, type
which -all [variableName]

For example,

>> which -all interp
/usr/local/MATLAB/R2017b/toolbox/signal/signal/interp.m
/usr/local/MATLAB/R2017b/toolbox/shared/controllib/engine/@DynamicSystem/interp.m  % DynamicSystem method

This means that the word 'interp' is already used as a function in two places. You can still use the word 'interp' as a variable name, but you could get into trouble later. Try to avoid at least known function names such as 'eeglab' when naming a folder, file, or variable.

I'm not telling you that if you have those instances you will immediately fail. On the contrary, with recent operating systems and Matlab you probably won't encounter any problem--until you use, without knowing, some EEGLAB functions or plugins whose authors assume users follow the classical rules. It is good practice to eliminate this risk at this very beginning stage.

After importing, check whether you have extra channels that are filled with zeros or complete garbage. Never include such problematic channels in ICA; they can crash the process (and, for Infomax, slow it down significantly).

Downsample if necessary

You may downsample the data to around 250 Hz if the original sampling frequency is higher than that. Usually a low-pass filter is necessary before downsampling for anti-aliasing, but the EEGLAB function applies the anti-aliasing filter automatically. Downsampling is useful to reduce the data size. It also helps ICA produce a better decomposition by cutting off unnecessary high-frequency information, though we have not tested this quantitatively.
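
As a minimal sketch, downsampling to 250 Hz on the command line looks like this (pop_resample() applies its anti-aliasing low-pass filter internally):

% Downsample to 250 Hz; the anti-aliasing filter is applied automatically.
EEG = pop_resample(EEG, 250);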

High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)(09/23/2019 updated)

The purpose of applying a high-pass filter is to remove the 'baseline drift'. It is generally recommended to use lower cutoff frequencies, for example 0.01 Hz (cooperative subjects) to 0.1 Hz (children and patients) (Luck, S. J. 2005. An introduction to the event-related potential technique. Cambridge: MIT press.), than our recommended 1-Hz passband edge (equivalent to their cutoff frequency of 0.5 Hz @ -6 dB; see this PDF file for filter terminology). The reasons for the 1-Hz passband edge are:

  1. ICA is biased toward high amplitude if the data length is finite. EEG data have a 1/f power spectral density (PSD) curve. The combination of these two facts means that the leftmost values on the PSD plot will have the most influence on your decomposition result, although the most useful EEG information you are probably looking for is in 3-13 Hz (theta and alpha). Therefore, cutting off the leftmost values helps ICA focus on the important 3-13 Hz.
  2. Low-frequency EEG < 1 Hz could be contaminated by sweating etc., which adds spatiotemporal non-stationarity and affects ICA. I am a big fan of infraslow EEG research though--after all, no one method is perfect.
  3. Also, the lowest frequency in ERSP (i.e., a blended wavelet/STFT transform) is typically 3 Hz if you want to have a decent frequency resolution (in EEGLAB parameters, 3 cycles at 3 Hz generates a 1116-ms sliding window length; if you want to go lower, the sliding window will be even longer!). 3 Hz is sufficiently high relative to the 0.5-Hz cutoff.
  4. These qualitative explanations are supported by published data: Winkler and colleagues concluded that data high-pass filtered at 1-2 Hz works best for ICA.

Winkler I, Debener S, Muller KR, Tangermann M. 2015. On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Conf Proc IEEE Eng Med Biol Soc. 2015 Aug;2015:4101-5.

...As a pre-processing step for ICA, high-pass filtering between 1-2 Hz consistently produced good results in terms of signal-to-noise ratio (SNR), single-trial classification accuracy and the percentage of ’near-dipolar’ ICA components. Relative to no artifact reduction, ICA-based artifact removal significantly improved SNR and classification accuracy. This was not the case for a regression-based approach to remove EOG artifacts.

There are reports that a 1-Hz high-pass filter attenuates (or even 'distorts') low-frequency, so-called 'late slow waves' of the event-related potential family, such as N400 or P600 (these names are paradigm dependent, of course). To avoid this problem, one can calculate the ICA weight and sphering matrices on 1-Hz high-passed data, then apply them to 0.1-Hz high-passed data. This transfer of the ICA matrices can be done through the EEGLAB GUI 'Edit' -> 'Dataset info'.
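
Here is a hedged command-line sketch of that approach. The filter calls use pop_eegfiltnew() with a 1-Hz and a 0.1-Hz passband edge and the default transition bandwidths; the variable names are placeholders.

% Learn ICA on 1-Hz high-passed data, then apply it to 0.1-Hz high-passed data (a sketch).
EEG_forICA = pop_eegfiltnew(EEG, 1,   []);   % 1-Hz passband edge, used for ICA only
EEG_forERP = pop_eegfiltnew(EEG, 0.1, []);   % 0.1-Hz passband edge, used for ERP analysis
EEG_forICA = pop_runica(EEG_forICA, 'extended', 1);
EEG_forERP.icaweights  = EEG_forICA.icaweights;   % copy the decomposition
EEG_forERP.icasphere   = EEG_forICA.icasphere;
EEG_forERP.icachansind = EEG_forICA.icachansind;
EEG_forERP = eeg_checkset(EEG_forERP, 'ica');     % recompute EEG.icawinv and EEG.icaact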

A low-pass filter is optional. If CleanLine (see next) does not work well, you may want to simply apply a low-pass filter to cut off everything above 50/60 Hz. If you use the default eegfiltnew() filter, the transition bandwidth around 50/60 Hz must be 10 Hz, so set the 'passband edge' no higher than 45/55 Hz.

Note also that for connectivity analysis using multivariate Granger causality, data are better treated with special filter setup. I summarized it in this section.

(Added on 03/29/2017) When I worked with Marco Petilli on his MoBI data recorded at SCCN, he first found only 1 good IC after ICA. I thought this was the end of the research. But anyway, without much expectation, I tried a 2-Hz high-pass filter with a transition band from 1.25 to 1.75 Hz (Blackman-Harris window), very aggressive parameters. After ICA, I found nearly 10 good ICs! This was an eye-opening experience. I remember my former SCCN colleague, CK, strongly recommended that I use a transition band of 0.25 to 0.75 Hz when applying a 1-Hz high-pass filter.

What happens to the < 1 Hz data if ICA is calculated on > 1 Hz data and applied to 0.1 Hz data? (08/17/2018 Updated)

As long as the < 1-Hz data are correlated with the > 1-Hz data (which is the more likely case), ICs obtained from the > 1-Hz data decompose the > 0.1-Hz data just as well, with no problem. However, if the < 1-Hz data are independent of the > 1-Hz data, what will happen?

Let's use some imagination to see the situation. You are in a dinner party. On a buffet table, there are big four plates of salad, steak, salmon, and dessert. You want to take some salad to your plate. The moment you reached your arm to the salad plate--power outage happens. All the lights are off, and it's pitch dark. But you decide to take the salad anyway. You know where the salad plate is, and you have a tong and a plate in your hands, so why not? You take the salad to your plate in the darkness, enjoying unexpected inconvenience. Now power is back, lights are on. Guess what you find in your plate! You thought you took the salad, but the salad is drenched with mushroom soup! Maybe someone spilled a bowl of mushroom soup over the big salad plate by accident? Now your salad has a linear combination with some amount of the mushroom soup.

This is the situation when > 1-Hz ICA is applied to > 0.1-Hz data. The > 1-Hz ICA can see only the four plates, but > 0.1-Hz ICA may be able to see the four plates plus the mushroom-soup-contaminated area as a new independent component. If you know the contaminated area, you can avoid it and take uncontaminated salad, steak, salmon, and most importantly, desserts.

Although the concept and calculation of ICA are complicated, the output and how to use it are damn simple. It's just a spatial filter, like the locations of the plates. How much mushroom soup you get as a result of taking contaminated salad, steak, salmon, and desserts depends on which plate the mushroom soup was mainly spilled over.

Import channel info

'Edit' -> 'Channel locations'. If you have channel labels such as 'Fz', 'Cz', 'Pz', etc., your channel montage follows the International 10-20 system (there is an extended version of it; see Oostenveld and Praamstra 2001 Clin Neurophysiol for the 10-10 and 10-5 systems). If that's the case, click the 'Look up locs' button located at the bottom center of the GUI. Choose 'use MNI coordinate file for BEM dipfit model' unless you have a special reason not to do so. This will automatically import channel coordinates from the internal library. Don't forget to click 'Opt. head center' located at the top right corner.
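
For scripting, the 'Look up locs' step corresponds roughly to pop_chanedit() with the 'lookup' option. The template path below is my assumption about where the dipfit plugin keeps its standard electrode file, so verify it in your installation.

% A sketch: look up 10-20 labels against the MNI/BEM template, then re-center (paths assumed).
templateFile = fullfile(fileparts(which('dipfitdefs')), 'standard_BEM', 'elec', 'standard_1005.elc');
EEG = pop_chanedit(EEG, 'lookup', templateFile);
EEG = pop_chanedit(EEG, 'eval', 'chans = pop_chancenter( chans, [],[]);');  % 'Opt. head center'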

If you have digitized channel locations, just click 'Opt. head center'. Some vendors use meters as the unit of length instead of millimeters. If you find all dipoles localized at the center of the head, suspect this cause and multiply the scale by 1000 to see if it fixes the problem (as far as I know, old Polhemus systems did this once).

Do we want to have dedicated EOG channels? (12/03/2019 added)

(Thanks Maciej Kosiło for bringing up this question) If your EEG recording system has a default setting so that two electrodes are spared to record electrooculogram (EOG), what's the best way to make most out of them?

  1. Use them as EOG channels that are bipolar referenced to each other.
  2. Use them as EOG channels that are monopolar referenced to a scalp recording reference.
  3. Re-purpose them as scalp EEG channels.

If you use ICA in your preprocessing, having dedicated EOG electrode(s) is not that necessary. This is because ocular artifacts, either blinks or saccades, have quite broad scalp topographies, which ICA generally models very well. This is the strength of a multivariate method--ICA evaluates whole-scalp distributions. So, I would say 3 is the best answer. 2 is just as good, and in doing so I recommend you use the locations of AFp9 and AFp10 defined in the international 10-5 system (Oostenveld and Praamstra, 2001), which are virtually as good as the canthus (lateral end of the eye) positions of HEOG-L and HEOG-R. In this way, EEGLAB (or any label-based location database) can find their template coordinates automatically. These AFp9 and AFp10 serve as both HEOG monitors and far-frontal EEG channels. In fact, Delorme et al. 2007 demonstrated that medial prefrontal source EEG is most sensitively detected by these upper-facial electrodes. The worst choice, I would say, is 1, because bipolar referencing loses one data dimension. If these EOG channels share the monopolar reference with the EEG channels, you can re-reference them offline to obtain bipolar-referenced EOG data anytime without losing anything.

Remove bad channels

What do I mean by bad here? In short, channels that you can't save even if you reject 5-10% of your datapoints. Removing bad channels is a critical step, particularly for average referencing. This is because the average reference signal is the average of all channels; if you have channels that are noisy all the time, including them in the average will introduce noise into all the channels. This issue is discussed in detail in Nima's PREP pipeline paper.

If you are looking for controlled, objective rejection criteria, I recommend you check out these solutions

  • Nima's PREP pipeline: See his paper for detail.
  • Christian's clean_rawdata(): This EEGLAB plugin comes with sophisticated bad channel detection algorithm.
  • Makoto's trimOutlier(): This tool does simple jobs in an interactive manner (i.e. to visualize overall statistics, set threshold, confirm results, re-set the threshold if necessary.) Also available for a batch process.

Recently I only use clean_rawdata(). Note that this function saves nice cleaning results under EEG.etc, with which you can identify which channels and datapoints were rejected relative to the original data length. Of course, it's Christian's work!

When writing a batch script, I recommend you keep the original (i.e., before channel rejection) EEG data in the following way, so that you can refer to it later for channel interpolation (see below).

% Keep original EEG.
originalEEG = EEG;
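
With the original copy kept, a hedged example of calling clean_rawdata() through its legacy positional interface might look like the following; the thresholds are purely illustrative, and -1 disables a criterion.

% Illustrative parameters only: flatline 5 s, no high-pass here (-1),
% channel-correlation 0.85, line-noise 4 SD, ASR burst 20 SD, window 0.25.
EEG = clean_rawdata(EEG, 5, -1, 0.85, 4, 20, 0.25);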

Interpolate all the removed channels (05/17/2019 updated)

This is to minimize a potential bias in the next stage, average referencing. For example, suppose there are 64 channels and 16 channels are identified as bad and rejected, all from the right hemisphere. The numbers of channels in the left vs. right hemispheres are then 32 vs. 16, so the average will be biased toward the left hemisphere. To prevent this from happening, the rejected channels are better interpolated. Note that this causes rank deficiency, so either PCA dimension reduction or channel rejection will be necessary before ICA. See the sample code (I haven't tested whether 'spherical' is the most reasonable solution though). The reason why you want to do this before ICA is that after ICA you can't recover EEG.icasphere for the rejected channels. For detail, see this section No. 13.

% Interpolate channels.
EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical');

Re-reference the data to average (07/18/2019 Updated)

If EEG is generated on the cortex by dipolar current distributions with no external sources, there are equal amounts of positive and negative potential change at every moment due to charge conservation. Hence the scalp topography should sum to zero. By re-referencing to the average of the channel values, we assume that there is no charge creation, that is, no monopolar sources or sinks. In practice, average referencing makes all the ICA scalp topographies have zero total potential (i.e., red and blue always balance).

(Update 08/24/2016) Note also that average referencing is often very helpful to suppress line noise. I have experienced line noise that could not be rejected by CleanLine, because the amplitude envelope of the line-noise frequency band fluctuated with a short time constant (you can visually confirm this by band-pass filtering the data to pass only the line noise and inspecting its time series), but that could be removed by average referencing. I guess this is related to common-mode rejection? Anyway, I suggest one perform average referencing first and then CleanLine.

Why should we add zero-filled channel before average referencing? (03/04/2020 Updated)

A simple proof that re-referencing to the average of the non-reference channels is wrong is as follows. (Unipolar) re-referencing should be applicable as many times as one wants, to any existing or non-existing channel (like the averaged channels), without changing the data. See the following quote from Hu et al. (2019) Brain Topography.

...The most surprising property is ‘no memory’ indicating that URs [=unipolar references] works independently without the consequences of the URs already applied. It also means one can always safely re‑reference the EEG/ERP recordings with different URs but not worry about if re‑referencing multiple times will accumulate artifacts.

But if you re-reference to the average across non-reference channels (i.e., without including the initial reference channel), you cannot do this. This is the simple proof. A little bit more theoretical proof can be found below.

For the sake of argument, suppose there are only three scalp recordings, Fz, Cz, and Pz, with reference to infinity to start with. If Pz is chosen as the initial reference when recording, the initially referenced data Fz_REFPz, Cz_REFPz, and Pz_REFPz are described as

 Fz_REFPz = Fz - Pz
 Cz_REFPz = Cz - Pz
 Pz_REFPz = Pz - Pz = 0

Note that the data rank == 2 (because Pz_REFPz is 0). Re-referencing these data from REF_Pz to REF_Fz, we obtain

 Fz_REFFz = Fz_REFPz - Fz_REFPz = 0
 Cz_REFFz = Cz_REFPz - Fz_REFPz = (Cz - Pz) - (Fz - Pz) = Cz - Fz
 Pz_REFFz = Pz_REFPz - Fz_REFPz = (Pz - Pz) - (Fz - Pz) = Pz - Fz

Thus, re-referencing from REF_Pz to REF_Fz yields, or 'recovers', the time series of the initial reference channel as Pz - Fz. Note that the rank of the data remains unchanged (data rank == 2). In this way, the bottom line is that data can be freely re-referenced after collection algorithmically (Dien, 1998. Behavior Research Methods, 30:34-43; Osselton, 1965. Electroenceph Clin Neurophysiol, 19:527-528).

When Pz is chosen as the initial reference, the average potential across the channels is defined as

 AVG = (Fz_REFPz + Cz_REFPz + Pz_REFPz) / 3

Because Pz_REFPz is 0 everywhere in the time domain, it may be omitted as

 AVG = (Fz_REFPz + Cz_REFPz) / 3

Thus, the number of terms in the numerator is 2 (i.e., the number of non-zero channels), while the denominator is 3 (i.e., the number of channels including the initial reference). However, it is often misunderstood as

 AVG_wrong = (Fz_REFPz + Cz_REFPz) / 2

If we use AVG_wrong, which does not take the initial reference into account, we obtain the remaining 2 channels as

 Fz_REFPz - AVG_wrong = Fz_REFPz - (Fz_REFPz + Cz_REFPz)/2 = (Fz_REFPz - Cz_REFPz)/2
 Cz_REFPz - AVG_wrong = Cz_REFPz - (Fz_REFPz + Cz_REFPz)/2 = (Cz_REFPz - Fz_REFPz)/2

Thus,

 Fz_REFPz - AVG_wrong = -(Cz_REFPz - AVG_wrong)

Apparently, using AVG_wrong causes rank deficiency (rank == 1). However, this should not happen, as we saw above. On this account, I think our explanation on this page is wrong.

If we use AVG, which takes the initial reference into account, for all 3 channels we obtain

 Fz_REFPz - AVG = Fz_REFPz - (Fz_REFPz + Cz_REFPz)/3
 Cz_REFPz - AVG = Cz_REFPz - (Fz_REFPz + Cz_REFPz)/3
 Pz_REFPz - AVG = 0 - (Fz_REFPz + Cz_REFPz)/3 = -(Fz_REFPz + Cz_REFPz)/3

Note that the data rank remains unchanged (rank == 2).


Ok, then how can we add back the initial reference? Remember, it was

 Pz_REFPz = Pz - Pz = 0

which is continuous zeros. This is why we should add a zero-filled channel before average referencing IF your data do not already have it (which I have never seen before).

This operation can be simply achieved by adding +1 to the denominator when you calculate the average across the channels. For example, if you calculate average across 32 channels, sum up all the channel values and divide it by 33. Similarly, if you have 64 channels, divide the sum by 65. If you have 128 channels, divide the sum by 129, etc.

(Update 11/29/2017) I wrote a simple EEGLAB plugin fullRankAveRef() to perform the average reference after adding the original reference channel (i.e., zero-filled). This channel is removed after average referencing, so the total number of channels remains the same (so does data rank, hence full-rank average reference). This solution may be useful for those who have data with small number of channels, such as standard 19 channels in international 10-20 system.

What is rank? (Updated 06/06/2016)

NEW description (thanks Andreas Widmann for the discussion!): Usually, the data rank is the same as the number of channels. However, if a pair of channels is bridged, the rank of the data is the number of channels minus 1. For another example, the pair of equations 2x+3y+4=0 and 3x+2y+5=0 has rank 2, but the pair 2x+3y+4=0 and 4x+6y+8=0 has rank 1, because the two equations are dependent (the latter is exactly twice the former).

EEG data consist of n+1 channels; n probes and 1 reference. The n+1 channel data have rank n because the reference channel data are all zeros and rank deficient. Usually, this reference channel is regarded as useless and excluded for plotting, which is why you usually find your data to have n channels with rank n (i.e. full ranked). It is important that when you perform ICA, you apply it to full-ranked data, otherwise strange things can happen (see below). Your data are usually full-ranked in this way and ready for ICA.

When you apply average reference, you need to put the reference channel back if it is missing (which is usually the case). You can include the reference channel in computing average reference by selecting the 'Add current reference back to the data' option in the re-reference GUI. If you do not have initial reference channel location, in theory you should generate a zero-filled channel and concatenate it to your EEG data matrix as an extra channel (which makes the data rank-deficient) and perform average reference. To make the data full-ranked again, one may (a) drop a channel--choosing the reference channel is reasonable (b) reduce the dimensionality by using a PCA option for ICA.

To perform the abovementioned process on the command line, one can manually compute average reference using adjusted denominator: EEG.data = bsxfun( @minus, EEG.data, sum( EEG.data, 1 ) / ( EEG.nbchan + 1 ) );

The origin of this argument by Joseph Dien can be found here.

OLD description:

Note that average reference reduces the data rank by one (see this page: imagine v1, v2, ..., vk in the equation are non-zero channel data. After average referencing, the sum across all the channels is zero, which proves linear dependency--note that exactly this property of average referencing (i.e., v1+v2+...+vk=0) is what defines rank deficiency! Average referencing without including the original reference channel by definition causes rank deficiency; see also this page written by my colleague Jason Palmer). The easy way to make the data full-ranked is to discard (any) one channel after average referencing. If you don't do this, ICA may return invalid results.

Below I show an example of what happens when you run ICA on rank-deficient data. There were 54 channels. Data were average referenced and epoched to -1 to 2 sec. After setting the EEGLAB option to use double precision (thanks Andreas for clarifying this!), Matlab's rank(EEG.data(:,:)) returned 53. The AMICA log showed a data dimension of 54. Notice the complementary scalp topographies (left) and time series (right), and the interesting spectra patterns (middle) in which ICs 9 and 10 do not even respect the filter effect. Rejecting one channel and re-running ICA fixed this problem, and the pathological ghost ICs disappeared.

Figure 1. A pair of ghost IC scalp maps generated by ICA on rank-reduced data. Figure 2. Spectra analysis results. Figure 3. Raw EEG time series.

(Data and figures by courtesy of Caterina Piazza)

Additional comments on 12/15/14: By mistake, I applied AMICA to 66 datasets (5-min resting, 60-70 ch, ASR-cleaned with SD == 6) that were average referenced but had no channel discarded (i.e., rank deficient). AMICA failed on 3/66 datasets (4.5%). Here is the out.txt log from one of the failed runs. Note that 'num eigs kept = 65' comes from EEG.nbchan == 65.

minimum eigenvalues =  -9.00329954361518855E-14 0.47905289012901225 0.5242944714252078
maximum eigenvalues =  998.61704681443121 262.85021915136872 219.07946516116806
num eigs kept =  65
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17987E-01 0.17987E-01 (1.19 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16862E-01 0.16862E-01 (1.35 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17511E-01 0.17511E-01 (1.28 s, 0.7 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.16970E-01 0.16970E-01 (1.38 s, 0.8 h)
iter 1 lrate = 0.0500000000 LL = NaN nd = NaN, D = 0.17459E-01 0.17459E-01 (1.22 s, 0.7 h)
... done.Execution time: 0.00 h

Here is a sample code of this part.

% Apply average reference after adding initial reference
EEG.nbchan = EEG.nbchan+1;
EEG.data(end+1,:) = zeros(1, EEG.pnts);
EEG.chanlocs(1,EEG.nbchan).labels = 'initialReference';
EEG = pop_reref(EEG, []);
EEG = pop_select( EEG,'nochannel',{'initialReference'});

Remove line noise using CleanLine (05/17/2018 Updated)

CleanLine is an EEGLAB plugin developed by my colleague Tim Mullen (now Qusp CEO) based on one of the Chronux functions. It is not a standard frequency filter; instead, it uses a sliding window to adaptively estimate the sine-wave amplitude to subtract. It does not make a hole in the background EEG spectrum.

For the parameter settings, I usually set the step size equal to the window size for maximum rejection (from experience), and leave the others at their defaults. According to Nima's PREP paper, data nonstationarity should be addressed by high-pass filtering before running this function (Bigdely-Shamlo et al., 2015).
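
A hedged example call with the step size equal to the window size (4 s) is shown below. The parameter names follow the cleanline plugin's pop_cleanline() interface, so confirm them against your plugin version, and use [50] instead of [60] in 50-Hz countries.

% CleanLine with winstep == winsize (4 s); the other values are the usual defaults.
EEG = pop_cleanline(EEG, 'bandwidth', 2, 'chanlist', 1:EEG.nbchan, 'computepower', 1, ...
    'linefreqs', 60, 'normSpectrum', 0, 'p', 0.01, 'pad', 2, 'plotfigures', 0, ...
    'scanforlines', 1, 'sigtype', 'Channels', 'tau', 100, 'verb', 1, ...
    'winsize', 4, 'winstep', 4);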

There are cases in which CleanLine does not suppress the line noise sufficiently. Most likely, the reason is that the line noise is not stationary. See the sample screenshot below. Channel F4 shows a constant envelope width across time. This indicates stationarity, and CleanLine should work well on this channel. However, notice that most of the other channels show fluctuating 60-Hz envelopes, and their time constants are shorter than the typical CleanLine sliding-window length (default 4 s). This indicates a violation of stationarity, and CleanLine cannot work well on them (I tweaked the parameters but never got lucky). You can easily verify this on your data by applying the EEGLAB filter with the setting 'lower edge = higher edge = 60 Hz', for example. If this unfortunately turns out to be the case, you should apply either a conventional notch filter or a low-pass filter that cuts off below the line frequency.

CleanLineEample.png

Some update--if you see harmonics such as 60 Hz and 120 Hz, in the time domain such a signal does NOT appear as the sum of constant 60-Hz and 120-Hz sine waves, but as 60 Hz with spiky peaks and/or troughs. In other words, the 120 Hz is only present occasionally, in a phase-coupled way. In this case, I think CleanLine's assumption is violated and it does not work. Even if you see the 120 Hz (or whatever harmonic peak on the power spectral density plot) reduced, it does not mean the phase-coupled harmonics were specifically removed; CleanLine does not work in that way. In this sense, it probably does not make sense to enter [60 120]; it only makes sense to enter [60], i.e., the principal frequency of the line noise (if you think I'm wrong, please write me, I'd be happy to learn).

If your data are recorded with Biosemi, you MUST re-reference the data to *ANY* channel (including the channel average; by the way, by including the original reference channel(s) you can re-reference to other channel(s) repeatedly without losing information). This is because Biosemi lets users take care of common-mode rejection offline. What is common-mode noise? You can see the 60-Hz amplitude wax and wane in a remarkably non-stationary way, which could not be modeled and subtracted by CleanLine at all. However, re-referencing erased it perfectly. So I guess that's an example of common-mode noise.

Epoch data to -1 to 2 sec (09/11/2019 updated)

  • If you have a sufficient number of data points (i.e., [number of channels^2] x 30) after epoching, epoching before ICA is recommended.
    • A merit of epoching before ICA is that ICA can focus on peri-event brain dynamics while not being 'disturbed' by datapoints from periods of non-interest (i.e., non-event-related brain/non-brain activities). See also this section.
  • If you have an insufficient number of data points, or your Stimulus Onset Asynchrony (SOA) is shorter than 3 seconds, do not epoch. Instead, run ICA on the continuous data, then epoch. Follow these steps. If overlapping epochs are submitted to ICA, it will learn the duplicated parts as positively weighted data.
  • The relatively long epoch is for the later wavelet transform. For detail, see Slides 20 and 21 of this file. In short, the epoch length of -1 to 2 sec is to obtain a -0.5 to 1.5 sec window after the wavelet transform (i.e., ERSP and ITC in EEGLAB terminology) with 3 cycles at 3 Hz, which generates a 1116-ms window length.
  • As you see above, epoching can be done anytime, before or after ICA/dipfit. In other words, epoching is NOT a necessary condition for performing ICA.
  • You do not need to 'correct baseline', i.e., subtract the mean of the 'baseline period', which is typically -1000 to 0 ms. Importantly, David Groppe, our former colleague, reported that if a short baseline such as -100 to 0 ms is subtracted before ICA, it makes ICA performance worse. This is because the mean value of such a short baseline can easily fluctuate with 'background' alpha rhythm, for example, which changes the DC level of all the channels independently... for ICA this is a very bad situation. The longer the baseline period, the closer you are to the previously applied high-pass filter that guarantees zero mean, hence the safer you are. Actually, as long as the data are properly high-pass filtered, it is unnecessary to 'correct baseline' here, and thus we can minimize the risk of confusing ICA. See Arno's comment below.

If you extract data epochs before running ICA (as advised by Makoto below), make sure that the baseline is long enough (at least 500 ms) or that you do not remove the baseline from your data epochs. See the following article for more information Groppe, D.M., Makeig, S., & Kutas, M. (2009) Identifying reliable independent components via split-half comparisons. NeuroImage, 45 pp.1199-1211.
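
As a minimal sketch of this step (the event type 'stim' is a placeholder for your own event codes, and no baseline is removed here for the reason discussed above):

% Epoch from -1 to 2 s around the (hypothetical) event type 'stim'; no baseline removal.
EEG = pop_epoch(EEG, {'stim'}, [-1 2], 'epochinfo', 'yes');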

Reject epochs for cleaning (11/13/2019 Updated)

Aim for around a 5-10% rejection rate (Delorme et al., 2007). Apply an amplitude threshold of -500 to 500 uV (don't capture eye blinks; you may want to exclude channels near AFp1 and AFp2 for this purpose), reject the selected epochs, then apply the improbability test with 6 SD for single channels and 2 SD for all channels, and reject the selected epochs again.
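
A hedged sketch of these two steps with pop_eegthresh() and pop_jointprob() follows; the channel indices and thresholds are illustrative, and you may want to exclude the periocular channels from the amplitude test as mentioned above.

% Amplitude threshold -500 to 500 uV over the whole epoch, rejecting marked epochs.
EEG = pop_eegthresh(EEG, 1, 1:EEG.nbchan, -500, 500, EEG.xmin, EEG.xmax, 0, 1);
% Improbability test: 6 SD for single channels, 2 SD for all channels, then reject.
EEG = pop_jointprob(EEG, 1, 1:EEG.nbchan, 6, 2, 0, 1);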

An idea for fast and efficient trial rejection (11/13/2019 Updated)

I moved this section to here with some update.

Adjust data rank for ICA (05/17/2019 updated)

When performing ICA, a critical parameter is how many ICs to generate. Usually and by default, the number of ICs to generate is the same as the number of the channels you have. However, this assumption holds ONLY IF your data are guaranteed to be full ranked (to see what it means, jump back to this section). Your data could become rank-deficient by

  1. re-referencing without including the initial reference (rank deficient by 1),
  2. interpolating k channels (rank deficient by k), or
  3. bridging n channels (rank deficient by n-1).

If ICA is forced to generate more ICs than the input data rank, 'ghost ICs' appear. pop_runica() has its own rank checker, which uses Matlab's rank() function. However, although it can detect the cases of 1) improper re-referencing and 2) channel bridging, it DOES NOT detect the rank reduction caused by channel interpolation in EEGLAB (using the 'spherical' algorithm). This is because the interpolation is not linear, which adds very small eigenvalues that are counted as non-zero dimensions. In fact, pop_runica() already has a heuristic threshold of 10^-7 on the eigenvalues to reject such ill-conditioned dimensions, but it is currently used only to generate a warning message (which apparently I missed). If you want to use this safer criterion to reject potentially ill-conditioned dimensions, apply a minor change to pop_runica() from line 535 as follows (note 'max' is changed to 'min'):

function tmprank2 = getrank(tmpdata);
 
    tmprank = rank(tmpdata);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %Here: alternate computation of the rank by Sven Hoffman
    %tmprank = rank(tmpdata(:,1:min(3000, size(tmpdata,2)))); old code
    covarianceMatrix = cov(tmpdata', 1);
    [E, D] = eig (covarianceMatrix);
    rankTolerance = 1e-7;
    tmprank2=sum (diag (D) > rankTolerance);
    if tmprank ~= tmprank2
        fprintf('Warning: fixing rank computation inconsistency (%d vs %d) most likely because running under Linux 64-bit Matlab\n', tmprank, tmprank2);
        %tmprank2 = max(tmprank, tmprank2);
        tmprank2 = min(tmprank, tmprank2);
    end;

Thus, it is strongly recommended that, if your preprocessing makes your data rank deficient by x in total (i.e., the deficiencies add up!), you explicitly specify (number_of_channels - x) as your 'true' data rank in one of the following ways. Again, this becomes particularly important after channel interpolation, since pop_runica()'s rank checker does not detect the rank deficiency and lets it produce 'ghost ICs' (by the way, the same happens to ASR--any PCA/ICA solution would be affected similarly!). To address the issue, you either

  1. reduce the data dimension to (number_of_channels - x) by using the 'pca' or 'pcakeep' option for runica() and runamica() respectively (see the sketch after the sample code below), OR
  2. reject *ANY* x channels. Interestingly, they do not have to be the exact same channels that were interpolated. You may want to determine the new channel montage by using Nima's loc_subsets() function, which tells you which channels to reject to achieve a maximally uniform channel distribution.
% Discard channels to make the data full ranked.
channelSubset = loc_subsets(EEG.chanlocs, yourDataRankHere);
EEG = pop_select( EEG,'channel', channelSubset{1});
EEG = pop_chanedit(EEG, 'eval','chans = pop_chancenter( chans, [],[]);');
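
For option 1 above, here is a hedged sketch; 'dataRankHere' is a placeholder for the rank you computed yourself, and the AMICA line assumes runamica15() accepts an 'outdir' and a 'pcakeep' key (check your AMICA version).

% Option 1: let ICA reduce the dimensionality to the true data rank.
EEG = pop_runica(EEG, 'extended', 1, 'pca', dataRankHere);                % Infomax with PCA reduction
% runamica15(EEG.data, 'outdir', outputFolder, 'pcakeep', dataRankHere);  % AMICA equivalent (assumed)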

Run ICA (06/26/2018 updated)

What is assumption of stationarity in ICA? (09/11/2019 updated)

ICA assumes that ALL the data points submitted are from the same state. For example, if you perform a P300 task while recording EEG, you are making the assumption that all the data points are related to the P300 task. From this viewpoint, it becomes clear that it is more desirable to epoch your data to the event of interest first and then perform ICA (data length permitting) rather than performing ICA on continuous data. What about eye blinks? They are definitely not related to the P300 task (or may be temporally correlated with the event structure, if blink timings are behaviorally controlled), and they are not even from the brain. However, if a blink occurs frequently enough during the recording, it becomes part of the stationarity. What about non-task-relevant (i.e., non-event-related) absent-mindedness or random thoughts reflected in the EEG between trials or blocks? These could be disturbing to the assumption of stationarity you made for ICA, because such EEG signals may not have any repeating patterns or similar statistical characteristics (such as the probability density function) across time. Therefore, using ERP paradigms actually gives a very nice justification for using ICA, as ERP paradigms reinforce EEG data stationarity by massive repetition of the same/similar types of stimulus-response sequences. The most ideal case for ICA would be quasi-steady-state responses, such as mismatch negativity.

  • Is data stationarity more 'attention-drawing' for ICA than large amplitudes in the signal? I have seen many times that a single EEG event with a very large amplitude can attract ICA's attention and make it waste multiple ICs to decompose it. So apparently, a high-amplitude signal is a much stronger attention drawer for ICA than regular-amplitude, well-stationary signals. This is why data cleaning is critical before ICA. It also explains why ICA is so good at decomposing eye blinks--their source location is fixed, their potential distribution over the scalp is fixed and broad, and their amplitude is large. By the way, the reason why ICA is biased toward source magnitude in practice is that realistic data have finite length.

General tips for performing ICA (06/26/2018 updated)

  • ICA has a bias toward high amplitude, unless the data has infinite length. EEG signal usually has 1/f power spectral density, which means low-frequency signal tend to dominate ICA results. This is why you have a lot of alpha components and no isolated gamma components; gamma power is always very low relative to theta and alpha.
  • ICA may fail if you apply it to data that consist of concatenated blocks recorded with different cappings, even if the data are from the same subject. This is because every time an electrode cap is applied, the electrodes' relative locations to the EEG sources in the brain are not the same. This violates the assumption of spatial stationarity between the electrodes and EEG sources throughout the recording. In other words, to apply a single ICA, your subject cannot take off the electrode cap! If you did this, ICA may show ICs with very similar scalp topographies but block-separated activations (active only in some blocks, otherwise green) in the ERPimage.
  • For the same reason, re-gelling electrodes during break times is not recommended. There have been cases where ICA split the data into before and after the re-gelling. Even a single channel out of its original position can cause this, so be careful.
  • Include EOG, ECG, and other channels as long as they share the initial reference electrode with scalp electrodes. If EOG channels are referenced to each other (i.e., bipolar), exclude them.
  • If there are channels with terrible EMG contamination, excluding them from ICA may improve results. This is because ICA cannot model EMG sources because EMG sources are not spatially stationary but spreads along with muscle fibers.
  • You need a minimum of (number of channels)^2 x 20 to 30 data points to perform ICA. The number '20 to 30' should increase as the number of channels increases. This empirical formula is from past publications from SCCN. This value was, however, determined purely empirically, and we have not performed any systematic investigation of it (should you be interested in carrying out this study, contact us!). Moreover, there is a good counter-example to this rule of thumb--see the next item.
  • You may want to downsample the data (with more than 128 channels) to near 100 Hz. This is the case when you give up the gamma band (40-80 Hz) and higher due to the presence of strong line noise that CleanLine() could not remove very well (CleanLine's assumption is that the noise envelope is stationary, which often does not hold). Although not officially studied or reported, in principle this aggressive downsampling not only accelerates the process but may also improve the quality of the decomposition by simplifying the data. Run ICA on the downsampled data, obtain the decomposition matrices (i.e., EEG.icaweights and EEG.icasphere), and apply them to the non-downsampled data (see the sketch after this list). Interestingly, in this case the number of datapoints is severely lower than the number recommended above, but in most cases it is fine (I did encounter several cases where AMICA failed due to downsampling; in those cases, AMICA worked fine using the original 256-Hz sampling rate, so when coding you might want to use try...catch). Judging from this, after all it is NOT the absolute number of data points that determines the quality of the decomposition, but probably the length of actual recorded time. If you think about using a very high sampling rate (10 kHz, for example) just to gain data points to improve ICA results... that does not seem to work to me. For this reason, using a sampling rate higher than 256 Hz is probably not very helpful to improve the quality of the ICA decomposition.
  • You may want to reject epochs based on IC activations and run ICA again (Delorme et al., 2007). This is because it frees ICA from data points that are difficult to explain. A good principle is to feed ICA only with typical (i.e., stationary) data. If you run ICA again on this cleaned data, theoretically you can improve the ICA quality, but in practice the difference is at placebo level unless you reject about 1/3 of the data (but remember, due to ICA's bias toward amplitude, rejecting high-amplitude noise/artifact is always immediately helpful, even if it is 0.001% of the total data length. Note that by high amplitude I do not mean eye-blink artifacts, which are still benign; what I mean here are artifacts of non-biological origin with > 1000-2000 uV). In general, ICA results are more robust than we think. If your data are already very clean, it is usually impossible to further improve the ICA results.
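
Here is the sketch referenced in the downsampling bullet above: learn the ICA matrices on a copy downsampled to about 100 Hz, then copy them back to the original-rate dataset. This is a rough illustration under the assumptions stated there, not a tested recipe.

% Learn ICA on an aggressively downsampled copy, then apply it to the original data.
EEGdown = pop_resample(EEG, 100);
EEGdown = pop_runica(EEGdown, 'extended', 1);
EEG.icaweights  = EEGdown.icaweights;
EEG.icasphere   = EEGdown.icasphere;
EEG.icachansind = EEGdown.icachansind;
EEG = eeg_checkset(EEG, 'ica');   % recompute EEG.icawinv and EEG.icaact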

Rejecting ICs to improve the next ICA? Or what is rank and linear algebra (07/16/2018 updated)

Rejecting ICs to improve the next ICA is the idea that keeps alluring many of us but never works. Let us confirm the process and the reason why it does not work.

  1. You have 100ch data.
  2. You run ICA on it to obtain 100 ICs.
  3. You identify 1 eye-blink component.
  4. You reject it. Now you have 99 ICs.
  5. You back-project the 99 ICs to obtain 100ch reconstructed scalp channel signals.
  6. You run ICA again on the reconstructed 100ch data, hoping to obtain another 100 ICs. The new ICA result should be cleaner because this time ICA is free from eye-blink artifact.
  7. You find that the 100-ch ICA somehow gives you only 99 ICs. Certainly the blink IC is gone, but the remaining 99 ICs are identical to before.

Where did you fail? It is when you thought you could obtain another set of 100 ICs. Below we see this problem in detail.

Usually, your 100-ch recording has rank 100 (to see it, execute rank(double(EEG.data(:,:))) on the Matlab command line) unless some of your channels are bridged/shorted or average reference was applied. If two channels are bridged, their signals become identical. Clearly, this is not a useful way to use multiple channels. The important principle here is that for channel data to be meaningful, they should not be explainable as a combination of the others (being identical is an extreme case). To see this more clearly, let me give you a simpler example below.

  1. You have Fz, Cz, and Pz data.
  2. You reject Cz data.
  3. You interpolate Cz with 50% of Fz and 50% of Pz.

In this case, you probably see that signal at Cz is not that useful, since it is just made up. In this case, there are 3 channels, but only with rank of 2. By the way, 3 channels with rank of 3 is called full rank, and any state where rank < number of channels is called rank deficient.
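
A tiny numerical illustration of this point (random data, purely for demonstration):

% Three 'channels' of random data have rank 3; replacing the middle one with a
% 50/50 mix of the other two drops the rank to 2.
data = randn(3, 1000);
rank(data)                                   % returns 3 (full rank)
data(2,:) = 0.5*data(1,:) + 0.5*data(3,:);   % 'interpolate' the middle channel
rank(data)                                   % returns 2 (rank deficient)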

Let's stick to this example to explore further. Imagine that you run ICA on Fz, Cz, and Pz to obtain 3 ICs. You know ICA's performance often goes beyond our intuition to almost a magical level (as Clarke said, Any sufficiently advanced technology is indistinguishable from magic). But do you know where the three ICA activations come from? They are not out of thin air, but they are just the result of re-blending the three scalp channel values! For example, IC 1 is 29% of Fz, 33% of Cz, and 38% of Pz, and so on.

  • Q1. Can such simple re-blending really reveal the allegedly effective EEG source activations?
  • A1. Yes, although to claim that they are effective EEG sources, several assumptions are needed, such as spatial stationarity of the sources.
  • Q2. Who determines these blending percentages?
  • A2. That is ICA's primary output.
  • Q3. Where can I find these blending percentages?
  • A3. Each column of EEG.icawinv gives the one-IC-to-all-channels projection ratios (not exactly percentages), and each row of the unmixing matrix (EEG.icaweights*EEG.icasphere) gives the ratios with which the channel values are re-blended to obtain that IC's activation.

This blending process is called linear transform. The same applies not only to ICA but also to any kind of linear method such as PCA, Non-negative matrix factorization, etc. Wavelet transform is also a type of linear transform, as it reconstructs a given 1-D signal with a blend of basic waves (wavelet daughters) with different frequencies.

Now, let us proceed further. We have 3-channel data Fz, Cz, and Pz, but Cz is an interpolation of 50% of Fz and 50% of Pz. If we run ICA on this three-channel data, what happens? Do you think ICA will still give you three independent components, even though your Cz is a mere blending of Fz and Pz? Now think about the meaning of the word independent. If you have the signals at Fz and Pz and their blending ratio, you can predict Cz's behavior with 100% accuracy. You probably see that Cz does not carry any unique information at all. Rank deficiency indicates this kind of situation: namely, your number of channels is inflated due to redundancy.

By the way, in the case of average referencing, the data redundancy is not concentrated in one channel but, in a sense, uniformly spread across all the channels, which is why it is hard to see why average referencing causes rank deficiency. In this case, think about what happens when you perform average referencing on two-channel data, Fz and Pz: after subtracting the average reference, they will be identical, just with reversed polarity.

If you want, I can give you a more extreme example: instead of blending Fz and Pz with a 1:1 ratio, we can set the blending ratios to 0.99:0.01, 0.98:0.02, 0.97:0.03, ... to generate 100-channel data. As you see, you can generate as many channels as you want in this way. However, even if you have 100-ch data generated in this way, unique information is present only in Fz and Pz. You can't create information out of nowhere.

Ok, let's get back to the current question: if we run ICA on 3-channel rank-2 data, what happens? The answer is, you obtain only 2 ICs. The number of ICs you obtain is equal to the rank of the data, not the number of channels (as we see, the number of channels could be easily inflated by generating redundancy). This explains the very original question of why the second ICA returns only 99 ICs instead of 100 ICs. When you rejected the blink IC, the back-projected scalp channel data is no longer full-ranked, but it became rank deficient by 1. Because ICA cannot generate independent component out of nowhere (it is NOT a real magic after all!), it will give you the same number of ICs as the data rank. By the way, the rank check is not ICA's inherent property. In the realistic implementation, rank check is performed by other functions. If ICA is set to run to generate 100 ICs from rank 99 data, it either stops in failure or generate weird result (see 'ghost ICs' explained below).

Now, the last question: why do the 99 ICs remain the same? If you have read all the explanations above, you should now know what ICA cares about and what it does not. ICA does not care about the superficial representation on the scalp channels, in the sense that the channel data are just one of the infinite linear combinations of the source activations (imagine your electrode cap, which has 128 channels covering the entire scalp, is rotated slightly in an arbitrary direction so that all the channel positions are shifted into nameless positions. In this situation, ICA will still give you the same source locations with the same source activations. ICA sees the channels as projections of sources.) Instead, ICA cares about what is independent across channels. Because the identified blink IC is (theoretically) independent of all other ICs, the presence or absence of the blink IC does not affect the state of the other ICs, hence the decomposition is unaffected.

Tips for copying ICA results to other datasets (06/26/2018 updated)

  • You can copy a set of ICA-related matrices, i.e., EEG.icaweights and EEG.icasphere, to other datasets. The EEGLAB GUI supports this process. Thus you can, for example, learn the ICA matrices from aggressively cleaned data while applying the results to mildly cleaned data to save data points.
  • In doing so, however, if noise is present which ICA did not learn, the noise will be present in the ICA results.
  • However, make ABSOLUTELY sure that the two datasets match on the following points.
  1. They have the same number of channels in the same order.
  2. They have the same (re-)referencing status.

The problem here is that even if you make a serious mistake, EEGLAB does not let you know about it. Therefore, you have to check it yourself carefully. To confirm that you did not make a mistake, use [Plot] -> [Component activations (scroll)] to see the IC activations. If the copying process was successful, you will see the variances (i.e., amplitudes) of the ICs sorted as usual, the largest at the top and the smallest at the bottom. Note that the scalp maps of the ICs (which are EEG.icawinv itself) look normal even if you have this problem, so do not be misled: always check the IC activations. If the copying process failed, you will notice the following symptoms.

  1. IC activations are not sorted in the order of variance i.e., it does not follow the rule that largest on top and smallest at bottom.
  2. Multiple IC activations are temporally correlated (often a pair with polarity inversion)
  3. Some of IC activations, which is not necessarily top-ranked, shows abnormally large amplitudes.

See below for the example.

Figure 4. Corrupted data due to failure in copying ICA weight matrix.

  • There is a paper claiming that ICA (actually, any linear decomposition) distorts phase (Montefusco-Siegmund et al. (2013). Effects of ocular artifact removal through ICA decomposition on EEG phase.) Some people hence claim that one should avoid using ICA at all. This claim is illegitimate (as Arno says 'resounding no' in his page linked below). In this wiki page, my colleagues and I illustrated the mathematical process of how phases are synthesized before and after component rejection to show that 'this is a due change in phase and nothing surprising'. See also this page written by Arno showing empirical demonstration that rejects this claim.

To learn how to evaluate EEG and artifact ICs (01/24/2017 updated)

Please visit this site to start the training. This nice project was designed by my colleague Luca Pion-Tonachini.

Unofficial instruction to install AMICA (07/17/2018 updated)

Download AMICA from EEGLAB extension manager [File]-> [Manage EEGLAB extensions] -> [Data processing extensions] -> Check AMICA to Install. Below I show old information.

  • Use AMICA developed by former SCCN colleague Jason Palmer. AMICA does better job than Infomax (see Delorme et al., 2012).
  • Below I show an example of a successful installation for Windows 7 64-bit. I also heard a report of a successful installation on Mac OS Sierra on Core i7, using Matlab 2014b and EEGLAB 13_6_5b. This does not mean EEGLAB 13 is more stable for this purpose; use the current version of EEGLAB.
  • DO NOT use characters other than ALPHABET and NUMBERS in the file path. For example, if you have a space in the file path, IT WILL FAIL! (Thank you Xuan Tran for helping me to find this out)
Unofficial instruction to install AMICA for 64bit Windows.
1. Go to Jason Palmer's website to find the links to the files.
2. Create a folder named 'amica' under /eeglab/plugins
3. Download and copy the following 5 items to /eeglab/plugins/amica
   -eegplugin_amica.m
   -pop_runamica.m
   -runamica15.m
   -loadmodout15.m
   -amica15mkl.exe (Windows 7/10 64-bit binary)
4. Download and copy the following 2 items to /Windows/System32
   -fmpich2.dll (Windows DLL, save in directory with binary)
   -libiomp5md.dll (Windows DLL, save in directory with binary)
5. Download and execute mpich2-1.4-win-x86-64.msi (for 64bit Windows) from http://www.mpich.org/static/downloads/1.4/
   Double-click the downloaded .msi file to install it.
6. Install postAmicaUtility() plugin from EEGLAB plugin manager. Otherwise, there is no GUI button to load AMICA results.
   Below I show an example code to load AMICA results from the command line. Note that the variable 'dataName' is the name used for the AMICA result output folder.
   EEG.etc.amicaResultStructure = loadmodout15(['/data/projects/example/amicaResults/' dataName]);
   EEG.icaweights = EEG.etc.amicaResultStructure.W;
   EEG.icasphere  = EEG.etc.amicaResultStructure.S;
   EEG = eeg_checkset(EEG, 'ica');

Estimate single equivalent current dipoles

'Tools' -> 'Locate dipoles using DIPFIT 2.x' -> 'Autofit (coarse fit, fine fit & plot)'. You need to run the coarse fit first and the fine fit next, but this 'Autofit' does both together anyway. Note that the 'Rejection threshold RV (%)' here should be 100, otherwise you'll have trouble making the STUDY later. Do not change anything else in the GUI and just press 'OK'.
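
For batch processing, the Autofit step corresponds roughly to pop_dipfit_settings() followed by pop_multifit(). The template file names below are my assumptions about the standard BEM files shipped with the dipfit plugin, and the dataset-specific coregistration ('coord_transform') is omitted, so treat this as a sketch.

% A sketch of DIPFIT settings plus autofit with an RV threshold of 100% (paths assumed).
dipfitPath = fileparts(which('dipfitdefs'));
EEG = pop_dipfit_settings(EEG, ...
    'hdmfile',  fullfile(dipfitPath, 'standard_BEM', 'standard_vol.mat'), ...
    'mrifile',  fullfile(dipfitPath, 'standard_BEM', 'standard_mri.mat'), ...
    'chanfile', fullfile(dipfitPath, 'standard_BEM', 'elec', 'standard_1005.elc'), ...
    'coordformat', 'MNI', 'chansel', 1:EEG.nbchan);
EEG = pop_multifit(EEG, 1:size(EEG.icaweights,1), 'threshold', 100, 'dipplot', 'off');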

In the results, there is a known issue of 'too deep dipoles'. A small portion of dipoles is often estimated in the corpus callosum, basal ganglia, or cerebellum. In order to observe measurable EEG on the scalp, a group of neurons needs to be organized cyto- and chrono-architecturally to fire in sync, but these regions do not have such a mechanism, so these dipole locations are impossible. You'd better admit that these localization results are electrophysiologically invalid, and give an interpretation only after pulling their positions toward the surface appropriately (i.e., using some physiological fact or EEG-literature evidence). Their raw positions still work fine as a clustering constraint, since physiological validity does not matter there. Personally, I think this kind of systematic error and bias derives from a clear reason; once it is clarified, we can apply a correction systematically.

Search for and estimate symmetrically constrained bilateral dipoles

Download the fitTwoDipoles() plugin for EEGLAB and run it after fitting single dipoles. It is compatible with eegh, and you can make a batch. The code is like this: EEG = fitTwoDipoles(EEG, 'LRR', 35); Do not forget to cite Caterina's paper.

Create STUDY (01/05/2017 updated)

When creating STUDY with epoched data (assuming that you have within-subject conditions), I recommend that you use one set file per subject and never save the data separately for each condition (which is an old approach used before STUDY.design was implemented). In theory both works, but there were reports of problems caused by separate .set files. I recommend you use STUDY.design to let STUDY know the factorial design of the analysis.

When creating a STUDY, I usually load all the .set files into EEGLAB using pop_loadset() with 'loadmode', 'info' (it may be possible to do this from the GUI too, but it is convenient to set EEG.subject and EEG.group altogether at this point). Importantly, creating a STUDY cleans your data, because it performs IC rejection based on two criteria: 1) dipoles with > 15% residual variance--Artoni et al. (2014) showed (indirect) evidence of why this value is about optimal; 2) dipoles outside the brain--no argument. This is why you don't need to bother manually rejecting ICs at the individual level. Don't forget to set STUDY.design before you start the precompute. After the precompute, you precluster the ICs by choosing clustering criteria.
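
A hedged sketch of that loading step is shown below; the subject names, file path, and STUDY name are all placeholders, and the std_editset() call is minimal, so you will still set STUDY.design and run the precompute/preclustering afterwards.

% Load header info of per-subject .set files and build a minimal STUDY (names/paths hypothetical).
subjectList = {'subj01', 'subj02', 'subj03'};
ALLEEG = [];
for subjIdx = 1:length(subjectList)
    EEG = pop_loadset('filename', [subjectList{subjIdx} '.set'], ...
                      'filepath', '/data/project/', 'loadmode', 'info');
    EEG.subject = subjectList{subjIdx};
    [ALLEEG, EEG] = eeg_store(ALLEEG, EEG, 0);
end
[STUDY, ALLEEG] = std_editset([], ALLEEG, 'name', 'exampleStudy', 'updatedat', 'off', 'commands', {});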

Avoid double dipping (03/07/2019 updated)

If you are going to test ERP/ERSP/ITC/PSD differences (for example) across conditions at the end of the day, DO NOT choose any of them to cluster ICs. This problem is called 'double dipping', a type of circular inference that is known to inflate the false positive rate. If you have not heard of the terms 'circular inference', 'double dipping', or 'voodoo correlation', see Kriegeskorte et al., 2008, Kriegeskorte et al., 2009, and Ed Vul's voodoo correlation site. Note that these were published more than 10 years ago. Note also that many more fMRI researchers are aware of the issue than EEG researchers, so 'I have never heard of it from any of my colleagues' cannot be an excuse. Ok, so why can you not use ERP to cluster ICs and then compare ERP amplitude across conditions? If I understand the problem correctly, there are at least two issues:

  1. The resultant clusters have, by definition, the smallest variance in the given measure (i.e., ERP) within a group. Given that the mean values remain the same, smaller variance means an increased probability of getting significant results, i.e., inflation of the Type I error rate (false positives).
  2. I have heard some say that the above problem can be avoided by using a non-parametric test. Fine. However, before performing the actual inferential statistics, you form IC clusters, from which ICs with dissimilar ERP waveforms are kicked out (because that is what such clustering does). But is it good scientific practice to kick out data points just because they are 'against your hypothesis/assumption'?

Then, how can we avoid this issue? The answer is simple: USE ONLY DIPOLE LOCATIONS to cluster ICs, and never use any measure that is derived from time/frequency domains. EEGLAB does allow users to choose multiple clustering criteria, which is misleading. If you do not want to experience hard time to explain to your reviewers why you chose ERP/ERSP/ITC/PSD to cluster ICs, just use dipole locations.
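
A hedged sketch of preclustering and clustering by dipole locations only follows; the weight and the number of clusters are arbitrary illustrations, not recommendations.

% Precluster using dipole locations as the only criterion, then k-means cluster.
[STUDY, ALLEEG] = std_preclust(STUDY, ALLEEG, 1, {'dipoles', 'weight', 1});
STUDY = pop_clust(STUDY, ALLEEG, 'algorithm', 'kmeans', 'clus_num', 20);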

Can we determine the optimum number of IC clusters? (01/14/2020 added)

For Fran and Seyed, I would like to share my solution. But this is usable only for the case where you have only dipole locations as a clustering criterion. Sorry it is not nicely wrapped up into an EEGLAB plugin; you have to run some code, which is a bit too bulky to paste here, so please refer to this page.

Why does using dipole locations not cause double dipping? (03/07/2019 updated)

Dipole location is estimated from an IC scalp topography, which is a direct property of ICA (it is a column of EEG.icawinv). Why do I say that using dipole locations is safe in terms of double dipping? You can empirically prove it with a simple experiment: randomize the order of channel assignments before you perform ICA. What do you get after ICA? You will get identical results in the time series and power spectral density. However, you will get randomized scalp topographies. What does this mean? It is proof that ICA is agnostic about spatial information. Therefore, using dipole locations estimated from scalp maps does not cause double dipping when testing time-domain and frequency-domain data.
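If you want to try the experiment yourself, here is a minimal sketch (it assumes a loaded dataset EEG with channel locations; the ICA options and the number of plotted ICs are arbitrary):

% Shuffle the channel-to-data assignment while leaving the channel locations
% untouched, then run ICA on the shuffled copy.
shuffleIdx           = randperm(EEG.nbchan);       % random channel reassignment
EEGshuffled          = EEG;
EEGshuffled.data     = EEG.data(shuffleIdx, :, :); % shuffle data rows, keep chanlocs as-is
EEGshuffled          = pop_runica(EEGshuffled, 'icatype', 'runica', 'extended', 1);
% The IC activations and their spectra match the unshuffled decomposition
% (up to IC order and sign); only the scalp maps, i.e. the columns of
% EEGshuffled.icawinv, look scrambled on the original montage.
pop_topoplot(EEGshuffled, 0, 1:10);                % 0 = plot components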

A side note--here, an interesting question is 'Then, why are ICA results always associated with nice dipolar scalp topographies?' Indeed, in Delorme et al. (2012), they even made the claim that the more mutual information is reduced, the more dipolar the scalp topographies become. Let us tentatively call it 'independence-dipolarity identity (IDID)'. As I mentioned, because ICA is agnostic about spatial information, dipolarity of the scalp topographies is not a direct property of ICA itself. Thus, an explanation for IDID is non-trivial. Unfortunately, SCCN has not published any 'official' explanation for this, but instead has been using it as proof that ICA meets physiology. It seems convincing, but to really convince researchers from other fields, we will need good supporting evidence in electrophysiology.

A tip to compute time-frequency transform (i.e. ERSP & ITC) (01/11/2017 updated)

EEGLAB uses its own way to define the wavelet basis. Technically it is a mixture of short-term Fourier transform and wavelet transform, since the cycle number varies by default (for detail, see the archived communication starting with this. This was the best discussion ever in the history of the mailing list.) Now, the problem is how to set the parameters to perform it properly. Of course the best parameters depend on your application, but such a logically self-evident statement carries no information. I recommend 3 cycles at 3 Hz for the lowest frequency, and about 7-10 cycles at 50 Hz (or 15-20 cycles at 100 Hz, if 100 Hz is the maximum frequency you want to analyze.) I say this because I remember people used 7 cycles to analyze 40-Hz gamma in the early 2000s. This 'number of cycles' parameter is not very sensitive at high frequencies, and a difference of 2 or 3 would not make a noticeable difference. I copied some slides from my previous EEGLAB workshop below, including a comparison between 5 cycles and 25 cycles at 50 Hz. Overall, my advice is: do not be too nervous about it.


Slide1.jpg Slide2.jpg Slide3.jpg
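As a starting point for the parameters discussed above, here is a minimal sketch of a single-IC call (the IC index, frequency range, and baseline are hypothetical; [3 0.5] is the usual EEGLAB-style hybrid setting, 3 cycles at the lowest frequency with the cycle number expanding toward higher frequencies--see 'help newtimef' for the exact meaning of the second value, and increase it if you want more cycles at the top frequency):

% ERSP/ITC for IC no. 1 with 3 cycles at 3 Hz and expanding cycles above that.
figure;
pop_newtimef(EEG, 0, 1, [EEG.xmin EEG.xmax]*1000, [3 0.5], ...  % 0 = component, IC no. 1
    'freqs', [3 50], 'baseline', 0, 'padratio', 2);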

Alternative, more automated pipeline using ASR (11/13/2019 updated)

One of the drawbacks of the abovementioned cleaning approach is that if a short burst of artifact appears in only one of, say, two hundred channels, you still need to give up the entire epoch. The more channels you have, the more probable it is that you encounter this issue, like a false positive rate that inflates as you increase the number of tests, so you need to reject more epochs... do you think this is a good deal? Did you spend money to buy a recording system with more channels just to reject more data epochs? Something is wrong... if you agree with this argument, using Artifact Subspace Reconstruction (ASR) is a good way to address this issue. I put a summary of ASR on a different page for details.

Suggested preprocessing pipeline (11/27/2019 updated)

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() to reject bad channels and correct continuous data using Artifact Subspace Reconstruction (ASR).
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  11. Estimate single equivalent current dipoles
  12. Search for and estimate symmetrically constrained bilateral dipoles
  13. Generate probabilistic IC labels using IClabel() for IC rejection
  14. Epoch the IC-rejected data from -1 to 2 sec relative to event onset (up to here is the subject-level process)
  15. Create the final STUDY specifying the full STUDY.design (this is the group-level process using STUDY; a code sketch of the subject-level steps follows this list)
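Here is a minimal single-subject sketch of steps 3-10 (the file name, line frequency, channel-location template, and most parameter values are hypothetical, and the pop_clean_rawdata() and pop_cleanline() key/value names are given as I remember their current interfaces; it assumes the CleanLine and clean_rawdata plugins are installed):

% Steps 3-10 for one subject; adjust every value to your own recording.
EEG = pop_loadset('filename', 'subj01.set', 'filepath', '/data/myProject/'); % hypothetical file
EEG = pop_resample(EEG, 250);                                  % 3. downsample if necessary
EEG = pop_eegfiltnew(EEG, 1, []);                              % 4. 1-Hz high-pass filter
EEG = pop_chanedit(EEG, 'lookup', 'standard-10-5-cap385.elp'); % 5. import channel info (template path may differ)
EEG = pop_cleanline(EEG, 'linefreqs', [60 120]);               % 6. remove line noise (use [50 100] for 50-Hz mains)
originalChanlocs = EEG.chanlocs;                               % keep the full montage for interpolation
EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion', 'off', ...   % 7. reject bad channels and run ASR
          'ChannelCriterion', 0.8, 'LineNoiseCriterion', 'off', ...
          'Highpass', 'off', 'BurstCriterion', 20, 'WindowCriterion', 0.25);
numRemovedChans = length(originalChanlocs) - EEG.nbchan;       % channels rejected in step 7
EEG = pop_interp(EEG, originalChanlocs, 'spherical');          % 8. interpolate removed channels
EEG = pop_reref(EEG, []);                                      % 9. re-reference to average
dataRank = EEG.nbchan - numRemovedChans;                       % interpolation does not restore rank
EEG = pop_runica(EEG, 'icatype', 'runica', ...                 % 10. ICA with rank-matched PCA
          'extended', 1, 'pca', dataRank);                     %     (or runamica15 with 'pcakeep')
% 11-15. Dipole fitting (dipfit), fitTwoDipoles(), IClabel()-based IC rejection,
% epoching, and STUDY creation then follow as described in the text.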

The control over clean_rawdata() was transferred to the main developer of EEGLAB. The main GUI no longer looks like this, but it should be more or less the same. I'll update it someday.

Figure 5. Sample parameter settings for pop_clean_rawdata.

  • The first box is disabled (-1) because I assume there are no flat channels. This can be turned on if you want.
  • The second box is disabled (-1) because a FIR high-pass filter has already been applied. I think this plugin applies an IIR filter, since this cleaning solution was originally designed for real-time online data processing.
  • The third box means that if a channel correlates with the surrounding channels at less than 0.8, the channel is rejected. If it rejects too many channels, use a lower value.
  • The fourth box is disabled (-1) because line noise is supposed to be already taken care of.
  • The fifth box is the ASR. This is what it does.
  1. Find calibration data (i.e. the cleanest part of data)
  2. Apply 0.5-s sliding window principal component analysis (PCA) on the continuous data.
  3. Classify PCs into high variance (in this case, 8 standard deviations from the calibration data) or normal variance.
  4. Reconstruct the high-variance subspace (i.e. a group of channels associated by a PC) from the normal-variance subspace. An important assumption is that neighboring channels are heavily correlated due to volume conduction and scalp mixing (by the way... because of this, the central limit theorem works and the scalp channel signals become closer to Gaussian than source signals; this guarantees ICA. See also Arnaud Delorme's explanation of ICA as non-Gaussianity maximization). This is how one can estimate the signal of one channel from the signals of adjacent channels. From my experience, I would advise starting with a very lax value, such as 10-20, to use it sparingly and correct only the absolutely bad glitches that critically influence ICA's performance. Nima's result showed that ASR with SD=20 produced the best result. My interpretation is that we should let ICA explain the data as much as possible, so the correction using this PCA-based method should be minimized, and the empirical balancing point is somewhere around 10-20.
  • The sixth box is the window rejection criterion. If more than 0.25 of channels are judged to be bad even after ASR, the window will be rejected.

When using ASR, the following is a good guide to control the total amount of rejection.

  • You need more channels -> Use lower values in the third box
  • You need more data points -> Use lower values in the fifth box (i.e. correct more using ASR) and use higher values in the sixth box (i.e. reject fewer windows); see the sketch below.
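In code form, a 'keep more data' setting along these lines would look like this (the parameter names assume pop_clean_rawdata()'s key/value interface; the values are arbitrary examples):

% Laxer channel criterion (keep more channels), more aggressive ASR correction,
% and a laxer window criterion (reject fewer windows).
EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion', 'off', 'Highpass', 'off', ...
          'LineNoiseCriterion', 'off', ...
          'ChannelCriterion', 0.7, ...  % third box: lower -> fewer channels rejected
          'BurstCriterion', 10, ...     % fifth box: lower -> ASR corrects more
          'WindowCriterion', 0.3);      % sixth box: higher -> fewer windows rejected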

Important note: clean_rawdata() generates channel and datapoint masks that indicate rejections. They are stored under EEG.etc.

How to perform automated IC selection using IClabel() (11/13/2019 updated)

IClabel() is a relatively new EEGLAB plugin written by my former colleague Luca Pion-Tonachini (you pronounce it tonakini, not tonacheeni). I found this plugin wonderful. I thought I could finally replace my manual component selection process with this completely automated process. Here, I show example code for selecting brain components with label probability >= 0.7 and residual variance < 0.15. In case you do not know what 'residual variance' means, see the next subsection.

% Perform IC rejection using ICLabel scores and r.v. from dipole fitting.
EEG       = iclabel(EEG, 'default');  % the plugin function is iclabel()/pop_iclabel()
brainIdx  = find(EEG.etc.ic_classification.ICLabel.classifications(:,1) >= 0.7); % column 1 = 'Brain' probability
rvList    = [EEG.dipfit.model.rv];    % residual variance from dipfit (requires prior dipole fitting)
goodRvIdx = find(rvList < 0.15);      % ICs whose dipole explains >85% of the scalp map variance
goodIcIdx = intersect(brainIdx, goodRvIdx);
EEG = pop_subcomp(EEG, goodIcIdx, 0, 1);  % last argument 1 = KEEP the listed ICs, remove the rest
EEG.etc.ic_classification.ICLabel.classifications = EEG.etc.ic_classification.ICLabel.classifications(goodIcIdx,:);

What does 'residual variance' mean? (11/13/2019 updated)

'Residual variance' is typical Swartz Center jargon (other examples include 'PVAF', 'dipolarity', and 'temporally maximally independent'.) But the problem with 'residual variance' is, I believe, that it is underdescribed. For example, residual from what operation? Variance across what? In fact, until several years ago I probably could not have described how it is calculated (another such example is IC ERP... you will probably be surprised to learn how complicated it is.) I created a slide to explain the concept of 'residual variance'.

Dipolarity.jpg

To answer my previous question, 'residual' means the difference between the 'IC scalp projection' and the 'theoretical dipole projection', and 'variance' means 'variance of the differences across all the scalp channels'.

However, another layer of difficulty is to see the relations among the IC, the equivalent current dipole, and its scalp projection. ICA generates a mixing matrix (A in X=AS) and an unmixing matrix (A^-1 in S=A^-1 X). The scalp projection of an IC is just a visualization of a column of the mixing matrix A, which EEGLAB stores as EEG.icawinv (the inverse of the unmixing matrix): you tile the values in this column in the order of your registered channels and you obtain the IC scalp map. The equivalent current dipole is fitted to this IC scalp map. After fitting the dipole, you can calculate the noiseless scalp projection from the estimated dipole. Finally, the noiseless scalp projection is compared with the IC scalp map to calculate residual variance in the following way.

residual_variance = var(IC_scalp_map - noiseless_scalp_projection)/var(IC_scalp_map)
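As a toy numeric illustration of this formula (the vectors below are made up, not taken from a real dipole fit):

% 32 'channels', a pretend dipole forward projection, and an IC scalp map that
% the projection explains except for a small mismatch.
numChan       = 32;
dipProjection = randn(numChan, 1);                       % pretend noiseless dipole projection
icScalpMap    = dipProjection + 0.1*randn(numChan, 1);   % IC scalp map = projection + mismatch
residualVar   = var(icScalpMap - dipProjection) / var(icScalpMap)  % small, roughly 0.01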

Comments on 03/16/2018

As of today, people use ASR more and more.

  • First, I use it all the time now, no exception. This means at least one experienced user found it useful.
  • Second, I began to explain ASR in the official EEGLAB workshops since last year (for example, see this pdf--this is the file I distributed in the 25th EEGLAB workshop at JAIST Tokyo satellite, Tokyo.)
  • Third, the main EEGLAB developers Arnaud and Ramon are writing a technical paper to demonstrate SCCN's data preprocessing, in which they decided to use ASR (it's in preparation, not published yet).
  • Fourth, people told me that they use ASR and they saw others use ASR as well.

Despite its increasing popularity, the mechanism is poorly understood. I learned that people use it without knowing what it does. This is partly because ASR is inherently complicated, partly because there is no dedicated technical paper available, partly because the published material never provides a full description, and partly because people get used to using a black box as a black box... As of today, the reference that is both informative and citable about ASR is probably Mullen et al. (2015). However, 'real-time wireless dry' does not sound very associated with ASR. A more detailed description of ASR will be published in the near future (Loo et al., under review), but the title will be even less associated with ASR. I hope this wiki page serves for the time being as a 'bridge' for those who are seeking ASR information, until my colleagues Chiyuan Chang and Shawn Hsu publish the dedicated technical paper about ASR (it's in prep).

ASR FAQ (updated 10/15/2018)

  • Q1. In the screenshot below, why did the quiet portion between 0.4-0.5 need to be removed?

QandA001.jpg

(The screenshot by courtesy of Małgorzata Plechawska-Wójcik)

  • A1. Three possible problems:
  1. The data are noisy all the time, so 'calibration data' (i.e., the cleanest part of the data) also contains noisiness.
  2. The left-tail cutoff is too high (default -3.5) in RMS Z-score distribution.
  3. The low-frequency drift is present.

I will explain each of them below.

1) The 'calibration data' (i.e., the cleanest part of the data) that ASR chose at the very initial stage of the process contains higher amplitudes on average than these 'quiet' chunks. So it is relative to within-data variance; if your data are generally noisy, you will see more rejections of these 'quiet' chunks.

2) Let me explain the mechanism to begin with. The final window rejection is performed in the following way:

  1. Apply the custom heuristic IIR spectral weighting (8th-order Yule-Walker). See the screenshot below (taken from Christian's PPT). 8thYuleWalker.jpg
  2. Take root-mean-square (RMS) amplitude of the filtered data.
  3. Perform Z-score transform.
  4. Mark data points with Z-scores outside the range [-3.5, 7.0] as still noisy.

Notice that a Z-score below -3.5 is determined to be 'still noisy'. If you don't want to discard any data chunk for being 'too quiet' (RMS variance too small) at all (which is understandable, as far as I can see from the above example), please change the value [-3.5] into [-7.0], [-10.0], or even [-inf] (i.e., negative infinity) at clean_artifacts() line 184. You may also want to make the same change in line 183 for consistency.

 clean_artifacts()
 line 183
 Before: {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-3.5 5.5], ...
 After : {'burst_crit_reftolerances','BurstCriterionRefTolerances'}, [-inf 5.5], ...
 line 184
 Before: {'window_crit_tolerances','WindowCriterionTolerances'}, [-3.5 7], ...
 After : {'window_crit_tolerances','WindowCriterionTolerances'}, [-inf 7], ...

This will stop ASR from rejecting chunks of data for being too 'quiet' (RMS variance too small).

3) The custom heuristic IIR spectral weighting (8th-order Yule-Walker) has quite a steep (inverse) drop near DC, which does not exactly follow a 1/f pattern. This means that a more aggressive, higher cut-off frequency is justified when designing the high-pass filter for DC suppression before this filter. Maybe the remaining low-frequency drift, emphasized by this heuristic IIR filter, went over the rejection criterion.

  • Q2. I do not want to reject this much data. What parameter should I change?
  • A2. If you use a lax threshold for the ASR SD, e.g., 8 -> 20, then ASR will clean less data, and therefore more windows will be left for the final window rejection. Another factor controlling the window rejection rate is the final 'window rejection' itself. 0.3 means 'If more than 30% of channels still show above-threshold RMS amplitudes, reject this window.' Therefore, if you want to reject less data, you should set this value high. If you set it to 1, no window will be rejected, so your data length will be the same before and after cleaning.

How to determine the parameter for ASR (updated 09/28/2018)

There is a dedicated study by my colleague Chi-Yuan Chang at SCCN. Let me cite Chi-Yuan's conclusion here: 'Empirical results show that the optimal ASR parameter is between 10 and 100, which is small enough to remove activities from artifacts and eye-related components and large enough to retain signals from brain-related components.' If you hear SD == 10, it sounds crazily far from the mean, but for the particular distribution of RMS channel EEG this is the value (hence 'Empirical results show...'). Also, in our unpublished study done by Nima, ASR with SD == 20 showed the best cleaning performance in terms of his robust mutual information reduction measure, compared with another parameter (ASR with SD == 40) and other non-ASR methods. So I would recommend something between 10 and 20.
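If you want to apply just the ASR step with such a cutoff, a minimal call would be the following (it assumes the clean_rawdata plugin, which provides clean_asr(), is installed; the cutoff is its second input):

% Apply ASR alone with an SD cutoff of 20, as discussed above.
EEG = clean_asr(EEG, 20);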

Further optimizing the preprocessing order? (02/28/2020 updated)

You may find a lot of 'alpha flooding' (Makeig and Onton, 2011) across all channels in clinical data, in which case ASR rejects a lot of windows. In this case, performing average referencing before ASR may help. The suggested preprocessing order would then be:

  1. Change the option to use double precision
  2. Import data
  3. Downsample if necessary
  4. High-pass filter the data at 1-Hz (for ICA, ASR, and CleanLine)
  5. Import channel info
  6. Remove line noise using CleanLine
  7. Apply clean_rawdata() only to reject bad channels.
  8. Interpolate all the removed channels
  9. Re-reference the data to average
  10. Apply clean_rawdata() only to perform Artifact Subspace Reconstruction (ASR). (A code sketch of this split follows the list.)
  11. Re-reference the data to average again--this is to reset the data to be zero-sum across channels again.
  12. Run AMICA using calculated data rank with 'pcakeep' option (or runica() using 'pca' option)
  13. Estimate single equivalent current dipoles
  14. Search for and estimate symmetrically constrained bilateral dipoles
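Steps 7-11 of this modified order, in code form, would look roughly like this (a sketch only; it assumes pop_clean_rawdata()'s key/value interface, that the full montage is saved before channel rejection, and the same hypothetical parameter values used earlier):

originalChanlocs = EEG.chanlocs;                               % full montage before rejection
EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion', 'off', ...   % 7. channel rejection only
          'ChannelCriterion', 0.8, 'LineNoiseCriterion', 'off', ...
          'Highpass', 'off', 'BurstCriterion', 'off', 'WindowCriterion', 'off');
EEG = pop_interp(EEG, originalChanlocs, 'spherical');          % 8. interpolate removed channels
EEG = pop_reref(EEG, []);                                      % 9. average reference
EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion', 'off', ...   % 10. ASR (and window rejection) only
          'ChannelCriterion', 'off', 'LineNoiseCriterion', 'off', ...
          'Highpass', 'off', 'BurstCriterion', 20, 'WindowCriterion', 0.25);
EEG = pop_reref(EEG, []);                                      % 11. re-reference to zero-sum again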

Determining which preprocessing order is better, the standard one or the modified one, requires an empirical study with a good amount of data and computation.

For purists who do not want to use any form of interpolation (03/16/2018 updated)

If you cannot bear interpolation, which I think is common among us psychologists and non-engineer EEG researchers, I recommend trimOutlier(). It simply rejects channels and data points whose amplitudes exceed a threshold. The purpose of this plugin is to perform simple and conventional data rejection conveniently. The cleaning performance may not be as good as ASR's, but it never performs interpolation, which could be a relief for skeptics (I am one). One can perform quick but thorough cleaning for ICA (ICA does not care how the data are chopped up), and the obtained ICA weight matrix can be applied to the less aggressively cleaned data. It is also a very useful tool to visualize all channels and data points in one window (somehow I don't know any other tool that does this).

Also, check out 'The PREP Pipeline: Standardized preprocessing for large-scale EEG analysis' by Bigdely-Shamlo, Mullen, Kothe, Su, and Robbins (2015) for the theory of upstream EEG preprocessing, as well as their tools freely available for Matlab.

Example of batch code to preprocess multiple subjects (01/12/2017 updated)

See this page.


Dependency across the preprocessing stages (07/05/2019 updated)

During the 28th EEGLAB workshop at UCSD, I heard from a group of people that they want to see the reasons for the order of the preprocessing stages. To answer this question means to show the dependency of the stages in preprocessing. To respond to this unexpectedly popular request, I prepared the following document (those who joined my dinner table, sorry for taking such a long time!) The numbers on top of the connecting curved lines indicate reference numbers for the explanations shown below. In the boxes, parentheses indicate an optional process.

PreprocessingPipeline2.jpg

  1. [Remove non-filterable artifacts -> High-pass filter the data]: This preprocessing is necessary for cases such as simultaneous fMRI-EEG or simultaneous TMS-EEG (see this page). The high-amplitude artifact of external origin reaches the order of millivolts. If you apply a standard frequency filter, it will leave a massive residue in the data (if you wonder how that happens, see this page). Therefore, you should NOT apply a standard filter to these (rather rare) types of data. If you are curious to know what to do with the artifact, see this page for the case of simultaneous EEG-TMS.
  2. [(Downsample the data) -> (Run Source Information Flow Toolbox)]: Downsampling to something like 100 Hz is generally recommended for connectivity analyses. The datapoint-to-parameter ratio is calculated as ((num_IC)^2*model_order)/(window_length*num_trials). If you downsample the data, it will (probably) reduce the model order and (by definition) reduce the window_length in samples. You want the datapoint-to-parameter ratio to be < 0.1, meaning that at least 10 data points are used to estimate each parameter, and smaller is better (i.e., more data points are available to estimate the model). Generally speaking, downsampling to around 100 Hz will worsen the datapoint-to-parameter ratio. However, sometimes it is still necessary, for example if your dataset has bad line noise for which you need to apply a low-pass filter to push the 50/60-Hz peak into the maximum-suppression frequency range. Because you don't want to have a stopband within the Nyquist frequency range (which is half of your sampling rate in Hz), if you have big 50/60-Hz line noise that cannot be reduced by CleanLine (which does happen quite often, unfortunately), the only possibility is to downsample so that 50/60 Hz falls outside the Nyquist frequency range. In doing so, do not forget to supply the two additional inputs, like EEG = pop_resample(EEG, 100, 0.8, 0.4); 0.8 means 100/2*0.8 = 40 Hz is the -6dB point, and 0.4 means 100/2*0.4 = 20 Hz is the transition band width. Thus, EEG = pop_resample(EEG, 100, 0.8, 0.4); means 'Downsample the data to 100 Hz using an antialiasing filter with a 40-Hz cutoff point (-6dB) and a transition bandwidth of 20 Hz, so that the pass-band edge is 40-20/2 = 30 Hz and the stop band starts at 40+20/2 = 50 Hz.' See more info here.
  3. [High-pass filter the data -> CleanLine to remove line noise]: If I remember correctly, CleanLine does not work well if you have a large DC drift in your data. It's been a while since I tested it, so this needs to be revisited and double-checked.
  4. [High-pass filter the data -> (Epoch the data to an event onset)]: Filtering after epoching (i.e., after cropping out the peri-event data points and perhaps stacking them in the third dimension, EEG.data(:,:,:)) is not a good idea. The FIR filter, which can be considered a sliding-window process, cannot process data near the edges of an epoch (within half the width of the FIR window). The same happens around a 'boundary' event in EEGLAB: the filtering process stops when the right edge of the sliding window hits the 'boundary' and resumes from the point where the left edge of the sliding window touches the 'boundary', so the data points within one window width centered on the 'boundary' are skipped (which is why you see the unfiltered original noise around a 'boundary'). In particular, a FIR high-pass filter requires quite a long filter order (>2-3 s; filter order here means the width of the sliding window in data points), which could be even longer than your epoch, so it does not work after epoching.
  5. [High-pass filter the data -> Run ICA]: There is a published study showing that high-pass filter up to 2Hz improves ICA's performance. See this paper by Winkler et al.(2015).
  6. [High-pass filter the data -> (Run Source Information Flow Toolbox)]: High-pass filtering improves the MVAR estimation. This makes sense intuitively: if you have low-frequency activity, you need a long time window to explain it, hence a longer model order. SIFT uses piecewise sliding-window linear regression, but that does not mean it does a better job than a standard FIR filter; Barnett and Seth (2011) showed that multivariate Granger causality is, in theory, invariant under zero-phase (a.k.a. phase-invariant) filtering. They do recommend filtering to achieve stationarity (e.g., removing drift and line noise); see Seth, Barrett, and Barnett (2015). For more detail, see this page.
  7. [Import channel info -> Remove bad channels with clean_rawdata()]: When the channel rejection function is called from clean_rawdata(), it changes its algorithm depending on whether channel locations are available. It is of course better to run it with channel location information present, so that it can use the geometrical information of the scalp electrodes.
  8. [Import channel info -> Interpolate removed channels]: For channel interpolation, using the default channel location is necessary because the algorithm uses spherical spline.
  9. [Import channel info -> Coregister sensor locations to the head model]: The imported channel location is coregistered with the template head model, such as MNI (i.e., Montreal Neurological Institute) standard head model or 4-shell spherical head model. If the imported channel information contains an error, naturally the subsequent coregistration could be affected.
  10. [Remove bad channels with clean_rawdata() -> Interpolate removed channels]: Depending on which scalp electrodes are excluded, the subsequent channel interpolation is performed.
  11. [Remove bad channels with clean_rawdata() -> Run ICA]: Removing n scalp electrodes reduces the maximum rank (i.e., the number of independent time series, in the linear-algebra sense) of your sensor-level data by n. Note that interpolating the scalp electrodes recovers the number of (apparent) channels, but the data will never be full-rank again (because, again, the rank is the number of independent time series, and anything that is interpolated from existing data is NOT independent of the data it is mixed from.) ICA, either Infomax or AMICA, should know the true rank of the data by running its rank checker before starting, so usually you don't need to do anything. Let me use an example: there are 64-channel data, from which you rejected 4 channels, so now you have 60 channels. You interpolate the rejected channels so that 64 channels are back. However, your interpolated 64-channel data has a rank of 60, and ICA will return 60 ICs, not 64 (see the toy example after this list).
  12. [Interpolate removed channels -> Re-reference to the average]: If you perform average referencing, it literally computes the average across all the channels. Imagine you have rejected many more channels on the right hemisphere than on the left hemisphere. Do you think the calculated 'average' is unbiased? Definitely not. This is why channel interpolation is recommended before average referencing.
  13. [Interpolate removed channels -> Run ICA]: Why do I recommend channel interpolation before ICA? For example, after building the STUDY structure, you may want to back-project the clustered ICs to a specific scalp electrode(s), say Cz, with different choices of IC clusters (see here). However, back-projecting IC(s) to a rejected electrode is impossible, because ICA never had a chance to learn how the data observed at the rejected channel(s) correlate with the other channels. This is why it is recommended to perform spatial channel interpolation BEFORE ICA, so that ICA generates a sphering matrix that takes the recovered channels into account. ICA itself does not care about the number of scalp channels, only about the data rank; it is PCA's job to perform the dimension reduction that makes the input to ICA full-rank. With or without the 'recovered' (i.e., interpolated) channels, as long as they are just linearly interpolated from other channels (on this point, spline interpolation is not necessarily perfectly linear, so weird things can happen if the data rank is incorrectly estimated! It will create a ghost, as discussed on this page), the size of ICA's weight matrix is unchanged; only the sphering matrix changes its size. Hence this interpolation is not a requirement, but if you want to perform group-level back-projection, then you NEED to do it. It does not hurt at all to do it anyway, so my recommendation is to make it a part of the standard routine all the time.
  14. [Re-reference to the average -> Run ICA]: The ICA result is basically independent of the choice of reference, except for this minor thing: if you don't perform average referencing before ICA, you might obtain IC scalp topographies (which are projections from ICs to the scalp) that are all red or all blue. This is because of a DC offset across the scalp channels on the topography (i.e., the channels of the topography are not zero-sum; note they are not exactly zero-sum even if you follow this method; this is the right way to apply average reference).
  15. [CleanLine to remove line noise -> Run ICA]: If the line noise is fortunately very stationary (i.e., you see a stationary envelope after band-pass filtering at 50/60 Hz) and CleanLine works well, then ICA will not need to decompose the line noise, which should help ICA's performance. Theoretically, extended Infomax and AMICA can model sub-Gaussian distributions, and the probability density function of line noise is a typical example of a sub-Gaussian distribution. However, how stationary the line noise in the data is, is another story; it may be sensitive to the subject's body movements etc. Generally speaking, ICA is not that good at modeling line noise, so it is better to let CleanLine or other filters take care of it.
  16. [CleanLine to remove line noise -> (Run Source Information Flow Toolbox)]: If and only if your power spectral density has a smooth tail up to the Nyquist frequency above the line frequency, and if CleanLine can remove the line noise efficiently because it fortunately shows stationary temporal characteristics, then running CleanLine to remove the line noise makes the power spectral density of the data (either sensor-level or ICA-decomposed) smoother, which helps in modeling the data using MVAR, i.e., the model order can be shorter.
  17. [(Epoch the data to an event onset) -> Run ICA]: One critical assumption of ICA is stationarity. This means that whichever part of the data you sample, it shows the same statistical properties in distribution, i.e., probability density function. The best way to ensure data stationarity is to use an event-related recording paradigm so that your data are filled with repeated brain responses as much as possible. If your recording contains a lot of non-event-related data, such as between-block resting, you may consider epoching the data so that ICA is fed only with heavily repeated brain responses (after all, ICA goes very well with ERP paradigms!) However, if your paradigm is a typical mismatch negativity design where there are no 'outside-epoch' data points, ICA does not benefit from epoching at all. By the way, being epoched or not does not matter to ICA; epoched data are reshaped so that all the epochs are concatenated into a 2-D array of channels x time points.
  18. [(Epoch the data to an event onset) -> (Run Source Information Flow Toolbox)]: Again, in MVAR modeling, the datapoint-to-parameter ratio is calculated as ((num_IC)^2*model_order)/(window_length*num_trials). If the data are continuous and there are no trials, the denominator's num_trials is 1, so the only way to improve the datapoint-to-parameter ratio is to use a very long sliding window, like 30 seconds. The more trials, the smaller the datapoint-to-parameter ratio becomes. If there are conditions you can combine, you may want to consider doing so for the connectivity analysis.
  19. [Run ICA -> Coregister the sensor locations]: The IC scalp maps you see are rendered on your definition of the sensor locations. There is a separately pre-calculated electric forward model of the head, which does not know the coordinate system of your sensor locations. To make the IC scalp topography physiologically meaningful, you need to coregister your channel space to that of the pre-calculated head model.
  20. [Run ICA -> (Run Source Information Flow Toolbox)]: There is a known issue in calculating connectivity directly at the sensor level. For detail, see Brunner et al. (2016). Thus, some process that can address volume conduction is necessary. ICA's assumption is that it returns the stationary and temporally independent sources, which addresses the issue of volume conduction.
  21. [Coregister the sensor locations -> Fit single and symmetrically-bilateral dipoles]: If you coregister the sensor locations to the electric head model but don't perform equivalent current dipole model fitting, it's pointless! The dipole fitting process is like converting the 2-D information of the IC scalp topography, which can be interpreted as scalp projections from the stationary and temporally independent sources, into 3-D space with a dipole model. The important fact is that the original 2-D information, upon which the 3-D dipole location is estimated, is directly derived from ICA. So it's not too much to say that the estimated dipole locations are indicated by ICA itself. To fit symmetrical bilateral dipoles, you need a plugin developed by Caterina Piazza.
  22. [Fit single and symmetrically-bilateral dipoles -> (Run Source Information Flow Toolbox)]: SIFT has an option to show either scalp topographies or dipole locations in showing the time-frequency grid of the connectivity. If you don't calculate the dipole fitting, you can't show the dipole locations to indicate the graph nodes.
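As a toy illustration of item 11 above (the numbers are made up), interpolation restores the channel count but not the rank:

% Pretend 64-channel data, reject 4 channels, 'interpolate' them back as linear
% combinations of the remaining 60, and check the rank.
data64        = randn(64, 10000);                        % pretend 64-channel recording
data60        = data64(setdiff(1:64, [7 23 41 52]), :);  % reject 4 channels
interpWeights = randn(64, 60);                           % pretend interpolation operator
data64interp  = interpWeights * data60;                  % 64 'channels' again, built from 60
rank(data64interp)                                       % returns 60, not 64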

SIFT tips (08/06/2019 updated)

I found that the equation used in calculating the datapoint-to-parameter ratio is incorrect. According to Schlögl and Supp (2006) as well as Korzeniewska et al. (2008), the equation must be (num_IC*model_order)/(window_length*num_trials). This change affects how you determine the parameters, in a 'better' way I would say, as shown below: 1) more ICs can be included; 2) fewer trials can be tolerated; 3) a shorter sliding window can be used. This change particularly impacts continuous data analysis, as the current equation would probably require a sliding window length of a few minutes! In fact, this datapoint-to-parameter ratio has been a limiting factor for me in applying SIFT with confidence.

To obtain the corrected datapoint-to-parameter ratio based on the above-suggested reason, make the following change on est_checkMVARParams() line 85

%winrat = ((M^2)*p)/(wlen*T);
winrat = ((M)*p)/(wlen*T);
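To see how much the correction matters, here is a toy comparison (the numbers are made up): say 15 ICs, model order 10, a 2-s window at 250 Hz (500 samples), and 80 trials.

numIC = 15; modelOrder = 10; winLenSamples = 500; numTrials = 80;
ratioOld = (numIC^2 * modelOrder) / (winLenSamples * numTrials)  % ~0.056 with the current equation
ratioNew = (numIC   * modelOrder) / (winLenSamples * numTrials)  % ~0.004 with the corrected equation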

That being said, when I discussed this issue with the author, he also told me that if we make the correction, the estimate would be overly lax and could even become useless. I kind of see the point from my experience. Somehow, most of the SIFT validation measures are either too lax (the stability test) or too stringent (the whiteness tests), which makes them generally hard to follow. In conclusion, I would recommend the above fix to enjoy more degrees of freedom in the analysis design, while trying to stay as conservative as possible (i.e., the lower the number, the more conservative!).

Electroencephalosophy (Special contents for 100,000 hit, 09/06/2019 updated)

See my talk at the Santa Fe Institute on 07/11/2019. The concept of Makoto's pessimism and scalp EEG as adolesc-i-ence was first presented in a keynote lecture for Neuroadaptive Technology 2017 (abstract and slides (8.4MB) available) on 07/19/2017. I call an investigation into determining the limits of scalp-recorded EEG 'electroencephalosophy'.

Wavelet analysis tips (Special contents for 110,000 hit, 11/27/2019)

Did you know that the 'wavelet analysis' implemented in EEGLAB is NOT a pure wavelet analysis? Actually, it is a hybrid between a pure wavelet transform (i.e., the window size changes while the number of cycles is fixed) and the short-term Fourier transform (i.e., the window size is fixed while the number of cycles changes). If you are interested in knowing more about this topic, I recommend you search for the keywords Units in output of timefreq - wavelet normalization in the following archive page of our EEGLAB mailing list. A very nice clarification was done there, and possible solutions were discussed. You will also notice gorgeous members there. The best moment of the EEGLAB mailing list.