# Makoto's useful EEGLAB code

I hope to help those who are eager to code to use EEGLAB more efficiently. Actually, the real purpose of building EEGLAB was to drive users to learn coding... Scott told me so. The following examples show how to deal with EEGLAB data structure using index operations wherever possible (i.e. minimal use of for loops). If you find an error or improvement, please share it on EEGLAB mailing list.

## How to search a keyword in the entire EEGLAB mailing list archive (07/19/2021 added)

Assuming you use Google as a search engine, use the option *site:sccn.ucsd.edu/pipermail/eeglablist/* in addition to your query words to limit the Google search to EEGLAB mailing list. For example, if you want to search for the key word *rank deficient* in the the archived mailing list, try

rank deficient site:sccn.ucsd.edu/pipermail/eeglablist/

I found the same solution here, which seems to work for EEGLAB mailing list.

## How to export Matlab figures in publication quality (05/05/2021 updated)

This is for Mohamed Mounir Tellache. Although we always need publication-quality figures at the end of the day to write a paper, it is often hard to find good solutions. The last resort is to take a screenshot, which I sometimes need to do too, but image quality is limited (unless you are using 8k resolution monitor).

Personally, I do not want to use any other application than Matlab in doing research (except for Word, which is rather a general tool). But here I have to admit one exception, which is Adobe Illustrator. Combined with a figure exported from Matlab in the scalable vector graphics (.svg) format, I can take control over all the details of the figure. Free software like GIMP will also do. Let me share how I process the Matlab-generated figures for publications.

The following code exports the 'current figure' (i.e. which is the figure currently selected on Matlab; click the figure and usually you can make it current, unless the figure is set to 'unable to become current' explicitly in figure property setting) to the user-specified full path. The first option '-dsvg' specifies scalable vector graphic, which is the best choice if you use Adobe Illustrator or GIMP. The second option '-painters' to force Matlab to use the vector graphics mode (otherwise a figure with more than certain level of complexity will be automatically 'rasterized' i.e. not saved as a scalable vector graphics but just a collection of pixels regardless of '-dsvg' option used--this is almost a bug in Matlab.)

print('/data/project/makoto/project/saveFigureName', '-dsvg', '-painters')

Note that by default the figure background is white. If you want to reflect the actual background color, you need to run the additional line.

set(gcf, 'InvertHardcopy', 'off')

The rest of the process depends on your skill in using Illustrator/GIMP which you can learn from Youtube videos etc. Yes, processing these figures with Illustrator/GIMP is very time consuming, but writing code to finish is even more so.

Alternatively, if you want to save the figure as high-resolution jpeg (or any other lossy or lossless formats), try the following.

print('/data/project/makoto/project/saveFigureName', '-djpeg95', '-r300')

'-djpeg95' is an option to use jpeg with 'Quality 95' compression. If you want to suppress the file size, use smaller values such as 80 or 70. '-r300' is an option to specify output resolution, and '-r300' specifies 300 dpi (dots per inch).

Note also that the figure window size on your screen affects the size of the exported figure. This may or may not make sense to you, but generally speaking the best way to see how it affects is to see the exported figure. So you need to repeat try and fix several times to finalize the parameters. Again, if '-dsvg' works, it is not a problem as you can change the scale and aspect ratio of each graphical objects on the Illustrator or Gimp.

### How to export PowerPoint figures in publication quality (12/23/2020 added)

This is not a Matlab trick but I certainly ran into the need that figures for publications are on PowerPoint. The default export parameter to jpeg (or .png) is only 96 dpi, but you want to have 300 dpi. See this Microsoft page for the instruction.

## How to create an inset plot (05/22/21 added)

Sometimes we want to show a subplot in an inset in a plot. Once can use the example code below for specifing the position of the inset plot with Matlab's relative position syntax (the second line).

subplotPosition = get(gca,'position'); insetRelativePosition = [0.6 0.2 0.35 0.2]; % SouthEast. deltax = subplotPosition(3)*insetRelativePosition(1); deltay = subplotPosition(4)*insetRelativePosition(2); insetPosition = [subplotPosition(1:2) 0 0] + [deltax deltay subplotPosition(3:4).*insetRelativePosition(3:4)]; axes('Position',insetPosition)

## How to obtain executed code with input parameters by operating graphical user interface (GUI)

Type eegh in the Matlab command line and you'll see it.

## How to plot multiple channel ERPs in one plot (07/17/2020 updated)

This is written upon the request by Mohammad Reza Khaleghi.

figure plotElecIdx = [3 7 11]; % Select the indices of the electrodes whose ERPs you want to plot. for elecIdx = 1:length(plotElecIdx) hold on plot(EEG.times, mean(EEG.data(plotElecIdx,:,:),3)); % The last '3' means 'Take mean across the third dimension (i.e. trials)' so do not change. elecLabels{elecIdx} = EEG.chanlocs(plotElecIdx(elecIdx)).labels; end legend(elecLabels) % This gives you annotation of which color is which electrode.

## How to add a channel (02/18/2021)

This is for Ilaria Berteletti. If you have already tried to do it using EEGLAB, you must have found that you can't do it with EEGLAB. But apparently I need to do it now. Maybe the right way to do it was to change the initial reference to 'Cz' then apply average reference while 'recovering the original reference channel' in the upstream of the preprocessing but it is too late. This is the hack to do it. When using it, do not forget to change the paths to valid ones. In this example, 64-channel montage but Cz was used as an initial reference, which we want to recover. Note that after applying this change, ALL ICA RESULTS BECOMES UNUSABLE (but if ICs are already rejected for artifact correction, it will be valid--valid means the scalp signal remains rank-deficient). So the right way to do it is still to add the channel first, then ICA second (See this article). This solution is more like for emergency situation for which you only need scalp electrode analysis.

allSets = dir('/data/mobi/ilaria/p0100_imported/*.set'); for setIdx = 1:length(allSets) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Create the new montage template. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Use the first dataset for the new template. if setIdx == 1 loadName = allSets(setIdx).name; EEG = pop_loadset('filename', loadName, 'filepath', '/data/mobi/ilaria/p0100_imported'); % Remove ICA-related info (if present) and insert Cz at ch64. EEG.nbchan = 64; EEG.data(64,:) = zeros(1,EEG.pnts); EEG.icaact = []; EEG.icawinv = []; EEG.icasphere = []; EEG.icaweights = []; EEG.icachansind = []; EEG.chanlocs(64).label = 'Cz'; EEG = pop_chanedit(EEG, 'changefield',{64 'labels' 'Cz'},'lookup','/data/projects/makoto/Tools/eeglab14_1_2b/plugins/dipfit3.3/standard_BEM/elec/standard_1005.elc', ... 'eval','chans = pop_chancenter( chans, [],[]);'); % Store it as EEG2. EEG2 = pop_select(EEG,'time',[0 1]); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Batch-apply the new montage template. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Load a .set file. loadName = allSets(setIdx).name; EEG = pop_loadset('filename', loadName, 'filepath', '/data/mobi/ilaria/p0100_imported'); % Just in case, re-apply the MNI template locations with chancenter. EEG = pop_chanedit(EEG, 'lookup','/data/projects/makoto/Tools/eeglab14_1_2b/plugins/dipfit3.3/standard_BEM/elec/standard_1005.elc', ... 'eval','chans = pop_chancenter( chans, [],[]);'); % Apply the new montage template. EEG = pop_interp(EEG, EEG2.chanlocs, 'spherical'); % Remove ICA-related results. EEG.icaact = []; EEG.icawinv = []; EEG.icasphere = []; EEG.icaweights = []; EEG.icachansind = []; EEG.dipfit = []; % Save the data. pop_saveset(EEG, 'filename', loadName, 'filepath', '/data/mobi/ilaria/p0110_recoverCz'); end

## How to subsample electrodes evenly (05/12/2021 added)

There is an EEGLAB function *loc_subsets*() written by Nima Bigdely-Shamlo located under the 'miscfunc' folder. I use it only once in a few years, so I forget the name of the function when I need it (thus I have been bothering Nima every few years). It happened to me today, so I put the answer here once and for all. See below for the part of the help document.

% loc_subsets() - Separate channels into maximally evenly-spaced subsets. % This is achieved by exchanging channels between subsets so as to % increase the sum of average of distances within each channel subset. % Usage: % >> subset = loc_subsets(chanlocs, nchans); % select an evenly spaced nchans % >> [subsets subidx pos] = loc_subsets(chanlocs, nchans, plotobj, plotchans, keepchans);

## How to re-reference to average potential including the initial reference (01/25/2022)

I talked to Arno this issue and I thought we agreed but I still don't see the proper solution implemented in EEGLAB2021.1. Here is my solution. Notice the denominator is EEG.nbchan+1, not EEG.nbchan. This '+1' accounts for the initial reference and does the trick so that the averaged-referenced data are NOT sum-zero i.e. not rank-deficient.

EEG.data = bsxfun(@minus, EEG.data, sum(EEG.data,1)/(EEG.nbchan+1));

## How to add the initial reference electrode back by average referencing (03/10/2021)

This is for Ilaria Berteletti. This example shows 64-channel recording data with the initial reference at Cz. The data is imported as 63-channel data, then Cz is added as the initial reference channel, which is 'recovered' when re-referencing to average. Note that this approach causes effective rank deficiency, so this process needs to be done AFTER clean_rawdata(). pop_runica() has its own rank checker so no problem.

[EEG.chanlocs.ref] = deal('Cz'); EEG = pop_chanedit(EEG, 'append',63,'changefield',{64 'labels' 'Cz'},'lookup','eeglabroot/plugins/dipfit3.3/standard_BEM/elec/standard_1005.elc',... 'eval','chans = pop_chancenter( chans, [],[]);','changefield',{64 'type' 'REF'}); EEG = pop_reref( EEG, [],'refloc',struct('labels',{'Cz'},'type',{'REF'},'theta',{-94.2457},'radius',{0.023785},'X',{-0.54445},'Y',{7.3339},'Z',{98.2349},... 'sph_theta',{94.2457},'sph_phi',{85.7187},'sph_radius',{98.5098},'urchan',{64},'ref',{''},'datachan',{0}));

## How to extract subjects and independent components from STUDY structure

Please read this page. Note that it says 'This apparently complex scheme allows to use...'

## How to extract EEG power of frequency bands (06/06/2020 updated)

This is one of the most frequently asked questions in the EEGLAB mailing list. By the way, I wrote erratum to correct code on 02/06/2016--thank you Jerry Zhu for the correction!

% This example Matlab code shows how to compute power spectrum of epoched data, channel 2. [spectra,freqs] = spectopo(EEG.data(2,:,:), 0, EEG.srate); % Set the following frequency bands: delta=1-4, theta=4-8, alpha=8-13, beta=13-30, gamma=30-80. deltaIdx = find(freqs>1 & freqs<4); thetaIdx = find(freqs>4 & freqs<8); alphaIdx = find(freqs>8 & freqs<13); betaIdx = find(freqs>13 & freqs<30); gammaIdx = find(freqs>30 & freqs<80); % Compute absolute power. deltaPower = mean(10.^(spectra(deltaIdx)/10)); thetaPower = mean(10.^(spectra(thetaIdx)/10)); alphaPower = mean(10.^(spectra(alphaIdx)/10)); betaPower = mean(10.^(spectra(betaIdx)/10)); gammaPower = mean(10.^(spectra(gammaIdx)/10));

### Validation (06/06/2020 added)

For John Johnson, I wrote following code to confirm how power values are averaged.

% Calculate mean power (uV^2). power1 = 11; power2 = 22; power3 = 33; meanPower = mean([power1 power2 power3]) % 22 % Convert power (uV^2) to 10*log10 (dB). power1_dB = 10*log10(power1); power2_dB = 10*log10(power2); power3_dB = 10*log10(power3); % Average dB values then convert to uV^2. meanPower_dB1 = mean([power1_dB power2_dB power3_dB]) % 13.0078 10^(meanPower_dB1/10) % 19.98 % Convert dB values to uV^2 then average. meanPower_dB2 = mean([10^(power1_dB/10) 10^(power2_dB/10) 10^(power3_dB/10)]) % 22

### FAQ

For Tim Göcking, let me add some explanations.

- Q. What is unit for this power?
- A. The unit is uV^2/Hz.

- Q. What does the part '/Hz' mean in this unit?
- A. This means 'per unit frequency bandwidth' i.e., frequency resolution being 1Hz. I found this page helpful to understand it (Thank you Nir Ofir for letting me know that the page was moved.)

- Q. What does 10.^ do to the values?
- A. The output from
*spectopo*() is converted to dB as 10*log10([*value_in_uV^2/Hz*]) in dB (the first '10*' part is to convert the unit from Bell to deci-Bell.) Because log10 is used for the dB conversion, '10.^' is used to convert it back to uV^2/Hz.

- A. The output from
- Q. Why dividing through 10?
- A. This is to convert back the deci-Bell's 'deci' part.

- Q. When do we switch bewtween 10*log10(X) and 20*log10(X)?
- A. 10*log10(X) is used to describe power (i.e., amplitude^2), while 20*log10(X) is used to describe amplitude.

dB_power = 10*log10(power/reference_power); = 10*log10(amplitude^2/reference_amplitude^2) = 20*log10(amplitude/reference_amplitude) Often approximated as 2x = 6dB, 10x = 20dB. dB_amplitude = 10*log10(amplitude/reference_amplitude) = dB_power/2. Often approximated as 2x = 3dB, 10x = 10dB.

### Comparison between spectopo() and pwelch() (04/27/2021 added)

This is for Hamed Taheri Gorji. EEGLAB function spectopo() uses Matlab's pwelch() function to calculate power spectral density (PSD). In theory, in the input parameters are the same, they should produce the same results. Let's test it with the following code.

% Create a dummy data with a simulated signal. EEG = eeg_emptyset; EEG.data = randn(1,10000); EEG.data = EEG.data - mean(EEG.data); EEG.srate = 256; inputData = EEG.data; % Use EEGLAB's spectopo() function. [spectra1,freqs1] = spectopo(inputData, 0, EEG.srate, 'winsize', EEG.srate, 'nfft', EEG.srate, 'plot', 'off'); % For channel 2 (C4) spectra1 = 10.^(spectra1/10); % Use Matlab pwelch() function. x = inputData; Fs = EEG.srate; L = EEG.srate; NFFT = EEG.srate; NOVERLAP = 0; WINDOW = EEG.srate; [spectra2,freqs2] = pwelch(x,WINDOW,NOVERLAP,NFFT,EEG.srate); figure subplot(1,2,1) plot(freqs1, spectra1); hold on plot(freqs2, spectra2') legend({'spectopo()', 'pwelch'}) title('PSD comparison') xlim([0 128]) subplot(1,2,2) plot(freqs1, spectra1-spectra2') xlabel('Frequency (Hz)') ylabel('Power (uV^2/Hz)') title('Difference') xlim([0 128])

In the left plot, the two PSD plots are completely overlapped and visually undistinguishable. In the right plot, the difference wave is shown. The difference across overall frequency bins are within 10^-17 uV^2/Hz, while the data range of the original PSD plot was 10^-2 uV^2/Hz. Compared with the original signal scale, the error between the two plots are virtually zero. Without reading the spectopo() function, I don't know why the difference is non-zero, but for the practical purpose the error is negligible.

Hamed contacted me to point out that my old code I wrote in "[Eeglablist] Power in spectopo vs. fft" (posted to EEGLAB mailing list on Thu May 8 11:48:54 PDT 2014) contains an error--when I defined NFFT, I entered the entire length of the continuous data (as if I'm dealing with the epoched data), with which one should not be able to reproduce the result. I would like to correct my old post here. Thank you Hamed!

%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % EEGLAB spectopo function [spectra,freqs] = spectopo(EEG.data(2,:), 0, EEG.srate, 'nfft', 1024); % For channel 2 (C4) thetaIdx = find(freqs>=4 & freqs<=8); % theta=4-8 thetaPower = 10^(mean(spectra(thetaIdx))/10); % mean theta power alphaIdx = find(freqs>=8 & freqs<=12); % alpha=8-12 alphaPower = 10^(mean(spectra(alphaIdx))/10); % mean alpha power eeglabThetaAlphaRatio = thetaPower/alphaPower; figure; subplot(1,2,1); plot(freqs,10.^(spectra/10)) % Matlab pwelch function x = EEG.data(2, :); % For channel 2 (C4) Fs = EEG.srate; % Sampling frequency L = EEG.pnts; % Length of signal NFFT = 2^nextpow2(L); % Next power of 2 from length of x <-This line is wrong! (04/27/2021 Makoto) % X = fft(x,NFFT)/L; % freqs = Fs/2*linspace(0,1,NFFT/2+1); % spectra = abs(X(1:NFFT/2+1)); NOVERLAP = 0; WINDOW = 512; [spectra,freqs] = pwelch(x,WINDOW,NOVERLAP,NFFT,EEG.srate); thetaIdx = find(freqs>=4 & freqs<=8); % theta=4-8 thetaPower = mean(spectra(thetaIdx)); % mean theta power alphaIdx = find(freqs>=8 & freqs<=12); % alpha=8-12 alphaPower = mean(spectra(alphaIdx)); % mean alpha power matlabThetaAlphaRatio = thetaPower/alphaPower; subplot(1,2,2); plot(freqs,spectra) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

## Verifying that the dB amplitude ratio is used in both PSD and ERSP (04/11/2022 added)

For the detail of this article, please see this article. Here, I show Matlab code to replicate the following simulation to verify that in EEGLAB both PSD and ERSP uses dB in amplitude ratio and not power ratio.

Fs = 1000; % Sampling frequency (Hz) T = 1/Fs; % Sampling period L = 1000; % Length of signal (ms) t = (0:L-1)*T; % Time vector (ms) targetF = 12; % Target freq (Hz) %% Validate EEGLAB's PSD by spectopo(). figure set(gcf,'position', [999 429 897 885]) scalerVector = [0.5 1 2]; for scaleIdx = 1:length(scalerVector) % Define the input data. X = sin(2*pi*targetF*t).*scalerVector(scaleIdx); subplot(3,2,1+2*(scaleIdx-1)) plot(t, X) title(['12-Hz, amplitude +/-' num2str(scalerVector(scaleIdx)) '\muV']) xlabel('Time (s)') ylabel('Amplitude (\muV)') xlim([0 1]) ylim([-2.5 2.5]) grid on % Matlab FFT. Y = fft(X); P2 = abs(Y/L); P1 = P2(1:L/2+1); P1(2:end-1) = 2*P1(2:end-1); P1 = P1.^2; P1_log10 = 10*log10(P1); subplot(3,2,2+2*(scaleIdx-1)) f = Fs*(0:(L/2))/L; plot(f,P1_log10) title('Power Spectral Density') xlabel('Frequency (Hz)') ylabel('10*log10 \muV^2/Hz (dB)') hold on % EEGLAB [spectra,freqs] = spectopo(X, L, Fs, 'plot', 'off'); plot(freqs, spectra) xlim([0 20]) ylim([-20 10]) grid on legend({'Matlab FFT' 'EEGLAB spectopo'}, 'location', 'northwest') end print('/data/projects/makoto/psdValidation', '-djpeg95', '-r150') %% Validate EEGLAB's hybrid spectro-/scalo-gram by newtimef(). Fs = 1000; % Sampling frequency (Hz) T = 1/Fs; % Sampling period L = 5000; % Length of signal (ms) t = (0:L-1)*T; % Time vector (ms) targetF = 12; % Target freq (Hz) figure set(gcf,'position', [999 429 897 885]) scalerVector = [0.5 1 2]; for scaleIdx = 1:length(scalerVector) % Define the input data. splicingIdx = 2500; X = sin(2*pi*targetF*t); X(splicingIdx:end) = sin(2*pi*targetF*t(splicingIdx:end)).*scalerVector(scaleIdx); % Plot the waveform. subplot(3,2,1+2*(scaleIdx-1)) plot(t-2.5, X) title(['12-Hz, Amp. +/-1.0 -> ' num2str(scalerVector(scaleIdx)) ' \muV']) xlabel('Latency (s)') ylabel('Amplitude (\muV)') xlim([-2.5 2.5]) ylim([-2.5 2.5]) grid on line([0 0], [-2.5 2.5], 'color', [0 0 0], 'linewidth', 2, 'linestyle', '--') % Plot ERSP. [ersp,itc,powbase,times,freqs] = newtimef(X, L, [0 5000], Fs, [3 8],... 'freqs', [2 30], 'nfreqs', 50,... 'plotitc', 'off', 'plotersp', 'off', ... 'baseline', [0 2500]); subplot(3,2,2+2*(scaleIdx-1)) erspTime = round(times)/1000; imagesc(erspTime-2.5, freqs, ersp, [-12 12]) title('Event-related spectral perturbation') xlabel('Latency (s)') ylabel('Frequency (Hz)') line([0 0], [0 50], 'color', [0 0 0], 'linewidth', 2, 'linestyle', '--') colormap jet axis xy colorbar end print('/data/projects/makoto/erspValidation', '-djpeg95', '-r150')

## How to batch-process all-channel PSDs in both dB and uV^2/Hz (03/17/2020 updated)

For Sayaka Fujita, I made this section separated for her reference. Thank you John Iversen for correction!

% This example code compares PSD in dB (left) vs. uV^2/Hz (right) rendered as scalp topography (setfile must be loaded.) lowerFreq = 4; % Hz higherFreq = 8; % Hz meanPowerDb = zeros(EEG.nbchan,1); meanPowerMicroV = zeros(EEG.nbchan,1); for channelIdx = 1:EEG.nbchan [psdOutDb(channelIdx,:), freq] = spectopo(EEG.data(channelIdx, :), 0, EEG.srate, 'plot', 'off'); lowerFreqIdx = find(freq==lowerFreq); higherFreqIdx = find(freq==higherFreq); meanPowerDb(channelIdx) = mean(psdOutDb(channelIdx, lowerFreqIdx:higherFreqIdx)); meanPowerMicroV(channelIdx) = mean(10.^((psdOutDb(channelIdx, lowerFreqIdx:higherFreqIdx))/10), 2); end figure subplot(1,2,1) topoplot(meanPowerDb, EEG.chanlocs) title('Theta band (4-8Hz) power distribution') cbarHandle = colorbar; set(get(cbarHandle, 'title'), 'string', '(dB)') subplot(1,2,2) topoplot(meanPowerMicroV, EEG.chanlocs) title('Theta band (4-8Hz) power distribution') cbarHandle = colorbar; set(get(cbarHandle, 'title'), 'string', '(uV^2/Hz)')

## How to build an EEG structure from scratch (07/13/2018 updated)

% Create empty EEG structure. EEG = eeg_emptyset; % Define basic items of EEG structure. EEG = eeg_emptyset(); EEG.data = timeSeriesDataYouHave; EEG.times = timeSeriesDataLatencyYouHave; EEG.xmin = EEG.times(1); EEG.xmax = EEG.times(end); EEG.srate = round(1/((EEG.xmax-EEG.xmin)/length(EEG.times))); % Rounded actual sampling rate. Note that the unit of the time must be in second. EEG.nbchan = size(EEG.data,1); EEG.pnts = size(EEG.data,2); % Define event information. eventStructure = struct('type', [], 'latency', []); latencyValues = num2cell(eventLatencyInSecYouHave*EEG.srate); [eventStructure(1:length(eventLatencyInSecYouHave)).latency] = latencyValues{:}; eventTypes = eventTypesYouHave; [eventStructure(1:length(eventLatencyInSecYouHave)).type] = eventTypes{:}; EEG.event = eventStructure; EEG = eeg_checkset(EEG, 'eventconsistency'); EEG = eeg_checkset(EEG, 'makeur');

### How to load EEGLAB .set and .fdt files without using EEGLAB (05/09/2020 updated)

We can load a .set file, which contains header information, by using Matlab function load() with '-mat' option. However, how can we load .fdt file that is the matrix of EEG time-series data? See below for the example of loading a dataset that is saved as a pair of XXXXX.set and XXXXX.fdt files.

% Load header info. headerInfo = load('XXXXX.set', '-mat'); % Load EEG time-series matrix (taken from eeg_getdatact.m centered at line 233) fid = fopen('XXXXX.fdt', 'r', 'ieee-le'); for trialIdx = 1:headerInfo.EEG.trials % In case the saved data are epoched, loop the process for each epoch. Thanks Ramesh Srinivasan! currentTrialData = fread(fid, [headerInfo.EEG.nbchan headerInfo.EEG.pnts], 'float32'); data(:,:,trialIdx) = currentTrialData; % Data dimentions are: electrodes, time points, and trials (the last one is for epoched data) end fclose(fid);

## How to extract code embedded within file names (07/13/2018 updated)

Notice how to deal with Matlab cell format using cellfun() in two different ways.

allSubj = dir('*.set'); % Obtain all the .set files under the current folder. allSubjNames = {allSubj.name}'; % Obtain the file names. Here, suppose the naming rule is as follows: allSubjNames = 'subj01_group1' 'subj02_group2' 'subj03_group2' 'subj04_group1' 'subj05_group2' 'subj06_group1' group1List = strfind(allSubjNames, '_group1'); % Choose those subjects who has '_group1' in the file name. Note that the output is still in the cell format. groupList = [8] [] [] [8] [] [8] group1Idx = find(~cellfun('isempty', group1List)); % Obtain the index number of the '_group1' subjects. group1Idx = 1 4 6 group1Subjects = allSubjNames(group1Idx); % Obtain the full file names of '_group1' subjects. group1Subjects = 'subj01_group1' 'subj04_group1' 'subj06_group1' group1SubjectId = cellfun(@(x) x(1:6), group1Subjects, 'Uniformoutput', false); % Extract 'subjxx' part from the file names (i.e., delete '_group1' from the file names). group1SubjectId = 'subj01' 'subj04' 'subj06'

## How to build a new event structure from scratch (01/08/2021 added)

Here I show an example code to build the minimum-requirement structure of EEG.event, which includes type and latency. You may achieve the same goal using EEGLAB GUI but the manual process is labor intensive to the level it seems impossible, and the EEGLAB function for this job is difficult to use (in my opinion). I recommend the following code which is infinitely faster than the function because no forloop or check function is included and maximally clear to see what is done in each line.

% Define dummy event labels and latencies. eventLabels = {'a' 'b' 'c' 'd' 'e'}; % Example event labels. eventLatency = [3 4 5 6 7]; % Example event latencies. % Build a minimal but valid EEG.event from scratch (i.e. EEG.event = []) [EEG.event(1:length(eventLabels)).type] = eventLabels{:}; latencyInCell = num2cell(eventLatency); [EEG.event(1:length(eventLabels)).latency] = latencyInCell{:}; eeglab redraw % Optional. This runs eeg_checkset() and refreshes GUI.

## How to modify event types

### How to obtain unique event types

% Obtain all unique event types uniqueEventTypes = unique({EEG.event.type}');

### How to obtain event indices whose event type exactly matches a keyword

% Obtain event indices whose 'type' is 'Probe' allEventTypes = {EEG.event.type}'; keywordIdx = find(strcmp(allEventTypes, 'Probe'));

### How to obtain event indices whose event types contain a part of the keyword (12/13/2016 updated)

During SCCN tea time today, Hiroyuki Kambara told me this solution using cellfun(). How smart!

% Obtain event indices whose 'type' contains letter strings 'Probe' (e.g. will obtain all indices for 'Probe1', 'Probe3', and 'Probe5' allEventTypes = {EEG.event.type}'; strFindResult = strfind(allEventTypes, 'Probe'); probeIdx = find(cellfun(@isempty, strFindResult));

### How to change event type names

% Rename event type name 'example1' into 'example2' allEvents = {EEG.event.type}'; example1Idx = strcmp(allEvents, 'example1'); [EEG.event(example1Idx).type] = deal('example2');

### How to change numerical event type names to strings (05/17/2020 added)

allEventTypes = [EEG.event.type]'; % The output is numeric with same digits. allEventTypeString = cellstr(num2str(allEventTypes)); % Converts numerics into strings in cell. [EEG.event.type] = allEventTypeString{:}; % I can't explain how this syntax works!

## How to modify event latency

### How to change event latency

This is weird but works (and the only way that works probably.)

% Change event latency into 1:length(EEG.event) dummyLatency = num2cell(1:length(EEG.event)); [EEG.event(1:length(EEG.event)).latency] = dummyLatency{:}; EEG = eeg_checkset(EEG,'eventconsistency');

### How to adjust event latency by adding or subtracting values

% Add 400ms to event 'example' allEvent = {EEG.event.type}'; exampleTypeIdx = find(strcmp(allEvent, 'exampleType')); exampleTypeLatencyInFrame = [EEG.event(exampleTypeIdx).latency]'; changedLatency = num2cell(bsxfun(@plus, exampleTypeLatencyInFrame, 400), 2); [EEG.event(exampleTypeIdx).latency] = changedLatency{:}; EEG = eeg_checkset(EEG,'eventconsistency');

## How to recover the lost events by window rejection by clean_rawdata (05/22/2021 added)

After applying clean_rawdata, or any other manual or automated window-rejection method for data cleaning, you may find some event markers are rejected as a result. If those are the markers for the onset and/or offset of a block, for example, then you can't perform the block-design analysis. I developed a prototype of a solution to recover the lost event markers for the case that clean_rawdata() log 'clean_sample_mask' and EEGLAB log EEG.urevent are BOTH available. It replaces the 'boundary' with the lost event marker. Note that the recovered event marker is not valid to show the onset/offset of the latency of the event onset/offset, so this method cannot be straightforwardly used for the event-related study design (unless you implement additional solution to detect epochs with non-canonical epoch length, although this also has a problem if the epoch contains variable SOA... but for the fixed-length epochs it should work).

% Recover the lost events. urEventType = {EEG.urevent.type}'; urEventLatencyFrame = round([EEG.urevent.latency]); cleanSampleMask = EEG.etc.clean_sample_mask; isEventPresent = cleanSampleMask(urEventLatencyFrame); boundaryIdx = find(contains({EEG.event.type}, 'boundary')); if any(isEventPresent==false) lostEventIdx = find(isEventPresent==0); for lostEventIdxIdx = 1:length(lostEventIdx) lostEventUrlatencyFrame = urEventLatencyFrame(lostEventIdx(lostEventIdxIdx)); lostEventCurrentPosition = sum(cleanSampleMask(1:lostEventUrlatencyFrame)); boundaryLatency = round([EEG.event(boundaryIdx).latency]); [differenceInFrame, selectedBoundaryIdx] = min(abs(boundaryLatency-lostEventCurrentPosition)); if differenceInFrame > 3 error('Fail to recover the lost event.') end EEG.event(selectedBoundaryIdx).type = urEventType{lostEventIdx(lostEventIdxIdx)}; end end

## How to delete non-time-locking events from epoched data (08/16/2020 added)

Motivation: Imagine you have mismatch negativity data sets in which a stimulus was presented every 0.5 s. In order to perform our standard time-frequency analysis, which uses 1116 ms Morlet wavelet kernel with the lowest 3 Hz with 3 cycles (but this does NOT mean EEGLAB performs a pure wavelet analysis--see this page), we need much longer epoch than one trial. If we apply our standard recommendation of -1 to 2 s epoching, it will inflate the data to 6 times larger (3.0 s contains 6 0.5 s). This not only inflate data size, but also inflate event structure. This redundant event structure causes many problems, although all we need in this situation is just those events whose within-epoch latency is 0. The following code addresses this issue so that EEG.event and EEG.epoch have only one event that has latency zero.

% Delete non-time-locking events from epoched data (08/16/2020 updated) for epochIdx = 1:length(EEG.epoch) allEventIdx = 1:length(EEG.epoch(epochIdx).event); if length(allEventIdx) == 1 % If there is only 1 event, it must have latency zero. continue else zeroLatencyIdx = find(cell2mat(EEG.epoch(epochIdx).eventlatency) == 0); EEG.epoch(epochIdx).event = EEG.epoch(epochIdx).event(zeroLatencyIdx); EEG.epoch(epochIdx).eventtype = EEG.epoch(epochIdx).eventtype(zeroLatencyIdx); EEG.epoch(epochIdx).eventlatency = {0}; EEG.epoch(epochIdx).eventurevent = EEG.epoch(epochIdx).eventurevent(zeroLatencyIdx); EEG.epoch(epochIdx).eventduration = 0; end end validEventIdx = [EEG.epoch.event]; deleteEventIdx = setdiff(1:length(EEG.event), validEventIdx); EEG.event(deleteEventIdx) = []; for epochIdx = 1:length(EEG.epoch) EEG.epoch(epochIdx).event = epochIdx; end

See what I it does below: Before (left) and after (right) the process. Notice the non-time-locking event markers (i.e., whose within-epoch latency is non-zero) are cleaned up on the right.

## How to extract all-IC/channel ERPs for each condition (07/24/2018 added)

If you want to process a lot of heavy, epoched datasets (such as mismatch negativity data with 0.5 s SOA with 5,000 trials but epoched to -1 to 2 sec allowing HUGE overlap... this is the situation where EEGLAB's 'epoch' scheme becomes inefficient), a bare-bone script like the one I show below works more efficient than EEGLAB functions. Taken from my std_selectICsByCluster() plugin.

% Extract event indices and names that have latency zero in an epoch. allEpochEventLatency = cell2mat({EEG.epoch.eventlatency}'); latencyZeroEventIdx = find(allEpochEventLatency == 0); latencyZeroEventType = {EEG.event(latencyZeroEventIdx).type}'; % Convert latencyZeroEventType to string in case they are numaric. allEventTypes = cellfun(@(x) num2str(x), latencyZeroEventType, 'uniformoutput', false); % Extract and store mean EEG.icaact across trials for each condition. uniqueConditions = unique(latencyZeroEventType); allConditionIcaactErp = zeros(size(EEG.icaact,1), size(EEG.icaact,2), length(uniqueConditions)); for conditionIdx = 1:length(uniqueConditions) currentConditionLabel = num2str(uniqueConditions{conditionIdx}); currentConditionIdx = find(strcmp(allEventTypes, currentConditionLabel)); currentConditionIcaact = mean(EEG.icaact(:,:,currentConditionIdx),3); allConditionIcaactErp(:,:,conditionIdx) = currentConditionIcaact; end

If necessary, project them back to the channels.

% Backproject Condition 1-3 to channels if necessary. scalpChannelErpCondition1 = EEG.icawinv*allConditionIcaactErp(:,:,1); scalpChannelErpCondition2 = EEG.icawinv*allConditionIcaactErp(:,:,2); scalpChannelErpCondition3 = EEG.icawinv*allConditionIcaactErp(:,:,3);

## How to obtain theoretical dipole projection when using pop_multifit() (02/11/2022 updated)

This section is updated. I confirmed with Dipfit4.3 that EEG.dipfit.model now has the following fields (just an example)

posxyz: [-49.0329 45.5410 23.2540] momxyz: [-2.4949e+04 -1.3269e+03 -1.7305e+04] rv: 0.0132 active: 1 select: 1 diffmap: [21×1 double] sourcepot: [21×1 double] datapot: [21×1 double]

Here, 'datapot' is the DC-corrected input (i.e. one of the columns of EEG.icawinv), 'sourcepot' is what I call the theoretical dipole projection, and 'diffmap' is the difference between the two.

## EEG data cleaning

### How to perform epoch rejection using single-trial PSD (02/20/2020 updated)

For my own project, I wrote the following solution for bad trial rejection for pre-selected ICs. This uses rectangular window short-term Fourier Transform (STFT) to convert 1 epoch into 1 PSD time point. Then, it takes sum of power for the user-specified frequency band. Then they are converted into z-score, and finally threshold it with SD. The example code rejects epochs that have > 3SD power between 15-50 Hz.

% Example: % badTrialIdx = rejectTrialSTFT(EEG.icaact, 3, EEG.pnts, EEG.srate, [15 50]); % Use 15-50 Hz freq band to evaluate PSD. % EEG = pop_select(EEG, 'notrial', badTrialIdx); % % 02/04/2020 Makoto. Modified. % 01/31/2020 Makoto. Used with modification. % 11/08/2019 Makoto. Created. Changed from Hamming to Rectangular window (Thanks Masaki!) function badTrialIdx = rejectTrialSTFT(icaact, SD, pnts, srate, freqEdges) data2D = icaact(:,:); totalExclusionIdx = []; for icIdx = 1:size(data2D,1); %[S,F,T,P,Fc,Tc] = spectrogram(data2D(icIdx,:), pnts, 0, [], srate); % This is by default Hamming. [S,F,T,P,Fc,Tc] = spectrogram(data2D(icIdx,:), rectwin(pnts), 0, [], srate); % This is rectangular window. log10P = 10*log10(P); % figure; imagesc(T, F, log10P); axis xy; colormap('jet') medianP = median(log10P,2); % figure; plot(F, medianP) errorP = bsxfun(@minus, log10P, medianP); % figure; imagesc(T, F, errorP, [-20 20]); axis xy; colormap('jet') zscoreHighFreqPower = zscore(sum(errorP(freqEdges(1):freqEdges(2),:))); % Targetting EMG. % figure; hist(zscoreHighFreqPower, 100) exclusionIdx = find(zscoreHighFreqPower>=SD); %{ figure subplot(2,1,1) imagesc(T, F, errorP, [-20 20]); axis xy; colormap('jet') title('Error from median PSD') xlabel('Frames (s)') ylabel('10*log10(Power)/Hz') subplot(2,1,2) barHandle = bar(zscoreHighFreqPower, 1); xlim([0.5 length(zscoreHighFreqPower)-0.5]) hold on line([0.5 length(zscoreHighFreqPower)-0.5], [SD SD], 'color', [1 0 0]) title(sprintf('Z-scored error between %.0f and %.0f Hz.', freqEdges(1), freqEdges(2))) xlabel('Frames (s)') ylabel('Z-score') %} totalExclusionIdx = [totalExclusionIdx exclusionIdx]; end badTrialIdx = unique(totalExclusionIdx);

Example use of code.

badTrialIdx = rejectTrialSTFT(EEG.icaact, 3, EEG.pnts, EEG.srate, [15 50]); EEG = pop_select(EEG, 'notrial', badTrialIdx);

### How to perform component selection using ICLabel() and dipole information (11/04/2020 updated)

% Perform IC rejection using ICLabel scores and r.v. from dipole fitting. EEG = iclabel(EEG, 'default'); % Original output from eegh has a bug: the second input ('default') is without ''--this is fixed here. % Obtain the most dominant class label and its label probability. [~, mostDominantClassLabelVector] = max(EEG.etc.ic_classification.ICLabel.classifications, [], 2); mostDominantClassLabelProbVector = zeros(length(mostDominantClassLabelVector),1); for icIdx = 1:length(mostDominantClassLabelVector) mostDominantClassLabelProbVector(icIdx) = EEG.etc.ic_classification.ICLabel.classifications(icIdx, mostDominantClassLabelVector(icIdx)); end %% Option 1: Identify artifact ICs. The order of the classes are {'Brain' 'Muscle' 'Eye' 'Heart' 'Line Noise' 'Channel Noise' 'Other'}. %artifactLabelProbThresh = 0; % [0-1] %artifactIcIdx = find((mostDominantClassLabelVector==2 & mostDominantClassLabelProbVector>=artifactLabelProbThresh)|... % (mostDominantClassLabelVector==3 & mostDominantClassLabelProbVector>=artifactLabelProbThresh)|... % (mostDominantClassLabelVector==4 & mostDominantClassLabelProbVector>=artifactLabelProbThresh)|... % (mostDominantClassLabelVector==5 & mostDominantClassLabelProbVector>=artifactLabelProbThresh)|... % (mostDominantClassLabelVector==6 & mostDominantClassLabelProbVector>=artifactLabelProbThresh)); %nonartifactIcIdx = setdiff(1:size(EEG.icaweights,1), artifactIcIdx); %nonartifactGoodDipIcIdx = intersect(insideBrainAndGoodRvIdx, nonartifactIcIdx); %EEG.etc.icStack.nonartifactIcIdx = nonartifactGoodDipIcIdx; % Option 2: Identify brain ICs. The order of the classes are {'Brain' 'Muscle' 'Eye' 'Heart' 'Line Noise' 'Channel Noise' 'Other'}. brainLabelProbThresh = 0; % [0-1] brainIdx = find((mostDominantClassLabelVector==1 & mostDominantClassLabelProbVector>=brainLabelProbThresh)); % Perform IC rejection using residual variance of the IC scalp maps. rvList = [EEG.dipfit.model.rv]; goodRvIdx = find(rvList < 0.15)'; % < 15% residual variance == good ICs. % Perform IC rejection using inside brain criterion. load(EEG.dipfit.hdmfile); % This returns 'vol'. dipoleXyz = zeros(length(EEG.dipfit.model),3); for icIdx = 1:length(EEG.dipfit.model) dipoleXyz(icIdx,:) = EEG.dipfit.model(icIdx).posxyz(1,:); end depth = ft_sourcedepth(dipoleXyz, vol); depthThreshold = 1; insideBrainIdx = find(depth<=depthThreshold); % Take AND across the three criteria. goodIcIdx = intersect(brainIdx, goodRvIdx); goodIcIdx = intersect(goodIcIdx, insideBrainIdx); % Perform IC rejection. EEG = pop_subcomp(EEG, goodIcIdx, 0, 1); % Post-process to update ICLabel data structure. EEG.etc.ic_classification.ICLabel.classifications = EEG.etc.ic_classification.ICLabel.classifications(goodIcIdx,:); % Post-process to update EEG.icaact. EEG.icaact = []; EEG = eeg_checkset(EEG, 'ica');

### How to compute a robust (i.e. median-version of) standard deviation (07/26/2021 updated)

You know median is robust mean-equivalent and nicely neglect crazy values. But median +/- n*standard deviation is not a reasonable way to determine outliers because standard deviation is affected by the outliers. Thus you need a robust standard deviation-equivalent. It is called *median absolute deviation* which does use median instead of mean in the calculation. See this Wiki page. https://en.wikipedia.org/wiki/Median_absolute_deviation#Relation_to_standard_deviation In Matlab, mad() is the function to calculate this value. You also need to multiply the obtained value with the constant 1.4826 if Gaussian distribution is assumed (we usually do). An example of the use of robust outlier detection can be found in clean_channels() line 104 which is included in the clean_rawdata() plugin.

## How to avoid the effect of rank-deficiency in applying ICA (03/31/2021 added)

This is for Zahra Fotovatnia and Ilaria Berteletti. Intuitively, if two lines in the 3-D space are aligned in parallel, these lines are lineary dependent and not counted as two (i.e. rank deficient). In terms of 0 or 1 decision, any non-parallel lines are technically 'linear independent'. Then, what happens when two lines are not exactly but very close to parallel? This can be measured as a condition number, which is where is the maximum singular value of , while is the minimum singular value of . Notice the denominator--if becomes very small, the condition number becomes very large. Of course, large condition number indicates unstability.

Now, pop_runica() line 531 hard-cords the variable 'rankTolerance = 1e-7'. This sets the smallest acceptable for runica() process. For example, if you have 32 channels but 2 of are smaller than 1e-7, runica() automatically removes these dimensions and will produce 30 ICs for your 32-channel data input. This is because even if the input data is technically full-ranked, if some of are very small, practically it is the same as rank deficient (i.e. ICA cannot appreciate the difference between parallel and very-close to parallel) and ICA most likely start to create ghost ICs or complex weight matrix etc, all of which indicate failure. Thus, the purpose of having this 'rankTolerance = 1e-7' is to protect the results from possible pathologically small . For example, if you interpolate one electrode using spline interpolation, the interpolated electrode has non-linearly synthesized time-series data that is not exactly linearly dependent but very close to linear dependency.

Following the heuristic 'rank tolerance' in the pop_runica(), it is a good protective approach to determine the effective rank using

dataRank = sum(eig(cov(double(EEG.data'))) > 1E-7);

instead of

dataRank = rank(double(EEG.data'));

Note that the threshold value 1E-7 was most likely determined heuristically. By using larger values, such as 1E-6, it may help to make it slightly more stable. If you use too large value here, however, it will start to reduce the number of ICs eventually. If you are in doubt with the behavior of ICA and/or its decomposition quality, you may tweak this value for test.

### How to turn on the automatic protection against rank-deficient ICA in pop_runica (3/31/2021 added)

Near the end of pop_runica(), you find the following line.

tmprank2 = max(tmprank, tmprank2);

If you want to turn on this protection against rank-deficient ICA (which I recommend), change the code as follows.

%tmprank2 = max(tmprank, tmprank2); tmprank2 = min(tmprank, tmprank2);

This change means that if the effective rank of the data, which is heuristically defined as the smallest eigenvalue of 10^-7, is smaller than the regular results of rank(), then the algorithm determines to use the effective data rank instead of true data rank. This is because true data rank including eigenvalue < 10^-7 is effectively rank deficient for ICA.

## How to speed up ICA (10/22/2020 updated)

### By downsampling (09/13/2021 modified)

ICA has a known bias toward high amplitude. If the data length were infinite, it would not have this bias--my former colleague told me so. Due to 1/f property of scalp-recorded EEG signal, which is predicted from the cable equation (see 'Electric Field of the Brain' p.174- by Nunez and Srinivasan, 2006), signal power in the higher frequency range is very small. This simply means that ICA has less things to learn from high frequency (by the way, ASR applies inverse EEG-PSD filter so that signals in those high frequency ranges, which are unlikely to be dominated by natural EEG, is enhanced so that it is detected for correction). If that's the case, why not cutting the high frequency from the beginning, at least for ICA purpose--that's the rational for this process. If you doubt it, you can anytime verify it comparing ICA results obtained from using 1000-Hz sampled data with that from using 100-Hz downsampled data. It is even possible that due to the band-limiting effect (100-Hz downsampled data is 50-Hz low-pass filtered because of anti-aliasing), ICA result could be even better. The favorable effect of low-pass filtering before ICA has been told in the past EEGLAB workshops by Scott himself, I believe ('Applying 100-Hz low-pass filter before ICA is also important...') At least, it will speed up ICA.

% Run ICA. EEG_forICA = pop_resample(EEG, 100); EEG_forICA = pop_runica(EEG_forICA, 'extended',1,'interupt','off'); EEG.icaweights = EEG_forICA.icaweights; EEG.icasphere = EEG_forICA.icasphere; EEG = eeg_checkset(EEG, 'ica');

#### How does it affect the recommended data length? (10/22/2020 added)

This is for Syanah Wynn--If you downsample the data from 250Hz to 100Hz, you have only 40% of data length. Does it negatively impact ICA's performance, since there is much less data available? This is something a simulation study can answer empirically, but the empirical answer is no. In fact, I don't recommend to use SCCN's rule of thumb literally--I would rather add an additional condition as follows: ICA requires data length of ((ch)^2)*30 *if data were sampled at 250 Hz*. I have been applying this downsampling-to-100Hz approach as ICA preprocessing over several thousands of datasets, in some of which the data length became shorter than the above recommendation but I did not see any impact. So, the length in time in the real world counts rather than frames. For example, 5-min data can be 30,000 frames when sampled at 100 Hz, and 300,000 frames when sampled at 1000Hz. Does the latter help ICA decomposition quality? It does not. It depends on how much information scalp-recorded EEG contains above 100Hz compared with below 100Hz.

This view is supported by the following description taken from Nunez and Srinivasan (2006) p.308: "EEG (and ECoG) power in mammals tends to fall off sharply above 50 or 100 Hz, **apparently reflecting cortical cell time constants in the 10 ms range** as indicated in chapter 4. That is, we expect minimal changes in the mesosource function P(r,t) over mesoscopic "relaxation times" in the range of 10 ms or perhaps several tens of milliseconds."

### By modifying runica.m (09/13/2021 added)

This is for Pål. During the 30th EEGLAB workshop, one of the workshop attendees Tjerk Gutteling told us that by commenting out the lines for 'drawnow' in runamica() speeds up ICA by several times (yes, several times, not several percent). I immediately tested it and confirmed that was true. Kudos to Tjerk! What a hacker. This discovery is mentioned in EEGLAB github and a post to the EEGLAB mailing list by Curtis Bingham. This hack will be integrated into an official EEGLAB update and released in future. Meanwhile, if you are still using EEGLAB 2021 or older, it is worth trying this trick.

- Open runica() by typing 'edit runica' in the Matlab command line
- Search for 'drawnow;'. There are 5 of them in the code.
- Replace them with 'drawnow;' with 'drawnow limitrate;' (or alternatively comment out the line as '%drawnow;' if you don't need to interrupt the process)

#### runica's drawnow conflicts with PowPowCAT? (09/13/2021 added)

Relatedly, I also got a report that under certain condition PowPowCAT, one of the EEGLAB plugins I wrote to calculate comodugram/comodulogram, and runica() can cause conflict (I could not replicate the issue). The error message is as follows (thanks Pål Gunnar Larsson for reporting this error)

matlab.graphics.internal.figfile.FigFile/read>@(hObject,eventdata)PowPowCAT('PowPowCATFigure_WindowButtonMotionFcn',hObject,eventdata,guidata(hObject)) Error using runica (line 872) Error while evaluating Figure WindowButtonMotionFcn.

Even though I don't exactly know how the conflict happens, to avoid this issue, most likely one can simply comment out 'drownow;' lines as suggested above.

### By using CUDAICA (05/09/2022 updated: Thank you Ugo, Yunhui, John, and Ernie!)

Here is my warning--you may struggle to install this solution. CUDAICA is a published computer algorithm (Raimondo et al., 2012) to perform GPU computation for runica (i.e. infomax ICA). You can see how it works in this Youtube video by John Kiat (UC Davis). If your computer has NVIDIA graphics board, and if you don't mind getting registered to Intel, you may challenge to install CUDAICA to speed up the process of ICA. How much speed does it gain compared with runica? Using no extended infomax and the initial learning rate 0.001, runica() took 90s to decompose 32ch, 500Hz, 40-min data, which CUDAICA took 20s (Core i5 10400, GTX1660, 32GB RAM). My CUDAICA code is pre-compiled and downloaded from YunHui's github which uses double precision for calculation (confirmed by Yunhui; Thank you!) Using the double precision is for numerical stability, though GPU boards usually works much more efficiently for the single precision data than the double precision data. Raimondo et al. (2012) reported x25 speed gain. My result was not that dramatic (probably, Tjerk Gutteling's 'drawnow' trick already addressed 3-5 times of speed up for runica) but still this is significant. Of course, if you have more powerful GPU (GTX1660 I used here is only $250 in the pre-COVID price range), the effect is greater! In the case of John (Ryzen 7 3800x, GTX1080ti), x15 boost even after addressing the 'drawnow' slowing issue, which makes a huge difference.

The most recent information in our community about how to install it to Windows machine was reported by Ugo Bruzadin Nunes and archived here (for Linux, see this article by Alejandro Ojeda, a former SCCN engineer). I struggled all day today to install CUDAICA and finally succeeded. Let me share the steps I took based on Ugo's suggestions and provide some update.

Updated Ugo's suggestions (modified by Makoto based on Yunhui's suggestion as of 11/15/2021)1. Update all your PC's drivers. 2. Install CUDA Toolkit version 11.5. Download the application from here. 3. Install either Intel oneAPI base Toolkit. Download the application from here. Note that you need to get register to download it. If you use oneAPI, you'll need to manually change the file names later. Alternatively, download Intel MKL 2020.4 for which file re-naming is unnecessary. 4. Make sure that no folders that path to matlab and eeglab have spaces in them. Especially, "/New folder/" and "/Program Files/" Ugo wrote the space in the path 'ruined many hours of my life'. Reinstall MATLAB if necessary. 5. Download the windows version from YunHui's github. You'll have to manually add or change two files inside eeglab, "\functions\sigprocfunc\icadefs.m" and "\functions\popfunc\pop_runica.m". Read the two "read me"s inside it. There are two versions included, EEGLAB ver.14 and 2019.1. But I confirmed that EEGLAB ver.14 on Matlab2021b has a problem (when you use GUI, a selection of algorithm is not reflected and always the first choice, runica is used). I used the version 2019.1 for 2021.1 with no problem. 6.[For those who downloaded Intel oneAPI, not Intel MKL 2020.4!]According to Ugo's advice, here I set Window's system path. However, I found that some of the files needs to be renamed. This is probably because the version changes in the past two years. In my case, at this point many .dll files are generated in C:\Program Files (x86)\Intel\oneAPI\mkl\2021.4.0\redist\intel64 such as mkl_intel_thread.1.dll. However, when I double-clicked cudaica_win.exe under CudaICA1.0, I saw an error message that mkl_intel_thread.dll is not found. So I guessed these files need to be renamed so that mkl_intel_thread.1.dll -> mkl_intel_thread.dll by removing '.1'. I think my guess was correct. Just in case, I made copies of these .dll files under 'renamed' folder to which I set Windows system path (see below). 7. After renaming the .dll files, follow modified Ugo's instruction below. * Open the System folder in control panel * Open Advanced system settings * Open "Environment Variables..." * Take a deep breath * Very carefully, on the UPPER table, click on "path", then click "Edit". * Make sure nothing is selected on the path's list (click outside the list or un-click the clicked item). * Click "browse" * Find the folder in which the renamed .dll files are located. In my case, it is C:\Program Files (x86)\Intel\oneAPI\mkl\2021.4.0\redist\intel64\renamed * Save your progress. Press ok ok ok ok to close the windows.

On the other hand, I found the following steps are probably unnecessary in Ugo's suggestions

- Visual Studio is only required if you want to build the .exe file from source. If you want to download the .exe file from YunHui's github (which I recommend), you don't need to install Microsoft Visual Studio and only NVIDIA CUDA and Intel oneAPI/MKL are needed.
- Similarly, setting the system path to C:\Users\Ugo\AppData\Local\Programs\Microsoft VS Code\bin is unnecessary as long as you use Yunhui's file. In fact, Microsoft Visual Studio Code requires separate installation which is found https://code.visualstudio.com/docs/setup/windows here].
- Installation of BINICA is unnecessary.

Here is a general tip for further speed increase. Using 'extended', 0 increases the processing speed significantly. You might be concerned if ICA's performance is worsened. But if you think about what kind of signal sources have sub-Gaussian distributions, which 'extended', 1 can specifically capture, probably you don't miss it at all.

Ernie reported in one of his blog articles titled Install CUDAICA for Windows 10 that 'Out of the several hundred megabytes of the Intel distribution only THREE files appear to be necessary to run CUDAICA' namely 'mkl_core.2.dll', 'mkl_def.2.dll', and 'mkl_intel_thread.2.dll'. In this article, he demonstrates how to generate the MKL files directly and figure out the directory and code (Thank you Ernie for sharing this info!) The merit of this approach is that (1) Since the MKL files are now in the same directory as the plugin, it will not try to find the files on your system path; (2) It saves several hundred megabytes of disk space and also frustration regarding environmental paths.

## How to export data to an Excel file (08/09/2020 added, prayer for Nagasaki)

This is an example to write ERP (i.e., data needs to be epoched) with channel labels and latency for Zahra Fotovatnia and Zoran Josipovic.

% Obtain electrode labels. chLabels = {EEG.chanlocs.labels}'; % If the electrode labels are just numbers, you have to add non-numeric % characters before that, otherwise writetable() fails. Very Dumb. for chIdx = 1:length(chLabels) chLabels{chIdx} = ['Ch' chLabels{chIdx}]; end % Obtain ERP latency. latency = cellstr(num2str(EEG.times')); % Obtain all-electrode ERP. ERP = mean(EEG.data,3)'; % Prepare output table. outputTable = array2table(ERP, 'VariableNames', chLabels, 'RowNames', latency); % Write the output table. Using '.xlsx' automatically specifies Excel % format so no need to use 'FileType', 'spreadsheet' option. writetable(outputTable, 'savePath/test.xlsx',... 'WriteVariableNames', true, 'WriteRowNames', true);

## How to send email from Matlab to report progress (07/26/2019 added)

% Set up Matlab preferences. setpref('Internet','SMTP_Server','XXXX.XXXX.XXX'); setpref('Internet','E_mail','XXXXXXXXXXX@XXXX.XXX'); allSets = dir('/data/projects/example/p0100_imported/*.set'); ticTocList = nan(length(allSets),1); for setIdx = 1:length(allSets) ticStart = tic; %%%%%%%%%%%%%%%%%%%%%%%%%% %%% Your process here. %%% %%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Calculate time lapse. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%% ticTocList(setIdx) = toc(ticStart); if mod(setIdx, 100) == 0 meanProcessingDurationInSec = nanmean(ticTocList); stdProcessingDurationInSec = nanstd(ticTocList); meanRemainingProcessingDurationInDay = (length(allSets)-setIdx)*meanProcessingDurationInSec/(60*60*24); stdRemainingProcessingDurationInDay = (length(allSets)-setIdx)*stdProcessingDurationInSec/(60*60*24); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Send email for update. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% sendmail({'XXXXXXXX@XXXX.XXX', 'XXXXXXXXX@XXXX.XXX'},... sprintf('From Matlab running batch: %.0f/%.0f done', setIdx, length(allSets)),... sprintf('Estimated remaining time: %.1f days (SD %.1f)', meanRemainingProcessingDurationInDay, stdRemainingProcessingDurationInDay)); end end

## How to apply eegfiltnew() to a numeric array (03/02/2022 added)

I want to use Andreas Widmann's FIR filter for general purposes. But the input for the eegfiltnew() is EEG. Because of this requirement, we have to build a dummy EEG structure to filter a data vector or matrix. This is cumbersome, but I realized I have done that for more than dozens of times. Most likely I will continue that in the future, so I leave a copy-and-paste solution here for future use. Note that the filter order depends on EEG_dummy.srate, 'fcutoff', and 'wtype' so it is better one runs this filter from EEGLAB main GUI then obtain the parameters using eegh().

% Apply a 2-Hz high-pass filter (FIR Hamming, TBW: 1 Hz). EEG_dummy = eeg_emptyset; EEG_dummy.nbchan = 1; EEG_dummy.trials = 1; EEG_dummy.pnts = length(input); EEG_dummy.srate = 100; EEG_dummy.xmin = 1/EEG_dummy.srate; EEG_dummy.xmax = EEG_dummy.xmin*EEG_dummy.pnts; EEG_dummy.times = EEG_dummy.xmin:EEG_dummy.xmin:EEG_dummy.xmax; EEG_dummy.data = input'; EEG_dummy_filtered = pop_firws(EEG_dummy, 'fcutoff', 2, 'ftype', 'highpass', ... 'wtype', 'hamming', 'forder', 330, 'minphase', 0, ... 'usefftfilt', 0, 'plotfresp', 0, 'causal', 0); input_filtered = EEG_dummy_filtered.data; %{ figure plot(input) hold on plot(input_filtered) %}

## How to perform time-frequency analysis with variable epoch length (10/13/2020 added)

This solution is for Steve Wu, and also could have been for Kelly Michaelis, although I took a different approach in 2019--but seems apparent that this solution was more straightforward than hacking the STUDY structure. I feel I tried to be too loyal to EEGLAB and made a wrong decision at that time (whichever works anyway... it's a matter of efficiency and integrity.)

Here in this example, ERSP for 'latencyZeroEvent', which are 'DIN6' and 'DIN60', is calculated for which baseline is calculated for 'baselineOnsetEvent'. The whole point is that **you can calculate ERSP with flexibly specified baseline period even if there are duration-varying events between the two conditions**. Use another solution 'How to obtain the optimum number of clusters' together and you can perform your own IC clustering with ERSP.

% Define events of interest/non-interest. latencyZeroEvent = {'DIN6', 'DIN60'}; epochWinEdges = [-2 1.5]; baselineOnsetEvent = 'READY'; % Baseline window is 2-s. baselineWinEdges = [0 1.5]; epochExclusionEvent = 'boundary'; windowStepSize = 0.02; % s windowLengthInSteps = (epochWinEdges(2)-epochWinEdges(1))/windowStepSize; windowLengthInFrame = 557; % For 1.5-Hz at 3 cycles. % This must be obtained from the command line log % shown after running a newtimefreq() with above % parameters. allSets = dir('[loadingDataFullPath]/*.set'); for setIdx = 1:length(allSets) % Load file. loadName = allSets(setIdx).name; EEG = pop_loadset('filename', loadName, 'filepath', '[loadingDataFullPath]/'); % Loop for ICs. icErspStack = zeros(100, windowLengthInSteps, size(EEG.icaact,1)); for icIdx = 1:size(EEG.icaact,1) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Compute EEGLAB's pseudo-wavelet (i.e. STFT-hybrid). %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% timesoutParam = round((EEG.xmax-windowLengthInFrame*1/EEG.srate)/windowStepSize); % Temporal resolution is set to 20 ms. % Perform time-frequency decomposition. Do not calculate baseline at this point. [~, ~, ~, waveletTimesS, freqs, ~, ~, coeffs] = ... newtimef(EEG.icaact(icIdx,:), EEG.pnts, [EEG.xmin EEG.xmax], EEG.srate, [3 10], ... 'baseline', NaN,'nfreqs', 100, 'freqs', [1.5 55], 'freqscale', 'log', ... 'timesout', timesoutParam, 'plotersp', 'off', 'plotitc', 'off'); % Find critical events. allEvents = {EEG.event.type}'; latencyZeroIdx = find(strcmp(allEvents, latencyZeroEvent{1}) | strcmp(allEvents, latencyZeroEvent{2})); baselineOnsetIdx = find(strcmp(allEvents, baselineOnsetEvent)); epochExclusionIdx1 = find(strcmp(allEvents, epochExclusionEvent)); latencyZeroLatencyS = [EEG.event(latencyZeroIdx).latency]*1/EEG.srate; % s. baselineOnsetLatencyS = [EEG.event(baselineOnsetIdx).latency]*1/EEG.srate; % s. epochExclusionLatencyS = [EEG.event(epochExclusionIdx1).latency]*1/EEG.srate; % s. %%%%%%%%%%%%%%%%%%%%%%%%%% %%% Obtain the epochs. %%% %%%%%%%%%%%%%%%%%%%%%%%%%% % Obtain raw epoch edges. epochEdges = [[latencyZeroLatencyS + epochWinEdges(1)]' [latencyZeroLatencyS + epochWinEdges(2)]']; % Exclude epochs that contains epochExclusionLatencyS. epochExclusionIdx2 = []; for exclusionIdx = 1:length(epochExclusionLatencyS) currentExclusionEventS = epochExclusionLatencyS(exclusionIdx); for epochIdx = 1:size(epochEdges,1) if epochEdges(epochIdx,1)<=currentExclusionEventS & epochEdges(epochIdx,2)>=currentExclusionEventS epochExclusionIdx2 = sort([epochExclusionIdx2; epochIdx]); end end end epochEdgesRej = epochEdges; epochEdgesRej(epochExclusionIdx2,:) = []; coeffEpochTensor = zeros(length(freqs), windowLengthInSteps, size(epochEdgesRej,1)); for epochIdx = 1:size(epochEdgesRej,1) epochFrameIdx = find(waveletTimesS > epochEdgesRej(epochIdx,1) & waveletTimesS < epochEdgesRej(epochIdx,2)); if length(epochFrameIdx) < windowLengthInSteps epochFrameIdx(end+1) = epochFrameIdx(end)+1; elseif length(epochFrameIdx) > windowLengthInSteps epochFrameIdx(end+1) = epochFrameIdx(end)-1; end coeffEpochTensor(:,:,epochIdx) = coeffs(:,epochFrameIdx); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Obtain the baseline. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Obtain raw epoch edges. baselineEdges = [[baselineOnsetLatencyS + baselineWinEdges(1)]' [baselineOnsetLatencyS + baselineWinEdges(2)]']; % Exclude epochs that contains epochExclusionLatencyS. baselineExclusionIdx = []; for exclusionIdx = 1:length(epochExclusionLatencyS) currentExclusionEventS = epochExclusionLatencyS(exclusionIdx); for baselineIdx = 1:size(baselineEdges,1) if baselineEdges(baselineIdx,1)<=currentExclusionEventS & baselineEdges(baselineIdx,2)>=currentExclusionEventS baselineExclusionIdx = sort([baselineExclusionIdx; baselineIdx]); end end end baselineEdgesRej = baselineEdges; baselineEdgesRej(baselineExclusionIdx,:) = []; % Extract epochs. Length of 75 was determined from data. coeffBaselineTensor = zeros(length(freqs), 75, size(baselineEdgesRej,1)); for baselineIdx = 1:size(baselineEdgesRej,1) baselineFrameIdx = find(waveletTimesS > baselineEdgesRej(baselineIdx,1) & waveletTimesS < baselineEdgesRej(baselineIdx,2)); if length(baselineFrameIdx) < 75 baselineFrameIdx(end+1) = baselineFrameIdx(end)+1; elseif length(baselineFrameIdx) > 75 baselineFrameIdx(end+1) = baselineFrameIdx(end)-1; end coeffBaselineTensor(:,:,baselineIdx) = coeffs(:,baselineFrameIdx); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Perform baseline correction. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% epochErspRaw = coeffEpochTensor.*conj(coeffEpochTensor); epochErspNoBaseline = 10*log10(mean(epochErspRaw,3)); baselineErspRaw = coeffBaselineTensor.*conj(coeffBaselineTensor); baselinePower = mean(10*log10(mean(baselineErspRaw,3)),2); epochErsp = bsxfun(@minus, epochErspNoBaseline, baselinePower); % Store the result. icErspStack(:,:,icIdx) = epochErsp; end % Store the result. EEG.etc.customErsp.icErspStack = icErspStack; EEG.etc.customErsp.windowStepSize = windowStepSize; EEG.etc.customErsp.freqs = freqs; EEG.etc.customErsp.latencyZeroEvent = latencyZeroEvent; EEG.etc.customErsp.epochWinEdges = epochWinEdges; EEG.etc.customErsp.baselineOnsetEvent = baselineOnsetEvent; EEG.etc.customErsp.baselineWinEdges = baselineWinEdges; EEG.etc.customErsp.epochExclusionEvent = epochExclusionEvent; % Save the data. pop_saveset(EEG, 'filename', loadName, 'filepath', '[savingDataFullPath]') end

## How to perform time-frequency transformation faster (11/29/2020 added)

Calculation of of the (pseudo-)wavelet transform is calculated in the function *timefreq()*. It takes place inside the double for-loops, the first one for time points ('index') and the second one frequency bins ('freqind'). For a standard time-frequency analysis with 200 time points ('timesout') and 50 frequencies ('nfreqs'), it executes the calculation (tmpX = sum(wav .* tmpX);) 10,000 times. If you want to decompose 30-min continuous data for every 100 ms, there will be 1800/0.1*50 = 900,000 iteration of this calculation per channel/IC. If you have 64-channel data, there will be 5,760,000 iterations. Even if each iteration takes 1 ms, it will take 1.6 hours to finish the calculation. In theory, if we can use indices instead of for-loops, we can speed up the process. For proof of concept, I modified the existing *timefreq()* to replace the existing for-loops blocks with the following one.

% In timefreq(), replace the segment after 'else' at around line 420 with the following segment. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Modified code. (11/29/2020 Makoto) %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Zero-pad g.win so that wavelet daughters have the same length. This allows matrix operation. (11/29/2020 Makoto). longestWindow = length(g.win{1}); waveletDaughters = zeros(longestWindow, length(g.win)); for freqIdx = 1:length(g.win) currentDaughter = g.win{freqIdx}; onsetFrameToCenter = (size(waveletDaughters,1)-length(currentDaughter))/2+1; centeredFrameIdx = onsetFrameToCenter:onsetFrameToCenter+length(currentDaughter)-1; waveletDaughters(centeredFrameIdx,freqIdx) = currentDaughter; end % Obtain time points to which the wavelet daughters are applied. tfFrameList = g.indexout; timeIdxTemplate = -(size(waveletDaughters,1)-1)/2:(size(waveletDaughters,1)-1)/2; timeIdx2D = tfFrameList+timeIdxTemplate'; timeIdx1D = timeIdx2D(:); % Obtain input data that are ready to be wavelet-transformed. tmpX2D = double(data(timeIdx1D,:)); % Wavelet width x data frames. tmpX3D = reshape(tmpX2D, [size(timeIdx2D,1) size(timeIdx2D,2) size(tmpX2D,2)]); tmpX3D_demeaned = bsxfun(@minus, tmpX3D, mean(tmpX3D,1)); tmpX1D = tmpX3D_demeaned(:); repetitionCounts = length(tmpX1D)/size(waveletDaughters,1); waveletDaughtersFullLength = repmat(waveletDaughters, [repetitionCounts 1]); % Perform wavelet transform. waveletCoeff = bsxfun(@times, waveletDaughtersFullLength, tmpX1D); waveletCoeff4D = reshape(waveletCoeff, [size(timeIdx2D,1) size(timeIdx2D,2) size(tmpX2D,2) size(waveletCoeff,2)]); if size(waveletCoeff4D,3)==1 % For continuous data. waveletCoeff3D = permute(squeeze(mean(waveletCoeff4D,1)), [2 1 3]); else % For epoched data. waveletCoeff3D = permute(squeeze(sum(waveletCoeff4D,1)), [3 1 2]); % freqs, times, epochs. end tmpall = waveletCoeff3D;

I confirmed 150-200% of speed increase depending on how many time points and frequency bins there are. Generally, the difference becomes larger if more iteration is necessary in the original code. Noticeably, when I processed continuous data (1700 s, 50 freq bins, 0.1-s resolution), the speed increase recorded 450%.

I would like to share this modified code for the beta test. Please download it from the link below and overwrite the existing *timefreq.m* under the folder [eeglabroot]/functions/timefreqfunc. Do not forget to take a back up of *timefreq.m* before you overwrite it.

My current concern is that this implementation is RAM-instensive, particularly for continuous data transformation. Make sure you have a few GB of RAM avaialble. We can still push further to save RAM by optimizing code (changing variables to single, overwrite variables instead of creating others with different names etc).

## How to perform a regression analysis (12/24/2021 added)

You may want to perform a regression analysis between ERP amplitudes and trial-by-trial reaction times etc. In Matlab, a regression analysis can be done using backslash ("\"). Although there are multiple functions to do it, using backslash ("\") is computationally the most optimized.

If a vector of variables is correlated with that of , it is described as

where is the y-intercept, is the regression coeff, and is the error. In the code below, is var2, is var1, is B(1), is B(2), is Y_est-Y where Y_est being the regression line.

% Generate two correlated (r=0.5) time series data. % Taken from https://www.mathworks.com/matlabcentral/answers/101802-how-can-i-generate-two-correlated-random-vectors-with-values-drawn-from-a-normal-distribution signalLength = 100; corrCoef = 0.5; rng(0) M = randn(signalLength,2); R = [1 corrCoef; corrCoef 1]; L = chol(R); M = M*L; var1 = M(:,1); var2 = M(:,2); % Perform a regression analysis. % Taken from https://www.mathworks.com/help/matlab/data_analysis/linear-regression.html Y = var2; X = [ones(signalLength,1), var1]; % X(:,1) is for intercept. B = X\Y; % Compute the regression line. Y_est = B(1)+B(2)*X(:,2); % Calculate Rsquare Rsquare = 1-sum((Y-Y_est).^2)/sum((Y-mean(Y)).^2); % Plot the scatter plot. figure scatter(var1, var2) hold on plot(var1, Y_est) xlabel('var1') ylabel('var2') title(sprintf('r=%.2f', sqrt(Rsquare)))

## Comparing ASR with PCA (05/17/2022 added, code contributed by Hyeonseok Kim)

ASR uses PCA. Fine. Now, can you explain how ASR differs from a simple PC rejection? Hyeonseok, Chiyuan, and I had discussion on this issue on May 17, 2022 at SCCN. Hyeonseok came up with this simulation for the proof of concept of the following argument. Suppose there are 3 PCs, PC1, PC2, and PC3. Now we reject PC3 to remove artifacts. For a standard PC rejection, all the data points in the original 3-D space are projected onto a plane defined by the remaining PC1 and PC2 (which follows the least-square estimates). But ASR does not do that; the projecting plane is not defined by the remaining PCs, but it is defined by the reference data M. Following code demonstrates the difference between the two planes onto which original data points are projected after PC rejection.

rng(5) % You can change this random seed to change M U = rand(3); % dimension: 3 M = U*U'; rng('default') Xt = 10*(rand(3,500)-0.5); [V D] = eig(cov(Xt')); idxRej = [2]; % You can select the index of rejected PCs Vtr = V; Vtr(idxRej,:) = 0; Xtr = Xt; Xtr(idxRej,:) = 0; Rejected_PCA = V*Vtr'*Xtr; Rejected_ASR1 = V*V'*M*pinv(Vtr'*M)*V'*Xt; Rejected_ASR2 = V*Vtr'*M*pinv(Vtr'*M)*V'*Xt; % Following result would show the difference between a simple PCA rejection and ASR reconstruction Result = Rejected_PCA - Rejected_ASR1; % Result 2 would show ASR2 = PCA Result2 = Rejected_PCA - Rejected_ASR2; %% [Usvd Dsvd Vsvd] = svd(Vtr'*M); Dsvd_inv = Dsvd; Dsvd_inv(1,1) = 1/Dsvd_inv(1,1); Dsvd_inv(2,2) = 1/Dsvd_inv(2,2); close all Y = V'*Xt; Y2 = pinv(Vtr'*M)*V'*Xt; % Y2 is {vectors X such that Vtr'*M*X is orthogonal projection on Im(Vtr'*M)} Y3 = V(idxRej,:)'*Xt(idxRej,:); % rejected PC Y4 = V'*M*Y2; % ASR result Y5 = Vtr'*M*Y2; % PCA rejection result Y6 = Vsvd*Dsvd_inv*Usvd'*V'*Xt; % note that Y6 is the same as Y2 figure plot3(Y(1,:),Y(2,:),Y(3,:),'bo') hold on plot3(Y2(1,:),Y2(2,:),Y2(3,:),'ro','MarkerSize',2) plot3(Y3(1,:),Y3(2,:),Y3(3,:),'go') plot3(Y4(1,:),Y4(2,:),Y4(3,:),'*','markersize',8,'color',[0.3010 0.7450 0.9330]) plot3(Y5(1,:),Y5(2,:),Y5(3,:),'*y','markersize',8) plot3(Y6(1,:),Y6(2,:),Y6(3,:),'*','markersize',8,'color',[0.9290 0.6940 0.1250]) grid on grid minor axis equal xlim([-10 10]) ylim([-10 10]) zlim([-10 10]) legend({'Original data' 'Vectors X such that Vtr''*M*X is orthogonal projection on Im(Vtr''*M)' 'Rejected PC' 'ASR result' 'PC rejection result' 'Same as ASR result'})

## How to control behaviors of SIFT movie? (05/02/2022 added)

This trick was reported by Hyeonseok Kim (Thank you very much!) With Matlab R2022a, SIFT's movie does not reflect user-specified changes of view angles . The movie shows a brain image that is slightly too close. He found solutions to address these problems. See below.

1. To make the view-angle change take effect, insert a short 'pause' here.

2. To zoom out a little bit, uncomment this line.

These were not issues with older versions of Matlab, of course.

## STUDY (Before EEGLAB14)

### How to obtain the optimum number of clusters (12/19/2021 updated)

This solution consists 1) Obtain xyz coordinates of estimated equivalent dipoles from all .set files loaded and stored in ALLEEG streucture, 2) Run Matlab *evalclusters*() to perform optimization between the range of 5-15 using three algorithms (Calinski-Harabasz, Silhouette, Davies-Bouldin). Note that I am only using dipole location for the clustering criterion (if you wonder why, see this page.) Accordingly, my code below is also separated into two sections.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Obtain all dipole xyz coordinates as a list. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% dipXyz = []; for subjIdx = 1:length(ALLEEG) % Obtain xyz, dip moment, maxProj channel xyz. xyz = zeros(length(ALLEEG(subjIdx).dipfit.model),3); for modelIdx = 1:length(ALLEEG(subjIdx).dipfit.model) % Choose the larger dipole if symmetrical. currentXyz = ALLEEG(subjIdx).dipfit.model(modelIdx).posxyz; currentMom = ALLEEG(subjIdx).dipfit.model(modelIdx).momxyz; % nAmm. if size(currentMom,1) == 2 [~,largerOneIdx] = max([norm(currentMom(1,:)) norm(currentMom(2,:))]); currentXyz = ALLEEG(subjIdx).dipfit.model(modelIdx).posxyz(largerOneIdx,:); end xyz(modelIdx,:) = currentXyz; end dipXyz = [dipXyz; xyz]; end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Optimize the number of clusters between the range 5-15. %%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Reduce data dimension using PCA. Use Matlab function evalclusters(). kmeansClusterIdxMatrix = zeros(size(dipXyz,1),11); meanWithinClusterDistance = nan(11+4,11); for clustIdx = 1:11 [IDX, ~, SUMD] = kmeans(dipXyz, clustIdx+4, 'emptyaction', 'singleton', 'maxiter', 10000, 'replicate', 100); kmeansClusterIdxMatrix(:,clustIdx) = IDX; numIcEntries = hist(IDX, 1:clustIdx+4); meanWithinClusterDistance(1:clustIdx+4, clustIdx) = SUMD./numIcEntries'; end eva1 = evalclusters(dipXyz, kmeansClusterIdxMatrix, 'CalinskiHarabasz'); eva2 = evalclusters(dipXyz, kmeansClusterIdxMatrix, 'Silhouette'); eva3 = evalclusters(dipXyz, kmeansClusterIdxMatrix, 'DaviesBouldin'); figure subplot(2,2,1) boxplot(meanWithinClusterDistance) set(gca, 'xticklabel', 5:15) xlabel('Number of clusters') ylabel('Mean distance to cluster centroid') subplot(2,2,2) plot(eva1); title('CalinskiHarabasz'); subplot(2,2,3) plot(eva2); title('Silhouette'); subplot(2,2,4) plot(eva3); title('DaviesBouldin');

### How to transfer precomputed STUDY (10/11/2018 updated)

I found that just copying all the files from one folder to another does not copy the precomputed results. After investigation, I realized that this is because when STUDY is pre-computed (i.e., ERP, PSD, ERSP, etc), the generated files are stored in the HDD and linked with STUDY.design.filepath and STUDY.design.cell.filebase. If these are the all the things that needs to be updated, we can do it using the following code. After loading the copied STUDY, run the code, and save the current STUDY with a different name (just in case). Load the saved STUDY and everything should be there. At least it worked for me.

for designIdx = 1:length(STUDY.design) % Obtain the STUDY.design filepath with which the original data were precomputed. currentDesignFilepath = STUDY.design(designIdx).filepath; % Obtain the STUDY.design filepath which is current. newDesignFilepath = '/data/mobi/Christiane/p01_study'; % Replace the original path with the current path. STUDY.design(designIdx).filepath = newDesignFilepath; for cellIdx = 1:length(STUDY.design(designIdx).cell) % Obtain the STUDY.design.cell filepath with which the original data were precomputed. currentCellFilepath = STUDY.design(designIdx).cell(cellIdx).filebase; if strncmp(currentCellFilepath, currentDesignFilepath, length(currentDesignFilepath)); % Obtain the STUDY.design filepath which is current. oldFilepathRemoved = currentCellFilepath(length(currentDesignFilepath)+1:end); newDesignFilepathConcatenated = [newDesignFilepath oldFilepathRemoved]; % Replace the original path with the current path. STUDY.design(designIdx).cell(cellIdx).filebase = newDesignFilepathConcatenated; else error('Oops, the original filepath did not match.') end end end

### How to load .set files to create STUDY (02/27/2020 updated)

Typically, you need to enter subject names. Sometimes, group names as well. Rarely in may case, condition names. Entering these info one by one on GUI is horrible, particularly if you have many data sets. Here is code to do it automatically. It is not only efficient but it also eliminates human errors YOU will make. After loading all the .set files, use EEGLAB GUI 'Using all loaded dataset' to create a STUDY. This code is also a good example for the basic use of 'for loop' for batch processing.

% Obtain all .set file under /data/makoto/exampleProject/. % In this example, suppose all set files have names like subj123_group2.set targetFolder = '/data/makoto/exampleProject'; allSetFiles = dir([targetFolder filesep '*.set']); % filesep inserts / or \ depending on your OS. % Start the loop. for setIdx = 1:length(allSetFiles) % Obtain the file names for loading. loadName = allSetFiles(setIdx).name; % subj123_group2.set dataName = loadName(1:end-4); % subj123_group2 % Load data. Note that 'loadmode', 'info' is to avoid loading .fdt file to save time and RAM. EEG = pop_loadset('filename', loadName, 'filepath', targetFolder, 'loadmode', 'info'); % Enter EEG.subjuct. EEG.subject = dataName(1:7); % subj123 % Enter EEG.group. EEG.group = dataName(9:14); % group2 % Store the current EEG to ALLEEG. [ALLEEG, EEG, CURRENTSET] = eeg_store(ALLEEG, EEG, 0); end eeglab redraw % This is to update EEGLAB GUI so that you can build STUDY from GUI menu.

### How to obtain more robust clustering result (10/24/2020 added)

This is for Juan Molina--The EEGLAB's default parameter setting for kmeans() may allow noticable variability in performing IC clustering in multiple times (except for the order of the clusters--it varies inevitably, but I'm talking about robustness of the IC-clustering pattern). We can actually do better than the default parameters. In pop_clust() line 184, there is a line of code that performs kmeans which I suggest to change as follows.

%[IDX,C,sumd,D] = kmeans(clustdata,clus_num,'replicates',10,'emptyaction','drop'); [IDX,C,sumd,D] = kmeans(clustdata,clus_num,'replicates',100,'emptyaction','drop','MaxIter',10000);

The above change only costs one-second longer processing time. If you want, you may increase the number for 'replicates' more generously.

## Example of batch code to preprocess multiple subjects (03/02/2021 updated)

See this page for detail of the processes. Note that this is NOT an official SCCN opinion. Data are not epoched here.

%% Example of batch code to preprocess multiple subjects % Step 1: Change the option to use double precision. cd('/data/project/example') rawDataFiles = dir('*.bdf'); for subjID = 1:length(rawData) loadName = rawDataFiles(subjID).name; dataName = loadName(1:end-4); % Step2: Import data. EEG = pop_biosig(loadName); EEG.setname = dataName; % Step 3: Downsample the data. EEG = pop_resample(EEG, 250); %EEG = pop_resample(EEG, 250, 0.8, 0.4); % For SIFT to ease antialiasing filter slope. % Step 4: High-pass filter the data at 1-Hz. Note that EEGLAB uses pass-band edge, therefore 1/2 = 0.5 Hz. EEG = pop_eegfiltnew(EEG, 1, 0, 1650, 0, [], 0); % Step 5: Import channel info EEG = pop_chanedit(EEG, 'lookup','[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/elec/standard_1005.elc','eval','chans = pop_chancenter( chans, [],[]);'); % Step 6: Remove line noise using CleanLine %EEG = pop_cleanline(EEG, 'bandwidth', 2,'chanlist', [1:EEG.nbchan], 'computepower', 0, 'linefreqs', [50 100 150 200 250],... % 'normSpectrum', 0, 'p', 0.01, 'pad', 2, 'plotfigures', 0, 'scanforlines', 1, 'sigtype', 'Channels', 'tau', 100,... % 'verb', 1, 'winsize', 4, 'winstep', 4); % Step 6: Run cleanLineNoise (requires PREP Pipeline toolbox) The original CleanLine was replaced (04/25/2020). signal = struct('data', EEG.data, 'srate', EEG.srate); lineNoiseIn = struct('lineNoiseMethod', 'clean', ... 'lineNoiseChannels', 1:EEG.nbchan,... 'Fs', EEG.srate, ... 'lineFrequencies', [60 120 180 240],... 'p', 0.01, ... 'fScanBandWidth', 2, ... 'taperBandWidth', 2, ... 'taperWindowSize', 4, ... 'taperWindowStep', 1, ... 'tau', 100, ... 'pad', 2, ... 'fPassBand', [0 EEG.srate/2], ... 'maximumIterations', 10); [clnOutput, lineNoiseOut] = cleanLineNoise(signal, lineNoiseIn); EEG.data = clnOutput.data; % Step 7: Apply clean_rawdata() to reject bad channels and correct continuous data using Artifact Subspace Reconstruction (ASR). % Note 'availableRAM_GB' is for clean_rawdata1.10. For any newer version, it will cause error. originalEEG = EEG; EEG = clean_rawdata(EEG, 5, -1, 0.85, 4, 20, 0.25, 'availableRAM_GB', 8); % Step 8: Interpolate all the removed channels EEG = pop_interp(EEG, originalEEG.chanlocs, 'spherical'); % Step 9: Re-reference the data to average EEG.nbchan = EEG.nbchan+1; EEG.data(end+1,:) = zeros(1, EEG.pnts); EEG.chanlocs(1,EEG.nbchan).labels = 'initialReference'; EEG = pop_reref(EEG, []); EEG = pop_select( EEG,'nochannel',{'initialReference'}); % Step 10: Run AMICA using calculated data rank with 'pcakeep' option dataRank = sum(eig(cov(double(EEG.data'))) > 1E-6); % 1E-6 follows pop_runica() line 531, changed from 1E-7. runamica15(EEG.data, 'num_chans', EEG.nbchan,... 'outdir', ['/data/projects/example/amicaResults/' dataName],... 'pcakeep', dataRank, 'num_models', 1,... 'do_reject', 1, 'numrej', 15, 'rejsig', 3, 'rejint', 1); EEG.etc.amica = loadmodout15(['/data/projects/example/amicaResults/' dataName]); EEG.etc.amica.S = EEG.etc.amica.S(1:EEG.etc.amica.num_pcs, :); % Weirdly, I saw size(S,1) be larger than rank. This process does not hurt anyway. EEG.icaweights = EEG.etc.amica.W; EEG.icasphere = EEG.etc.amica.S; EEG = eeg_checkset(EEG, 'ica'); % Step 11: Estimate single equivalent current dipoles % Note: if ft_datatype_raw() complains about channel numbers, comment out (i.e. put % letter in the line top) line 88 as follows % assert(size(data.trial{i},1)==length(data.label), 'inconsistent number of channels in trial %d', i); % CASE1: If you are using template channel locations for all the subjects: % -> Perform 'Head model and settings' from GUI on one of the % subjects, then type 'eegh' to obtain 9 parameters called 'coord_transform' % For example, my 32ch data has [0.68403 -17.0438 0.14956 1.3757e-07 1.0376e-08 -1.5708 1 1 1], therefore % % coordinateTransformParameters = [0.68403 -17.0438 0.14956 1.3757e-07 1.0376e-08 -1.5708 1 1 1]; % % CASE2: If you are using digitized (i.e. measured) channel locations that have 10-20 system names (Fz, Cz, ...) % -> Calculate the invidivualized transform parameters usign all scalp channels in this way % % [~,coordinateTransformParameters] = coregister(EEG.chanlocs, '[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/elec/standard_1005.elc', 'warp', 'auto', 'manual', 'off') % % CASE3: If you are using digitized channel locations that do NOT have 10-20 system names % -> Identify several channels that are compatible with 10-20 channel locations, rename them with 10-20 system labels, perform CASE2, then rename them back. % Alternatively, rename the fiducial channels under EEG.chaninfo.nodatchans into 'Nz', 'LPA', 'RPA' accordingly, then use these fiducial channels to perform CASE2 in this way % % [~,coordinateTransformParameters] = coregister(EEG.chaninfo.nodatchans, '[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/elec/standard_1005.elc', 'warp', 'auto', 'manual', 'off') templateChannelFilePath = '[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/elec/standard_1005.elc'; hdmFilePath = '[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/standard_vol.mat'; EEG = pop_dipfit_settings( EEG, 'hdmfile', hdmFilePath, 'coordformat', 'MNI',... 'mrifile', '[EEGLABroot]/eeglab/plugins/dipfit2.3/standard_BEM/standard_mri.mat',... 'chanfile', templateChannelFilePath, 'coord_transform', coordinateTransformParameters,... 'chansel', 1:EEG.nbchan); EEG = pop_multifit(EEG, 1:EEG.nbchan,'threshold', 100, 'dipplot','off','plotopt',{'normlen' 'on'}); % Step 12: Search for and estimate symmetrically constrained bilateral dipoles EEG = fitTwoDipoles(EEG, 'LRR', 35); % Step 13: Run ICLabel (Pion-Tonachini et al., 2019) EEG = iclabel(EEG, 'default'); % Save the dataset EEG = pop_saveset(EEG, 'filename', dataName, 'filepath', '/data/projects/example'); end

### What do those AMICA parameters mean? (04/06/2018 updated)

- 'do_reject', 1

This means 'Reject the outliers for the AMICA model being computed'.

- 'numrej', 15

This means 'Do the rejection for 15 iterations' Note that these datapoint rejection is only valid for AMICA computation, so it does not change the data length. If you want to know which data points were rejected, you can check EEG.etc.amica.Lht. If any datapoint shows 0, it means the datapoints were rejected by AMICA. Note also that you might want to use this info for further data cleaning if you don't mind rejecting randomly distributing single datapoints.

- 'rejsig', 3

This means 'Use rejection criterion as SD==3' Note that the outlier criterion and datapoints are updated for every iteration.

- 'rejint', 1

This means 'Use the rejection interval == 1 (i.e., for the first 15 continuous iterations, perform this outlier rejection.'

The reason for choosing these parameters are

- In most cases, the above parameters rejects total of 10% of data points.
- This set of parameters and options were recommended by my colleague Jason, the developer of AMICA.

### Recommended preprocessing steps and order (05/11/2020 updated)

I suggest one go through the processes explained in this page in the following order:

- Example of batch code to preprocess multiple subjects
- How to perform component selection using ICLabel() and dipole information
- How to perform epoch rejection using single-trial PSD

This is because ICLabel() has dependency on IC-scalp maps (but not dipole information, thanks Luca!), and the epoch rejection function works on single trials of ALL ICs. This means that even if one of the all ICs shows a bad behavior in an epoch, the entire epoch will be rejected. This is why you want to pre-select brain ICs only, otherwise this epoch rejection may reject too many epochs.