# EEGLAB/RERP

rERP is an open source Matlab toolbox for calculating overlapping Event Related Potentials (ERP) by multiple regression (an alternative to averaging).

## Contents |

# Download

You can download the latest version toolbox here. We rarely upload the latest version to the EEGLAB extension manager, so best to get the latest version there. Unzip it in the **plugins folder** of EEGLAB (10.2.5.5b or later, otherwise will likely encounter errors). If EEGLAB is already running you may type:

>> eeglab rebuild

to load the toolbox. You should see ‘rERP’ menu under Tools once you have loaded a dataset.

Report an issue or make a suggestion

# References

[1] M. D. Burns, N. Bigdely-Shamlo, N. J. Smith, K. Kreutz-Delgado, and S. Makeig, "Comparison of Averaging and Regression Techniques for Estimating Event Related Potentials," presented at the IEEE Engineering in Medicine and Biology Conference, 2013. [endnote] [bibtex] [download pdf]

[2] N. J. Smith and M. Kutas, "Regression-based estimation of ERP waveforms: II. Non-linear effects, overlap correction, and practical considerations." [endnote] [bibtex] [download pdf]

[3] N. J. Smith. (2011). Scaling up psycholinguistics. [endnote] [bibtex] download pdf

[4] N. Bidely-Shamlo, K. Kreutz-Delgado, M. Miyakoshi, M. Westerfield, T. Bel-Bahar, C. Kothe, et al., "Hierarchical Event Descriptor (HED) Tags for Analysis of Event-Related EEG Studies," IEEE GlobalSIP, Austin, TX, 2013.

# Quick Start (read first)

Load a continuous (un-epoched) dataset or study in EEGLAB and select the Tools-rERP-Run analysis. This will bring up the rERP setup GUI.

If you haven’t used this toolbox before, the “Select profiles:” list should be empty. Select one of the datasets from the list and click “Make profile for dataset” at the bottom. Once that completes, click “Edit profile” at the bottom right. This will bring you to the profile GUI: where you specify the regression settings.

If you have not computed ICA on that dataset, hit the “Switch to channels button”. By default, it operates on ICs. If you have less than 8GB of memory available, change the artifact function name from “rerp_reject_samples_robcov” to “rerp_reject_samples_probability”. rerp_reject_samples_robcov is more powerful, but more expensive.

Under “Included event types” you will see a comprehensive list of all event codes extracted from the EEG struct. By default, all event types are included as variables in the regression model. Be sure to remove any redundant or superfluous event types by selecting them and pressing the “Remove >>” button.

Press the “Run” button in the bottom right corner. If you have Parallel Computing Toolbox, be sure to start a pool: the program will parallelize over the individual time series. When it finishes, go back to the EEGLAB GUI and select Tools-rERP-Plot results. This will bring up the plotting GUI.

Under the “Results” heading, you see the result of the analysis you just ran.

Under the “Plot types” heading, you see a list of possible plots to perform for that result; select one of them. A more detailed description of the different plots is available below.

Under the “Components (R-Squared)” or “Channels (R-Squared)” heading, you will see a list of channels or ICs with their average R-Squared value in parentheses. You can choose multiple time series at once and it will plot them all.

Under the “Event types” heading, you will see a list of event types for that result. You can choose multiple event types at once and it will plot them all.

Select “Exclude insignificant”, to plot only the ERP estimates which resulted in a statistically significant R-squared.

Hit the plot button and a new plot window will appear. You can select another result and hit plot again to plot multiple datasets on the same axes. You will have to hit “Clear figure” if you want to plot different time series or event types as each plot is simply overlaid on the previous one.

Right click any graph to save it as an image.

# Description

Traditionally, the way of calculating an ERP estimate is to average epochs time-locked to a stimulus class of interest. This technique places severe restrictions on the experimental protocol: only a small number of stimulus categories can be used, stimulus events must be well separated in time and all other cognitive processes must be held constant. Violating the latter conditions will cause the ERP to be estimated sub-optimally.

[1] demonstrated that a multiple regression model is capable of explaining more experimental data variance than averaging for overlapping ERPs. By adopting a simple regression model over averaging, we make an additional assumption: that the observed EEG signal is a linear combination of ERPs added to white Gaussian noise. Under this new assumption, ERPs combine additively when they overlap in time. This is also known as a Linear Time Invariant (LTI) model.

We can illustrate the concept graphically as a convolution of impulsive inputs with the characteristic brain response (ERP):

We can rewrite the convolution as a sum of matrix-vector products. Here, *A_i* is the matrix which predicts the occurrence of event type *i* and *beta_i* is the ERP vector for event type *i*. We stack the ERP variables into a single vector beta and consolidate the predictors for the different event types into a single predictor matrix, *A*. Now, *y = A*beta* and we can solve for beta using our favorite technique, such as Penalized Least Squares.

The rERP toolbox includes:

- Routines for constructing sparse predictor matrix and solving for
- GUI interface to EEGLAB
- Time-frequency analysis (rERSP) analogous to ERSP
- [HED Tag] (hierarchical regression) support
- Automatic artifact rejection using [cleanrawdata] or [probability]
- Convex regularization (LASSO, Ridge, ElasticNet) with grid search
- Tools for plotting and publishing results
- Statistical significance derived from R-squared

# GUI

## Setup GUI

You can start the setup GUI from EEGLAB as described in the “Quick Start” section above, or enter at the command line

>> rerp_result_study = pop_rerp_study(‘eeg’, EEG);

or

>> rerp_result_study = pop_rerp_study(‘study’, STUDY);

You have datasets on the left and profiles on the right. All datasets you select will be analyzed with all profiles you select. Selecting 3 datasets and 2 profiles will yield 6 results total.

See [Toolbox Structure and Scripting] section for more information on how profiles are used in the toolbox.

Select datasets to analyze:

Add/Clear datasets:

Make profile for dataset:

Select profiles:

Add/Clear profiles:

Edit profile:

Cancel/Run:

Make profile from dataset:

## Profile GUI

From the top:

**Auto-save results:** toolbox will save results to disk automatically at the path to the right. You can change the path manually or by pressing “Browse path”.

**rERSP:** Do a log-power time-frequency decomposition on the data, then perform linear regression on each frequency-bin time-course. “Number of bins” defines the frequency resolution of the transform. Analogous to ERSP, but applies regression (instead of averaging) to time-frequency decomposed data.

**Include/Exclude components/channels:** Select which time series are to be analyzed. You can change the selection method by pressing the toggle buttons to the right.

**Category/Continuous epoch boundaries:** Choose the epoch window boundary, relative to event onset. See the Description section above for information about Continuous Variables.

**Artifact rejection:** If you have already removed artifact data samples by some other means, deselect. Otherwise leave it checked. The toolbox will automatically ensure that artifact frames have been identified and excluded from regression. Once a profile has been updated with artifact indexes, it will display the number of frames identified.

**Artifact function:** function used to identify artifact indexes. It should follow the prototype artifact_indexes = artifact_function(EEG).

**Include/Exclude event types:** The event types in the “Include” list will be variables in regression. This applies even when doing hierarchical regression with HED tags: only events of the included type are considered. While selecting event types, you can hit enter, which will bring up a window for entering descriptions of the event types. This will be useful later for plotting.

**Hierarchical regression:** If the EEG.event(:).hedTag field has been populated with HED strings, you have the option of using HED tags as regression variables in lieu of event types. See the section [“About HED tags”] for more information about what HED tags are and when you should consider using them.

**Enforce HED specification:** If selected, the toolbox will throw an error if the HED tags do not conform to the [HEDIT project] specification. Useful for comparing cross-study results.

**Display HED hierarchy:** show the structure of the HED hierarchy for this profile.

**Include HED tags:** The hierarchical version of “Include event types”. All tags in this list are treated as variables in regression. Some of these tags will have markings, such as stimulus/instruction in the above screenshot. * stimulus/instruction * indicates that tag has been excluded from regression (because it is redundant with stimulus/instruction/fixate). Similarly, tags enclosed with { } are affected by separator tags and tags enclosed in [ ] are combined into a single Continuous variable.

**Exclude HED tags:** are excluded from regression

**Include/Exclude:** click to add selected tags to the respective list

- We display a “parameter count” as a quick check on how many variables the model is generating. It is quite possible to generate highly collinear model, which overtaxes the data with a relatively low parameter count [2]. Future versions may include [Variance Inflation Factor] analysis to give a better idea of how hard we are pushing the data.

**Regularization (recommended):** Enable penalized regression. Prevents overfitting of data. The toolbox can use variants of L1 and L2 norm penalty function. It should be noted that these penalized estimators are biased, which could affect inferential results based on those estimates.

**Grid search (recommended):** Find the regularization constants by testing a grid of values. The constant which yields highest R-squared (i.e. best prediction of the data using cross-validation) is selected.

**Lambda:** If “Grid search” is disabled, we can enter a specific lambda to use. Always enter two values, even if only using one of them.

**Penalty function:** You probably want to use L2 norm. This is the fastest method and it has not been shown that the L1 Norm and ElasticNet variants are any better for ERP regression purposes. L1 and ElasticNet are considered experimental.

**Load profile/Save profile:** This is a time saver. If you want to try many analyses on a single dataset to see what works, you can hit “Force recompute artifact frames”, then save variants of the profile that you want to run.

**Set default profile:** When generating new profiles, these settings will be transferred.

**Cancel/OK:** Cancel will cancel all changes to the profile since you opened the profile GUI. OK keeps the changes.

**Set advanced options:** Set advanced options for grid search.

**Number of grid zoom levels:** We can do multiple levels of grid search, zooming in
on the local optima that we find. Two is a conservative number. One is just a normal “flat” grid search.

**Number of grid points:** After the first level search, how many points to generate around the local optimal value in deeper levels.

**Number of cross-validation folds:** Number of cross-validation folds to use when computing R-squared. Used in grid search and final results regardless of whether regularization is enabled.

**First phase lambda:** The exact lambda values to use for the first grid search. This determines where the grid search will look for lambda in subsequent levels as well. Default is set to a wide range of positive, exponentially increasing lambda values.

**ElasticNet quick zoom (experimental):** There is preliminary evidence that optimal ElasticNet lambda (combined L1 and L2 norm) R-squared values are achieved around the optimal separate L1 and L2 norm lambda. This finds the optimal L1 and L2 lambda values first, then does the full grid search around those values (O() vs O().

**Save all grid search results:** If you want to explore the predictive surface of the grid search, this will save every result that the grid search generates.

## Result GUI

**Results:** List of results to b plotted. Select one result at a time to be plotted. In future releases, multiselect will be enabled to combine multiple results for statistics.

**Plot type:** List of available plots will change depending on which result is selected. The possible plots are

- Rerp by event type: Make a single plot for each event type/HED tag selected. If multiple time series are selected, they are plotted on the same axis.

- Rerp by channel/component: Make a single plot for each time series selected. If multiple event types/HED tags are selected, they are plotted on the same axis.

- R2 total: Plot R-Squared (coefficient of determination) for each time series taking into account all event types/HED tags. This is essentially the percentage of total data variance predicted by a linear model.

- R2 by event type: Plot R-Squared (coefficient of determination) for each time series taking into account each event type/HED tag separately. This is essentially the percentage of total data variance accounted for by a linear model with only that event type/HED tag.

- Rerp image: Plot original data epochs, modeled data epochs and difference (noise) epochs next to each other.

- Rersp: Analogous to ERSP, but applies linear regression (instead of averaging) to time-frequency decomposed data (dB power).

- Grid search: View the predictive surface for the grid search. Shows how performance changes by varying lambda. For multilevel grid search, this produces as many graphs as there are levels.

**Rerp image boundary:** When Rerp image plot type is selected, this determines the epoch window.

**p < :** For plots which identify statistical significance (i.e. Rerp by event type, Rerp by channel/component), determines p-value threshold for significance.

**Channels/Components (R2):** List of time-series that were analyzed. It is possible to plot multiple time-series at once.

**Sort by R2:** Sort the Channels/Components list by total R2. Shows the components which had more variance accounted for at the top.

**HED tags/Event types:** List of variables that were included in the regression model. It is possible to plot multiple tags/types at once.

**Exclude insignificant:** Only plot waveforms which had statistically significant R2.

**Locking/Sorting variable:** For Rerp image. Two variables must be selected, locking and sorting. Use this toggle button to cycle between selecting each variable.

**Load results/Save result as:** Load multiple results from other locations. By default, the GUI loads all results in the rERP/results folder. Resave a particular result to another location.

**Display profile:** Display the profile used to compute the currently selected result.

**Clear figure:** Clear the plotting window of all plots.

**Plot:** Plot results based on currently selected options in the GUI. You can plot the results of multiple datasets on top of each other by hitting plot in between changing the results (results must be compatible for this to work properly, i.e. be from the same profile). Combined statistics from multiple results will be available in future releases by selecting multiple results; currently on one result may be plotted at a time.

# Toolbox Structure and Scripting

The best, most up to date reference for specific functions and classes can be found using the Matlab documentation system (e.g. doc RerpProfile; ). Simple examples of scripting the toolbox can be found in rerp_example.m

The basic premise is to consolidate all information needed to specify a regression into one object: a profile (of RerpProfile class). The only information not specified in the profile is the data. A profile is specific to the dataset it was formed from: it contains event timings and artifact indexes, as well as generic settings such as epoch boundaries and penalty functions. The profile and EEG struct (containing the data) are sent to the pop_rerp function which performs checks on the system and ultimately calls the rerp function. The pop_rerp function, after analyzing the data according to the profile, returns a result object (of RerpResult class).

The result object contains the results of the regression, as well as the profile that was used to derive it. The RerpResult class contains an RerpProfile class which contains a (generic) settings structure.

In general, it is easiest to use the GUI for modifying profiles and plotting results.

# About HED Tags

**Hierarchical Event Description (HED)** tags were proposed in [4] as a way to standardize EEG experimental event descriptions. One benefit of this approach is that it permits higher level analyses involving multiple studies and experiments. For regression purposes, we use hierarchical event descriptions to define hierarchical regression models. This is best demonstrated with a simple example.

The HED tag Stimulus/Expected/Target has three sub-tags: Stimulus, Stimulus/Expected and Stimulus/Expected/Target. Tagging an experimental event with Stimulus/Expected/Target is equivalent to tagging it with all three sub-tags. Say we have an experiment with two event types: event type 1 is the presentation of an image, event type 2 is the presentation of an image with a target that the subject is searching for. We tag event type 1 as Stimulus/Expected and event type 2 as Stimulus/Expected/Target. Both event types inherit the tag Stimulus/Expected, but only event type 2 is Stimulus/Expected/Target. Of course, we can also use tags to express non-hierarchical event descriptions. Tagging tag event type 1 as Stimulus/1 and event type 2 as Stimulus/2, we are back to using the standard event codes as our regression variables.

In the regression framework, we interpret the sub-tags as distinct variables, analogous to using event types as variables in nonhierarchical regression. In our example, we define a variable Stimulus/Expected (any time event types 1 or 2 occur) and Stimulus/Expected/Target (any time event type 2 occurs). When we include both variables in a regression model, we are effectively modeling the ERP of event type 2 as the sum of the responses to the sub-tags. Another way to put it: event type 2 is the response to being shown any image (Stimulus/Expected) plus an additional response due to the fact that the image is a target (Stimulus/Expected/Target). We can interpret the “additional response” to the target image as a recognition event, or P300 response. We have used the overlap modeling capability of regression to decompose an ERP into different layers which combine additively to produce the overall effect.

Another benefit of hierarchical event descriptions is that each experimental event is allowed to be defined in many different ways simultaneously. Each event is tagged with a HED string (e.g. EEG.event( i ).hedTag = ‘Stimulus/Expected; Stimulus/Visual; Custom/Block/1; Custom/Reaction time/2.33’ ) which is composed of many HED tags separated by comma or semicolon. This allows us to capture very specific information about each event, while still retaining the ability to reference variables which are common across many events. The toolbox will compile the HED strings from all experimental events into a master hierarchy. From there, the user can pick out which tags to include in the regression model.

## Special HED Tags

**Separator tags:** The toolbox automatically identifies tags with the **|** sublevel as a separator tag (e.g. **Custom/Block/|/1**, **Custom/Block/|/2**). If we want to separate certain events into different categories, we can add separator tags to their HED strings. For example, say we want to divide the data into two blocks and see how the rERP estimates are different between the blocks. Perhaps the experimental conditions changed between the two blocks. We can add the tag **Custom/Block/|/1** to the HED strings of all events in the first block and **Custom/Block/|/2** to all events in the second block. By including **Custom/Block** in the regression model, it two separate variable spaces for all of the included tags; one for each block. The number of parameters to estimate effectively doubles. If there were three blocks, it would triple, and so on. Tags which have been separated in the variable space are marked with **{{ }}** in the "Include HED tags" list, and show the separator tag in parentheses (e.g. **{ { Custom/Pulse (Custom/Block) } }**). The separator tag is shown in **{ }**, (e.g. **{ Custom/Block }**.

**Continuous tags:** The toolbox automatically identifies tags with the **#** sublevel as a continuous tag (e.g. **Custom/Mag/#/.22**, **Custom/Mag/#/.000001**). We may believe that the amplitude of the evoked response is related to another continuous variable that we have access to (e.g. intensity, acceleration). If so, we can tag each event with the value of that continuous variable when the event occurs (e.g. **Custom/Mag/#/2.5** ). This models the scaled responses within the regression framework. The relationship between the continuous variable and the amplitude of the response need not be linear. If you choose to make a tag continuous, be sure to include the #/ level in every single instance of that tag.

# Feature Backlog

- Variance Inflation Factor (VIF) analysis - Ability to perform regression in different basis (e.g. spline, Gaussian) - Statistics: significance values for ERP and ERSP estimates (glm-ie integration)

# Credits

Developed and Maintained by: Matthew Burns (SCCN, INC, UCSD) Email: rerptoolbox (at) gmail (dot) com |

Significant contributions from: [Nathaniel J. Smith] [Nima Bigdely-Shamlo] [Luca Pion-Tonachini] [Christian Kothe] [Scott Makeig]

Supported by a gift from The Swartz Foundation (Old Field NY).