EEGLAB Tutorial
VI. EEGLAB STUDY Structure
VI.1 The STUDY structure
The EEGLAB variable "STUDY" is a Matlab structure, available from the command line, that contains descriptions of and links to several EEGLAB datasets (for example, from a group of subjects performing the same task). The term "STUDY" indicates that these datasets should originate from a single experimental study. The "STUDY" structure contains information from each of the datasets, plus additional information that allows all datasets to be processed simultaneously. The main additional information a "STUDY" structure contains is the result of component clustering across all datasets belonging to the study, which assigns each of the ICA components in each dataset to a component cluster. Other information will be added as we develop this new structure. Below is a typical STUDY structure:
>> STUDY
STUDY =
dataset: [3 4 5 6]
filename: {'sub1.set' 'sub2.set' 'sub3.set' 'sub4.set'}
filepath: {1x4 cell}
cluster: [1x8 struct]
etc: [1x1 struct]
The 'dataset' field contains the current indices of the member datasets in the ALLEEG array of the current EEGLAB session. The 'filename' and 'filepath' fields contain the dataset filenames and full paths, respectively. The 'cluster' field is an array of cluster structures, explained in more detail below. The 'etc' field is a structure that holds additional information about the clustering procedure, such as the editing history.
VI.2 The STUDY.cluster sub-structure
The STUDY.cluster field stores information about the clustering results, methods and algorithms. Each STUDY can have several clusters of components (for instance, a cluster for eye blinks, a cluster for eye movements, a cluster for central alpha rhythms, etc.). Each cluster is stored as one element of a structure array: STUDY.cluster(1), STUDY.cluster(2), etc. Typing STUDY.cluster at the Matlab command line returns:
>> STUDY.cluster
ans =
1x8 struct array with fields:
name
comps
sets
parent
child
centroid
preclust
algorithm
etc
All this information (including the clustering results) can be edited
manually from the command line or using the graphical interface function pop_clustedit().
The 'cluster.name' for each cluster is set by default to the cluster
number, but can be changed to any (more meaningful) name by the user.
To learn which ICA components belong to a cluster,
use the 'cluster.comps' and 'cluster.sets' fields:
'cluster.comps' holds the component numbers and
'cluster.sets' holds their corresponding dataset indices.
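The pairing between 'cluster.comps' and 'cluster.sets' can be sketched as follows. This is a minimal illustration on a mock cluster that reuses the example values from this tutorial; the variable names `cluster`, `datasets` and `compsPerSet` are assumptions made for the example, and in a real session you would read STUDY.cluster(1) instead.

```matlab
% Mock cluster built from the tutorial's example values (not read from a
% real STUDY); comps(k) and sets(k) describe the same ICA component.
cluster.name  = 'artifacts';
cluster.comps = [6 10 15 23 1 5 20 4 8 11 17 25 3 4 12];
cluster.sets  = [1 1 1 1 2 2 2 3 3 3 3 3 4 4 4];

% The two vectors must have the same length, one entry per component.
assert(numel(cluster.comps) == numel(cluster.sets));

% Group the component numbers by the dataset they come from.
datasets    = unique(cluster.sets);
compsPerSet = cell(1, numel(datasets));
for k = 1:numel(datasets)
    compsPerSet{k} = cluster.comps(cluster.sets == datasets(k));
    fprintf('dataset %d: components %s\n', datasets(k), mat2str(compsPerSet{k}));
end
```

Note that the same component number (here 4) can appear more than once in 'comps', because component numbers are only unique within a dataset.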
The 'cluster.preclust' field is a sub-structure with information on the pre-clustering,
which builds the data that is used for clustering.
This sub-structure includes the pre-clustering method(s), the corresponding parameters and the resulting data.
More information about the preclust structure is given in the following section.
For example, you might start by trying to cluster all the components (from all the datasets of the current STUDY) into two clusters: one containing artifactual components (let's name it 'artifact') and the other containing non-artifactual components (let's name it 'non-artifact').
This is only an illustrative example; so far we have found it impossible to separate the 'artifact' components from the non-artifact components using this scheme (i.e., clustering all components into two clusters).
You might then continue by performing further clustering on the 'artifact' cluster components, attempting to separate 'eye' artifacts from 'heart' and 'muscle' artifacts.
In this case the parent of the 'eye' artifact cluster will be the 'artifact' cluster (the 'eye' artifact cluster will have no child cluster). The 'artifact' cluster, on the other hand, will have three child clusters (but no parent cluster). A possible hierarchical decomposition is presented below.
The 'cluster.centroid' field holds the average value of the cluster.
The average given is the average of the information that the clustering was based on (the pre-clustering information).
Thus, when clustering on ERSPs, the centroid will be the average ERSP across the cluster components.
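As a rough sketch of what "average across the cluster components" means, the lines below compute a centroid from a hypothetical measure matrix with one row per component. The matrix `measures` and its contents are invented for illustration; they do not come from EEGLAB, and the actual centroid computation in EEGLAB may differ in detail.

```matlab
% Hypothetical measure matrix: 15 components (rows) x 100 values (columns),
% e.g. 100 spectral bins per component. Row k is filled with the value k.
measures = (1:15)' * ones(1, 100);

% The centroid is the mean over components, i.e. along dimension 1.
centroid = mean(measures, 1);

% One centroid value per measure bin.
disp(size(centroid));
```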
The 'cluster.algorithm' field holds the algorithm that was used for clustering (for example, kmeans) and its parameters.
The clustering algorithm was run on the preclust data.
The 'cluster.etc' field contains temporary information that helps manage the STUDY structure.
The 'cluster.parent' and 'cluster.child' fields are used for hierarchical clustering (see image below). The 'cluster.child' field contains the indices of the clusters that were created by clustering this cluster's components (and possibly the components of additional clusters). The 'cluster.parent' field contains the indices of the parent cluster(s).
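The parent/child relation can be walked with a few lines of Matlab. The sketch below uses a mock cluster array that mirrors the 'artifact' example and assumes, as described above, that both fields hold cluster indices; the cluster names and their numbering are illustrative, and other EEGLAB versions may store these fields differently.

```matlab
% Mock hierarchy: cluster 1 ('artifact') has children 3, 4 and 5;
% cluster 2 ('non-artifact') has neither parent nor child.
names    = {'artifact', 'non-artifact', 'eye', 'heart', 'muscle'};
parents  = {[], [], 1, 1, 1};
children = {[3 4 5], [], [], [], []};
cluster  = struct('name', names, 'parent', parents, 'child', children);

% Top-level clusters are those with an empty 'parent' field.
isRoot = cellfun(@isempty, {cluster.parent});

% Print each top-level cluster followed by its child clusters.
for r = find(isRoot)
    fprintf('%s\n', cluster(r).name);
    for c = cluster(r).child
        fprintf('  %s\n', cluster(c).name);
    end
end

% Sanity check: every child must list its parent in its 'parent' field.
consistent = true;
for p = 1:numel(cluster)
    for c = cluster(p).child
        consistent = consistent && ismember(p, cluster(c).parent);
    end
end
```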
Continuing with the same example, suppose that cluster 1 ('artifact') has
15 components from 4 datasets. The cluster structure will take the following values:
>> STUDY.cluster(1)
ans =
name: 'artifacts'
comps: [6 10 15 23 1 5 20 4 8 11 17 25 3 4 12]
sets: [1 1 1 1 2 2 2 3 3 3 3 3 4 4 4]
parent: []
child: [3 4 5]
centroid: [1x100 double]
preclust: [1x1 struct]
algorithm: {'kmeans', 2}
etc: []
This cluster has no parent and 3 children. It was created by running the 'kmeans' clustering algorithm on all the components. The algorithm was given the desired number of clusters, in this case 2 ('artifact' and 'non-artifact').
The data given to the clustering algorithm were, in this case, the component spectra after their dimension was reduced by PCA.
This information is found in the cluster(1).preclust structure.
Thus 'cluster.centroid' contains the mean spectrum of the 15 components in the cluster.
VI.2.2 The cluster.preclust sub-structure
The preclust structure of this cluster:
>> STUDY.cluster(1).preclust
ans =
preclustdata: [15x10 double]
preclustparams: {{1x9 cell}}
preclustcommand: [1x119 char]
The preclustdata field contains the data given to the clustering algorithm ('kmeans'). Its size is the number of ICA components by the number of PCA dimensions. To prevent redundancy, only the values for the 15 components in this cluster are kept; the data for the other components are held in the non-artifact cluster (in STUDY.cluster(2).preclust.preclustdata). The data are the component spectra reduced to 10 principal components using runpca.
The preclustparams field is a cell array of cell arrays. Each inner cell array contains a method string, which indicates what type of component data was used in the clustering (e.g., component spectra ('spec'), component ERSP ('ersp'), component equivalent dipole location ('dippos'), etc.), followed by the parameters of the method. For example:
>> STUDY.cluster(1).preclust.preclustparams{1}
ans =
Columns 1 through 7
'spec' 'npca' [10] 'norm' [1] 'weight' [1]
Columns 8 through 9
'freqrange' [1x2 double]
Here the data were the ICA component spectra in a specific frequency range ('freqrange' [3 30]), the spectrum dimension was reduced to 10 ('npca' [10]), the principal components were normalized ('norm' [1]), and they were given a weight of 1 ('weight' [1]). If more than one method is used for clustering, then preclustparams will contain several cell arrays (i.e., STUDY.cluster(1).preclust.preclustparams: {{1x9 cell} {1x9 cell}}).
More details can be found in the clustering tutorial.
The preclustcommand field holds the call to eeg_preclust() that created the preclust data. This is useful for scripting.
Typing STUDY.cluster(1).preclust.preclustcommand in Matlab produces:
>> STUDY.cluster(1).preclust.preclustcommand
ans =
[clustdata compind] = eeg_preclust(ALLEEG , { 'spec' 'npca' 10 'norm' 1 'weight' 1 'freqrange' [3 30] });
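When scripting, it can be convenient to split one inner preclustparams cell array into its method string and a structure of named parameters. The sketch below does this for the example values shown above; the variable names `params`, `method` and `opts` are assumptions made for the illustration, and the code relies only on the layout described in this section (a method string followed by name/value pairs).

```matlab
% Example inner cell array, copied from the preclustparams values above.
params = {'spec' 'npca' 10 'norm' 1 'weight' 1 'freqrange' [3 30]};

% The first element names the measure used for pre-clustering.
method = params{1};

% The remaining elements are name/value pairs; struct() turns them into
% fields. (This works here because none of the values is a cell array.)
opts = struct(params{2:end});

fprintf('method: %s, PCA dimensions: %d, frequency range: %s Hz\n', ...
        method, opts.npca, mat2str(opts.freqrange));
```

If several methods were used, the same steps could be repeated for each inner cell array in preclustparams.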