[Eeglablist] Exploring ICA clustering

Scott Makeig smakeig at gmail.com
Thu Jun 22 08:32:08 PDT 2006


Larry & Heleen -

Thanks for raising detailed questions (copied below) about independent component clustering. Clustering of data points (here, components) in a multidimensional space often has no single correct solution. The approach taken by the EEGLAB 5.02 functions is to let users build different 'component distance' metrics and then cluster the components in the resulting 'component space.'
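For concreteness, here is a minimal sketch of the kind of calls involved, assuming a STUDY/ALLEEG set is already built and the components have equivalent dipole models. The option names may differ slightly across EEGLAB versions (see the std_preclust and pop_clust help), and the particular values are only illustrative, not recommendations.

  % Minimal sketch only -- illustrative values, not recommendations.
  % Assumes a STUDY / ALLEEG set is already loaded and that components
  % have equivalent dipole models (e.g., from DIPFIT). Check your EEGLAB
  % version's help for the exact option names.

  % Build the 'component distance' space from several measures. Each activity
  % measure is reduced to 'npca' dimensions and scaled by 'weight'; dipole
  % locations contribute only three dimensions, hence their larger weight.
  [STUDY, ALLEEG] = std_preclust(STUDY, ALLEEG, [], ...
      { 'dipoles'           'weight' 10 }, ...
      { 'spec' 'npca' 10    'weight' 1  'freqrange'  [3 25] 'norm' 1 }, ...
      { 'ersp' 'npca' 10    'weight' 1  'freqrange'  [3 25] 'norm' 1 }, ...
      { 'itc'  'npca' 10    'weight' 1  'freqrange'  [3 25] 'norm' 1 }, ...
      { 'erp'  'npca' 10    'weight' 1  'timewindow' []     'norm' 1 }, ...
      { 'finaldim' 'npca' 20 });   % final PCA dimension of the joint space

  % Cluster in that space. 'outliers' moves components lying more than N std.
  % deviations from every cluster centroid into a separate outlier cluster.
  STUDY = pop_clust(STUDY, ALLEEG, 'algorithm', 'kmeans', ...
                    'clus_num', 15, 'outliers', 3);

The 'finaldim' entry applies a final PCA to the concatenated measures, and the 'outliers' argument implements the outlier rejection mentioned below; both are discussed in the notes that follow.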

Our experience to date suggests:

- Involving more types of component information in building the distance measure is useful: particularly dipole locations, spectra, ERSPs, ITCs, and, if relevant, ERPs. This holds even when the purpose of the study is 'only' to decompose ERPs, since the ICA components are actually components of the ongoing EEG; to match them across subjects it is best to make use of the differences in their EEG activity.

- Clustering on multiple conditions can reveal component activity
similarities and differences.

- Use the number of dimensions retained for each measure to control the relative influence of the different types of information on the clustering. In the case of dipole locations this number is limited to three; therefore, use the relative weight (default: 10) to increase the influence of equivalent dipole location information.

- If multiple activity measures are used, it is not likely that
quasi-dipolar components (eyes, muscles) will be assigned to clusters
accounting for cortical activity.

- Nima here believes that the number of final dimensions should be relatively large (say, 20 rather than the current default of 10).

- The best number of clusters to use is 'enough' :-) ... This may not be gleaned from the ERP literature or any other existing literature.

- The software allows hierarchic sub-clustering. I do not have a good
feeling for the utility of this in practice.

- Using outlier rejection can be useful. Rather than trying to cluster *all* the near-dipolar components (e.g., those with r.v. < 18%), a worthwhile goal may be to reveal clusters of components that are found in many or most of the subjects.

- To me, there is nothing wrong with hand editing clusters using the cluster editing window, provided the criteria used are reasonable and are reported. If useful, one might think of translating those criteria into further features used in the clustering - we are considering how to facilitate adding new clustering features. In practice, with good data and good decompositions, and enough subjects, relatively little hand editing may be needed to obtain quite 'reasonable' results.

- Interesting problems may emerge from clustering across subjects. We found one cluster of components in only about 70% of our subjects. When we looked for components with nearby equivalent dipoles in the other subjects, we found a distinct class of components (with a distinct dipole distribution and distinct ERSPs, though similar ERPs). We are still considering whether and how to publish this result -- which may well have been overlooked in the long history of ERP research. Does this suggest differences in stimulus processing in the two subject subgroups, or ??

Exploring one's data may bring to light interesting questions such as these. Yes, one may become bogged down in such questions -- or else one may use them to make important discoveries. Managing one's use of such 'new ideas from the data' is tricky, however - definitely a question of time/resource management. For instance, the observation above could be truly validated only by examining multiple studies, ideally in the same subjects, to confirm and better define the subgroup differences and their behavioral correlates, if any.

Again, however, the standard alternative -- identifying 'my Cz = your Cz', etc. -- has no claim to ultimate veracity, since relative source strengths and cortical source orientations may differ across subjects -- meaning that 'my Cz' receives a rather different mixture of source activities than 'your Cz,' a possibility ignored by, e.g., standard grand averaging across subjects.

- Ultimately, cluster analysis should report only regions of significantly
above-chance component density in the 'component cloud.' We have software
for testing and visualizing this, but have not yet applied it to the full
clustering problem. The question here would be what null hypothesis to test
against. Julie Onton's poster from HBM 2005 presented evidence strongly suggesting that clusters of independent components (there, assessed in 3-D equivalent dipole space only) differ depending on the task the subjects are performing.

But across 'all' tasks, what should the 'default' distribution be? For example, what is the default 'universe' of component dipole position density in the brain? Then, after adding activity-based measures to the cluster space, what should the default distribution be against which the observed local dipole densities could be tested? Here, simple assumptions (e.g., 'a uniform distribution') could be used to give 'significance' values, but quite possibly not realistic ones.
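Purely to make the shape of such a test concrete, here is a toy sketch comparing the densest observed neighborhood of equivalent-dipole locations against surrogate draws from a uniform distribution over the data's bounding box. The variable xyz, the neighborhood radius, and the bounding-box null are hypothetical stand-ins -- this is not the density software mentioned above, and a uniform box is exactly the kind of possibly unrealistic null in question.

  % Toy sketch only: test the densest observed dipole neighborhood against a
  % uniform null over the bounding box of the data. 'xyz' is assumed to be an
  % [nComp x 3] matrix of equivalent-dipole coordinates pooled across subjects.
  radius = 10;                         % neighborhood radius in mm (arbitrary)
  nSurr  = 1000;                       % number of surrogate draws
  nComp  = size(xyz, 1);

  D   = squareform(pdist(xyz));        % pairwise distances (Statistics Toolbox)
  obs = max(sum(D < radius, 2) - 1);   % most neighbors found around any dipole

  lo = min(xyz);  hi = max(xyz);       % bounding box of the observed dipoles
  surrMax = zeros(nSurr, 1);
  for s = 1:nSurr
      r  = repmat(lo, nComp, 1) + rand(nComp, 3) .* repmat(hi - lo, nComp, 1);
      Ds = squareform(pdist(r));
      surrMax(s) = max(sum(Ds < radius, 2) - 1);  % max statistic per surrogate
  end
  p = mean(surrMax >= obs);  % chance that the uniform null is this dense anywhere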

Demonstrating statistical significance for the many component clusters present in nearly any (living) subject and subject group might well require huge numbers of subjects. For this reason I am looking forward to exploring component clustering of very large EEG datasets.

In any case, more subjects is better -- but only *when* the subjects are
drawn (randomly) from 'the same' distribution. This might not be the case.
For example, early volunteer subjects might be 'eager beavers', while late
subjects recruited 'to increase the N' might be 'from the bottom of the
barrel' in some sense... A feature I want to add to the EEGLAB clustering
software is a visualization of the subject distribution based on component
cluster differences. Simply assuming that the subjects form a Gaussian cloud drawn from a larger Gaussian subject pool is naive and, I suspect, often untrustworthy.
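One simple sketch of such a visualization -- not an existing EEGLAB feature; the matrix M below is a hypothetical subject-by-cluster count table one would build by hand from STUDY.cluster -- is to describe each subject by the proportion of his or her components falling into each cluster and then plot the subjects using classical multidimensional scaling:

  % Hypothetical sketch: M(s,c) = number of components of subject s assigned
  % to cluster c (built by hand from STUDY.cluster; not an EEGLAB function).
  P = M ./ repmat(sum(M, 2) + eps, 1, size(M, 2));  % per-subject cluster proportions
  D = pdist(P, 'cityblock');                        % pairwise subject distances
  Y = cmdscale(squareform(D));                      % classical MDS (Statistics Toolbox)

  figure; plot(Y(:,1), Y(:,2), 'o'); hold on;
  text(Y(:,1), Y(:,2), cellstr(num2str((1:size(M,1))')));  % label subjects by index
  xlabel('MDS dimension 1'); ylabel('MDS dimension 2');
  title('Subjects arranged by component-cluster profile');

Two or more clumps of subjects, if present, should then be visible at a glance.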

For all these reasons, ICA component clustering at present remains partly exploratory, and should be reported as such. Yet, I believe, it is an important way forward for electrophysiology research.

Best wishes - think of posting any helpful tales of experience on the
eeglablist...

Scott Makeig

P.S. In spirit, the open questions above are not different in kind from deciding how to define the baseline for an ERP or ERSP measure... or from selecting a smoothing radius for applying 'Gaussian random field theory' to fMRI data...

--------------------------------------------------------------------------------------------------------------------------------------------------

From:   Larry L Greischar <llgreisc at wisc.edu>
To:     eeglablist at sccn.ucsd.edu

We've been working with attentional blink task data (64-channel) for 20 subjects using EEGLAB (version 5.02) and are currently attempting to cluster ICA components. The results appear to vary widely depending on the chosen clustering parameters. Some experienced advice on the following would be greatly appreciated:

weight and number of pre-clustering parameters

reasonable PCA dimensions for ERP, spectrum, ERSP/ITC

is it better to reduce the number of parameters used for pre-clustering than to use final dimension reduction?

number of clusters to compute (Should this be based on ERP literature? Is
it reasonable to have a cluster with several components per subject?)

when using dipoles it appears to be necessary to rather carefully hand screen artifact components (EMG components are sometimes modelled by dipoles with RV < 10%, and "good" components sometimes have relatively large RVs, ~30%). How much hand screening is necessary for good results?

Thanks in advance,

Larry Greischar
Heleen Slagter
University of Wisconsin