Larry & Heleen -<br><br>Thanks for raising detailed questions (copied below) about independent component clustering. Clustering data points (here, components) in a multidimensional space often has no single correct solution. The approach taken by the EEGLAB 

5.02 functions is to allow users to build different 'component distance' metrics and then cluster components in the resulting 'component space.' Our experience to date suggests:<br><br>- Involving more types of component information in building the distance measure is useful: particularly dipole locations, spectra, ERSPs, ITCs, and if relevant, ERPs. This should hold even when the purpose of the study is 'only' to decompose ERPs, since ICA components are here actually components of the ongoing EEG, so to match them across subjects it is best to make use of differences in their EEG activity.

<br><br>- Clustering on multiple conditions can reveal component activity similarities and differences.<br><br>- Use the number of dimensions allotted to each measure to control the relative influence of the different types of information on the clustering. For dipole locations, this number is limited to three; therefore, use the relative weight (default: 10) to increase the influence of equivalent dipole location information.

<br><br>- If multiple activity measures are used, it is not likely that quasi-dipolar components (eyes, muscles) will be assigned to clusters accounting for cortical activity.<br><br>- Nima here believes that the number of final dimensions should be relatively large (

e.g., 20 rather than the current default of 10).<br><br>- The best number of clusters to use is 'enough' :-) ... This probably cannot be gleaned from the ERP or any other existing literature.<br><br>- The software allows hierarchic sub-clustering. I do not yet have a good feeling for its utility in practice.

<br><br>- Using outlier rejection can be useful. Rather than trying to cluster (e.g.) *all* the near-dipolar components (for instance, those with residual variance, r.v., < 18%), a worthwhile goal may be to reveal clusters of components that are found in many or most of the subjects.

<br><br>- To my mind, there is nothing wrong with hand-editing clusters using the cluster editing window, provided the criteria used are reasonable and are reported. If useful, one might translate the criteria used into further features for the clustering; we are thinking about how to make it easier to add new clustering features. In practice, with good data and data decompositions, and enough subjects, relatively little hand editing may be needed to obtain quite 'reasonable' results.
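To illustrate the earlier point about numbers of dimensions and relative weights, here is a minimal Python sketch. It is not the EEGLAB implementation: the function names (`scale_block`, `build_component_space`, `variance_shares`), the unit-RMS normalization, and the random data are all assumptions made for illustration. It only shows why a three-dimensional measure such as dipole location carries little of the total distance unless it is given a larger weight.

```python
import math
import random


def scale_block(block, weight):
    """Center each column of one measure, scale the block to unit RMS, apply weight."""
    n, dims = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in block]
    rms = math.sqrt(sum(v * v for row in centered for v in row) / (n * dims))
    if rms == 0.0:
        raise ValueError("measure has no variance across components")
    return [[weight * v / rms for v in row] for row in centered]


def build_component_space(blocks, weights):
    """Concatenate the scaled measures into one feature vector per component."""
    scaled = [scale_block(b, w) for b, w in zip(blocks, weights)]
    return [[v for s in scaled for v in s[i]] for i in range(len(blocks[0]))]


def variance_shares(space, dims):
    """Fraction of the total sum of squares carried by each measure's columns."""
    total = sum(v * v for row in space for v in row)
    shares, start = [], 0
    for d in dims:
        part = sum(v * v for row in space for v in row[start:start + d])
        shares.append(part / total)
        start += d
    return shares


# Made-up data: 40 components, 3 dipole coordinates and 10 reduced spectral dimensions.
random.seed(0)
n_components = 40
dipoles = [[random.gauss(0, 30) for _ in range(3)] for _ in range(n_components)]
spectra = [[random.gauss(0, 1) for _ in range(10)] for _ in range(n_components)]

equal = variance_shares(build_component_space([dipoles, spectra], [1, 1]), [3, 10])
boosted = variance_shares(build_component_space([dipoles, spectra], [10, 1]), [3, 10])
print("dipole share of the component space, weight 1:  %.3f" % equal[0])
print("dipole share of the component space, weight 10: %.3f" % boosted[0])
```

Under this normalization a measure with d dimensions and weight w contributes in proportion to d times w squared, so the dipole share is 3/13 (about 0.23) with equal weights and 300/310 (about 0.97) with a weight of 10. Other normalizations would give different numbers, but the same trade-off between dimension count and weight applies.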

<br><br>- Interesting problems may emerge from clustering across subjects. We found one cluster that contained components from only about 70% of our subjects. When we looked for components with nearby equivalent dipoles in the other subjects, we found a distinct class of components (distinct dipole distribution and ERSPs, though similar ERPs). We are still considering whether and how to publish this result -- which may well have been overlooked in the long history of ERP research. Does this suggest differences in stimulus processing in the two subject subgroups, or ?? 

Exploring one's data may bring to light interesting questions such as these. Yes, one may become bogged down in such questions -- or else one may use them to make important discoveries. Managing one's use of such 'new ideas from the data' is tricky, however: it is definitely a question of time/resource management. For instance, the observation above could be truly validated only by examining multiple studies, ideally in the same subjects, to confirm and better define the subgroup differences and their behavioral correlates, if any.

<br><br>Again, however, the standard alternative -- identifying 'my Cz = your Cz', etc. -- has no claim to ultimate veracity, since relative source strengths and cortical orientations may differ across subjects -- meaning 'my Cz' receives a rather different mixture of source activities than 'your Cz,' a possibility ignored by 

e.g. standard grand averaging across subjects.<br><br>- Ultimately, cluster analysis should report only regions of significantly above-chance component density in the 'component cloud.' We have software for testing and visualizing this, but have not yet applied it to the full clustering problem. The question here would be what null hypothesis to test against. Julie Onton's poster from HBM 2005 presented evidence strongly suggesting that clusters of independent components (there, assessed in 3-D equivalent dipole space only) differ depending on the task the subjects are performing. 

<br><br>But across 'all' tasks, what should the 'default' distribution be -- e.g., what is the default 'universe' of component dipole position density in the brain? Then, after adding activity-based measures to the cluster space, what should the default distribution be, against which the observed local dipole densities could be tested? Here, simple assumptions (

e.g., 'a uniform distribution') could be used to give 'significance' values, but these would quite possibly not be realistic. <br><br>To demonstrate statistical significance for the many component clusters present in nearly any (living) subject and subject group might well require huge numbers of subjects. For this reason I am looking forward to exploring component clustering of very large EEG datasets. 
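As a rough illustration of the uniform-null idea above, the following Python sketch runs a Monte Carlo test of local dipole density. It is not the testing software mentioned above. The bounding box standing in for the brain volume, the 15 mm radius, the function names, and the simulated dipole positions are all assumptions, and, as noted, a uniform null may well be unrealistic.

```python
import random


def count_within(points, center, radius):
    """Number of points no farther than `radius` from `center`."""
    r2 = radius * radius
    return sum(
        1 for p in points
        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2
    )


def uniform_null_p_value(points, center, radius, bounds, n_sim=500, seed=0):
    """Monte Carlo p-value for the local density around `center`.

    Null hypothesis (an assumption): the same number of dipoles scattered
    uniformly inside the box `bounds`.
    """
    rng = random.Random(seed)
    observed = count_within(points, center, radius)
    hits = 0
    for _ in range(n_sim):
        sim = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in points]
        if count_within(sim, center, radius) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Simulated data: 60 scattered dipoles plus a tight group of 20 (coordinates in mm).
rng = random.Random(1)
bounds = [(-70.0, 70.0)] * 3
background = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(60)]
center = (0.0, -40.0, 40.0)
cluster = [[rng.gauss(c, 5.0) for c in center] for _ in range(20)]
dipoles = background + cluster

observed, p = uniform_null_p_value(dipoles, center, 15.0, bounds)
print("dipoles within 15 mm of the test location:", observed)
print("Monte Carlo p-value under the uniform null: %.4f" % p)
```

The p-value depends entirely on the null chosen. Replacing the box with an anatomically informed density, or extending the points with activity-based dimensions, changes only how the simulated points are drawn.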

<br><br>In any case, more subjects is better -- but only *when* the subjects are drawn (randomly) from 'the same' distribution. This might not be the case. For example, early volunteer subjects might be 'eager beavers', while late subjects recruited 'to increase the N' might be 'from the bottom of the barrel' in some sense... A feature I want to add to the EEGLAB clustering software is a visualization of the subject distribution based on component cluster differences. Simply assuming that the subjects form a Gaussian cloud drawn from a larger Gaussian subject pool is naive and, I suspect, often untrustworthy.

<br><br>For all these reasons, ICA component clustering at present remains partly exploratory, and should be regarded as such. Yet, I believe, it is an important way forward for electrophysiology research.<br><br>Best wishes - think of posting any helpful tales of experience on the eeglablist...

<br><br>Scott Makeig<br><br>P.S. In spirit, the open questions above are no different in kind from deciding how to define the baseline for an ERP or ERSP measure... or selecting a smoothing radius for applying 'Gaussian random field theory' to fMRI data...

<br><br>--------------------------------------------------------------------------------------------------------------------------------------------------<br><br>From:   Larry L Greischar <<a href="mailto:llgreisc@wisc.edu">llgreisc@wisc.edu</a>><br>To:     <a href="mailto:eeglablist@sccn.ucsd.edu">eeglablist@sccn.ucsd.edu</a><br><br>We've been working with attentional blink task data (64 channel) for 20

<br>subjects using EEGLAB (version 5.02) and are currently attempting to<br>cluster ICA components. There appears to be a wide range of results<br>produced depending on the chosen clustering parameters. Some experienced<br>

advice on the following would be greatly appreciated:<br><br>weight and number of pre-clustering parameters<br><br>reasonable PCA dimensions for ERP, spectrum, ERSP/ITC<br><br>better to reduce the number of parameters used for pre-clustering than use

<br>final dimension reduction<br><br>number of clusters to compute (Should this be based on ERP literature? Is<br>it reasonable to have a cluster with several components per subject?)<br><br>when using dipoles it appears to be necessary to rather carefully hand

screen artifact components (EMG components are sometimes modelled by dipoles with RV < 10% and "good" components sometimes have relatively large RVs ~30%) How much hand screening is necessary for good results?

<br><br>Thanks in advance,<br><br>Larry Greischar<br>Heleen Slagter<br>University of Wisconsin