[Eeglablist] How many clusters?

shirazi at ieee.org shirazi at ieee.org
Thu Jan 9 05:16:44 PST 2020


Hi Fran,

The number of clusters was a head-scratcher for me too, especially when I
read some papers with clusters that I think could be divided into two and
still have a meaningful distribution.

For my current study (which will be out soon), we used MATLAB's cluster
evaluation function, "evalclusters" (see:
https://www.mathworks.com/help/stats/evalclusters.html). After modifying a
couple of EEGLAB functions, we could specify lower and upper bounds on the
number of clusters we expect, and then let "evalclusters" find the optimal
number of clusters within that range (see a screenshot of the modified
pop_clust dialog:
https://www.dropbox.com/s/27xzjgyalc13in7/pop_clust%20mod.png?dl=0).
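
As a rough illustration of the idea (this is not the modified EEGLAB code,
and it is in Python rather than MATLAB), the sketch below runs k-means for
every k between a lower and an upper bound and keeps the k with the highest
mean silhouette value. The silhouette criterion is an assumption made for the
example; "evalclusters" offers several criteria and the one used in the study
is not stated here. The function names (kmeans, mean_silhouette, choose_k)
and the toy 2-D data are hypothetical.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, n_iter=100):
    # Deterministic farthest-first initialisation: start from the first point,
    # then repeatedly add the point farthest from its nearest chosen center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def mean_silhouette(points, labels):
    # Assumes at least two non-empty clusters (so use k_min >= 2).
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # within-cluster
        b = min(sum(dist(p, q) for q in other) / len(other)  # nearest other cluster
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_min, k_max):
    # Evaluate every candidate k in [k_min, k_max] and keep the best score.
    scores = {k: mean_silhouette(points, kmeans(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2-D blobs standing in for pre-cluster vectors.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(20)]
best_k, scores = choose_k(points, 2, 6)
print(best_k, scores)
```

With real component data the scores are usually much closer together than in
this toy case, so it is worth looking at the whole score curve and not only
at the winning k.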

You can try this method yourself by replacing three functions in
"EEGLAB/functions/studyfunc" (you can find the modified functions here:
https://www.dropbox.com/sh/qcffm0nv832nq8n/AADcvURSIEO62i6-iAkc9iyza?dl=0).
The mods should not break the other options of pop_clust, but I'd still
suggest renaming the original functions instead of overwriting them, in case
you want to use them again. Also, the optimal_kmeans option works with and
without the outlier option selected. BTW, I am using EEGLAB 2019.0 and MATLAB
R2018b.

Last but not least, clusters are only as meaningful as the pre-cluster
information. You may want to choose the pre-cluster measures that suit your
data, task, and post-processing. For example, the default range of the
spectra for pre-clustering is 3 to 25 Hz (i.e., the theta, alpha and beta
bands), but some researchers change it to 3 to 48 Hz (which adds the low
gamma band). Also, EEGLAB uses the absolute channel values for the scalp map
pre-cluster array, but you can use the Laplacian or gradient of the scalp map
as well (see a sample screenshot of these changes:
https://www.dropbox.com/s/lm70lkxos9brqma/precluster%20changes.png?dl=0).
You can also adjust the weights of the measures to see which combination
works best for you. It all depends on your specific study and dataset.

Best,
Seyed
--
Seyed Yahya Shirazi
Ph.D. Candidate, BRaIN Lab
University of Central Florida

-----Original Message-----
From: eeglablist <eeglablist-bounces at sccn.ucsd.edu> On Behalf Of Fran
Copelli
Sent: Wednesday, January 8, 2020 12:59 PM
To: eeglablist at sccn.ucsd.edu
Subject: [Eeglablist] How many clusters?

Dear list,

My question involves clustering when using the Kmeans algorithm. It's not
clear to me how to decide the number of clusters to compute.

In an example on the EEGLAB wiki, they suggest choosing the number of
clusters based on the average number of components per subject. However, the
default number of clusters (10) is different from the average number of
clusters (20). I've corrected a few other obvious errors in the quoted text
below.

"Note that the default number of clusters (10 in this case) is set so on
average there will be one component per subject per cluster. For example, if
about 20 component per subjects are selected based on the residual variance
threshold and the STUDY contains 10 subjects, the average number of cluster
will be set to 20 - so each cluster will contain on average 10 components."

I'm also aware of a clustering "rule of thumb" which is the square root of
(number of components divided by 2). I tried finding the source for it, and
according to online forums, there isn't a clear origin.
https://stats.stackexchange.com/questions/277007/rule-of-thumb-on-the-best-k-in-k-means-clustering
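
To make the two heuristics concrete, here is a small Python sketch that
applies both to the numbers in the wiki example (10 subjects with about 20
components each). The function names are hypothetical, and rounding to the
nearest integer is an assumption; neither source says how to round.

```python
import math

def wiki_default_k(n_components, n_subjects):
    # One component per subject per cluster on average, so the number of
    # clusters equals the mean number of components per subject.
    return round(n_components / n_subjects)

def sqrt_rule_k(n_components):
    # Rule of thumb: square root of (number of components divided by 2).
    return round(math.sqrt(n_components / 2))

n_subjects = 10
n_components = 20 * n_subjects  # about 20 components per subject

print(wiki_default_k(n_components, n_subjects))  # 20
print(sqrt_rule_k(n_components))                 # 10
```

For this example the two heuristics differ by a factor of two (20 versus 10
clusters), so they cannot both be treated as the single correct answer.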

My question is whether there is a clear rule/equation for how many clusters
to create from unclustered components from multiple subjects.

Any help is appreciated. Thank you!


Fran

SMART Lab, Psychology Department

Ryerson University