[Eeglablist] How many clusters?

Tue Jan 14 11:30:08 PST 2020

Dear Fran and Seyed,

Fran, that is a good question!
Seyed, thank you for sharing your solution with Fran. I arrived at the same
one. Let me also share mine (just a snippet, sorry) for comparison.
I used it in EEGLAB versions <= 14. Dipole locations must be the only
clustering criterion (if you wonder why, see
https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Avoid_double_dipping_.2803.2F07.2F2019_updated.29).
With that condition met, you can use this code; see this page:
https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Can_we_determine_the_optimum_number_of_IC_clusters.3F_.2801.2F14.2F2020_added.29

Makoto

On Fri, Jan 10, 2020 at 12:12 AM <shirazi at ieee.org> wrote:

> Hi Fran,
>
> The number of clusters was a head-scratcher for me too, especially when I
> read some papers with clusters that I think could be divided into two and
> still have a meaningful distribution.
>
> For my current study (which will be out soon), we used an optimal
> clustering method from MATLAB (see:
> https://www.mathworks.com/help/stats/evalclusters.html). After modifying a
> couple of EEGLAB functions, we could specify lower and upper bounds on the
> number of clusters that we expect, and then let "evalclusters" find the
> optimal number of clusters (see a screenshot of the modified pop_clust
> dialog: https://www.dropbox.com/s/27xzjgyalc13in7/pop_clust%20mod.png?dl=0
> ).
>
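
A minimal sketch of the bounded search described above, written in Python
rather than MATLAB and not taken from the modified EEGLAB functions. It runs
k-means for every k between a lower and an upper bound and keeps the k with
the highest mean silhouette value, which is one of the criteria that
evalclusters offers; the thread does not say which criterion was actually
used. The function names, the farthest-first initialization, and the
synthetic 3-D "dipole locations" are all assumptions made for illustration.

```python
import math
import random


def farthest_first_centers(points, k):
    """Deterministic start: each new center is the point farthest from the rest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if it has none).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_min, k_max):
    """Score every k in [k_min, k_max] and return the best k plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Hypothetical data: three groups of 15 "dipoles" (coordinates in mm).
rng = random.Random(0)
group_centers = [(-40.0, 0.0, 20.0), (40.0, 0.0, 20.0), (0.0, -60.0, 40.0)]
points = [tuple(c + rng.gauss(0.0, 3.0) for c in center)
          for center in group_centers for _ in range(15)]

best_k, scores = choose_k(points, k_min=2, k_max=6)
print("best k:", best_k)
for k, s in scores.items():
    print(k, round(s, 3))
```

On these well-separated synthetic groups the search settles on k = 3. With
real dipole locations the silhouette curve can be much flatter, so it is
worth looking at all of `scores` rather than trusting the maximum alone.
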
> You can try this method for yourself by modifying three functions in
> "EEGLAB/function/studyfunc" (you can find the modified functions here:
> https://www.dropbox.com/sh/qcffm0nv832nq8n/AADcvURSIEO62i6-iAkc9iyza?dl=0
> ).
> The mods should not break the other options of pop_clust, but I'd still
> suggest renaming the original functions rather than overwriting them, in
> case you decide to use them again. Also, the optimal_kmeans option works
> with and without selecting the outlier option. BTW, I am using EEGLAB
> 2019.0 and MATLAB R2018b.
>
> Last but not least, clusters are only as meaningful as the pre-cluster
> information. You may want to choose the pre-cluster information that is
> best for you, based on your data, task, and post-processing. For example,
> the range of the spectra for pre-clustering is by default from 3 to 25 Hz
> (i.e., theta, alpha, and beta bands), but some researchers change it to 3
> to 48 Hz (which adds the low gamma band). Also, EEGLAB uses the absolute
> channel values for the scalp map pre-cluster array, but you can use the
> Laplacian or gradient of the scalp map as well (see a sample screenshot of
> these changes:
> https://www.dropbox.com/s/lm70lkxos9brqma/precluster%20changes.png?dl=0).
> You can also manipulate the weights to see what combination works best for
> you. It all depends on your specific study and dataset.
>
> Best,
> Seyed
> --
> Seyed Yahya Shirazi
> Ph.D. Candidate, BRaIN Lab
> University of Central Florida
>
> -----Original Message-----
> From: eeglablist <eeglablist-bounces at sccn.ucsd.edu> On Behalf Of Fran Copelli
> Sent: Wednesday, January 8, 2020 12:59 PM
> To: eeglablist at sccn.ucsd.edu
> Subject: [Eeglablist] How many clusters?
>
> Dear list,
>
> My question involves clustering when using the Kmeans algorithm. It's not
> clear to me how to decide the number of clusters to compute.
>
> In an example on the EEGLAB wiki, they suggest clustering based on the
> average number of components per subject. However, the default number of
> clusters (10) is different from the average number of clusters (20). I've
> corrected a few other obvious errors in the quoted text below.
>
> "Note that the default number of clusters (10 in this case) is set so on
> average there will be one component per subject per cluster. For
> example, if about 20 component per subjects are selected based on the
> residual variance threshold and the STUDY contains 10 subjects,
> the average number of cluster will be set to 20 - so each cluster will
> contain on average 10 components."
>
> I'm also aware of a clustering "rule of thumb" which is the square root of
> (number of components divided by 2). I tried finding the source for it, and
> according to online forums, there isn't a clear origin.
> https://stats.stackexchange.com/questions/277007/rule-of-thumb-on-the-best-k-in-k-means-clustering
>
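
For the numbers in the wiki example (10 subjects, about 20 components each),
the two heuristics mentioned above can be worked out as follows. This is only
arithmetic on the quoted descriptions; the function names are made up for
illustration and are not EEGLAB functions.

```python
import math


def wiki_default_k(n_components_total, n_subjects):
    """Average number of components per subject, as the quoted wiki text describes."""
    return round(n_components_total / n_subjects)


def rule_of_thumb_k(n_components_total):
    """Square root of (number of components divided by 2)."""
    return round(math.sqrt(n_components_total / 2))


n_subjects = 10
components_per_subject = 20
n_components_total = n_subjects * components_per_subject  # 200 components

print(wiki_default_k(n_components_total, n_subjects))  # 20
print(rule_of_thumb_k(n_components_total))             # 10
```

The two heuristics disagree by a factor of two on the same data, which is why
neither can be treated as a clear rule.
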
> My question is whether there is a clear rule/equation for how many clusters
> to create from unclustered components from multiple subjects.
>
> Any help is appreciated. Thank you!
>
>
> Fran
>
> SMART Lab, Psychology Department
>
> Ryerson University
> _______________________________________________
> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> To unsubscribe, send an empty email to
> eeglablist-unsubscribe at sccn.ucsd.edu
> For digest mode, send an email with the subject "set digest mime" to
> eeglablist-request at sccn.ucsd.edu