[Eeglablist] How many clusters?

Delorme, Arnaud adelorme at ucsd.edu
Wed Jan 15 18:19:51 PST 2020


Dear Fran,

About this section

"Note that the default number of clusters (10 in this case) is set so on average there will be one computer component per subject per cluster. For example, if about 20 component per subjects are selected based on the residual variance threshold and the STUDY contains 10 subjects, the average number of cluster will be set to 20 - so each cluster will contain on average 10 components."

You want on average one component from each subject in each cluster. So if you have 20 good components per subject (based on the residual variance threshold) and 10 subjects, then you select 20 clusters. Each cluster will then have on average one component from each subject, i.e., about 10 components.

Just rephrasing, but I hope it helps.
Best wishes,

Arno
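Arno's rule is simple arithmetic: the number of clusters equals the average number of retained components per subject (total retained components divided by the number of subjects). A minimal Python sketch of that arithmetic follows; the helper name is hypothetical and this is not EEGLAB's own code.

```python
def default_num_clusters(components_per_subject):
    """Clusters needed for ~one component per subject per cluster (hypothetical helper).

    components_per_subject: number of retained ICs for each subject, e.g. ICs
    that pass a residual-variance threshold.
    """
    total = sum(components_per_subject)
    n_subjects = len(components_per_subject)
    # total / n_subjects is the average number of retained components per subject
    return max(1, round(total / n_subjects))


counts = [20] * 10                   # 10 subjects, about 20 retained ICs each
k = default_num_clusters(counts)
print(k)                             # 20 clusters
print(sum(counts) / k)               # 10.0 components per cluster (~one per subject)
```

If subjects retain different numbers of components, the result is still only an average; individual clusters can contain several components from one subject and none from another.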

> On Jan 14, 2020, at 12:36 PM, Makoto Miyakoshi <mmiyakoshi at ucsd.edu> wrote:
> 
> Dear Fran and Seyed,
> 
> Fran, I recommend trying Seyed's solution first because it should work out
> of the box. If you are comfortable with coding, mine is fine too.
> 
> Seyed, do you have any comment on which of the four criteria
> (Calinski-Harabasz, Silhouette, Gap, and Davies-Bouldin) supported by
> evalclusters() you prefer? Also, I gave up on Gap because for some reason
> I could not use it the same way as the others, so my current code does not
> use it. I was too lazy to look into the cause of the error; do you know
> anything about it?
> 
> Makoto
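To make one of the four criteria Makoto lists concrete, here is a minimal pure-Python sketch of the Davies-Bouldin index as it is commonly defined: within-cluster scatter is the mean Euclidean distance of members to their centroid, and clusters are compared through the Euclidean distance between centroids. The function names are illustrative and this is not MATLAB's implementation; evalclusters() computes the criterion internally. Lower values mean more compact, better-separated clusters (for Calinski-Harabasz and silhouette, higher is better).

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower = more compact, better-separated clusters."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Within-cluster scatter: mean distance of members to their own centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    total = 0.0
    for i in clusters:
        # Worst-case (largest) similarity ratio between cluster i and any other cluster
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin(pts, [0, 0, 1, 1]))   # 0.1  (sensible split)
print(davies_bouldin(pts, [0, 1, 0, 1]))   # 10.0 (poor split)
```

The sketch assumes no two cluster centroids coincide; that case would make the denominator zero.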
> 
> On Tue, Jan 14, 2020 at 12:17 PM Fran Copelli <fcopelli at ryerson.ca> wrote:
> 
>> Thank you, Makoto! This is very helpful. I made a note last week that I
>> need to look further into how to select the clustering criterion. I'm happy
>> to hear that it is relatively straightforward. Regarding the optimal number
>> of clusters, I'll try out the code that you provided.
>> 
>> 
>> Best,
>> 
>> Fran
>> 
>> SMART Lab, Psychology Department
>> 
>> 
>> On Tue, Jan 14, 2020 at 2:57 PM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
>> wrote:
>> 
>>> Dear Fran and Seyed,
>>> 
>>> Fran, your question makes a good point!
>>> Seyed, thank you for sharing yours with Fran. I had the same solution.
>>> Let me also share mine (just a snippet, sorry) for comparison.
>>> I used it in EEGLAB version 14 or earlier. Dipole locations must be the
>>> only clustering criterion (if you wonder why, see
>>> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Avoid_double_dipping_.2803.2F07.2F2019_updated.29
>>> ). Then you can use the code on this page:
>>> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Can_we_determine_the_optimum_number_of_IC_clusters.3F_.2801.2F14.2F2020_added.29
>>> 
>>> Makoto
>>> 
>>> On Fri, Jan 10, 2020 at 12:12 AM <shirazi at ieee.org> wrote:
>>> 
>>>> Hi Fran,
>>>> 
>>>> The number of clusters was a head-scratcher for me too, especially when I
>>>> read some papers with clusters that I think could be divided into two and
>>>> still have a meaningful distribution.
>>>> 
>>>> For my current study (which will be out soon), we used an optimal
>>>> clustering method from MATLAB (see:
>>>> https://www.mathworks.com/help/stats/evalclusters.html). After modifying a
>>>> couple of EEGLAB functions, we could specify lower and upper bounds on the
>>>> number of clusters we expect, and then let "evalclusters" find the optimal
>>>> number of clusters (see a screenshot of the modified pop_clust dialog:
>>>> https://www.dropbox.com/s/27xzjgyalc13in7/pop_clust%20mod.png?dl=0
>>>> ).
>>>> 
>>>> You can try this method yourself by modifying three functions in
>>>> "EEGLAB/functions/studyfunc" (you can find the modified functions here:
>>>> https://www.dropbox.com/sh/qcffm0nv832nq8n/AADcvURSIEO62i6-iAkc9iyza?dl=0
>>>> ).
>>>> The modifications should not break the other pop_clust options, but I'd
>>>> still suggest keeping renamed copies of the original functions in case you
>>>> decide to use them again. Also, the optimal_kmeans option works both with
>>>> and without the outlier option selected. BTW, I am using EEGLAB 2019.0 and
>>>> MATLAB R2018b.
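Seyed's approach lets a criterion pick the number of clusters between a lower and an upper bound. The sketch below mimics that idea in plain Python under stated assumptions: k-means (Lloyd's algorithm with k-means++ seeding and several restarts), Euclidean distance, and the mean silhouette value as the criterion (higher is better). It is not evalclusters or Seyed's modified EEGLAB functions, and all function names are hypothetical. In practice the points would be each component's pre-cluster features, for example dipole x, y, z when dipole location is the only criterion, as Makoto recommends.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: favor new centers far from the existing ones."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:                      # every point coincides with a center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with restarts; returns labels of the lowest-inertia run."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(points, k, rng)
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest center
            new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
            if new == labels:                 # assignments stopped changing
                break
            labels = new
            # Update step: move each non-empty cluster's center to its mean
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centers[c] = [sum(x) / len(members) for x in zip(*members)]
        inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def mean_silhouette(points, labels):
    """Average silhouette value over all points; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0                           # degenerate clustering: worst score
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = labels[i]
        if not dists[own]:
            continue
        a = sum(dists[own]) / len(dists[own])                          # cohesion
        b = min(sum(d) / len(d) for c, d in dists.items() if c != own)  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_min, k_max):
    """Try every k in [k_min, k_max] and keep the one with the highest silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 3-D blobs of 20 points each (made-up values)
rng = random.Random(1)
blob_centers = [(0, 0, 0), (30, 0, 0), (0, 30, 0)]
pts = [tuple(c + rng.gauss(0, 1) for c in ctr) for ctr in blob_centers for _ in range(20)]
k, scores = best_k(pts, 2, 5)
print(k, {kk: round(s, 2) for kk, s in scores.items()})
```

Swapping mean_silhouette for another criterion (such as a Davies-Bouldin score, where lower is better and max becomes min) changes only best_k.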
>>>> 
>>>> Last but not least, clusters are only as meaningful as the pre-clustering
>>>> information. You may want to choose the pre-clustering information that is
>>>> best for your data, task, and post-processing. For example, the frequency
>>>> range of the spectra used for pre-clustering is 3 to 25 Hz by default
>>>> (i.e., theta, alpha, and beta bands), but some researchers change it to
>>>> 3 to 48 Hz (which adds the low gamma band). Also, EEGLAB uses the absolute
>>>> channel values for the scalp map pre-cluster array, but you can use the
>>>> Laplacian or gradient of the scalp map as well (see a sample screenshot of
>>>> these changes:
>>>> https://www.dropbox.com/s/lm70lkxos9brqma/precluster%20changes.png?dl=0
>>>> ). You can also adjust the weights to see which combination works best
>>>> for you. It all depends on your specific study and dataset.
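Seyed's point is that what you cluster matters as much as how many clusters you ask for. The sketch below is a simplified, hypothetical stand-in, not EEGLAB's own pre-clustering code: it restricts a spectrum to a frequency band (3 to 25 Hz vs. 3 to 48 Hz), scales each measure to unit length so that raw units do not dominate (an assumption of this sketch), and multiplies each measure by a weight before concatenating them into one feature vector per component.

```python
import math


def band_limit(freqs, spectrum, fmin=3.0, fmax=25.0):
    """Keep spectral values whose frequency (Hz) lies in [fmin, fmax]."""
    return [s for f, s in zip(freqs, spectrum) if fmin <= f <= fmax]


def unit_scale(values):
    """Scale a feature vector to unit Euclidean length (a zero vector is left as is)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def precluster_vector(measures, weights):
    """Concatenate unit-scaled measures, each multiplied by its weight.

    measures: dict name -> list of feature values for ONE component
    weights:  dict name -> relative weight (missing names default to 1.0)
    """
    vec = []
    for name, values in measures.items():
        w = weights.get(name, 1.0)
        vec.extend(w * v for v in unit_scale(values))
    return vec


# Toy data for one component (made-up values, 1-Hz bins from 1 to 50 Hz)
freqs = list(range(1, 51))
spectrum = [10.0 / f for f in freqs]      # 1/f-like toy spectrum
scalp_map = [0.2, -0.5, 0.9, 0.1]         # toy 4-channel scalp map

default_band = band_limit(freqs, spectrum, 3, 25)
wide_band = band_limit(freqs, spectrum, 3, 48)
vec = precluster_vector({"spec": default_band, "map": scalp_map},
                        {"spec": 1.0, "map": 2.0})
print(len(default_band), len(wide_band), len(vec))   # 23 46 27
```

With unit scaling, each measure's contribution to the Euclidean length of the vector equals its weight, so doubling a weight doubles that measure's influence on distances between components.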
>>>> 
>>>> Best,
>>>> Seyed
>>>> --
>>>> Seyed Yahya Shirazi
>>>> Ph.D. Candidate, BRaIN Lab
>>>> University of Central Florida
>>>> 
>>>> -----Original Message-----
>>>> From: eeglablist <eeglablist-bounces at sccn.ucsd.edu> On Behalf Of Fran
>>>> Copelli
>>>> Sent: Wednesday, January 8, 2020 12:59 PM
>>>> To: eeglablist at sccn.ucsd.edu
>>>> Subject: [Eeglablist] How many clusters?
>>>> 
>>>> Dear list,
>>>> 
>>>> My question involves clustering with the k-means algorithm. It's not
>>>> clear to me how to decide the number of clusters to compute.
>>>>
>>>> In an example on the EEGLAB wiki, they suggest choosing the number of
>>>> clusters based on the average number of components per subject. However,
>>>> the default number of clusters (10) is different from the average number
>>>> of clusters (20). I've struck out a few other obvious errors in the quoted
>>>> text below.
>>>> 
>>>> "Note that the default number of clusters (10 in this case) is set so on
>>>> average there will be one computer component per subject per cluster. For
>>>> example, if about 20 component per subjects are selected based on the
>>>> residual variance thereshold threshold and the STUDY contains 10 subjects,
>>>> the average number of cluster will be set to 20 - so each cluster will
>>>> contains contain on average 10 components."
>>>> 
>>>> I'm also aware of a clustering "rule of thumb": the square root of
>>>> (number of components divided by 2). I tried to find its source, and
>>>> according to online forums, there isn't a clear origin:
>>>> https://stats.stackexchange.com/questions/277007/rule-of-thumb-on-the-best-k-in-k-means-clustering
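For comparison, a minimal Python sketch of the square-root heuristic Fran mentions. For the wiki example (10 subjects with about 20 retained components each, 200 components in total), it gives 10 clusters, whereas the one-component-per-subject reasoning gives 20. This is arithmetic only; neither number is guaranteed to be a good choice for a particular dataset, and the helper name is hypothetical.

```python
import math


def rule_of_thumb_k(n_components):
    """Heuristic k ~ sqrt(n / 2); its origin is unclear."""
    return max(1, round(math.sqrt(n_components / 2)))


# Wiki-style example: 10 subjects x about 20 retained components = 200 components
n = 10 * 20
print(rule_of_thumb_k(n))   # 10  (sqrt(200 / 2) = sqrt(100))
print(n // 10)              # 20  (one component per subject per cluster: n / n_subjects)
```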
>>>> 
>>>> My question is whether there is a clear rule/equation for how many
>>>> clusters to create from unclustered components from multiple subjects.
>>>> 
>>>> Any help is appreciated. Thank you!
>>>> 
>>>> 
>>>> Fran
>>>> 
>>>> SMART Lab, Psychology Department
>>>> 
>>>> Ryerson University
>>>> _______________________________________________
>>>> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>>>> To unsubscribe, send an empty email to
>>>> eeglablist-unsubscribe at sccn.ucsd.edu
>>>> For digest mode, send an email with the subject "set digest mime" to
>>>> eeglablist-request at sccn.ucsd.edu