[Eeglablist] How many clusters?

Seyed Yahya Shirazi shirazi at ieee.org
Thu Jan 16 17:43:11 PST 2020


Hi Makoto,

I usually use silhouette because it is the simplest criterion, has the easiest interpretation, and adds only a small computational overhead. For simple clustering problems, such as dipoles or even a combination of dipole locations and other metrics, I don't think there is a need for anything more complicated.
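To make the silhouette idea concrete, here is a minimal sketch in plain Python (not MATLAB, and not the code of evalclusters or optimal_kmeans). It assumes a small deterministic k-means and made-up 3-D "dipole" coordinates. All function names and data points are hypothetical and exist only for illustration. For each candidate k between a lower and an upper bound, it clusters the points, computes the mean silhouette, and keeps the k with the highest score.

```python
import math

def kmeans(points, k, iters=100):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b); singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_min, k_max):
    """Return the k in [k_min, k_max] with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores

# Made-up coordinates forming three well-separated groups (illustration only).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1),
          (50, 50, 0), (51, 50, 0), (50, 51, 1), (51, 51, 0),
          (-40, 20, 60), (-41, 20, 60), (-40, 21, 61), (-39, 20, 60)]
best, scores = best_k(points, 2, 5)
print(best, scores)
```

The lower and upper bounds (2 and 5 here) play the same role as the expected range of cluster numbers discussed in this thread. With real data, k-means is usually restarted from several random initializations rather than a single deterministic one.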

BTW, I added a fourth function (cprintf) to the Dropbox folder (https://www.dropbox.com/sh/qcffm0nv832nq8n/AADcvURSIEO62i6-iAkc9iyza?dl=0). The optimal_kmeans function uses it to print colored messages in the MATLAB command window.

Best,
Seyed

________________________________
From: eeglablist <eeglablist-bounces at sccn.ucsd.edu> on behalf of Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
Sent: Tuesday, January 14, 2020 3:36:37 PM
To: EEGLAB List <eeglablist at sccn.ucsd.edu>
Subject: Re: [Eeglablist] How many clusters?

Dear Fran and Seyed,

Fran, I recommend you try Seyed's first because his should work out of the
box. If you say you are familiar with coding, fine.

Seyed, do you have any comment on which criterion/algorithm you prefer to
use out of the four (Calinski-Harabasz, Silhouette, Gap, and
Davies-Bouldin) supported by evalclusters()? Also, I have never bothered to
try Gap because somehow I could not use it in the same way as the others, hence
my current code does not use it. I was too lazy to look into the cause of the
error, but do you know anything about it?

Makoto

On Tue, Jan 14, 2020 at 12:17 PM Fran Copelli <fcopelli at ryerson.ca> wrote:

> Thank you, Makoto! This is very helpful. I made a note last week that I
> need to further investigate how to select a clustering criterion. Happy to
> hear that it is relatively straightforward. Regarding the optimum number of
> clusters, I'll try out the code that you provided.
>
>
> Best,
>
> Fran
>
> SMART Lab, Psychology Department
>
>
> On Tue, Jan 14, 2020 at 2:57 PM Makoto Miyakoshi <mmiyakoshi at ucsd.edu>
> wrote:
>
>> Dear Fran and Seyed,
>>
>> Fran, your question makes a good point!
>> Seyed, I appreciate that you shared yours with Fran. I had the same solution.
>> Let me also share mine (just a snippet, sorry) with you for comparison.
>> I used it in EEGLAB ver <= 14. Dipole locations must be the only clustering
>> criterion (if you wonder why, see
>>
>> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Avoid_double_dipping_.2803.2F07.2F2019_updated.29
>> ).
>> Then you can use this code.
>> See this page:
>>
>> https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Can_we_determine_the_optimum_number_of_IC_clusters.3F_.2801.2F14.2F2020_added.29
>>
>> Makoto
>>
>> On Fri, Jan 10, 2020 at 12:12 AM <shirazi at ieee.org> wrote:
>>
>> > Hi Fran,
>> >
>> > The number of clusters was a head-scratcher for me too, especially when I
>> > read some papers with clusters that I think could be divided into two and
>> > still have a meaningful distribution.
>> >
>> > For my current study (which will be out soon), we used an optimal
>> > clustering method from MATLAB (see:
>> > https://www.mathworks.com/help/stats/evalclusters.html). After modifying a
>> > couple of EEGLAB functions, we could specify a lower and upper bound for the
>> > number of clusters that we expect, and then let "evalclusters" find the
>> > optimum number of clusters (see a screenshot of the modified pop_clust
>> > dialog: https://www.dropbox.com/s/27xzjgyalc13in7/pop_clust%20mod.png?dl=0
>> > ).
>> >
>> > You can try this method for yourself by modifying three functions in
>> > "EEGLAB/functions/studyfunc" (you can find the modified functions here:
>> > https://www.dropbox.com/sh/qcffm0nv832nq8n/AADcvURSIEO62i6-iAkc9iyza?dl=0
>> > ).
>> > The mods should not break the other options of pop_clust, but I'd still
>> > suggest renaming the original functions rather than overwriting them, in
>> > case you decide to use them again. Also, the optimal_kmeans option works
>> > with and without selecting the outlier option. BTW, I am using EEGLAB
>> > 2019.0 and MATLAB R2018b.
>> >
>> > Last but not least, clusters are only as meaningful as the pre-cluster
>> > information. You may want to choose the pre-cluster information that is best
>> > for you, based on your data, task, and post-processing. For example, the
>> > range of the spectra for pre-clustering is by default from 3 to 25 Hz (i.e.,
>> > theta, alpha and beta bands), but some researchers change it to 3 to 48 Hz
>> > (which adds the low gamma band). Also, EEGLAB uses the absolute channel
>> > values for the scalp map pre-cluster array, but you can use the Laplacian or
>> > gradient of the scalp map as well (see a sample screenshot of these changes:
>> > https://www.dropbox.com/s/lm70lkxos9brqma/precluster%20changes.png?dl=0
>> > ).
>> > You can also manipulate the weights to see what combination works best for
>> > you. It all depends on your specific study and dataset.
>> >
>> > Best,
>> > Seyed
>> > --
>> > Seyed Yahya Shirazi
>> > Ph.D. Candidate, BRaIN Lab
>> > University of Central Florida
>> >
>> > -----Original Message-----
>> > From: eeglablist <eeglablist-bounces at sccn.ucsd.edu> On Behalf Of Fran Copelli
>> > Sent: Wednesday, January 8, 2020 12:59 PM
>> > To: eeglablist at sccn.ucsd.edu
>> > Subject: [Eeglablist] How many clusters?
>> >
>> > Dear list,
>> >
>> > My question involves clustering when using the Kmeans algorithm. It's not
>> > clear to me how to decide the number of clusters to compute.
>> >
>> > In an example on the EEGLAB wiki, they suggest clustering based on the
>> > average number of components per subject. However, the default number of
>> > clusters (10) is different from the average number of clusters (20). I've
>> > corrected a few other obvious errors in the quoted text below.
>> >
>> > "Note that the default number of clusters (10 in this case) is set so on
>> > average there will be one component per subject per cluster. For
>> > example, if about 20 component per subjects are selected based on the
>> > residual variance threshold and the STUDY contains 10 subjects,
>> > the average number of cluster will be set to 20 - so each cluster will
>> > contain on average 10 components."
>> >
>> > I'm also aware of a clustering "rule of thumb", which is the square root of
>> > (number of components divided by 2). I tried finding the source for it, and
>> > according to online forums, there isn't a clear origin.
>> > https://stats.stackexchange.com/questions/277007/rule-of-thumb-on-the-best-k-in-k-means-clustering
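As a small worked example of the two heuristics mentioned in this message, the sketch below applies both to the numbers in the quoted wiki text (10 subjects with about 20 components each). It is written in Python purely for illustration. The function names are hypothetical, and the "one component per subject per cluster" formula is only one reading of the quoted wiki passage, not EEGLAB's actual code.

```python
import math

def clusters_one_per_subject(n_components, n_subjects):
    # Assumed reading of the wiki heuristic: on average one component per
    # subject per cluster, so k is the average number of components per subject.
    return max(1, round(n_components / n_subjects))

def clusters_rule_of_thumb(n_components):
    # Rule of thumb from the message above: k = sqrt(n / 2).
    return max(1, round(math.sqrt(n_components / 2)))

n_subjects = 10
components_per_subject = 20
n_components = n_subjects * components_per_subject  # 200 components in total

print(clusters_one_per_subject(n_components, n_subjects))  # 200 / 10
print(clusters_rule_of_thumb(n_components))                # sqrt(200 / 2)
```

With these numbers the two heuristics disagree by a factor of two (20 versus 10 clusters), which is one reason a data-driven criterion may be preferable to a fixed formula.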
>> >
>> > My question is whether there is a clear rule/equation for how many clusters
>> > to create from unclustered components from multiple subjects.
>> >
>> > Any help is appreciated. Thank you!
>> >
>> >
>> > Fran
>> >
>> > SMART Lab, Psychology Department
>> >
>> > Ryerson University
>> > _______________________________________________
>> > Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
>> > To unsubscribe, send an empty email to
>> > eeglablist-unsubscribe at sccn.ucsd.edu
>> > For digest mode, send an email with the subject "set digest mime" to
>> > eeglablist-request at sccn.ucsd.edu