[Eeglablist] AMICA number of mixture components

Makoto Miyakoshi mmiyakoshi at ucsd.edu
Mon Apr 4 21:38:14 PDT 2016


Dear Jason,

> [Makoto, I think he is referring to the num_mix_comps variable which
controls the source density mixture model, not the number of ICA models
(num_models).]

Oops, you are right. I was wrong. Thank you for correcting me, Jason.

> basically having one density to fit each tail and one to fit the peak.

That makes sense.

> The parameter may be automatically determined based on likelihood in the
next version, which will have density models other than Generalized
Gaussian, including non-symmetric (skew) models.

That sounds interesting. I'm looking forward to seeing how the performance
changes once skewed distributions are supported, which I guess occur often.

Makoto



On Sat, Mar 19, 2016 at 8:46 AM, Jason Palmer <japalmer29 at gmail.com> wrote:

> Hi Tatu,
>
>
>
> Sorry for the delay in responding. [Makoto, I think he is referring to the
> num_mix_comps variable which controls the source density mixture model, not
> the number of ICA models (num_models).]
>
>
>
> The num_mix_comps setting controls the number of densities used in the
> Generalized Gaussian mixture model for each source density. The default is
> 3, which seems to work well, basically having one density to fit each tail
> and one to fit the peak. Using 5 or more should give very similar results
> but take longer to run. Some very unusual source pdfs might benefit from
> having a larger number of mixture components, but actual sources seem to
> be well represented by the 3-density mixture. In principle, having too
> many densities in the mixture could lead to overfitting, but with the usual
> number of samples in EEG data (100,000 or more) the source distributions
> are unlikely to be overfit: even with 6 mixture components, the number of
> degrees of freedom is small compared with the number of samples. In
> general, ICA can work with simple sub- or super-Gaussian density models
> for the sources, but the estimation error for a finite number of samples
> depends on the fidelity of the source density model, and the variance is
> minimized when the actual source density is used.
>
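To make the degrees-of-freedom argument above concrete, here is a minimal Python sketch of a generalized Gaussian mixture density and a rough parameter count. It is not taken from the AMICA code: the function names, the example parameter values, the 64-source dataset, and the assumption of four parameters per component (weight, location, scale, shape) are all illustrative.

```python
import math

def gen_gauss_pdf(x, mu, scale, shape):
    """Generalized Gaussian density; shape=2 is Gaussian, shape=1 is Laplacian."""
    norm = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm * math.exp(-((abs(x - mu) / scale) ** shape))

def mixture_pdf(x, weights, mus, scales, shapes):
    """Weighted sum of generalized Gaussian components (weights sum to 1)."""
    return sum(w * gen_gauss_pdf(x, m, s, r)
               for w, m, s, r in zip(weights, mus, scales, shapes))

def density_params(n_sources, num_mix_comps):
    """Illustrative count of free density parameters: weight, location, scale
    and shape per component, minus one per source since weights sum to 1."""
    return n_sources * (4 * num_mix_comps - 1)

# Hypothetical 3-component source density: one peak and two tails.
weights = [0.6, 0.2, 0.2]
mus = [0.0, -1.5, 1.5]
scales = [0.5, 1.0, 1.0]
shapes = [1.5, 1.0, 1.0]
print(mixture_pdf(0.0, weights, mus, scales, shapes))

# Assumed dataset: 64 sources, 100,000 samples per source.
n_sources, n_samples = 64, 100_000
for m in (3, 6):
    n_par = density_params(n_sources, m)
    # Data points available per density parameter.
    print(m, n_par, n_samples * n_sources / n_par)
```

Under these assumptions, going from 3 to 6 components roughly doubles the number of density parameters, but there are still thousands of data points per parameter.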
>
>
> It is also possible to use Gaussian mixture models instead of Generalized
> Gaussian, for comparison purposes. With Gaussians, a higher num_mix_comps
> should improve the estimation, since log-linear tails can be fit better
> with more Gaussians; the extra components aren't necessary with the
> default Generalized Gaussian mixture model.
>
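A small Python sketch of the log-linear tail point, written for illustration only (the function names are mine, not AMICA's). A generalized Gaussian with shape 1 has a log-density that is exactly linear in |x|, so its discrete curvature in the tail is zero. A single Gaussian (shape 2) has a quadratic log-density, which is why several Gaussians of different widths would be needed to approximate such a tail over a range.

```python
import math

def log_gen_gauss(x, scale, shape):
    """Log of a zero-mean generalized Gaussian density."""
    log_norm = math.log(shape) - math.log(2.0 * scale) - math.lgamma(1.0 / shape)
    return log_norm - (abs(x) / scale) ** shape

def second_difference(f, x, h=1.0):
    """Discrete curvature of f around x; zero means locally linear."""
    return f(x + h) - 2.0 * f(x) + f(x - h)

# Shape 1 (Laplacian): curvature of the log-density in the tail is ~0.
laplace_tail = second_difference(lambda x: log_gen_gauss(x, 1.0, 1.0), 3.0)

# Shape 2 with scale sqrt(2) is a unit-variance Gaussian: curvature is ~-1.
gauss_tail = second_difference(lambda x: log_gen_gauss(x, math.sqrt(2.0), 2.0), 3.0)

print(laplace_tail, gauss_tail)
```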
>
>
> So, basically the num_mix_comps parameter is not meant to be changed in
> standard usage. The parameter may be automatically determined based on
> likelihood in the next version, which will have density models other than
> Generalized Gaussian, including non-symmetric (skew) models.
>
>
>
> Best,
>
> Jason
>
>
>
> *From:* eeglablist-bounces at sccn.ucsd.edu [mailto:
> eeglablist-bounces at sccn.ucsd.edu] *On Behalf Of *Makoto Miyakoshi
> *Sent:* Friday, March 18, 2016 12:36 AM
> *To:* Tatu Huovilainen
> *Cc:* EEGLAB List
> *Subject:* Re: [Eeglablist] AMICA number of mixture components
>
>
>
> Dear Tatu,
>
>
>
> That's a good question. I've only heard of a heuristic way to determine it.
>
> Jason once told me to start with 5 or 6 models and, if you find one or
> two models that do not account for much of the data (you can check with the
> amica utility tools which model explains which part of the data), remove
> them... does that make sense?
>
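The heuristic above (which concerns num_models, the number of ICA models) could be sketched in Python as follows. This is only an illustration: the input format (one list of model probabilities per sample), the 5% threshold, and the function names are assumptions, and this is not the interface of the amica utility tools.

```python
def model_shares(posteriors):
    """Fraction of samples for which each model has the highest probability.

    posteriors: list of per-sample lists, one probability per model.
    """
    n_models = len(posteriors[0])
    counts = [0] * n_models
    for probs in posteriors:
        counts[probs.index(max(probs))] += 1
    return [c / len(posteriors) for c in counts]

def models_to_keep(posteriors, min_share=0.05):
    """Indices of models that dominate at least min_share of the samples."""
    return [i for i, s in enumerate(model_shares(posteriors)) if s >= min_share]

# Toy probabilities for 3 models over 10 samples (made-up numbers).
toy = [[0.7, 0.2, 0.1]] * 6 + [[0.1, 0.8, 0.1]] * 4
print(model_shares(toy))
print(models_to_keep(toy))
```

In this toy case the third model never dominates any sample, so it would be the candidate to remove before rerunning with fewer models.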
>
>
> Makoto
>
>
>
>
>
> On Wed, Jan 20, 2016 at 7:35 PM, Tatu Huovilainen <
> Tatu.Huovilainen at helsinki.fi> wrote:
>
> Hi Makoto, Dr. Palmer & eeglab list,
>
> I have a few specific questions about AMICA that I could not find answers
> to in previous discussions. What should I use as a criterion for choosing
> the 'num_mix_comps' parameter? My understanding is that increasing the
> number results in a better model fit, but with a risk of overfitting. Is
> there a way to estimate how many mixture components it is OK to fit, like
> the k(n_channels)^2 rule for infomax? Given that I have enough samples to
> avoid overfitting, will it cause trouble (besides taking much longer) if a
> source is well approximated with 3 densities but I'm using, say, 6? Are
> there other aspects of the data that affect the choice of this number,
> like sensor types or SNR?
>
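For reference, the k(n_channels)^2 rule mentioned above relates the number of samples to the squared channel count. A short Python sketch of the arithmetic, with hypothetical function names and an assumed recording of 64 channels and 100,000 samples:

```python
def infomax_k(n_samples, n_channels):
    """k such that n_samples = k * n_channels**2."""
    return n_samples / n_channels ** 2

def samples_needed(k, n_channels):
    """Number of samples required for a chosen k."""
    return k * n_channels ** 2

# Assumed recording: 64 channels, 100,000 samples.
print(infomax_k(100_000, 64))
print(samples_needed(30, 64))
```

The value of k to aim for is a separate judgment call; the sketch only computes the ratio.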
> Regards,
> Tatu Huovilainen
>
> --
>
> Makoto Miyakoshi
> Swartz Center for Computational Neuroscience
> Institute for Neural Computation, University of California San Diego
>



-- 
Makoto Miyakoshi
Swartz Center for Computational Neuroscience
Institute for Neural Computation, University of California San Diego