[Eeglablist] AMICA number of mixture components

Tatu Huovilainen Tatu.Huovilainen at helsinki.fi
Mon Mar 21 01:59:08 PDT 2016


Hi Jason & Makoto,

Yes, I was referring to the num_mix_comps variable. I have only been doing single-model learning, since the recording is one continuous task. I am now using 5 mixture components, after observing that this setting seemed to capture eye movements in a single component more cleanly. But moving away from the default settings without knowing the possible downsides had me worried.

But thank you, I appreciate the answer. The relevant AMICA publications are hard to approach without a strong background in math.

Best,
Tatu

On Sat, 19 Mar 2016 08:46:45 -0700
Jason Palmer <japalmer29 at gmail.com> wrote:

> Hi Tatu,
>
> Sorry for the delay in responding. [Makoto, I think he is referring to the num_mix_comps variable which controls the source density mixture model, not the number of ICA models (num_models).]
>
> The num_mix_comps setting controls the number of densities used in the Generalized Gaussian mixture model for each source density. The default is 3, which seems to work well: roughly, one density fits each tail and one fits the peak. Using 5 or more should give very similar results but take longer to run. Some very unusual source pdfs might benefit from a larger number of mixture components, but actual sources seem to be well represented by the 3-density mixture. In principle, having too many densities in the mixture could lead to overfitting, but with the usual number of samples in EEG data (100,000 or more) the source distributions are unlikely to be overfit: even with 6 mixture components, the number of degrees of freedom is small compared with the number of samples. In general, ICA can work with simple sub- or super-Gaussian density models for the sources, but the estimation error for a finite number of samples depends on the fidelity of the source density model, and the variance is smallest when the model matches the actual source density.
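
To make the "one density per tail, one for the peak" picture concrete, here is a minimal sketch of a 3-component generalized Gaussian mixture. It assumes a common textbook parameterization of the generalized Gaussian (the thread does not spell out AMICA's exact formula), with a weight, location, scale and shape per component. All numeric values, the 64-channel count and the helper names are hypothetical. The last helper counts free parameters under that assumption, for comparison with the amount of data.

```python
import math

def gen_gauss_pdf(y, rho):
    """Unit-scale generalized Gaussian density; rho=2 is Gaussian-shaped, rho=1 Laplacian."""
    return math.exp(-abs(y) ** rho) / (2.0 * math.gamma(1.0 + 1.0 / rho))

def mixture_pdf(x, comps):
    """Mixture density; comps is a list of (weight, location, scale, shape) tuples."""
    return sum(w * gen_gauss_pdf((x - mu) / s, rho) / s for w, mu, s, rho in comps)

def n_free_params(n_sources, num_mix_comps):
    """Unmixing weights plus density parameters (assumed 4 per component,
    minus 1 per source because the mixture weights sum to 1)."""
    per_source = 4 * num_mix_comps - 1
    return n_sources ** 2 + n_sources * per_source

# Hypothetical 3-component source density: one narrow peak, two broad tail components.
comps = [
    (0.6, 0.0, 1.0, 1.5),   # peak
    (0.2, -2.0, 2.0, 1.0),  # left tail
    (0.2, 2.0, 2.0, 1.0),   # right tail
]

# Riemann-sum check that the mixture integrates to roughly 1.
step = 0.01
area = sum(mixture_pdf(-40.0 + i * step, comps) * step for i in range(8000))
print("area under mixture:", round(area, 4))

# Parameter count versus data size for a hypothetical 64-channel recording.
n_channels, n_samples = 64, 100000
for m in (3, 6):
    print(m, "mixture components:", n_free_params(n_channels, m), "free parameters,",
          n_channels * n_samples, "data values")
```

Under these assumptions, going from 3 to 6 mixture components adds 768 density parameters for 64 sources, while the recording itself holds 6.4 million data values.
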
>
> It is also possible to use Gaussian mixture models instead of Generalized Gaussian ones, for comparison purposes. With Gaussians, a higher num_mix_comps should improve the estimation, since log-linear tails can be fit better with more Gaussians. The extra components aren’t necessary with the default Generalized Gaussian mixture model.
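
The point about log-linear tails can be illustrated with a toy experiment. This is a sketch, not AMICA's estimator: it draws samples from a Laplacian (whose log-density is linear in the tails) and fits mixtures of 1, 2 and 3 zero-mean Gaussians with a plain EM loop. Restricting the Gaussians to zero mean, the sample size, the iteration count and the function names are all simplifying assumptions made for illustration.

```python
import math
import random

def mean_loglik(x, weights, variances):
    """Average log-density of samples x under a zero-mean Gaussian mixture."""
    total = 0.0
    for v in x:
        p = sum(w * math.exp(-0.5 * v * v / s) / math.sqrt(2.0 * math.pi * s)
                for w, s in zip(weights, variances))
        total += math.log(p)
    return total / len(x)

def fit_scale_mixture(x, k, iters=50):
    """Fit k zero-mean Gaussians by EM and return the mean log-likelihood."""
    n = len(x)
    var = sum(v * v for v in x) / n
    # Start with variances spread geometrically around the sample variance.
    variances = [var * 4.0 ** (j - (k - 1) / 2.0) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sum = [0.0] * k  # summed responsibilities per component
        sq_sum = [0.0] * k    # responsibility-weighted sums of squares
        for v in x:
            p = [w * math.exp(-0.5 * v * v / s) / math.sqrt(2.0 * math.pi * s)
                 for w, s in zip(weights, variances)]
            norm = sum(p)
            for j in range(k):
                r = p[j] / norm
                resp_sum[j] += r
                sq_sum[j] += r * v * v
        weights = [rs / n for rs in resp_sum]
        variances = [sq / rs for sq, rs in zip(sq_sum, resp_sum)]
    return mean_loglik(x, weights, variances)

rng = random.Random(0)
# Laplacian samples: exponential magnitude with a random sign.
samples = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(2000)]

loglik = {k: fit_scale_mixture(samples, k) for k in (1, 2, 3)}
for k in sorted(loglik):
    print(k, "Gaussians: mean log-likelihood", round(loglik[k], 4))
```

A single Gaussian has a quadratic log-density and cannot follow the linear tails, so the fit should improve as Gaussians with different widths are added. A generalized Gaussian with shape parameter 1 matches the Laplacian exactly with one component.
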
>
> So, basically the num_mix_comps parameter is not meant to be changed in standard usage. The parameter may be automatically determined based on likelihood in the next version, which will have density models other than Generalized Gaussian, including non-symmetric (skew) models.
>
> Best,
> 
> Jason
>
> From: eeglablist-bounces at sccn.ucsd.edu [mailto:eeglablist-bounces at sccn.ucsd.edu] On Behalf Of Makoto Miyakoshi
> Sent: Friday, March 18, 2016 12:36 AM
> To: Tatu Huovilainen
> Cc: EEGLAB List
> Subject: Re: [Eeglablist] AMICA number of mixture components
>
> Dear Tatu,
>
> That's a good question. I've only heard of a heuristic way to determine it.
>
> Jason once told me to start with 5 or 6 models; if you find one or two models that do not account for much of the data (you can check which model explains which part of the data with the AMICA utility tools), remove them... Does that make sense?
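
The heuristic above (which concerns the number of ICA models, num_models, rather than num_mix_comps) can be sketched as follows. This does not use the AMICA utility tools: it assumes you already have a per-sample log-likelihood for each model as plain lists, and both the function names and the 5% cutoff are hypothetical choices for illustration.

```python
def model_shares(model_logliks):
    """Fraction of samples for which each model has the highest log-likelihood.

    model_logliks[m][t] is the log-likelihood of sample t under model m.
    """
    n_models = len(model_logliks)
    n_samples = len(model_logliks[0])
    counts = [0] * n_models
    for t in range(n_samples):
        best = max(range(n_models), key=lambda m: model_logliks[m][t])
        counts[best] += 1
    return [c / n_samples for c in counts]

def models_to_keep(shares, min_share=0.05):
    """Indices of models that explain at least min_share of the samples."""
    return [m for m, s in enumerate(shares) if s >= min_share]

# Toy data: 3 models, 10 samples (values are made up).
toy = [
    [-1.0] * 6 + [-5.0] * 4,  # model 0 is best on the first 6 samples
    [-4.0] * 6 + [-1.0] * 4,  # model 1 is best on the last 4 samples
    [-6.0] * 10,              # model 2 is never the best model
]
shares = model_shares(toy)
keep = models_to_keep(shares)
print("share of data per model:", shares)
print("models worth keeping:", keep)
```

In practice one would rerun the decomposition with the reduced number of models rather than simply discarding the unused ones.
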
>
> Makoto
>
> On Wed, Jan 20, 2016 at 7:35 PM, Tatu Huovilainen <Tatu.Huovilainen at helsinki.fi> wrote:
> 
> Hi Makoto, Dr. Palmer & eeglab list,
> 
> I have a few specific questions about AMICA that I failed to find answers to in previous discussions. What criterion should I use for choosing the 'num_mix_comps' parameter? I understand that increasing the number results in a better model fit, but with a risk of overfitting. Is there a way to estimate how many mixture components it is OK to use, like the k(n_channels)^2 rule for infomax? Given that I have enough samples to avoid overfitting, will it cause trouble (besides taking much longer) if a source is well approximated with 3 densities but I'm using, say, 6? Are there other aspects of the data that affect the choice of this number, like sensor types or SNR?
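
For reference, the k(n_channels)^2 rule mentioned in the question is simple arithmetic. The sketch below treats k as a free multiplier, since the thread does not give a value; the value of 20, the 64-channel montage and the function names are hypothetical.

```python
def min_samples_rule(n_channels, k):
    """Number of samples suggested by the k * n_channels**2 heuristic."""
    return k * n_channels ** 2

def samples_per_weight(n_samples, n_channels):
    """The k a recording actually achieves: samples per unmixing-matrix weight."""
    return n_samples / n_channels ** 2

n_channels = 64      # hypothetical montage
n_samples = 100000   # "usual" EEG sample count quoted in the thread
print("samples suggested for k=20:", min_samples_rule(n_channels, 20))
print("achieved k:", round(samples_per_weight(n_samples, n_channels), 2))
```
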
> 
> Regards,
> Tatu Huovilainen
>
> -- 
> 
> Makoto Miyakoshi
> Swartz Center for Computational Neuroscience
> Institute for Neural Computation, University of California San Diego
> 


