
gmmb_em

PURPOSE

GMMB_EM - EM estimated GMM parameters

SYNOPSIS

function [estimate, varargout] = gmmb_em(data, varargin);

DESCRIPTION

GMMB_EM     - EM estimated GMM parameters

 estS = gmmb_em(data)
 estS = gmmb_em(data, <params>...)
 [estS, stats] = gmmb_em(...)

 This version works with complex numbers too.

 data = N x D matrix
 params can be given as a list of 'name', value pairs.
 stats is a struct of logged values, see the Logging section below.

 Parameters (default value):

   maxloops    maximum number of loops allowed (100)
   thr        convergence threshold; relative log-likelihood change (1e-6)
        set negative to use only maxloops condition
   components    number of components in GMM (3)
   verbose    print progress messages (false)
   init        name of the initialization method to use ('fcm1')

 Init methods:

  fcm1         Fuzzy C-means clustering, requires the Fuzzy Logic Toolbox.
               This is the original init method from GMMBayes Toolbox v0.1
  cmeans1      C-means clustering for means, uniform weights and covariances
  cmeans2      C-means clustering for means, weights and covariances

 Example:
   estS = gmmb_em(data, 'init', 'fcm1', 'components', 5, 'thr', 1e-8)
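
 The fields of the returned estimate struct (mu, sigma, weight) can be
 used directly for density evaluation. A minimal sketch, assuming the
 toolbox function gmmb_cmvnpdf (used internally by gmmb_em) is on the path:

   [estS, stats] = gmmb_em(data, 'components', 5);
   C = size(estS.weight, 1);
   p = zeros(size(data,1), 1);
   for c = 1:C
       p = p + estS.weight(c) * ...
           gmmb_cmvnpdf(data, estS.mu(:,c).', estS.sigma(:,:,c));
   end
   % p(n) is now the estimated mixture density at data(n,:)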

 References:
   [1] Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification,
   2nd ed., John Wiley & Sons, Inc., 2001.
   [2] Bilmes, J.A., A Gentle Tutorial of the EM Algorithm and its
    Application to Parameter Estimation for Gaussian Mixture and Hidden
    Markov Models,
   International Computer Science Institute, 1998.

 Author(s):
    Joni Kamarainen <Joni.Kamarainen@lut.fi>
    Pekka Paalanen <pekka.paalanen@lut.fi>

 Copyright:

   Bayesian Classifier with Gaussian Mixture Model Pdf
   functionality is Copyright (C) 2003 by Pekka Paalanen and
   Joni-Kristian Kamarainen.

   $Name:  $ $Revision: 1.2 $  $Date: 2004/11/02 09:00:18 $

 Logging
   parameters

      logging   What kind of logging to do:
        0 - no logging
        1 - normal logging
        2 - extra logging: store all intermediate mixtures
       If the 'stats' output argument is requested, then 'logging'
       defaults to 1, otherwise it is forced to 0.

  the 'stats' struct:
      iterations: EM iteration count
      covfixer2:  iterations-by-C matrix of gmmb_covfixer fix round counts
       loglikes:   iterations-long vector of the log-likelihood values
    extra logging:
      initialmix: parameters for the initial mixture
      mixtures:   parameters for all intermediate mixtures
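
 For example (a sketch of inspecting the normal-logging 'stats' output
 described above):

   [estS, stats] = gmmb_em(data, 'components', 3);
   disp(['EM ran for ' num2str(stats.iterations) ' iterations']);
   plot(stats.loglikes);              % log-likelihood after each iteration
   if any(stats.covfixer2(:) > 0)
       disp('some covariance matrices were fixed during estimation');
   end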

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function tulo = gmmcpdf(data, mu, sigma, weight)

SOURCE CODE

%GMMB_EM     - EM estimated GMM parameters
%
% estS = gmmb_em(data)
% estS = gmmb_em(data, <params>...)
% [estS, stats] = gmmb_em(...)
%
% This version works with complex numbers too.
%
% data = N x D matrix
% params can be given as a list of 'name', value pairs.
% stats is a struct of logged values, see the Logging section below.
%
% Parameters (default value):
%
%   maxloops    maximum number of loops allowed (100)
%   thr        convergence threshold; relative log-likelihood change (1e-6)
%        set negative to use only maxloops condition
%   components    number of components in GMM (3)
%   verbose    print progress messages (false)
%   init        name of the initialization method to use ('fcm1')
%
% Init methods:
%
%  fcm1         Fuzzy C-means clustering, requires the Fuzzy Logic Toolbox.
%               This is the original init method from GMMBayes Toolbox v0.1
%  cmeans1      C-means clustering for means, uniform weights and covariances
%  cmeans2      C-means clustering for means, weights and covariances
%
% Example:
%   estS = gmmb_em(data, 'init', 'fcm1', 'components', 5, 'thr', 1e-8)
%
% References:
%   [1] Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification,
%   2nd ed., John Wiley & Sons, Inc., 2001.
%   [2] Bilmes, J.A., A Gentle Tutorial of the EM Algorithm and its
%    Application to Parameter Estimation for Gaussian Mixture and Hidden
%    Markov Models,
%   International Computer Science Institute, 1998.
%
% Author(s):
%    Joni Kamarainen <Joni.Kamarainen@lut.fi>
%    Pekka Paalanen <pekka.paalanen@lut.fi>
%
% Copyright:
%
%   Bayesian Classifier with Gaussian Mixture Model Pdf
%   functionality is Copyright (C) 2003 by Pekka Paalanen and
%   Joni-Kristian Kamarainen.
%
%   $Name:  $ $Revision: 1.2 $  $Date: 2004/11/02 09:00:18 $
%
% Logging
%   parameters
%
%      logging   What kind of logging to do:
%        0 - no logging
%        1 - normal logging
%        2 - extra logging: store all intermediate mixtures
%      If the 'stats' output argument is requested, then 'logging'
%      defaults to 1, otherwise it is forced to 0.
%
%  the 'stats' struct:
%      iterations: EM iteration count
%      covfixer2:  iterations-by-C matrix of gmmb_covfixer fix round counts
%      loglikes:   iterations-long vector of the log-likelihood values
%    extra logging:
%      initialmix: parameters for the initial mixture
%      mixtures:   parameters for all intermediate mixtures
%


function [estimate, varargout] = gmmb_em(data, varargin);

% default parameters
conf = struct(...
    'maxloops', 100, ...
    'thr', 1e-6, ...
    'verbose', 0, ...
    'components', 3, ...
    'logging', 0, ...
    'init', 'fcm1' ...
    );

if nargout>1
    conf.logging = 1;
    varargout{1} = [];
end

conf = getargs(conf, varargin);

if nargout<2
    conf.logging=0;
end

% for logging
log_covfixer2 = {};
log_loglikes = {};
log_initialmix = {};
log_mixtures = {};


% --- initialization ---

N = size(data,1);    % number of points
D = size(data,2);    % dimensions
C = conf.components;

% the number of free parameters in a Gaussian
if isreal(data)
    Nparc = D+D*(D+1)/2;
else
    Nparc = 2*D + D*D;
end
N_limit = (Nparc+1)*3*C;
if N < N_limit
    warning_wrap('gmmb_em:data_amount', ...
       ['Training data may be insufficient for selected ' ...
        'number of components. ' ...
        'Have: ' num2str(N) ', recommended: >' num2str(N_limit) ...
        ' points.']);
end
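
% Worked example of the limit above (illustration only): for real-valued
% 2-D data each component has Nparc = 2 + 2*3/2 = 5 free parameters
% (D mean values plus D*(D+1)/2 for the symmetric covariance), so with
% C = 3 components the recommendation is N_limit = (5+1)*3*3 = 54 points.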

switch lower(conf.init)
    case 'fcm1'
        initS = gmmb_em_init_fcm1(data, C, conf.verbose);
    case 'cmeans1'
        initS = gmmb_em_init_cmeans1(data, C);
    case 'cmeans2'
        initS = gmmb_em_init_cmeans2(data, C);
    otherwise
        error(['Unknown initializer method: ' conf.init]);
end


if any(initS.weight == 0)
    error('Initialization produced a zero weight.');
end

mu = initS.mu;
sigma = initS.sigma;
weight = initS.weight;


log_initialmix = initS;
fixerloops = zeros(1, C);


% old values for stopping condition calculations
old_loglike = -realmax;

loops=1;
fixing_cycles = 0;

tulo = gmmcpdf(data, mu, sigma, weight);

while 1
    % one EM cycle
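    % E-step: pcompx(n,c) below is the posterior probability that sample n
    % was generated by component c, i.e. weight(c)*p(x_n|c) divided by
    % sum_j weight(j)*p(x_n|j); tulo holds the weighted component densities.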
    pcompx = tulo ./ (sum(tulo,2)*ones(1,C));

    if ~all( isfinite(pcompx(:))  )
        error('Probabilities are no longer finite.');
    end

    for c = 1:C
        % calculate new estimates
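        % M-step re-estimation below uses the standard EM update formulas
        % (see references [1], [2]):
        %   weight(c)    = (1/N) * sum_n pcompx(n,c)
        %   mu(:,c)      = sum_n pcompx(n,c)*x_n / sum_n pcompx(n,c)
        %   sigma(:,:,c) = sum_n pcompx(n,c)*(x_n - mu_c)*(x_n - mu_c)' / sum_n pcompx(n,c)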
        psum = sum(pcompx(:,c));

        % weight
        weight(c) = 1/N*psum;

        % mean
        nmu = sum(data.*(pcompx(:,c)*ones(1,D)), 1).' ./ psum;
        mu(:,c) = nmu;

        % covariance
        moddata = (data - ones(N,1)*(nmu.')) .* (sqrt(pcompx(:,c))*ones(1,D));
        % sqrt(pcompx) is because it will be squared back
        nsigma = (moddata' * moddata) ./ psum;

        % covariance matrix goodness assurance
        [sigma(:,:,c), fixerloops(1,c)] = gmmb_covfixer(nsigma);
        % covfixer may change the matrix so that log-likelihood
        % decreases. So, if covfixer changes something,
        % disable the stop condition. If going into infinite
        % fix/estimate -loop, quit.
    end

    % finish test
    tulo = gmmcpdf(data, mu, sigma, weight);
    loglike = sum(log(sum(tulo, 2)));
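    % loglike is the total data log-likelihood under the updated mixture,
    % sum_n log( sum_c weight(c)*p(x_n|c) ); it drives the stop test below.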

    if conf.verbose ~= 0
        disp([ 'log-likelihood diff ' num2str(loglike-old_loglike)  ' on round ' num2str(loops) ]);
    end

    if conf.logging>0
        log_covfixer2{loops} = fixerloops;
        log_loglikes{loops} = loglike;
    end
    if conf.logging>1
        log_mixtures{loops} = struct(...
            'weight', weight, ...
            'mu', mu, ...
            'sigma', sigma);
    end

    if any(fixerloops ~= 0)
        % if any cov's were fixed, increase count and
        % do not evaluate stopping threshold.
        fixing_cycles = fixing_cycles +1;
        if conf.verbose ~= 0
            disp(['fix cycle ' num2str(fixing_cycles) ...
                  ', fix loops ' num2str(fixerloops)]);
        end
    else
        % no cov's were fixed this round, reset the counter
        % and evaluate threshold.
        fixing_cycles = 0;
        if (abs(loglike/old_loglike -1) < conf.thr)
            break;
        end
    end

    if fixing_cycles > 20
        warning_wrap('gmmb_em:fixing_loop', ...
               ['A covariance matrix has been fixed repeatedly' ...
                ' too many times, quitting EM estimation.']);
        break;
    end

    if loops >= conf.maxloops
        break;
    end

    loops = loops +1;
    old_loglike = loglike;
end



estimate = struct('mu', mu,...
        'sigma', sigma,...
        'weight', weight);

if conf.logging>1
    varargout{1} = struct(...
        'iterations', {loops}, ...
        'covfixer2', {cat(1,log_covfixer2{:})}, ...
        'loglikes', {cat(1,log_loglikes{:})}, ...
        'initialmix', {log_initialmix}, ...
        'mixtures', {log_mixtures});
end
if conf.logging == 1
    varargout{1} = struct(...
        'iterations', {loops}, ...
        'covfixer2', {cat(1,log_covfixer2{:})}, ...
        'loglikes', {cat(1,log_loglikes{:})} );
end


% ------------------------------------------

function tulo = gmmcpdf(data, mu, sigma, weight);
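% Returns tulo, an N-by-C matrix with tulo(n,c) = weight(c) * p(data(n,:) | component c),
% i.e. each component's (possibly complex) Gaussian density weighted by its prior.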
N = size(data, 1);
C = size(weight,1);

pxcomp = zeros(N,C);
for c = 1:C
    pxcomp(:,c) = gmmb_cmvnpdf(data, mu(:,c).', sigma(:,:,c));
end
tulo = pxcomp.*repmat(weight.', N,1);


Generated on Thu 14-Apr-2005 13:50:22 by m2html © 2003