%GMMB_EM - EM estimated GMM parameters
%
%     estS = gmmb_em(data)
%     estS = gmmb_em(data, <params>...)
%     [estS, stats] = gmmb_em(...)
%
% This version works with complex numbers too.
%
%  data = N x D matrix
%  params can be a list of 'name', value -pairs.
%  stats is a struct, see the 'Logging' section below.
%
% Parameters (default value):
%
%  maxloops     maximum number of loops allowed (100)
%  thr          convergence threshold; relative log-likelihood change (1e-6)
%               set negative to use only the maxloops condition
%  components   number of components in the GMM (3)
%  verbose      print progress messages (false)
%  init         name of the initialization method to use ('fcm1')
%
% Init methods:
%
%  fcm1     Fuzzy C-means clustering, requires the Fuzzy Logic Toolbox.
%           This is the original init method from GMMBayes Toolbox v0.1.
%  cmeans1  C-means clustering for means; uniform weights and covariances
%  cmeans2  C-means clustering for means, weights and covariances
%
% Example:
%  estS = gmmb_em(data, 'init', 'fcm1', 'components', 5, 'thr', 1e-8)
%
% References:
%   [1] Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification,
%   2nd ed., John Wiley & Sons, Inc., 2001.
%   [2] Bilmes, J.A., A Gentle Tutorial of the EM Algorithm and its
%   Application to Parameter Estimation for Gaussian Mixture and Hidden
%   Markov Models, International Computer Science Institute, 1998.
%
% Author(s):
%    Joni Kamarainen <Joni.Kamarainen@lut.fi>
%    Pekka Paalanen <pekka.paalanen@lut.fi>
%
% Copyright:
%
%   Bayesian Classifier with Gaussian Mixture Model Pdf
%   functionality is Copyright (C) 2003 by Pekka Paalanen and
%   Joni-Kristian Kamarainen.
%
% $Name: $ $Revision: 1.2 $ $Date: 2004/11/02 09:00:18 $
%
% Logging
%   parameters
%
%     logging   What kind of logging to do:
%       0 - no logging
%       1 - normal logging
%       2 - extra logging: store all intermediate mixtures
%     If the 'stats' output parameter is defined, then 'logging'
%     defaults to 1, otherwise it is forced to 0.
%
%   the 'stats' struct:
%     iterations: EM iteration count
%     covfixer2:  iterations-by-C matrix of gmmb_covfixer fix round counts
%     loglikes:   iterations long vector of the log-likelihood
%   extra logging:
%     initialmix: parameters for the initial mixture
%     mixtures:   parameters for all intermediate mixtures
%

function [estimate, varargout] = gmmb_em(data, varargin)

% default parameters
conf = struct(...
	'maxloops', 100, ...
	'thr', 1e-6, ...
	'verbose', 0, ...
	'components', 3, ...
	'logging', 0, ...
	'init', 'fcm1' ...
	);

if nargout > 1
	conf.logging = 1;
	varargout{1} = [];
end

conf = getargs(conf, varargin);

if nargout < 2
	conf.logging = 0;
end

% for logging
log_covfixer2 = {};
log_loglikes = {};
log_initialmix = {};
log_mixtures = {};


% --- initialization ---

N = size(data, 1);	% number of points
D = size(data, 2);	% dimensions
C = conf.components;

% the number of free parameters in a single Gaussian component
if isreal(data)
	Nparc = D + D*(D+1)/2;
else
	Nparc = 2*D + D*D;
end
N_limit = (Nparc+1)*3*C;
if N < N_limit
	warning_wrap('gmmb_em:data_amount', ...
		['Training data may be insufficient for selected ' ...
		'number of components. ' ...
		'Have: ' num2str(N) ', recommended: >' num2str(N_limit) ...
		' points.']);
end

switch lower(conf.init)
	case 'fcm1'
		initS = gmmb_em_init_fcm1(data, C, conf.verbose);
	case 'cmeans1'
		initS = gmmb_em_init_cmeans1(data, C);
	case 'cmeans2'
		initS = gmmb_em_init_cmeans2(data, C);
	otherwise
		error(['Unknown initialization method: ' conf.init]);
end


if any(initS.weight == 0)
	error('Initialization produced a zero weight.');
end

mu = initS.mu;
sigma = initS.sigma;
weight = initS.weight;


log_initialmix = initS;
fixerloops = zeros(1, C);


% old values for stopping condition calculations
old_loglike = -realmax;

loops = 1;
fixing_cycles = 0;

tulo = gmmcpdf(data, mu, sigma, weight);

while 1
	% one EM cycle
	pcompx = tulo ./ (sum(tulo,2)*ones(1,C));

	if ~all( isfinite(pcompx(:)) )
		error('Probabilities are no longer finite.');
	end

	for c = 1:C
		% calculate new estimates
		psum = sum(pcompx(:,c));

		% weight
		weight(c) = psum/N;

		% mean
		nmu = sum(data.*(pcompx(:,c)*ones(1,D)), 1).' ./ psum;
		mu(:,c) = nmu;

		% covariance
		moddata = (data - ones(N,1)*(nmu.')) .* (sqrt(pcompx(:,c))*ones(1,D));
		% sqrt(pcompx) is used because it will be squared back
		nsigma = (moddata' * moddata) ./ psum;

		% covariance matrix goodness assurance
		[sigma(:,:,c), fixerloops(1,c)] = gmmb_covfixer(nsigma);
		% covfixer may change the matrix so that the log-likelihood
		% decreases. So, if covfixer changes something, disable the
		% stop condition. If going into an infinite fix/estimate
		% loop, quit.
	end

	% finish test
	tulo = gmmcpdf(data, mu, sigma, weight);
	loglike = sum(log(sum(tulo, 2)));

	if conf.verbose ~= 0
		disp(['log-likelihood diff ' num2str(loglike-old_loglike) ...
			' on round ' num2str(loops)]);
	end

	if conf.logging > 0
		log_covfixer2{loops} = fixerloops;
		log_loglikes{loops} = loglike;
	end
	if conf.logging > 1
		log_mixtures{loops} = struct(...
			'weight', weight, ...
			'mu', mu, ...
			'sigma', sigma);
	end

	if any(fixerloops ~= 0)
		% if any covariances were fixed, increase the count and
		% do not evaluate the stopping threshold.
		fixing_cycles = fixing_cycles + 1;
		if conf.verbose ~= 0
			disp(['fix cycle ' num2str(fixing_cycles) ...
				', fix loops ' num2str(fixerloops)]);
		end
	else
		% no covariances were fixed this round, reset the counter
		% and evaluate the threshold.
		fixing_cycles = 0;
		if (abs(loglike/old_loglike - 1) < conf.thr)
			break;
		end
	end

	if fixing_cycles > 20
		warning_wrap('gmmb_em:fixing_loop', ...
			['A covariance matrix has been fixed repeatedly' ...
			' too many times, quitting EM estimation.']);
		break;
	end

	if loops >= conf.maxloops
		break;
	end

	loops = loops + 1;
	old_loglike = loglike;
end



estimate = struct('mu', mu, ...
	'sigma', sigma, ...
	'weight', weight);

if conf.logging > 1
	varargout{1} = struct(...
		'iterations', {loops}, ...
		'covfixer2', {cat(1, log_covfixer2{:})}, ...
		'loglikes', {cat(1, log_loglikes{:})}, ...
		'initialmix', {log_initialmix}, ...
		'mixtures', {log_mixtures});
end
if conf.logging == 1
	varargout{1} = struct(...
		'iterations', {loops}, ...
		'covfixer2', {cat(1, log_covfixer2{:})}, ...
		'loglikes', {cat(1, log_loglikes{:})} );
end


% ------------------------------------------

function tulo = gmmcpdf(data, mu, sigma, weight)
N = size(data, 1);
C = size(weight, 1);

pxcomp = zeros(N, C);
for c = 1:C
	pxcomp(:,c) = gmmb_cmvnpdf(data, mu(:,c).', sigma(:,:,c));
end
tulo = pxcomp .* repmat(weight.', N, 1);
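% For reference, the core EM cycle above (E-step responsibilities, M-step
% updates of weight, mean and covariance, and the relative log-likelihood
% stopping test) can be sketched in NumPy as follows. This is a minimal,
% real-valued illustration only: the function names are invented for the
% sketch, and the covariance fixing (gmmb_covfixer), logging and complex-data
% support of the actual implementation are omitted.

```python
import numpy as np

def gmm_densities(data, mu, sigma, weight):
    # tulo[n, c] = weight_c * N(data[n] | mu[:, c], sigma[:, :, c]),
    # mirroring the gmmcpdf subfunction above (real-valued case).
    N, D = data.shape
    C = weight.size
    tulo = np.empty((N, C))
    for c in range(C):
        diff = data - mu[:, c]
        inv = np.linalg.inv(sigma[:, :, c])
        det = np.linalg.det(sigma[:, :, c])
        expo = -0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)
        tulo[:, c] = weight[c] * np.exp(expo) / np.sqrt((2*np.pi)**D * det)
    return tulo

def em_gmm_sketch(data, mu, sigma, weight, maxloops=100, thr=1e-6):
    # data: (N, D); mu: (D, C); sigma: (D, D, C); weight: (C,)
    N, D = data.shape
    C = weight.size
    old_loglike = -np.finfo(float).max
    tulo = gmm_densities(data, mu, sigma, weight)
    for _ in range(maxloops):
        # E-step: responsibility of each component for each point
        pcompx = tulo / tulo.sum(axis=1, keepdims=True)
        # M-step: re-estimate weight, mean and covariance per component
        for c in range(C):
            psum = pcompx[:, c].sum()
            weight[c] = psum / N
            mu[:, c] = (data * pcompx[:, c:c+1]).sum(axis=0) / psum
            # sqrt of the responsibilities, squared back by the outer product
            mod = (data - mu[:, c]) * np.sqrt(pcompx[:, c:c+1])
            sigma[:, :, c] = mod.T @ mod / psum
        # stopping test on the relative log-likelihood change
        tulo = gmm_densities(data, mu, sigma, weight)
        loglike = np.log(tulo.sum(axis=1)).sum()
        if abs(loglike / old_loglike - 1) < thr:
            break
        old_loglike = loglike
    return mu, sigma, weight
```

% Note that, exactly as in the MATLAB loop, the log-likelihood is evaluated
% with the freshly updated parameters, so the same tulo matrix is reused for
% the next E-step.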