The function gprSRPP.m is based on the function gpr.m. The specification of the covariance function is the same as in gpr.m, except that the covariance function must be given as a cell array whose first entry is covSum and where the last entry of the sum is covNoise.
[mu, S2SR, S2PP] = gprSRPP(logtheta, covfunc, x, INDEX, y, xstar)
inputs:
  logtheta   a (column) vector containing the logarithm of the hyperparameters of the covariance function
  covfunc    the covariance function, which is assumed to be a covSum, the last entry of the sum being covNoise
  x          an n by D matrix of training inputs
  INDEX      a row vector of length m (with m <= n) specifying which inputs are used in the active set
  y          a (column) vector of training set targets (of length n)
  xstar      an nstar by D matrix of test inputs

outputs:
  mu         a (column) vector (of length nstar) of predictive means
  S2SR       a (column) vector (of length nstar) of predictive variances from the SR algorithm (incl noise variance)
  S2PP       a (column) vector (of length nstar) of predictive variances from the PP algorithm (incl noise variance)
addpath('../gpml');    % add path for the code
load data_boston

As usual the training inputs are stored in X, the training targets in y, the test inputs in Xstar and the test targets in ystar. A random subset of the training data points is selected using the randperm function; this active set has size m = 200.
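The snippets below also refer to the sizes n, D and nstar. If the data file does not define them (an assumption; the demo script may already set them), they can be recovered from the loaded matrices:

[n, D] = size(X);        % number and dimensionality of training inputs
nstar = size(Xstar,1);   % number of test inputs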
m = 200;
perm = randperm(n);
INDEX = perm(1:m);
Xm = X(INDEX,:);
ym = y(INDEX);

We use a covariance function made up of the sum of a squared exponential (SE) covariance term with ARD, and independent noise. Thus, the covariance function is specified as:
covfunc = {'covSum', {'covSEard','covNoise'}};

The hyperparameters are initialized to logtheta0 = [0; 0; ... 0; 0; -1.15]:
logtheta0 = zeros(D+2,1);    % D log length scales, log signal std dev, log noise std dev
logtheta0(D+2) = -1.15;      % starting value for log(noise std dev)

Note that the noise standard deviation is set to exp(-1.15), corresponding to a noise variance of exp(-2.3), i.e. approximately 0.1. The hyperparameters are trained by maximizing the approximate marginal likelihood of the SD (subset of datapoints) method given in eq. 8.31; this simply computes the marginal likelihood of the subset of size m.
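Since covSEard has D+1 hyperparameters (D log length scales and a log signal std dev) and covNoise has one more, logtheta0 has D+2 entries. As a quick sanity check (a sketch, not part of the demo script), the training covariance matrix implied by this specification can be evaluated directly:

K = feval(covfunc{:}, logtheta0, X);   % n by n matrix: SEard term plus noise variance on the diagonal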
[logtheta, fvals, iter] = minimize(logtheta0, 'gpr', -100, covfunc, Xm, ym);

The third argument of -100 allows minimize at most 100 function evaluations. Predictions can now be made using the 3 methods. SD is implemented simply by calling gpr on the reduced training set:
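The trained hyperparameters can be inspected on their natural scale; the names ells, sf and sn below are ours, not from the demo script:

ells = exp(logtheta(1:D));   % ARD characteristic length scales, one per input dimension
sf = exp(logtheta(D+1));     % signal std dev
sn = exp(logtheta(D+2));     % noise std dev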
[fstarSD, S2SD] = gpr(logtheta, covfunc, Xm, ym, Xstar);

The outputs are the mean predictions fstarSD and the predictive variances S2SD (which include the noise variance, as covNoise is part of the covariance function). The SR and PP methods are called using the function gprSRPP; note that the INDEX vector is passed:
[fstarSRPP, S2SR, S2PP] = gprSRPP(logtheta, covfunc, X, INDEX, y, Xstar);

gprSRPP returns the predictive mean fstarSRPP (which is identical for both methods), and the predictive variances S2SR and S2PP (which also include the noise variance). For comparison we also make predictions using gpr.m on the full training dataset, and a dumb predictor that just predicts the mean and variance of the training data; a sketch of this baseline is given after the next snippet. We compute the residuals, the mean squared error (mse) and the predictive log likelihood (pll) for all methods. Note that the predictive variance for ystar includes the noise variance, as explained on p. 18. Thus, for example, we have:
resSR = fstarSRPP - ystar;
mseSR = sum(resSR.^2)/nstar;
pllSR = -0.5*mean(log(2*pi*S2SR) + resSR.^2./S2SR);
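For the dumb baseline mentioned above, the analogous computation could look as follows (a sketch; the names mdumb, s2dumb and resdumb are ours):

mdumb = mean(y);                  % predict the training mean everywhere,
s2dumb = var(y);                  % with the training variance as predictive variance
resdumb = mdumb - ystar;
msedumb = sum(resdumb.^2)/nstar;
plldumb = -0.5*mean(log(2*pi*s2dumb) + resdumb.^2/s2dumb);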
The test results are:

mse_full  0.118924    pll_full  -0.414019
mse_SD    0.179692    pll_SD    -0.54915
mse_SR    0.138302    pll_SR    -0.658645
mse_PP    0.138302    pll_PP    -0.395815
mse_dumb  1.11464     pll_dumb  -1.4769

where mse denotes mean squared error and pll denotes predictive log likelihood. A higher (less negative) pll is more desirable. Note that the mse for the SR and PP methods is identical, as expected. The SR and PP methods outperform SD on mse, and come close to the full mse. On pll, the PP method does slightly better than the full predictor, followed by the SD and SR methods. You can experiment further by varying the value of the subset size m in the script demo_gprsparse.m; a sketch of such a loop follows.
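A minimal sketch of such an experiment, reusing the hyperparameters trained above rather than retraining for each subset size (the demo script may handle this differently):

for m = [50 100 200 400]     % assumes the largest subset size is <= n
  INDEX = perm(1:m);         % first m points of the random permutation
  [fstar, S2SR, S2PP] = gprSRPP(logtheta, covfunc, X, INDEX, y, Xstar);
  fprintf('m = %3d   mse = %g\n', m, sum((fstar - ystar).^2)/nstar);
end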