Title: | Separable Gaussian Process Interpolation (Emulation) |
---|---|
Description: | Functions for simplified emulation of time series computer model output in model parameter space using Gaussian processes. Stilt can be used more generally for Kriging of spatio-temporal fields. There are functions to predict at new parameter settings, to test the emulator using cross-validation (which includes information on 95% confidence interval empirical coverage), and to produce contour plots over 2D slices in model parameter space. |
Authors: | Roman Olson, Won Chang, Klaus Keller, and Murali Haran |
Maintainer: | Kelsey Ruckert <[email protected]> |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2024-12-22 06:23:46 UTC |
Source: | CRAN |
'Stilt' is used for interpolating multivariate data in multivariate space. It is tailored to the case of interpolating ("emulating") regularly spaced time-series of computer model output between the model input parameters, but can be also used more broadly (e.g., for output in the form of regularly-spaced spatial 1D transects, or for interpolating time-series between locations on the Earth's surface). The interpolation technique relies on a separable Gaussian Process emulator. Package consists of functions to fit the Gaussian Process emulator, to interpolate to (predict at) a given model input setting, and to validate the emulator using cross-validation. In addition, a function is provided to plot model response surface as generated by the emulator as a function of two model parameters. The package works both on one and multi-parameter ensembles, with an exception of the 'rsurface.plot' function, which by definition is restricted to 2+ parameter ensembles. The emulator is restricted to multivariate model output and is not designed to work on interpolating a single scalar.
Package: | stilt |
Type: | Package |
Version: | 1.3.0 |
Date: | 2017-08-15 |
License: | GPL-3 |
LazyLoad: | no |
This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. In addition, the authors maintain no responsibility for any possible code errors or bugs.
Fits a separable Gaussian Process emulator to ensemble model output
Predicts using a Gaussian Process emulator
Plots a 2D model response surface
Cross-validates an emulator by removing one or many model runs, training the emulator on the remaining runs, and then predicting at the withheld model runs
Tests an emulator using leave-one-out cross validation
Constructs time and parameter covariance matrices
Model output and parameter settings for a simple 1-parameter ensemble example
Gaussian Process emulator that has been fit to the aforementioned model ensemble output
Temperature variability and future change for 29 CMIP5 global climate models
Model output and parameter settings for temperature anomaly output from 3-parameter ensemble of climate model UVic ESCM
Model output and parameter settings for ice mass loss output from a 5-parameter ensemble of ice sheet model SICOPOLIS
A Gaussian Process emulator that has been fit to the SICOPOLIS ensemble output
The emulator (like any other available software packages) will not work well for 'jagged' model response surfaces (high nugget): predictive uncertainty will be too high.
The emulator is restricted to output at regular intervals in time and space
The code has not been tested under conditions of extreme high / extreme low input parameter range, output time(space) coordinates range, and output range (an example would be model output ranging from -1E20 to 1E20, etc.). In such cases it is recommended to re-scale the time(space) coordinates vector, the input parameters, and/or the model output.
The emulator will not work on scalar model output – it requires multivariate data
The emulator assumes a separable covariance function, and stationarity of the covariance part of the Gaussian process.
The optimization of the emulator parameters degrades dramatically (and increases in time) as a function of number of free parameters. Hence, the emulator might be of limited use for large parameter ensembles
The emulator authors are not responsible for any code errors and/or bugs
Roman Olson, Won Chang, Klaus Keller and Murali Haran
Maintainer: Kelsey Ruckert <[email protected]>
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
C. E. Rasmussen and C. K. I. Williams (2006): Gaussian Processes for machine learning, the MIT press, available from www.gaussianprocess.org/gpml/
Synthetic model output for a simple 1-parameter ensemble example
data(Data.1D.model)
data(Data.1D.model)
The format is a list containing five elements
: 11-element time vector: (0, 1, ..., 10)
: Time units="Year"
: [1:11, 1:21] matrix of model response, [row,col] = [time index, parameter index]. The corresponding vector of parameters for columns is in the $par component of Data.1D.par.Rd ...
: Output name="Sample Output"
: Output units="Unitless"
Model response is a function of time and parameter setting
as follows:
# Fit an emulator to the 1-parameter ensemble data data(Data.1D.model) data(Data.1D.par) emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)
# Fit an emulator to the 1-parameter ensemble data data(Data.1D.model) data(Data.1D.par) emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)
Parameter settings for 1-parameter ensemble model output in Data.1D.model
data(Data.1D.par)
data(Data.1D.par)
The format is list containing three elements:
[1, 1:21] matrix of parameter settings. The columns correspond to columns of $out in Data.1D.model. The parameter settings are (0, 1, ..., 20)
Parameter name = "Parameter 1"
Parameter units = "Unitless"
# Fit an emulator to the 1-parameter ensemble data data(Data.1D.model) data(Data.1D.par) emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)
# Fit an emulator to the 1-parameter ensemble data data(Data.1D.model) data(Data.1D.par) emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)
Future 2081-2100 RCP8.5 emissions scenario change in summer mean maximum temperature compared to present (1973-2005) over Korea as a function of present-day (1973-2005) variability (innovation standard deviation and autocorrelation from CMIP5 GCMs)
data(Data.AR1Korea.model)
data(Data.AR1Korea.model)
The format is a list containing five elements:
Time vector for model output: (2081, 2082, ... 2100)
Time units = "Year"
[1:20, 1:29] matrix of GCM temperature change output. [row, col] = [time, model index]. The variability parameters corresponding to columns of $out are in the $par element of Data.AR1Korea.par
Output name = "Temp Change 1973-2005 to 2081-2100"
Output units = "K"
The CMIP5 GCM output was analysed by Jong-Soo Shin and Roman Olson at Yonsei University. The variability properties were found by maximum likelihood.
# Fit an emulator to the CMIP5 GCM Korean temperature variability and change # data with innovation standard deviation and time as covariates data(Data.AR1Korea.model) data(Data.AR1Korea.par) emulator(Data.AR1Korea.par, Data.AR1Korea.model, c(TRUE, FALSE), TRUE, 3, 3)
# Fit an emulator to the CMIP5 GCM Korean temperature variability and change # data with innovation standard deviation and time as covariates data(Data.AR1Korea.model) data(Data.AR1Korea.par) emulator(Data.AR1Korea.par, Data.AR1Korea.model, c(TRUE, FALSE), TRUE, 3, 3)
Korean summer mean maximum temperature 1973-2005 interannual variability parameters (innovation standard deviation and autocorrelation) for 29 CMIP5 GCMs.
data(Data.AR1Korea.par)
data(Data.AR1Korea.par)
The format is a list containing three variables
Variability settings for 29 CMIP5 GCMs. [1:2, 1:29] matrix. Rows are: sigma (innovation std) and rho (autocorrelation). The columns of $par correspond to columns of $out in Data.AR1Korea.model.
Parameter names for rows of $par
Parameter units for rows of $par
The CMIP5 GCM output was analysed by Jong-Soo Shin and Roman Olson at Yonsei University. The variability properties were found by maximum likelihood.
# Fit an emulator to the CMIP5 GCM Korean temperature variability and change # data with innovation standard deviation and time as covariates data(Data.AR1Korea.model) data(Data.AR1Korea.par) emulator(Data.AR1Korea.par, Data.AR1Korea.model, c(TRUE, FALSE), TRUE, 3, 3)
# Fit an emulator to the CMIP5 GCM Korean temperature variability and change # data with innovation standard deviation and time as covariates data(Data.AR1Korea.model) data(Data.AR1Korea.par) emulator(Data.AR1Korea.par, Data.AR1Korea.model, c(TRUE, FALSE), TRUE, 3, 3)
Ice mass loss output from a 5-parameter ensemble of SICOPOLIS ice sheet model
data(Data.Sicopolis.model)
data(Data.Sicopolis.model)
The format is a list containing five elements
661-element time vector for model output: (1840, 1841, ..., 2500)
Time units = "Year"
[1:661, 1:100] matrix of Greenland Ice Sheet mass anomaly with respect to year 2003. [row, col] = [time index, parameter index]. The vector of parameter values for columns of $out is in the $par element of Data.Sicopolis.par.
Output name = "GIS Mass Anomaly wrt 2003"
Output units "Gt"
The ensemble was generated by Patrick Applegate. The ensemble varies five important ice sheet model parameters: Flow Enhancement Factor, Basal Sliding Factor, Geothermal Heat Flux, Snow PDD Factor, and Ice PDD Factor
Applegate, P. J., Kirchner, N., Stone, E. J., Keller, K., and Greve, R., 2012, An assessment of key model parametric uncertainties in projections of Greenland Ice Sheet behavior: The Cryosphere 6, 589-606.
# Fit an emulator to the SICOPOLIS ensemble data data(Data.Sicopolis.par) data(Data.Sicopolis.model) ## Not run: emulator(Data.Sicopolis.par, Data.Sicopolis.model, c(FALSE, FALSE, FALSE, FALSE, FALSE), FALSE, 200000, 20000) ## End(Not run)
# Fit an emulator to the SICOPOLIS ensemble data data(Data.Sicopolis.par) data(Data.Sicopolis.model) ## Not run: emulator(Data.Sicopolis.par, Data.Sicopolis.model, c(FALSE, FALSE, FALSE, FALSE, FALSE), FALSE, 200000, 20000) ## End(Not run)
Parameter settings corresponding to SICOPOLIS ensemble model output in Data.Sicopolis.model.
data(Data.Sicopolis.par)
data(Data.Sicopolis.par)
The format is a list containing three elements
[1:5, 1:100] matrix of parameter settings for the SICOPOLIS model ensemble. [row, col] = [parameter number, ensemble run number]. The columns match the columns of $out element of Data.Sicopolis.model
Parameter names for rows of $par
Parameter units for rows of $par
The ensemble was generated by Patrick Applegate. The ensemble varies five important ice sheet model parameters: Flow Enhancement Factor, Basal Sliding Factor, Geothermal Heat Flux, Snow PDD Factor, and Ice PDD Factor
Applegate, P. J., Kirchner, N., Stone, E. J., Keller, K., and Greve, R., 2012, An assessment of key model parametric uncertainties in projections of Greenland Ice Sheet behavior: The Cryosphere 6, 589-606.
# Fit an emulator to the SICOPOLIS ensemble data data(Data.Sicopolis.par) data(Data.Sicopolis.model) ## Not run: emulator(Data.Sicopolis.par, Data.Sicopolis.model, c(FALSE, FALSE, FALSE, FALSE, FALSE), FALSE, 200000, 20000) ## End(Not run)
# Fit an emulator to the SICOPOLIS ensemble data data(Data.Sicopolis.par) data(Data.Sicopolis.model) ## Not run: emulator(Data.Sicopolis.par, Data.Sicopolis.model, c(FALSE, FALSE, FALSE, FALSE, FALSE), FALSE, 200000, 20000) ## End(Not run)
Temperature anomaly output from UVic ESCM climate model ensemble
data(Data.UVic.model)
data(Data.UVic.model)
The format is a list containing five elements:
Time vector for model output: (1850, 1851, ... 2009)
Time units = "Year"
[1:160, 1:250] matrix of UVic ESCM temperature output. [row, col] = [time, run index]. The parameter settings corresponding to runs in columns of $out are in the $par element of Data.UVic.par
Output name = "Temperature Anomaly from 1850-1899"
Output units = "[K]"
The ensemble was run by Roman Olson at Penn State University. The ensemble varies key parameters that affect decadal- and century-scale climate response to external climate forcings: vertical ocean diffusivity, climate sensitivity, and anthropogenic sulfate aerosol scaling factor.
Olson, R., R. Sriver, M. Goes, N. M. Urban, H. D. Matthews, M. Haran and K. Keller (2012): A climate sensitivity estimate using Bayesian fusion of instrumental observations and an Earth System model. Journal of Geosphysical Research - Atmospheres, 117(D04103), doi:10.1029/2011JD016620
Olson, R., R. Sriver, W. Chang, M. Haran, N. M. Urban and K. Keller (2013): What is the effect of unresolved internal climate variability on climate sensitivity estimates? Journal of Geophysical Research - Atmospheres, 118, doi:10.1002/jgrd.50390
# Fit an emulator to the UVic ESCM ensemble data using all parameter and #time covariates data(Data.UVic.model) data(Data.UVic.par) ## Not run: emulator(Data.UVic.par, Data.UVic.model, c(TRUE, TRUE, TRUE), TRUE, 1, 1)
# Fit an emulator to the UVic ESCM ensemble data using all parameter and #time covariates data(Data.UVic.model) data(Data.UVic.par) ## Not run: emulator(Data.UVic.par, Data.UVic.model, c(TRUE, TRUE, TRUE), TRUE, 1, 1)
Parameter settings for the UVic ESCM climate model ensemble output in Data.UVic.model
data(Data.UVic.par)
data(Data.UVic.par)
The format is a list containing three variables
Parameter settings for the temperature output from the UVic ESCM ensemble in Data.UVic.model. [1:3, 1:250] matrix. [row, col] = [parameter number, run number]. The columns of $par correspond to columns of $out in Data.UVic.model.
Parameter names for rows of $par
Parameter units for rows of $par
The ensemble was run by Roman Olson at Penn State University. The ensemble varies key parameters that affect decadal- and century-scale climate response to external climate forcings: vertical ocean diffusivity, climate sensitivity, and anthropogenic sulfate aerosol scaling factor.
Olson, R., R. Sriver, M. Goes, N. M. Urban, H. D. Matthews, M. Haran and K. Keller (2012): A climate sensitivity estimate using Bayesian fusion of instrumental observations and an Earth System model. Journal of Geosphysical Research - Atmospheres, 117(D04103), doi:10.1029/2011JD016620
Olson, R., R. Sriver, W. Chang, M. Haran, N. M. Urban and K. Keller (2013): What is the effect of unresolved internal climate variability on climate sensitivity estimates? Journal of Geophysical Research - Atmospheres, 118, doi:10.1002/jgrd.50390
# Fit an emulator to the UVic ESCM ensemble data using all parameter and # time covariates data(Data.UVic.par) data(Data.UVic.model) ## Not run: emulator(Data.UVic.par, Data.UVic.model, c(TRUE, TRUE, TRUE), TRUE, 1, 1)
# Fit an emulator to the UVic ESCM ensemble data using all parameter and # time covariates data(Data.UVic.par) data(Data.UVic.model) ## Not run: emulator(Data.UVic.par, Data.UVic.model, c(TRUE, TRUE, TRUE), TRUE, 1, 1)
Gaussian Process emulator object fitted to the 1-parameter example given in Data.1D.par and Data.1D.model
data(emul.1D)
data(emul.1D)
This is a standard emulator list object, containing standard emulator components as described in 'emulator' function
The emulator was fit with both the parameter and time covariates, with a relative tolerance of 1E-9, with two optimization attempts. The regression coefficients were estimated during the optimization process.
# Cross-validate the 1-parameter model emulator and plot the results at # time index 5 data(emul.1D) test.all(emul.1D, 5)
# Cross-validate the 1-parameter model emulator and plot the results at # time index 5 data(emul.1D) test.all(emul.1D, 5)
To predict using an emulator. This function is deprecated. Use predict.emul
instead.
emul.predict(emul, theta.star)
emul.predict(emul, theta.star)
emul |
Standard emulator object, as output, for example, by the 'emulator' function. |
theta.star |
Parameter setting at which to predict. Must have the same number of
elements as there are columns in the |
Emulator prediction follows standard formulation in Gaussian Process theory. For more details, see References.
Prediction at the parameter setting theta.star
for all times
specified by emul$t.vec
. List with components
mean |
Posterior mean. n*1 matrix. Rows correspond to the times of |
covariance |
Posterior covariance matrix. n*n matrix. Diagonal
elements represent variances at each time in |
Prediction outside the emulator range ("extrapolation") is currently not allowed.
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
rsurface.plot
, test.csv
, emulator
, predict.emul
## Not run: # Predict using the SICOPOLIS model at a mid-range parameter setting, and plot # the prediction and associated uncertainty data(emul.Sicopolis) pred <- emul.predict(emul.Sicopolis, c(3, 10, 50, 3, 12.5)) plot.default(NA, xlim=range(emul.Sicopolis$t.vec), ylim=range(pred$mean), xlab="Year", ylab="Ice Mass Loss relative to year 2003") lines(emul.Sicopolis$t.vec, pred$mean, col="brown", lwd=3) std <- sqrt(diag(pred$covariance)) lines(emul.Sicopolis$t.vec, pred$mean + std, col="brown", lty=2) lines(emul.Sicopolis$t.vec, pred$mean - std, col="brown", lty=2) # Fit an emulator to the 1-parameter test ensemble data, predict at # Theta*=8, and plot the prediction data(Data.1D.par) data(Data.1D.model) emul.1D <- emulator(Data.1D.par, Data.1D.model, TRUE, TRUE, 100, 0.1) pred.1D <- emul.predict(emul.1D, 8) plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Model Output at Theta*=8") ## End(Not run)
## Not run: # Predict using the SICOPOLIS model at a mid-range parameter setting, and plot # the prediction and associated uncertainty data(emul.Sicopolis) pred <- emul.predict(emul.Sicopolis, c(3, 10, 50, 3, 12.5)) plot.default(NA, xlim=range(emul.Sicopolis$t.vec), ylim=range(pred$mean), xlab="Year", ylab="Ice Mass Loss relative to year 2003") lines(emul.Sicopolis$t.vec, pred$mean, col="brown", lwd=3) std <- sqrt(diag(pred$covariance)) lines(emul.Sicopolis$t.vec, pred$mean + std, col="brown", lty=2) lines(emul.Sicopolis$t.vec, pred$mean - std, col="brown", lty=2) # Fit an emulator to the 1-parameter test ensemble data, predict at # Theta*=8, and plot the prediction data(Data.1D.par) data(Data.1D.model) emul.1D <- emulator(Data.1D.par, Data.1D.model, TRUE, TRUE, 100, 0.1) pred.1D <- emul.predict(emul.1D, 8) plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Model Output at Theta*=8") ## End(Not run)
Sample emulator for ice mass loss from the 5-parameter SICOPOLIS ice sheet model ensemble output
data(emul.Sicopolis)
data(emul.Sicopolis)
This is a standard emulator list object, containing standard emulator components as described in 'emulator' function. The emulated quantity is Greenland ice mass loss relative to year 2003 (Gt).
The emulator was fit using all five parameter covariates, and the time covariate. The beta parameters are the multiple linear regression estimates. The model ensemble was generated by Patrick Applegate. The ensemble varies five important ice sheet model parameters: Flow Enhancement Factor, Basal Sliding Factor, Geothermal Heat Flux, Snow PDD Factor, and Ice PDD Factor.
Applegate, P. J., Kirchner, N., Stone, E. J., Keller, K., and Greve, R., 2012, An assessment of key model parametric uncertainties in projections of Greenland Ice Sheet behavior: The Cryosphere 6, 589-606.
# Plot the response surface as a function of snow PDD factor, and ice # PDD factor at year 2500 (time index 661), while keeping other parameters # at mid-range values data(emul.Sicopolis) ## Not run: rsurface.plot(emul=emul.Sicopolis, parind=c(4,5), parvals=c(3, 10, 50, NA, NA), tind=661, n1=5, n2=5) ## End(Not run)
# Plot the response surface as a function of snow PDD factor, and ice # PDD factor at year 2500 (time index 661), while keeping other parameters # at mid-range values data(emul.Sicopolis) ## Not run: rsurface.plot(emul=emul.Sicopolis, parind=c(4,5), parvals=c(3, 10, 50, NA, NA), tind=661, n1=5, n2=5) ## End(Not run)
Fits a Gaussian Process emulator to regularly spaced time-series (or 1D spatial) model output. This emulator can later be used to interpolate the model output across model parameter settings. The emulator is fit by maximizing the emulator likelihood using a PORT local optimization routine. The emulator has a flexible structure which can be chosen by the user (see Details).
emulator(mpars, moutput, par.reg, time.reg, kappa0, zeta0, myrel.tol = NULL, twice = FALSE, fix.betas = TRUE)
emulator(mpars, moutput, par.reg, time.reg, kappa0, zeta0, myrel.tol = NULL, twice = FALSE, fix.betas = TRUE)
mpars |
Model ensemble parameter settings. A list with components:
|
moutput |
Model output. A list with components:
|
par.reg |
Logical vector specifying which parameters to include into the
regression matrix. Elements refer to rows of |
time.reg |
Logical specifying whether to include time into the regression matrix. |
kappa0 |
Initial guess for the parameter covariance scaling factor. Larger values lead to higher parameter covariance. |
zeta0 |
Initial guess for the nugget. |
myrel.tol |
Relative tolerance for optimization. If you want to use the defaults, assign it to NULL. The optimization stops if it thinks the true likelihood is within this fraction of current likelihood. The higher the number the sooner the optimization stops, but the emulator is 'worse'. |
twice |
Logical. If |
fix.betas |
Logical. If |
The emulator parameters are optimized using a local optimization
method, to within the specified relative tolerance. The optimization maximizes
emulator likelihood. Optionally, if twice=TRUE
, the optimization is done twice with
different initial parameter settings, and the emulator with the highest
likelihood is selected. Depending on the value of fix.betas
, the regression
beta parameters can either be fixed at their multiple linear regression
estimates, or estimated together with other
emulator parameters. Mean structure is flexible, and user can choose which
parameter and time dimensions to use as regressors via the
time.reg
and par.reg
parameters
Emulator returns an object of class
"emul" and is a list with components
(see also Reference in the References Section).
$Theta.mat |
Theta matrix. [row, col] = [run number, parameter number] |
$t.vec |
Time vector |
$Y.mat |
Data matrix Y |
$X.mat |
Regression matrix |
$beta.vec |
A vector of regression parameters |
$kappa |
Parameter covariance scaling factor |
$phi.vec |
A vector of range parameters phi |
$zeta |
Nugget |
$n |
Length of time dimension |
$rho |
Time AR(1) parameter |
$p |
Number of model runs in the ensemble |
$vecC |
Data matrix Y minus the mean vector |
$par.reg |
Logical vector specifying which parameters to include in the regression matrix |
$time.reg |
Logical, specifies whether to include time into the regression matrix |
$Sigma.mats |
List of two precomputed matrices that may be useful:
|
$Sigma.theta.inv.mat |
Inverse of |
Evaluation of this function (especially for large datasets and many parameters) might take a long time.
The emulator will not work well for 'jagged' model response surfaces (high nugget)
The emulator is restricted to output at regular intervals in time and space
The code has not been tested under conditions of extreme high / extreme low input parameter range, output time(space) coordinates range, and output range (an example would be model output ranging from -1E20 to 1E20, etc.). In such cases it is recommended to re-scale the time(space) coordinates vector, the input parameters, and/or the model output.
The emulator will not work on scalar model output – it requires multivariate data
The emulator assumes a separable covariance function, and stationarity of the covariance part of the Gaussian process.
The optimization of the emulator parameters degrades dramatically (and increases in time) as a function of number of free parameters. Hence, the emulator might be of limited use for large parameter ensembles
The emulator authors are not responsible for any code errors and/or bugs
If the final emulator performs poorly, try adjusting the
settings, especially the initial kappa0
and zeta0
values.
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
# Fit an emulator to the example 1D parameter ensemble data # Do not use any covariates, use standard settings data(Data.1D.model) data(Data.1D.par) emul.1D <- emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1) # Take a look at the range and regression parameters cat("Range parameters are:", emul.1D$phi.vec, "\n") cat("Regression parameters are:", emul.1D$beta.vec, "\n") # Predict using the emulator at Theta*=3 pred.1D <- predict(emul.1D, 3) # Plot the prediction plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Output") ## Not run: # Fit an emulator to the UVic ESCM 3-parameter ensemble temperature # output Use time and aerosol scaling covariates (parameter 3), run # the optimization twice with relative tolerance of 0.1, keep # regression parameters at their multiple linear regression # estimates data(Data.UVic.model) data(Data.UVic.par) UVic.emul <- # emulator(mpars=Data.UVic.par, moutput=Data.UVic.model, # par.reg=c(FALSE, FALSE, TRUE), time.reg=TRUE, kappa0=1, zeta0=1, # myrel.tol=0.1, twice=TRUE, fix.betas=TRUE) ## End(Not run)
# Fit an emulator to the example 1D parameter ensemble data # Do not use any covariates, use standard settings data(Data.1D.model) data(Data.1D.par) emul.1D <- emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1) # Take a look at the range and regression parameters cat("Range parameters are:", emul.1D$phi.vec, "\n") cat("Regression parameters are:", emul.1D$beta.vec, "\n") # Predict using the emulator at Theta*=3 pred.1D <- predict(emul.1D, 3) # Plot the prediction plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Output") ## Not run: # Fit an emulator to the UVic ESCM 3-parameter ensemble temperature # output Use time and aerosol scaling covariates (parameter 3), run # the optimization twice with relative tolerance of 0.1, keep # regression parameters at their multiple linear regression # estimates data(Data.UVic.model) data(Data.UVic.par) UVic.emul <- # emulator(mpars=Data.UVic.par, moutput=Data.UVic.model, # par.reg=c(FALSE, FALSE, TRUE), time.reg=TRUE, kappa0=1, zeta0=1, # myrel.tol=0.1, twice=TRUE, fix.betas=TRUE) ## End(Not run)
To predict using an emulator
## S3 method for class 'emul' predict(object, theta.star, ...)
## S3 method for class 'emul' predict(object, theta.star, ...)
object |
Standard emulator object of class "emul", as output, for example, by the |
theta.star |
Parameter setting at which to predict. Must have the same number of
elements as there are columns in the |
... |
Additional optional arguments. At present no optional arguments are used. |
Emulator prediction follows standard formulation in Gaussian Process theory. For more details, see References.
Prediction at the parameter setting theta.star
for all times
specified by object$t.vec
. List with components
mean |
Posterior mean. n*1 matrix. Rows correspond to the times of |
covariance |
Posterior covariance matrix. n*n matrix. Diagonal
elements represent variances at each time in |
Prediction outside the emulator range ("extrapolation") is currently not allowed.
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
rsurface.plot
, test.csv
, emulator
# Predict using the SICOPOLIS model at a mid-range parameter setting, and plot # the prediction and associated uncertainty data(emul.Sicopolis) pred <- predict(emul.Sicopolis, c(3, 10, 50, 3, 12.5)) plot.default(NA, xlim=range(emul.Sicopolis$t.vec), ylim=range(pred$mean), xlab="Year", ylab="Ice Mass Loss relative to year 2003") lines(emul.Sicopolis$t.vec, pred$mean, col="brown", lwd=3) std <- sqrt(diag(pred$covariance)) lines(emul.Sicopolis$t.vec, pred$mean + std, col="brown", lty=2) lines(emul.Sicopolis$t.vec, pred$mean - std, col="brown", lty=2) # Fit an emulator to the 1-parameter test ensemble data, predict at # Theta*=8, and plot the prediction data(Data.1D.par) data(Data.1D.model) emul.1D <- emulator(Data.1D.par, Data.1D.model, TRUE, TRUE, 100, 0.1) pred.1D <- predict(emul.1D, 8) plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Model Output at Theta*=8")
# Predict using the SICOPOLIS model at a mid-range parameter setting, and plot # the prediction and associated uncertainty data(emul.Sicopolis) pred <- predict(emul.Sicopolis, c(3, 10, 50, 3, 12.5)) plot.default(NA, xlim=range(emul.Sicopolis$t.vec), ylim=range(pred$mean), xlab="Year", ylab="Ice Mass Loss relative to year 2003") lines(emul.Sicopolis$t.vec, pred$mean, col="brown", lwd=3) std <- sqrt(diag(pred$covariance)) lines(emul.Sicopolis$t.vec, pred$mean + std, col="brown", lty=2) lines(emul.Sicopolis$t.vec, pred$mean - std, col="brown", lty=2) # Fit an emulator to the 1-parameter test ensemble data, predict at # Theta*=8, and plot the prediction data(Data.1D.par) data(Data.1D.model) emul.1D <- emulator(Data.1D.par, Data.1D.model, TRUE, TRUE, 100, 0.1) pred.1D <- predict(emul.1D, 8) plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Model Output at Theta*=8")
To produce a response surface plot of the emulator
rsurface.plot(emul, parind, parvals, tind, n1, n2, zlim = NULL)
rsurface.plot(emul, parind, parvals, tind, n1, n2, zlim = NULL)
emul |
A standard emulator object, as output, for example, by the 'emulator' function |
parind |
Vector of parameter indices (x and y) (columns of |
parvals |
A vector of parameter values to use for the rest of the
parameters (that are kept at constant values). Must be the same length as
the number of columns in |
tind |
Time index at which to predict |
n1 |
X direction grid size |
n2 |
Y direction grid size |
zlim |
Vector of z-limits for the filled.contour function. Default is the range of data plotted. |
Produces a response surface plot of the emulator as a function of
selected parameters, while the rest of the parameters are fixed at
their values set by parvals
. Use n1
and n2
to
specify X and Y grid size. Optionally, plot limits can be specified
through zlim
. The code relies on the filled.contour function.
None
Evaluation of this function (especially for large datasets and grid sizes) might take a long time. This is because in current specification, the 'predict.emul' function predicts the entire time-series (or space transect) at once.
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
# Plot the SICOPOLIS ice mass loss in year 2500 as a function of Snow # PDD Factor and Ice PDD Factor, at mid-range values of other parameters data(emul.Sicopolis) ## Not run: rsurface.plot(emul=emul.Sicopolis, parind=c(4,5), parvals=c(3, 10, 50, NA, NA), tind=661, n1=5, n2=5) ## End(Not run)
# Plot the SICOPOLIS ice mass loss in year 2500 as a function of Snow # PDD Factor and Ice PDD Factor, at mid-range values of other parameters data(emul.Sicopolis) ## Not run: rsurface.plot(emul=emul.Sicopolis, parind=c(4,5), parvals=c(3, 10, 50, NA, NA), tind=661, n1=5, n2=5) ## End(Not run)
Construct time and parameter covariance matrices. The time matrix follows standard AR(1) covariance. The parameter covariance is squared exponential, separable between the parameters, with an extra nugget term. While not directly related to other functions in this package, this function can be re-used if someone wants to build a new/different Gaussian Process emulator code.
sep.cov(Theta.mat, t.vec, rho, kappa, phi.vec, zeta)
sep.cov(Theta.mat, t.vec, rho, kappa, phi.vec, zeta)
Theta.mat |
Parameter matrix of the ensemble. [row, col] = [run number, parameter number]. See References for more information on this and other arguments. |
t.vec |
Evenly spaced vector of times for model output (can also be a coordinate vector for a regularly spaced spatial transect) |
rho |
Lag-1 time autocorrelation parameter |
kappa |
Parameter covariance scaling factor |
phi.vec |
Vector of range parameters. Must have the same length as
number of columns in |
zeta |
Nugget |
The time covariance is an n*n matrix (where n is the length of time
dimension). Its (j,k) element is specified by . The parameter covariance is a p*p matrix (where p
is the number of runs in the model ensemble). Specifically,
where
. Here
refers to parameter index.
For more details on the time and parameter covariance matrices produced by
this function, see References.
List with components
$Sigma.t.mat |
Time covariance matrix |
$Sigma.theta.mat |
Parameter covariance matrix |
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
# Construct time AR(1) and square exponential separable parameter # covariance matrix for a simple 1D parameter ensemble output example data(Data.1D.par) data(Data.1D.model) Theta.mat.1D <- t(Data.1D.par$par) t.vec.1D <- Data.1D.model$t # Use lag-1 time autocorrelation of 0.9, nugget and parameter covariance # scaling factor of 100, and range of 3. cov <- sep.cov(Theta.mat.1D, t.vec.1D, 0.9, 100, 3, 100) # Find the covariance between second parameter setting (Theta=1) and # ninth parameter setting (Theta=8) cov.2.9 <- cov$Sigma.theta.mat[2,9] cat("Covariance between Theta=1 and Theta=8 is:", cov.2.9, "\n") # Plot the time covariance matrix [for fun] # Note how covariance is high between similar years, but is low for markedly # different years. Produces a pretty plot. filled.contour(t.vec.1D, t.vec.1D, cov$Sigma.t.mat, xlab="Year", ylab="Year")
# Construct time AR(1) and square exponential separable parameter # covariance matrix for a simple 1D parameter ensemble output example data(Data.1D.par) data(Data.1D.model) Theta.mat.1D <- t(Data.1D.par$par) t.vec.1D <- Data.1D.model$t # Use lag-1 time autocorrelation of 0.9, nugget and parameter covariance # scaling factor of 100, and range of 3. cov <- sep.cov(Theta.mat.1D, t.vec.1D, 0.9, 100, 3, 100) # Find the covariance between second parameter setting (Theta=1) and # ninth parameter setting (Theta=8) cov.2.9 <- cov$Sigma.theta.mat[2,9] cat("Covariance between Theta=1 and Theta=8 is:", cov.2.9, "\n") # Plot the time covariance matrix [for fun] # Note how covariance is high between similar years, but is low for markedly # different years. Produces a pretty plot. filled.contour(t.vec.1D, t.vec.1D, cov$Sigma.t.mat, xlab="Year", ylab="Year")
To test an emulator using the leave-one-out cross validation analysis, whereby each ensemble run is first excluded from an emulator, and the emulator is then used to predict at the parameter settings of the excluded run. The process is repeated for all runs within the ensemble. The function also makes a plot of emulator predictions vs. actual model output, and a plot of normalized prediction errors for each excluded run.
test.all(final.emul, t.plot)
test.all(final.emul, t.plot)
final.emul |
Standard emulator object, that can be generated, for example, by 'emulator' function |
t.plot |
Time index of the emulator to analyze |
For a perfect emulator, emulator predictions vs. actual model output will fall exactly on a 1:1 line. If a withheld run is at the corner of parameter space, prediction gives a warning. Such runs are omitted from the plot.
# Cross-validate the 1D-parameter ensemble emulator at the first time index data(emul.1D) test.all(emul.1D, 1)
# Cross-validate the 1D-parameter ensemble emulator at the first time index data(emul.1D) test.all(emul.1D, 1)
To test an emulator using cross-validation. Any reasonable number of runs can be excluded from the emulator for testing. Then, the emulator is used to predict at the excluded parameter settings, and the results can be plotted. Optionally, prediction standard deviation can also be plotted. The program is reproducible if the seed number is specified.
test.csv(final.emul, num.test, plot.std, theseed = NULL, test.runind = NULL, make.plot = TRUE)
test.csv(final.emul, num.test, plot.std, theseed = NULL, test.runind = NULL, make.plot = TRUE)
final.emul |
A standard emulator object. It can be generated by, for example, the 'emulator' function. |
num.test |
Number of test runs to withhold at random from the emulator. Can not be more than half of the ensemble. |
plot.std |
If |
theseed |
Seed for random number generator. Default is |
test.runind |
If not |
make.plot |
If |
The function withholds model runs from the emulator, and then uses the
withheld emulator to predict at the excluded parameter settings. Any
reasonable number of runs up to 1/2 of the ensemble can be excluded. If
make.plot=TRUE
the plot of model output and emulator prediction
at the excluded setting is produced. Setting plot.std=TRUE
adds
the prediction standard deviation to the plot. If plot.std=TRUE
but make.plot=FALSE
a warning is given and nothing is plotted.
The default setting is to
exclude num.test
runs at random. If theseed
is specified
the results become reproducible as the seed passed on to the R random
number
generator is fixed. Optionally, the excluded run indices can be
specified explicitly via the vector test.runind
. If both
theseed
and test.runind
arguments are specified, then
theseed
is ignored.
The function prints out the excluded run indices and outputs some basic guidance. The function handles runs which are at parameter space boundaries by passing NA to the relevant rows of output matrices. If prediction at a particular withheld run gives error (most likely due to the fact that extrapolation is not allowed and the excluded run is at the corner of parameter space) that particular run is not plotted. Model output is beige, and emulator output is brown.
By 'withholding' runs from an emulator it is meant that the emulator
statistical parameters are kept the same, whereas the withheld runs are
eliminated from all relevant emulator components. The ensemble size final.emul$p
is reduced
accordingly.
List with components
$model.out.test |
Model output at test parameter settings [row,col] = [test run index, time index] |
$emul.out.test |
Emulator output at test parameter settings [row,col] = [test run index, time index] |
$emul.std.test |
Emulator prediction standard deviation at test parameter settings [row,col] = [test run index, time index] |
$coverage |
Empirical 95% coverage, expressed as a fraction |
The time vector for columns of all matrix components is
final.emul$t.vec
R. Olson and W. Chang (2013): Mathematical framework for a separable
Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.
# Test the 1D emulator by withholding the second parameter # setting, and then predicting at the withheld run data(emul.1D) test.1D <- test.csv(final.emul=emul.1D, num.test=1, plot.std=FALSE, test.runind=2, make.plot=FALSE) # Create a custom plot plot.default(NA, xlim=range(emul.1D$t.vec), ylim=range(test.1D$model.out.test), xlab="Year", ylab="Sample Output", main="Emulator Test at Theta=1", cex.lab=1.3, cex.axis=1.3, cex.main=1.3) lines(emul.1D$t.vec, test.1D$model.out.test, lwd=4, lty=2, col="green") # The emulator prediction is close to perfect! Yay! lines(emul.1D$t.vec, test.1D$emul.out.test, lwd=2, col="yellow") # Produce a legend model.max <- max(test.1D$model.out.test) # We already know that time ranges from 0 to 10 so figuring out x placement # for the legend is not hard legend(1, model.max, c("Model Output", "Emulator Predictions"), col=c("green", "yellow"), lty=c(2,1), lwd=c(4,2), cex=1.3) # Test SICOPOLIS emulator at three pre-selected parameter settings # specified by theseed=1234321 data(emul.Sicopolis) # Plot the standard deviation in all three cases test.csv(final.emul=emul.Sicopolis, num.test=3, plot.std=TRUE, theseed=1234321, make.plot=TRUE)
# Test the 1D emulator by withholding the second parameter # setting, and then predicting at the withheld run data(emul.1D) test.1D <- test.csv(final.emul=emul.1D, num.test=1, plot.std=FALSE, test.runind=2, make.plot=FALSE) # Create a custom plot plot.default(NA, xlim=range(emul.1D$t.vec), ylim=range(test.1D$model.out.test), xlab="Year", ylab="Sample Output", main="Emulator Test at Theta=1", cex.lab=1.3, cex.axis=1.3, cex.main=1.3) lines(emul.1D$t.vec, test.1D$model.out.test, lwd=4, lty=2, col="green") # The emulator prediction is close to perfect! Yay! lines(emul.1D$t.vec, test.1D$emul.out.test, lwd=2, col="yellow") # Produce a legend model.max <- max(test.1D$model.out.test) # We already know that time ranges from 0 to 10 so figuring out x placement # for the legend is not hard legend(1, model.max, c("Model Output", "Emulator Predictions"), col=c("green", "yellow"), lty=c(2,1), lwd=c(4,2), cex=1.3) # Test SICOPOLIS emulator at three pre-selected parameter settings # specified by theseed=1234321 data(emul.Sicopolis) # Plot the standard deviation in all three cases test.csv(final.emul=emul.Sicopolis, num.test=3, plot.std=TRUE, theseed=1234321, make.plot=TRUE)